The most important papers in the history of large-model development

[1] 大模型时代的自然语言处理:挑战、机遇与发展 (Natural Language Processing in the Era of Large Models: Challenges, Opportunities, and Development)
Wanxiang Che, Zhicheng Dou, Yansong Feng, Tao Gui, Xianpei Han, Baotian Hu, Minlie Huang, Xuanjing Huang, Kang Liu, Ting Liu, Zhiyuan Liu, Bing Qin, Xipeng Qiu, Xiaojun Wan, Yuxuan Wang, Ji-Rong Wen, Rui-Xiang Yan, Jiajun Zhang, Min Zhang, Qi Zhang, Jun Zhao, Xin Zhao, Yanyan Zhao
SCIENTIA SINICA Informationis. Published 2023-05-01. Cited by: 1.

[2] How (not to) Run an AI Project in Investigative Journalism
M. Fridman, R. Krøvel, F. Palumbo
Journalism Practice (Computer Science). Published 2023-09-04. Cited by: 3.

[3] Deep Learning
Yann LeCun, Yoshua Bengio, Geoffrey E. Hinton

Machine-learning technology powers many aspects of modern society: from web searches to content filtering on social networks to recommendations on e-commerce websites, and it is increasingly present in consumer products such as cameras and smartphones. Machine-learning systems are used to identify objects in images, transcribe speech into text, match news items, posts or products with users’ interests, and select relevant results of search. Increasingly, these applications make use of a class of techniques called deep learning. Conventional machine-learning techniques were limited in their ability to process natural data in their raw form. For decades, constructing a pattern-recognition or machine-learning system required careful engineering and considerable domain expertise to design a feature extractor that transformed the raw data (such as the pixel values of an image) into a suitable internal representation or feature vector from which the learning subsystem, often a classifier, could detect or classify patterns in the input. Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification. Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions can be learned. For classification tasks, higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations. 
An image, for example, comes in the form of an array of pixel values, and the learned features in the first layer of representation typically represent the presence or absence of edges at particular orientations and locations in the image. The second layer typically detects motifs by spotting particular arrangements of edges, regardless of small variations in the edge positions. The third layer may assemble motifs into larger combinations that correspond to parts of familiar objects, and subsequent layers would detect objects as combinations of these parts. The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure. Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years. It has turned out to be very good at discovering intricate structures in high-dimensional data and is therefore applicable to many domains of science, business and government. In addition to beating records in image recognition and speech recognition, it has beaten other machine-learning techniques at predicting the activity of potential drug molecules, analysing particle accelerator data, reconstructing brain circuits, and predicting the effects of mutations in non-coding DNA on gene expression and disease. Perhaps more surprisingly, deep learning has produced extremely promising results for various tasks in natural language understanding, particularly topic classification, sentiment analysis, question answering and language translation. We think that deep learning will have many more successes in the near future because it requires very little engineering by hand, so it can easily take advantage of increases in the amount of available computation and data. New learning algorithms and architectures that are currently being developed for deep neural networks will only accelerate this progress.

Computer Science. Published 2015. Cited by: 40969.

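The most important papers in the history of large models span several key stages, from the early breakthroughs of deep learning to today's large models. They advanced the technology and laid the groundwork for later research. The papers below are among the most influential at each stage.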

Early breakthroughs in deep learning

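Key point 1: Neural network foundations: "Deep Learning" (LeCun, Bengio, Hinton, 2015) [3]. This review article introduces the basic concepts and methods of deep learning, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). It highlights applications in image recognition, speech recognition, and natural language processing, and provides a theoretical foundation for later large models.
Key point 2: Backpropagation: "Learning representations by back-propagating errors" (Rumelhart, Hinton, Williams, 1986) [3]. This paper popularized the backpropagation algorithm, the key technique for training multilayer neural networks. Backpropagation computes the gradient of the error with respect to each weight and adjusts the weights accordingly, which made it practical for networks to learn useful internal representations.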
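
The gradient computation described in key point 2 can be illustrated with a minimal sketch. It is not the formulation from the 1986 paper: the network (one sigmoid hidden unit feeding one linear output), the squared-error loss, and all function and parameter names are assumptions chosen for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Tiny illustrative network: one sigmoid hidden unit, one linear output."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y = w2 * h + b2
    return h, y


def loss(params, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Gradient of the loss for each parameter, via the chain rule, output layer first."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target              # dL/dy
    dw2 = dy * h                 # output-layer weight
    db2 = dy                     # output-layer bias
    dh = dy * w2                 # error propagated back to the hidden unit
    dz = dh * h * (1.0 - h)      # through the sigmoid: sigmoid'(z) = h * (1 - h)
    dw1 = dz * x                 # hidden-layer weight
    db1 = dz                     # hidden-layer bias
    return [dw1, db1, dw2, db2]


def train(params, x, target, lr=0.5, steps=200):
    """Plain gradient descent on a single example (learning rate is an assumption)."""
    params = list(params)
    for _ in range(steps):
        grads = backprop(params, x, target)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params
```

Calling `train([0.3, -0.1, 0.8, 0.2], 1.5, 1.0)` repeatedly moves the weights against the gradient; comparing `backprop` with a finite-difference estimate of `loss` is a simple way to check that the chain-rule terms are right.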

The rise of modern large models

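Key point 3: Transformer architecture: "Attention Is All You Need" (Vaswani et al., 2017) [3]. This paper proposed the Transformer architecture, which reshaped natural language processing. Through self-attention, the Transformer processes all positions of a sequence in parallel and models long-range dependencies, laying the foundation for later large models such as BERT and GPT.
Key point 4: BERT: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (Devlin et al., 2018) [3]. BERT is pre-trained on unlabeled text and then fine-tuned for each task, which brought significant gains across many natural language processing tasks. Its bidirectional encoder representations condition on both left and right context, making it an important milestone in large-model development.
Key point 5: GPT series: "Language Models are Unsupervised Multitask Learners" (Radford et al., 2019) [3]. This paper introduced GPT-2 and showed that a large pre-trained language model can perform many tasks without task-specific training. Its successor GPT-3, described in "Language Models are Few-Shot Learners" (Brown et al., 2020), scaled to 175 billion parameters and performed well on a wide range of tasks, driving broad adoption of large models in real applications.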
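
The self-attention mechanism mentioned in key point 3 can be sketched as single-head scaled dot-product attention. This is a simplified illustration under stated assumptions, not the full Transformer: there is no multi-head split, masking, or positional encoding, and the helper names and list-of-lists matrix representation are chosen here for readability.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x is a (tokens x d_model) matrix; wq, wk, wv are projection matrices.
    Returns the attended outputs and the attention weights.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]          # transpose of K
    scores = matmul(q, k_t)                       # every token scored against every token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights
```

Each row of `weights` sums to 1 and says how much one token draws on every other token. The score matrix for all token pairs comes from a single matrix product, which is why the computation parallelizes and why distant tokens interact directly.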

Applications and challenges of large models

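Key point 6: Interpretability of large models: "Interpretable Machine Learning" (Boehmke, Greenwell, 2019) [4]. As large models are deployed more widely, interpretability has become an important research direction. This work addresses how to make deep neural networks (DNNs) easier to understand and covers methods such as TCAV (Testing with Concept Activation Vectors), which offer ways to improve the transparency of large models.
Key point 7: Data requirements of large models: "Data-centric Artificial Intelligence: A Survey" (Zha, Bhat, et al., 2023) [7]. This survey discusses how strongly model performance depends on data quality and quantity. It lays out the data-centric AI perspective and examines how data engineering can improve the performance of large models.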

Solutions and future directions

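Key point 8: Optimization and deployment of large models: "Toward Deep Learning Software Repositories" (White, Vendome, et al., 2015) [18]. This paper applies deep learning to software engineering: it motivates deep learning models of source code mined from software repositories and compares them with n-gram language models on tasks such as code suggestion. It is an early example of putting deep learning to practical use in engineering workflows.
Key point 9: Ethics and safety of large models: "Research Priorities for Robust and Beneficial Artificial Intelligence" (Russell, Dewey, Tegmark, 2015) [10]. This paper sets out research directions for making AI systems robust and beneficial, covering ethical and safety challenges that also apply to large models. It calls for interdisciplinary work so that AI development and deployment serve society's interests.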

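Taken together, these papers trace the development of large models from theoretical foundations to practical applications. They advanced the technology and offer experience and guidance for future research and applications.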

The above content was collected and generated by AI and is for reference only.