

== History and development ==

A '''large language model''' ('''LLM''') is a type of [[machine learning]] model trained
on large corpora of text using [[self-supervised learning]].{{efn|The term "large" is
informal; models are generally considered LLMs when they surpass approximately one billion
parameters.}} The [[transformer (deep learning)|transformer]] architecture was introduced
in the paper "Attention Is All You Need" by Vaswani et al.
(2017).{{Cite journal|last=Vaswani|first=Ashish|year=2017|title=Attention Is All
You Need|journal=Advances in Neural Information Processing Systems|volume=30}} This
architecture became the basis of most subsequent LLMs.

=== Key milestones ===

The scale of LLMs grew substantially in the late 2010s and early 2020s:

* [[GPT-2]] (OpenAI, 2019): 1.5 billion parameters; the full model was initially withheld from public release over concerns about misuse and was instead released in stages{{Cite web|url=https://openai.com/blog/gpt-2-6-month-follow-up}}
* [[GPT-3]] (OpenAI, 2020): 175 billion parameters; demonstrated strong few-shot learning
* [[PaLM]] (Google, 2022): 540 billion parameters; trained on 780 billion tokens
* [[GPT-4]] (OpenAI, 2023): parameter count undisclosed by OpenAI
* [[Gemini (language model)|Gemini Ultra]] (Google DeepMind, 2023): multimodal; reported by Google as the first model to exceed human-expert performance on the MMLU benchmark

=== Training methodology ===

Modern LLMs typically follow a two-stage training process:

# '''Pre-training''': The model learns from a large, unlabeled text corpus using [[next-token prediction]] (autoregressive) or [[masked language model]] objectives.
# '''Alignment''': The pre-trained model is fine-tuned to follow instructions and produce helpful, harmless outputs, typically through supervised fine-tuning on instruction data followed by [[reinforcement learning from human feedback]] (RLHF) or [[direct preference optimization]] (DPO).
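
For illustration, two of the objectives named above are often written as follows; the symbols used here are an assumed, simplified notation rather than that of any single source. For a token sequence <math>x_1, \ldots, x_T</math>, autoregressive pre-training minimizes the negative log-likelihood of each token given the tokens that precede it:

<math display="block">\mathcal{L}_{\text{LM}}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})</math>

DPO, introduced by Rafailov et al. in 2023, trains directly on preference pairs in which a preferred response <math>y_w</math> and a dispreferred response <math>y_l</math> are given for a prompt <math>x</math>. With <math>\pi_\theta</math> the model being trained, <math>\pi_{\text{ref}}</math> a frozen reference model (typically the supervised fine-tuned model), <math>\sigma</math> the logistic function, and <math>\beta</math> a hyperparameter that limits how far the trained model drifts from the reference, one standard form of the loss is:

<math display="block">\mathcal{L}_{\text{DPO}}(\theta) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]</math>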
