What are Language Models?
A language model (LM) is a model that predicts the next word given a sequence of preceding words.
Evolution of Language Models
The development of LMs can be broadly classified into five stages.
Rule-based LM
Statistical LM
Neural LM
Pre-Trained LM
Large Language Models (LLM)
Rule-based Language Models
Grammatical rules of a specific language were used to predict the next word in a sentence.
E.g., in English, "I" will be followed by "am", not "are", and "They" can be followed by "have" or "are".
However, grammatical rules like these have many exceptions, and handling all the rules of a language is tricky.
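As a rough illustration of the rule-based idea, the sketch below stores the two example rules from the text in a lookup table. The table name `RULES` and the function `allowed_next_words` are hypothetical names chosen for this example, and a real rule-based system would need far more rules than this.

```python
# Hand-written rules taken from the example in the text:
# which words may follow a given subject pronoun.
RULES = {
    "i": ["am"],
    "they": ["have", "are"],
}

def allowed_next_words(previous_word):
    """Return the words the rules permit after previous_word.

    An empty list means no rule covers this word, which is exactly
    where a purely rule-based model runs into trouble.
    """
    return RULES.get(previous_word.lower(), [])
```

Here `allowed_next_words("I")` returns `["am"]`, while a word with no rule, such as "We", returns an empty list.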
Statistical Language Models
In this method, a large body of text was analyzed, and the probability of each word appearing after a given sequence of words was estimated statistically.
For example: how many times does "am" appear after "I"? That probability is compared with the probabilities of other words such as "are" or "is".
In a more advanced n-gram SLM, instead of estimating the probability from the single previous word, the last two words (a bi-gram) or three words (a tri-gram) were used to estimate the probability of the next word.
However, in English, a single word can have multiple meanings depending on the context of the sentence, and an SLM is not able to determine that context.
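The counting approach described in this section can be sketched in a few lines. This is a minimal example that conditions on the single previous word, estimating P(next | previous) as count(previous, next) / count(previous). The tiny corpus and the function names are made up for illustration, and the sketch does no smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict

def train_bigram_counts(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous_word, next_word in zip(words, words[1:]):
            counts[previous_word][next_word] += 1
    return counts

def next_word_probabilities(counts, previous_word):
    """Turn the raw follower counts for previous_word into probabilities."""
    followers = counts.get(previous_word.lower())
    if not followers:
        return {}  # the word never appeared as a context in the corpus
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}

# Toy corpus, invented for this example.
corpus = [
    "I am happy",
    "I am here",
    "I was late",
    "they are here",
]
counts = train_bigram_counts(corpus)
probs = next_word_probabilities(counts, "I")
```

In this toy corpus "am" follows "I" in two of three cases, so it gets probability 2/3 and "was" gets 1/3, while "are" never follows "I" and gets no entry at all. Extending the context key from one word to a tuple of the last two or three words gives the longer n-gram variants.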
Neural Language Models
Building on word embeddings such as Word2Vec (Word to Vector), these models use neural networks to calculate the probability of the next word.
Examples: RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory)
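To make the idea of predicting from word vectors concrete, here is a deliberately tiny sketch: each word has a vector, the context vector is scored against every candidate word with a dot product, and a softmax turns the scores into probabilities. All vectors below are made-up numbers rather than trained embeddings, and this single scoring step is an assumption chosen for brevity; it is not an RNN or LSTM, which would also carry state across the whole sequence.

```python
import math

VOCAB = ["am", "are", "is"]

# Hypothetical 2-dimensional embeddings (hand-picked, not trained).
INPUT_EMBEDDINGS = {"i": [1.0, 0.0], "they": [0.0, 1.0]}
OUTPUT_EMBEDDINGS = {"am": [2.0, 0.0], "are": [0.0, 2.0], "is": [0.5, 0.5]}

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_distribution(previous_word):
    """Score every vocabulary word against the context vector."""
    context = INPUT_EMBEDDINGS[previous_word.lower()]
    scores = [
        sum(c * o for c, o in zip(context, OUTPUT_EMBEDDINGS[word]))
        for word in VOCAB
    ]
    return dict(zip(VOCAB, softmax(scores)))
```

With these invented vectors, "am" is the most probable word after "I" and "are" is the most probable after "they". In a real model the vectors would be learned from text instead of written by hand.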
Pre-Trained Language Models
ELMo (context-aware word embeddings) and self-attention through the Transformer architecture raised the performance bar on NLP tasks. Examples: BERT and GPT-2.
These models were trained on large amounts of text, and their context awareness increased.
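For readers who want a feel for what self-attention computes, the sketch below implements scaled dot-product attention in plain Python. It is a simplification under stated assumptions: the queries, keys, and values are all the raw input vectors, whereas a real Transformer applies learned projections, uses multiple heads, and adds positional information. The function names are chosen for this example only.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Each output vector is a weighted average of all input vectors, where
    the weights depend on how similar the inputs are to one another.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ])
    return outputs

# Two toy word vectors; each output mixes in information from the other.
mixed = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Because every output blends in the other positions, the representation of a word depends on the words around it, which is the context awareness described above.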
Large Language Models (LLM)
There is a thin line between PLMs and LLMs.
As the model size and training data size of PLMs were scaled up, new emergent abilities of the models were discovered. Examples: ChatGPT, LLaMA, Claude
LLMs differ from PLMs broadly in three ways:
Emergent abilities
Prompting/Conversational Interface
To attain this scale, both engineering and research problems must be solved.