a contiguous sequence of n items (phonemes, syllables, letters, words, or anything else, depending on the application) taken from a given sequence
Example : input = “the dog smelled like a skunk”
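- Unigram: n-gram of size 1
Example : the, dog, smelled, like, a, skunk
- Bigram: n-gram of size 2
Example : the dog, dog smelled, smelled like, like a, a skunk (plus “# the” and “skunk #” if the sentence is padded with a boundary marker #)
- Trigram: n-gram of size 3
Example : the dog smelled, dog smelled like, smelled like a, like a skunk (plus “# the dog” and “a skunk #” with the same padding)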
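The unigram/bigram/trigram lists above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `ngrams` and its `pad` option are made up for illustration, and splitting on whitespace is an assumed, simplistic tokenization.

```python
def ngrams(tokens, n, pad=None):
    """Return all contiguous n-grams of `tokens` as a list of tuples.

    `pad` is a hypothetical option: if given, one copy of it is added
    at each end of the sequence as a boundary marker.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = list(tokens)
    if pad is not None:
        tokens = [pad] + tokens + [pad]
    # Slide a window of width n over the sequence, one position at a time.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: words are separated by single spaces.
words = "the dog smelled like a skunk".split()

unigrams = ngrams(words, 1)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)
padded_bigrams = ngrams(words, 2, pad="#")

print([" ".join(g) for g in bigrams])
```

A sequence of k items yields k - n + 1 n-grams without padding, so the six-word example gives 6 unigrams, 5 bigrams and 4 trigrams.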
How to predict the next word : how likely is word x to follow word y?
1) Likelihood of x occurring in new text, based on its general frequency of occurrence, estimated from a corpus
“popcorn” is more likely to occur than “unicorn”
2) Likelihood of x conditioned on the previous words (bigrams, trigrams, …)
“mythical unicorn” is more likely than “mythical popcorn”
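The two estimates above can be sketched with simple counts. Everything here is an assumption for illustration: the toy corpus is invented, the names `unigram_prob` and `bigram_prob` are hypothetical, and probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter

# Toy corpus invented for illustration; real estimates need far more text.
corpus = (
    "we ate popcorn at the movies . "
    "the popcorn was salty . "
    "she bought popcorn . "
    "the mythical unicorn appeared . "
    "a mythical unicorn is rare ."
).split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))  # adjacent word pairs
total = len(corpus)


def unigram_prob(word):
    """P(word): relative frequency of `word` in the corpus."""
    return unigram_counts[word] / total


def bigram_prob(prev, word):
    """P(word | prev), estimated as count(prev word) / count(prev).

    Returns 0.0 when `prev` never occurs in the corpus.
    """
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]


# 1) General frequency: "popcorn" vs "unicorn"
print(unigram_prob("popcorn"), unigram_prob("unicorn"))

# 2) Conditioned on the previous word "mythical"
print(bigram_prob("mythical", "unicorn"), bigram_prob("mythical", "popcorn"))
```

In this toy corpus “popcorn” is the more frequent word overall, yet after “mythical” only “unicorn” is ever observed, so the conditional estimate reverses the ranking.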
MORE : http://spring2015.cs-114.org/wp-content/uploads/2016/01/NgramModels.pdf