N-gram model : using the previous N-1 words in a sequence, predict the next word

An n-gram is a contiguous subsequence of n items (phonemes, syllables, letters, words, or anything else, depending on the application) from a given sequence.

Example : input = “the dog smelled like a skunk”

  • Unigram: n-gram of size 1
  • Bigram: n-gram of size 2

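Example : “the dog”, “dog smelled”, “smelled like”, “like a”, “a skunk”. If the sentence is padded with a boundary marker # at each end, “# the” and “skunk #” also count as bigrams.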

  • Trigram: n-gram of size 3

Example : “the dog smelled”, “dog smelled like”, “smelled like a”, “like a skunk”. If the sentence is padded with a boundary marker # at each end, “# the dog” and “a skunk #” also count as trigrams.
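The unigram, bigram and trigram lists above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `ngrams`, the `pad_symbol` parameter and the choice of a single `#` marker at each end are assumptions made for illustration, and splitting on whitespace stands in for a real tokenizer.

```python
def ngrams(tokens, n, pad_symbol=None):
    """Return all n-grams (as tuples) of a token sequence.

    If pad_symbol is given, ONE copy is added at each end of the
    sequence, so n-grams touching the sentence boundary are included.
    (Hypothetical helper, written for this example only.)
    """
    tokens = list(tokens)
    if pad_symbol is not None:
        tokens = [pad_symbol] + tokens + [pad_symbol]
    # Slide a window of width n over the sequence, one position at a time.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Naive whitespace tokenization of the example sentence.
tokens = "the dog smelled like a skunk".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
padded_bigrams = ngrams(tokens, 2, pad_symbol="#")
padded_trigrams = ngrams(tokens, 3, pad_symbol="#")

for name, grams in [("unigrams", unigrams), ("bigrams", bigrams),
                    ("trigrams", trigrams)]:
    print(name, [" ".join(g) for g in grams])
```

A sequence of k tokens yields k - n + 1 n-grams without padding, which is why the six-word sentence gives five bigrams and four trigrams.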

How to predict the next word : estimate how likely word x is to follow word y

1) Likelihood of x occurring in new text, based on its general frequency of occurrence, estimated from a corpus.
Example : “popcorn” is more likely to occur than “unicorn”.
2) Likelihood of x occurring conditioned on the context of the previous words (bigrams, trigrams, …).
Example : “mythical unicorn” is more likely than “mythical popcorn”.
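Both estimates can be sketched by counting words and word pairs in a corpus. The toy corpus below is invented purely for illustration, and the helper names `unigram_prob`, `bigram_prob` and `predict_next` are assumptions, not part of any library. The probabilities are plain relative frequencies with no smoothing, so any unseen pair gets probability 0.

```python
from collections import Counter

# Toy corpus, invented for illustration; real estimates need far more text.
corpus = [
    "i ate popcorn at the movies",
    "we ate popcorn and more popcorn",
    "the mythical unicorn ate popcorn",
    "a mythical unicorn is a mythical beast",
]
sentences = [s.split() for s in corpus]

# Count single words and adjacent word pairs (pairs never cross sentences).
unigram_counts = Counter(w for sent in sentences for w in sent)
bigram_counts = Counter(
    pair for sent in sentences for pair in zip(sent, sent[1:])
)
total_words = sum(unigram_counts.values())


def unigram_prob(word):
    """1) General frequency: P(word) = count(word) / total words."""
    return unigram_counts[word] / total_words


def bigram_prob(prev, word):
    """2) Conditioned on context: P(word | prev) = count(prev word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0  # prev never seen, so no estimate is available
    return bigram_counts[(prev, word)] / unigram_counts[prev]


def predict_next(prev):
    """Return the word seen most often after prev, or None if prev is unseen."""
    followers = {w: c for (p, w), c in bigram_counts.items() if p == prev}
    return max(followers, key=followers.get) if followers else None


print(unigram_prob("popcorn") > unigram_prob("unicorn"))
print(bigram_prob("mythical", "unicorn") > bigram_prob("mythical", "popcorn"))
print(predict_next("mythical"))
```

In this toy corpus “popcorn” is the more frequent word overall, yet after “mythical” the model prefers “unicorn”, which is the effect described in points 1) and 2).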

MORE : http://spring2015.cs-114.org/wp-content/uploads/2016/01/NgramModels.pdf
