Natural Language Processing

Terminology

  • Stopwords: Words that occur very frequently in a language (determiners, conjunctions, prepositions, pronouns, auxiliary verbs, etc.) but carry little meaning on their own.
  • Corpus: A large collection of text documents or spoken language data used for training and testing NLP models.
  • Stemming / Lemmatization: Reducing words to their base or root form so that variations of the same word, such as singular/plural forms or verb tenses, are treated as one term. Stemming strips affixes heuristically, while lemmatization maps a word to its dictionary form.
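
As a rough illustration of the terms above, the sketch below filters stopwords and applies a crude suffix-stripping stemmer using only the Python standard library. The `STOPWORDS` set and the `tokenize`, `remove_stopwords`, and `naive_stem` helpers are hypothetical names made up for this example; the stemmer is a toy, not the Porter algorithm or any library's implementation.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "on",
             "is", "are", "was", "it", "they"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens):
    """Keep only the tokens that are not in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(token):
    """Crude suffix stripping, for illustration only."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

tokens = tokenize("The cats are chasing the mice in the garden")
content = remove_stopwords(tokens)
stems = [naive_stem(t) for t in content]

print(content)  # ['cats', 'chasing', 'mice', 'garden']
print(stems)    # ['cat', 'chas', 'mice', 'garden']
```

Note that the toy stemmer produces the non-word "chas" and leaves the irregular plural "mice" untouched. A lemmatizer, which relies on a dictionary, would be expected to return "chase" and "mouse" instead.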

Bag Of Words

A simple text representation model in NLP that converts a document into a multiset (a "bag") of its constituent words, keeping how often each word occurs but disregarding grammar and word order.
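
A minimal sketch of this idea, assuming simple lowercase alphabetic tokenization and using `collections.Counter` from the standard library. The `bag_of_words` function and the two sample sentences are made up for illustration.

```python
import re
from collections import Counter

def bag_of_words(document):
    """Map each word in the document to the number of times it occurs."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(tokens)

bag_a = bag_of_words("the dog bit the man")
bag_b = bag_of_words("the man bit the dog")

# Word order is discarded, so both sentences yield the same bag.
print(bag_a == bag_b)  # True

# A fixed, sorted vocabulary turns each bag into a count vector.
vocabulary = sorted(set(bag_a) | set(bag_b))
vector_a = [bag_a[word] for word in vocabulary]

print(vocabulary)  # ['bit', 'dog', 'man', 'the']
print(vector_a)    # [1, 1, 1, 2]
```

The two sentences have opposite meanings but identical representations, which shows what is lost when word order is ignored.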