Natural Language Processing

Terminology

Stopwords: Commonly used words in a language (determiners, conjunctions, prepositions, pronouns, auxiliary verbs, etc.) that carry little meaning on their own and are often filtered out before analysis.
Corpus: A large collection of text documents or spoken-language data used for training and testing NLP models.
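Stemming / Lemmatization: Reducing words to a base or root form so that variants of the same word, such as singular/plural forms or verb tenses, are treated alike. Stemming strips affixes heuristically and may produce non-words; lemmatization maps each word to its dictionary form (lemma).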
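
The terms above can be illustrated with a minimal sketch in plain Python. The stopword list, the suffix rules, and the helper names (tokenize, remove_stopwords, naive_stem) are all assumptions made up for this example; the stemmer is a toy and not the Porter algorithm or any library's implementation.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "on",
             "is", "are", "was", "were", "it", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Keep only the tokens that are not in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(word):
    """Strip one common English suffix, leaving at least three letters.

    Illustrative only: real stemmers apply many more rules.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# A one-document corpus, just large enough to show each step.
corpus = ["The cats are chasing the mice in the garden."]

tokens = tokenize(corpus[0])
content = remove_stopwords(tokens)
stems = [naive_stem(t) for t in content]

print(content)  # ['cats', 'chasing', 'mice', 'garden']
print(stems)    # ['cat', 'chas', 'mice', 'garden']
```

Note that the toy stemmer turns "chasing" into the non-word "chas" and leaves the irregular plural "mice" untouched. A lemmatizer, which looks words up in a dictionary, would return "chase" and "mouse" instead.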

Bag Of Words

A simple text representation model in NLP that converts a document into a multiset (bag) of its constituent words, keeping how often each word occurs but disregarding grammar and word order.
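
A minimal sketch of this representation, using only the Python standard library. The function names (bag_of_words, vectorize) and the two sample sentences are assumptions chosen for illustration; NLP libraries offer more complete vectorizers with their own tokenization rules.

```python
import re
from collections import Counter


def bag_of_words(document):
    """Count how often each lowercase alphabetic token occurs."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(tokens)


def vectorize(documents):
    """Turn documents into count vectors over a shared, sorted vocabulary."""
    bags = [bag_of_words(d) for d in documents]
    vocabulary = sorted(set().union(*bags))
    # A Counter returns 0 for words a document does not contain.
    vectors = [[bag[word] for word in vocabulary] for bag in bags]
    return vocabulary, vectors


docs = ["the dog bit the man", "the man bit the dog"]
vocabulary, vectors = vectorize(docs)

print(vocabulary)  # ['bit', 'dog', 'man', 'the']
print(vectors)     # [[1, 1, 1, 2], [1, 1, 1, 2]]
```

The two sentences mean different things but produce identical vectors, which shows concretely what discarding word order costs.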