Natural Language Processing
Terminology
- Stopwords: Commonly used words in a language (determiners, conjunctions, prepositions, pronouns, auxiliary verbs, etc.) that carry little meaning on their own and are often filtered out before analysis.
- Corpus: A large collection of text documents or spoken language data used for training and testing NLP models.
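- Stemming / Lemmatization: Reducing words to their base or root form so that variations of the same word (singular/plural forms, verb tenses) are treated alike. Stemming strips affixes with heuristic rules and may produce non-words; lemmatization maps each word to its dictionary form (lemma).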
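
The terms above can be illustrated with a small sketch that uses only the Python standard library. The stopword list, the two-sentence corpus, and the `naive_stem` helper are all assumptions made for illustration; `naive_stem` is a crude suffix stripper, not a real stemming algorithm such as Porter's, and it does no lemmatization.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "was", "on", "in", "of", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(token):
    """Strip a few common English suffixes (hypothetical helper, not Porter stemming)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# A toy corpus of two documents.
corpus = ["The cats are playing in the garden", "A cat played on the mat"]

processed = [
    [naive_stem(t) for t in remove_stopwords(tokenize(doc))]
    for doc in corpus
]
print(processed)
# [['cat', 'play', 'garden'], ['cat', 'play', 'mat']]
```

After stopword removal and stemming, "cats"/"cat" and "playing"/"played" collapse to the same tokens, which is the point of normalizing word variants before modeling.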
Bag Of Words
A simple text representation model in NLP that converts a document into a multiset (bag) of its constituent words: it keeps how often each word occurs but disregards grammar and word order.
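
A minimal sketch of this representation, assuming simple lowercase alphabetic tokenization and a vocabulary built from the documents themselves. The function names `bag_of_words` and `vectorize` and the example sentences are hypothetical. Each document becomes a vector with one count per vocabulary word.

```python
import re
from collections import Counter

def bag_of_words(document):
    """Count how often each word occurs, ignoring order and grammar."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(tokens)

def vectorize(document, vocabulary):
    """Turn a document into a list of counts, one per vocabulary word."""
    counts = bag_of_words(document)
    return [counts[word] for word in vocabulary]

docs = ["the dog bit the man", "the man bit the dog"]

# Shared vocabulary: every distinct word across all documents, sorted.
vocabulary = sorted({word for doc in docs for word in bag_of_words(doc)})
vectors = [vectorize(doc, vocabulary) for doc in docs]

print(vocabulary)  # ['bit', 'dog', 'man', 'the']
print(vectors)     # [[1, 1, 1, 2], [1, 1, 1, 2]]
```

The two sentences mean different things but produce identical vectors, which shows what is lost when word order is discarded.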