NLP - Using the spaCy library

Natural language processing (NLP): a discipline where computer science, artificial intelligence and cognitive logic intersect, with the goal of enabling machines to read and understand human language in order to make decisions [1].

spaCy: an open-source Python library for NLP that features fast statistical named-entity recognition (NER) as well as an open-source named-entity visualizer [2].
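The named-entity visualizer that ships with spaCy is displaCy (`spacy.displacy`). A minimal sketch, assuming spaCy v3 is installed: in `manual=True` mode `displacy.render` accepts a plain dict with the text and character offsets of each entity, so no trained model is needed. The sample sentence, the helper `entity_span` and the chosen labels are hypothetical; `PER` and `LOC` follow the label names used by spaCy's Spanish pipelines.

```python
from spacy import displacy


def entity_span(text: str, phrase: str, label: str) -> dict:
    """Build a displaCy entity dict from the first occurrence of `phrase`."""
    start = text.index(phrase)  # character offset, raises ValueError if absent
    return {"start": start, "end": start + len(phrase), "label": label}


# Hypothetical Spanish sentence with hand-annotated entities
text = "Miguel de Cervantes nació en Alcalá de Henares."
example = {
    "text": text,
    "ents": [
        entity_span(text, "Miguel de Cervantes", "PER"),
        entity_span(text, "Alcalá de Henares", "LOC"),
    ],
    "title": None,
}

# manual=True: render the dict as-is; jupyter=False: return the HTML string
html = displacy.render(example, style="ent", manual=True, jupyter=False)
print(html[:200])
```

To view the result, write `html` to a `.html` file and open it in a browser, or use `displacy.serve` to start a local viewer; inside a Jupyter notebook, `render` displays the markup inline instead.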

Example with a document in Spanish

Step 1 - Read natural text from a book
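A minimal sketch of this step, assuming the book is available as a UTF-8 plain-text file; the function name `read_book` and the file name in the usage comment are hypothetical. Decoding explicitly as UTF-8 keeps Spanish characters such as á, ñ, ¿ and ¡ intact, and collapsing whitespace removes the line wraps of the printed layout.

```python
from pathlib import Path


def read_book(path: str) -> str:
    """Return the book's text with all runs of whitespace collapsed to one space."""
    # Explicit UTF-8 so accented Spanish characters decode correctly
    raw = Path(path).read_text(encoding="utf-8")
    # split() with no argument splits on any whitespace, including newlines
    return " ".join(raw.split())


# Usage (hypothetical file name):
# text = read_book("libro.txt")
```

Collapsing whitespace also drops paragraph breaks; if paragraphs matter later, keep the raw text and clean it per paragraph instead.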

Step 2 - Create an NLP model

- Vocabulary: the set of unique words in the document.

- Stopwords: the most common words in a language (for example "el", "de" and "y" in Spanish), which contribute little to the meaning of the text.

- Entity: any word or sequence of words that consistently refers to the same thing, such as a person, a place or an organization.
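A minimal sketch of the vocabulary and stopword steps, assuming spaCy v3 is installed. A trained Spanish pipeline would normally be loaded with `spacy.load("es_core_news_sm")` after downloading it; this sketch instead uses `spacy.blank("es")`, which ships with spaCy and includes the Spanish tokenizer and stop-word list, so it runs without a download. The sample text is invented. Entities are not extracted here because a blank pipeline has no NER component.

```python
import spacy

# Blank Spanish pipeline: tokenizer plus Spanish lexical data (stop words),
# no trained components
nlp = spacy.blank("es")

# Invented sample text standing in for the book's text
text = "El caballero y el escudero salieron de la aldea. El caballero iba delante."
doc = nlp(text)

# Vocabulary: unique lower-cased words, ignoring punctuation tokens
vocabulary = sorted({token.lower_ for token in doc if token.is_alpha})

# Stopwords: words flagged by spaCy's built-in Spanish stop-word list
stopwords = sorted({token.lower_ for token in doc if token.is_stop})

# Content words: alphabetic tokens that are not stop words
content_words = [token.text for token in doc if token.is_alpha and not token.is_stop]

print(len(vocabulary), vocabulary)
print(stopwords)
print(content_words)
```

Note that spaCy's own `nlp.vocab` is a different thing: a shared store of every lexeme the pipeline has seen, not the list of unique words of one document.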

Step 3 - Work with POS, NER and sentences

- POS: part-of-speech tags describe how a word is used in a sentence (noun, verb, adjective, etc.).

- Sentences: a sequence of words that is complete in itself, typically containing a subject and a predicate.

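- NER: Named Entity Recognition, the task of locating entities in the text and classifying them into categories such as person, location or organization.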
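POS tags require a trained pipeline such as `es_core_news_sm` (installed with `python -m spacy download es_core_news_sm`); with one loaded, `token.pos_` holds each word's tag, and `doc.sents` and `doc.ents` are filled by the statistical components. To stay runnable without a download, the sketch below is an assumption-based substitute: it uses spaCy v3's rule-based `sentencizer` for sentence splitting and an `entity_ruler` with hypothetical patterns in place of statistical NER. The sample text and patterns are invented.

```python
import spacy

# Blank Spanish pipeline (no download); rule-based components stand in
# for the trained parser and NER of a model like es_core_news_sm
nlp = spacy.blank("es")

# Sentence boundaries from punctuation such as ".", "!" and "?"
nlp.add_pipe("sentencizer")

# Rule-based NER with hypothetical phrase patterns
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PER", "pattern": "Miguel de Cervantes"},
    {"label": "LOC", "pattern": "Madrid"},
])

doc = nlp("Miguel de Cervantes vivió en Madrid. Escribió una novela famosa.")

# The same two loops work unchanged with a trained pipeline
sentences = [sent.text for sent in doc.sents]
entities = [(ent.text, ent.label_) for ent in doc.ents]

for sent in sentences:
    print(sent)
for ent_text, label in entities:
    print(ent_text, label)

# With a trained pipeline, POS tags would be read as:
# [(token.text, token.pos_) for token in doc]
```

A rule-based ruler only finds the exact phrases it is given; a statistical NER model generalizes to names it has never seen, which is why the trained pipeline is the usual choice for a whole book.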

References

[1] Wikipedia - Natural language processing.
[2] spaCy website.

