Natural Language Processing

by Camilla Lalli

13 Jul '22

Natural language processing (or NLP) is a branch of artificial intelligence that gives machines the ability to read, understand and derive meaning from human languages. It allows computers to communicate with people in a human language, in both written and spoken form.


NLP draws from several disciplines – including linguistics and computer science – to decipher language structure and to build models. These systems are able to comprehend, break down and extract significant details from text and speech.


Every day humans interact with each other through social media, transferring vast quantities of freely available data to one another. This data can be extremely useful in understanding human behaviour and customer habits. Data analysts and machine learning experts use this data to give machines the ability to mimic human linguistic behaviour.


There are three different types of NLP:



Figure 1. NLP Pipeline


A typical NLP pipeline consists of several steps as outlined below:

Figure 2 shows the result of applying coreference resolution to a text.
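The stages of such a pipeline can be sketched in a few lines of Python. This is a deliberately minimal illustration, not a production implementation: the stop-word list and the suffix-stripping "stemmer" are toy stand-ins for the real lexical resources and algorithms an NLP library would provide.

```python
import re

# Toy stop-word list; real pipelines use much larger curated lists.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to"}

def tokenize(text):
    """Split raw text into word tokens (a crude regex-based tokenizer)."""
    return re.findall(r"[a-zA-Z']+", text)

def normalize(tokens):
    """Case folding: lowercase every token."""
    return [t.lower() for t in tokens]

def remove_stopwords(tokens):
    """Drop high-frequency function words that carry little content."""
    return [t for t in tokens if t not in STOPWORDS]

def stem(tokens):
    """Naive suffix stripping as a stand-in for a real stemmer."""
    return [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]

def pipeline(text):
    """Run the stages in sequence: tokenize -> normalize -> filter -> stem."""
    return stem(remove_stopwords(normalize(tokenize(text))))

print(pipeline("The analysts are processing the streams of text"))
# ['analyst', 'process', 'stream', 'text']
```

Each stage takes the previous stage's output, which is why the order matters: stop-word removal, for example, only works after case folding has lowercased "The" to "the".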



NLP has a huge variety of business applications. Here is an overview of the most important ones:




NLP originated in the 1940s when scientists started working on algorithms that would allow machines to perform translations from one language to another. One of the first researchers to work on machine translation was Warren Weaver, an American mathematician.


However, researchers soon realised that the task was more complex than they had expected, and that they lacked both the technological resources and the linguistic theoretical framework to perform it. Several changes needed to take place before a machine could perform translations or communicate in a more human fashion.


Those changes occurred over the past 60 to 65 years. First, the American linguist Noam Chomsky developed an abstract, mathematical theory of language known as transformational grammar in his seminal 1957 work Syntactic Structures. Transformational grammar is important in NLP because it introduced a formalism that converts natural language sentences into a format machines can process.
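The flavour of such a formalism can be shown with a toy set of phrase-structure rules. This is a simplification: Chomsky's transformational grammar adds transformations on top of rewrite rules like these, but even this fragment shows how a symbolic rule system can be executed by a machine.

```python
# Toy phrase-structure rules: each nonterminal maps to one or more expansions.
RULES = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["cat"], ["mouse"]],
    "V":   [["chased"]],
}

def derive(symbol):
    """Expand a symbol left-to-right, always taking the first rule option."""
    if symbol not in RULES:
        return [symbol]            # terminal: an actual word
    first_expansion = RULES[symbol][0]
    words = []
    for s in first_expansion:
        words.extend(derive(s))    # recursively expand each child symbol
    return words

print(" ".join(derive("S")))
# the cat chased the cat
```

Starting from the symbol S, the machine mechanically rewrites S into NP VP, then into Det N V NP, and so on until only words remain.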


Second, in the 1980s two major changes occurred: an increase in computational power, which made it possible to perform ever more complex operations, and a shift to machine learning algorithms that rely heavily on statistical models.


Finally, in the 2010s, the deep learning revolution occurred and deep neural network methods became widespread in NLP. Up to the 1980s most NLP systems were based on complex sets of hand-written rules; the major advantage of machine learning is that it uses statistical inference to learn such rules automatically from large corpora of typical real-world examples (usually collections of text or speech).
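The simplest form of this statistical inference is counting word co-occurrences in a corpus and turning the counts into probabilities. The sketch below learns bigram probabilities from a two-sentence "corpus"; a real system would do the same over millions of sentences.

```python
from collections import Counter

# A tiny corpus; real systems learn from millions of sentences.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

# Count single words and adjacent word pairs (bigrams) across the corpus.
unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def prob(prev, nxt):
    """Estimate P(nxt | prev) by relative frequency in the corpus."""
    return bigrams[(prev, nxt)] / unigrams[prev]

print(prob("sat", "on"))   # 1.0 -- every "sat" in the corpus is followed by "on"
print(prob("the", "cat"))  # 0.25 -- "the" occurs 4 times, once before "cat"
```

Instead of a linguist hand-writing the rule "'sat' is followed by a preposition", the regularity is inferred directly from the data, which is exactly the shift the 1980s brought.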


All these theoretical and technological advancements facilitated the creation of more sophisticated translation software. Text comprehension and speech processing technologies also improved, ultimately leading to today's virtual assistants, which can understand human language and respond to it.

