The NLP Journey

A Blog About Learning Natural Language Processing Concepts

The Starting Line

20 Mar 2022 - Ryan Llamas

“Never stay up on the barren heights of cleverness, but come down into the green valleys of silliness.”

— Ludwig Wittgenstein


Start Your Engines…

I have often asked myself these past days why I chose to pursue Natural Language Processing (NLP) as a field of study among the many areas of Computer Science. The truth is: I am still trying to find that answer.

One possible answer is the fascination I felt during my first attempt at building a chatbot. That project, after all, fixed my mind on the subject, and it now begs for another attempt: to rebuild it and explore the improvements I never found.

Another reason is that I am truly fascinated by the idea of interpreting human language through computation. That the very thing we humans use, something we have shaped since the beginning of humankind, can be programmed into and understood by the machines we build makes it a subject truly worth pursuing.

Maybe it is a combination of both, and my hope is that exploring the horizons of what the field has to offer will help me find meaning in it all.

I hope you, dear reader, are as captivated as I am as we sail into the field of NLP to learn new ideas, theories, and practical uses that improve the lives of those around us.

What Is NLP?

According to TechTarget, NLP is “the ability of a computer program to understand human language as it is spoken and written.” How you interpret the language and use its contents for analysis depends on the problem you are trying to solve.

For example, as mentioned above, one could build a chatbot that informs consumers about a particular topic. You could go even further and try to make a chatbot that is conversational in nature. Many companies attempt this, and many prototypes have been made available (some even remember the context of a conversation and use it later!).
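To make the idea of a chatbot that “remembers context” a little more concrete, here is a minimal Python sketch. It is a toy built on assumptions, not how any enterprise bot (or my internship project) actually works: `TinyBot`, the `RESPONSES` table, and the pronoun-based fallback are all hypothetical, and “context” here is nothing more than the last keyword the bot matched.

```python
import re

# Hypothetical keyword -> reply table; a real bot would need far richer matching.
RESPONSES = {
    "hours": "We are open 9am to 5pm, Monday through Friday.",
    "card": "You can manage your card from the account page.",
}


class TinyBot:
    """Keyword-matching chatbot that remembers the last topic it matched."""

    def __init__(self, responses):
        self.responses = responses
        self.last_topic = None  # the only "context" carried between turns

    def reply(self, message):
        # Lowercase and split the message into simple word tokens.
        words = re.findall(r"[a-z']+", message.lower())
        for word in words:
            if word in self.responses:
                self.last_topic = word  # remember this topic for later turns
                return self.responses[word]
        # No keyword found: if the user seems to refer back ("that", "it"),
        # fall back on the remembered topic.
        if self.last_topic is not None and {"it", "that", "again"} & set(words):
            return "Still on " + self.last_topic + ": " + self.responses[self.last_topic]
        return "Sorry, I did not understand that."


bot = TinyBot(RESPONSES)
print(bot.reply("What are your hours?"))
print(bot.reply("Can you repeat that?"))
```

The second question contains no keyword at all, yet the bot still answers about hours because it stored that topic on the first turn. Real conversational systems track context far more carefully, but the basic idea of carrying state between turns is the same.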

Another example is the analysis of a large body of text, or corpus. One could, for instance, track how a word is used across the sentences of a corpus to help identify its genre (if it were a book) or to see how the word's usage depends on when the text was published. After all, the way a word is used changes over time.
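To give a feel for looking at “the use of a word in various sentences,” here is a minimal keyword-in-context (concordance) sketch in plain Python. The `concordance` function and the three-sentence corpus are illustrative assumptions; NLTK's `Text.concordance` method offers a more complete version of the same idea. Note that this only lines up occurrences for a human to read. It does not infer genre or publication date on its own.

```python
import re


def concordance(text, target, width=20):
    """Return each occurrence of `target` with up to `width` characters of context on either side."""
    lines = []
    # \b word boundaries keep "bank" from matching inside "banking"; matching ignores case.
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every occurrence lines up in one column.
        lines.append(left.rjust(width) + "[" + match.group() + "]" + right)
    return lines


# Hypothetical toy corpus with two different senses of "bank".
corpus = (
    "She sat on the bank of the river. "
    "He walked into the bank to open an account. "
    "Online banking is convenient."
)

for line in concordance(corpus, "bank"):
    print(line)
```

Reading the two `[bank]` lines side by side shows two different senses of the same word. Across a large corpus, patterns like this are the raw material for questions about genre or period.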

These are the kinds of problems I want to try to solve. Not only do I want to craft the solutions, I want to share them as well! No one can learn in a silo, so I want my notes and solutions to be viewed with a critical eye, and I would love to hear your opinions too. Only then will I be able to spot my errors, or at least see the other side of the fence!

The Point of Beginning

Every journey has a point of beginning. Mine was the “Code Orange” internship program that Discover Financial offered at my college, NIU. For three semesters, I was given one assignment and charged with building it from the ground up: a chatbot able to interpret user responses and give an appropriate answer back.

How intimidating it was to be handed such an assignment at my first internship! Interpreting arbitrary text and ensuring the bot returned a proper response was challenging to conceptualize at the beginning and required much trial and error in practice.


The Internship Location I Worked At

But what kept me going was finding out what my bot would return and why it made that decision. The victory laps I took around the workplace when it gave the proper response were as satisfying as hearing your firstborn call you “Dad,” knowing you yourself fostered that response.

But those days are behind me. Implementing a hacky, patched-together solution in the span of a semester certainly opened doors of thought and inspiration, but now I truly wish to understand the theory behind NLP and its full capabilities. That raises the question: where to start?

The Hill to Overcome

The stepping stone I chose to further my pursuit of the field is a book called Foundations of Statistical Natural Language Processing by Christopher D. Manning and Hinrich Schütze.


Foundations of Statistical Natural Language Processing: Christopher D. Manning, Hinrich Schütze

I chose this book because reviews say it covers foundational NLP concepts that are still widely applied today. Beyond that, the fact that it delves into the mathematics of the subject intrigues me further and gives me a chance to practice an old skill.

The one thing to be careful about while studying this book is its age (it was published around when I was born, so it is as old as I am). However, learning older practices has never concerned me, as long as one reads critically and checks the book's points against modern practice.

Having read fourteen pages of the book, I am already committed to seeing it through and cannot wait to share the thoughts and findings it offers.

As you may guess, dear reader, this book may have a lasting impact on my perception of NLP, so I hope your engagement keeps me on the right track. I want to be provoked and challenged by your ideas. Only then can I see things from a different point of view.

Checkpoint

To be honest, I had anticipated writing more on what those fourteen pages had shown me, but that will have to wait for next time.

What I hope to see from this blog is the theories and ideas of NLP and this book discussed in fine detail, along with practical implementations of that theory so I can truly understand its effect.

Revisiting Python to tackle practical problems seems ideal, considering the support and flexibility the language offers for the subject. I expect packages like TensorFlow, Matplotlib, and NLTK to be used heavily to showcase implementations, and I may discover more along the way!

I hope this will be the first of many posts, and I hope you stick with me on this journey, as we both have much to learn.