Here I describe a short project that I pulled off over the last few days. I was revising some Machine Learning concepts when the idea struck me. The software predicts a person’s mood from what he or she says. The user speaks into the device’s microphone, and the output is the sentiment: positive, neutral or negative. I used a Naïve Bayes classifier as my prediction model. Unfortunately, I cannot provide a working demo, since I am on shared hosting and the hosting company doesn’t have nltk or Python 2.7+ installed.
Now we click on the mic button and say the sentence.
Once we click the submit button, the recognized text is passed to the Python script in the backend. I decided to send it as a command-line argument using the exec function in PHP. Other options include writing the text to a flat file and then processing it. But since I only had to analyze sentences a few words long, the flat-file method did not seem worthwhile.
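The receiving end of that hand-off could look roughly like the sketch below. It is not my actual script: the names read_sentence and main and the file name sentiment.py are made up for illustration, and it is written for Python 3. On the PHP side, the usual precaution is to wrap the text in escapeshellarg before handing it to exec.

```python
import sys

def read_sentence(argv):
    """Join every argument after the script name into one sentence."""
    # An unquoted sentence reaches the script as several arguments,
    # so join them back together instead of reading argv[1] alone.
    return " ".join(argv[1:]).strip()

def main(argv):
    sentence = read_sentence(argv)
    if not sentence:
        # Nothing was passed in: report the problem and signal failure.
        print("usage: sentiment.py <sentence>", file=sys.stderr)
        return 1
    print(sentence)  # placeholder for the cleaning and classification steps
    return 0
```

In a real script, the last line would be sys.exit(main(sys.argv)), so that PHP can check the exit code returned by exec.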
In my Python code, I have placed checks that strip out punctuation, undesirable words and so on, so that anything malicious the user tries to enter is taken care of. The list of undesirable words also contains words such as ‘the’, ‘a’, ‘is’ and so on, which play little role in determining the sentiment of the speech, so these are removed too. A training corpus is saved as a CSV file in the directory. The processed sentence is then used as test input to a Naïve Bayes classifier, which tells whether the speech is neutral, positive or negative.
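To make the two steps concrete, here is a minimal, self-contained sketch of the cleaning step followed by a Naïve Bayes classifier with add-one (Laplace) smoothing. It is an illustration under assumptions, not my production code: it uses only the Python 3 standard library instead of nltk, the stop-word list is a made-up sample, and the six training sentences are hypothetical stand-ins for the CSV corpus.

```python
import math
import string
from collections import Counter, defaultdict

# Hypothetical stop-word list; the real list is longer.
STOP_WORDS = {"the", "a", "an", "is", "am", "are", "i", "this", "it", "was"}

def clean(sentence):
    """Lowercase, strip ASCII punctuation, and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    words = sentence.lower().translate(table).split()
    return [w for w in words if w not in STOP_WORDS]

class NaiveBayes:
    def __init__(self):
        self.label_counts = Counter()            # sentences seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def train(self, examples):
        for sentence, label in examples:
            self.label_counts[label] += 1
            for word in clean(sentence):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, sentence):
        words = clean(sentence)
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # log prior + sum of log likelihoods, with add-one smoothing
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training data standing in for the CSV corpus.
TRAINING = [
    ("I love this wonderful day", "positive"),
    ("What a great and happy morning", "positive"),
    ("I hate this terrible weather", "negative"),
    ("This is an awful and sad story", "negative"),
    ("The meeting is at noon", "neutral"),
    ("The train leaves from platform two", "neutral"),
]

model = NaiveBayes()
model.train(TRAINING)
print(model.classify("What a wonderful morning!"))  # prints "positive"
```

Working in log space keeps the product of many small probabilities from underflowing, and the smoothing stops a single unseen word from zeroing out a whole class. With a corpus this tiny the predictions are only illustrative.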
Once we know the sentiment of the text, we can tailor the response to suit it. This could help in online chat bots, or even in blind chatting sites that want to pair two users based on the emotion they are in and other parameters.
Expand the sentiment from neutral, positive and negative to happy, sad, distressed, euphoric and neutral.
Try to include a sarcasm level. This would be a long-term goal, and I guess a challenging one too, since sarcasm varies from person to person. I have something like “sarcasm percent: 80%” in mind, similar to the one in the movie Interstellar. This sarcasm feature vector could be set against a word (like ‘awesome’) or a group of words (‘what a lovely day’), and our model would then learn and predict accordingly.
I plan to carry this project further, analyse it from some more angles not discussed here, and incorporate some more features. I will read more on pattern recognition and natural language processing. I have ordered two books for the purpose: ‘Pattern Classification’ (2nd Edition) by Richard Duda, Peter Hart and David Stork, and ‘Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition’ by Daniel Jurafsky and James H. Martin.
If the illustrations don’t help and you would like a video demo, I would be happy to record one. Let me know in the comments below.