In today's digital age, we spend a lot of time on social networks and share many of our personal moments there. Whether we are traveling somewhere adventurous or having dinner at an exotic restaurant, we post those moments and emotions online. We also share our opinions and views on specific topics.
Just imagine how much information these social networks hold. They know a great deal about us, including what we love, what we hate, where we live, who our friends are, and what interests us. Now consider what would happen if we built a platform to analyze this data and understand what millions of people like or dislike, what they are talking about, and what opinions they hold. For example, suppose a company has just launched a new B2C product and wants to know what the public thinks of it. With the ever-increasing abundance of data from social networks, that is possible. What the company needs is a proper analysis tool.
Problem Actualization and Solution
One of our clients was a news search and analysis agency from Singapore that wanted to analyze posts across various media in order to understand what people think about particular keywords or topics. The client wanted to analyze data not only from online platforms such as Facebook, Twitter, and Instagram, but also from print media such as magazines and newspapers.
The content from these platforms comes in different languages, so the client wanted every post translated into English, Malay, Tamil, and Chinese, which required building a language translation system.
For language translation, we used Google Translate, which supports over 100 languages. The Translation API can automatically detect the input language, so the language used on these platforms is not a constraint, which is an added advantage.
With the Translation API, we could translate the text of each post into English, Malay, Tamil, and Chinese.
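The translation step could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes the `google-cloud-translate` package (v2 client) with credentials already configured, and the language codes and the helper names `make_translate_client` and `translate_post` are our own assumptions.

```python
# Target languages named in the article; the language codes are assumptions.
TARGET_LANGUAGES = {
    "English": "en",
    "Malay": "ms",
    "Tamil": "ta",
    "Chinese": "zh-CN",
}


def make_translate_client():
    """Create a Cloud Translation (v2) client.

    Imported lazily; requires `pip install google-cloud-translate`
    and Google Cloud credentials.
    """
    from google.cloud import translate_v2 as translate
    return translate.Client()


def translate_post(text, client):
    """Translate one post into every target language.

    No source language is passed, so the API detects it automatically.
    """
    translations = {}
    for name, code in TARGET_LANGUAGES.items():
        result = client.translate(text, target_language=code, format_="text")
        translations[name] = result["translatedText"]
    return translations
```

Calling `translate_post(post_text, make_translate_client())` would return a dictionary with one translated string per target language.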
Some posts from websites like Blogger.com are very long, and reading them takes a lot of time, so we needed to condense these posts into short summaries automatically.
Two other reasons for summarization were:
1) Sentiment Analysis works best on brief texts, and
2) We use AWS Comprehend for Sentiment Analysis, which limits each document to 5 KB of UTF-8 text.
For summarization, we used Summa, a Python library that generates extractive summaries using the TextRank algorithm. We built an AWS Lambda function that summarizes each post.
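A summarization Lambda along these lines might look like the sketch below. It is illustrative only: the event shape (`event["text"]`), the `ratio` value, and the helper names are assumptions, and the 5,000-byte cap reflects our reading of the size limit on Comprehend's sentiment call rather than anything taken from the project's code.

```python
MAX_BYTES = 5000  # assumed size cap for the downstream sentiment call


def truncate_utf8(text, max_bytes=MAX_BYTES):
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte character, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def summarize_post(text, ratio=0.2):
    """Summarize a post with Summa, keeping roughly `ratio` of its sentences."""
    from summa import summarizer  # lazy import; requires `pip install summa`
    summary = summarizer.summarize(text, ratio=ratio)
    # Summa returns an empty string for very short inputs; fall back to the text.
    return truncate_utf8(summary or text)


def lambda_handler(event, context):
    """Lambda entry point; assumes the post body arrives as event["text"]."""
    return {"summary": summarize_post(event["text"])}
```

Truncating after summarizing is a safety net: a summary of a very long post can still exceed the size cap.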
What is Sentiment Analysis?
To understand and analyze the posts from various platforms, we need some type of system that can automatically detect the opinion of a person towards a specific topic. The process by which we achieve this is called Sentiment Analysis.
According to Wikipedia:
“Opinion mining (sometimes known as sentiment analysis or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine.”
Sentiment Analysis is a very important step in this project. It is what allows us to understand the posts coming from different platforms.
On online platforms, people often use cryptic language, emojis, and so on. Luckily, AWS Comprehend can handle cryptic dialogue, sarcasm, irony, and even emojis.
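A single sentiment call to AWS Comprehend might look like the following minimal sketch. It assumes the `boto3` SDK with AWS credentials and a default region configured, and English input (for example, the translated text); the helper names are hypothetical.

```python
def make_comprehend_client():
    """Create a Comprehend client (lazy import; requires boto3 and AWS credentials)."""
    import boto3
    return boto3.client("comprehend")


def detect_sentiment(text, client, language_code="en"):
    """Return the dominant sentiment label and its confidence score."""
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    label = response["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL, or MIXED
    # SentimentScore keys are capitalized: Positive, Negative, Neutral, Mixed.
    score = response["SentimentScore"][label.capitalize()]
    return label, score
```

Passing the client in as an argument keeps the function easy to test with a stub and lets one client be reused across many posts.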
The biggest issue we faced with the system was scaling. With thousands of posts coming into our system every minute, it was extremely hard to process each and every one. Hence, we had to spend a substantial amount of time optimizing our code so that it could process every post efficiently.
The advantage of the services we used, such as the Translation API and AWS Comprehend, is that they are highly scalable. These services can handle millions of requests without any issue. As billions of people use online platforms like Facebook and Twitter every day, this level of scalability is essential. Another important point is that, besides the social platforms mentioned above, the system can also be extended to include LinkedIn, Reddit, and others.
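The article does not say which optimizations were made, so the following is only one possible illustration: grouping posts so that each Comprehend request carries up to 25 documents through `batch_detect_sentiment`, instead of issuing one request per post. The function name is hypothetical, and `client` is assumed to be a `boto3` Comprehend client.

```python
BATCH_SIZE = 25  # maximum number of documents per batch_detect_sentiment request


def batch_sentiment(texts, client, language_code="en"):
    """Return one sentiment label per text, in input order.

    Entries stay None for documents the service reports in its ErrorList.
    """
    labels = [None] * len(texts)
    for start in range(0, len(texts), BATCH_SIZE):
        batch = texts[start:start + BATCH_SIZE]
        response = client.batch_detect_sentiment(
            TextList=batch, LanguageCode=language_code
        )
        # Each result carries the index of its document within the batch.
        for item in response["ResultList"]:
            labels[start + item["Index"]] = item["Sentiment"]
    return labels
```

With this approach, 1,000 posts need 40 requests rather than 1,000, which cuts per-request overhead considerably.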