Challenge at hand
A Malaysian news agency wanted to automate insight-mining from the enormous amounts of data it collected from disparate sources, so that it could speed up the generation of useful reports on key societal trends. The government often used these reports to understand trends such as drug usage. At the time, data was captured from social media, newspapers and physically scanned documents, then manually stored in spreadsheets and analysed for trends.
The current process was labor-intensive, time-consuming and inefficient. It was also prone to errors and was not very scalable.
Technical challenges in existing solution
Manual data entry: Data from different sources was manually entered into spreadsheets.
No standardization: Data came in different formats, sizes and languages, and needed to be harmonized.
Limitations on number of requests handled: Current AWS Lambda services had limitations on the number of requests processed at any given time.
Lack of customization: No access control through role management.
Reports lacked insights: No visualization of data to report high level trends.
The TechVariable Solution
TechVariable built a platform that automated data collection and segregation, harmonized different file types into a cohesive set of data files, and enabled real-time generation of reports with data visualization. Behind an easy-to-use, interactive and customized user interface, we created three automated data pipelines:
Pipeline 1: A custom scraping engine collects data from all online sources, extracts data from hard-copy documents, PDFs and JPEGs using Optical Character Recognition (OCR), and summarizes lengthy articles according to identified keywords.
Pipeline 2: Video transcription using Google APIs processes video and audio files from YouTube and other sources.
Pipeline 3: A custom-built mechanism for online sources that segregates hashtags, keywords, stories, etc. for further processing.
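To make the third pipeline more concrete, here is a minimal sketch of how hashtags and watched keywords might be separated from the body of a post. It is an illustration only, not the mechanism TechVariable built: the `SegregatedPost` type, the `segregate` function and the watchlist are all hypothetical names introduced for this example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Matches "#tag" and captures the tag without the leading '#'.
HASHTAG_RE = re.compile(r"#(\w+)")


@dataclass
class SegregatedPost:
    """Hypothetical container for the separated parts of one post."""
    hashtags: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    body: str = ""


def segregate(text: str, watchlist: set[str]) -> SegregatedPost:
    """Split a post into hashtags, watched keywords and the remaining body."""
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    # Remove the hashtags, then collapse the leftover whitespace.
    body = " ".join(HASHTAG_RE.sub("", text).split())
    tokens = re.findall(r"\w+", body.lower())
    # Keep only tokens on the (assumed) watchlist, de-duplicated and sorted.
    keywords = sorted({tok for tok in tokens if tok in watchlist})
    return SegregatedPost(hashtags=hashtags, keywords=keywords, body=body)


post = segregate(
    "Police report rise in vaping among teens #Health #KL",
    watchlist={"vaping", "teens"},
)
print(post.hashtags, post.keywords)
```

A real pipeline would also need language-aware tokenization, since the source data arrives in several languages.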
Social Media Aggregation
Data from multiple social media platforms such as Facebook, Twitter, Instagram, YouTube and Reddit are aggregated for relevant hashtags and keyword inputs.
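As a rough sketch of what such aggregation involves, the example below maps records from different platforms onto one common shape and keeps only those that mention a tracked hashtag or keyword. The per-platform field names in `FIELD_MAP` are assumptions made for illustration; real platform payloads differ and change over time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Common record shape shared by all platforms (hypothetical)."""
    platform: str
    author: str
    text: str


# Assumed field names per platform, for illustration only.
FIELD_MAP = {
    "twitter": {"author": "user", "text": "full_text"},
    "facebook": {"author": "from_name", "text": "message"},
    "reddit": {"author": "author", "text": "selftext"},
}


def normalize(platform: str, raw: dict) -> Post:
    """Map one raw platform record onto the common Post shape."""
    fields = FIELD_MAP[platform]
    return Post(
        platform=platform,
        author=raw.get(fields["author"], ""),
        text=raw.get(fields["text"], ""),
    )


def aggregate(batches: dict[str, list[dict]], terms: list[str]) -> list[Post]:
    """Return normalized posts whose text contains any tracked term."""
    wanted = [term.lower() for term in terms]
    matched = []
    for platform, items in batches.items():
        for raw in items:
            post = normalize(platform, raw)
            lowered = post.text.lower()
            if any(term in lowered for term in wanted):
                matched.append(post)
    return matched


batches = {
    "twitter": [{"user": "a", "full_text": "New #Vaping rules announced"}],
    "reddit": [{"author": "b", "selftext": "Weekend football thread"}],
}
print(aggregate(batches, ["#vaping"]))
```

Normalizing before filtering keeps the matching logic independent of any one platform's payload format.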
Transcription & Translation
With this feature, the client can translate or generate transcripts of videos from YouTube or offline sources in different languages. Lengthy articles are summarized. A data warehouse was created and data translated into four languages, as per the client’s request. A combination of third-party APIs and a few custom-built APIs have been used for data synchronization and processing.
An Elasticsearch-based search system, including fuzzy search, was built on top of the database. This functionality allows users to add more keywords as they go, refining their searches over time.
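Assuming the search layer is Elasticsearch, a fuzzy keyword search of this kind could be expressed with a request body along the following lines. This is a sketch, not the project's actual query: the `build_fuzzy_query` helper and the `text` field name are hypothetical, while `match`, `fuzziness`, `bool`, `should` and `minimum_should_match` are standard Elasticsearch query DSL.

```python
from __future__ import annotations


def build_fuzzy_query(keywords: list[str], field: str = "text") -> dict:
    """Build a search body that matches any keyword, tolerating small typos."""
    return {
        "query": {
            "bool": {
                # One fuzzy match clause per keyword.
                "should": [
                    {"match": {field: {"query": kw, "fuzziness": "AUTO"}}}
                    for kw in keywords
                ],
                # A document must satisfy at least one clause.
                "minimum_should_match": 1,
            }
        }
    }


keywords = ["dadah", "vape"]
keywords.append("ketum")  # the user adds a keyword as they go
body = build_fuzzy_query(keywords)
print(len(body["query"]["bool"]["should"]))
```

Because the query is rebuilt from the keyword list each time, adding a keyword simply adds another clause to the next search.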
The architecture was designed in such a way that the portal can run for multiple clients on multiple servers.
Reporting & Dashboard
The solution includes an interactive dashboard built on Tableau that gives the client a single-window view of important parameters in real time. Automated visualization and reporting were set up for both batches of historical data and real-time data.
Auto tagging has been enabled using Natural Language Processing (NLP) to segregate and analyze data. The data is then assigned positive, negative or neutral scores.
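The idea of assigning positive, negative or neutral scores can be illustrated with a deliberately simple word-list approach. This is a stand-in for explanation only: the case study does not say which NLP technique was used, and the word lists and function names below are invented for the example.

```python
import re

# Tiny illustrative word lists; a production system would use a trained model
# or a much larger, language-specific lexicon.
POSITIVE = {"good", "improve", "improved", "success", "safe"}
NEGATIVE = {"bad", "crisis", "abuse", "danger", "worse"}


def sentiment_score(text: str) -> int:
    """Count positive words minus negative words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(tok in POSITIVE for tok in tokens)
    negatives = sum(tok in NEGATIVE for tok in tokens)
    return positives - negatives


def sentiment_label(text: str) -> str:
    """Map the numeric score onto positive, negative or neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sample in (
    "Rehab programme is a success",
    "Drug abuse crisis is getting worse",
    "The report was published today",
):
    print(sample, "->", sentiment_label(sample))
```

A word-count approach like this ignores negation and context, which is why real deployments typically rely on trained models.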
Need a custom software application for your business?
We at TechVariable acknowledge that one size does not fit all. Hence, we work in collaboration with you to identify, analyze and then develop a solution that fulfils your needs.
Either we define the functional scope of your project to estimate the timeline and budget, or you create your own agile team from among our resources.
High level design architecture
The new platform leverages next-generation technologies to perform data collection, processing and analysis in real time.
Improved efficiency, accuracy and scalability as a result of automated data collection.
Ability to conduct sentiment analysis on 100,000-150,000 or more posts a month across various parameters.
Reduced the number of resources/man-hours required.
Improved the ability to handle spikes in data generation and volumes of requests.
Enabled customization with user access control, advanced filters and search capabilities.
Enabled auto-visualization and reporting in real-time.
Our well-designed processes, protocols and best practices ensure that security and compliance requirements are adhered to, irrespective of client location and project size.