In the modern age, data analysis has risen to the top of the Digital Transformation agenda, and the Data Analyst has become one of the most important roles in the industry: their insights and reporting inform key business decisions and strategies.

But what is Data Analysis? In simple words, Data Analysis is the process of extracting useful information from data using statistical and mathematical methods. Data Analysis alone, however, is not enough. After analyzing some data, we usually store the results in CSV files, databases, or other raw formats, and these formats are difficult and unintuitive to read, especially for non-technical people. To solve this problem, we perform Data Visualization: the presentation of data in pictorial and graphical form. These pictorial and graphical formats are easy to understand even for non-technical people.

To prove this point, imagine an Excel sheet containing thousands of rows of sales data. Suppose you look at the data for 30 seconds, and someone then asks which product yields the maximum profit, or which yields the minimum. It would be very difficult to give a satisfactory answer, because that kind of information is hard to extract from raw data just by looking at it.
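With a tool like Pandas, the same question takes a couple of lines to answer. Here is a minimal sketch using hypothetical sales figures in place of the Excel sheet described above:

```python
import pandas as pd

# Hypothetical sales data standing in for the Excel sheet; real data
# would be loaded with pd.read_excel("sales.xlsx").
df = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B", "C"],
    "profit":  [120, 340, 90, 150, 310, 60],
})

# Total profit per product.
totals = df.groupby("product")["profit"].sum()

print("Max profit product:", totals.idxmax())  # B
print("Min profit product:", totals.idxmin())  # C

# totals.plot(kind="bar") would turn this into a chart that even
# non-technical readers can grasp at a glance (requires matplotlib).
```

The `groupby`/`sum` step is exactly the aggregation a human cannot do by eye, which is why a 30-second glance at thousands of raw rows fails.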


Today, in the digital age, everyone has access to gadgets like a mobile phone or a professional camera that can take pictures or record audio and video of the incidents and events that occur in our lives. There are many online platforms, like YouTube and Instagram, where we share these types of files. Just imagine the amount of data we are talking about, and all the meaningful information we could extract from it with proper analytical tools. Before performing Data Analysis, it is important to understand that these recorded images, videos, and audio files are very complex and unstructured. But there are tools available with which we can perform analytics to extract information from these types of data.

Problem Actualization

According to the client's requirements, the analysis tool we built needed the ability to analyze such data and extract meaningful information from it. They wanted a system to which they could upload images, PDF documents, audio files, and video files for analysis.

For images and PDF documents, the client wanted the textual information contained in those files. In other words, they needed an OCR system.
According to Wikipedia:
“Optical character recognition or optical character reader, often abbreviated as OCR, is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example from a television broadcast).”


Let's assume that we want to analyze how people think about a specific topic, how they react to a certain incident, or what their opinions are on a particular subject.

For this, we can use different Machine Learning and Deep Learning algorithms. But before any analysis, we first need data: data about people, their opinions, their ideas, and so on. Where do we obtain this data?

Well, the answer is social networking platforms like Facebook, Twitter, and Instagram. These platforms are used by billions of people and have become part of our day-to-day lives. We spend a huge amount of time on social networks and share our personal moments there; in other words, we generate a lot of data on those platforms. Now just imagine the amount of information those sites have about us. They know everything about us. If we can perform meaningful analysis on that data, we can understand those people far better.

In Data Analysis, the first step is to aggregate data. Without proper data, we cannot do any analysis. In different cases, the sources of the data are different and the process of aggregating them is different too. In this case, we want to aggregate data from various social media networks, which means we need a Social Network Aggregator.

Social Network Aggregation is the process of collecting data from various social networks like Facebook, Instagram, and Twitter for analysis and other tasks. A social network aggregator, then, is a tool that implements this aggregation process.
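The core of such a tool is normalizing differently shaped feeds into one common schema. Here is a toy sketch of that idea; the payload shapes and field names below are hypothetical stand-ins, since a real aggregator would call each network's actual API and map its real response format:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Post:
    """A platform-independent post: the aggregator's common schema."""
    platform: str
    author: str
    text: str

def normalize_twitter(raw: Dict) -> Post:
    # Hypothetical Twitter-style payload: {"user": ..., "tweet": ...}
    return Post("twitter", raw["user"], raw["tweet"])

def normalize_facebook(raw: Dict) -> Post:
    # Hypothetical Facebook-style payload: {"name": ..., "message": ...}
    return Post("facebook", raw["name"], raw["message"])

def aggregate(twitter_feed: List[Dict], facebook_feed: List[Dict]) -> List[Post]:
    """Merge feeds from different networks into one uniform list."""
    posts = [normalize_twitter(p) for p in twitter_feed]
    posts += [normalize_facebook(p) for p in facebook_feed]
    return posts

# Fabricated sample payloads mimicking each platform's shape.
tweets = [{"user": "alice", "tweet": "Loving the new phone!"}]
fb_posts = [{"name": "bob", "message": "Great dinner tonight."}]

feed = aggregate(tweets, fb_posts)
print(len(feed))  # 2
```

Once every post is a `Post`, downstream analysis code no longer needs to know which network it came from; adding a new network only means adding one more normalizer.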


Today in the digital age, we spend a lot of time on social networks. We share a lot of our personal moments on various social networks. Whether we are traveling to an adventurous location or having dinner in an exotic restaurant, we share those moments and emotions on social networks. Not only that, we also share our opinions and views towards a specific topic on social networks.

Just imagine the amount of information these social networks have. They know everything about us: what we love, what we hate, where we live, our friends, our interests, and more. Now think about what would happen if we built a platform to analyze this data and began to understand what millions of people like or hate, what they are talking about, and what their opinions are. For example, say a company just launched a new B2C product and wants to know what the public thinks of it. With the ever-increasing abundance of data from social networks, they can find out. What they need is a proper analysis tool.
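To make the product-opinion example concrete, here is a deliberately simplified, lexicon-based sentiment scorer. This is a toy illustration of the idea only; real opinion analysis uses trained Machine Learning models rather than a hand-written word list, and the posts below are fabricated:

```python
# Tiny hand-written sentiment lexicons (an assumption for illustration).
POSITIVE = {"love", "great", "amazing", "good"}
NEGATIVE = {"hate", "terrible", "bad", "awful"}

def sentiment(text: str) -> str:
    """Label text positive/negative/neutral by counting lexicon hits."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical public posts about a newly launched product.
posts = [
    "I love this product, it is great!",
    "Terrible battery, I hate it.",
    "It arrived yesterday.",
]
for p in posts:
    print(sentiment(p))  # positive, negative, neutral
```

A company could run something like this (with a far better model) over aggregated posts mentioning its product and track the positive/negative ratio over time.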

APIs play a very important role in how our application runs, and documenting them in a format that everyone in the organization can understand is just as important. We will discuss one of the most widely used API documentation specifications: the OpenAPI Specification.
The OpenAPI Specification is a language-agnostic format used to describe RESTful web services. The resulting files can be interpreted by applications to generate code, produce documentation, and create virtual simulations of the services they describe.
Let's understand the OpenAPI Specification with an example API document that describes a service allowing us to retrieve a list of users.
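Here is a minimal OpenAPI 3.0 document for that hypothetical users service. OpenAPI documents are usually written in YAML or JSON; for easy inspection it is built here as a Python dict and dumped as JSON. The service, title, and schema fields are illustrative assumptions, while the structural keys (`openapi`, `info`, `paths`, `responses`) follow the specification:

```python
import json

# Minimal OpenAPI 3.0 description of a hypothetical GET /users endpoint
# that returns a JSON array of user objects.
spec = {
    "openapi": "3.0.0",
    "info": {"title": "User Service", "version": "1.0.0"},
    "paths": {
        "/users": {
            "get": {
                "summary": "Retrieve the list of users",
                "responses": {
                    "200": {
                        "description": "A JSON array of users",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "array",
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "id": {"type": "integer"},
                                            "name": {"type": "string"},
                                        },
                                    },
                                }
                            }
                        },
                    }
                },
            }
        }
    },
}

print(json.dumps(spec, indent=2))
```

Fed to a tool such as Swagger UI, a document like this renders as browsable documentation; code generators can likewise produce client stubs from it.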

Data preprocessing is one of the most important steps in Machine Learning, and it cannot be skipped, especially when the data is unstructured. In this post, I'll discuss the different preprocessing steps using Scikit-Learn and Pandas.
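As a preview, here is a compact sketch of three common preprocessing steps (imputing missing values, one-hot encoding a categorical column, and feature scaling) on a small fabricated dataset. The column names and values are assumptions for illustration:

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy dataset with a missing numeric value and a categorical column.
df = pd.DataFrame({
    "age":    [25, 30, None, 40],
    "salary": [50000, 60000, 55000, 65000],
    "city":   ["Delhi", "Mumbai", "Delhi", "Chennai"],
})

# 1. Impute missing numeric values with the column mean.
imputer = SimpleImputer(strategy="mean")
df[["age", "salary"]] = imputer.fit_transform(df[["age", "salary"]])

# 2. One-hot encode the categorical column.
encoded = pd.get_dummies(df, columns=["city"])

# 3. Scale numeric features to zero mean and unit variance.
scaler = StandardScaler()
encoded[["age", "salary"]] = scaler.fit_transform(encoded[["age", "salary"]])

print(encoded.head())
```

Each of these steps is covered in more detail below; the key point is that models expect complete, numeric, comparably scaled inputs, which raw data rarely provides.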

“I’m assuming that you have some basic knowledge of NumPy and Pandas. If you don’t, it is worth learning those first.”
Data analytics is the process of extracting meaningful information from data, increasingly with the aid of specialized tools and techniques. It helps organizations and scientists make more informed decisions.
Python has been around since the late 1980s, but has only recently started making its presence felt in the data science community.
The Agile SDLC model is a combination of iterative and incremental process models, with a focus on process adaptability and customer satisfaction through rapid delivery of working software. Agile methods break the product into small incremental builds, which are delivered in iterations. Scrum is a subset of Agile: a lightweight process framework for agile development, and the most widely used one. A “process framework” is a particular set of practices that must be followed for a process to be consistent with the framework. (For example, the Scrum process framework requires the use of development cycles called Sprints, the XP framework requires pair programming, and so forth.) “Lightweight” means that the overhead of the process is kept as small as possible, to maximize the amount of productive time available for getting useful work done.