What is data science?
Let us first understand what we were doing before the advent of data science and machine learning. Is data science a completely new field that appeared out of nowhere? Not quite: an earlier term for much the same work was business analytics. Back then, we were doing descriptive analysis, reporting and dashboarding. We were building models, but not using them for prediction. By that I mean we were finding out why a particular event happened, without using that understanding to predict what would happen in the future. That was the state of affairs.
So what is data science? Data science is a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyze actual business phenomena” with data. Data science helps you generate insights from raw data. So the central aspect is the data and all the questions surrounding it. What are the different types of data? When should the data be captured and analyzed? How should it be analyzed? Who analyzes the data, and for whom? Why store and analyze the data at all?
First, answering these questions requires a good knowledge of mathematics and statistics.
Second, it requires domain knowledge: an understanding of business problems from a functional point of view (marketing, risk, finance, human resources, operations and so on), along with sectoral or industry knowledge of areas such as retail, telecom, financial services, manufacturing and insurance.
Third, it requires an analytical tool or software, which involves some computer programming to build the models. This is why data science is an interdisciplinary field.
Data science is all about solving a business problem with the help of data
Data science is all about solving a problem, be it related to business, policy or society. You start with a business problem: say, identifying the right price for a resale car. You then look for data that might help predict that price, and identify the variables influencing it. These variables could be manufacturer, brand, year of manufacturing, kms driven, mileage and fuel type. Finally, you model the relationship between these variables and the price of a resale car.
Have you heard of linear regression, or seen an equation like y = mx + c? Fitting an equation like this to data is linear regression, one of the simplest machine learning algorithms. Because the equation has both a dependent variable (the price of the resale car) and independent variables (kms driven and the others), and the model learns from examples where the price is already known, it is considered a supervised machine learning algorithm.
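To make y = mx + c concrete, here is a minimal sketch in Python that fits a straight line with the standard least-squares formulas. The numbers are invented purely for illustration (they are not real market data), and the names kms, price and fit_line are my own assumptions, not part of any library.

```python
# Toy data, invented for illustration only:
# kms driven (in thousands) and resale price (in thousands of rupees).
kms = [10, 20, 30, 40, 50]
price = [560, 520, 480, 440, 400]


def fit_line(x, y):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


m, c = fit_line(kms, price)

# Use the fitted equation to predict the price of a car driven 35 thousand kms.
predicted_price = m * 35 + c
print(m, c, predicted_price)
```

The toy prices were chosen to lie exactly on a straight line, so the fit recovers that line. Real data is noisy, and the fitted line would then only approximate it.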
So to predict the price of the car, you come up with an equation:
Y (price of the resale car) = m1 * Manufacturer + m2 * Brand + m3 * Year of manufacturing + m4 * Kms driven + m5 * Mileage + m6 * Fuel type + C
The final objective of data science here is to find this equation and then use it to predict the price of a resale car whenever you have data about its manufacturer, brand, year of manufacturing, kms driven, mileage and fuel type. This is how cars24.com has disrupted the resale car market in India.
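The process of finding such an equation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses three of the variables (age in years, kms driven and fuel type), the data is made up and was generated from a known equation so that the result is easy to check, and the helper names encode, fit_linear_model and predict are hypothetical. A text variable such as fuel type has to be turned into a number first, so here diesel is encoded as 1 and petrol as 0. In practice you would more likely use a library such as scikit-learn than solve the equations by hand.

```python
# Toy data, invented for illustration: (age in years, kms driven in thousands, fuel type).
cars = [
    (1, 10, "petrol"), (2, 30, "diesel"), (3, 25, "petrol"),
    (4, 60, "diesel"), (5, 40, "petrol"), (6, 90, "diesel"),
]
# Prices (in thousands of rupees) generated from an assumed equation:
# price = 800 - 30 * age - 2 * kms + 50 * is_diesel
prices = [750, 730, 660, 610, 570, 490]


def encode(car):
    """Turn one car record into numbers; fuel type becomes 1 (diesel) or 0 (petrol)."""
    age, kms, fuel = car
    return [age, kms, 1 if fuel == "diesel" else 0]


def fit_linear_model(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1.0 is for the constant C
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]  # [C, m1, m2, m3]


def predict(weights, row):
    """Apply Y = C + m1 * x1 + m2 * x2 + ... to one encoded car."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


weights = fit_linear_model([encode(car) for car in cars], prices)
new_car_price = predict(weights, encode((3, 35, "diesel")))
print([round(w, 2) for w in weights], round(new_car_price, 2))
```

Because the toy prices follow the assumed equation exactly, the fitted weights should match it up to floating-point rounding. With real listings the fit would be approximate, and variables with many categories, such as manufacturer or brand, would need one 0/1 column per category (leaving one category out as the baseline).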