The 30%, 40% and 30% of Data Science

Aashray Saini
Dec 7, 2020

After nearly 2 years in the corporate sector, including almost a year and 5 months in a data-centric role, it would be an understatement to say that my perception of data science has changed significantly. That change is exactly what I’ll emphasize in this article, along with how we data scientists are perceived.

So you’re a young, aspiring data scientist, mainly because you’re proficient in programming and the field’s scope is booming exponentially; after all, it has been called ‘the most important skill of the 21st century’. You absolutely love mining data, discovering patterns and developing ML algorithms. Your resume is filled with the top-notch ML libraries in existence, from sklearn to tensorflow to horovod and what not. You know 10 types of classification, regression or clustering methods, feature engineering libraries, and optimization methods using convex / linear programming. Your top-notch projects might include, but are not limited to, implementing natural language semantics in conjunction with deep learning, or training a bot using reinforcement learning. Inherently, and rightly so, there is a sense of accomplishment.

But what is lacking?

Well, your hiring manager would probably say: ‘All this is very impressive, but can you solve a business problem for me? The company XXX would like to improve its inventory measures to ensure timely restocking of supplies and prevent any delay in logistics and transportation. Can you propose a solution and describe the process of solving it? Assume ideal conditions, with access to any data source.’

Whoa! Okay, umm, I did not see that coming. Wait, I thought I was applying for a job where I would just be writing code. Where is all this coming from?

And this is where your journey as a data scientist begins. Hence, the title of this article.

THE 30%

The Fundamental Business Equation: know your company, and know its strategies and/or products. In almost any organization, data scientists work directly with higher management to solve a business problem. The problem statement will be as short as a one-liner, or a paragraph at most. It’s your job to dissect it, plan it, propose a strategy and communicate it back to management in language they can comprehend. In short, convert a business problem into a data science problem by asking the right questions (KPQs, or key performance questions).

This process is daunting and can be overwhelming, as almost everyone will be breathing down your neck to come up with a solution. The questions you need to look at would be something like:

  • What are we trying to measure/predict? What potential risks can arise, and how do we factor them in?
  • Who are the stakeholders?
  • What would count as a good proof of concept (PoC)?
  • What is the direct impact on customers due to this issue?
  • What are the most suitable success/evaluation criteria for this problem? (the KPIs)
  • What other teams are involved?
  • What are the key deliverables / milestones we’re looking at?

Having all this in mind gives a much-needed kickstart to your data science journey! At any phase of the project, it is important to go back to the fundamentals you defined here.

THE 40%


The technical aspects: here you plan all your pre-ML activities. Before even getting started with analytics/visualization, it is crucial to ask some of the following questions:

  • What type of data am I looking at? What are the formats?
  • How do I interpret the initial raw data? (Data domain knowledge)
  • Which data sources relevant to my analysis would I need?
  • How clean is the data?
  • What key insights can I present that can give a clear picture of what is currently happening in the context of this problem? (strategic decision making)
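To make the “How clean is the data?” question a bit more tangible, here is a minimal sketch of a first-pass data quality profile. The records, the column names (`sku`, `warehouse`, `units`) and the `profile` helper are all made up for illustration, and it uses only the Python standard library; in a real project you would more likely run the same checks with pandas or SQL.

```python
from collections import Counter

# Hypothetical raw inventory records; None marks a missing value.
rows = [
    {"sku": "A1", "warehouse": "north", "units": 120},
    {"sku": "A2", "warehouse": "north", "units": None},
    {"sku": "A1", "warehouse": "north", "units": 120},  # exact duplicate
    {"sku": "B7", "warehouse": None, "units": -5},      # negative stock
]


def profile(rows):
    """Return simple data-quality counts for a list of dict records."""
    columns = list(rows[0].keys())
    # Missing values per column.
    missing = {c: sum(r[c] is None for r in rows) for c in columns}
    # Rows that repeat an earlier row exactly.
    counts = Counter(tuple(r[c] for c in columns) for r in rows)
    duplicates = sum(n - 1 for n in counts.values())
    # A domain rule (assumed here): stock on hand cannot be negative.
    negative_units = sum(
        1 for r in rows if r["units"] is not None and r["units"] < 0
    )
    return {
        "rows": len(rows),
        "missing": missing,
        "duplicates": duplicates,
        "negative_units": negative_units,
    }


report = profile(rows)
print(report)
```

Even a tiny report like this tells you which columns need attention before any chart or model is worth building.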

In this stage you get into the “Science” of Data Science i.e. experiments, which is an iterative process consisting of:

Come up with a hypothesis → Generate an insight → Conduct a statistical test → Demonstrate a probabilistic solution
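As one possible illustration of the “conduct a statistical test” step, the sketch below runs a two-sided permutation test on the hypothesis “supplier B delivers later than supplier A”. The delay figures and the `permutation_test` helper are invented for this example, and a permutation test is just one reasonable choice among many (a t-test or a bootstrap would be others).

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the estimated probability of seeing a difference at least
    as large as the observed one if the group labels were arbitrary.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical delivery delays in days for two suppliers.
supplier_a = [2.1, 2.4, 1.9, 2.8, 2.2, 2.5, 2.0, 2.3]
supplier_b = [3.0, 3.4, 2.9, 3.6, 3.1, 2.7, 3.3, 3.2]

p_value = permutation_test(supplier_a, supplier_b)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value here would be the “probabilistic solution” you take back to the business: not “B is slower”, but “a gap this large is very unlikely to be chance”.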

This stage calls on your analytical skills (SQL is your bread and butter here), which are essential yet underrated. A large chunk of the business decision-making process depends on analytics, insights and stories.

Having an analytical mindset not only helps you answer critical business questions but also paves a clear path for scoping the implementation of ML. There is no one correct way to do analytics, but framing results as cumulative figures, percentiles, deciles, trends, or running totals/averages, in comparison with a population, is captivating and impactful.
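For a feel of what those framings look like in practice, here is a small standard-library sketch that computes a running total, a trailing 7-day average, decile cut points and a percentile rank. The `daily_units` figures and the helper names are hypothetical; in day-to-day work the same numbers would typically come out of SQL window functions or pandas.

```python
from itertools import accumulate
from statistics import quantiles

# Hypothetical daily units shipped over two weeks.
daily_units = [120, 135, 128, 150, 160, 90, 85,
               140, 155, 149, 170, 180, 95, 88]

# Running (cumulative) total after each day.
running_total = list(accumulate(daily_units))


def moving_average(values, window):
    """Trailing average over a fixed window; starts once the window is full."""
    return [
        sum(values[i - window + 1: i + 1]) / window
        for i in range(window - 1, len(values))
    ]


weekly_avg = moving_average(daily_units, window=7)

# Decile cut points: 9 boundaries that split the data into 10 groups.
decile_cuts = quantiles(daily_units, n=10)


def percentile_rank(value, population):
    """Share of the population at or below `value`, in percent."""
    return 100 * sum(v <= value for v in population) / len(population)


print("Running total:", running_total)
print("7-day average:", weekly_avg)
print("Decile cuts:", decile_cuts)
print("Percentile rank of 150 units:", percentile_rank(150, daily_units))
```

Saying “this day sits in the top decile of the last two weeks” usually lands better with stakeholders than quoting the raw number.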

Subsequently, you have to know which charts tell the most compelling story (box plot, violin, scatter, histogram), as well as how to interpret your findings and what they mean for the business.

You have to, have to, and just have to spend a significant amount of time here, playing around with and manipulating the data and connecting the dots across various data sources!

This 40% will account for nearly 90% of your feature selection process in the ML step.

THE 30%

The model development: finally, the moment we’ve been waiting for! Here’s what happens in this step:

  • Basic prototyping, typically done on cloud servers to run computations on big data
  • Setting up the instances/GPUs needed to run your model
  • Optimal splitting of the data set into train and test sets
  • Feeding the model with the key input variables from the previous step
  • Using the essential libraries (pandas, numpy, sklearn, keras, pytorch etc.) and algorithms suited to the type of problem you’re aiming to solve (forecasting, classification, prediction etc.)
  • Setting up an error metric and an accuracy metric
  • Presenting early findings to the team/stakeholders, and incorporating debugging and optimization techniques
  • Non-ML simulations, like Monte Carlo or HMM, to account for risks and variability as per requirements
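Two of the steps above, splitting the data and setting up an error metric, can be sketched without any ML library at all. The example below assumes a forecasting problem, so it holds out the most recent observations rather than splitting at random, and it scores a deliberately naive baseline. The `demand` series and the helper names are hypothetical; with sklearn you would reach for its own splitters and metrics instead.

```python
from math import sqrt


def chronological_split(series, test_fraction=0.25):
    """Hold out the most recent observations so the test set follows
    the training set in time (avoids peeking into the future)."""
    cut = int(len(series) * (1 - test_fraction))
    return series[:cut], series[cut:]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical weekly demand for one product.
demand = [100, 104, 98, 110, 107, 115, 112, 120, 118, 125, 121, 130]

train, test = chronological_split(demand)

# Naive baseline: predict the training mean for every test week.
baseline = sum(train) / len(train)
predictions = [baseline] * len(test)

baseline_mae = mae(test, predictions)
baseline_rmse = rmse(test, predictions)
print(f"Baseline MAE:  {baseline_mae:.2f}")
print(f"Baseline RMSE: {baseline_rmse:.2f}")
```

Whatever model you train next should beat these baseline numbers on the same held-out weeks; if it doesn’t, that is worth telling your stakeholders early.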

Okay, it seems like the data science journey is done after all. I got my model to train and predict with a desirable accuracy. Phew!!

Well, no, you’re far from done. How do you plan to release your solution to your stakeholders? They will mostly be non-technical folk who’ll just be consumers of your service, just like we are consumers of technology.

The answer lies in this term:

PRODUCTIONALIZE!

Now what the heck is this? That deserves a separate article altogether. For now, this is the end of this one.

Note that data science is applied mainly for three purposes:

  • Launch a new product
  • Improve an existing product
  • Business process automation

Point being, the fundamental journey remains more or less the same (well, unless you’re research oriented and into scientific studies like CV, NLP, RL etc.).

IMO, having the vision to see the bigger picture, going beyond algorithms and code, and having a proposition for the 3 pointers above takes you places!

If this resonated with you, go ahead and give it a clap and follow for more :)
