Machine learning models are increasingly used to make or inform decisions. For example, a model might influence whether a loan is approved or which candidate resumes pass screening for a job application. Such decisions are crucial, and we need to be confident that our models don’t discriminate on the basis of ethnicity, gender, age, or similar factors. Machine learning models can contain unintentional bias that results in unreliable and unfair outcomes. Building and evaluating a good machine learning model requires doing more than just calculating loss metrics. …

I’ve had almost 8 years of professional experience, including 5 years specifically in data science and machine learning. In my current role, my team and I design and build predictive machine learning models and promote emerging technologies within the department. This article describes my top three learnings as a data scientist, which I hope will help aspiring data scientists get a gist of what’s inside the real world of data science and what they should expect.

The **Pareto principle/80–20 rule** states that for many outcomes, roughly 80% of consequences are a result of 20%…

Maths and statistics are powerful tools in the world of data science. They are essential because they form the basis of all machine learning algorithms. To succeed as a data scientist, you must know your basics.

Statistics is the use of maths to perform technical analysis on the data to gain meaningful insights. With statistics, we can operate on the data in an information-driven and targeted manner.

So, how is data science different from statistics? While the fields are closely related in the sense that both data scientists and statisticians aim to…

An activation function is a function inside a neuron that converts an input signal to an output signal.

Basically, a neuron calculates the weighted sum of its inputs, adds the bias, and passes the result to the activation function, which decides whether, and how strongly, the neuron produces an output. Activation functions give the neural network its non-linear properties. Without an activation function, a neuron’s output is unbounded and can range from −∞ to +∞.
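The computation described above can be sketched in a few lines of Python. This is only an illustration: the input values, weights, and bias are made up, and sigmoid and ReLU are used as two common example activation functions, not necessarily the ones the article goes on to discuss.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def relu(z: float) -> float:
    """Pass positive values through unchanged and clip negatives to 0."""
    return max(0.0, z)

def pre_activation(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias (unbounded)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def neuron(inputs, weights, bias, activation):
    """Apply the activation function to the weighted sum plus bias."""
    return activation(pre_activation(inputs, weights, bias))

# Hypothetical values, chosen only for illustration.
inputs = [0.5, -1.2, 3.0]
weights = [0.8, 0.1, -0.4]
bias = 0.2

z = pre_activation(inputs, weights, bias)
print("weighted sum + bias:", z)
print("sigmoid output:", neuron(inputs, weights, bias, sigmoid))
print("ReLU output:", neuron(inputs, weights, bias, relu))
```

The weighted sum on its own can be any real number. The sigmoid maps it into a bounded range, while ReLU zeroes out negative values.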

We are all aware of feature scaling and why it’s done. Feature scaling is performed during data pre-processing and is done to normalize/standardize…

`Polynomial Regression | Data Science | Machine Learning`

In this article, we will learn what the Bayesian Information Criterion (BIC) is and how it is used to choose the degree of the polynomial in polynomial regression.

Sometimes R² values vary only slightly between two polynomial degrees, e.g. an R² score of 88.3% versus 88.4%. More generally, how do we know which model is better: R² = 88% or R² = 90%?

Let’s study this by creating some dummy data:

Let’s fit the model with Ordinary Least Squares (OLS). This package provides a detailed statistical summary, including AIC and BIC.
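A minimal sketch of this comparison follows, under stated assumptions. The dummy data (a noisy quadratic), the seed, and the helper name `fit_polynomial` are hypothetical and are not the article's original code. To stay self-contained, it fits OLS with NumPy and computes BIC directly from the Gaussian log-likelihood, as `k * ln(n) - 2 * ln(L)`, rather than reading it from a package summary.

```python
import numpy as np

# Hypothetical dummy data: a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(42)
n = 100
x = np.linspace(-3, 3, n)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=1.0, size=n)

def fit_polynomial(x, y, degree):
    """Fit a polynomial by ordinary least squares; return (R^2, BIC)."""
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ssr = float(residuals @ residuals)            # sum of squared residuals
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    n_obs, k = X.shape                            # k counts the intercept too
    # Maximised Gaussian log-likelihood of the OLS fit.
    log_likelihood = -0.5 * n_obs * (np.log(2 * np.pi) + np.log(ssr / n_obs) + 1)
    bic = k * np.log(n_obs) - 2 * log_likelihood
    r2 = 1 - ssr / sst
    return r2, bic

results = {d: fit_polynomial(x, y, d) for d in range(1, 6)}
for d, (r2, bic) in results.items():
    print(f"degree={d}  R2={r2:.4f}  BIC={bic:.2f}")

# Lower BIC is better: it rewards fit but penalises extra parameters.
best_degree = min(results, key=lambda d: results[d][1])
print("degree with lowest BIC:", best_degree)
```

R² can only stay the same or increase as the degree grows, so it cannot by itself tell us when to stop adding terms. BIC adds a penalty of `ln(n)` per parameter, so a higher degree is preferred only when the improvement in fit outweighs that penalty.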

If you fail to plan, you plan to fail. Every project requires planning, and building a machine learning model is no different. In this article, we will learn how to plan your data mining activities and which steps you should perform during Exploratory Data Analysis (EDA). This article is not a ‘how-to’ guide but a reference checklist for data analytics professionals. It will provide you with a list of considerations when building a machine learning model.

We have all heard about CRISP-DM: Cross Industry Standard Process for Data Mining. …

**Simple linear regression suffers from two major flaws:**

- It’s prone to overfitting with many input features, and
- It cannot easily express non-linear (curvy) relationships.

One way to tackle these issues is by increasing the model complexity, for example by using decision trees or polynomial regression to represent non-linear relationships.

These algorithms are also prone to overfitting because of their added complexity. Therefore, to represent non-linear functions without overfitting, we use regularization techniques.

Regularization techniques calibrate linear and non-linear regression models by minimizing an adjusted loss function (the usual loss plus a penalty on model complexity), which helps prevent overfitting.
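As a rough illustration of an adjusted loss function, the sketch below uses an L2 (ridge) penalty. The choice of ridge, the synthetic data, the `alpha` values, and the function names are assumptions made for this example only. The adjusted loss here is the sum of squared errors plus `alpha` times the squared size of the coefficients, and no intercept is fitted because the synthetic data has none.

```python
import numpy as np

# Hypothetical data: 10 features, but only the first two actually matter.
rng = np.random.default_rng(0)
n_samples, n_features = 30, 10
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:2] = [3.0, -2.0]
y = X @ true_coef + rng.normal(scale=1.0, size=n_samples)

def adjusted_loss(X, y, coef, alpha):
    """Sum of squared errors plus an L2 penalty on the coefficients."""
    errors = y - X @ coef
    return float(errors @ errors + alpha * coef @ coef)

def ridge_fit(X, y, alpha):
    """Closed-form minimiser of adjusted_loss; alpha=0 gives plain OLS."""
    penalty = alpha * np.eye(X.shape[1])
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

ols_coef = ridge_fit(X, y, alpha=0.0)
ridge_coef = ridge_fit(X, y, alpha=10.0)

print("OLS coefficient norm:  ", np.linalg.norm(ols_coef))
print("Ridge coefficient norm:", np.linalg.norm(ridge_coef))
```

The penalty shrinks the coefficients toward zero, which limits how much the model can bend to fit noise in the training data. Larger `alpha` values shrink more aggressively.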

The two…

Advances in smart assistants like Alexa and Google have brought remarkable convenience into our day-to-day lives: seeking a quick weather report, translating languages, listening to world news, and now even sending virtual hugs to your Alexa contacts. With recent Artificial Intelligence (AI) breakthroughs like AlphaGo, IBM Watson, self-driving cars, and many more, the concern of AI taking over our jobs is real.

Can you imagine the impact of these applications on humans as they advance? Eventually, everything would be done for you by an AI. Now, the question is, what value would you be adding…

In this article we will find answers to the following questions:

- What is a z-score? Formula and definition.
- How to use a z-score, with a toy example.

History: The letter **‘Z’** in z-score stands for **Zeta** (the 6th letter of the Greek alphabet), which comes from the Zeta Model that was originally developed by **Edward Altman** to estimate the chances of a public company going bankrupt. Z-scores fall into zones of probability, which indicate the likelihood of a public company going bankrupt.

- z < 1.81 - Distress “Zone”
- 1.81 < z < 2.99 - Grey “Zone”
- z > 2.99 - Safe “Zone”
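The three zones above can be expressed as a small lookup function. This is a sketch only: the function name `altman_zone` is hypothetical, and because the list does not say which zone the exact boundary values 1.81 and 2.99 belong to, this version assumes they fall in the grey zone.

```python
def altman_zone(z: float) -> str:
    """Map a Zeta-model z-score to its zone, using the thresholds listed above."""
    if z < 1.81:
        return "Distress"
    if z > 2.99:
        return "Safe"
    # Assumption: the boundary values 1.81 and 2.99 are treated as grey.
    return "Grey"

# Hypothetical scores, for illustration only.
for score in (1.2, 2.4, 3.5):
    print(score, "->", altman_zone(score))
```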

…

Normalization vs Standardization

In this article we will discover answers to the following questions:

- What is feature scaling and why it is required in Machine Learning (ML)?
- Normalization — pros and cons.
- Standardization — pros and cons.
- Normalization or standardization: which one is better?

First things first, let’s use an analogy to understand why we need feature scaling. Think of building an ML model as making a smoothie, and this time you are making a *strawberry-banana* smoothie. You have to carefully mix strawberries and bananas to make the smoothie taste good. If you just mix *one*…

Data Scientist and Project Management Professional at Government of Canada. Visit https://swapnilklkar.github.io for more.