Classification Algorithm in Machine Learning

In machine learning and statistics, classification is a supervised learning approach in which a program learns from labeled input data and then uses what it has learned to classify new observations.

Example: This chart shows the classification of the iris flower dataset into its three sub-species, indicated by the codes 0, 1, and 2.

Types of Classification Algorithms

Let’s take a quick look at the main types of classification algorithms.

Linear Models

  • Logistic Regression
  • Support Vector Machines

Nonlinear Models

  • K-nearest Neighbors (KNN)
  • Naïve Bayes
  • Decision Tree Classification
  • Random Forest Classification

1. Logistic Regression

  • Logistic regression is a regression model that is used for classification.
  • It is widely used for binary classification problems.
  • Here, the dependent variable is categorical, taking values in {0, 1}.
  • A binary dependent variable can take only two values, such as 0 or 1, win or loss, or pass or fail.
  • The predicted probability in logistic regression is given by the sigmoid function (also called the logistic function or the S-curve), which maps any real number to a value between 0 and 1.
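The sigmoid function mentioned above can be sketched in a few lines. This is a minimal illustration, not a fitted model: the weights, bias, and input values are hypothetical, and the 0.5 decision threshold is a common convention assumed here.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """P(y = 1 | x) for a logistic regression model with the given parameters."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical parameters, chosen only for illustration.
weights = [0.8, -0.4]
bias = 0.1

p = predict_proba([2.0, 1.0], weights, bias)
label = 1 if p >= 0.5 else 0  # assumed threshold of 0.5
print(round(p, 3), label)
```

Because the output is a probability, the same model can serve different needs simply by moving the threshold, for example lowering it when missing a positive case is costly.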

Advantages:

Logistic regression is designed for classification and is most useful for understanding the influence of several independent variables on a single outcome variable.

Disadvantages:

In its basic form it works only when the predicted variable is binary (multiclass extensions exist), it assumes the predictors are independent of each other, and it assumes the data is free of missing values.

from sklearn.linear_model import LogisticRegression

# train_x, train_y, and test_x are assumed to come from an earlier train/test split
lr = LogisticRegression()
lr.fit(train_x, train_y)
y_pred = lr.predict(test_x)
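The snippets in this article use train_x, train_y, and test_x without defining them. One possible way to create them is sketched below, assuming scikit-learn is installed and using the iris data from the example at the top. The 30% test split, the random seed, and the raised max_iter are arbitrary choices made for this sketch. Note that scikit-learn's LogisticRegression also accepts multiclass targets such as the three iris classes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the iris data: 150 samples, 4 features, 3 classes coded 0, 1, 2.
iris = load_iris()

# Hold out 30% of the samples for testing (ratio and seed are assumptions).
train_x, test_x, train_y, test_y = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# max_iter is raised above the default to give the solver room to converge.
lr = LogisticRegression(max_iter=1000)
lr.fit(train_x, train_y)
y_pred = lr.predict(test_x)

# Fraction of test samples whose predicted class matches the true class.
print("accuracy:", accuracy_score(test_y, y_pred))
```

The same four variables can then be reused with every other classifier shown in this article.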

2. Support Vector Machines

  • Support vector machines (SVMs) are another group of algorithms used for classification and sometimes for regression tasks. SVMs are popular because they often give accurate results with modest computational cost on small and medium-sized data sets.
  • The goal of an SVM is to find a hyperplane in an N-dimensional space (where N is the number of features) that distinctly separates the data points of the classes.
  • Among all such hyperplanes, we look for the one with the maximum margin, that is, the largest distance to the nearest data points of both classes.
  • The hyperplane is often drawn as a line that separates one class from another. Data points that fall on different sides of the hyperplane are assigned to different classes.
  • Note that the dimension of the hyperplane depends on the number of features.
  • If the number of input features is 2, the hyperplane is just a line. If the number of input features is 3, the hyperplane becomes a two-dimensional plane. When the number of features exceeds 3, the hyperplane becomes difficult to draw on a graph. When the classes cannot be separated by a hyperplane in the original feature space, a kernel function is used to implicitly map the data into a higher-dimensional space where they can be.
  • Why is this called a support vector machine? Support vectors are the data points closest to the hyperplane, and they determine its position and orientation.
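The ideas of "side of the hyperplane" and "distance to the hyperplane" can be made concrete with a small sketch. The hyperplane below is hand-picked for a toy 2-D data set rather than fitted by an SVM solver, and the point names and coordinates are hypothetical.

```python
import math


def decision_value(x, w, b):
    """Signed score w . x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_hyperplane(x, w, b):
    """Geometric distance from x to the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(x, w, b)) / norm_w


# Hypothetical hyperplane in 2-D (a line): x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0

# Toy data points (names and coordinates are made up for illustration).
points = {"A": [1.0, 1.0], "B": [3.0, 2.0], "C": [2.0, 2.0]}

sides = {n: (1 if decision_value(x, w, b) >= 0 else -1) for n, x in points.items()}
distances = {n: distance_to_hyperplane(x, w, b) for n, x in points.items()}

# The margin is the smallest distance; the points at that distance
# play the role of support vectors for this hyperplane.
margin = min(distances.values())
support_vectors = sorted(n for n, d in distances.items() if math.isclose(d, margin))

print(sides)
print(round(margin, 3), support_vectors)
```

A real SVM solver searches over w and b to make this margin as large as possible; the sketch only evaluates one fixed candidate.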

Advantages:

  • SVMs are effective in high-dimensional spaces, and because the decision function uses only a subset of the training points (the support vectors), they are also memory efficient.

Disadvantages:

  • SVMs are not well suited to large data sets, because training time grows faster than linearly with the number of samples.
  • SVMs do not perform well when the data set is noisy, that is, when the target classes overlap.

from sklearn.svm import SVC
svm = SVC()
svm.fit(train_x, train_y)
# Use predict, not fit, to label the test data
y_pred = svm.predict(test_x)

3. K-nearest Neighbors (KNN)

  • K-nearest neighbors (K-NN) is one of the simplest machine learning algorithms based on the supervised learning technique.
  • The K-NN algorithm assumes that similar cases lie close together: it compares the new case with the available cases and puts it into the category whose members it most resembles.
  • The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category.
  • The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
  • K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data distribution.
  • It is also called a lazy learner because it does not learn from the training set immediately. Instead, it stores the dataset and performs the computation at classification time.
  • At the training phase, the K-NN algorithm just stores the dataset; when it gets new data, it classifies that data into the category it is most similar to.

The working of K-NN can be explained with the following algorithm:

  • Step-1: Select the number K of neighbors.
  • Step-2: Calculate the Euclidean distance from the new data point to every point in the training set.
  • Step-3: Take the K nearest neighbors according to the calculated Euclidean distances.
  • Step-4: Among these K neighbors, count the number of data points in each category.
  • Step-5: Assign the new data point to the category with the largest number of neighbors.
  • Step-6: Our model is ready.
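The steps above can be sketched from scratch in a few lines of Python. This is a minimal illustration with made-up 2-D points and labels, not a replacement for a library implementation; the function name knn_predict and the default k=3 are assumptions of this sketch.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by a majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the new point to every training point.
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: keep the k nearest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: count how many neighbors fall into each category.
    votes = Counter(label for _, label in nearest)
    # Step 5: return the category with the most neighbors.
    return votes.most_common(1)[0][0]


# Toy training data (hypothetical): two well-separated groups.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2.0, 2.0)))
print(knn_predict(train_points, train_labels, (6.0, 7.0)))
```

Note that there is no separate training function: all the work happens at prediction time, which is why K-NN is described as a lazy learner.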

Advantages:

  • This algorithm is simple to implement, robust to noisy training data (especially with a larger K), and can be effective when the training data is large.

Disadvantages:

  • The value of K has to be chosen, and the computation cost is high because the distance from each new instance to all the training samples must be computed.

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(train_x, train_y)
y_pred = knn.predict(test_x)

4. Naïve Bayes

  • Naive Bayes algorithms are a family of probabilistic classifiers used in ML that apply Bayes’ theorem under the "naive" assumption that the features are independent of one another given the class.
  • The Naive Bayes classifier was one of the first algorithms used for machine learning. It is suitable for binary and multiclass classification and allows predictions and forecasts to be made from historical results.
  • Using Bayes’ theorem, it is possible to tell how the occurrence of one event changes the probability of another event.
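As a small worked example of Bayes’ theorem, the sketch below computes how seeing a particular word in an e-mail changes the probability that the e-mail is spam. All three input probabilities are invented for illustration.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(A | B) via Bayes' theorem: P(B | A) * P(A) / P(B)."""
    # P(B) by the law of total probability over A and not-A.
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only for illustration.
p_spam = 0.2              # P(spam): prior probability that an e-mail is spam
p_word_given_spam = 0.6   # P(word | spam)
p_word_given_ham = 0.05   # P(word | not spam)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(round(p_spam_given_word, 2))  # 0.75
```

Under these assumed numbers, observing the word raises the probability of spam from 0.2 to 0.75, which is exactly the kind of update a Naive Bayes classifier performs for every feature.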

Multinomial Naive Bayes

  • Apart from the basic Naive Bayes classifier, there are other variants in this group. One example is Multinomial Naive Bayes, which is usually applied to document classification based on the frequency of certain words in the document.
  • Bayesian algorithms are still used for text categorization and fraud detection. They can also be applied to machine vision (for example, face detection), market segmentation, and bioinformatics.
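To make the word-frequency idea behind Multinomial Naive Bayes concrete, here is a minimal from-scratch sketch. The tiny corpus, the function names, and the use of add-one (Laplace) smoothing are assumptions of this example; a library implementation would normally be used in practice.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(documents, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict_multinomial_nb(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, doc_count in class_counts.items():
        total_words = sum(word_counts[label].values())
        # Log prior: fraction of training documents in this class.
        score = math.log(doc_count / total_docs)
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Tiny hypothetical corpus.
docs = ["win money now", "cheap money offer", "meeting at noon", "project meeting notes"]
doc_labels = ["spam", "spam", "ham", "ham"]

model = train_multinomial_nb(docs, doc_labels)
print(predict_multinomial_nb(model, "cheap money"))
print(predict_multinomial_nb(model, "notes for the meeting"))
```

Working with log-probabilities instead of multiplying raw probabilities keeps the scores from underflowing on long documents.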

Advantages:

  • This algorithm requires a small amount of training data to estimate the necessary parameters. Naive Bayes classifiers are extremely fast compared to more sophisticated methods.

Disadvantages:

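  • Naive Bayes is known to be a poor probability estimator, so its predicted probabilities should be treated with caution even when the predicted class is correct.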

5. Decision Tree Classification

  • A decision tree is a simple way to visualize a decision-making model in the form of a tree. The advantages of decision trees are that they are easy to understand, interpret and visualize. Also, they demand little effort for data preparation.
  • However, they also have a big disadvantage. The trees can be unstable: even small variations in the data can produce a very different tree. It is also possible to create over-complex trees that do not generalize well. This is called overfitting. Bagging, boosting, and regularization help to fight this problem. We are going to talk about them later in the post.

The elements of every decision tree are:

  • Root node. This node asks the main question. It has arrows pointing down from it but no arrows pointing to it. For example, imagine you are building a tree for deciding what kind of pasta you should have for dinner.
  • Branches. A subsection of a tree is called a branch or sometimes a sub-tree.
  • Decision nodes. These are the subnodes of the root node, and they can themselves split into more nodes. Your decision nodes can be “carbonara?” or “with mushrooms?”.
  • Leaves or Terminal nodes. These nodes do not split. They represent final decisions or predictions.

Also, it is important to mention splitting. This is the process of dividing a node into subnodes. For instance, if you’re not a vegetarian, carbonara is okay. But if you are, eat pasta with mushrooms. There is also a process of node removal called pruning.
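The pasta example can be written down as a tiny hand-built tree to show how the root node, decision nodes, and leaves fit together. This is only an illustrative sketch: the Node class, the question strings, and the tree layout are hypothetical, and a real decision tree learns its splits from data instead of having them typed in.

```python
class Node:
    """A decision node if `question` is set, otherwise a leaf holding a `decision`."""

    def __init__(self, question=None, yes=None, no=None, decision=None):
        self.question = question
        self.yes = yes
        self.no = no
        self.decision = decision

    def is_leaf(self):
        return self.question is None


def decide(node, answers):
    """Walk from the root to a leaf, following the yes/no answer at each node."""
    while not node.is_leaf():
        node = node.yes if answers[node.question] else node.no
    return node.decision


# Root node splits on "vegetarian?"; one branch leads to a further decision node.
root = Node(
    question="vegetarian?",
    yes=Node(decision="pasta with mushrooms"),
    no=Node(
        question="carbonara?",
        yes=Node(decision="carbonara"),
        no=Node(decision="pasta with mushrooms"),
    ),
)

print(decide(root, {"vegetarian?": True}))
print(decide(root, {"vegetarian?": False, "carbonara?": True}))
```

Pruning, in this picture, would amount to replacing a decision node such as "carbonara?" with a single leaf.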

Decision tree algorithms are often referred to as CART (Classification and Regression Trees). Decision trees can work with categorical or numerical data.

  • Regression trees are used when the target variable is numerical.
  • Classification trees are used when the target variable is categorical (classes).

Advantages:

  • A decision tree is simple to understand and visualize, requires little data preparation, and can handle both numerical and categorical data.

Disadvantages:

  • Decision trees can become overly complex and fail to generalize well, and they can be unstable because small variations in the data might result in a completely different tree being generated.

from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(train_x, train_y)
y_pred = dtree.predict(test_x)

6. Random Forest

  • A random forest classifier is a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting.
  • In scikit-learn's default configuration, each sub-sample is the same size as the original input sample, but the samples are drawn with replacement.
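The two ingredients described above, sampling with replacement and averaging over trees, can be sketched without any library. The row indices stand in for training samples, and the per-tree class probabilities are invented; no actual trees are trained in this illustration.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: some rows repeat, others are left out."""
    return [rng.choice(rows) for _ in rows]


def average_probabilities(per_tree_probs):
    """Average the class-probability vectors predicted by the individual trees."""
    n_trees = len(per_tree_probs)
    n_classes = len(per_tree_probs[0])
    return [sum(p[c] for p in per_tree_probs) / n_trees for c in range(n_classes)]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
rows = list(range(10))  # stand-ins for 10 training samples

# One bootstrap sample per (hypothetical) tree, each as large as the original data.
samples = [bootstrap_sample(rows, rng) for _ in range(3)]
for sample in samples:
    print(sorted(sample))

# Hypothetical class probabilities from three trees for one new observation.
tree_probs = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
avg = average_probabilities(tree_probs)
predicted_class = max(range(len(avg)), key=avg.__getitem__)
print([round(p, 3) for p in avg], predicted_class)
```

Because every tree sees a slightly different sample, their individual errors tend to differ, and averaging smooths them out; this is the mechanism behind the reduction in over-fitting.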

Advantages:

  • It reduces over-fitting, and a random forest classifier is more accurate than a single decision tree in most cases.

Disadvantages:

  • Real-time prediction is slow, and the algorithm is complex and difficult to implement.

from sklearn.ensemble import RandomForestClassifier
rfm = RandomForestClassifier()
rfm.fit(train_x, train_y)
y_pred = rfm.predict(test_x)