Python and Machine Learning
Machine Learning
Machine learning is a branch of computer science, and a subfield of artificial intelligence, that studies algorithms which improve their performance by learning from data rather than by following explicitly programmed rules. Typical uses of machine learning algorithms include detecting trends, answering business questions, data acquisition, efficient data handling, detecting unusual transactions, and powering search engines and online shopping, among many others.
Python for Machine Learning
Python is one of the most widely used languages for machine learning and data science. The following features make it well suited to data science work:
- Packages: Python has an extensive package ecosystem covering many domains. Packages such as SciPy, pandas, NumPy, and scikit-learn are particularly useful for machine learning.
- Prototyping: Python makes prototyping quick and easy, which helps in developing new and customized algorithms for complex problems.
- Collaboration: Python offers numerous tools that support collaboration on data science projects.
- Multi-purpose Language: Data science projects span many stages, such as data extraction, data manipulation, data analysis, feature extraction, modeling, evaluation, deployment, and updating. As a multipurpose language, Python can address all of these stages.
Python Libraries for Machine Learning
Python provides a wide range of libraries for machine learning. A library is a collection of reusable functions and routines. Because machine learning relies heavily on mathematical optimization, probability, and statistics, these libraries make complex tasks easier to perform efficiently. The following are some of the Python libraries helpful for machine learning:
- Pandas: It is a fast, flexible, and powerful open-source data analysis and manipulation tool. It is built on top of the NumPy package, which provides support for multidimensional arrays.
- Keras: It is a high-level deep learning API for implementing neural networks quickly. It supports multiple backends for the underlying neural network computations.
- Matplotlib: It is a comprehensive library for creating static, interactive, and animated visualizations in Python. It is a cross-platform graphical plotting library that builds on NumPy.
- StatsModels: It is a Python library built on NumPy, SciPy, and matplotlib that provides statistical models, statistical tests, and tools for data exploration.
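As a small illustration of how these libraries fit together, the sketch below builds a pandas data frame and hands its contents to NumPy. The column names and values are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, used only to show how the libraries interact.
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]})

# pandas exposes the underlying data as a NumPy array.
values = frame.to_numpy()

# NumPy then handles the numerical work, here a per-column mean.
column_means = values.mean(axis=0)

print(type(values).__name__, values.shape)
print(column_means)
```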
Example
Python can run numerous supervised and unsupervised learning algorithms, including linear regression, logistic regression, k-nearest neighbors, decision trees, random forests, support vector machines, dimensionality reduction, density estimation, market basket analysis, generative adversarial networks, clustering, and many others. The following is an example of simple linear regression using Python:
Linear Regression
Simple linear regression predicts the output (the dependent variable) from a single input feature (the independent variable) by fitting a straight line to the data. The following are the steps to perform simple linear regression using sklearn in Python:
Import Libraries
The following code imports the required libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
Read File
The next step is to read the dataset and check its first five rows. The following example uses a dataset of vehicle models:
data = pd.read_csv("Fuel.csv")
data.head()
Feature Selection
In this example, the goal is to predict CO2 emissions from engine size, so only these two columns are kept:
data = data[["ENGINESIZE", "CO2EMISSIONS"]]
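The double brackets in the selection matter: a list of column names returns a data frame, while a single name returns a series. The sketch below shows the difference on a made-up frame; the CYLINDERS column and all values are hypothetical and only stand in for the real file.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset (values are made up).
data = pd.DataFrame({
    "ENGINESIZE": [2.0, 2.4, 3.5],
    "CYLINDERS": [4, 4, 6],
    "CO2EMISSIONS": [196, 221, 255],
})

# A list of names keeps a two-dimensional DataFrame.
subset = data[["ENGINESIZE", "CO2EMISSIONS"]]

# A single name returns a one-dimensional Series.
column = data["ENGINESIZE"]

print(type(subset).__name__, subset.shape)
print(type(column).__name__, column.shape)
```

scikit-learn expects a two-dimensional feature array, which is why the later steps select the feature with double brackets.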
Plotting Data
Users can visualize the data on a scatter plot with the following code:
plt.scatter(data["ENGINESIZE"], data["CO2EMISSIONS"], color="blue")
plt.xlabel("ENGINESIZE")
plt.ylabel("CO2EMISSIONS")
plt.show()
Data Division
The next step is to divide data into training and testing datasets to check the accuracy of a model. Training data helps in model training, and testing data helps in checking the accuracy of the model.
# The first 80% of the rows form the training set, the remaining 20% the test set
split_index = int(len(data) * 0.8)
train = data[:split_index]
test = data[split_index:]
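Slicing by position keeps the rows in file order, which can bias the split if the file is sorted. One common alternative, sketched here as an option rather than as the article's method, is scikit-learn's train_test_split, which shuffles the rows before splitting. The data frame below uses made-up values in place of the real file.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real dataset (values are made up).
data = pd.DataFrame({
    "ENGINESIZE": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5],
    "CO2EMISSIONS": [150, 170, 190, 210, 230, 250, 270, 290, 310, 330],
})

# Shuffle the rows, then hold out 20% of them for testing.
# random_state fixes the shuffle so the split is reproducible.
train, test = train_test_split(data, test_size=0.2, random_state=42)

print(len(train), len(test))
```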
Model Training
The following lines of code train the model and find the coefficients of the best-fit regression line:
regr = linear_model.LinearRegression()
train_x = np.array(train[["ENGINESIZE"]])
train_y = np.array(train[["CO2EMISSIONS"]])
regr.fit(train_x, train_y)
print("Coefficients : ", regr.coef_)  # Slope
print("Intercept : ", regr.intercept_)  # Intercept
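The shape of the fitted attributes depends on the shape of the target array. The sketch below, which uses made-up values, fits the same line once with a two-dimensional target and once with a one-dimensional target to show the difference.

```python
import numpy as np
from sklearn import linear_model

# Made-up toy values lying exactly on y = 2x + 1.
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y_2d = np.array([[3.0], [5.0], [7.0], [9.0]])

# Two-dimensional target: coef_ has shape (1, 1), intercept_ has shape (1,).
regr_2d = linear_model.LinearRegression().fit(x, y_2d)
print(regr_2d.coef_.shape, regr_2d.intercept_.shape)

# One-dimensional target: coef_ has shape (1,), intercept_ is a scalar.
regr_1d = linear_model.LinearRegression().fit(x, y_2d.ravel())
print(regr_1d.coef_.shape, np.ndim(regr_1d.intercept_))
```

With a two-dimensional target, the slope is read as coef_[0][0] and the intercept as intercept_[0].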
Plot Best Fit Line
The next step is to plot the best-fit line over the training data:
plt.scatter(train["ENGINESIZE"], train["CO2EMISSIONS"], color="blue")
plt.plot(train_x, regr.coef_ * train_x + regr.intercept_, "-r")
plt.xlabel("Engine size")
plt.ylabel("Emission")
plt.show()
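For a single feature, the least-squares line has a closed-form solution, which gives a rough way to check what the model computes. The sketch below uses made-up values and compares the hand-computed slope and intercept with scikit-learn's result.

```python
import numpy as np
from sklearn import linear_model

# Made-up toy values lying exactly on y = 2x + 1 (not from Fuel.csv).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Closed-form least-squares estimates for a single feature:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
x_centered = x - x.mean()
slope = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
intercept = y.mean() - slope * x.mean()

# The same fit through scikit-learn, which expects a 2-D feature array.
regr = linear_model.LinearRegression().fit(x.reshape(-1, 1), y)

print("closed form :", slope, intercept)
print("scikit-learn:", regr.coef_[0], regr.intercept_)
```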
Prediction Function
The next step is to define a prediction function that applies the fitted line to new input values:
def get_regression_predictions(input_features, intercept, slope):
    # Evaluate the line y = slope * x + intercept
    predicted_values = input_features * slope + intercept
    return predicted_values
Predicting CO2 Emissions
The following code predicts CO2 emissions for a given engine size based on the regression line:
my_engine_size = 3.5
estimated_emission = get_regression_predictions(my_engine_size, regr.intercept_[0], regr.coef_[0][0])
print("Estimated Emission :", estimated_emission)
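The hand-written formula gives the same result as the model's own predict method. The sketch below checks this on made-up values, with a model fitted on a one-dimensional target.

```python
import numpy as np
from sklearn import linear_model

# Made-up toy values lying exactly on y = 2x + 1.
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
regr = linear_model.LinearRegression().fit(x, y)

engine_size = 3.5

# Manual prediction: slope * x + intercept.
manual = engine_size * regr.coef_[0] + regr.intercept_

# Built-in prediction: predict expects a 2-D array of shape (n_samples, n_features).
builtin = regr.predict(np.array([[engine_size]]))[0]

print(manual, builtin)
```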
Checking Test Data Accuracy
Users can compare actual values with predicted values to check the accuracy of the model.
from sklearn.metrics import r2_score

test_x = np.array(test[["ENGINESIZE"]])
test_y = np.array(test[["CO2EMISSIONS"]])
test_y_ = regr.predict(test_x)

print("Mean absolute error: %.2f" % np.mean(np.absolute(test_y_ - test_y)))
print("Mean squared error (MSE): %.2f" % np.mean((test_y_ - test_y) ** 2))
# r2_score expects the true values first, then the predictions
print("R2-score: %.2f" % r2_score(test_y, test_y_))
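scikit-learn also ships ready-made functions for these metrics. They take the true values first and the predictions second, and for the R2 score the order changes the result. The sketch below uses made-up values to show this.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 10.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)

# R2 is not symmetric: swapping the arguments gives a different score.
r2 = r2_score(y_true, y_pred)
r2_swapped = r2_score(y_pred, y_true)

print("MAE: %.3f" % mae)
print("MSE: %.3f" % mse)
print("R2: %.3f, with swapped arguments: %.3f" % (r2, r2_swapped))
```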