
🔌 INTELLIGERE | How linear is your linear regression? 1/n


In the vast landscape of machine learning, Linear Regression stands as a fundamental and approachable model that lays the groundwork for understanding predictive analytics. Whether you're new to the world of data science or a non-techie looking to grasp the basics, let's dive into what Linear Regression is and how it's used in machine learning without getting tangled in complex technical jargon.



What is Linear Regression?


At its core, Linear Regression is a statistical method used for modeling the relationship between a dependent variable and one or more independent variables.


IMAGINE wanting to predict something based on known factors, e.g. predicting house prices from square footage, number of bedrooms, and location.


Linear Regression helps in understanding and quantifying the relationships between these variables.



How Does it Work?


The "linear" aspect in Linear Regression refers to the assumption that the relationship between the variables can be represented by a straight line. The model looks for the best-fitting line, which usually means the line that minimizes the sum of the squared differences between the predicted values and the actual data points.



Think of it this way: if you plotted your data points on a graph, Linear Regression seeks to draw a line that best represents the trend or pattern in your data, allowing you to make predictions based on this line.
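
To make "best-fitting line" a little more concrete, here is a minimal sketch in plain Python for the case of a single input variable. It assumes the usual least-squares criterion; the function name fit_line and the toy numbers are made up for illustration and are not part of any library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1 (made up for illustration)
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predict y for a new x = 6
print(slope, intercept, prediction)
```

Because the toy points sit exactly on a line, the sketch recovers a slope of 2 and an intercept of 1. Real data is noisy, so the fitted line will only pass near the points rather than through them.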



Here’s a simple example of how to perform linear regression in Python using the scikit-learn library:

# Import necessary libraries
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import pandas as pd
import numpy as np

# Load your data
# df = pd.read_csv('your_data.csv')

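# For simplicity, let's create a small synthetic DataFrame:
# y follows x plus random noise (seeded so the results are reproducible)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'x': np.arange(1, 11),
    'y': np.arange(1, 11) + rng.normal(0, 15, size=10)
})

# Reshape your data: scikit-learn expects X as a 2D array
# (one row per sample, one column per feature); y can stay 1D
X = df['x'].values.reshape(-1, 1)
y = df['y'].values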

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Create a Linear Regression object
regressor = LinearRegression()  

# Train the model using the training sets
regressor.fit(X_train, y_train)

# Make predictions using the testing set
y_pred = regressor.predict(X_test)

# Check the results
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))  
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))  
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

Steps explained:

🔌STEP 1:

This code first imports the necessary libraries and creates a small synthetic dataset (the commented-out line shows where you would load your own data instead).

🔌STEP 2:

It then reshapes the data and splits it into training and testing sets.

🔌STEP 3:

A Linear Regression object is created and trained using the training sets.

🔌STEP 4:

The model then makes predictions using the testing set, and the results are checked.
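
The three error measures printed at the end of the code can also be worked out by hand, which may help show what they mean. The following is a small sketch in plain Python; the function name regression_errors and the sample numbers are made up for illustration.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired actual and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    # Mean Absolute Error: average size of the misses, ignoring direction
    mae = sum(abs(e) for e in errors) / len(errors)
    # Mean Squared Error: average of squared misses, so big misses count more
    mse = sum(e ** 2 for e in errors) / len(errors)
    # Root Mean Squared Error: square root of MSE, back in the units of y
    rmse = math.sqrt(mse)
    return mae, mse, rmse


# Made-up actual values and predictions, for illustration only
actual = [3.0, 5.0, 8.0]
predicted = [2.0, 5.0, 10.0]

mae, mse, rmse = regression_errors(actual, predicted)
print(mae, mse, rmse)
```

In all three cases, smaller values mean the predictions are closer to the actual data. MAE and RMSE are expressed in the same units as the thing you are predicting, which usually makes them easier to interpret than MSE.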


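To use your own data, uncomment the pd.read_csv line and replace 'your_data.csv' with your actual data file path. Also, make sure that your data reasonably meets the assumptions of linear regression (for example, a roughly straight-line relationship and errors with a roughly constant spread); otherwise the results can be misleading.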
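
One of those assumptions is that the relationship really is close to a straight line. A rough way to check is to look at the residuals, i.e. the actual values minus the values the line predicts. The sketch below uses plain Python and deliberately curved toy data (y = x squared); the function name line_residuals is made up for illustration.

```python
def line_residuals(xs, ys):
    """Fit a least-squares line and return actual minus predicted for each point."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Deliberately non-linear toy data: y = x squared
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

residuals = line_residuals(xs, ys)
print(residuals)
```

Here the residuals are positive at both ends and negative in the middle. A systematic pattern like this suggests that a straight line is not a good description of the data; when the linear assumption holds, the residuals tend to scatter around zero with no obvious shape.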


Remember to install the necessary libraries using pip:

pip install numpy pandas scikit-learn

🔌 To be continued!
