In the previous implementations, we covered finding the coefficients, making predictions, calculating the Mean Squared Error (MSE), and implementing gradient descent for optimization. In this part, let's complete the implementation by adding data preparation and evaluating the model's performance on a test dataset.
```python
def mean(values):
    return sum(values) / float(len(values))

def simple_linear_regression(x, y):
    ...  # (same as the previous implementation)

def predict(x, m, b):
    return [m * xi + b for xi in x]

def mean_squared_error(y_true, y_pred):
    n = len(y_true)
    mse = sum((yi - ypi) ** 2 for yi, ypi in zip(y_true, y_pred)) / n
    return mse

def gradient_descent(x, y, m, b, learning_rate, epochs):
    ...  # (same as the previous implementation)

# Data preparation and model evaluation
def train_test_split(x, y, test_size=0.2):
    split_index = int(len(x) * (1 - test_size))
    x_train, x_test = x[:split_index], x[split_index:]
    y_train, y_test = y[:split_index], y[split_index:]
    return x_train, x_test, y_train, y_test

# Example usage:
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 2, 3]

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Find coefficients using simple linear regression on the training set
m, b = simple_linear_regression(x_train, y_train)

# Make predictions on the test set
predictions = predict(x_test, m, b)

# Calculate Mean Squared Error (MSE) on the test set
mse_test = mean_squared_error(y_test, predictions)
print("Mean Squared Error (MSE) on Test Set:", mse_test)

# Apply Gradient Descent to further optimize the model on the training set
learning_rate = 0.01
epochs = 1000
m_optimized, b_optimized = gradient_descent(x_train, y_train, m, b, learning_rate, epochs)
print("Optimized Slope (m):", m_optimized)
print("Optimized Y-Intercept (b):", b_optimized)
```
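For reference, here is one way the two elided helpers could be filled in, consistent with how they are called in the listing. These bodies are a sketch rather than the exact code from the previous part: closed-form least-squares coefficients for `simple_linear_regression`, and batch gradient descent on the MSE loss for `gradient_descent`.

```python
def mean(values):
    return sum(values) / float(len(values))

def simple_linear_regression(x, y):
    # Ordinary least squares for one feature:
    # m = cov(x, y) / var(x), b = mean(y) - m * mean(x)
    x_mean, y_mean = mean(x), mean(y)
    m = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / \
        sum((xi - x_mean) ** 2 for xi in x)
    b = y_mean - m * x_mean
    return m, b

def gradient_descent(x, y, m, b, learning_rate, epochs):
    # Batch gradient descent on the MSE loss over all points.
    n = len(x)
    for _ in range(epochs):
        y_pred = [m * xi + b for xi in x]
        # Partial derivatives of MSE with respect to m and b
        dm = (-2 / n) * sum(xi * (yi - ypi) for xi, yi, ypi in zip(x, y, y_pred))
        db = (-2 / n) * sum(yi - ypi for yi, ypi in zip(y, y_pred))
        m -= learning_rate * dm
        b -= learning_rate * db
    return m, b
```

On a perfectly linear dataset such as `x = [1, 2, 3]`, `y = [2, 4, 6]`, both routines should recover a slope near 2 and an intercept near 0.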
In this final part, we added the `train_test_split` function to split the data into training and testing sets. We used 80% of the data for training (`x_train` and `y_train`) and the remaining 20% for testing (`x_test` and `y_test`). After training the model on the training set, we evaluated its performance on the test set by calculating the Mean Squared Error (MSE) between the predicted and actual values.
By splitting the data into training and testing sets, we can assess how well the model generalizes to new, unseen data. This evaluation is crucial to ensure that the model is not overfitting to the training data and that its performance carries over to real-world scenarios.
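Note that our `train_test_split` simply takes the first 80% of rows in order, so if the data happens to be sorted, the test set will not be representative. A common refinement is to shuffle before splitting; here is one possible sketch (the function name `train_test_split_shuffled` and the `seed` parameter are illustrative, not part of the code above):

```python
import random

def train_test_split_shuffled(x, y, test_size=0.2, seed=42):
    # Shuffle the index order before splitting, so an ordered dataset
    # does not send all of one region of x into the test set.
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    split_index = int(len(x) * (1 - test_size))
    train_idx, test_idx = indices[:split_index], indices[split_index:]
    x_train = [x[i] for i in train_idx]
    x_test = [x[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return x_train, x_test, y_train, y_test
```

Because the same shuffled index list is used for both `x` and `y`, each feature stays paired with its own label, and a fixed `seed` keeps the split reproducible across runs.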
It's important to note that this implementation is still a simplified version of linear regression. In practice, using libraries like scikit-learn provides more advanced features, optimizations, and robustness for handling various data and modeling tasks. Additionally, handling more complex scenarios, such as multiple linear regression or polynomial regression, may require additional adjustments to the implementation.
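As a point of comparison, here is roughly what the same workflow looks like with scikit-learn (assuming it is installed). scikit-learn expects a 2D feature matrix, and `shuffle=False` mirrors the ordered split used in our from-scratch version:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

x = [[1], [2], [3], [4], [5]]   # 2D feature matrix: one row per sample
y = [2, 3, 4, 2, 3]

# Ordered 80/20 split, matching the from-scratch version above
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(x_train, y_train)
predictions = model.predict(x_test)

print("Slope (m):", model.coef_[0])
print("Y-Intercept (b):", model.intercept_)
print("MSE on Test Set:", mean_squared_error(y_test, predictions))
```

The fitted `coef_` and `intercept_` should match the closed-form coefficients from our own `simple_linear_regression`, since both solve the same least-squares problem.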