ML Titanic Challenge - 3 Data Prep


Lecture 75:- ML Titanic Challenge - 3 Data Prep

After performing data analysis, the next step is to prepare the data for training a machine learning model. Data preparation involves handling missing values, encoding categorical features, and splitting the dataset into features (X) and the target variable (y). Here's how you can prepare the Titanic dataset for machine learning:

 

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Load the dataset
    train_df = pd.read_csv('train.csv')

    # Drop columns that are not used for modeling
    drop_columns = ['PassengerId', 'Name', 'Ticket', 'Cabin']
    train_df = train_df.drop(columns=drop_columns)

    # Handle missing values. Assign the result back instead of calling
    # fillna(..., inplace=True) on a selected column, which may not update
    # train_df in recent pandas versions.
    train_df['Age'] = train_df['Age'].fillna(train_df['Age'].median())
    train_df['Embarked'] = train_df['Embarked'].fillna(train_df['Embarked'].mode()[0])

    # Split features and target variable
    X = train_df.drop(columns=['Survived'])
    y = train_df['Survived']

    # Split the dataset into training and validation sets (80% / 20%)
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Preprocessing: scale numeric features, one-hot encode categorical ones
    numeric_features = ['Age', 'SibSp', 'Parch', 'Fare']
    categorical_features = ['Pclass', 'Sex', 'Embarked']

    numeric_transformer = Pipeline(steps=[('scaler', StandardScaler())])
    categorical_transformer = Pipeline(
        steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))]
    )

    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features),
        ]
    )

    # Combine preprocessing with a machine learning model
    model = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('classifier', RandomForestClassifier(n_estimators=100, random_state=42)),
    ])

    # Train the model
    model.fit(X_train, y_train)

    # Evaluate the model on the validation set
    accuracy = model.score(X_valid, y_valid)
    print(f'Model Accuracy: {accuracy:.2f}')

Here's a breakdown of the data preparation steps in the code:

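  1. Drop Unnecessary Columns: Remove the columns that are not used for modeling here ("PassengerId", "Name", "Ticket", and "Cabin").

  2. Handling Missing Values: Fill missing values in "Age" with the column median and in "Embarked" with the most frequent value (the mode).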

  3. Splitting Data: Split the dataset into features (X) and the target variable (y), and further split into training and validation sets.

  4. Preprocessing: Define a ColumnTransformer that applies scaling to numeric features and one-hot encoding to categorical features.

  5. Model Pipeline: Create a pipeline that combines the preprocessing steps with a machine learning model (Random Forest in this case).

  6. Train the Model: Fit the pipeline to the training data.

  7. Evaluate the Model: Calculate and print the model accuracy on the validation set.
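One thing to note about step 2: the fill values are computed on the full dataset before the split, so the validation rows have a small influence on them. A possible alternative is to move imputation into the ColumnTransformer, so the median and mode are learned from the training rows only. The sketch below is one way to do that with scikit-learn's SimpleImputer. The toy_train and toy_valid frames are made-up rows with Titanic-style columns, used only so the example runs without train.csv.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows with Titanic-style columns; the values are invented.
toy_train = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 54.0],
    "Fare": [7.25, 71.28, 8.05, 51.86],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", np.nan, "S"],
})
toy_valid = pd.DataFrame({
    "Age": [np.nan],
    "Fare": [13.0],
    "Sex": ["female"],
    "Embarked": ["Q"],  # a category never seen in toy_train
})

# Numeric columns: fill gaps with the training median, then scale.
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent training value,
# then one-hot encode; unseen categories become all-zero columns.
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, ["Age", "Fare"]),
    ("cat", categorical_transformer, ["Sex", "Embarked"]),
])

# Fit on the training rows only, then reuse the learned statistics.
train_matrix = preprocessor.fit_transform(toy_train)
valid_matrix = preprocessor.transform(toy_valid)
print(train_matrix.shape, valid_matrix.shape)
```

With this layout the two fillna calls are no longer needed, and the same preprocessor can be placed in the model pipeline in place of the original one.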

This code provides a basic example of data preparation and model training for the Titanic challenge. You can further refine the preprocessing steps, experiment with different algorithms, tune hyperparameters, and explore more advanced techniques to improve your model's performance.
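As an illustration of the tuning step mentioned above, here is a minimal sketch that searches a small grid of Random Forest settings with GridSearchCV. It assumes the same pipeline step names ("preprocessor" and "classifier") as the code earlier. The X_demo and y_demo data are randomly generated stand-ins rather than the Titanic data, and the grid values are arbitrary choices for demonstration, not recommended settings.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in data with Titanic-style columns (randomly generated).
rng = np.random.default_rng(42)
n_rows = 120
X_demo = pd.DataFrame({
    "Age": rng.uniform(1, 80, n_rows),
    "Fare": rng.uniform(5, 100, n_rows),
    "Sex": rng.choice(["male", "female"], n_rows),
})
# Made-up target loosely tied to Sex, with about 20% of labels flipped.
y_demo = ((X_demo["Sex"] == "female") ^ (rng.random(n_rows) < 0.2)).astype(int)

preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["Age", "Fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Sex"]),
])
model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", RandomForestClassifier(random_state=42)),
])

# Pipeline parameters are addressed as "<step name>__<parameter name>".
param_grid = {
    "classifier__n_estimators": [50, 100],
    "classifier__max_depth": [3, None],
}

# 3-fold cross-validation over every combination in the grid.
search = GridSearchCV(model, param_grid, cv=3, scoring="accuracy")
search.fit(X_demo, y_demo)

print(search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.2f}")
```

Because the whole pipeline is passed to the search, the scaler and encoder are refit on each training fold, which keeps the cross-validated score free of information from the held-out fold.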
