
Decision Tree vs. Random Forest Regression: A Complete Guide with Python Examples

When working with regression problems in machine learning, choosing the right algorithm is critical for accuracy and performance. Two of the most popular approaches are Decision Tree Regression and Random Forest Regression. This tech concept explains how these models work, how they differ, and when to use each, with practical Python examples to help you implement them effectively. With ~20 years of experience in tech leadership roles, I’ve helped businesses leverage such innovations to drive scalability and success.

What is Decision Tree Regression?

Understanding Decision Tree Regression

Decision Tree Regression is a supervised learning algorithm that predicts continuous values by recursively splitting the dataset into regions. Each split chooses the feature and threshold that most reduce the variance of the target variable within the resulting regions.

How Decision Tree Regression Works

  1. The dataset is split into different branches using if-else conditions based on feature values.
  2. Each branch leads to a leaf node that represents a predicted value.
  3. The final output is the average of values in the leaf node.

Example: Predicting House Prices Using Decision Tree Regression

Let’s consider a dataset where house prices depend on:

  • Size of the house (sq ft)
  • Number of bedrooms

We’ll use Decision Tree Regression to predict house prices based on these features.

Python Code for Decision Tree Regression

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Sample dataset: House Size (sq ft), Number of Bedrooms, Price ($)
data = np.array([
    [1000, 2, 200000],
    [1500, 3, 250000],
    [1800, 3, 280000],
    [2000, 4, 320000],
    [2300, 4, 350000],
    [2500, 4, 400000],
    [2700, 5, 450000],
    [3000, 5, 500000],
    [3500, 6, 600000],
    [4000, 6, 700000]
])

# Split features (X) and target variable (y)
X = data[:, :2]  # First two columns (Size, Bedrooms)
y = data[:, 2]   # Last column (Price)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Regressor
dt_regressor = DecisionTreeRegressor(max_depth=3)
dt_regressor.fit(X_train, y_train)

# Predict on test data
y_pred = dt_regressor.predict(X_test)

# Evaluate performance
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Visualization
plt.scatter(X[:, 0], y, color="blue", label="Actual Prices")
plt.scatter(X_test[:, 0], y_pred, color="red", label="Predicted Prices")
plt.xlabel("House Size (sq ft)")
plt.ylabel("Price ($)")
plt.legend()
plt.title("Decision Tree Regression - House Price Prediction")
plt.show()
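
Because a decision tree is just a set of learned if-else rules, you can print the splits directly and see exactly how predictions are made. Here is a minimal continuation of the script above using scikit-learn's export_text; the feature names are the ones assumed in our toy dataset:

from sklearn.tree import export_text

# Print the learned if-else splits of the trained tree
rules = export_text(dt_regressor, feature_names=["Size (sq ft)", "Bedrooms"])
print(rules)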

Advantages of Decision Tree Regression

  • Easy to interpret and visualize
  • Works well with small datasets
  • Can model non-linear relationships

Disadvantages

  • Prone to overfitting when trees grow deep (see the sketch below)
  • Sensitive to small changes in the data
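
To make the overfitting disadvantage concrete, here is a small, self-contained sketch on synthetic noisy data (not the house dataset above) comparing an unrestricted tree with a depth-limited one. Exact numbers will vary, but the unrestricted tree typically fits the training set almost perfectly while doing worse on the test set:

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic noisy data: y = sin(x) + noise
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for depth in [None, 3]:  # None = grow the tree fully
    tree = DecisionTreeRegressor(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, tree.predict(X_train))
    test_mse = mean_squared_error(y_test, tree.predict(X_test))
    print(f"max_depth={depth}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")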

What is Random Forest Regression?

Understanding Random Forest Regression

Random Forest Regression is an ensemble learning technique that improves accuracy by combining multiple decision trees. Instead of relying on one tree, it trains many trees on random bootstrap samples of the data, considering a random subset of features at each split, and averages their outputs.

How Random Forest Regression Works

  1. Multiple bootstrap samples are drawn from the dataset (rows sampled with replacement).
  2. A Decision Tree is trained on each bootstrap sample.
  3. The final prediction is the average of all tree predictions (see the hand-rolled sketch below).
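
The core idea, bagging (bootstrap aggregating), is easy to sketch by hand. The self-contained toy example below, with made-up one-feature data, trains several decision trees on bootstrap samples and averages their predictions; scikit-learn's RandomForestRegressor does this for you, plus random feature selection at each split:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, (50, 1))           # toy feature
y = 3 * X.ravel() + rng.normal(0, 2, 50)  # toy target with noise

n_trees = 10
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw rows with replacement
    idx = rng.randint(0, len(X), len(X))
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Final prediction = average of all tree predictions
X_new = np.array([[4.0], [7.5]])
preds = np.mean([t.predict(X_new) for t in trees], axis=0)
print(preds)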

Example: Predicting Car Prices Using Random Forest Regression

We will predict car prices based on:

  • Year of manufacture
  • Mileage (in km)
  • Engine capacity (in liters)

Python Code for Random Forest Regression

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
import matplotlib.pyplot as plt

# Sample dataset: Year, Mileage (km), Engine Capacity (L), Price ($)
data = np.array([
    [2015, 60000, 1.5, 12000],
    [2016, 50000, 1.6, 14000],
    [2017, 40000, 1.8, 16000],
    [2018, 30000, 2.0, 18000],
    [2019, 20000, 2.2, 22000],
    [2020, 15000, 2.5, 25000],
    [2021, 10000, 3.0, 30000],
    [2022, 5000, 3.5, 35000],
    [2023, 2000, 4.0, 40000],
    [2024, 1000, 4.5, 45000]
])

# Split features (X) and target variable (y)
X = data[:, :3]  # First three columns (Year, Mileage, Engine Capacity)
y = data[:, 3]   # Last column (Price)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Random Forest Regressor
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(X_train, y_train)

# Predict on test data
y_pred = rf_regressor.predict(X_test)

# Evaluate performance
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Visualization
plt.scatter(X[:, 0], y, color="blue", label="Actual Prices")
plt.scatter(X_test[:, 0], y_pred, color="red", label="Predicted Prices")
plt.xlabel("Year of Manufacture")
plt.ylabel("Price ($)")
plt.legend()
plt.title("Random Forest Regression - Car Price Prediction")
plt.show()
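
One way to recover some interpretability from a forest is to inspect feature importances. Continuing from the script above (the feature names are the ones assumed in our toy dataset):

# Average impurity-based importance of each feature across all trees
for name, importance in zip(["Year", "Mileage", "Engine Capacity"],
                            rf_regressor.feature_importances_):
    print(f"{name}: {importance:.2f}")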

Advantages of Random Forest Regression

  • Typically more accurate than a single Decision Tree
  • Less prone to overfitting, since averaging smooths out individual trees’ errors
  • Works well with large datasets

Disadvantages

  • Slower to train and predict than a single Decision Tree
  • Harder to interpret, since predictions come from many trees (feature importances help here)

Decision Tree vs. Random Forest Regression: Key Differences

Feature          | Decision Tree Regression   | Random Forest Regression
Algorithm        | Single decision tree       | Multiple decision trees (ensemble)
Overfitting      | High (if the tree is deep) | Low (averaging reduces overfitting)
Accuracy         | Moderate                   | Higher
Interpretability | Easy to interpret          | Harder (many trees)
Performance      | Faster but less accurate   | Slower but more accurate
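
To check the accuracy claims above on your own data, a quick head-to-head with cross-validation is a good habit. Here is a minimal sketch on synthetic data from make_regression; swap in your own X and y:

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

# Synthetic regression problem (replace with your own dataset)
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

for name, model in [("Decision Tree", DecisionTreeRegressor(random_state=42)),
                    ("Random Forest", RandomForestRegressor(n_estimators=100, random_state=42))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"{name}: mean MSE = {-scores.mean():.1f}")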

My Tech Advice: Use Decision Tree Regression when interpretability and speed are important. Use Random Forest Regression when accuracy and robustness are priorities. Both models are powerful tools for regression tasks. Try them on your dataset and choose the best fit for your problem!

#AskDushyant
Note: The examples and pseudo code are for illustration only. You must modify and experiment with the concepts to meet your specific needs.
#TechConcept #TechAdvice #AI #ML #Python #ModelTuning
