Understanding Data Plotting and Dimensionality in Regression

Raghda Al taei
3 min read · Oct 11, 2024

Introduction

In the world of data analysis and machine learning, understanding how to visualize and manipulate data is essential. In this article, we will work with a dataset containing 25 one-dimensional points. The dataset comprises two columns: the first column represents the x-values, and the second column corresponds to the y-values. We will explore the process of plotting this data and discuss the significance of adding a new dimension to the x-values.

1. Plotting Data with y as a Function of x

The first step in our analysis is to visualize the relationship between the x and y values. A simple scatter plot is an effective way to represent this data. Below is the MATLAB code to achieve this:

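% Load the dataset
s = load('Data.mat'); % Adjust the filename as needed; load returns a struct for .mat files
vars = fieldnames(s); % Names of the variables stored in the file
data = s.(vars{1}); % Assumes the file stores a single 25-by-2 matrix
x = data(:, 1); % First column is x
y = data(:, 2); % Second column is y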
% Plot the data
figure;
plot(x, y, 'b*'); % Blue stars for the data points
xlabel('X');
ylabel('Y');
title('Scatter Plot of Y vs X');
grid on;
% Save the plot
saveas(gcf, 'scatter_plot.png');

This code loads the data, extracts the x and y values, and plots them. The resulting scatter plot provides a clear visual representation of the data points.

2. Adding a New Dimension to the X Data

Next, we will augment the x-values with a new dimension: a column in which every value equals 1. This step can be implemented in MATLAB as follows:

% Add a new dimension to x
x_new = [ones(size(x)), x]; % New x matrix with a column of ones
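
To make the shape of the result concrete, here is a small sketch on a made-up three-element column vector. The values are hypothetical and are not taken from the dataset; the point is only that each x-value gains a leading 1.

```matlab
% Hypothetical x-values (not from Data.mat), stored as a column vector
x = [2; 4; 6];

% Prepend a column of ones, exactly as in the step above
x_new = [ones(size(x)), x];

% x_new now has one row per data point and two columns: [1, x_i]
disp(size(x_new));
disp(x_new);
```

Note that this concatenation assumes x is a column vector, which is the case when x is taken as the first column of the data matrix.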

Explanation of the Effect

Adding a column of ones to the x matrix introduces a bias term, which is crucial in regression models. Here’s why this transformation is important:

  • Bias Term: The column of ones allows the model to learn an intercept (or bias) in addition to the slope when fitting a line to the data. Without this term, the model will always assume that the line passes through the origin, which may not represent the data accurately.
  • Effect on Plot: The transformation does not change the scatter plot itself, because the plotted x and y values stay the same. What changes is the model fitted to the data: it can now include a nonzero intercept, so the fitted line is free to shift up or down.
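
The role of the ones column can also be written out as an equation. This is a sketch in notation assumed here (the weights w_0 and w_1 are not named in the article): multiplying the augmented matrix by a weight vector gives a line with intercept w_0 and slope w_1.

```latex
\hat{y}_i = w_0 \cdot 1 + w_1 x_i
\quad\Longleftrightarrow\quad
\hat{\mathbf{y}} =
\begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_{25} \end{bmatrix}
\begin{bmatrix} w_0 \\ w_1 \end{bmatrix}
```

Without the column of ones, the prediction reduces to \hat{y}_i = w_1 x_i, which is always 0 at x = 0.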

In summary, adding a column of ones lets the model fit a constant bias (intercept) alongside the slope on the input values. This adjustment is a fundamental part of linear regression, and it improves the fit whenever the underlying relationship does not pass through the origin.
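
As an illustration of this point, the following sketch compares a least-squares fit with and without the column of ones. It is not part of the original walkthrough: the data is a hypothetical stand-in (25 points near the line y = 3 + 2x with a small deterministic wiggle), and the fit uses MATLAB's backslash operator, which returns the least-squares solution for an overdetermined system.

```matlab
% Hypothetical stand-in data (NOT the article's Data.mat):
% 25 points close to y = 3 + 2x, with a small deterministic wiggle
x = linspace(0, 5, 25)';
y = 3 + 2*x + 0.1*sin((1:25)');

% Fit WITHOUT the column of ones: the line is forced through the origin
w_no_bias = x \ y;                        % single weight (slope only)

% Fit WITH the column of ones: intercept and slope are both estimated
x_new = [ones(size(x)), x];
w = x_new \ y;                            % w(1) = intercept, w(2) = slope

% Compare the sum of squared errors of the two fits
sse_no_bias = sum((y - x*w_no_bias).^2);
sse_bias    = sum((y - x_new*w).^2);

fprintf('Without bias: slope = %.3f, SSE = %.3f\n', w_no_bias, sse_no_bias);
fprintf('With bias: intercept = %.3f, slope = %.3f, SSE = %.3f\n', w(1), w(2), sse_bias);
```

On data like this, where the true intercept is far from zero, the fit with the ones column should recover an intercept near 3 and a slope near 2, while the fit without it has to distort the slope to compensate.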

For the full code, visit this page (https://github.com/raghdafaris/AUT-Machine-Learning/tree/main/1-%20SimplePlot-Overfitting).
