# 🧩 Unsupervised Learning
## 📘 What Is Unsupervised Learning?
Unsupervised Learning is a type of Machine Learning where the model learns from unlabeled data, meaning the input data has no predefined output.
The goal is to find hidden patterns, relationships, or structures within the data.
Key idea: The algorithm explores the data and organizes it based on similarities or underlying patterns.
Examples:
- Grouping customers based on shopping behavior 🛒
- Detecting unusual transactions (anomalies) 💳
- Reducing high-dimensional data for visualization 📊
## 🔹 K-Means Clustering
### Concept
K-Means is one of the most popular clustering algorithms.
It divides data into K clusters, where each data point belongs to the cluster with the nearest mean (centroid).
### How It Works
1. Choose the number of clusters (K).
2. Randomly initialize K centroids.
3. Assign each data point to the nearest centroid.
4. Recalculate centroids based on the current assignments.
5. Repeat steps 3–4 until the centroids no longer change significantly.
Goal: Minimize the total squared distance (SSE) between data points and their assigned centroids.
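Before turning to scikit-learn, here is a minimal from-scratch sketch of the loop described above. The function name and the initialization scheme are illustrative assumptions, and it assumes no cluster ever ends up empty:

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: randomly pick k distinct points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        # (assumes no cluster goes empty; a real implementation must handle that)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer move significantly
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```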
### Example in Python
```python
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

# Example data: two well-separated groups
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Create and fit the model (n_init=10 runs 10 random initializations
# and keeps the best; it also avoids version-dependent warnings)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)

# Get cluster centers and labels
print("Cluster centers:", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)

# Visualize clusters
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            color='red', marker='X', s=200)
plt.title("K-Means Clustering Example")
plt.show()
```
Choosing K:
Use the Elbow Method โ plot the sum of squared errors (SSE) for different K values and look for the "elbow" point where improvement slows down.
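As a concrete illustration, scikit-learn stores the fitted model's SSE in the `inertia_` attribute, so an elbow plot for the toy data above can be sketched like this:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Fit K-Means for several K values and record the SSE (inertia_)
ks = range(1, 6)
sse = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_ for k in ks]

plt.plot(ks, sse, marker="o")
plt.xlabel("Number of clusters (K)")
plt.ylabel("SSE (inertia)")
plt.title("Elbow Method")
plt.show()
```

For this toy data the curve drops sharply up to K=2 and flattens afterwards, so the elbow suggests K=2.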
Use Cases:
- Customer segmentation
- Market analysis
- Image compression (see the sketch below)
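As a sketch of the image-compression use case: each pixel's color is treated as a 3-D point, and every pixel is replaced by its cluster's centroid color, so the image can be stored with only K distinct colors. The random image here is a stand-in; in practice you would load a real one, e.g. with `matplotlib.image.imread`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in "image": 64x64 pixels with random RGB values in [0, 1]
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))

# Cluster the pixel colors into K = 8 representative colors
pixels = image.reshape(-1, 3)
kmeans = KMeans(n_clusters=8, n_init=10, random_state=42).fit(pixels)

# Rebuild the image using only the 8 centroid colors
compressed = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)
print("Distinct colors after compression:",
      len(np.unique(compressed.reshape(-1, 3), axis=0)))
```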
## 🔹 Dimensionality Reduction (PCA)
### Concept
Dimensionality Reduction simplifies large datasets by reducing the number of variables (features) while keeping important information.
This helps speed up algorithms, remove noise, and make visualization easier.
The most common technique is Principal Component Analysis (PCA).
### What PCA Does
PCA transforms data into a new coordinate system where:
- The first component explains the most variance.
- The second component explains the next most variance.
- And so on…
Essentially, PCA finds directions (principal components) that best represent the data.
### Steps in PCA
1. Standardize the data.
2. Compute the covariance matrix.
3. Calculate its eigenvalues and eigenvectors.
4. Select the top components explaining the most variance.
5. Transform the data into the new feature space.
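These steps translate almost line-for-line into NumPy. A minimal sketch (the function name is illustrative, and there is no input validation):

```python
import numpy as np

def pca_sketch(X, n_components):
    # 1. Standardize the data (zero mean, unit variance per feature)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Compute the covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvalues and eigenvectors (eigh suits symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Select the top components (eigh returns ascending order, so reverse)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # 5. Transform the data into the new feature space
    return X_std @ components
```

Note that the sign of each principal component is arbitrary, so the output can differ from scikit-learn's by a per-component sign flip.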
### Example in Python
```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np

# Example data
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0],
              [2.3, 2.7],
              [2.0, 1.6],
              [1.0, 1.1],
              [1.5, 1.6],
              [1.1, 0.9]])

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA (n_components=2 keeps both components here so the full variance
# split is visible; pick fewer than the original feature count to actually reduce)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("Explained Variance Ratio:", pca.explained_variance_ratio_)
print("Transformed Data:\n", X_pca)
```
### Visualizing PCA Results
```python
import matplotlib.pyplot as plt

plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.title("PCA - Dimensionality Reduction")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.show()
```
Use Cases:
- Visualizing high-dimensional data
- Reducing noise in datasets
- Preprocessing for clustering or classification (see the sketch below)
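As a sketch of that last use case, a scikit-learn pipeline can chain standardization, PCA, and K-Means. Note that `PCA` also accepts a float `n_components`, meaning "keep enough components to explain that fraction of the variance"; the synthetic data and parameter choices here are illustrative:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic stand-in for high-dimensional data: 100 samples, 20 features
rng = np.random.default_rng(0)
X_high = rng.normal(size=(100, 20))

# Standardize -> keep components explaining 90% of the variance -> cluster
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90),
    KMeans(n_clusters=3, n_init=10, random_state=42),
)
labels = pipeline.fit_predict(X_high)
print("Cluster sizes:", np.bincount(labels))
```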
## 🧠 Summary
| Concept | Description | Use Case |
|---|---|---|
| Unsupervised Learning | Finds hidden patterns without labels | Customer segmentation, anomaly detection |
| K-Means Clustering | Groups similar data points into K clusters | Market segmentation, pattern discovery |
| PCA (Dimensionality Reduction) | Reduces the number of features while preserving most of the variance | Visualization, noise reduction |
Unsupervised learning helps uncover the hidden structure in data, making it easier to explore and interpret.
It's especially useful when you don't have labeled datasets but still want to understand relationships within your data.