
🧩 Unsupervised Learning

📘 What Is Unsupervised Learning?

Unsupervised Learning is a type of Machine Learning where the model learns from unlabeled data, meaning the input data has no predefined output.
The goal is to find hidden patterns, relationships, or structures within the data.

Key idea: The algorithm explores the data and organizes it based on similarities or underlying patterns.

Examples include clustering (e.g., K-Means) and dimensionality reduction (e.g., PCA), both covered below.

🔹 K-Means Clustering

Concept

K-Means is one of the most popular clustering algorithms.
It divides data into K clusters, where each data point belongs to the cluster with the nearest mean (centroid).

How It Works

  1. Choose the number of clusters (K).
  2. Randomly initialize K centroids.
  3. Assign each data point to the nearest centroid.
  4. Recalculate centroids based on current assignments.
  5. Repeat steps 3–4 until the centroids don't change significantly.

Goal: Minimize the distance between data points and their assigned centroids.
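
To make steps 1–5 concrete, here is a minimal NumPy sketch of the assign-and-update loop. The function name kmeans_sketch and the convergence check via np.allclose are illustrative choices, not part of any library:

import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=42):
    rng = np.random.default_rng(seed)
    # Step 2: pick K distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Step 3: assign each point to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
centroids, labels = kmeans_sketch(X, k=2)
print("Centroids:", centroids, "Labels:", labels)

In practice you would use scikit-learn's KMeans, as in the example below, which also handles smarter initialization and multiple restarts.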

Example in Python

from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

# Example data
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Create and fit the model
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)  # n_init set explicitly; its default changed across scikit-learn versions
kmeans.fit(X)

# Get cluster centers and labels
print("Cluster centers:", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)

# Visualize clusters
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], color='red', marker='X', s=200)
plt.title("K-Means Clustering Example")
plt.show()

Choosing K:
Use the Elbow Method: plot the sum of squared errors (SSE) for different values of K and look for the "elbow" point where the improvement slows down, as sketched below.
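
A minimal sketch of the Elbow Method using scikit-learn's inertia_ attribute (the SSE of each fitted model), reusing the sample data from above:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Fit K-Means for a range of K values and record the SSE
sse = []
k_values = range(1, 6)
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    sse.append(model.inertia_)  # sum of squared distances to the closest centroid

plt.plot(k_values, sse, marker='o')
plt.xlabel("Number of clusters (K)")
plt.ylabel("SSE (inertia)")
plt.title("Elbow Method")
plt.show()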

Use Cases: market segmentation, customer grouping, and general pattern discovery in unlabeled data.

🔹 Dimensionality Reduction (PCA)

Concept

Dimensionality Reduction simplifies large datasets by reducing the number of variables (features) while keeping important information.
This helps speed up algorithms, remove noise, and make visualization easier.

The most common technique is Principal Component Analysis (PCA).

What PCA Does

PCA transforms data into a new coordinate system where:

  1. The first principal component points in the direction of maximum variance in the data.
  2. Each subsequent component captures the most remaining variance while staying orthogonal to the previous components.

Essentially, PCA finds directions (principal components) that best represent the data.

Steps in PCA

  1. Standardize the data.
  2. Compute the covariance matrix.
  3. Calculate eigenvalues and eigenvectors.
  4. Select top components explaining most variance.
  5. Transform data into new feature space.
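
For intuition, the same five steps can be written directly in NumPy. This is an illustrative sketch, not how scikit-learn implements PCA internally (it uses the SVD); the data here is the first five rows of the sample used below:

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# 1. Standardize the data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Compute the covariance matrix
cov = np.cov(X_std, rowvar=False)

# 3. Calculate eigenvalues and eigenvectors (eigh: covariance matrices are symmetric)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Select the top components, largest eigenvalues first
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]

# 5. Transform the data into the new feature space
X_projected = X_std @ components
print("Explained variance ratio:", eigvals[order] / eigvals.sum())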

Example in Python

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np

# Example data
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0],
              [2.3, 2.7],
              [2, 1.6],
              [1, 1.1],
              [1.5, 1.6],
              [1.1, 0.9]])

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA (n_components=2 keeps both components of this 2-feature data,
# so the data is rotated rather than reduced; use n_components=1 to actually reduce it)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("Explained Variance Ratio:", pca.explained_variance_ratio_)
print("Transformed Data:\n", X_pca)

Visualizing PCA Results

import matplotlib.pyplot as plt

plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.title("PCA - Dimensionality Reduction")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.show()
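
One practical way to apply step 4: scikit-learn's PCA also accepts a float for n_components, in which case it keeps just enough components to explain that fraction of the variance. A short sketch, reusing X_scaled from the example above:

from sklearn.decomposition import PCA

# Keep as many components as needed to explain 95% of the variance
pca95 = PCA(n_components=0.95)
X_reduced = pca95.fit_transform(X_scaled)  # X_scaled from the earlier snippet
print("Components kept:", pca95.n_components_)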

Use Cases: visualizing high-dimensional data, reducing noise, and speeding up downstream algorithms.

🧠 Summary

| Concept | Description | Use Case |
| --- | --- | --- |
| Unsupervised Learning | Finds hidden patterns without labels | Customer segmentation, anomaly detection |
| K-Means Clustering | Groups similar data points into K clusters | Market segmentation, pattern discovery |
| PCA (Dimensionality Reduction) | Reduces features while preserving data variance | Visualization, noise reduction |

Unsupervised learning helps uncover the hidden structure in data, making it easier to explore and interpret.
It's especially useful when you don't have labeled datasets but still want to understand relationships within your data.