Discovering Dominant Colours in Images through K-mean Algorithm


In the realm of deep learning, it is crucial to have a comprehensive understanding of the dataset, encompassing its size, statistical characteristics, and technical intricacies. To address color variability within the dataset, several techniques can be employed. One such approach involves utilizing histogram matching, whereby one image is mapped to another within the dataset. This technique proves effective in harmonizing color distributions across images.

Another powerful method to tackle color variability is the K-means algorithm. This algorithm involves iteratively exploring different values of K, which denotes the number of clusters or classes, to identify the optimal fit for the dataset’s images. By replacing less frequent colors with dominant colors, the K-means algorithm effectively mitigates color discrepancies, resulting in a more uniform representation of colors across the dataset.


K-mean Algorithm Code :

import numpy as np
import cv2
from sklearn.cluster import KMeans
from google.colab.patches import cv2_imshow

# Load the image
image = cv2.imread("image.jpg")

# Reshape the image to a 2D array of pixels
pixels = image.reshape(-1, 3)

# Convert the pixel values to floating-point
pixels = pixels.astype(float)

# Perform K-means clustering
kmeans = KMeans(n_clusters=5) # Specify the number of clusters
kmeans.fit(pixels)

# Get the cluster labels and cluster centers
labels = kmeans.labels_
centers = kmeans.cluster_centers_
print(centers)

# Replace each pixel value with its corresponding cluster center
segmented_image = centers[labels].reshape(image.shape)

# Convert the segmented image back to the original data type (uint8)
segmented_image = segmented_image.astype(np.uint8)

# Display the original and segmented images
cv2_imshow(image)
cv2_imshow(segmented_image)



Result of the k-mean algorithm :


1) Original Image : 




2) Resultant Image : 



In the pursuit of robust deep learning models, these color standardization techniques play a pivotal role in ensuring consistent and reliable training data, ultimately enhancing the model’s ability to generalize and perform well on diverse inputs.

Comments