Supervised Vs. Unsupervised Learning
Unsupervised machine learning here means cluster analysis, which is separated into two types:
1. Throw a bunch of data at the machine and wait for it to come up with the groups for you
2. Throw a bunch of data at the machine and specify the number of categories, X, that you want back
ex. throw a bunch of pictures of males and females at the machine, and have it separate them into two groups. This is an example of flat clustering.
Flat vs Hierarchical.
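Flat means we tell the machine how many categories to find. Hierarchical means throwing the data at the machine and having it figure out how many categories there are.
Separating male and female faces would use flat clustering, since we already know there are two categories.
Hierarchical clustering would be used in genomics, because we don’t really know in advance how the genes work with each other, so we let the machine find the groups and gather insights from them.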
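To make the hierarchical idea concrete, here is a minimal sketch using scikit-learn's AgglomerativeClustering on six small sample points. The choice of average linkage and a distance threshold of 6.0 are assumptions picked for this toy data, not something the notes prescribe. The point is that we never say how many clusters we want; the machine reports how many it found.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six sample points as [x, y] rows
X = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])

# n_clusters=None plus a distance_threshold means: keep merging groups
# until the next merge would join groups at least this far apart.
# The threshold (6.0) and linkage ("average") are assumptions for this toy data.
model = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=6.0,
    linkage="average",
)
hier_labels = model.fit_predict(X)

print("clusters found:", model.n_clusters_)
print("labels:", hier_labels)
```

Raising the threshold makes the machine merge more aggressively (fewer groups); lowering it leaves more, smaller groups.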
How do we weigh each of these features by importance? Often we can simplify the data down to the 2-3 features that matter most.
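One common way to get down to 2-3 features is principal component analysis (PCA). The notes do not name a technique, so treat PCA here as an assumption. Note also that PCA builds new combined features rather than picking from the original ones. The data below is randomly generated purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up dataset: 100 samples with 6 features (illustration only)
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 6))
# Make one feature nearly redundant so there is something to compress
data[:, 3] = 2 * data[:, 0] + rng.normal(scale=0.1, size=100)

# Keep the 2 directions that capture the most variance
pca = PCA(n_components=2)
reduced = pca.fit_transform(data)

print(reduced.shape)                   # (100, 2)
print(pca.explained_variance_ratio_)   # share of variance kept by each component
```

The explained variance ratio gives a rough answer to the "how important is each feature" question: the closer the kept components sum to 1, the less was lost by simplifying.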
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
from sklearn.cluster import KMeans

style.use("ggplot")

# Raw points, plotted before any clustering
x = [1, 5, 1.5, 8, 1, 9]
y = [2, 8, 1.8, 8, 0.6, 11]
plt.scatter(x, y)
plt.show()

# The same points as an array of [x, y] rows
X = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])

# Flat clustering: we choose the number of clusters up front
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)

centroids = kmeans.cluster_centers_
labels = kmeans.labels_
print(centroids)
print(labels)

# One color per cluster label
colors = ["g.", "r.", "c.", "y."]
for i in range(len(X)):
    print("coordinate:", X[i], "label:", labels[i])
    plt.plot(X[i][0], X[i][1], colors[labels[i]], markersize=10)

# Mark each centroid with an x
plt.scatter(centroids[:, 0], centroids[:, 1], marker="x", s=150, linewidths=5, zorder=10)
plt.show()
Sometimes we use unsupervised machine learning to visualize a dataset with a lot of features. It breaks the data down into something simple, which helps us get a feeling for whether we are on the right track.
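*K-means groups the points around centroids so that the variance within each cluster is as small as possible. The centroid is the mean of the points in its cluster, i.e. the "ideal" point for that group.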
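The note about centroids and variance can be checked directly. This is a minimal sketch: n_clusters=2, n_init=10 and random_state=0 are assumptions chosen to make the run repeatable. It confirms that each fitted centroid is the mean of its cluster's points, and that the quantity k-means minimizes (inertia_ in scikit-learn) is the sum of squared distances from each point to its centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six sample points as [x, y] rows
X = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])

# n_clusters, n_init and random_state are assumptions for a repeatable run
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Each centroid should equal the mean of the points assigned to it
for k in range(len(centroids)):
    cluster_mean = X[labels == k].mean(axis=0)
    print("cluster", k, "centroid:", centroids[k], "mean of points:", cluster_mean)

# Within-cluster sum of squared distances, computed by hand
manual_inertia = sum(
    np.sum((X[labels == k] - centroids[k]) ** 2) for k in range(len(centroids))
)
print("manual:", manual_inertia, "sklearn:", kmeans.inertia_)
```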