Title of the script: Unsupervised Learning
Author: Tanmay Srinath
Keywords: R, RStudio, machine learning, load libraries, ggplot2, mclust, unsupervised learning, classification, k-means clustering, iris dataset, adjusted rand index, video tutorial.
Visual Cue | Narration |
Show Slide
Opening Slide |
Welcome to this spoken tutorial on Unsupervised Learning. |
Show Slide
Learning Objectives |
In this tutorial, we will learn about:
|
Show Slide
System Specifications |
This tutorial is recorded using,
It is recommended to install R version 4.1.0 or higher. |
Show Slide
Prerequisites |
To follow this tutorial, the learner should know:
If not, please access the relevant tutorials on R on this website. |
Let us now learn about Unsupervised learning. | |
Show Slide
Unsupervised Learning |
|
Show Slide
Types of Unsupervised Learning |
Types of Unsupervised learning.
|
Show Slide
k-means Clustering |
Now let us implement k-means clustering on the iris data set.
|
Show Slide
Download Files |
For this tutorial, we will use a script file Clustering.R.
|
[Computer screen]
Highlight Clustering.R and the folder UnsupervisedLearning |
I have downloaded and moved this file to the UnsupervisedLearning folder.
|
Let us switch to RStudio. | |
Double-click Clustering.R
Point to Clustering.R in RStudio. |
Open the script Clustering.R in RStudio.
|
[RStudio]
data("iris") View(iris) Click the Run button. Point to the iris data set.
|
Select the given commands.
|
[RStudio]
Highlight iris in the Source window. Highlight the Species column. Scroll the table to show the species. Highlight the 3 species. |
Here we are using the labeled iris data set.
It contains 3 species - Setosa, Versicolor and Virginica.
|
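The structure of the data set and the number of samples per species can also be checked in the Console. The following is a minimal sketch using only base R functions; the variable name species_counts is chosen here for illustration.

```r
# Load the built-in iris data set and inspect it
data("iris")
str(iris)                               # columns and their types
levels(iris$Species)                    # names of the species labels
species_counts <- table(iris$Species)   # number of samples per species
print(species_counts)
```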
Show Slide
Posing the Problem |
Can we group data based on sepal and petal dimensions?
If so, do these groups represent the original species label accurately? |
Show Slide
Solution |
The answer to this problem is to use a clustering algorithm. |
Show Slide
Finding Number of Clusters |
To know more about it, please refer to the Additional Reading Material. |
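One common way to choose the number of clusters is the elbow method. This is an assumption for illustration; it may not be the method described in the Additional Reading Material. The sketch below runs kmeans() on the petal columns for several values of k and plots the total within-cluster sum of squares. The names petals, k_values and wss are hypothetical.

```r
# Elbow method sketch (an assumed technique, not taken from the tutorial)
data("iris")
set.seed(123)                  # fixed seed so the random starts are repeatable
petals <- iris[, 3:4]          # Petal.Length and Petal.Width
k_values <- 1:8

# Total within-cluster sum of squares for each candidate k
wss <- sapply(k_values, function(k) {
  kmeans(petals, centers = k, nstart = 20)$tot.withinss
})

# Look for the "elbow" where adding clusters stops helping much
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The value of k at which the curve flattens is a reasonable candidate for the number of clusters.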
Let us switch to RStudio. | |
Click Clustering.R in the Source window. | In the Source window, click on the script Clustering.R. |
[RStudio]
mclust install.packages() |
I will import the necessary packages.
As I have already installed these packages, I will directly import them.
|
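If you are not sure whether the packages are already installed, a check like the one below may help. It is a small sketch and not part of Clustering.R; the names required, installed and not_installed are chosen for illustration. It only reports missing packages and does not install anything.

```r
# Check which of the required packages are available (does not install them)
required <- c("ggplot2", "mclust")
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required[!installed]

if (length(not_installed) > 0) {
  message("Install these packages with install.packages(): ",
          paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```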
Type
library(ggplot2)
library(mclust)
|
At the top of the script, type the following commands.
|
Highlight
library(ggplot2) library(mclust) Click the Run button. |
Click the Run button to load these libraries.
|
[RStudio]
Highlight Iris data set in the source window.
|
Click on the iris data set.
|
Point to Sepal Length versus Sepal Width.
Type ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) + geom_point()
|
First, we will plot Sepal Length versus Sepal Width.
Type the following command. Save the script.
|
Highlight
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) + geom_point()
|
Select this command and run it to get a plot. |
Drag Boundary | I will drag the boundary to see the Plot clearly. |
Highlight
Output in Plot Window |
As we can see, setosa is clearly separated from the other two species, while versicolor and virginica overlap.
|
Drag the boundary. | Drag the boundary to see the Source window clearly. |
Point to Petal Length versus Petal Width.
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, col = Species)) + geom_point() |
Now we will plot Petal Length versus Petal Width.
|
Highlight
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, col = Species)) + geom_point()
|
Save and run this command to see the plot. |
Drag boundary to see the plot window clearly. | Drag the boundary to see the Plot window clearly. |
Highlight Output in Plot Window | Notice that Petal Length and Petal Width separate the 3 species of iris much more clearly than the sepal dimensions do.
|
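To see in numbers how the species differ in petal size, the minimum and maximum of each petal dimension can be computed per species. This is a minimal base R sketch; the names petal_length_range and petal_width_range are chosen here for illustration.

```r
# Minimum and maximum petal dimensions for each species
data("iris")
petal_length_range <- sapply(split(iris$Petal.Length, iris$Species), range)
petal_width_range  <- sapply(split(iris$Petal.Width,  iris$Species), range)

# Row 1 holds the minimum and row 2 the maximum
rownames(petal_length_range) <- c("min", "max")
rownames(petal_width_range)  <- c("min", "max")

print(petal_length_range)
print(petal_width_range)
```

Ranges that do not overlap between two species mean that the dimension alone is enough to tell them apart.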
Drag the boundary. | Drag the boundary to see the Source window clearly. |
Highlight Iris in source window
|
In this data set we have three species of iris.
|
[RStudio]
Type km <- kmeans(iris[, 3:4], 3, nstart = 20) |
Now let’s use the kmeans() function to perform k-means clustering.
|
Highlight iris[,3:4]
|
Let me explain the parameters of this function.
|
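The same call can be written with argument names to make the role of each parameter visible. This is a sketch, not the exact line from Clustering.R. The set.seed() call is an addition that is not in the original command; it only makes the random starting points repeatable.

```r
# k-means on the petal columns, with the arguments named
data("iris")
set.seed(42)           # assumption: any fixed seed, added for repeatability

km <- kmeans(
  x = iris[, 3:4],     # data to cluster: Petal.Length and Petal.Width
  centers = 3,         # number of clusters to form
  nstart = 20          # number of random starts; the best result is kept
)

print(km$centers)      # coordinates of the 3 cluster centres
print(km$size)         # number of samples in each cluster
head(km$cluster)       # cluster number assigned to the first few samples
```

Note that the cluster numbers 1, 2 and 3 are arbitrary and can change from run to run.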
[RStudio]
Click on Save and Run buttons.
|
Save the command and run it.
|
[RStudio]
Click the iris data set table. |
We need to analyze our model to determine its performance.
|
[RStudio]
Type table(km$cluster, iris$Species) Click on Save and Run buttons. |
Type the following command.
Save the script and run the command. |
Drag boundary to see the console window. | Drag the boundary to see the Console window clearly. |
[RStudio]
Highlight Row and Column in output |
This table compares the cluster assigned to each sample with its actual species.
|
[RStudio]
Highlight Output in the Console
|
50 Setosa samples have been clustered together.
|
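The table can also be summarised as a single number. The sketch below takes the most frequent species in each cluster as that cluster's label and reports the share of samples that match it. This majority-label approach is an assumption for illustration and is not part of Clustering.R; the names tab, majority and agreement are hypothetical.

```r
# Share of samples that fall in the majority species of their cluster
data("iris")
set.seed(42)                                   # added for repeatability
km <- kmeans(iris[, 3:4], centers = 3, nstart = 20)

tab <- table(Cluster = km$cluster, Species = iris$Species)
print(tab)

majority <- apply(tab, 1, max)                 # largest count in each cluster row
agreement <- sum(majority) / sum(tab)          # fraction of matching samples
print(agreement)
```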
Cursor in RStudio window. | Now we will measure how well the clusters agree with the actual species.
|
Show Slide
Adjusted Rand Index |
|
Let us switch to RStudio. | |
[RStudio]
Point to mclust library.
adjustedRandIndex(km$cluster, iris$Species)
|
The mclust library contains the adjustedRandIndex() function, which calculates the adjusted Rand index.
|
Click on Save and Run buttons. | Save and run the command. |
[RStudio]
Highlight Output on Console |
The adjusted Rand index is close to 1, which indicates that the clusters agree well with the species labels.
|
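To see what the index measures, it can also be computed by hand from the contingency table. The function adjusted_rand() below is a hypothetical helper written for this sketch in base R; it is not the mclust implementation, although it follows the standard formula and should give the same value.

```r
# Adjusted Rand index computed from a contingency table (illustrative helper)
adjusted_rand <- function(a, b) {
  tab <- table(a, b)
  n <- sum(tab)
  sum_cells <- sum(choose(as.vector(tab), 2))   # pairs grouped together in both
  sum_rows  <- sum(choose(rowSums(tab), 2))     # pairs grouped together in a
  sum_cols  <- sum(choose(colSums(tab), 2))     # pairs grouped together in b
  expected  <- sum_rows * sum_cols / choose(n, 2)
  maximum   <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (maximum - expected)
}

data("iris")
set.seed(42)                                    # added for repeatability
km <- kmeans(iris[, 3:4], centers = 3, nstart = 20)

ari_km <- adjusted_rand(km$cluster, iris$Species)
print(ari_km)

# Identical groupings give 1, even when the label names differ
ari_same <- adjusted_rand(c(1, 1, 2, 2), c("b", "b", "a", "a"))
print(ari_same)
```

Because the index depends only on which samples are grouped together, the arbitrary numbering of the k-means clusters does not affect it.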
With this we come to the end of this tutorial. Let us summarize. | |
Show Slide
Summary |
In this tutorial we have learnt:
|
Here is an assignment for you. | |
Show Slide
Assignment |
|
Show Slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show Slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
|
Show Slide
Spoken Tutorial Forum to answer questions
|
Do you have questions in THIS Spoken Tutorial?
Please visit this site. Choose the minute and second where you have the question. Explain your question briefly. The FOSSEE project will ensure an answer. You will have to register to ask questions. |
Show Slide
Forum to answer questions |
Do you have any general/technical questions?
Please visit the forum given in the link. |
Show Slide
Textbook Companion
|
The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.
|
Show Slide
Acknowledgment |
The Spoken Tutorial project is funded by the Ministry of Education, Govt. of India. |
Show Slide
Thank You |
This tutorial is contributed by Tanmay Srinath and Madhuri Ganapathi from IIT Bombay. Thank you for watching. |