<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="https://script.spoken-tutorial.org/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>https://script.spoken-tutorial.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ushav</id>
		<title>Script | Spoken-Tutorial - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="https://script.spoken-tutorial.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ushav"/>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Special:Contributions/Ushav"/>
		<updated>2026-04-09T03:39:16Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.23.17</generator>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C3/Bagging-in-R/English</id>
		<title>Machine-Learning-using-R/C3/Bagging-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C3/Bagging-in-R/English"/>
				<updated>2024-11-27T06:18:17Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: Created page with &amp;quot;'''Title of the script''': Bagging Algorithm for Decision Tree using R  '''Author''': Debatosh Chakraboty and YATE ASSEKE RONALD RONALD.  '''Keywords''': R, RStudio, Bagging A...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Bagging Algorithm for Decision Tree using R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty and Yate Asseke Ronald Olivera.&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, Bagging Algorithm, machine learning, supervised, unsupervised, dataset, video tutorial.&lt;br /&gt;
&lt;br /&gt;
{|border=1&lt;br /&gt;
|-&lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|-&lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this Spoken Tutorial on '''Bagging in R.'''&lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Bagging.&lt;br /&gt;
* Assumptions for Bagging.&lt;br /&gt;
* Advantages of Bagging.&lt;br /&gt;
* Implementation of Bagging using Decision Tree in R. &lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Limitations of Bagging.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher. &lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
Basic programming in '''R'''.&lt;br /&gt;
Basics of '''Machine Learning'''. &lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|-&lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Bootstrap aggregation (Bagging) '''&lt;br /&gt;
|| Now let us learn about '''Bootstrap aggregation '''or '''Bagging'''.&lt;br /&gt;
* A classification model fitted on several training data subsets should ideally have consistent decision boundaries. &lt;br /&gt;
* Large variation in the decision boundaries indicates higher variability of the classification model.&lt;br /&gt;
* Bagging is a commonly used ensemble technique to reduce this variation.&lt;br /&gt;
* In Bagging, random subsets of the training data are repeatedly chosen to construct multiple classifiers.&lt;br /&gt;
* The Bootstrap classifiers constructed from chosen subsets are then aggregated.&lt;br /&gt;
* For bagging of the decision tree classifier, the aggregation is done by a majority vote of the class predicted by Bootstrap trees.&lt;br /&gt;
|-&lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions of Bagging'''&lt;br /&gt;
* Each observation is independent.&lt;br /&gt;
* The assumption of the chosen classifier is satisfied.&lt;br /&gt;
|| For bagging, the assumptions of the chosen base classifier must be satisfied.&lt;br /&gt;
|-&lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Advantages of Bagging'''&lt;br /&gt;
|| Advantages of Bagging include:&lt;br /&gt;
* Bagging reduces the variation of the chosen model.&lt;br /&gt;
* Bagging improves the performance (accuracy) of the decision tree classifier in general.&lt;br /&gt;
|-&lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation of Bagging'''&lt;br /&gt;
|| Now we will perform '''Bagging of Decision Tree classifier '''on the '''Raisin''' dataset with two chosen variables.&lt;br /&gt;
|-&lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files'''&lt;br /&gt;
|| For this tutorial, I will use a script file''' Bagging-Decision-Tree.R'''.&lt;br /&gt;
&lt;br /&gt;
'''Raisin Dataset 'raisin.xlsx'.'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|-&lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Highlight '''Bagging-Decision-Tree.R''' and the folder &lt;br /&gt;
|| I have downloaded and moved these files to the '''Bagging''' folder.&lt;br /&gt;
&lt;br /&gt;
The '''Bagging''' folder is in the '''MLProject''' folder.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Bagging''' folder as my working directory.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|-&lt;br /&gt;
|| Double click '''Bagging-Decision-Tree.R''' in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to '''Bagging-Decision-Tree.R''' in RStudio.&lt;br /&gt;
|| Open the script '''Bagging-Decision-Tree.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Bagging-Decision-Tree.R''' opens in '''RStudio'''.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ipred)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(cvms)'''&lt;br /&gt;
&lt;br /&gt;
'''library(rpart)'''&lt;br /&gt;
||  Select and run these commands to import the necessary packages.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight '''library(ipred)'''&lt;br /&gt;
&lt;br /&gt;
Highlight''' library(rpart)'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''library(cvms)'''&lt;br /&gt;
||  The''' ipred '''library contains the '''bagging()''' function.&lt;br /&gt;
&lt;br /&gt;
The '''rpart '''library will be used to implement the decision tree model for bagging.&lt;br /&gt;
&lt;br /&gt;
We will use the '''cvms''' package for plotting the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
As I have already installed these packages, I have imported them directly.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
|| Run these commands to import the '''raisin''' dataset and prepare it for model building.&lt;br /&gt;
&lt;br /&gt;
Click on '''data''' in the '''Environment''' tab to load it in the '''Source''' window.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split=sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| Type these commands in the source window to perform the train-test split.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight '''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''replace=FALSE'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
||  Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
The data sets will be shown in the Environment tab.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Let us now create our '''Bagging''' model.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''bagging_model &amp;lt;- bagging(class ~ ., data = train_data, coob = TRUE, nbagg = 200,control = rpart.control(cp = 0.00001, xval = 10, maxdepth = 2))'''&lt;br /&gt;
||  In the source window, type these commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''bagging_model &amp;lt;- bagging(class ~ ., data = train_data, coob = TRUE, nbagg = 200,control = rpart.control(cp = 0.00001, xval = 10, maxdepth = 2))'''&lt;br /&gt;
|| '''bagging():''' The bagging() function is used to create a bagging ensemble model.&lt;br /&gt;
&lt;br /&gt;
'''class ~ .''': This formula indicates that the model should predict the 'class' variable.&lt;br /&gt;
&lt;br /&gt;
It uses all other variables in the train_data as predictors.&lt;br /&gt;
&lt;br /&gt;
'''data:''' The dataset used for building the model. It is specified as train_data.&lt;br /&gt;
&lt;br /&gt;
'''coob:''' When '''coob''' is TRUE, the out-of-bag (OOB) error estimate is computed. &lt;br /&gt;
&lt;br /&gt;
OOB error measures the error of the generated bootstrap classifiers on the observations left out of each bootstrap sample.&lt;br /&gt;
&lt;br /&gt;
'''nbagg:''' Sets the number of bootstrap replicates for bagging. It is set to 200 in this case.&lt;br /&gt;
&lt;br /&gt;
The '''rpart.control''' argument allows us to set the hyperparameters of the base classifier. &lt;br /&gt;
&lt;br /&gt;
'''cp '''denotes the complexity parameter which is set to 0.00001.&lt;br /&gt;
&lt;br /&gt;
'''xval''' is the number of cross-validations, which is set to 10. &lt;br /&gt;
&lt;br /&gt;
'''maxdepth''' is the maximum depth of any node of the final tree. It is limited to 2 in this case.&lt;br /&gt;
&lt;br /&gt;
Select and run the command to train the model.&lt;br /&gt;
|-&lt;br /&gt;
|| '''print(bagging_model)'''&lt;br /&gt;
|| In the '''Source''' window type and run this command.&lt;br /&gt;
|-&lt;br /&gt;
|| Point to the console window.&lt;br /&gt;
|| The output is shown in the console window.&lt;br /&gt;
&lt;br /&gt;
Drag the boundary to see the console window clearly.&lt;br /&gt;
|-&lt;br /&gt;
||  Highlight&lt;br /&gt;
&lt;br /&gt;
'''Out-of-bag estimate of misclassification error: 0.1746'''&lt;br /&gt;
|| We can confirm that our model is trained successfully.&lt;br /&gt;
&lt;br /&gt;
The out-of-bag misclassification error of the model is 0.1746.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predictions &amp;lt;- predict(bagging_model, newdata = test_data, type = &amp;quot;class&amp;quot;)'''&lt;br /&gt;
|| Let us now use our model for prediction.&lt;br /&gt;
&lt;br /&gt;
In the source window, type and run the command.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''predictions &amp;lt;- predict(bagging_model, newdata = test_data, type = &amp;quot;class&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
Click on '''Save''' and '''Run '''buttons.&lt;br /&gt;
|| This command stores the prediction of the model bagging_model on test data in a variable '''predictions'''. &lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Let's now evaluate our model.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''confusion_matrix &amp;lt;- confusionMatrix(predictions, test_data$class)'''&lt;br /&gt;
|| Type this command in the '''Source''' window.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''confusion_matrix &amp;lt;- confusionMatrix(predictions, test_data$class)'''&lt;br /&gt;
|| This command will create a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list will contain the different evaluation metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| Now, let us type this command.&lt;br /&gt;
&lt;br /&gt;
This command will display the accuracy of the model.&lt;br /&gt;
&lt;br /&gt;
It retrieves the value from the confusion_matrix list created earlier.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight '''0.8407'''&lt;br /&gt;
|| We can see that our model has 84 percent accuracy.&lt;br /&gt;
&lt;br /&gt;
Note that we can achieve higher accuracy by not manually limiting the '''maxdepth''' parameter.&lt;br /&gt;
|-&lt;br /&gt;
|| '''confusion_table &amp;lt;- data.frame(confusion_matrix$table)'''&lt;br /&gt;
|| In the source window, type this command.&lt;br /&gt;
&lt;br /&gt;
This will create a data-frame of the confusion matrix table.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
&lt;br /&gt;
Click on confusion_table in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Notice that it displays the number of correct and incorrect predictions for each class.&lt;br /&gt;
|-&lt;br /&gt;
|| Cursor in the source window.&lt;br /&gt;
|| In the source window, type these commands to plot the confusion matrix.&lt;br /&gt;
|-&lt;br /&gt;
|| '''plot_confusion_matrix(confusion_table, '''&lt;br /&gt;
&lt;br /&gt;
'''target_col = &amp;quot;Reference&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''prediction_col = &amp;quot;Prediction&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''counts_col = &amp;quot;Freq&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''palette = list(&amp;quot;low&amp;quot; = &amp;quot;pink1&amp;quot;,&amp;quot;high&amp;quot; = &amp;quot;green1&amp;quot;),'''&lt;br /&gt;
&lt;br /&gt;
'''add_normalized = FALSE,'''&lt;br /&gt;
&lt;br /&gt;
'''add_row_percentages = FALSE,'''&lt;br /&gt;
&lt;br /&gt;
'''add_col_percentages = FALSE)'''&lt;br /&gt;
|| We use the '''plot_confusion_matrix '''function from the''' cvms '''package.&lt;br /&gt;
&lt;br /&gt;
We will use the created data frame '''confusion_table'''.&lt;br /&gt;
&lt;br /&gt;
'''target_col''' is the column in the dataframe with the reference labels.&lt;br /&gt;
&lt;br /&gt;
'''prediction_col''' is the column in the dataframe with the predicted labels.&lt;br /&gt;
&lt;br /&gt;
'''counts_col''' is the column in the dataframe with the number of correct and incorrect labels.&lt;br /&gt;
&lt;br /&gt;
The palette will plot the correct and incorrect predictions in different colours. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the plot window.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight output in '''plot window'''&lt;br /&gt;
|| 24 '''Besni''' samples have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
19 '''Kecimen''' samples have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
Overall, the model has misclassified only 43 samples.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Let us plot our model decision boundary.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 200),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 200)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(bagging_model, newdata = grid, type = &amp;quot;class&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
||  In the '''Source''' window, type these commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 200),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 200)) '''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;# Predict classes&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(bagging_model, newdata = grid, type = &amp;quot;class&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
||  This code first creates a '''grid '''of points spanning the feature space.&lt;br /&gt;
&lt;br /&gt;
The '''Bagging '''model then predicts the class of each point in this grid.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
ggplot() +&lt;br /&gt;
&lt;br /&gt;
geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +&lt;br /&gt;
&lt;br /&gt;
geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +&lt;br /&gt;
&lt;br /&gt;
geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),&lt;br /&gt;
&lt;br /&gt;
colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +&lt;br /&gt;
&lt;br /&gt;
scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +&lt;br /&gt;
&lt;br /&gt;
scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +&lt;br /&gt;
&lt;br /&gt;
labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Decision Boundary of Bootstrap Bagging&amp;quot;) +&lt;br /&gt;
&lt;br /&gt;
theme_minimal()&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Decision Boundary of Bootstrap Bagging&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| We plot the decision boundary using the predicted classes of the grid.&lt;br /&gt;
&lt;br /&gt;
This command plots the decision boundary and the distribution of data points, with colors indicating the predicted classes.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight output in plot window&lt;br /&gt;
|| We observe that the model has separated most of the data points clearly.&lt;br /&gt;
&lt;br /&gt;
Note that after applying bagging to the decision tree classifier, the decision boundary looks similar to that of a single decision tree.&lt;br /&gt;
&lt;br /&gt;
However, it is more robust and more complex.&lt;br /&gt;
|-&lt;br /&gt;
|| '''Limitations of Bagging'''&lt;br /&gt;
* Bagging is hard to interpret.&lt;br /&gt;
* Requires more computational time.&lt;br /&gt;
* Bagging doesn’t improve model bias.&lt;br /&gt;
|| These are the limitations of Bagging.&lt;br /&gt;
|-&lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| With this we come to the end of this tutorial. &lt;br /&gt;
&lt;br /&gt;
Let us summarize. &lt;br /&gt;
|-&lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Summary'''&lt;br /&gt;
|| In this tutorial we have learnt about:&lt;br /&gt;
* Bagging &lt;br /&gt;
* Assumptions for Bagging&lt;br /&gt;
* Advantages of Bagging&lt;br /&gt;
* Implementation of Bagging using Decision Tree in R &lt;br /&gt;
* Model Evaluation&lt;br /&gt;
* Limitations of Bagging&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest the assignment for this Spoken Tutorial.&lt;br /&gt;
|-&lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assignment'''&lt;br /&gt;
|| &lt;br /&gt;
* Apply Bagging using Decision Tree on '''PimaIndiansDiabetes''' dataset &lt;br /&gt;
* Install the '''pdp''' package and import the dataset using the '''data(pima)''' command&lt;br /&gt;
* Visualize the decision boundary and measure the accuracy of the model&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''About the Spoken Tutorial Project'''&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Workshops'''&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Forum to answer questions'''&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
||  '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Textbook Companion'''&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Acknowledgment'''&lt;br /&gt;
|| The '''Spoken Tutorial Project''' was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
||This tutorial is contributed by Debatosh Chakraborty and Yate Asseke Ronald O from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Decision-Tree-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Decision-Tree-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Decision-Tree-in-R/English"/>
				<updated>2024-11-27T06:10:45Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: Created page with &amp;quot;'''Title of the script''': Decision Tree in R  '''Author''': Debatosh Chakraborty and Yate Asseke Ronald Olivera  '''Keywords''': R, RStudio, machine learning, supervised, uns...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Decision Tree in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty and Yate Asseke Ronald Olivera&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, classification, regression, decision tree, video tutorial.&lt;br /&gt;
&lt;br /&gt;
{|border=1&lt;br /&gt;
|-&lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this Spoken Tutorial on '''Decision Tree''' '''in R.'''&lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
|| In this tutorial, we will learn about:&lt;br /&gt;
* '''Decision Tree'''&lt;br /&gt;
* Assumptions for '''Decision Tree'''&lt;br /&gt;
* Advantages of '''Decision Tree'''&lt;br /&gt;
* Implementation of '''Decision Tree''' in '''R'''.&lt;br /&gt;
* Plotting the decision tree model&lt;br /&gt;
* Evaluation of the model'''.'''&lt;br /&gt;
* Visualizing the model decision boundary&lt;br /&gt;
* Limitations of '''Decision Tree'''.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
||This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11'''&lt;br /&gt;
* '''R '''version '''4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites'''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
* '''Basic programming in R'''&lt;br /&gt;
* '''Basics of Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''What is a Decision Tree?'''&lt;br /&gt;
||Let us see what a decision tree is.&lt;br /&gt;
* It uses a binary tree to split the feature space into several sub-regions.&lt;br /&gt;
* The nodes of the tree are the locations at which the feature space splits.&lt;br /&gt;
* Misclassification error, Gini index, and entropy aid in identifying ideal splits.&lt;br /&gt;
* The decision boundaries in the Decision Tree model are nonlinear.&lt;br /&gt;
|-&lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions of Decision Tree'''&lt;br /&gt;
* The root node of the tree consists of the entire training set.&lt;br /&gt;
* The model does not assume any specific distribution of features.&lt;br /&gt;
* Each observation is independent.&lt;br /&gt;
|| The assumptions of the decision tree model are as follows.&lt;br /&gt;
|-&lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Advantages of Decision Tree'''&lt;br /&gt;
|| The advantages of the decision tree model include:&lt;br /&gt;
* It does not require feature variables to be necessarily continuous&lt;br /&gt;
* Decision trees are intuitive and easy to visualize&lt;br /&gt;
* When the response is continuous, the decision tree methodology can be easily implemented as a regression tree&lt;br /&gt;
&lt;br /&gt;
The regression tree method will be discussed in a separate tutorial.&lt;br /&gt;
|-&lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation of Decision Tree'''&lt;br /&gt;
|| Now we will construct a '''Decision Tree '''on the '''Raisin''' dataset with two chosen variables.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Download Files'''&lt;br /&gt;
|| For this tutorial, we will use&lt;br /&gt;
&lt;br /&gt;
A script file '''DecisionTree.R'''.&lt;br /&gt;
&lt;br /&gt;
Raisin Dataset 'raisin.xlsx'&lt;br /&gt;
&lt;br /&gt;
Please download these files from the '''Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|-&lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Highlight '''DecisionTree.R'''&lt;br /&gt;
|| I have downloaded and moved these files to the '''Decision Tree '''folder.&lt;br /&gt;
&lt;br /&gt;
We will create a Decision Tree classifier model on the '''raisin''' dataset.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''.&lt;br /&gt;
|-&lt;br /&gt;
|| Double-click '''DecisionTree.R''' on RStudio&lt;br /&gt;
&lt;br /&gt;
Point to '''DecisionTree.R''' in RStudio.&lt;br /&gt;
|| Open the script '''DecisionTree.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''DecisionTree.R''' opens in '''RStudio'''.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight '''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''library(rpart)'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''library(rpart.plot)'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''library(cvms)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
These packages will be used to aid the building and evaluation of the classifier.&lt;br /&gt;
&lt;br /&gt;
We will use the '''rpart''' package to create the decision tree classifier.&lt;br /&gt;
&lt;br /&gt;
We will use the '''rpart.plot '''package for plotting the '''decision tree'''.&lt;br /&gt;
&lt;br /&gt;
We will use the '''cvms''' package for plotting the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| These commands will load the '''Raisin''' dataset.&lt;br /&gt;
&lt;br /&gt;
They will also prepare the dataset for model building.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Click on '''data''' in the '''Environment''' tab to load the dataset.&lt;br /&gt;
&lt;br /&gt;
Point to the Source window.&lt;br /&gt;
|| Click on '''data''' in the '''Environment''' tab to load the modified data in the '''Source''' window.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- createDataPartition(data$class, p = 0.7, list = FALSE)'''&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands.&lt;br /&gt;
|-&lt;br /&gt;
||Highlight&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- createDataPartition(data$class, p = 0.7, list = FALSE)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
|| This will split our dataset into training and testing data.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Let us now create our '''Decision Tree''' model.&lt;br /&gt;
|-&lt;br /&gt;
||'''decision_model &amp;lt;- rpart(class ~ ., data = train, method = 'class','''&lt;br /&gt;
&lt;br /&gt;
'''control = rpart.control(cp = .00001, xval = 10, maxdepth = 2),'''&lt;br /&gt;
&lt;br /&gt;
'''parms = list(split = &amp;quot;gini&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''summary(decision_model)'''&lt;br /&gt;
|| In the source window, type these commands.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight '''formula = class ~ .'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''data=train'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''method = 'class''''&lt;br /&gt;
&lt;br /&gt;
Highlight '''parms = list(split = &amp;quot;gini&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''maxdepth = 2'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''xval = 10'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''cp = .00001'''&lt;br /&gt;
&lt;br /&gt;
Click the Run button.&lt;br /&gt;
&lt;br /&gt;
Point to Environment tab.&lt;br /&gt;
||  This is the formula we use for this model.&lt;br /&gt;
&lt;br /&gt;
'''class''' is taken as the dependent variable.&lt;br /&gt;
&lt;br /&gt;
The remaining attributes are independent variables.&lt;br /&gt;
&lt;br /&gt;
'''data = train''' uses the '''training''' partition of the dataset to train our model.&lt;br /&gt;
&lt;br /&gt;
This tells our model that we are doing a classification task.&lt;br /&gt;
&lt;br /&gt;
The '''Gini index''' will be used to determine the best split at each node.&lt;br /&gt;
&lt;br /&gt;
This determines the maximum depth of the tree.&lt;br /&gt;
&lt;br /&gt;
This is the number of cross-validations for each split.&lt;br /&gt;
&lt;br /&gt;
'''cp''' is the complexity parameter: a split must improve the fit by at least this factor to be attempted.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
The model data is shown in the '''Environment''' tab.&lt;br /&gt;
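The Gini index mentioned above is simple to compute directly; here is a minimal base-R sketch (the helper name '''gini_index''' is ours, not part of '''rpart'''):

```r
# Gini impurity of a node: 1 minus the sum of squared class proportions.
# 0 means the node is pure; 0.5 is the maximum for two classes.
gini_index = function(labels) {
  p = table(labels) / length(labels)  # class proportions in the node
  1 - sum(p^2)
}

gini_index(rep("Kecimen", 4))                           # pure node -> 0
gini_index(c("Kecimen", "Besni", "Kecimen", "Besni"))   # 50/50 split -> 0.5
```

'''rpart''' evaluates candidate splits by how much they reduce this impurity.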
|-&lt;br /&gt;
||  '''Highlight''' CP&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''Node Information&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''n=630&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''class counts&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''probabilities&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''Predicted class&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''Primary splits&lt;br /&gt;
|| The summary of the model created is shown in the '''Console''' window.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Console''' window clearly.&lt;br /&gt;
&lt;br /&gt;
'''CP''' displays the complexity table for the trees created in the final model.&lt;br /&gt;
&lt;br /&gt;
This displays the information about each node created.&lt;br /&gt;
&lt;br /&gt;
This includes,&lt;br /&gt;
&lt;br /&gt;
Total observations used to create the node.&lt;br /&gt;
&lt;br /&gt;
The distribution of observations for each class in the node.&lt;br /&gt;
&lt;br /&gt;
The probability of each class.&lt;br /&gt;
&lt;br /&gt;
The class with the highest probability is the predicted class for the node.&lt;br /&gt;
&lt;br /&gt;
This denotes the split information for that particular node.&lt;br /&gt;
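The node quantities described above (class counts, probabilities, predicted class) can be reproduced in base R; the labels below are made up for illustration:

```r
# Class labels of the observations that reached one node (illustrative).
node_labels = c("Kecimen", "Kecimen", "Besni", "Kecimen")

counts = table(node_labels)             # class counts in the node
probs  = counts / length(node_labels)   # probability of each class
predicted = names(which.max(probs))     # class with the highest probability

predicted  # "Kecimen"
```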
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''rpart.plot(decision_model)'''&lt;br /&gt;
||Now let us visualize the decision tree model.&lt;br /&gt;
&lt;br /&gt;
In the '''Source '''window type this command and run it.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the plot window clearly&lt;br /&gt;
|-&lt;br /&gt;
||  '''Hover '''Kecimen&lt;br /&gt;
&lt;br /&gt;
'''Hover''' 0.71&lt;br /&gt;
&lt;br /&gt;
'''Hover''' 48% 52%&lt;br /&gt;
||The trained decision tree model is shown in the plot window.&lt;br /&gt;
&lt;br /&gt;
For each node,&lt;br /&gt;
&lt;br /&gt;
the predicted class,&lt;br /&gt;
&lt;br /&gt;
its probability,&lt;br /&gt;
&lt;br /&gt;
and the percentage of total observations are shown.&lt;br /&gt;
&lt;br /&gt;
Note that the modeled tree is easy to interpret.&lt;br /&gt;
&lt;br /&gt;
This is because the maximum depth of the tree is manually restricted.&lt;br /&gt;
&lt;br /&gt;
But this comes at the cost of underfitting and an increase in misclassification error.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Now let us use the model to make predictions on the testing data partition.&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predictions &amp;lt;- predict(decision_model, newdata = test, type = &amp;quot;class&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
Select and run this command.&lt;br /&gt;
||In the source window type this command and run it.&lt;br /&gt;
&lt;br /&gt;
This command generates the predicted classes from the trained decision tree model.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Let's now evaluate our model.&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
'''confusion_matrix &amp;lt;- confusionMatrix(predictions, test$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Type this command in the '''Source''' window&lt;br /&gt;
|-&lt;br /&gt;
||Highlight&lt;br /&gt;
&lt;br /&gt;
'''confusion_matrix &amp;lt;- confusionMatrix(predictions, test$class)'''&lt;br /&gt;
||This command will create a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list will contain the different evaluation metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
'''confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
||Now, let us type this command.&lt;br /&gt;
&lt;br /&gt;
This command will display the accuracy of the model by retrieving it from the '''confusion_matrix''' list.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|-&lt;br /&gt;
|| '''Highlight '''0.807&lt;br /&gt;
||We can see that our model has 80 percent accuracy.&lt;br /&gt;
&lt;br /&gt;
Note that the misclassifications are higher because we manually restricted the '''maxdepth''' attribute.&lt;br /&gt;
&lt;br /&gt;
Choosing a higher value will reduce the misclassification error but make the model less interpretable.&lt;br /&gt;
|-&lt;br /&gt;
|| '''confusion_table &amp;lt;- data.frame(confusion_matrix$table)'''&lt;br /&gt;
||In the '''Source''' window, type this command.&lt;br /&gt;
&lt;br /&gt;
This will create a data-frame of the confusion matrix table.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
&lt;br /&gt;
Click on confusion_table in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
We notice that it displays the number of correct and incorrect predictions for each class.&lt;br /&gt;
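The accuracy retrieved earlier can also be computed by hand from a confusion matrix; here is a base-R sketch with toy vectors (not the tutorial's actual predictions):

```r
# Toy truth and prediction vectors standing in for test$class and predictions.
truth = factor(c("Kecimen", "Kecimen", "Besni", "Besni", "Besni"))
preds = factor(c("Kecimen", "Besni",  "Besni", "Besni", "Kecimen"))

cm = table(Prediction = preds, Reference = truth)  # confusion matrix
accuracy = sum(diag(cm)) / sum(cm)                 # correct / total

accuracy  # 3 of 5 correct -> 0.6
```

The diagonal of the table holds the correct predictions for each class; the off-diagonal cells are the misclassifications.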
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
|| In the '''Source''' window, type these commands to plot the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
It will represent the number of correct and incorrect predictions using different colors.&lt;br /&gt;
|-&lt;br /&gt;
||'''plot_confusion_matrix(confusion_table,'''&lt;br /&gt;
&lt;br /&gt;
'''target_col = &amp;quot;Reference&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''prediction_col = &amp;quot;Prediction&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''counts_col = &amp;quot;Freq&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''palette = list(&amp;quot;low&amp;quot; = &amp;quot;pink1&amp;quot;,&amp;quot;high&amp;quot; = &amp;quot;green1&amp;quot;),'''&lt;br /&gt;
&lt;br /&gt;
'''add_normalized = FALSE,'''&lt;br /&gt;
&lt;br /&gt;
'''add_row_percentages = FALSE,'''&lt;br /&gt;
&lt;br /&gt;
'''add_col_percentages = FALSE)'''&lt;br /&gt;
|| We use the '''plot_confusion_matrix '''function from the cvms package.&lt;br /&gt;
&lt;br /&gt;
We will use the dataframe '''confusion_table'''.&lt;br /&gt;
&lt;br /&gt;
'''target_col''' is the '''Reference''' column in the dataframe '''confusion_table''', containing the true labels.&lt;br /&gt;
&lt;br /&gt;
'''prediction_col''' is the '''Prediction''' column in the dataframe '''confusion_table''', containing the predicted labels.&lt;br /&gt;
&lt;br /&gt;
'''counts_col''' is the '''Freq''' column in the dataframe '''confusion_table''', containing the counts of correct and incorrect predictions.&lt;br /&gt;
&lt;br /&gt;
The palette will plot the correct and incorrect predictions in different colours.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the plot window.&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight '''Output in Plot window.'''&lt;br /&gt;
||This plot shows how well our model predicted the testing data.&lt;br /&gt;
&lt;br /&gt;
We observe that:&lt;br /&gt;
&lt;br /&gt;
'''Kecimen''' class: '''18''' misclassifications&lt;br /&gt;
&lt;br /&gt;
'''Besni''' class: '''34''' misclassifications&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
||Now let us visualize the decision boundary of the model.&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500))'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(decision_model, newdata = grid, type = &amp;quot;class&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
||In the source window type these commands&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500))'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(decision_model, newdata = grid, type = &amp;quot;class&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
|| This code creates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc '''features in the dataset.&lt;br /&gt;
&lt;br /&gt;
It then uses the '''Decision Tree''' model to predict the class of each point in this grid.&lt;br /&gt;
&lt;br /&gt;
It stores these predictions as a new column '''class''' in the '''grid''' dataframe.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
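The grid construction relies on '''expand.grid''', which forms the Cartesian product of its arguments; here is a tiny sketch (3 x 2 instead of the tutorial's 500 x 500):

```r
# expand.grid returns one row per combination of the supplied values,
# so a 3-value range crossed with a 2-value range gives 6 grid points.
grid = expand.grid(minorAL = c(200, 250, 300),
                   ecc     = c(0.6, 0.9))

nrow(grid)  # 6 rows: every combination of minorAL and ecc
```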
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Decision Tree Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
||To visualise the generated data, type these commands&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Decision Tree Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
||This command creates the decision boundary plot of the decision tree model.&lt;br /&gt;
&lt;br /&gt;
It shows the distribution of training data points.&lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes using '''ggplot2'''.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|-&lt;br /&gt;
|| Point to the plot&lt;br /&gt;
||It shows that the decision boundary of a decision tree model is non-linear.&lt;br /&gt;
&lt;br /&gt;
The complexity of the decision boundary increases with the complexity of the decision tree.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of Decision tree'''&lt;br /&gt;
* If the tree is too complex, it can overfit data.&lt;br /&gt;
* Small variations in data can result in a different tree.&lt;br /&gt;
* Large trees are difficult to interpret.&lt;br /&gt;
* Noisy data may cause inaccurate splits.&lt;br /&gt;
&lt;br /&gt;
|| Here are some of the limitations of '''Decision Tree.'''&lt;br /&gt;
|-&lt;br /&gt;
|| Only Narration &lt;br /&gt;
||With this we come to the end of the tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Summary'''&lt;br /&gt;
|| In this tutorial we have learnt about:&lt;br /&gt;
* '''Decision Tree'''&lt;br /&gt;
* Assumptions of '''Decision Tree'''&lt;br /&gt;
* Advantages of '''Decision Tree.'''&lt;br /&gt;
* Implementation of '''Decision Tree '''in '''R'''.&lt;br /&gt;
* Plotting the decision tree model.&lt;br /&gt;
* Evaluation of the model.&lt;br /&gt;
* Visualizing the model decision boundary&lt;br /&gt;
* Limitations of Decision Tree.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest the assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
||'''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assignment'''&lt;br /&gt;
||&lt;br /&gt;
* Apply Decision Tree on '''PimaIndiansDiabetes''' dataset&lt;br /&gt;
* Install the '''pdp''' package and import the dataset using the '''data(pima)''' command&lt;br /&gt;
* Visualize the decision tree and measure the accuracy of the model&lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''About the Spoken Tutorial Project'''&lt;br /&gt;
||The video at the following link summarizes the Spoken Tutorial project.&lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show slide'''&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
||We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
For more details, please contact us.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Forum to answer questions'''&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Forum to answer questions'''&lt;br /&gt;
||Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|-&lt;br /&gt;
||  '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Textbook Companion'''&lt;br /&gt;
|| The '''FOSSEE '''team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Acknowledgment'''&lt;br /&gt;
|| The Spoken Tutorial project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|-&lt;br /&gt;
||'''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
||This tutorial is contributed by Debatosh Chakraborty and Yate Asseke Ronald O. from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T10:23:05Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the '''ggplot2''' and '''dplyr''' packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in several applications.&lt;br /&gt;
* For example, natural language processing, image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML classifier algorithm includes:&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use '''Raisin dataset '''with two chosen variables or features to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;, &amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
Two columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are stored in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| This command generates a sequence of points spanning the range of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a Cartesian product of the two features to create the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''test_data ''' and '''train_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data ''' and '''train_data ''' to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the Source window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It does not involve training on the data, so its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
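The heuristic classifier described above can be exercised end to end on a couple of made-up points (the two points below are illustrative only):

```r
# The heuristic line from the tutorial: ecc = -0.0021 * minorAL + 1.445.
fit = function(x) (x * (-0.0021)) + 1.445

# Points on or above the line are labelled "Besni", points below "Kecimen"
# (equivalent to the tutorial's test, with the comparison inverted).
model_predict = function(x) {
  factor(ifelse(x$ecc >= fit(x$minorAL), "Besni", "Kecimen"))
}

# Two illustrative points: fit(200) = 1.025 and fit(300) = 0.815.
pts = data.frame(minorAL = c(200, 300), ecc = c(0.5, 1.2))
as.character(model_predict(pts))  # "Kecimen" "Besni"
```

The first point lies below the line, so it is labelled Kecimen; the second lies above it and is labelled Besni.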
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command uses the line we created to predict the class of every point in the feature-space grid.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using ggplot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the two classes in the training data.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes of the testing data and store them in the '''prediction_test''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate the performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the list we created.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is about 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window.&lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed: many Kecimen samples are misclassified as Besni.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Many different partitions can be drawn for the same problem.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we aim for a simple classifier with a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning &lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T10:19:39Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the '''ggplot2''' and '''dplyr''' packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning:&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate learning by finding patterns in data.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML is used in many applications.&lt;br /&gt;
* For example Natural Language Processing, image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML classifier algorithm includes:&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the '''Raisin dataset''' with two chosen variables or features to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
Two columns (&amp;quot;minorAL&amp;quot; and &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are stored in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| These commands generate sequences of 100 points spanning the ranges of '''minorAL''' and '''ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates the Cartesian product of the two sequences to form the feature-space grid.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space we created.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''test_data ''' and '''train_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data ''' and '''train_data ''' to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the Source window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that, we will define a line that partitions the data, acting as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It does not involve the training data, so its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that assigns each data point a class based on which side of the line it falls on.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command uses the line we created to predict the class of every point in the feature-space grid.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using ggplot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the two classes in the training data.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes of the testing data and store them in the '''prediction_test''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate the performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the list we created.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is about 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window.&lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem, many partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
We should aim for a simple classifier with a small test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Govt. of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T10:06:46Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the ggplot2 and dplyr packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in many areas.&lt;br /&gt;
* For example, Natural Language Processing, image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML Classifier algorithm includes&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the '''Raisin dataset''' with two chosen variables or features to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are stored in their respective range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| This command generates a sequence of points spanning the range of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a cartesian product of the two features to create a feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
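To see what '''expand.grid''' produces, here is a tiny sketch with made-up ranges (the values are illustrative only, not the raisin data):&lt;br /&gt;

```r
# Illustrative only: a tiny grid over two hypothetical feature ranges
X = seq(0, 1, length.out = 3)    # 3 points for the first feature
Y = seq(10, 20, length.out = 2)  # 2 points for the second feature
grid = expand.grid(minorAL = X, ecc = Y)
nrow(grid)                       # 6: one row per combination of X and Y
```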
|-&lt;br /&gt;
|  | '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|  | Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|  | [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|  | Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
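The 630 and 270 figures follow from the 70/30 split of the 900-row raisin data; a quick sketch of the arithmetic:&lt;br /&gt;

```r
# Illustrative check of the split sizes for the 900-row dataset
n = 900
size_train = 0.7 * n         # 630 rows sampled for training
size_test  = n - size_train  # 270 remaining rows for testing
```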
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''test_data ''' and '''train_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data ''' and '''train_data ''' to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the Source window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
Since it is not learned from the training data, its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
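To see how this line assigns a label, consider one hypothetical point (the value 250 for '''minorAL''' is made up for illustration):&lt;br /&gt;

```r
# Hypothetical example: where does the boundary lie at minorAL = 250?
fit = function(x) (x * (-0.0021)) + 1.445
fit(250)  # 0.92: a sample with ecc below 0.92 is labelled Kecimen,
          # one with ecc above 0.92 is labelled Besni
```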
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using GGPlot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The plot shows that the chosen line approximately separates the classes in the training data.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes of the testing data and store them in the '''prediction_test''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate the performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the confusion matrix object created earlier.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is approximately 69.6%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem, many partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
We should aim for a simple classifier with a small test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Govt. of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T09:57:37Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* Usage of the ggplot2 and dplyr packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in several applications.&lt;br /&gt;
* For example Natural Language Processing, Image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised''' learning: Prediction and Classification,&lt;br /&gt;
* '''Unsupervised''' learning: Clustering,&lt;br /&gt;
* '''Semi-supervised''' learning,&lt;br /&gt;
* '''Reinforcement''' learning.&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
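The contrast above can be sketched with a few lines of base R; this is a minimal illustration on synthetic, unlabeled data (the variable names are hypothetical, not part of the tutorial's script):

```r
# Unsupervised example: k-means groups unlabeled points into clusters
set.seed(1)
pts = matrix(rnorm(100), ncol = 2)  # 50 unlabeled 2-D observations
km  = kmeans(pts, centers = 2)      # group similar points into 2 clusters
km$cluster                          # cluster labels, usable for future analysis
```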
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
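The two supervised tasks above map onto two standard base R model calls; this is a minimal sketch on toy data (all variable names here are hypothetical):

```r
set.seed(1)
x     = runif(50)
y_num = 2 * x + rnorm(50, sd = 0.1)        # continuous response
y_lab = factor(ifelse(x > 0.5, "A", "B"))  # discrete (labeled) response

# Regression: predict a continuous value from the feature
reg_model = lm(y_num ~ x)

# Classification: predict a class label (logistic regression)
clf_model = glm(y_lab ~ x, family = binomial)
```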
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML classifier algorithm includes:&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
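The workflow steps above can be summarized as a short skeleton in base R; this is only a sketch, with a placeholder data frame '''df''' containing a factor column '''class''' (not the tutorial's actual dataset):

```r
# Split the data into training and testing sets
set.seed(1)
idx   = sample(1:nrow(df), size = floor(0.7 * nrow(df)))  # 70% for training
train = df[idx, ]
test  = df[-idx, ]

# Learn a partition of the feature space from the training data
model = glm(class ~ ., data = train, family = binomial)

# Evaluate the model on the held-out test set
pred = predict(model, newdata = test, type = "response")
```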
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the '''Raisin dataset''' with two chosen variables or features to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
Two columns (&amp;quot;minorAL&amp;quot; and &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are shown in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| These commands generate sequences of 100 points spanning the ranges of '''minorAL''' and '''ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a Cartesian product of the two features to create the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''test_data ''' and '''train_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data ''' and '''train_data ''' to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the Source window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It is not learned from the training data, so its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using GGPlot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the two classes of the data.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes from testing data and store it in the '''prediction_test '''variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate the performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the confusion matrix object created.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is about 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
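The reported accuracy can be checked directly from the confusion matrix shown above: correct predictions (the diagonal) divided by all test samples.

```r
# Accuracy = correct predictions / total test samples
correct = 50 + 138        # diagonal entries of the confusion matrix
total   = 50 + 82 + 0 + 138
correct / total           # 188 / 270 = 0.6962963, the ~69% reported above
```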
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem, many partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We could choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives no control over the error on test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we should aim for a simple classifier with a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We provide certificates to those who participate.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T09:13:15Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* Usage of the ggplot2 and dplyr packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in several applications.&lt;br /&gt;
* For example Natural Language Processing, Image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised''' learning: Prediction and Classification,&lt;br /&gt;
* '''Unsupervised''' learning: Clustering,&lt;br /&gt;
* '''Semi-supervised''' learning,&lt;br /&gt;
* '''Reinforcement''' learning.&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML classifier algorithm includes:&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the '''Raisin dataset''' with two chosen variables or features to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''Intro.R''' and the Raisin dataset file '''‘raisin.xlsx’'''.&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
Two columns (&amp;quot;minorAL&amp;quot; and &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are shown in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| These commands generate sequences of 100 evenly spaced points spanning the ranges of '''minorAL''' and '''ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates the Cartesian product of the two sequences to form the feature space grid.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
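As a self-contained sketch of these two steps, here is the same grid construction with made-up ranges standing in for range(data$minorAL) and range(data$ecc):&lt;br /&gt;

```r
# Illustrative sketch: build a 100 x 100 grid over two feature ranges.
# The numeric ranges below are invented; the tutorial derives them from the data.
X <- seq(200, 300, length.out = 100)   # stand-in for range(data$minorAL)
Y <- seq(0.3, 0.9, length.out = 100)   # stand-in for range(data$ecc)

# expand.grid() returns the Cartesian product: one row per (X, Y) pair.
feature <- expand.grid(minorAL = X, ecc = Y)
nrow(feature)   # 100 * 100 = 10000 grid points
```

Each of the 10,000 rows of '''feature''' is one grid point at which the classifier will later be evaluated.&lt;br /&gt;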
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the data points in the feature space we created.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''test_data ''' and '''train_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data ''' and '''train_data ''' to load them in the Source window.&lt;br /&gt;
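The row counts quoted above (630 training, 270 testing) can be verified with a toy 900-row data frame; the raisin data has 900 rows, but the values here are random stand-ins:&lt;br /&gt;

```r
# Toy data frame with 900 rows, mimicking the raisin data's size.
data <- data.frame(x = rnorm(900), y = rnorm(900))

set.seed(1)  # make the random split reproducible
index_split <- sample(1:nrow(data), size = 0.7 * nrow(data), replace = FALSE)

train_data <- data[index_split, ]    # the 630 sampled rows
test_data  <- data[-index_split, ]   # the remaining 270 rows
c(nrow(train_data), nrow(test_data))
```

Because '''replace = FALSE''', no row appears in both sets: every row goes to exactly one of the two.&lt;br /&gt;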
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the Source window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that, we define a line that partitions the data, acting as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
Since this line is not learned from the training data, its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
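As a minimal sketch, the same heuristic line and rule can be applied to two invented points, one on each side of the line:&lt;br /&gt;

```r
# The heuristic line: ecc = -0.0021 * minorAL + 1.445 (chosen by eye, not fitted).
fit <- function(x) ((x * (-0.0021)) + 1.445)

# Points below the line are labelled "Kecimen", points above it "Besni".
model_predict <- function(x) {
  factor(ifelse(x$ecc < fit(x$minorAL), "Kecimen", "Besni"))
}

# Two illustrative points: at minorAL = 200 the line sits at ecc = 1.025,
# so ecc = 0.5 falls below it and ecc = 1.2 falls above it.
toy <- data.frame(minorAL = c(200, 200), ecc = c(0.5, 1.2))
model_predict(toy)   # labels: Kecimen, Besni
```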
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We visualise the feature space and the partition line using '''ggplot2'''.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The plot shows that the chosen line approximately separates the two classes of the data.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes of the testing data and store them in the '''prediction_test''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate the performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the confusion matrix object.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is approximately 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our heuristic partition line is biased toward one class.&lt;br /&gt;
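The reported accuracy can be recomputed by hand from these counts; of the 270 test samples, 50 + 138 = 188 lie on the diagonal. A sketch building the same table as a plain matrix:&lt;br /&gt;

```r
# Confusion matrix counts from the console output:
# rows = prediction, columns = reference.
cm <- matrix(c(50, 0, 82, 138), nrow = 2,
             dimnames = list(Prediction = c("Besni", "Kecimen"),
                             Reference  = c("Besni", "Kecimen")))

# Accuracy = correct predictions / all predictions.
accuracy <- sum(diag(cm)) / sum(cm)
round(accuracy, 7)   # 0.6962963

# Off-diagonal counts per column give the misclassifications per true class.
misclassified <- colSums(cm) - diag(cm)   # Besni: 0, Kecimen: 82
```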
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem many partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce train misclassification error.&lt;br /&gt;
&lt;br /&gt;
But there will be no control over the error on test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we aim to choose a simple classifier with a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T09:08:46Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the '''ggplot2''' and '''dplyr''' packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in several applications.&lt;br /&gt;
* For example, Natural Language Processing, and image and speech recognition.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features.&lt;br /&gt;
* They learn to do this from the given features and labels of the data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* These clusters can then be labeled for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The Workflow of an ML Classifier algorithm includes:&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the '''Raisin dataset''' with two chosen variables, or features, to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on the Raisin data, please refer to the Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''Intro.R''' and the Raisin dataset file '''‘raisin.xlsx’'''.&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
Two columns (&amp;quot;minorAL&amp;quot; and &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
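These two preprocessing steps can be tried on a toy data frame; the column names follow the raisin data, but the values and the extra '''area''' column are invented:&lt;br /&gt;

```r
# Toy data frame with one extra column, mimicking the raw raisin data.
data <- data.frame(minorAL = c(250, 310),
                   ecc     = c(0.55, 0.80),
                   area    = c(9000, 12000),        # an unused extra column
                   class   = c("Kecimen", "Besni"))

# Keep only the two features and the target.
data <- data[c("minorAL", "ecc", "class")]

# Encode the target as a factor so classifiers treat it as categorical.
data$class <- factor(data$class)

names(data)          # "minorAL" "ecc" "class"
levels(data$class)   # "Besni" "Kecimen" (alphabetical by default)
```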
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are shown in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| This command generates a sequence of points spanning the range of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a Cartesian product of the two features to form the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
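As a side sketch of the same idea (toy ranges and made-up numbers, not the raisin data), the grid construction can be tried on its own:

```r
# Sketch of the grid construction with toy ranges (not the raisin data).
# seq() spans each feature's range; expand.grid() forms the Cartesian
# product, i.e. every (minorAL, ecc) combination appears once.
X = seq(0, 1, length.out = 5)        # stand-in for range(data$minorAL)
Y = seq(10, 20, length.out = 4)      # stand-in for range(data$ecc)
feature = expand.grid(minorAL = X, ecc = Y)

nrow(feature)   # 5 * 4 = 20 grid points
```

With 100 points per axis, as in the tutorial, the grid would have 10,000 points.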
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
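The split itself can be sketched independently; here a toy 900-row data frame stands in for the raisin data, and the indices are sampled exactly as in the tutorial:

```r
# 70/30 train/test split by sampling row indices without replacement
# (a toy 900-row data frame stands in for the raisin data).
set.seed(1)
d = data.frame(x = 1:900)
index_split = sample(1:nrow(d), size = 0.7 * nrow(d), replace = FALSE)

train_data = d[index_split, , drop = FALSE]    # 630 rows
test_data  = d[-index_split, , drop = FALSE]   # the remaining 270 rows
```

Sampling without replacement guarantees the two sets are disjoint and together cover all 900 rows.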
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''test_data ''' and '''train_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data ''' and '''train_data ''' to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the Source window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that, we will define a line that partitions the data, serving as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It is not fitted to the training data, so its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
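As a self-contained sketch of the dummy classifier (the line's coefficients are the ones used in the tutorial; the two test points are made up):

```r
# The heuristic line and the rule derived from it: points below the
# line are labelled Kecimen, points above it Besni.
fit = function(x) (x * (-0.0021)) + 1.445
model_predict = function(x) {
  factor(ifelse(fit(x$minorAL) > x$ecc, "Kecimen", "Besni"))
}

# Two made-up points: fit(100) is 1.235, so ecc = 1.0 falls below the
# line (Kecimen) and ecc = 1.4 falls above it (Besni).
toy = data.frame(minorAL = c(100, 100), ecc = c(1.0, 1.4))
model_predict(toy)   # Kecimen Besni
```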
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using '''ggplot2'''.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the two classes of the data.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes from testing data and store it in the '''prediction_test '''variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the confusion matrix object.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is approximately 69.6%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
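The reported accuracy follows directly from these counts; a quick hand check, rebuilding the matrix from the console output:

```r
# Rebuild the confusion matrix from the console counts and verify the
# accuracy: correct predictions (the diagonal) over the total.
cm = matrix(c(50, 0, 82, 138), nrow = 2,
            dimnames = list(Prediction = c("Besni", "Kecimen"),
                            Reference  = c("Besni", "Kecimen")))
accuracy = sum(diag(cm)) / sum(cm)
round(accuracy, 7)   # 0.6962963
```

That is 188 correct predictions out of 270 test samples.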
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem, many different partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We could choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives us no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we aim for a classifier that is simple and has a small test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T09:06:50Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the ggplot2 and dplyr packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning:&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process by finding patterns in data.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are used in many applications.&lt;br /&gt;
* For example, natural language processing, image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms learn from the given features and their labels. &lt;br /&gt;
* They then predict labels for unseen features.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* The clusters are then labeled for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML Classifier algorithm includes:&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
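The workflow above can be sketched end to end with toy data and a trivial threshold rule (all names and data here are made up for illustration, not the tutorial's raisin example):

```r
# Toy end-to-end classifier workflow: split, "learn" a partition,
# predict on held-out data, evaluate. All names and data are made up.
set.seed(1)
d = data.frame(x = runif(100))
d$y = factor(ifelse(d$x > 0.5, "A", "B"))   # labels follow a known rule

idx   = sample(1:nrow(d), size = 0.7 * nrow(d))   # split: 70 train, 30 test
train = d[idx, ]
test  = d[-idx, ]

rule = function(x) factor(ifelse(x > 0.5, "A", "B"), levels = levels(d$y))
pred = rule(test$x)                # predict on the held-out features
mean(pred == test$y)               # evaluate: accuracy, here 1 by design
```

Because the labels were generated by the same rule, accuracy is 1 here; with real data the rule must be learned from the training set.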
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the '''Raisin dataset''' with two chosen variables, or features, to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''Intro.R''' and the Raisin dataset '''Raisin.xlsx'''.&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;, &amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are stored in their respective range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| This command generates a sequence of points spanning the range of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a Cartesian product of the two features to form the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''test_data ''' and '''train_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data ''' and '''train_data ''' to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It doesn’t involve training data so performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using GGPlot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the training data classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes for the testing data and store them in the '''prediction_test''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the list we created.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is approximately 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
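The observations above can be checked by recomputing the metrics directly from the printed counts. This is a minimal R sketch using only the numbers shown in the console output (following caret's layout, rows are the predicted class and columns the reference class):

```r
# Recompute the metrics from the confusion-matrix counts shown above.
cm <- matrix(c(50, 0,       # reference Besni:   predicted Besni / Kecimen
               82, 138),    # reference Kecimen: predicted Besni / Kecimen
             nrow = 2,
             dimnames = list(Prediction = c("Besni", "Kecimen"),
                             Reference  = c("Besni", "Kecimen")))

# Accuracy = correctly classified / total = (50 + 138) / 270
accuracy <- sum(diag(cm)) / sum(cm)
print(round(accuracy, 7))   # 0.6962963

# Misclassified samples per true class: Besni 0, Kecimen 82
misclassified <- colSums(cm) - diag(cm)
print(misclassified)
```

This reproduces both the 69% accuracy and the skew noted above: every error comes from Kecimen samples being pushed to the Besni side of the line.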
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem many partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce train misclassification error.&lt;br /&gt;
&lt;br /&gt;
But then we have no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
We can aim to choose a classifier which is simple with a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T08:53:16Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* To use GGPlot2 and dplyr package.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in several applications.&lt;br /&gt;
* For example Natural Language Processing, Image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML Classifier algorithm includes:&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
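The workflow bullets above can be sketched end to end in a few lines of base R. This is an illustrative example only, using the built-in iris dataset and a deliberately weak majority-class dummy classifier rather than the tutorial's Raisin data:

```r
# A minimal sketch of the ML classifier workflow, using built-in iris data.
# (Illustrative only; the tutorial itself uses the Raisin dataset.)
set.seed(1)

# Split the data into 70% training and 30% testing sets.
idx   <- sample(seq_len(nrow(iris)), size = 0.7 * nrow(iris))
train <- iris[idx, ]
test  <- iris[-idx, ]

# A dummy classifier: always predict the most frequent training class.
majority <- names(which.max(table(train$Species)))
pred     <- factor(rep(majority, nrow(test)), levels = levels(iris$Species))

# Evaluate on the held-out test set with a simple accuracy metric.
accuracy <- mean(pred == test$Species)
print(accuracy)
```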
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use '''Raisin dataset '''with two chosen variables or features to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''Intro.R''' and the Raisin dataset '''Raisin.xlsx'''.&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;, &amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are shown in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| This command generates a sequence of points spanning the range of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a cartesian product of the two features to create a feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
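The grid construction can also be checked on its own. A small self-contained sketch with hypothetical ranges shows that '''expand.grid''' of two length-100 sequences yields 100 x 100 = 10,000 grid points:

```r
# A small self-contained check of the grid construction.
# (The ranges here are hypothetical; the tutorial uses range(data$minorAL)
# and range(data$ecc) instead.)
X <- seq(0, 1, length.out = 100)
Y <- seq(0, 1, length.out = 100)

# expand.grid forms the cartesian product of the two sequences.
grid <- expand.grid(minorAL = X, ecc = Y)

nrow(grid)   # 100 * 100 = 10000 grid points
head(grid)   # the first factor (minorAL) varies fastest
```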
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''test_data ''' and '''train_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It doesn’t involve training data so performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using GGPlot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the training data classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes for the testing data and store them in the '''prediction_test''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the confusion matrix object created earlier.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is approximately 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the output in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem, many different partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But this gives no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we should aim for a classifier that is simple and has a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T08:51:08Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the '''ggplot2''' and '''dplyr''' packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in many areas.&lt;br /&gt;
* Examples include natural language processing and image and speech recognition.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML Classifier algorithm includes:&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the '''Raisin dataset''' with two chosen variables or features to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''Intro.R''' and the '''Raisin dataset''' '''Raisin.xlsx'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands compute the range of the feature variables '''minorAL''' and '''ecc'''.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are stored in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| These commands generate sequences of 100 points spanning the ranges of '''minorAL''' and '''ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a cartesian product of the two features to create a feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|  | '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|  | Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|  | [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|  | Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that, we will define a line that partitions the data, as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It does not involve learning from the training data, so its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using '''ggplot2'''. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
Overall plot shows that the chosen line approximately separates the training data classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes for the testing data and store them in the '''prediction_test''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the confusion matrix object created earlier.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is approximately 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the output in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem, many different partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But this gives no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we should aim for a classifier that is simple and has a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T08:45:03Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the '''ggplot2''' and '''dplyr''' packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in several applications.&lt;br /&gt;
* For example Natural Language Processing, Image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML Classifier algorithm includes&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use '''Raisin dataset '''with two chosen variables or features to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R''' and the '''Raisin Dataset''' '''‘Raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are stored in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| This command generates a sequence of points spanning the range of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a Cartesian product of the two features to create the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
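As a side note (not part of the recorded steps), the grid construction can be checked on toy values; the names minorAL and ecc follow the tutorial, but the values here are made up:

```r
# Toy illustration: expand.grid forms the Cartesian product of two sequences
X = c(1, 2)        # made-up values standing in for the minorAL sequence
Y = c(0.1, 0.2)    # made-up values standing in for the ecc sequence
grid = expand.grid(minorAL = X, ecc = Y)
nrow(grid)         # 4 combinations: every pairing of X with Y
```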
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment window clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
Since it does not learn from the training data, its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
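As an aside, the heuristic classifier above can be exercised on a tiny hand-made data frame (a sketch with made-up inputs; it uses = assignment and an equivalent reversed comparison, not the exact commands from the script):

```r
# Sketch: the heuristic line classifier, applied to two toy points
fit = function(x) (x * (-0.0021)) + 1.445
model_predict = function(x) {
  # points above the line are labelled Besni, points below it Kecimen
  factor(ifelse(x$ecc > fit(x$minorAL), "Besni", "Kecimen"))
}
toy = data.frame(minorAL = c(220, 300), ecc = c(0.75, 0.95))
model_predict(toy)   # Kecimen (below the line), Besni (above it)
```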
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using GGPlot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the two classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes from testing data and store it in the '''prediction_test '''variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| It fetches the accuracy metric from the confusion matrix object.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is approximately 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
82 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
0 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem many partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we aim to choose a simple classifier with a smaller test misclassification error.&lt;br /&gt;
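One way to make "test misclassification error" concrete is the fraction of disagreeing labels; this toy sketch (made-up labels, not the recorded session) shows the idea:

```r
# Toy sketch: misclassification error is the fraction of labels that disagree
misclass_rate = function(truth, pred) mean(as.character(truth) != as.character(pred))
truth = factor(c("Besni", "Kecimen", "Kecimen", "Besni"))
pred  = factor(c("Besni", "Besni",   "Kecimen", "Besni"))
misclass_rate(truth, pred)   # 0.25: one of four labels is wrong
```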
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T08:43:34Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the '''ggplot2''' and '''dplyr''' packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in several applications.&lt;br /&gt;
* For example Natural Language Processing, Image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML classifier algorithm includes:&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
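The steps above can be sketched end to end in R. This is a minimal illustration on the built-in '''iris''' data with a nearest-class-mean rule; the dataset and the rule are stand-ins, not the Raisin data or the classifier used later in this tutorial.

```r
# Minimal sketch of the classifier workflow: split, learn a partition, evaluate.
# Uses the built-in iris data; the nearest-class-mean rule is only illustrative.
set.seed(1)
idx = sample(1:nrow(iris), size = 0.7 * nrow(iris))  # 70/30 train-test split
train = iris[idx, ]
test  = iris[-idx, ]

# "Learn" a partition of the feature space: one class mean per species
centers = tapply(train$Petal.Length, train$Species, mean)

# Predict by assigning each point to the nearest class mean
predict_species = function(x) {
  names(centers)[apply(abs(outer(x, centers, "-")), 1, which.min)]
}

pred = predict_species(test$Petal.Length)
accuracy = mean(pred == as.character(test$Species))  # evaluation on the test set
```

The same four stages (split, learn, predict, evaluate) recur with the Raisin data below.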
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use '''Raisin dataset '''with two chosen variables to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
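A missing package can be detected programmatically before calling '''library()'''. The following sketch only reports what still needs installing, rather than installing it directly:

```r
# Report which of the required packages (if any) still need to be installed
pkgs = c("readxl", "caret", "ggplot2")
to_install = pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) {
  message("Run: install.packages(c(", toString(shQuote(to_install)), "))")
}
```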
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
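What '''factor()''' does to the class column can be seen on a toy vector (the values below are invented for illustration):

```r
# factor() turns character labels into a categorical variable with fixed levels
cls = c("Besni", "Kecimen", "Besni")
f = factor(cls)
levels(f)       # the two distinct class labels
as.numeric(f)   # integer codes per observation, as used later for plotting
```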
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are shown in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| This command generates a sequence of points spanning the range of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a cartesian product of the two features to create a feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
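The effect of '''expand.grid()''' is easy to see on a tiny grid (the sequences below are stand-ins, not the real ranges of minorAL and ecc):

```r
# expand.grid() forms the Cartesian product of the two sequences
X = seq(0, 1, length.out = 3)     # 3 stand-in values for minorAL
Y = seq(10, 20, length.out = 2)   # 2 stand-in values for ecc
grid = expand.grid(minorAL = X, ecc = Y)
nrow(grid)  # 3 * 2 = 6 grid points covering the toy feature space
```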
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment window clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It doesn’t involve training data so performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
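The rule can be checked on a couple of hand-made points (the values below are invented; the comparison is written with ">=", which is the tutorial's test with the two branches swapped):

```r
# The heuristic line and decision rule from the tutorial, on invented points.
# Points below the line are labelled Kecimen, points on or above it Besni.
fit = function(x) ((x * (-0.0021)) + 1.445)
model_predict = function(x) {
  factor(ifelse(x$ecc >= fit(x$minorAL), "Besni", "Kecimen"))
}
toy = data.frame(minorAL = c(200, 400), ecc = c(0.5, 1.2))
model_predict(toy)  # Kecimen, Besni
```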
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using '''ggplot2'''. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the two classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes from testing data and store it in the '''prediction_test '''variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the confusion matrix object.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is approximately 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
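The reported accuracy can be recomputed by hand from the confusion matrix shown above:

```r
# Rebuild the confusion matrix printed in the console and recompute accuracy
cm = matrix(c(50, 0, 82, 138), nrow = 2,
            dimnames = list(Prediction = c("Besni", "Kecimen"),
                            Reference  = c("Besni", "Kecimen")))
accuracy = sum(diag(cm)) / sum(cm)  # (50 + 138) / 270
round(accuracy, 7)                  # 0.6962963
```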
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem many partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives us no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we aim to choose a simple classifier with a small test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T08:37:47Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the chosen dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the ggplot2 and dplyr packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in several applications.&lt;br /&gt;
* For example Natural Language Processing, Image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The workflow of an ML classifier algorithm includes:&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use '''Raisin dataset '''with two chosen variables to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are shown in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| These commands generate sequences of points spanning the ranges of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command takes the Cartesian product of the two sequences to create the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|  | '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|  | Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|  | [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|  | Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment window clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It doesn’t involve the training data, so its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels as numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using '''ggplot2'''. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the training data classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes from testing data and store it in the '''prediction_test '''variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| It fetches the accuracy metric from the list created.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is about 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem many partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce train misclassification error.&lt;br /&gt;
&lt;br /&gt;
But there will be no control over the error on test data.&lt;br /&gt;
&lt;br /&gt;
We can aim to choose a classifier which is simple with a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Govt. of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T08:34:47Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of the dummy classifier&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the '''ggplot2''' and '''dplyr''' packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automatically learn patterns from data.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in several applications.&lt;br /&gt;
* For example Natural Language Processing, Image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The Workflow of an ML Classifier algorithm&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use '''Raisin dataset '''with two chosen variables to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R''' and the '''Raisin Dataset ‘Raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are shown in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| These commands generate sequences of points spanning the ranges of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command takes the Cartesian product of the two sequences to create the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|  | '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|  | Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|  | [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|  | Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
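The 630/270 counts follow from a 70/30 split; a sketch of the arithmetic (the 900-row total is assumed from the counts quoted in this narration):

```r
# Sketch: a 70/30 split of 900 rows (total assumed from the narration)
n = 900
train_n = floor(0.7 * n)   # 630 rows for training
test_n = n - train_n       # 270 rows for testing
c(train_n, test_n)
```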
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment window clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It doesn’t involve the training data, so its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
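The factor-to-number encoding works because '''as.numeric''' on a factor returns its integer level codes; a minimal sketch:

```r
# Sketch: as.numeric on a factor gives its integer level codes
f = factor(c("Besni", "Kecimen", "Besni"))
as.numeric(f)   # 1 2 1, since factor levels sort alphabetically
```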
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using ggplot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the training data classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes for the testing data and store them in the '''prediction_test '''variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| It fetches the accuracy metric from the confusion matrix object created.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is approximately 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
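The 69% accuracy quoted earlier can be checked directly from these counts; a sketch of the arithmetic:

```r
# Sketch: accuracy from the confusion matrix counts shown above
correct = 50 + 138          # diagonal: correctly classified samples
total = 50 + 82 + 0 + 138   # all 270 testing samples
round(correct / total, 7)   # 0.6962963
```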
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem many partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce train misclassification error.&lt;br /&gt;
&lt;br /&gt;
But then there would be no control over the misclassification error on test data.&lt;br /&gt;
&lt;br /&gt;
We can aim to choose a classifier which is simple with a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T07:09:53Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the ggplot2 and dplyr packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn from data.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are applied in several applications.&lt;br /&gt;
* For example Natural Language Processing, Image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The Workflow of an ML Classifier algorithm&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
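The workflow steps above can be sketched as one generic helper (the function names, arguments and the 70/30 ratio are illustrative placeholders, not from this tutorial):

```r
# Sketch of the generic workflow: split, fit, predict, score
# (fit_fn and predict_fn are hypothetical placeholders)
run_classifier = function(data, fit_fn, predict_fn, seed = 1) {
  set.seed(seed)
  idx = sample(seq_len(nrow(data)), size = floor(0.7 * nrow(data)))
  model = fit_fn(data[idx, ])                 # learn from the training split
  preds = predict_fn(model, data[-idx, ])     # predict on the testing split
  mean(preds == data[-idx, ]$class)           # accuracy metric
}
```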
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the '''Raisin dataset''' with two chosen variables to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are shown in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| These commands generate sequences of 100 points each, spanning the ranges of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a cartesian product of the two features to create a feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
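'''expand.grid''' builds the cartesian product described above; a tiny sketch of its behaviour (values are illustrative):

```r
# Sketch: expand.grid forms every combination of its inputs
g = expand.grid(minorAL = c(1, 2), ecc = c(0.5, 0.9))
nrow(g)   # 4 rows, one per (minorAL, ecc) pair
```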
|-&lt;br /&gt;
|  | '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|  | Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|  | [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|  | Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment window clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It doesn’t involve the training data, so its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using ggplot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
Overall plot shows that the chosen line approximately separates the training data classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes for the testing data and store them in the '''prediction_test''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the confusion matrix object created earlier.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is about 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
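The two misclassification counts can also be read off programmatically. This standalone sketch rebuilds the printed table as a small matrix; the counts are copied from the output above, and the row/column orientation follows caret's convention of rows as predictions:

```r
# Confusion table: rows are predictions, columns are the reference.
cm = matrix(c(50, 0, 82, 138), nrow = 2,
            dimnames = list(Prediction = c("Besni", "Kecimen"),
                            Reference  = c("Besni", "Kecimen")))
cm["Kecimen", "Besni"]   # Besni samples predicted as Kecimen: 0
cm["Besni", "Kecimen"]   # Kecimen samples predicted as Besni: 82
```

The off-diagonal entries are exactly the misclassification counts quoted in the narration.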
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem, many different partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We could choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives us no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we should aim for a simple classifier with a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who participate.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T07:08:27Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* How to use the '''ggplot2''' and '''dplyr''' packages.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn without being explicitly programmed.&lt;br /&gt;
* ML algorithms automate the learning process from data through patterns.&lt;br /&gt;
* Their primary role is prediction, classification or clustering of data.&lt;br /&gt;
* ML algorithms are used in many applications.&lt;br /&gt;
* For example, Natural Language Processing, image and speech recognition, etc.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms predict labels for unseen features &lt;br /&gt;
* They predict based on given features and labels of data.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The Workflow of an ML Classifier algorithm&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
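The workflow above can be sketched end to end in R on a tiny made-up data frame. The toy values and the trivial decision rule below are illustrative assumptions, not the tutorial's Raisin model:

```r
# Toy end-to-end classifier workflow: split, "train", evaluate.
# All values here are made up for illustration only.
data = data.frame(x = c(1, 2, 3, 4, 5, 6),
                  class = factor(c("A", "A", "A", "B", "B", "B")))

set.seed(1)
index_split = sample(1:nrow(data), size = 0.5 * nrow(data), replace = FALSE)
train_data = data[index_split, ]    # used to learn the partition
test_data  = data[-index_split, ]   # held out for evaluation

# A trivial rule standing in for a learned model:
# label a point "B" when x - 3.5 is positive, else "A".
predict_fn = function(d) factor(ifelse(sign(d$x - 3.5) == 1, "B", "A"),
                                levels = c("A", "B"))

accuracy = mean(predict_fn(test_data) == test_data$class)
```

The same four steps, partition, split, predict, evaluate, are carried out on the Raisin data in the rest of this tutorial.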
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use '''Raisin dataset '''with two chosen variables to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are stored in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| These commands generate sequences of points spanning the ranges of '''minorAL''' and '''ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates the Cartesian product of the two sequences to form the feature space grid.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
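To see what expand.grid does on its own, here is a minimal sketch with two tiny toy vectors (the values are assumptions for illustration, not the Raisin ranges):

```r
# expand.grid pairs every value of X with every value of Y,
# giving one grid point per combination (toy values).
X = c(1, 2)
Y = c(10, 20, 30)
grid = expand.grid(minorAL = X, ecc = Y)
nrow(grid)   # 6 rows: the 2 x 3 cartesian product
```

In the tutorial each sequence has 100 points, so the feature grid holds 100 x 100 = 10,000 rows.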
|-&lt;br /&gt;
|  | '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|  | Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|  | [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|  | Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment window clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
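The 630/270 figures can be checked in isolation; this sketch assumes only that the dataset has 900 rows, as the Raisin data does:

```r
# Verifying the 70/30 split arithmetic for a 900-row dataset.
set.seed(1)
n = 900
index_split = sample(1:n, size = 0.7 * n, replace = FALSE)
length(index_split)       # 630 unique training indices
n - length(index_split)   # 270 remaining testing rows
```

Because sample() draws without replacement, the training indices are unique and the negative subscript selects exactly the remaining rows.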
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| Click in the '''Source''' window and type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It does not learn from the training data, so its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
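As a standalone sketch, the same heuristic line can be applied to two made-up points, one on each side of it. Here sign() replaces the script's less-than comparison, and the toy points are assumptions chosen to land on opposite sides:

```r
# The heuristic partition line and the dummy classifier.
fit = function(x) ((x * (-0.0021)) + 1.445)

# A point is "Kecimen" when its ecc lies below the line,
# i.e. when fit(minorAL) - ecc is positive.
model_predict = function(x) {
  factor(ifelse(sign(fit(x$minorAL) - x$ecc) == 1, "Kecimen", "Besni"))
}

toy = data.frame(minorAL = c(200, 200), ecc = c(0.5, 1.5))
model_predict(toy)   # Kecimen Besni
```

fit(200) is about 1.025, so the first point (ecc = 0.5) falls below the line and the second (ecc = 1.5) above it.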
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using ggplot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
Overall plot shows that the chosen line approximately separates the training data classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes for the testing data and store them in the '''prediction_test''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the confusion matrix object created earlier.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
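What overall[&quot;Accuracy&quot;] returns can be reproduced by hand from the confusion counts printed later in this tutorial: correct predictions divided by all predictions.

```r
# Accuracy from the tutorial's confusion table counts.
correct = 50 + 138            # diagonal: correctly classified samples
total   = 50 + 82 + 0 + 138   # all 270 test samples
correct / total               # 0.6962963, about 69%
```

This matches the value shown in the console in the next step.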
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is about 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem, many different partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We could choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives us no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we should aim for a simple classifier with a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who participate.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Govt. of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-04T07:06:19Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Supervised and Unsupervised Learning&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* Using GGPlot2 and dplyr package.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn without being explicitly programmed.&lt;br /&gt;
* ML algorithms automatically learn patterns from data.&lt;br /&gt;
* Their primary role is the prediction, classification or clustering of data.&lt;br /&gt;
* ML is applied in many domains.&lt;br /&gt;
* Examples include Natural Language Processing and image and speech recognition.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised '''learning: Prediction and Classification''',''' &lt;br /&gt;
* '''Unsupervised '''learning''': '''Clustering''','''&lt;br /&gt;
* '''Semi-supervised '''learning&lt;br /&gt;
* '''Reinforcement '''learning'''.'''&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* The algorithm learns from the given features and their labels.&lt;br /&gt;
* It then predicts labels for unseen features.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* And label them for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The Workflow of an ML Classifier algorithm&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use '''Raisin dataset '''with two chosen variables to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please refer to Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R '''and the '''Raisin Dataset ‘Raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
2 columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
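As an aside, the effect of '''factor()''' can be seen on a tiny hypothetical vector (the values below are illustrative, not taken from the dataset):&lt;br /&gt;

```r
# factor() converts character labels into a categorical variable.
cls &lt;- c("Kecimen", "Besni", "Besni", "Kecimen")
f &lt;- factor(cls)
levels(f)      # "Besni" "Kecimen" -- levels are sorted alphabetically
as.numeric(f)  # 2 1 1 2 -- each label is encoded by its level index
```

This numeric encoding is what classifiers and plotting functions rely on internally.&lt;br /&gt;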
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are stored in their respective range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| This command generates a sequence of points spanning the range of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command forms the Cartesian product of the two features to create the feature space grid.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
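A small sketch of how '''seq()''' and '''expand.grid()''' build the grid; the sizes are shrunk from 100 and the numbers are arbitrary, purely for illustration:&lt;br /&gt;

```r
# expand.grid forms the Cartesian product of the coordinate vectors.
X &lt;- seq(0, 1, length.out = 3)     # 3 points instead of 100
Y &lt;- seq(10, 20, length.out = 2)   # 2 points instead of 100
grid &lt;- expand.grid(minorAL = X, ecc = Y)
nrow(grid)   # 3 * 2 = 6 grid points
```

With 100 points per axis, the tutorial's grid therefore has 100 * 100 = 10,000 points.&lt;br /&gt;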
|-&lt;br /&gt;
|  | '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the feature space created&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|  | Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|  | [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|  | Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment window clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
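The same split logic can be tried on a toy data frame first; everything below ('''df''', '''idx''', the 10 rows) is a made-up illustration of '''sample()''' with replace = FALSE:&lt;br /&gt;

```r
# Reproducible 70/30 split with base R.
df &lt;- data.frame(x = 1:10)
set.seed(1)                    # fix the random draw
idx &lt;- sample(1:nrow(df), size = 0.7 * nrow(df), replace = FALSE)
train &lt;- df[idx, , drop = FALSE]    # 7 sampled rows
test  &lt;- df[-idx, , drop = FALSE]   # the 3 remaining rows
c(nrow(train), nrow(test))     # 7 3
```

Because replace = FALSE, no row appears in both sets, which is why the tutorial's 900-row dataset yields 630 training and 270 testing rows.&lt;br /&gt;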
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
Since it does not learn from the training data, its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
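To see the rule in action, the two functions above can be applied to a couple of hypothetical points (the coordinates are invented for illustration):&lt;br /&gt;

```r
# The line ecc = -0.0021 * minorAL + 1.445 partitions the plane:
# points below it are labelled "Kecimen", points above it "Besni".
fit &lt;- function(x) (x * (-0.0021)) + 1.445
model_predict &lt;- function(d) {
  factor(ifelse(d$ecc &lt; fit(d$minorAL), "Kecimen", "Besni"))
}
pts &lt;- data.frame(minorAL = c(200, 400), ecc = c(0.5, 0.9))
model_predict(pts)   # Kecimen Besni
```

For the first point, fit(200) = 1.025 and 0.5 lies below it, so the label is Kecimen; for the second, fit(400) = 0.605 and 0.9 lies above it, so the label is Besni.&lt;br /&gt;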
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualizing the feature space and the partition line using GGPlot2.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the two classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes for the testing data and store them in the '''prediction_test '''variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| This fetches the accuracy metric from the confusion matrix object.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is approximately 69.6%.&lt;br /&gt;
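The same number can be recovered by hand from the confusion table shown in the console: correct predictions lie on the diagonal. The matrix below is rebuilt manually to mirror that output:&lt;br /&gt;

```r
# Rebuild the printed confusion table and compute accuracy from it.
tab &lt;- matrix(c(50, 0, 82, 138), nrow = 2,
              dimnames = list(Prediction = c("Besni", "Kecimen"),
                              Reference  = c("Besni", "Kecimen")))
sum(diag(tab)) / sum(tab)   # (50 + 138) / 270 = 0.6962963
```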
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem, many different partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We could choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives us no control over the error on the test data.&lt;br /&gt;
&lt;br /&gt;
Instead, we aim for a classifier that is simple and has a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who participate.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Govt. of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Introduction-to-Machine-Learning-in-R/English"/>
				<updated>2024-06-03T13:02:54Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: Created page with &amp;quot;'''Title of the script''': Introduction to Machine Learning in R  '''Author''': Debatosh Chakraborty  '''Keywords''': R, RStudio, machine learning, supervised, unsupervised, v...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Introduction to Machine Learning in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Introduction to Machine Learning in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* Using GGPlot2 and dplyr package.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Machine Learning'''&lt;br /&gt;
&lt;br /&gt;
'''   '''&lt;br /&gt;
&lt;br /&gt;
|| About machine learning&lt;br /&gt;
&lt;br /&gt;
* ML enables computers to learn without being explicitly programmed.&lt;br /&gt;
* ML algorithms automatically learn patterns from data.&lt;br /&gt;
* Their primary role is the prediction, classification or clustering of data.&lt;br /&gt;
* ML is applied in many domains.&lt;br /&gt;
* Examples include Natural Language Processing and image and speech recognition.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Types of Machine Learning''' &lt;br /&gt;
|| ML algorithms include the following types and tasks: &lt;br /&gt;
* '''Supervised''' learning: Prediction and Classification,&lt;br /&gt;
* '''Unsupervised''' learning: Clustering,&lt;br /&gt;
* '''Semi-supervised''' learning,&lt;br /&gt;
* '''Reinforcement''' learning.&lt;br /&gt;
&lt;br /&gt;
In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning algorithms. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Supervised and Unsupervised Learning'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Supervised learning: Labeled data &lt;br /&gt;
* ML algorithms learn from the given features and labels of the data.&lt;br /&gt;
* They then predict labels for unseen features.&lt;br /&gt;
&lt;br /&gt;
Unsupervised learning: Unlabeled data&lt;br /&gt;
* ML algorithms develop a mechanism to group similar features into clusters.&lt;br /&gt;
* These clusters are then labeled for future analysis.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Classification and Regression'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
* Supervised learning consists of Regression and Classification.&lt;br /&gt;
* '''Regression''' is applied to predict and learn continuous-valued responses from features. &lt;br /&gt;
* Regression techniques include Linear, Spline, Ridge, Lasso, and others.&lt;br /&gt;
* '''Classification''' is applied to predict the class of a discrete (labeled) response from features. &lt;br /&gt;
* Classification techniques include Logistic Regression, Decision Tree, SVM, and others.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Workflow of an ML Classifier algorithm'''&lt;br /&gt;
|| The Workflow of an ML Classifier algorithm&lt;br /&gt;
* Feature Space: Collection of all possible values of the features.&lt;br /&gt;
* A classification algorithm partitions the feature space into a number of classes.&lt;br /&gt;
* Data is split into training and testing sets to learn and evaluate the algorithm.&lt;br /&gt;
* The model learns from the training data to create partitions of feature space.&lt;br /&gt;
* The model is evaluated on the test dataset through performance metrics.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Dataset'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the '''Raisin''' dataset with two chosen variables to understand a classification problem.&lt;br /&gt;
&lt;br /&gt;
For more information on the Raisin data, please refer to the Additional Reading Material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''Intro.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''Intro.R''' and the folder '''Introduction.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''Introduction '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Introduction''' folder as my Working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will introduce classification on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click Intro.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to Intro.R in RStudio.&lt;br /&gt;
|| Let us open the script '''Intro.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
Script '''Intro.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the '''Environment''' tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data.&lt;br /&gt;
&lt;br /&gt;
Two columns (&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;) are chosen as features.&lt;br /&gt;
&lt;br /&gt;
The class column is chosen as a target variable.&lt;br /&gt;
&lt;br /&gt;
We convert the target variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We will now understand the feature space of this data.&lt;br /&gt;
|- &lt;br /&gt;
|| '''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_minor_al &amp;lt;- range(data$minorAL)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''range_ecc &amp;lt;- range(data$ecc)'''&lt;br /&gt;
|| These commands show the range of the feature variables '''minorAL''' and''' ecc.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The minimum and maximum values of '''minorAL''' and '''ecc''' are stored in their range variables.&lt;br /&gt;
|- &lt;br /&gt;
|| '''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
|| We will now use the range to generate grid points to construct the feature space.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(data$minorAL), max(data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(data$ecc), max(data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
|| These commands generate sequences of 100 points spanning the ranges of '''minorAL '''and''' ecc'''.&lt;br /&gt;
&lt;br /&gt;
This command creates a Cartesian product of the two features to form the feature space grid.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
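As a side check, the Cartesian-product behaviour of '''expand.grid''' can be seen on a tiny example; the values below are illustrative only and are not taken from the Raisin data.&lt;br /&gt;

```r
# Illustrative values only, not the Raisin data
X = c(1, 2, 3)       # three candidate minorAL values
Y = c(0.1, 0.2)      # two candidate ecc values
grid = expand.grid(minorAL = X, ecc = Y)
nrow(grid)           # 6, one row per (minorAL, ecc) combination
```

In the tutorial itself each sequence has 100 points, so the resulting feature grid has 100 x 100 = 10000 rows.&lt;br /&gt;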
|-&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We will now plot the created feature space.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| '''ggplot(data = data, aes(x = minorAL, y = ecc)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(aes(color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Feature Space&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| These commands plot the data points in the feature space.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|-&lt;br /&gt;
|| Drag boundaries.&lt;br /&gt;
|| Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''Intro.R''' in the Source window, and type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
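These row counts follow from the 70/30 split of the 900-row dataset; a minimal sketch of the arithmetic:&lt;br /&gt;

```r
n = 900       # rows in the Raisin dataset
0.7 * n       # 630 rows are sampled for training
n - 0.7 * n   # 270 remaining rows are used for testing
```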
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment window clearly&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| Here we try to partition the '''feature space''' to construct the classifier.&lt;br /&gt;
&lt;br /&gt;
To begin with, one might construct a '''heuristic '''line to build the classifier.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''fit = function(x)((x * (-0.0021)) + 1.445)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''model_predict &amp;lt;- function(x){'''&lt;br /&gt;
&lt;br /&gt;
'''factor(ifelse(x$ecc &amp;lt; fit(x$minorAL), &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| Let us describe the steps of the classification algorithm.&lt;br /&gt;
&lt;br /&gt;
For that we will define a line to partition the data as a dummy classifier.&lt;br /&gt;
&lt;br /&gt;
It is not fitted to the training data, so its performance may be poor.&lt;br /&gt;
&lt;br /&gt;
We define a function that separates data points belonging to either side of the line.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
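To see what this heuristic line does, we can evaluate it at one point; the input value 300 below is an arbitrary illustration, not a value from the script.&lt;br /&gt;

```r
# The heuristic partition line defined in the script
fit = function(x) ((x * (-0.0021)) + 1.445)
fit(300)   # 0.815: a point with minorAL = 300 is labelled Kecimen
           # when its ecc is below 0.815, and Besni otherwise
```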
&lt;br /&gt;
|- &lt;br /&gt;
|| '''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Let’s use the line to classify the feature space and draw the decision boundary.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$class &amp;lt;- model_predict(feature)'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''feature$classnum &amp;lt;- as.numeric(feature$class)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
This command will use the line created to predict the class of every point in the grid of feature space.&lt;br /&gt;
&lt;br /&gt;
This command encodes the class string labels into numbers suitable for plotting.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''feature''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the data in the Source window.&lt;br /&gt;
|| Drag boundary to see the Environment window.&lt;br /&gt;
&lt;br /&gt;
Click on '''feature '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''feature set '''with the predicted classes loads in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| '''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data= feature, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_abline(slope = -0.0021, intercept = 1.445, size = 1.2)+'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;Data Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are visualising the feature space and the partition line using GGPlot2. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the plot window.&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
The overall plot shows that the chosen line approximately separates the training data classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| Let us see how well the partition performs on the testing dataset.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''prediction_test = model_predict(test_data)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We predict the classes from testing data and store it in the '''prediction_test '''variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now measure the performance of the classification.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window, type the command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix &amp;lt;- confusionMatrix(test_data$class,prediction_test)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We use the '''confusionMatrix''' function from the '''caret''' package to calculate performance metrics.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
|- &lt;br /&gt;
|| '''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$overall[&amp;quot;Accuracy&amp;quot;]'''&lt;br /&gt;
|| It fetches the accuracy metric from the confusion matrix object.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Drag boundary to see the console window clearly&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Accuracy'''&lt;br /&gt;
&lt;br /&gt;
0.6962963&lt;br /&gt;
&lt;br /&gt;
|| The accuracy on the testing dataset is about 69%.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly&lt;br /&gt;
&lt;br /&gt;
Let us now view the confusion matrix of the testing dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''console''' window&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console window'''&lt;br /&gt;
&lt;br /&gt;
Reference&lt;br /&gt;
&lt;br /&gt;
Prediction Besni Kecimen&lt;br /&gt;
&lt;br /&gt;
Besni 50 82&lt;br /&gt;
&lt;br /&gt;
Kecimen 0 138&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the console window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
0 samples of class Besni have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
82 samples of class Kecimen have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
We can see that our partition line is skewed.&lt;br /&gt;
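The 69% accuracy reported earlier can be recomputed from these counts; this is a quick sanity check, using the numbers from the console output above.&lt;br /&gt;

```r
# Counts taken from the confusion matrix above
correct = 50 + 138            # diagonal: correctly classified samples
total = 50 + 82 + 0 + 138     # all 270 test samples
correct / total               # 0.6962963, about 69 percent
```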
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| For the same problem, many partitions can be drawn.&lt;br /&gt;
&lt;br /&gt;
We can choose a complicated partition to reduce the training misclassification error.&lt;br /&gt;
&lt;br /&gt;
But that gives no guarantee on the test data.&lt;br /&gt;
&lt;br /&gt;
We should aim to choose a simple classifier with a smaller test misclassification error.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Machine Learning&lt;br /&gt;
* Classification and Regression Problems&lt;br /&gt;
* Workflow of an ML Classifier Algorithm&lt;br /&gt;
* Visualizing Feature Space&lt;br /&gt;
* Constructing a dummy classifier&lt;br /&gt;
* Evaluation of an ML algorithm&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
*Use a vertical line as a classifier to partition the feature space.&lt;br /&gt;
* Plot the decision boundary for the same.&lt;br /&gt;
* Evaluate the classifier on the test dataset&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from our team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
R Activities&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the Textbook Companion, Lab Migration and the Case Study Projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit the website.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English"/>
				<updated>2024-05-31T10:31:10Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Logistic Regression&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, classification, logistic regression, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Logistic Regression in R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about &lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using '''Raisin '''dataset'''.'''&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us learn what '''logistic regression''' is.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression is a statistical model used for classification.&lt;br /&gt;
&lt;br /&gt;
It models the probability of success as a function of the explanatory variables.&lt;br /&gt;
&lt;br /&gt;
* It predicts the probability, unlike the response in linear regression.&lt;br /&gt;
* The predicted probability is used as a classifier.&lt;br /&gt;
* The probability of success is modeled using the '''logit (log odds)''' function.&lt;br /&gt;
* It is a linear classifier, as the logistic regression model has a linear logit.&lt;br /&gt;
* It is often used when the response variable is categorical.&lt;br /&gt;
&lt;br /&gt;
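The logit and its inverse, the sigmoid, can be sketched directly; this is standard background material, not code from the tutorial script.&lt;br /&gt;

```r
# logit maps a probability p in (0, 1) to the whole real line
logit = function(p) log(p / (1 - p))
# its inverse, the sigmoid, maps a linear predictor back to a probability
sigmoid = function(z) 1 / (1 + exp(-z))
logit(0.5)    # 0
sigmoid(0)    # 0.5
```

A linear model for the logit is exactly what makes logistic regression a linear classifier.&lt;br /&gt;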
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The distribution of the dependent variable is Bernoulli.&lt;br /&gt;
* The data records are independent.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| The dependent variable's distribution is typically assumed to be a Bernoulli distribution in logistic regression.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Advantages of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It provides estimates of regression coefficients along with their standard errors.&lt;br /&gt;
* It also provides the predicted probability which in turn is used as a classifier.&lt;br /&gt;
* It doesn’t need explanatory variables to be necessarily continuous. &lt;br /&gt;
* In this sense, it is a more general classifier than LDA and QDA.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression offers a significant advantage in that continuous explanatory variables are not a requirement.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of Logistic Regression'''&lt;br /&gt;
|| We will implement '''logistic regression''' using the '''Raisin '''dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The additional reading material has more details on the '''Raisin dataset'''.&lt;br /&gt;
&lt;br /&gt;
Please refer to it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''LogisticRegression.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Highlight LogisticRegression.R &lt;br /&gt;
&lt;br /&gt;
Logistic Regression folder.&lt;br /&gt;
|| I have downloaded and moved these files to the '''Logistic Regression''' folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject '''folder. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Logistic Regression''' folder as my Working Directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let’s create a '''Logistic Regression''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click LogisticRegression.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to LogisticRegression.R in RStudio.&lt;br /&gt;
|| Open the script '''LogisticRegression.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LogisticRegression.R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Script '''LogisticRegression.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(VGAM)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the necessary packages.&lt;br /&gt;
&lt;br /&gt;
The '''glm()''' function required to create our classifier is available in base '''R''' (the '''stats''' package).&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight &lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin_Dataset.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight the commands.'''&lt;br /&gt;
|| These commands will load the '''Raisin dataset.'''&lt;br /&gt;
&lt;br /&gt;
They will also prepare the dataset for model building.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''on the Environment tab.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''data '''in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
It loads the modified dataset in the '''Source''' window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''trainIndex&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''train''' and '''test''' to load them in the Source window.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us create a '''Logistic Regression '''model on the '''training dataset'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Logistic_model &amp;lt;- glm(class ~ ., data = train, family = &amp;quot;binomial&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''summary(Logistic_model)$coef'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight glm()&lt;br /&gt;
&lt;br /&gt;
Highlight '''class ~ .'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''family = binomial'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''train''' &lt;br /&gt;
|| The function glm() represents generalized linear models. &lt;br /&gt;
&lt;br /&gt;
Logistic regression is one of the models it can fit. &lt;br /&gt;
&lt;br /&gt;
This is the formula for our model. &lt;br /&gt;
&lt;br /&gt;
We try to predict the target variable '''class''' based on the '''minorAL''' and '''ecc''' features.&lt;br /&gt;
&lt;br /&gt;
This ensures that our model predicts the probability for 2 classes.&lt;br /&gt;
&lt;br /&gt;
It ensures that, out of all the models in glm, the logistic regression model is fit.&lt;br /&gt;
&lt;br /&gt;
This is the data used to train our model.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Coefficients'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Pr(&amp;gt;|z|)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Coefficients''' denote the coefficients of the logit function.&lt;br /&gt;
&lt;br /&gt;
That means the log-odds of class change by -0.04 for every unit change in minorAL.&lt;br /&gt;
&lt;br /&gt;
The lower p-values suggest that the effects are statistically significant.&lt;br /&gt;
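The coefficient interpretation above can be checked with a quick calculation. This is a hedged, language-agnostic sketch (shown in Python for brevity; the value -0.04 is the illustrative minorAL coefficient quoted in the narration, not an exact model output):&lt;br /&gt;

```python
import math

# Illustrative coefficient for minorAL quoted in the narration (assumed value).
beta_minorAL = -0.04

# A one-unit increase in minorAL multiplies the odds of the positive class
# by exp(beta): here roughly a 3.9% decrease in the odds.
odds_ratio = math.exp(beta_minorAL)
print(round(odds_ratio, 4))  # 0.9608
```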
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''View(Predicted.prob)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''type = &amp;quot;response&amp;quot;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| This command provides the predicted probability of the logistic regression model on the test dataset.&lt;br /&gt;
&lt;br /&gt;
This argument ensures the outcome is a probability.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| Point&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Value&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob '''stores the predicted probability of each observation belonging to a certain class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| In the source window type the following commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| This retrieves the predicted classes from the probabilities. &lt;br /&gt;
&lt;br /&gt;
If the probability is greater than 0.5, the '''Kecimen''' class is chosen; otherwise, the '''Besni''' class is chosen.&lt;br /&gt;
&lt;br /&gt;
We also convert the output to a '''factor''' datatype to fit in the Confusion matrix function.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
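The 0.5 thresholding rule described above is easy to sketch outside R as well. A minimal illustration (in Python, with made-up probabilities; the class names come from the tutorial's Raisin dataset):&lt;br /&gt;

```python
# Convert predicted probabilities to class labels using the 0.5 cutoff
# described in the narration (the probabilities here are made up).
probs = [0.91, 0.23, 0.55]
classes = ["Kecimen" if p > 0.5 else "Besni" for p in probs]
print(classes)  # ['Kecimen', 'Besni', 'Kecimen']
```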
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us measure the accuracy of our model. &lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion_matrix &amp;lt;- confusionMatrix(predicted.classes, test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(predicted.classes, test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to '''confusion_matrix''' in the Environment tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels.&lt;br /&gt;
&lt;br /&gt;
And it is stored in the confusion_matrix variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''to display the confusion matrix from the confusion matrix list created.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''GGPlot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We use the '''plot_confusion_matrix()''' function to generate a visual plot from the confusion matrix list created.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window&lt;br /&gt;
|- &lt;br /&gt;
|| '''Output in Plot window.'''&lt;br /&gt;
&lt;br /&gt;
|| This plot shows how well our model predicted the testing data.&lt;br /&gt;
&lt;br /&gt;
We observe that:&lt;br /&gt;
&lt;br /&gt;
'''21''' misclassifications of the '''Besni''' class.&lt;br /&gt;
&lt;br /&gt;
'''13''' misclassifications of the '''Kecimen''' class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
&lt;br /&gt;
|| We will now visualize the decision boundary of the model.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
|| This code first generates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc''' features in the dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, it uses the '''Logistic Regression''' model to predict the probability of each point in this grid, storing these predictions as a new column '''prob''' in the '''grid''' dataframe. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It converts the predicted probabilities of the points into classes.&lt;br /&gt;
&lt;br /&gt;
If the probability exceeds 0.5, the '''Kecimen''' class is chosen; otherwise, the '''Besni''' class is chosen.&lt;br /&gt;
&lt;br /&gt;
The predicted classes are stored in the '''class''' column of the '''grid''' data frame.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted class labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on grid in the Environment tab to load the generated data in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type these commands &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using GGPlot2 from the data generated. &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary and the distribution of training data points of the '''model'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can conclude that the decision boundary of logistic regression is a straight line.&lt;br /&gt;
&lt;br /&gt;
The line separates the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It is sensitive to outliers, which can affect the accuracy of the classifier.&lt;br /&gt;
* It can perform poorly in the presence of multicollinearity among explanatory variables.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Here are some of the limitations of Logistic Regression&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now let us summarize what we have learned.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression using '''Raisin '''dataset'''.'''&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression Model&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply logistic regression on the '''Wine '''dataset. &lt;br /&gt;
* This dataset can be found in the '''HDclassif''' package. &lt;br /&gt;
* Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
* Measure the accuracy of the model&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India. &lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald O. and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English"/>
				<updated>2024-05-31T10:25:47Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Logistic Regression&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, classification, logistic regression, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Logistic Regression in R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about &lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using '''Raisin '''dataset'''.'''&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us learn what '''logistic regression''' is.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression is a statistical model used for classification.&lt;br /&gt;
&lt;br /&gt;
It models the probability of success given the explanatory variables.&lt;br /&gt;
&lt;br /&gt;
* It predicts the probability, unlike the response in linear regression.&lt;br /&gt;
* The predicted probability is used as a classifier.&lt;br /&gt;
* The probability of success is modeled using the''' logit or (log odds) '''function.&lt;br /&gt;
* It is a linear classifier, as the logistic regression model has a linear logit.&lt;br /&gt;
* It is often used when the response variable is categorical.&lt;br /&gt;
&lt;br /&gt;
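The logit link mentioned above can be sketched numerically. A minimal, hedged illustration (in Python; z stands for the linear predictor β0 + β1x, with an illustrative value):&lt;br /&gt;

```python
import math

# The model is linear in the log-odds (logit); the inverse logit (sigmoid)
# maps the linear predictor back to a probability of success.
def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 0.0                 # illustrative linear predictor value
print(inv_logit(z))     # 0.5 -- log-odds of 0 means even odds
```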
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The distribution of the dependent variable is Bernoulli.&lt;br /&gt;
* The data records are independent.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| The dependent variable's distribution is typically assumed to be a Bernoulli distribution in logistic regression.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Advantages of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It provides estimates of regression coefficients along with their standard errors.&lt;br /&gt;
* It also provides the predicted probability which in turn is used as a classifier.&lt;br /&gt;
* It does not require the explanatory variables to be continuous. &lt;br /&gt;
* In this sense, it is a more general classifier than LDA and QDA.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression offers a significant advantage in that continuous explanatory variables are not a requirement.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of Logistic Regression'''&lt;br /&gt;
|| We will implement '''logistic regression''' using the '''Raisin '''dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The additional reading material has more details on the '''Raisin dataset'''.&lt;br /&gt;
&lt;br /&gt;
Please refer to it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''LogisticRegression.R''' and the '''Raisin''' dataset '''‘Raisin_Dataset.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Highlight LogisticRegression.R &lt;br /&gt;
&lt;br /&gt;
Logistic Regression folder.&lt;br /&gt;
|| I have downloaded and moved these files to the '''Logistic Regression''' folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject '''folder. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Logistic Regression''' folder as my Working Directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let’s create a '''Logistic Regression''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click LogisticRegression.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to LogisticRegression.R in RStudio.&lt;br /&gt;
|| Open the script '''LogisticRegression.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LogisticRegression.R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Script '''LogisticRegression.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(VGAM)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the necessary packages.&lt;br /&gt;
&lt;br /&gt;
The '''glm()''' function required to create our classifier comes from base '''R'''.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight &lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin_Dataset.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight the commands.'''&lt;br /&gt;
|| These commands will load the '''Raisin dataset.'''&lt;br /&gt;
&lt;br /&gt;
They will also prepare the dataset for model building.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''on the Environment tab.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''data '''in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
It loads the modified dataset in the '''Source''' window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''trainIndex&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''train''' and '''test''' to load them in the Source window.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us create a '''Logistic Regression '''model on the '''training dataset'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Logistic_model &amp;lt;- glm(class ~ ., data = train, family = &amp;quot;binomial&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''summary(Logistic_model)$coef'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight glm()&lt;br /&gt;
&lt;br /&gt;
Highlight '''class ~ .'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''family = binomial'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''train''' &lt;br /&gt;
|| The function glm() represents generalized linear models. &lt;br /&gt;
&lt;br /&gt;
Logistic regression is one of the models it can fit. &lt;br /&gt;
&lt;br /&gt;
This is the formula for our model. &lt;br /&gt;
&lt;br /&gt;
We try to predict the target variable '''class''' based on the '''minorAL''' and '''ecc''' features.&lt;br /&gt;
&lt;br /&gt;
This ensures that our model predicts the probability for 2 classes.&lt;br /&gt;
&lt;br /&gt;
It ensures that, out of all the models in glm, the logistic regression model is fit.&lt;br /&gt;
&lt;br /&gt;
This is the data used to train our model.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Coefficients'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Pr(&amp;gt;|z|)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Coefficients''' denote the coefficients of the logit function.&lt;br /&gt;
&lt;br /&gt;
That means the log-odds of class change by -0.04 for every unit change in minorAL.&lt;br /&gt;
&lt;br /&gt;
The lower p-values suggest that the effects are statistically significant.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''View(Predicted.prob)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''type = &amp;quot;response&amp;quot;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| This command provides the predicted probability of the logistic regression model on the test dataset.&lt;br /&gt;
&lt;br /&gt;
This argument ensures the outcome is a probability.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| Point&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Value&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob '''stores the predicted probability of each observation belonging to a certain class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| In the source window type the following commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| This retrieves the predicted classes from the probabilities. &lt;br /&gt;
&lt;br /&gt;
If the probability is greater than 0.5, the '''Kecimen''' class is chosen; otherwise the '''Besni''' class is chosen.&lt;br /&gt;
&lt;br /&gt;
We also convert the output to a '''factor''' datatype so that it can be passed to the '''confusionMatrix()''' function.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
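The thresholding step can be illustrated on a toy probability vector (not the tutorial's data):

```r
# Toy probabilities; values above 0.5 map to "Kecimen", the rest to "Besni"
p = c(0.2, 0.7, 0.5, 0.9)
cls = factor(ifelse(p > 0.5, "Kecimen", "Besni"))
cls  # Besni Kecimen Besni Kecimen (0.5 is not greater than 0.5)
```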
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us measure the accuracy of our model. &lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion_matrix &amp;lt;- confusionMatrix(predicted.classes, test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(predicted.classes, test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to '''confusion_matrix''' in the '''Environment''' tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels and stored in the '''confusion_matrix''' variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands&lt;br /&gt;
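The accuracy reported by the confusion matrix is the share of predictions on the diagonal of the table. A minimal sketch with toy labels (not the Raisin data):

```r
# Four toy observations: one Besni is misclassified as Kecimen
actual    = factor(c("Besni", "Besni", "Kecimen", "Kecimen"))
predicted = factor(c("Besni", "Kecimen", "Kecimen", "Kecimen"))
tab = table(predicted, actual)
sum(diag(tab)) / sum(tab)  # 0.75: three of four predictions are correct
```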
&lt;br /&gt;
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''that displays the confusion matrix stored in the list created earlier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''GGPlot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We use the '''plot_confusion_matrix()''' function to generate a visual plot of the '''confusion matrix list created.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window&lt;br /&gt;
|- &lt;br /&gt;
|| '''Output in Plot window.'''&lt;br /&gt;
&lt;br /&gt;
|| This plot shows how well our model predicted the testing data.&lt;br /&gt;
&lt;br /&gt;
We observe that:&lt;br /&gt;
&lt;br /&gt;
'''21 '''misclassifications of Besni Class.&lt;br /&gt;
&lt;br /&gt;
'''13 '''misclassifications of Kecimen class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
&lt;br /&gt;
|| We will now visualize the decision boundary of the model.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
|| This code first generates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc''' features in the dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, it uses the '''Logistic Regression '''model to predict the probability of each point in this grid, storing these predictions as a new column ''''prob' '''in the '''grid '''dataframe. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It converts the predicted probabilities of the points into classes.&lt;br /&gt;
&lt;br /&gt;
If the probability exceeds 0.5 then '''Kecimen '''class otherwise '''Besni '''Class is chosen.&lt;br /&gt;
&lt;br /&gt;
The predicted classes are stored in the '''class''' column of the '''grid''' data frame.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted class string labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on grid in the Environment tab to load the generated data in the Source window.&lt;br /&gt;
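The role of '''expand.grid()''' can be seen on a tiny example; it returns every combination of the supplied sequences:

```r
# 3 x-values crossed with 2 y-values gives 6 grid points
g = expand.grid(x = seq(0, 1, length = 3), y = seq(0, 1, length = 2))
nrow(g)  # 6
```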
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type these commands &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using GGPlot2 from the data generated. &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary and the distribution of training data points of the '''model'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can conclude that the decision boundary of logistic regression is a straight line.&lt;br /&gt;
&lt;br /&gt;
The line separates the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It’s sensitive to outliers which can affect the accuracy of the classifier.&lt;br /&gt;
* It can perform poorly in the presence of multicollinearity among explanatory variables.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Here are some of the limitations of Logistic Regression&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now let us summarize what we have learned.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression using '''Raisin '''dataset'''.'''&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply logistic regression on the '''Wine '''dataset. &lt;br /&gt;
* This dataset can be found in the '''HDclassif''' package. &lt;br /&gt;
* Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
* Measure the accuracy of the model&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial Project''' was established by the Ministry of Education, Government of India. &lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald O. and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English"/>
				<updated>2024-05-31T10:19:19Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Logistic Regression&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, classification, logistic regression, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Logistic Regression in R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about &lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using '''Raisin '''dataset'''.'''&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us learn what '''logistic regression''' is&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression is a statistical model used for classification.&lt;br /&gt;
&lt;br /&gt;
It models the probability of success for the explanatory variable.&lt;br /&gt;
&lt;br /&gt;
* It predicts a probability, rather than a continuous response as in linear regression.&lt;br /&gt;
* The predicted probability is used as a classifier.&lt;br /&gt;
* The probability of success is modeled using the''' logit or (log odds) '''function.&lt;br /&gt;
* It is a linear classifier, as the logistic regression model has a linear logit.&lt;br /&gt;
* It is often used when the response variable is categorical.&lt;br /&gt;
&lt;br /&gt;
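The logit link and its inverse, the logistic (sigmoid) function, can be sketched directly in R (notation assumed, not from the tutorial):

```r
# logit maps a probability to log-odds; sigmoid maps log-odds back
logit   = function(p) log(p / (1 - p))
sigmoid = function(z) 1 / (1 + exp(-z))
sigmoid(logit(0.8))  # 0.8: the two functions are inverses
```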
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The distribution of the dependent variable is Bernoulli.&lt;br /&gt;
* The data records are independent.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| The dependent variable's distribution is typically assumed to be a Bernoulli distribution in logistic regression.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Advantages of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It provides estimates of regression coefficients along with their standard errors.&lt;br /&gt;
* It also provides the predicted probability which in turn is used as a classifier.&lt;br /&gt;
* It doesn’t need explanatory variables to be necessarily continuous. &lt;br /&gt;
* In this sense, it is a more general classifier than LDA and QDA.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression offers a significant advantage in that continuous explanatory variables are not a requirement.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of Logistic Regression'''&lt;br /&gt;
|| We will implement '''logistic regression''' using the '''Raisin '''dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The additional reading material has more details on the '''Raisin dataset'''.&lt;br /&gt;
&lt;br /&gt;
Please refer to it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''LogisticRegression.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Highlight LogisticRegression.R &lt;br /&gt;
&lt;br /&gt;
Logistic Regression folder.&lt;br /&gt;
|| I have downloaded and moved these files to the '''Logistic Regression''' folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject '''folder. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Logistic Regression''' folder as my Working Directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let’s create a '''Logistic Regression''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click LogisticRegression.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to LogisticRegression.R in RStudio.&lt;br /&gt;
|| Open the script '''LogisticRegression.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LogisticRegression.R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Script '''LogisticRegression.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(VGAM)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the necessary packages.&lt;br /&gt;
&lt;br /&gt;
The '''glm()''' function required to create our classifier comes with base '''R''', in the '''stats''' package.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight &lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin_Dataset.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight the commands.'''&lt;br /&gt;
|| These commands will load the '''Raisin dataset.'''&lt;br /&gt;
&lt;br /&gt;
They will also prepare the dataset for model building.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''on the Environment tab.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''data '''in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
It loads the modified dataset in the '''Source''' window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''trainIndex&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''train '''and '''test '''to load them in the Source window.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
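The split logic can be checked on a toy data frame (same pattern, not the Raisin data):

```r
set.seed(1)
d = data.frame(x = 1:10)
# 70% of row indices, sampled without replacement
idx = sample(1:nrow(d), size = 0.7 * nrow(d), replace = FALSE)
tr = d[idx, , drop = FALSE]
te = d[-idx, , drop = FALSE]
c(nrow(tr), nrow(te))  # 7 3: the two parts partition the data
```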
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us create a '''Logistic Regression '''model on the '''training dataset'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Logistic_model &amp;lt;- glm(class ~ ., data = train, family = &amp;quot;binomial&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''summary(Logistic_model)$coef'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
|  | Highlight glm()&lt;br /&gt;
&lt;br /&gt;
Highlight '''class ~ .'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''family = binomial'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''train''' &lt;br /&gt;
|| The function '''glm()''' fits generalized linear models. &lt;br /&gt;
&lt;br /&gt;
Logistic regression is among the class of models that it fits. &lt;br /&gt;
&lt;br /&gt;
This is the formula for our model. &lt;br /&gt;
&lt;br /&gt;
We try to predict target variable '''class''' based on '''minorAL '''and '''ecc '''features.&lt;br /&gt;
&lt;br /&gt;
This ensures that our model predicts the probability for two classes.&lt;br /&gt;
&lt;br /&gt;
It ensures that, out of all the models '''glm()''' can fit, the logistic regression model is chosen.&lt;br /&gt;
&lt;br /&gt;
This is the data used to train our model.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
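A self-contained sketch of the same '''glm()''' call on simulated data (toy variables, not the Raisin features):

```r
set.seed(1)
x = rnorm(50)
y = rbinom(50, 1, plogis(x))           # binary response driven by x
m = glm(y ~ x, family = "binomial")    # logistic regression, as in the tutorial
length(coef(m))  # 2: an intercept and one slope
```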
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Coefficients'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Pr(&amp;gt;|z|)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Coefficients''' denote the coefficients of the logit function.&lt;br /&gt;
&lt;br /&gt;
That means the log-odds of the class change by -0.04 for every unit change in '''minorAL'''.&lt;br /&gt;
&lt;br /&gt;
The lower p-values suggest that the effects are statistically significant.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''View(Predicted.prob)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''type = &amp;quot;response&amp;quot;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| This command computes the predicted probabilities of the logistic regression model on the test dataset.&lt;br /&gt;
&lt;br /&gt;
The '''type = &amp;quot;response&amp;quot;''' argument ensures the output is a probability rather than a log-odds value.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| Point&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Value&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob '''stores the predicted probability of each observation belonging to the '''Kecimen''' class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| In the source window type the following commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| This retrieves the predicted classes from the probabilities. &lt;br /&gt;
&lt;br /&gt;
If the probability is greater than 0.5, the '''Kecimen''' class is chosen; otherwise the '''Besni''' class is chosen.&lt;br /&gt;
&lt;br /&gt;
We also convert the output to a '''factor''' datatype so that it can be passed to the '''confusionMatrix()''' function.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us measure the accuracy of our model. &lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion_matrix &amp;lt;- confusionMatrix(predicted.classes, test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(predicted.classes, test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to '''confusion_matrix''' in the '''Environment''' tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels and stored in the '''confusion_matrix''' variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''that displays the confusion matrix stored in the list created earlier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''GGPlot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We use the '''plot_confusion_matrix()''' function to generate a visual plot of the '''confusion matrix list created.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window&lt;br /&gt;
|- &lt;br /&gt;
|| '''Output in Plot window.'''&lt;br /&gt;
&lt;br /&gt;
|| This plot shows how well our model predicted the testing data.&lt;br /&gt;
&lt;br /&gt;
We observe that:&lt;br /&gt;
&lt;br /&gt;
'''21 '''misclassifications of the Besni class and&lt;br /&gt;
&lt;br /&gt;
'''13 '''misclassifications of the Kecimen class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
&lt;br /&gt;
|| We will now visualize the decision boundary of the model.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
|| This code first generates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc''' features in the dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, it uses the '''Logistic Regression '''model to predict the probability of each point in this grid, storing these predictions as a new column ''''prob' '''in the '''grid '''dataframe. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It converts the predicted probabilities of the points into classes.&lt;br /&gt;
&lt;br /&gt;
If the probability exceeds 0.5, the '''Kecimen '''class is chosen; otherwise, the '''Besni '''class is chosen.&lt;br /&gt;
&lt;br /&gt;
The predicted classes are stored in the ‘class’ column of the grid data frame.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted classes' string labels into numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on grid in the Environment tab to load the generated data in the Source window.&lt;br /&gt;
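The grid construction above (R's expand.grid over evenly spaced feature values) can be sketched in Python. This is an illustrative aside only; the ranges below are made up, and the tutorial itself works in R.

```python
from itertools import product

def seq(lo, hi, n):
    # n evenly spaced values from lo to hi, like R's seq(lo, hi, length = n)
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def make_grid(x_range, y_range, steps):
    # Every combination of the two feature sequences, like expand.grid
    xs = seq(x_range[0], x_range[1], steps)
    ys = seq(y_range[0], y_range[1], steps)
    return [(x, y) for x, y in product(xs, ys)]

grid = make_grid((0.0, 1.0), (0.0, 2.0), 5)
assert len(grid) == 25  # 5 x 5 combinations
```

Each grid point would then be scored by the fitted model, exactly as the R code does with predict().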
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type these commands &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using GGPlot2 from the data generated. &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary and the distribution of training data points of the '''model'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can conclude that the decision boundary of logistic regression is a straight line.&lt;br /&gt;
&lt;br /&gt;
The line separates the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It’s sensitive to outliers which can affect the accuracy of the classifier.&lt;br /&gt;
* It can perform poorly in the presence of multicollinearity among explanatory variables.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Here are some of the limitations of Logistic Regression&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us summarize what we have learned.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using '''Raisin '''dataset'''.'''&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply logistic regression on the '''Wine '''dataset. &lt;br /&gt;
* This dataset can be found in the '''HDclassif''' package. &lt;br /&gt;
* Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
* Measure the accuracy of the model&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial Project''' was established by the Ministry of Education, Government of India. &lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald O. and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English"/>
				<updated>2024-05-31T09:17:24Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Logistic Regression&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, classification, logistic regression, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Logistic Regression in R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about &lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using '''Raisin '''dataset'''.'''&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us learn what '''logistic regression''' is&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression is a statistical model used for classification.&lt;br /&gt;
&lt;br /&gt;
It models the probability of success as a function of the explanatory variables.&lt;br /&gt;
&lt;br /&gt;
* It predicts the probability, unlike the response in linear regression.&lt;br /&gt;
* The predicted probability is used as a classifier.&lt;br /&gt;
* The probability of success is modeled using the''' logit or (log odds) '''function.&lt;br /&gt;
* It is a linear classifier, as the logistic regression model has a linear logit.&lt;br /&gt;
* It is often used when the response variable is categorical.&lt;br /&gt;
&lt;br /&gt;
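The logit relationship described above can be sketched in Python (an illustrative aside; the tutorial itself uses R):

```python
import math

def sigmoid(z):
    # Logistic function: maps any real-valued z to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # Log-odds of p: the inverse of the logistic function
    return math.log(p / (1.0 - p))

# In logistic regression, the logit of the predicted probability is a
# linear function of the features, which is why it is a linear classifier.
p = sigmoid(0.7)
assert math.isclose(logit(p), 0.7)
```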
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The distribution of the dependent variable is Bernoulli.&lt;br /&gt;
* The data records are independent.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| The dependent variable's distribution is typically assumed to be a Bernoulli distribution in logistic regression.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Advantages of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It provides estimates of regression coefficients along with their standard errors.&lt;br /&gt;
* It also provides the predicted probability which in turn is used as a classifier.&lt;br /&gt;
* It does not require the explanatory variables to be continuous. &lt;br /&gt;
* In this sense, it is a more general classifier than LDA and QDA.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression offers a significant advantage in that continuous explanatory variables are not a requirement.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of Logistic Regression'''&lt;br /&gt;
|| We will implement '''logistic regression''' using the '''Raisin '''dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The additional reading material has more details on the '''Raisin dataset'''.&lt;br /&gt;
&lt;br /&gt;
Please refer to it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''LogisticRegression.R '''and the '''Raisin''' dataset '''‘Raisin_Dataset.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Highlight LogisticRegression.R &lt;br /&gt;
&lt;br /&gt;
Logistic Regression folder.&lt;br /&gt;
|| I have downloaded and moved these files to the '''Logistic Regression''' folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject '''folder. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Logistic Regression''' folder as my Working Directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let’s create a '''Logistic Regression''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click LogisticRegression.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to LogisticRegression.R in RStudio.&lt;br /&gt;
|| Open the script '''LogisticRegression.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LogisticRegression.R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Script '''LogisticRegression.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(VGAM)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the necessary packages.&lt;br /&gt;
&lt;br /&gt;
The '''glm()''' function required to create our classifier is available in base '''R'''; the '''VGAM''' package provides extended GLM functionality.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight &lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin_Dataset.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;, &amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight the commands.'''&lt;br /&gt;
|| These commands will load the '''Raisin dataset.'''&lt;br /&gt;
&lt;br /&gt;
They will also prepare the dataset for model building.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''on the Environment tab.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''data '''in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
It loads the modified dataset in the '''Source''' window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''trainIndex&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''train '''and '''test '''to load them in the Source window.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
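The 70/30 split performed by sample() above can be mirrored in Python, shown here only to make the sampling-without-replacement idea concrete (the tutorial uses R):

```python
import random

def train_test_split(rows, train_frac=0.7, seed=1):
    # Sample row indices without replacement, mirroring R's
    # sample(1:nrow(data), size = 0.7 * nrow(data), replace = FALSE)
    rng = random.Random(seed)
    n_train = int(train_frac * len(rows))
    train_idx = set(rng.sample(range(len(rows)), n_train))
    train = [row for i, row in enumerate(rows) if i in train_idx]
    test = [row for i, row in enumerate(rows) if i not in train_idx]
    return train, test

train, test = train_test_split(list(range(10)))
assert len(train) == 7 and len(test) == 3
```

Fixing the seed plays the same role as set.seed(1) in R: the split is reproducible across runs.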
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us create a '''Logistic Regression '''model on the '''training dataset'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Logistic_model &amp;lt;- glm(class ~ ., data = train, family = &amp;quot;binomial&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''summary(Logistic_model)$coef'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight glm()&lt;br /&gt;
&lt;br /&gt;
Highlight '''class ~ .'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''family = binomial'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''train''' &lt;br /&gt;
|| The function glm() represents generalized linear models. &lt;br /&gt;
&lt;br /&gt;
Logistic regression is among the class of models that it fits. &lt;br /&gt;
&lt;br /&gt;
This is the formula for our model. &lt;br /&gt;
&lt;br /&gt;
We try to predict the target variable '''class''' based on the '''minorAL '''and '''ecc '''features.&lt;br /&gt;
&lt;br /&gt;
This ensures that our model predicts the probability for 2 classes.&lt;br /&gt;
&lt;br /&gt;
It ensures that, out of all the models in glm, the logistic regression model is fit.&lt;br /&gt;
&lt;br /&gt;
This is the data used to train our model.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Coefficients'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Pr(&amp;gt;|z|)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Coefficients''' denote the coefficients of the logit function.&lt;br /&gt;
&lt;br /&gt;
That means the log-odds of class change by -0.04 for every unit change in minorAL.&lt;br /&gt;
&lt;br /&gt;
The lower p-values suggest that the effects are statistically significant.&lt;br /&gt;
&lt;br /&gt;
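The coefficient reading above can be made concrete with a short Python aside (illustrative, not part of the R script): exponentiating a log-odds change gives the multiplicative change in the odds.

```python
import math

# Coefficient of minorAL quoted in the narration (about -0.04): a
# one-unit increase in minorAL changes the log-odds of the class by
# the coefficient value.
coef_minor_al = -0.04

# Exponentiating a log-odds change gives the multiplicative change
# in the odds themselves.
odds_ratio = math.exp(coef_minor_al)

# odds_ratio is about 0.96, i.e. below 1, so a higher minorAL value
# lowers the odds of the positive class.
assert math.isclose(odds_ratio, 0.9608, abs_tol=1e-3)
```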
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''View(Predicted.prob)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''type = &amp;quot;response&amp;quot;''' &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| This command provides the predicted probability of the logistic regression model on the test dataset.&lt;br /&gt;
&lt;br /&gt;
This argument ensures the outcome is a probability.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| Point&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Value&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob '''stores the predicted probability of each observation belonging to a certain class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| In the '''Source''' window type the following command&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| This retrieves the predicted classes from the probabilities. &lt;br /&gt;
&lt;br /&gt;
If the probability is greater than 0.5, the '''Kecimen '''class is chosen; otherwise, the '''Besni '''class is chosen.&lt;br /&gt;
&lt;br /&gt;
We also convert the output to a '''factor''' datatype to fit in the Confusion matrix function.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
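The 0.5 thresholding step can be sketched in Python (illustrative only; the class labels follow the tutorial's Raisin dataset):

```python
def to_class(prob, threshold=0.5):
    # Mirror of R's ifelse(prob > 0.5, "Kecimen", "Besni"):
    # probabilities strictly above the threshold map to "Kecimen"
    return "Kecimen" if prob > threshold else "Besni"

probs = [0.91, 0.32, 0.50, 0.77]
classes = [to_class(p) for p in probs]
assert classes == ["Kecimen", "Besni", "Besni", "Kecimen"]
```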
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us measure the accuracy of our model. &lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion_matrix &amp;lt;- confusionMatrix(predicted.classes, test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(predicted.classes, test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to '''confusion_matrix''' in the Environment tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels.&lt;br /&gt;
&lt;br /&gt;
And it is stored in the confusion_matrix variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands&lt;br /&gt;
&lt;br /&gt;
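What confusionMatrix() tabulates can be sketched in Python as a minimal count of (actual, predicted) pairs. This is a hedged illustration of the idea, not the caret implementation:

```python
from collections import Counter

def confusion_counts(actual, predicted):
    # Tally (actual, predicted) label pairs; pairs with equal labels
    # are correct predictions, unequal pairs are misclassifications
    return Counter(zip(actual, predicted))

actual    = ["Kecimen", "Kecimen", "Besni", "Besni", "Besni"]
predicted = ["Kecimen", "Besni",   "Besni", "Besni", "Kecimen"]

tab = confusion_counts(actual, predicted)
correct = sum(n for (a, p), n in tab.items() if a == p)
accuracy = correct / len(actual)
assert accuracy == 0.6  # 3 of 5 predictions are correct
```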
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;none&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;none&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function, '''plot_confusion_matrix''', to display the confusion matrix from the list created earlier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''GGPlot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We use the '''plot_confusion_matrix()''' function to generate a visual plot of the '''confusion matrix list created.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window&lt;br /&gt;
|- &lt;br /&gt;
|| '''Output in Plot window.'''&lt;br /&gt;
&lt;br /&gt;
|| This plot shows how well our model predicted the testing data.&lt;br /&gt;
&lt;br /&gt;
We observe that:&lt;br /&gt;
&lt;br /&gt;
'''21 '''misclassifications of the Besni class and&lt;br /&gt;
&lt;br /&gt;
'''13 '''misclassifications of the Kecimen class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
&lt;br /&gt;
|| We will now visualize the decision boundary of the model.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
|| This code first generates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc''' features in the dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, it uses the '''Logistic Regression '''model to predict the probability of each point in this grid, storing these predictions as a new column ''''prob' '''in the '''grid '''dataframe. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It converts the predicted probabilities of the points into classes.&lt;br /&gt;
&lt;br /&gt;
If the probability exceeds 0.5, the '''Kecimen '''class is chosen; otherwise, the '''Besni '''class is chosen.&lt;br /&gt;
&lt;br /&gt;
The predicted classes are stored in the ‘class’ column of the grid data frame.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted classes' string labels into numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on grid in the Environment tab to load the generated data in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type these commands &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using '''ggplot2''' from the data generated. &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the model's decision boundary and the distribution of the training data points.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can conclude that the decision boundary of logistic regression is a straight line.&lt;br /&gt;
&lt;br /&gt;
The line separates the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It is sensitive to outliers, which can affect the accuracy of the classifier.&lt;br /&gt;
* It can perform poorly in the presence of multicollinearity among explanatory variables.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Here are some of the limitations of Logistic Regression&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us summarize what we have learned.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using the '''Raisin''' dataset.&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply logistic regression on the '''Wine '''dataset. &lt;br /&gt;
* This dataset can be found in the '''HDclassif''' package. &lt;br /&gt;
* Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
* Measure the accuracy of the model&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial Project''' was established by the Ministry of Education, Govt. of India. &lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald O. and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English"/>
				<updated>2024-05-31T09:14:24Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Logistic Regression&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, classification, logistic regression, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Logistic Regression in R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about &lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using the '''Raisin''' dataset.&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us learn what '''logistic regression''' is&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression is a statistical model used for classification.&lt;br /&gt;
&lt;br /&gt;
It models the probability of success as a function of the explanatory variables.&lt;br /&gt;
&lt;br /&gt;
* It predicts a probability, unlike linear regression, which predicts the response directly.&lt;br /&gt;
* The predicted probability is used as a classifier.&lt;br /&gt;
* The probability of success is modeled using the '''logit (log-odds)''' function.&lt;br /&gt;
* It is a linear classifier, as the logistic regression model has a linear logit.&lt;br /&gt;
* It is often used when the response variable is categorical.&lt;br /&gt;
&lt;br /&gt;
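As an illustration (not part of the recorded tutorial), the relation between the linear logit and the predicted probability can be sketched in '''R'''; the names '''sigmoid''' and '''eta''' below are illustrative choices, not from the script.&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch: the logistic model maps a linear predictor
# eta = b0 + b1*x to a probability via the sigmoid (inverse logit).
sigmoid = function(eta) 1 / (1 + exp(-eta))
p = sigmoid(0.8)             # probability of success when eta = 0.8
log_odds = log(p / (1 - p))  # the logit recovers eta, which is linear in x
```
&lt;br /&gt;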
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The distribution of the dependent variable is Bernoulli.&lt;br /&gt;
* The data records are independent.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| The dependent variable's distribution is typically assumed to be a Bernoulli distribution in logistic regression.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Advantages of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It provides estimates of regression coefficients along with their standard errors.&lt;br /&gt;
* It also provides the predicted probability which in turn is used as a classifier.&lt;br /&gt;
* It does not require the explanatory variables to be continuous.&lt;br /&gt;
* In this sense, it is a more general classifier than LDA and QDA.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression offers a significant advantage in that continuous explanatory variables are not a requirement.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of Logistic Regression'''&lt;br /&gt;
|| We will implement '''logistic regression''' using the '''Raisin '''dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The additional reading material has more details on the '''Raisin dataset'''.&lt;br /&gt;
&lt;br /&gt;
Please refer to it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''LogisticRegression.R''' and the '''Raisin''' dataset '''raisin.xlsx'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Highlight LogisticRegression.R &lt;br /&gt;
&lt;br /&gt;
Logistic Regression folder.&lt;br /&gt;
|| I have downloaded and moved these files to the '''Logistic Regression''' folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject '''folder. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Logistic Regression''' folder as my Working Directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let’s create a '''Logistic Regression''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click LogisticRegression.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to LogisticRegression.R in RStudio.&lt;br /&gt;
|| Open the script '''LogisticRegression.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LogisticRegression.R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Script '''LogisticRegression.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(VGAM)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the necessary packages.&lt;br /&gt;
&lt;br /&gt;
The '''glm()''' function required to create our classifier comes with base '''R'''; the '''VGAM''' package provides related extensions of generalized linear models.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them. &lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight &lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin_Dataset.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;, &amp;quot;ecc&amp;quot;, &amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight the commands.'''&lt;br /&gt;
|| These commands will load the '''Raisin dataset.'''&lt;br /&gt;
&lt;br /&gt;
They will also prepare the dataset for model building.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''on the Environment tab.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''data '''in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
It loads the modified dataset in the '''Source''' window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''trainIndex&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''train''' and '''test''' to load them in the Source window.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us create a '''Logistic Regression '''model on the '''training dataset'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Logistic_model &amp;lt;- glm(class ~ ., data = train, family = &amp;quot;binomial&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''summary(Logistic_model)$coef'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
|  | Highlight glm()&lt;br /&gt;
&lt;br /&gt;
Highlight '''class ~ .'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''family = binomial'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''train''' &lt;br /&gt;
|| The function '''glm()''' fits generalized linear models. &lt;br /&gt;
&lt;br /&gt;
Logistic regression is among the class of models that it fits. &lt;br /&gt;
&lt;br /&gt;
This is the formula for our model. &lt;br /&gt;
&lt;br /&gt;
We try to predict the target variable '''class''' based on the '''minorAL''' and '''ecc''' features.&lt;br /&gt;
&lt;br /&gt;
This ensures that our model predicts the probability for 2 classes.&lt;br /&gt;
&lt;br /&gt;
It ensures that, out of all the models in glm, the logistic regression model is fit.&lt;br /&gt;
&lt;br /&gt;
This is the data used to train our model.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Coefficients'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Pr(&amp;gt;|z|)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Coefficients''' denote the coefficients of the logit function.&lt;br /&gt;
&lt;br /&gt;
That means the log-odds of class change by -0.04 for every unit change in minorAL.&lt;br /&gt;
&lt;br /&gt;
The lower p-values suggest that the effects are statistically significant.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''View(Predicted.prob)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''type = &amp;quot;response&amp;quot;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| This command provides the predicted probability of the logistic regression model on the test dataset.&lt;br /&gt;
&lt;br /&gt;
This argument ensures that the output is a probability.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| Point&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Value&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob '''stores the predicted probability of each observation belonging to a certain class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| In the source window type the following commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| This retrieves the predicted classes from the probabilities. &lt;br /&gt;
&lt;br /&gt;
If the probability is greater than 0.5, the '''Kecimen''' class is chosen; otherwise, the '''Besni''' class is chosen.&lt;br /&gt;
&lt;br /&gt;
We also convert the output to a '''factor''' datatype to fit in the Confusion matrix function.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us measure the accuracy of our model. &lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion_matrix &amp;lt;- confusionMatrix(predicted.classes, test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(predicted.classes, test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to '''confusion_matrix''' in the Environment tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels.&lt;br /&gt;
&lt;br /&gt;
And it is stored in the confusion_matrix variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands&lt;br /&gt;
&lt;br /&gt;
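As a small aside (assuming the '''confusion_matrix''' object created above), the model's accuracy can be read directly from the list returned by '''confusionMatrix()''':&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: caret's confusionMatrix() result stores overall statistics,
# including accuracy, in its "overall" element.
accuracy = confusion_matrix$overall["Accuracy"]
print(accuracy)
```
&lt;br /&gt;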
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''to display the confusion matrix from the confusion matrix list created.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''ggplot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We use the '''plot_confusion_matrix()''' function to generate a visual plot of the '''confusion matrix list created.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window&lt;br /&gt;
|- &lt;br /&gt;
|| '''Output in Plot window.'''&lt;br /&gt;
&lt;br /&gt;
|| This plot shows how well our model predicted the testing data.&lt;br /&gt;
&lt;br /&gt;
We observe that:&lt;br /&gt;
&lt;br /&gt;
'''21''' misclassifications of the '''Besni''' class.&lt;br /&gt;
&lt;br /&gt;
'''13''' misclassifications of the '''Kecimen''' class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
&lt;br /&gt;
|| We will visualize the decision boundary of the model.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
|| This code first generates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc''' features in the dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, it uses the '''Logistic Regression''' model to predict the probability of each point in this grid, storing these predictions as a new column ''''prob' '''in the '''grid''' dataframe. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It converts the predicted probabilities of the points into classes.&lt;br /&gt;
&lt;br /&gt;
If the probability exceeds 0.5, the '''Kecimen''' class is chosen; otherwise, the '''Besni''' class is chosen.&lt;br /&gt;
&lt;br /&gt;
The predicted classes are stored in the '''class''' column of the '''grid''' data frame.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted classes' string labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on grid in the Environment tab to load the generated data in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type these commands &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using '''ggplot2''' from the data generated. &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the model's decision boundary and the distribution of the training data points.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can conclude that the decision boundary of logistic regression is a straight line.&lt;br /&gt;
&lt;br /&gt;
The line separates the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It is sensitive to outliers, which can affect the accuracy of the classifier.&lt;br /&gt;
* It can perform poorly in the presence of multicollinearity among explanatory variables.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Here are some of the limitations of Logistic Regression&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us summarize what we have learned.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using the '''Raisin''' dataset.&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply logistic regression on the '''Wine '''dataset. &lt;br /&gt;
* This dataset can be found in the '''HDclassif''' package. &lt;br /&gt;
* Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
* Measure the accuracy of the model&lt;br /&gt;
&lt;br /&gt;
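A minimal sketch of how the assignment could be started (assuming the '''HDclassif''' package is installed; the '''wine''' data has three classes, so one option is to keep only two of them for binary logistic regression; the names '''wine2''' and '''model''' are illustrative):&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch for the assignment; keep two classes for a binary classifier.
library(HDclassif)
data(wine)                                # loads the wine data frame
wine2 = subset(wine, class %in% c(1, 2))  # keep classes 1 and 2 only
wine2$class = factor(wine2$class)
model = glm(class ~ ., data = wine2, family = "binomial")
```
&lt;br /&gt;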
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial Project''' was established by the Ministry of Education, Govt. of India. &lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald O. and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English"/>
				<updated>2024-05-31T09:06:03Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Logistic Regression&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, classification, logistic regression, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Logistic Regression in R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about &lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using the '''Raisin''' dataset.&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us learn what '''logistic regression''' is.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression is a statistical model used for classification.&lt;br /&gt;
&lt;br /&gt;
It models the probability of success as a function of the explanatory variables.&lt;br /&gt;
&lt;br /&gt;
* It predicts the probability, unlike the response in linear regression.&lt;br /&gt;
* The predicted probability is used as a classifier.&lt;br /&gt;
* The probability of success is modeled using the''' logit or (log odds) '''function.&lt;br /&gt;
* It is a linear classifier, as the logistic regression model has a linear logit.&lt;br /&gt;
* It is often used when the response variable is categorical.&lt;br /&gt;
&lt;br /&gt;
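The bullet points above can be written out as a short mathematical sketch; this is the standard logit formulation (the two predictors shown are an assumption for illustration, matching the '''minorAL''' and '''ecc''' features used later in this script):

```latex
% Logistic regression models the log-odds (logit) of success as a
% linear function of the explanatory variables x_1, x_2:
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2
% Inverting the logit gives the predicted probability of success:
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}
```

Because the logit is linear in the features, the boundary where p = 0.5 is a straight line, which is why logistic regression is called a linear classifier.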
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The distribution of the dependent variable is Bernoulli.&lt;br /&gt;
* The data records are independent.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| The dependent variable's distribution is typically assumed to be a Bernoulli distribution in logistic regression.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Advantages of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It provides estimates of regression coefficients along with their standard errors.&lt;br /&gt;
* It also provides the predicted probability which in turn is used as a classifier.&lt;br /&gt;
* It doesn’t need explanatory variables to be necessarily continuous. &lt;br /&gt;
* In this sense, it is a more general classifier than LDA and QDA.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression offers a significant advantage in that continuous explanatory variables are not a requirement.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of Logistic Regression'''&lt;br /&gt;
|| We will implement '''logistic regression''' using the '''Raisin '''dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The additional reading material has more details on the '''Raisin dataset'''.&lt;br /&gt;
&lt;br /&gt;
Please refer to it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''LogisticRegression.R''' and the '''Raisin''' dataset file '''Raisin_Dataset.xlsx'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Highlight LogisticRegression.R &lt;br /&gt;
&lt;br /&gt;
Logistic Regression folder.&lt;br /&gt;
|| I have downloaded and moved these files to the '''Logistic Regression''' folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject '''folder. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Logistic Regression''' folder as my Working Directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let’s create a '''Logistic Regression''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click LogisticRegression.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to LogisticRegression.R in RStudio.&lt;br /&gt;
|| Open the script '''LogisticRegression.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LogisticRegression.R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Script '''LogisticRegression.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(VGAM)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the necessary packages.&lt;br /&gt;
&lt;br /&gt;
The '''glm()''' function that we will use to create our classifier is available in base '''R''' ('''stats''' package).&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight &lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin_Dataset.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight the commands.'''&lt;br /&gt;
|| These commands will load the '''Raisin dataset.'''&lt;br /&gt;
&lt;br /&gt;
They will also prepare the dataset for model building.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''on the Environment tab.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''data '''in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
It loads the modified dataset in the '''Source''' window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''trainIndex&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''train''' and '''test''' to load them in the Source window.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us create a '''Logistic Regression '''model on the '''training dataset'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Logistic_model &amp;lt;- glm(class ~ ., data = train, family = &amp;quot;binomial&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''summary(Logistic_model)$coef'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight '''glm()'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''class ~ .'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''family = binomial'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''train''' &lt;br /&gt;
|| The function '''glm()''' fits '''generalized linear models'''. &lt;br /&gt;
&lt;br /&gt;
Logistic regression is among the class of models that it fits. &lt;br /&gt;
&lt;br /&gt;
This is the formula for our model. &lt;br /&gt;
&lt;br /&gt;
We try to predict the target variable '''class''' based on the '''minorAL''' and '''ecc''' features.&lt;br /&gt;
&lt;br /&gt;
The '''family = &amp;quot;binomial&amp;quot;''' argument ensures that our model predicts the probability for 2 classes.&lt;br /&gt;
&lt;br /&gt;
It ensures that, among all the models '''glm()''' can fit, a logistic regression model is fitted.&lt;br /&gt;
&lt;br /&gt;
This is the data used to train our model.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Coefficients'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Pr(&amp;gt;|z|)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Coefficients''' denote the coefficients of the logit function.&lt;br /&gt;
&lt;br /&gt;
This means that the log-odds change by -0.04 for every unit increase in '''minorAL'''.&lt;br /&gt;
&lt;br /&gt;
The lower p-values suggest that the effects are statistically significant.&lt;br /&gt;
&lt;br /&gt;
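As a brief aside, a logit coefficient such as the -0.04 quoted above can also be read on the odds scale:

```latex
% A one-unit increase in minorAL multiplies the odds of the modeled
% class by e^{\beta}, holding the other feature fixed:
e^{-0.04} \approx 0.961
% i.e. the odds decrease by roughly 4\% per unit of minorAL.
```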
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''View(Predicted.prob)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''type = &amp;quot;response&amp;quot;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| This command provides the predicted probability of the logistic regression model on the test dataset.&lt;br /&gt;
&lt;br /&gt;
The '''type = &amp;quot;response&amp;quot;''' argument ensures the output is a probability.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| Point&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Value&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob '''stores the predicted probability of each observation belonging to a certain class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| This retrieves the predicted classes from the probabilities. &lt;br /&gt;
&lt;br /&gt;
If the probability is greater than 0.5, the '''Kecimen''' class is chosen; otherwise, the '''Besni''' class.&lt;br /&gt;
&lt;br /&gt;
We also convert the output to a '''factor''' datatype, as required by the '''confusionMatrix()''' function.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us measure the accuracy of our model. &lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion_matrix &amp;lt;- confusionMatrix(predicted.classes,test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(predicted.classes,test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the confusion in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels.&lt;br /&gt;
&lt;br /&gt;
It is stored in the '''confusion_matrix''' variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;none&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;none&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''to display the confusion matrix from the confusion matrix list created.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''ggplot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We use the '''plot_confusion_matrix()''' function to generate a visual plot of the '''confusion matrix list created.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window&lt;br /&gt;
|- &lt;br /&gt;
|| '''Output in Plot window.'''&lt;br /&gt;
&lt;br /&gt;
|| This plot shows how well our model predicted the testing data.&lt;br /&gt;
&lt;br /&gt;
We observe that:&lt;br /&gt;
&lt;br /&gt;
'''21''' misclassifications of the '''Besni''' class.&lt;br /&gt;
&lt;br /&gt;
'''13''' misclassifications of the '''Kecimen''' class.&lt;br /&gt;
&lt;br /&gt;
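For reference, these misclassification counts imply an overall accuracy; the figure below assumes the standard 900-row Raisin dataset, whose 70/30 split leaves 270 test observations:

```latex
% Accuracy = correct test predictions / total test observations
\text{accuracy} = \frac{270 - (21 + 13)}{270} = \frac{236}{270} \approx 0.874
```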
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
&lt;br /&gt;
|| We will visualize the decision boundary of the model.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
|| This code first generates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc''' features in the dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, it uses the '''Logistic Regression''' model to predict the probability of each point in this grid, storing these predictions as a new column '''prob''' in the '''grid''' dataframe. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It converts the predicted probabilities of the points into classes.&lt;br /&gt;
&lt;br /&gt;
If the probability exceeds 0.5, the '''Kecimen''' class is chosen; otherwise, the '''Besni''' class.&lt;br /&gt;
&lt;br /&gt;
The predicted classes are stored in the '''class''' column of the '''grid''' data frame.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted class labels into numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on grid in the Environment tab to load the generated data in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type these commands &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using '''ggplot2''' from the data generated. &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary and the distribution of training data points of the '''model'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can conclude that the decision boundary of logistic regression is a straight line.&lt;br /&gt;
&lt;br /&gt;
The line separates the two classes reasonably well.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It’s sensitive to outliers which can affect the accuracy of the classifier.&lt;br /&gt;
* It can perform poorly in the presence of multicollinearity among explanatory variables.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Here are some of the limitations of Logistic Regression&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us summarize what we have learned.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using the '''Raisin''' dataset.&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply logistic regression on the '''Wine '''dataset. &lt;br /&gt;
* This dataset can be found in the '''HDclassif''' package. &lt;br /&gt;
* Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
* Measure the accuracy of the model&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial Project''' is funded by the Ministry of Education, Govt. of India. &lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald O. and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English"/>
				<updated>2024-05-31T09:00:11Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Logistic Regression&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, classification, logistic regression, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Logistic Regression in R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about &lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using the '''Raisin''' dataset.&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us learn what '''logistic regression''' is.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression is a statistical model used for classification.&lt;br /&gt;
&lt;br /&gt;
It models the probability of success as a function of the explanatory variables.&lt;br /&gt;
&lt;br /&gt;
* It predicts the probability, unlike the response in linear regression.&lt;br /&gt;
* The predicted probability is used as a classifier.&lt;br /&gt;
* The probability of success is modeled using the''' logit or (log odds) '''function.&lt;br /&gt;
* It is a linear classifier, as the logistic regression model has a linear logit.&lt;br /&gt;
* It is often used when the response variable is categorical.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The distribution of the dependent variable is Bernoulli.&lt;br /&gt;
* The data records are independent.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| The dependent variable's distribution is typically assumed to be a Bernoulli distribution in logistic regression.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Advantages of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It provides estimates of regression coefficients along with their standard errors.&lt;br /&gt;
* It also provides the predicted probability which in turn is used as a classifier.&lt;br /&gt;
* It doesn’t need explanatory variables to be necessarily continuous. &lt;br /&gt;
* In this sense, it is a more general classifier than LDA and QDA.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression offers a significant advantage in that continuous explanatory variables are not a requirement.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of Logistic Regression'''&lt;br /&gt;
|| We will implement '''logistic regression''' using the '''Raisin '''dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The additional reading material has more details on the '''Raisin dataset'''.&lt;br /&gt;
&lt;br /&gt;
Please refer to it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''LogisticRegression.R''' and the '''Raisin''' dataset file '''Raisin_Dataset.xlsx'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Highlight LogisticRegression.R &lt;br /&gt;
&lt;br /&gt;
Logistic Regression folder.&lt;br /&gt;
|| I have downloaded and moved these files to the '''Logistic Regression''' folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject '''folder on the '''Desktop'''. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Logistic Regression''' folder as my Working Directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let’s create a '''Logistic Regression''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click LogisticRegression.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to LogisticRegression.R in RStudio.&lt;br /&gt;
|| Open the script '''LogisticRegression.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LogisticRegression.R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Script '''LogisticRegression.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(VGAM)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the necessary packages.&lt;br /&gt;
&lt;br /&gt;
The '''glm()''' function required to create our classifier is part of base '''R''', in the '''stats''' package.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight &lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin_Dataset.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight the commands.'''&lt;br /&gt;
|| These commands will load the '''Raisin dataset.'''&lt;br /&gt;
&lt;br /&gt;
They will also prepare the dataset for model building.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''on the Environment tab.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''data '''in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
It loads the modified dataset in the '''Source''' window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''train '''and '''test '''to load them in the Source window.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
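&lt;br /&gt;
The split above can be sketched at a small scale. The 10-row data frame below is a toy stand-in for the Raisin data; only the 70/30 '''sample()''' pattern mirrors the script.&lt;br /&gt;

```r
# A minimal sketch of the 70/30 split performed above,
# using a toy 10-row data frame instead of the Raisin data.
set.seed(1)
toy = data.frame(x = 1:10)
idx = sample(1:nrow(toy), size = 0.7 * nrow(toy), replace = FALSE)
tr  = toy[idx, , drop = FALSE]    # 7 training rows
te  = toy[-idx, , drop = FALSE]   # 3 testing rows
c(nrow(tr), nrow(te))             # 7 3
```
&lt;br /&gt;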
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us create a '''Logistic Regression '''model on the '''training dataset'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Logistic_model &amp;lt;- glm(class ~ ., data = train, family = &amp;quot;binomial&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''summary(Logistic_model)$coef'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight glm()&lt;br /&gt;
&lt;br /&gt;
Highlight '''class ~ .'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''family = binomial'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''train''' &lt;br /&gt;
|| The function glm() represents generalized linear models. &lt;br /&gt;
&lt;br /&gt;
Logistic regression is among the class of models that it fits. &lt;br /&gt;
&lt;br /&gt;
This is the formula for our model. &lt;br /&gt;
&lt;br /&gt;
We try to predict the target variable '''class''' based on the '''minorAL '''and '''ecc '''features.&lt;br /&gt;
&lt;br /&gt;
Setting '''family''' to '''binomial''' ensures that our model predicts the probability for the two classes.&lt;br /&gt;
&lt;br /&gt;
It ensures that, out of all the models '''glm()''' can fit, the logistic regression model is chosen.&lt;br /&gt;
&lt;br /&gt;
This is the data used to train our model.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
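&lt;br /&gt;
The same call pattern can be illustrated on simulated data. The '''x''' and '''y''' below are made up for the sketch; only the '''glm()''' call with '''family = binomial''' mirrors the script.&lt;br /&gt;

```r
# A minimal sketch: glm() with a binomial family fits a logistic model.
# x and y are simulated toy data, not the Raisin features.
set.seed(2)
x = rnorm(40)
y = rbinom(40, 1, 1 / (1 + exp(-x)))
m = glm(y ~ x, family = "binomial")
length(coef(m))   # 2: an intercept and a slope for x
```
&lt;br /&gt;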
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Coefficients'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Pr(&amp;gt;|z|)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Coefficients''' denote the coefficients of the logit function.&lt;br /&gt;
&lt;br /&gt;
That means the log-odds of the class change by '''-0.04''' for every unit change in '''minorAL'''.&lt;br /&gt;
&lt;br /&gt;
The lower p-values suggest that the effects are statistically significant.&lt;br /&gt;
&lt;br /&gt;
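&lt;br /&gt;
A coefficient on the logit scale can also be read as an odds ratio by exponentiating it. The sketch below assumes the '''-0.04''' value quoted in the narration; it is illustrative, not freshly computed model output.&lt;br /&gt;

```r
# Exponentiating a logit coefficient gives the multiplicative
# change in the odds per unit change of the feature.
b_minorAL  = -0.04      # value quoted in the narration (assumed)
odds_ratio = exp(b_minorAL)
round(odds_ratio, 3)    # roughly 0.961
```
&lt;br /&gt;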
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''View(Predicted.prob)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''type = &amp;quot;response&amp;quot;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| This command provides the predicted probability of the logistic regression model on the test dataset.&lt;br /&gt;
&lt;br /&gt;
This argument ensures the output is a probability rather than a log-odds value.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| Point&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Value&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob '''stores the predicted probability of each observation belonging to a certain class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| In the '''Source''' window type the following command.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| This retrieves the predicted classes from the probabilities. &lt;br /&gt;
&lt;br /&gt;
If the probability is greater than 0.5, the '''Kecimen '''class is chosen; otherwise, the '''Besni '''class is chosen.&lt;br /&gt;
&lt;br /&gt;
We also convert the output to a '''factor''' datatype, as required by the '''confusionMatrix()''' function.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
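&lt;br /&gt;
The thresholding step can be sketched on toy probabilities. The four values below are made up for illustration and are not real model output.&lt;br /&gt;

```r
# A minimal sketch of the 0.5 cutoff used above, on toy probabilities.
probs   = c(0.92, 0.31, 0.58, 0.07)   # illustrative values, not model output
classes = factor(ifelse(probs > 0.5, "Kecimen", "Besni"))
as.character(classes)   # "Kecimen" "Besni" "Kecimen" "Besni"
```
&lt;br /&gt;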
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us measure the accuracy of our model. &lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion_matrix &amp;lt;- confusionMatrix(predicted.classes,test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(predicted.classes,test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to '''confusion_matrix''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels.&lt;br /&gt;
&lt;br /&gt;
It is stored in the '''confusion_matrix''' variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands&lt;br /&gt;
&lt;br /&gt;
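&lt;br /&gt;
What '''confusionMatrix()''' tabulates can be sketched with base '''table()''' on toy labels. The five labels below are assumptions for illustration only.&lt;br /&gt;

```r
# A minimal sketch of a confusion table and accuracy on toy labels,
# analogous to what caret's confusionMatrix() reports.
actual    = factor(c("Besni", "Besni", "Kecimen", "Kecimen", "Kecimen"))
predicted = factor(c("Besni", "Kecimen", "Kecimen", "Kecimen", "Besni"))
tab = table(Prediction = predicted, Actual = actual)
accuracy = sum(diag(tab)) / sum(tab)   # correct predictions / all predictions
accuracy   # 0.6
```
&lt;br /&gt;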
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function, '''plot_confusion_matrix''', that displays the confusion matrix stored in the list created earlier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''GGPlot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''LogisticRegression.R''' in the '''Source '''window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We use the '''plot_confusion_matrix()''' function to generate a visual plot of the '''confusion matrix list created.'''&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window&lt;br /&gt;
|- &lt;br /&gt;
|| '''Output in Plot window.'''&lt;br /&gt;
&lt;br /&gt;
|| This plot shows how well our model predicted the testing data.&lt;br /&gt;
&lt;br /&gt;
We observe that:&lt;br /&gt;
&lt;br /&gt;
'''21 '''misclassifications of the '''Besni '''class.&lt;br /&gt;
&lt;br /&gt;
'''13 '''misclassifications of the '''Kecimen '''class.&lt;br /&gt;
&lt;br /&gt;
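&lt;br /&gt;
These counts can be turned into a rough test accuracy. The arithmetic below assumes the full Raisin dataset has 900 rows, so the 30% test split holds 270 rows; that size is an assumption, not stated in the script.&lt;br /&gt;

```r
# Back-of-the-envelope accuracy from the misclassification counts,
# assuming 270 test rows (30% of an assumed 900-row dataset).
n_test   = 270
misclass = 21 + 13
accuracy = 1 - misclass / n_test
round(accuracy, 3)   # roughly 0.874
```
&lt;br /&gt;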
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
&lt;br /&gt;
|| We will visualize the decision boundary of the model.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
|| This code first generates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc''' features in the dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, it uses the '''Logistic Regression '''model to predict the probability of each point in this grid, storing these predictions as a new column ''''prob' '''in the '''grid '''dataframe. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It converts the predicted probabilities of the points into classes.&lt;br /&gt;
&lt;br /&gt;
If the probability exceeds 0.5, the '''Kecimen '''class is chosen; otherwise, the '''Besni '''class is chosen.&lt;br /&gt;
&lt;br /&gt;
The predicted classes are stored in the ‘class’ column of the '''grid '''data frame.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric()''' function encodes the predicted class labels into numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on grid in the Environment tab to load the generated data in the Source window.&lt;br /&gt;
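&lt;br /&gt;
The grid construction can be sketched at a small scale. The 3x3 toy grid over a unit square below stands in for the 500x500 grid used in the script.&lt;br /&gt;

```r
# expand.grid() builds every combination of the two sequences;
# a 3x3 toy version of the 500x500 grid used in the script.
g = expand.grid(minorAL = seq(0, 1, length = 3),
                ecc     = seq(0, 1, length = 3))
nrow(g)   # 9 points covering the toy feature plane
```
&lt;br /&gt;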
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type these commands &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using GGPlot2 from the data generated. &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the model's decision boundary and the distribution of the training data points.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can conclude that the decision boundary of logistic regression is a straight line.&lt;br /&gt;
&lt;br /&gt;
The line separates the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It’s sensitive to outliers which can affect the accuracy of the classifier.&lt;br /&gt;
* It can perform poorly in the presence of multicollinearity among explanatory variables.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Here are some of the limitations of Logistic Regression&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us summarize what we have learned.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using '''Raisin '''dataset'''.'''&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply logistic regression on the '''Wine '''dataset. &lt;br /&gt;
* This dataset can be found in the '''HDclassif''' package. &lt;br /&gt;
* Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
* Measure the accuracy of the model&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial Project''' was established by the Ministry of Education, Government of India. &lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald O. and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Logistic-Regression-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Logistic-Regression-in-R/English"/>
				<updated>2024-05-31T08:55:38Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: Created page with &amp;quot;'''Title of the script''': Logistic Regression  '''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty  '''Keywords''': R, RStudio, machine learning, supervised, un...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Logistic Regression&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, classification, logistic regression, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Logistic Regression in R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about &lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using '''Raisin '''dataset'''.'''&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us learn what '''logistic regression''' is&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression is a statistical model used for classification.&lt;br /&gt;
&lt;br /&gt;
It models the probability of success as a function of the explanatory variables.&lt;br /&gt;
&lt;br /&gt;
* It predicts the probability, unlike the response in linear regression.&lt;br /&gt;
* The predicted probability is used as a classifier.&lt;br /&gt;
* The probability of success is modeled using the''' logit or (log odds) '''function.&lt;br /&gt;
* It is a linear classifier, as the logistic regression model has a linear logit.&lt;br /&gt;
* It is often used when the response variable is categorical.&lt;br /&gt;
&lt;br /&gt;
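&lt;br /&gt;
The logit relationship in the points above can be sketched with the logistic (inverse-logit) function, which maps any linear predictor to a probability.&lt;br /&gt;

```r
# The logistic function maps any linear predictor z to a
# probability strictly between 0 and 1.
sigmoid = function(z) 1 / (1 + exp(-z))
sigmoid(0)             # 0.5, the decision-boundary value
round(sigmoid(2), 3)   # roughly 0.881
```
&lt;br /&gt;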
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The distribution of the dependent variable is Bernoulli.&lt;br /&gt;
* The data records are independent.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| The dependent variable's distribution is typically assumed to be a Bernoulli distribution in logistic regression.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Advantages of Logistic Regression'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It provides estimates of regression coefficients along with their standard errors.&lt;br /&gt;
* It also provides the predicted probability which in turn is used as a classifier.&lt;br /&gt;
* It doesn’t need explanatory variables to be necessarily continuous. &lt;br /&gt;
* In this sense, it is a more general classifier than LDA and QDA.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Logistic regression offers a significant advantage in that continuous explanatory variables are not a requirement.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of Logistic Regression'''&lt;br /&gt;
|| We will implement '''logistic regression''' using the '''Raisin '''dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The additional reading material has more details on the '''Raisin dataset'''.&lt;br /&gt;
&lt;br /&gt;
Please refer to it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''LogisticRegression.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Highlight LogisticRegression.R &lt;br /&gt;
&lt;br /&gt;
Logistic Regression folder.&lt;br /&gt;
|| I have downloaded and moved these files to the '''Logistic Regression''' folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject '''folder on the '''Desktop'''. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''Logistic Regression''' folder as my Working Directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let’s create a '''Logistic Regression''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click LogisticRegression.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to LogisticRegression.R in RStudio.&lt;br /&gt;
|| Open the script '''LogisticRegression.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LogisticRegression.R.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Script '''LogisticRegression.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(VGAM)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the necessary packages.&lt;br /&gt;
&lt;br /&gt;
The '''glm()''' function required to create our classifier is part of base '''R''', in the '''stats''' package.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight &lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin_Dataset.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight the commands.'''&lt;br /&gt;
|| These commands will load the '''Raisin dataset.'''&lt;br /&gt;
&lt;br /&gt;
They will also prepare the dataset for model building.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data '''on the Environment tab.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''data '''in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
It loads the modified dataset in the '''Source''' window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''trainIndex &amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''train &amp;lt;- data[trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''test &amp;lt;- data[-trainIndex, ]'''&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''train '''and '''test '''to load them in the Source window.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us create a '''Logistic Regression '''model on the '''training dataset'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Logistic_model &amp;lt;- glm(class ~ ., data = train, family = &amp;quot;binomial&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''summary(Logistic_model)$coef'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight glm()&lt;br /&gt;
&lt;br /&gt;
Highlight '''class ~ .'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''family = binomial'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''train''' &lt;br /&gt;
|| The function glm() represents generalized linear models. &lt;br /&gt;
&lt;br /&gt;
Logistic regression is among the class of models that it fits. &lt;br /&gt;
&lt;br /&gt;
This is the formula for our model. &lt;br /&gt;
&lt;br /&gt;
We try to predict target variable '''class''' based on '''minorAL '''and '''ecc '''features.&lt;br /&gt;
&lt;br /&gt;
This ensures that our model predicts the probability for 2 classes.&lt;br /&gt;
&lt;br /&gt;
It ensures that, of all the models glm() can fit, a logistic regression model is fit.&lt;br /&gt;
&lt;br /&gt;
This is the data used to train our model.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Coefficients'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''Pr(&amp;gt;|z|)'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Coefficients''' denote the coefficients of the logit function.&lt;br /&gt;
&lt;br /&gt;
That means the log-odds of '''class''' change by -0.04 for every unit change in '''minorAL'''.&lt;br /&gt;
&lt;br /&gt;
The lower p-values suggest that the effects are statistically significant.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
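As a self-contained illustration of the glm() call described above (synthetic two-class data, not the Raisin dataset; the variable names are hypothetical):

```r
# glm() with family = "binomial" fits a logistic regression;
# summary()$coef gives estimates, std. errors, z values and p-values.
set.seed(1)
train = data.frame(
  minorAL = rnorm(100),              # hypothetical predictor values
  class   = factor(rep(c("Besni", "Kecimen"), 50))
)
m = glm(class ~ ., data = train, family = "binomial")
coefs = summary(m)$coef              # coefficient table, one row per term
```

The coefficient table has columns Estimate, Std. Error, z value and Pr(&gt;|z|), as shown in the console output.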
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''View(Predicted.prob)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight&lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob &amp;lt;- predict(Logistic_model, test, type=&amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight&lt;br /&gt;
&lt;br /&gt;
'''type = &amp;quot;response&amp;quot;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| This command provides the predicted probability of the logistic regression model on the test dataset.&lt;br /&gt;
&lt;br /&gt;
This argument ensures that the output is a probability rather than a log-odds value.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| Point&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Value&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Predicted.prob '''stores the predicted probability of each observation belonging to a certain class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| In the '''Source''' window type the following command&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight &lt;br /&gt;
&lt;br /&gt;
'''predicted.classes &amp;lt;- factor(ifelse(Predicted.prob &amp;gt; 0.5, &amp;quot;Kecimen&amp;quot;, &amp;quot;Besni&amp;quot;))'''&lt;br /&gt;
|| This retrieves the predicted classes from the probabilities. &lt;br /&gt;
&lt;br /&gt;
If the probability is greater than 0.5, the '''Kecimen '''class is chosen; otherwise, the '''Besni '''class is chosen.&lt;br /&gt;
&lt;br /&gt;
We also convert the output to a '''factor''' datatype so that it can be passed to the confusionMatrix() function.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
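The thresholding step can be seen in isolation with a few illustrative probability values:

```r
# Converting predicted probabilities into class labels at the
# 0.5 threshold (illustrative probability values, not model output).
Predicted.prob = c(0.91, 0.12, 0.67, 0.40)
predicted.classes = factor(ifelse(Predicted.prob > 0.5, "Kecimen", "Besni"))
# predicted.classes: Kecimen Besni Kecimen Besni
```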
|| &lt;br /&gt;
|| Let us measure the accuracy of our model. &lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion_matrix &amp;lt;- confusionMatrix(predicted.classes,test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(predicted.classes,test$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to '''confusion_matrix''' in the Environment tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels.&lt;br /&gt;
&lt;br /&gt;
It is stored in the '''confusion_matrix''' variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
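What confusionMatrix() tabulates can be sketched with base R's table(); caret's version adds accuracy and other statistics on top (the class labels here are illustrative):

```r
# A confusion matrix built with base R's table(): rows are the
# predicted labels, columns the actual labels.
predicted = factor(c("Besni", "Kecimen", "Kecimen", "Besni"))
actual    = factor(c("Besni", "Kecimen", "Besni",   "Besni"))
tab = table(Prediction = predicted, Actual = actual)
accuracy = sum(diag(tab)) / sum(tab)   # 3 of 4 correct
```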
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands define a function, '''plot_confusion_matrix''', to display the confusion matrix from the confusion-matrix list created earlier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table, which is suitable for plotting using '''ggplot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the '''Source '''window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type this command&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion_matrix)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We use the '''plot_confusion_matrix()''' function to generate a visual plot from the '''confusion matrix list '''we created.&lt;br /&gt;
&lt;br /&gt;
Select and run the command&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window&lt;br /&gt;
|- &lt;br /&gt;
|| '''Output in Plot window.'''&lt;br /&gt;
&lt;br /&gt;
|| This plot shows how well our model predicted the testing data.&lt;br /&gt;
&lt;br /&gt;
We observe that:&lt;br /&gt;
&lt;br /&gt;
'''21 '''misclassifications of the '''Besni '''class.&lt;br /&gt;
&lt;br /&gt;
'''13 '''misclassifications of the '''Kecimen '''class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
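The grid construction in the next step can be seen on a tiny scale, with 5-point sequences instead of 500:

```r
# expand.grid() builds every combination of the supplied sequences;
# here 5 x 5 = 25 points instead of the tutorial's 500 x 500.
grid = expand.grid(minorAL = seq(0, 1, length = 5),
                   ecc     = seq(0, 1, length = 5))
```

Each row of grid is one (minorAL, ecc) point at which the model will be evaluated.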
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
&lt;br /&gt;
|| We will visualize the decision boundary of the model.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$prob &amp;lt;- predict(Logistic_model, newdata = grid, type = &amp;quot;response&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- ifelse(grid$prob &amp;gt; 0.5, 'Kecimen', 'Besni')'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(as.factor(grid$class))'''&lt;br /&gt;
|| This code first generates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc''' features in the dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, it uses the '''Logistic Regression '''model to predict the probability of each point in this grid, storing these predictions as a new column ''''prob' '''in the '''grid '''dataframe. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It converts the predicted probabilities of the points into classes.&lt;br /&gt;
&lt;br /&gt;
If the probability exceeds 0.5, the '''Kecimen '''class is chosen; otherwise, the '''Besni '''class is chosen.&lt;br /&gt;
&lt;br /&gt;
The predicted classes are stored in the '''class''' column of the '''grid''' data frame.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric()''' function encodes the predicted class labels, converted to a factor, into numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run the commands&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on grid in the Environment tab to load the generated data in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type these commands &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;Logistic Regression Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We create the decision boundary plot using '''ggplot2''' from the data generated. &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary and the distribution of training data points of the '''model'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can conclude that the decision boundary of logistic regression is a straight line.&lt;br /&gt;
&lt;br /&gt;
The line separates most of the data points, consistent with the misclassifications seen earlier.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* It is sensitive to outliers, which can affect the accuracy of the classifier.&lt;br /&gt;
* It can perform poorly in the presence of multicollinearity among explanatory variables.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Here are some of the limitations of Logistic Regression&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us summarize what we have learned.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Logistic Regression&lt;br /&gt;
* Assumptions of Logistic Regression&lt;br /&gt;
* Advantages of Logistic Regression&lt;br /&gt;
* Implementation of Logistic Regression in '''R''' using the '''Raisin''' dataset.&lt;br /&gt;
* Model Evaluation.&lt;br /&gt;
* Visualization of the model Decision Boundary&lt;br /&gt;
* Limitations of Logistic Regression&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply logistic regression on the '''Wine '''dataset. &lt;br /&gt;
* This dataset can be found in the '''HDclassif''' package. &lt;br /&gt;
* Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
* Measure the accuracy of the model&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India. &lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald. O and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English"/>
				<updated>2024-05-31T05:32:55Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Quadratic Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, QDA, quadratic discriminant analysis, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Quadratic Discriminant Analysis in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation of QDA using''' Raisin''' Dataset'''.'''&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Quadratic Discriminant Analysis'''&lt;br /&gt;
|| * Quadratic discriminant analysis is a statistical method used for classification.&lt;br /&gt;
* QDA constructs a data-driven non-linear separator between two classes.&lt;br /&gt;
* The covariance matrices of different classes are not assumed to be equal. &lt;br /&gt;
* A quadratic function describes the decision boundary between each pair of classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Differences between LDA and QDA'''&lt;br /&gt;
|| Now let’s see the differences between LDA and QDA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''LDA''' assumes that each class has the same covariance matrix.&lt;br /&gt;
* '''QDA''' relaxes the assumption of an equal covariance matrix for all the classes.&lt;br /&gt;
* '''LDA''' constructs a linear boundary, while '''QDA '''constructs a non-linear boundary.&lt;br /&gt;
* When the covariance matrices of different classes are the same, '''QDA '''reduces to '''LDA'''.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for QDA'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA '''is primarily used when data is multivariate Gaussian.&lt;br /&gt;
&lt;br /&gt;
'''QDA''' assumes that each class has its own covariance matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Now let us see the assumptions of QDA&lt;br /&gt;
&lt;br /&gt;
QDA is used when data is multivariate Gaussian and each class has its own covariance matrix.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Medical Diagnosis.&lt;br /&gt;
* Bio-Imaging classification.&lt;br /&gt;
* Fraud Detection.&lt;br /&gt;
&lt;br /&gt;
|| The QDA technique is used in several applications.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of QDA'''&lt;br /&gt;
|| Let us implement '''QDA '''on the '''Raisin''' '''dataset '''with two chosen variables'''.'''&lt;br /&gt;
&lt;br /&gt;
For more information on the Raisin dataset, please see the Additional Reading material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''QDA.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Point to '''QDA.R''' and the folder '''QDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''QDA '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''QDA''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will create a '''QDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click QDA.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to QDA.R in RStudio.&lt;br /&gt;
|| Let us open the script '''QDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''QDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''QDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
The '''MASS''' package contains the '''qda()''' function to create our classifier.&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
We will use the '''dplyr''' package to aid the visualisation of the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have imported them directly. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Then click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window and close the tab.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data and convert the variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using '''sample() '''function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It will contain 70% of the total number of rows for training; the remaining 30% are used for testing.&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
  &lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let’s perform '''QDA''' on the '''training''' dataset.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window&lt;br /&gt;
&lt;br /&gt;
type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model '''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| We use this command to create '''QDA Model'''&lt;br /&gt;
&lt;br /&gt;
We pass two parameters to the '''qda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Point the output in the '''console '''&lt;br /&gt;
&lt;br /&gt;
Highlight the output '''Prior probabilities of groups'''&lt;br /&gt;
&lt;br /&gt;
Highlight the output '''Group means'''&lt;br /&gt;
|| These are the parameters of our model.&lt;br /&gt;
&lt;br /&gt;
This indicates the composition of classes in the training data.&lt;br /&gt;
&lt;br /&gt;
These indicate the mean values of the predictor variables for each class.&lt;br /&gt;
|- &lt;br /&gt;
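The qda() call can also be sketched on R's built-in iris data (illustrative only; the tutorial uses the Raisin dataset):

```r
# qda() from the MASS package, sketched on the built-in iris data.
library(MASS)
model = qda(Species ~ ., data = iris)
pred  = predict(model, iris)   # list with $class and $posterior
train_acc = mean(pred$class == iris$Species)
```

As the narration notes, the prediction is a list: $class holds the predicted labels and $posterior the per-class posterior probabilities.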
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Let’s use this command to predict the class variable from the test data using the trained QDA model.&lt;br /&gt;
&lt;br /&gt;
This predicts the class and posterior probability for the testing data.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''predicted_values '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight '''class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight '''posterior'''&lt;br /&gt;
|| Click on '''predicted_values''' in the Environment tab&lt;br /&gt;
&lt;br /&gt;
This shows us that our predicted variable has two components.&lt;br /&gt;
&lt;br /&gt;
'''class''' contains the predicted '''classes '''of the testing data.&lt;br /&gt;
&lt;br /&gt;
'''posterior''' contains the '''posterior probability''' of an observation belonging to each class.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us compute the accuracy of our model.&lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion &amp;lt;- confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Point to the confusion in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels of the testing data and is stored in the '''confusion''' variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run the command. &lt;br /&gt;
&lt;br /&gt;
The confusion matrix list is shown in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click '''confusion '''to load it in the''' Source '''window.&lt;br /&gt;
&lt;br /&gt;
'''confusion '''list contains a component table containing the required confusion matrix.&lt;br /&gt;
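The confusion-matrix arithmetic described above can also be sketched in plain R without the '''caret''' package: '''table()''' cross-tabulates the labels, and the diagonal counts the correct predictions. The label vectors below are toy values made up for illustration, not output from the Raisin model.

```r
# Hand-built confusion matrix and accuracy (toy labels, for illustration only).
actual    = factor(c("Besni", "Besni", "Kecimen", "Kecimen", "Kecimen"))
predicted = factor(c("Besni", "Kecimen", "Kecimen", "Kecimen", "Besni"))

# Rows = predicted class, columns = actual class.
tab = table(Prediction = predicted, Actual = actual)
print(tab)

# Accuracy = correct predictions (diagonal) / all predictions.
accuracy = sum(diag(tab)) / sum(tab)
print(accuracy)   # 3 of the 5 toy labels match
```

caret's '''confusionMatrix()''' wraps this same cross-tabulation together with accuracy and related statistics.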
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| Now let’s plot the confusion matrix from the table.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''to display the confusion matrix from the confusion matrix list created.&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''GGPlot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are using the created '''plot_confusion_matrix()''' function to generate the visual plot of the confusion matrix in the '''confusion''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''plot window'''&lt;br /&gt;
|| Drag the boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
22 samples of class Kecimen have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
11 samples of class Besni have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
Overall, the model has misclassified only '''33''' out of '''270 '''samples.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
|| This block of code first creates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc '''features in the dataset.&lt;br /&gt;
&lt;br /&gt;
It stores it in a variable '''grid'''. &lt;br /&gt;
&lt;br /&gt;
Then, it uses the QDA model to predict the class of each point in this grid.&lt;br /&gt;
&lt;br /&gt;
It stores these predictions as a new column '''class''' in the '''grid''' dataframe. &lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted classes' string labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
The resulting grid of points and their predicted classes will be used to visualize the decision boundaries of the QDA model.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Click '''grid''' on the Environment tab to load the grid dataframe in the source window.&lt;br /&gt;
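The grid construction above can be tried in miniature: '''expand.grid()''' builds every combination of the supplied values. The numeric ranges below are arbitrary placeholders, not values taken from the Raisin dataset, and a 3 x 3 grid stands in for the tutorial's 500 x 500 one.

```r
# Miniature version of the prediction grid (3 x 3 instead of 500 x 500).
# The numeric ranges are placeholders, not real Raisin feature ranges.
grid = expand.grid(minorAL = seq(200, 400, length.out = 3),
                   ecc     = seq(0.4, 0.9, length.out = 3))

print(nrow(grid))   # 9 rows: every combination of the 3 x 3 values
print(head(grid))
```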
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| We are creating the decision boundary plot using '''ggplot2.''' &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
'''geom_raster '''creates a colour map indicating the predicted classes of the grid points.&lt;br /&gt;
&lt;br /&gt;
'''geom_point '''plots the training data points in the plot.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour''' creates the decision boundary of the QDA.&lt;br /&gt;
&lt;br /&gt;
The '''scale_fill_manual''' and '''scale_color_manual''' functions assign specific colors to the classes.&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary and the distribution of the training data points of the '''model'''.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can see that the decision boundary of our model is non-linear.&lt;br /&gt;
&lt;br /&gt;
And our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Multicollinearity among predictors may lead to poor performance.&lt;br /&gt;
* The presence of outliers in data may also lead to poor performance. &lt;br /&gt;
&lt;br /&gt;
|| These are the limitations of QDA.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial, we have learned about:&lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation of QDA using the '''Raisin''' dataset.&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply '''QDA''' on the '''wine''' dataset.&lt;br /&gt;
* Measure the accuracy of the model.&lt;br /&gt;
&lt;br /&gt;
This dataset can be found in the '''HDclassif '''package. &lt;br /&gt;
&lt;br /&gt;
Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English"/>
				<updated>2024-05-31T05:30:05Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Quadratic Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, QDA, quadratic discriminant analysis, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Quadratic Discriminant Analysis in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation of QDA using the '''Raisin''' dataset.&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Quadratic Discriminant Analysis'''&lt;br /&gt;
|| * Quadratic discriminant analysis is a statistical method used for classification.&lt;br /&gt;
* QDA constructs a data-driven non-linear separator between two classes.&lt;br /&gt;
* The covariance matrices of the different classes are not assumed to be equal. &lt;br /&gt;
* A quadratic function describes the decision boundary between each pair of classes.&lt;br /&gt;
&lt;br /&gt;
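As a concrete illustration of these points, a QDA classifier can be fit in a few lines with the '''qda()''' function from the '''MASS''' package. The sketch below uses the built-in '''iris''' data rather than this tutorial's Raisin dataset, so it runs without any downloaded files.

```r
# Minimal QDA sketch on built-in data (iris, not the Raisin dataset).
library(MASS)

# Fit QDA with two predictors; each class gets its own covariance matrix.
model = qda(Species ~ Sepal.Length + Petal.Length, data = iris)

# predict() returns a list: $class (predicted labels) and
# $posterior (per-class membership probabilities).
preds = predict(model, iris)

print(head(preds$class))
print(mean(preds$class == iris$Species))   # training accuracy
```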
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Differences between LDA and QDA'''&lt;br /&gt;
|| Now let’s see the differences between LDA and QDA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''LDA''' assumes that each class has the same covariance matrix.&lt;br /&gt;
* '''QDA''' relaxes the assumption of an equal covariance matrix for all the classes.&lt;br /&gt;
* '''LDA''' constructs a linear boundary, while '''QDA '''constructs a non-linear boundary.&lt;br /&gt;
* When the covariance matrices of different classes are the same, '''QDA '''reduces to '''LDA'''.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for QDA'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA '''is primarily used when data is multivariate Gaussian.&lt;br /&gt;
&lt;br /&gt;
'''QDA''' assumes that each class has its own covariance matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Now let us see the assumptions of QDA.&lt;br /&gt;
&lt;br /&gt;
QDA is used when data is multivariate Gaussian and each class has its own covariance matrix.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Medical Diagnosis.&lt;br /&gt;
* Bio-Imaging classification.&lt;br /&gt;
* Fraud Detection.&lt;br /&gt;
&lt;br /&gt;
|| The QDA technique is used in several applications.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of QDA'''&lt;br /&gt;
|| Let us implement '''QDA''' on the '''Raisin dataset''' with two chosen variables.&lt;br /&gt;
&lt;br /&gt;
For more information on the Raisin dataset, please see the Additional Reading material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''QDA.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''QDA.R''' and the folder '''QDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''QDA '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''QDA''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will create a '''QDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click QDA.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to QDA.R in RStudio.&lt;br /&gt;
|| Let us open the script '''QDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''QDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''QDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
The '''MASS''' package contains the '''qda()''' function to create our classifier.&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
We will use the '''dplyr''' package to aid the visualisation of the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Then click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window and close the tab.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data and convert the variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using the '''sample()''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It will contain 70% of the rows for training; the remaining 30% will be used for testing.&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
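The split described above can be run standalone. The sketch below uses the built-in '''iris''' data (150 rows) in place of the Raisin dataset; '''round()''' is added around the 70% size to guard against floating-point truncation of 0.7 times the row count.

```r
# 70/30 train/test split, as in the tutorial, on built-in iris data.
set.seed(1)                      # reproducible sampling
n = nrow(iris)                   # 150 rows

# round() avoids 0.7 * n landing just below an integer in floating point.
index_split = sample(1:n, size = round(0.7 * n), replace = FALSE)

train_data = iris[index_split, ]   # 105 sampled rows
test_data  = iris[-index_split, ]  # the remaining 45 rows

print(nrow(train_data))
print(nrow(test_data))
```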
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
  &lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let’s perform '''QDA''' on the '''training''' dataset.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window&lt;br /&gt;
&lt;br /&gt;
type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model '''&lt;br /&gt;
&lt;br /&gt;
Click the '''Save''' and '''Run''' buttons. &lt;br /&gt;
|| We use this command to create '''QDA Model'''&lt;br /&gt;
&lt;br /&gt;
We pass two parameters to the '''qda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Point the output in the '''console '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Prior probabilities of group'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Group means'''&lt;br /&gt;
|| These are the parameters of our model.&lt;br /&gt;
&lt;br /&gt;
This indicates the composition of classes in the training data.&lt;br /&gt;
&lt;br /&gt;
These indicate the mean values of the predictor variables for each class.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Let’s use this command to predict the class variable from the test data using the trained QDA model.&lt;br /&gt;
&lt;br /&gt;
This predicts the class and posterior probability for the testing data.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''predicted_values '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''posterior'''&lt;br /&gt;
|| Click on '''predicted_values''' in the Environment tab&lt;br /&gt;
&lt;br /&gt;
This shows us that our predicted variable has two components.&lt;br /&gt;
&lt;br /&gt;
'''class''' contains the predicted '''classes '''of the testing data.&lt;br /&gt;
&lt;br /&gt;
'''posterior''' contains the '''posterior probability''' of an observation belonging to each class.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us compute the accuracy of our model.&lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion &amp;lt;- confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Point to the confusion in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels of the testing data and is stored in the '''confusion''' variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run the command. &lt;br /&gt;
&lt;br /&gt;
The confusion matrix list is shown in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click '''confusion '''to load it in the''' Source '''window.&lt;br /&gt;
&lt;br /&gt;
'''confusion '''list contains a component table containing the required confusion matrix.&lt;br /&gt;
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| Now let’s plot the confusion matrix from the table.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''to display the confusion matrix from the confusion matrix list created.&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''GGPlot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
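The correct/incorrect flag used to colour the tiles can be sketched in a few lines of base R (a toy example with hypothetical labels, not part of the tutorial's code files):&lt;br /&gt;

```r
# Toy sketch of the cor flag used to colour the confusion-matrix tiles:
# 1 marks a correct prediction, 0 an incorrect one (hypothetical labels)
Actual = c("Besni", "Besni", "Kecimen", "Kecimen")
Prediction = c("Besni", "Kecimen", "Besni", "Kecimen")
cor_flag = ifelse(Actual == Prediction, 1, 0)
sum(cor_flag)  # 2 of the 4 toy predictions are correct
```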
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are using the created '''plot_confusion_matrix()''' function to generate the visual plot of the confusion matrix stored in the '''confusion''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''plot window'''&lt;br /&gt;
|| Drag boundary to see the plot window clearly &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
22 samples of class Kecimen have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
11 samples of class Besni have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
Overall, the model has misclassified only '''33''' out of '''270 '''samples.&lt;br /&gt;
&lt;br /&gt;
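As a quick sanity check, the accuracy implied by these counts can be computed directly (a minimal sketch using only the numbers quoted above):&lt;br /&gt;

```r
# Accuracy implied by the narration: 33 misclassified out of 270 test samples
total = 270
misclassified = 33
accuracy = (total - misclassified) / total
round(accuracy, 4)  # 0.8778, i.e. roughly 87.8% accuracy
```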
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| This block of code first creates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc '''features in the dataset.&lt;br /&gt;
&lt;br /&gt;
It stores it in a variable ''''grid''''. &lt;br /&gt;
&lt;br /&gt;
Then, it uses the QDA model to predict the class of each point in this grid.&lt;br /&gt;
&lt;br /&gt;
It stores these predictions as a new column ''''class' '''in the '''grid '''dataframe. &lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function converts the predicted classes' string labels into numeric codes.&lt;br /&gt;
&lt;br /&gt;
The resulting grid of points and their predicted classes will be used to visualize the decision boundaries of the QDA model.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Click '''grid''' on the Environment tab to load the grid dataframe in the source window.&lt;br /&gt;
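The behaviour of '''expand.grid''' can be seen on a toy range (a sketch with made-up numbers, not the Raisin data):&lt;br /&gt;

```r
# expand.grid returns every combination of its input vectors,
# which is how the 500 x 500 prediction grid is built
g = expand.grid(minorAL = seq(1, 2, length = 3),
                ecc = seq(0.5, 0.9, length = 3))
nrow(g)  # 9: each of the 3 minorAL values paired with each of the 3 ecc values
```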
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
|| We are creating the decision boundary plot using '''ggplot2.''' &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
'''geom_raster '''creates a colour map indicating the predicted classes of the grid points&lt;br /&gt;
&lt;br /&gt;
'''geom_point '''plots the training data points in the plot.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour''' creates the decision boundary of the QDA.&lt;br /&gt;
&lt;br /&gt;
The '''scale_fill_manual''' function assigns specific colors to the classes, as does the '''scale_color_manual''' function.&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary.&lt;br /&gt;
&lt;br /&gt;
It also shows the distribution of the training data points used to fit the model.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can see that the decision boundary of our model is non-linear.&lt;br /&gt;
&lt;br /&gt;
And our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Multicollinearity among predictors may lead to poor performance.&lt;br /&gt;
* The presence of outliers in data may also lead to poor performance. &lt;br /&gt;
&lt;br /&gt;
|| These are the limitations of QDA&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial, we have learned about:&lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation Of QDA using''' Raisin''' Dataset'''.'''&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply '''QDA''' on the '''wine''' dataset.&lt;br /&gt;
* Measure the accuracy of the model.&lt;br /&gt;
&lt;br /&gt;
This dataset can be found in the '''HDclassif '''package. &lt;br /&gt;
&lt;br /&gt;
Install the package and import the dataset using the '''data() '''command&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English"/>
				<updated>2024-05-31T05:26:41Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Quadratic Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, QDA, quadratic discriminant analysis, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Quadratic Discriminant Analysis in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation of QDA using''' Raisin''' Dataset'''.'''&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Quadratic Discriminant Analysis'''&lt;br /&gt;
|| * Quadratic discriminant analysis is a statistical method used for classification.&lt;br /&gt;
* QDA constructs a data-driven non-linear separator between two classes.&lt;br /&gt;
* The covariance matrix for different classes is not necessarily equal. &lt;br /&gt;
* A quadratic function describes the decision boundary between each pair of classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Differences between LDA and QDA'''&lt;br /&gt;
|| Now let’s see the differences between LDA and QDA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''LDA''' assumes that each class has the same covariance matrix.&lt;br /&gt;
* '''QDA''' relaxes the assumption of an equal covariance matrix for all the classes.&lt;br /&gt;
* '''LDA''' constructs a linear boundary, while '''QDA '''constructs a non-linear boundary.&lt;br /&gt;
* When the covariance matrices of different classes are the same, '''QDA '''reduces to '''LDA'''.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for QDA'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA '''is primarily used when data is multivariate Gaussian.&lt;br /&gt;
&lt;br /&gt;
'''QDA''' assumes that each class has its own covariance matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Now let us see the assumption of QDA&lt;br /&gt;
&lt;br /&gt;
QDA is used when data is multivariate Gaussian and each class has its own covariance matrix.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Medical Diagnosis.&lt;br /&gt;
* Bio-Imaging classification.&lt;br /&gt;
* Fraud Detection.&lt;br /&gt;
&lt;br /&gt;
|| QDA technique is used in several applications.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of QDA'''&lt;br /&gt;
|| Let us implement '''QDA '''on the '''Raisin''' '''dataset '''with two chosen variables'''.'''&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please see the Additional Reading material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''QDA.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''QDA.R''' and the folder '''QDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''QDA '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''QDA''' folder as my working Directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will create a '''QDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click QDA.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to QDA.R in RStudio.&lt;br /&gt;
|| Let us open the script '''QDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''QDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''QDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
The '''MASS''' package contains the '''qda()''' function to create our classifier.&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
We will use the '''dplyr''' package to aid the visualisation of the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them.&lt;br /&gt;
&lt;br /&gt;
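A defensive way to check that all five packages are installed before sourcing the script (a sketch; '''requireNamespace''' is base R, so this runs even when a package is absent):&lt;br /&gt;

```r
# List the packages this tutorial imports and collect any that are missing
needed = c("readxl", "MASS", "caret", "ggplot2", "dplyr")
missing = needed[!vapply(needed, requireNamespace, logical(1), quietly = TRUE)]
# install.packages(missing) would then install only the absent ones
length(needed)  # 5 packages are used in this tutorial
```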
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Then click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window and close the tab.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data and convert the variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using '''sample() '''function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It will contain 70% of the rows for training; the remaining 30% will be used for testing.&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
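The index-based split can be sketched without loading the dataset itself; the Raisin data has 900 rows, so with seed 1 the split is reproducible:&lt;br /&gt;

```r
# Reproducible 70/30 split of 900 row indices, as described above
set.seed(1)
n = 900
index_split = sample(1:n, size = 0.7 * n, replace = FALSE)
length(index_split)      # 630 indices for training
n - length(index_split)  # 270 rows left for testing
```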
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
  &lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let’s perform '''QDA''' on the '''training''' dataset.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window&lt;br /&gt;
&lt;br /&gt;
type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model '''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| We use this command to create the '''QDA''' model.&lt;br /&gt;
&lt;br /&gt;
We pass two parameters to the '''qda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
&lt;br /&gt;
|| Point the output in the '''console '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Prior probabilities of group'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Group means'''&lt;br /&gt;
|| These are the parameters of our model.&lt;br /&gt;
&lt;br /&gt;
This indicates the composition of classes in the training data.&lt;br /&gt;
&lt;br /&gt;
These indicate the mean values of the predictor variables for each class.&lt;br /&gt;
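The prior probabilities reported by '''qda()''' are simply the class proportions of the training data; a toy illustration with hypothetical labels:&lt;br /&gt;

```r
# Class proportions play the role of prior probabilities in qda()
cls = factor(c("Besni", "Besni", "Besni", "Kecimen"))
priors = prop.table(table(cls))
priors  # Besni 0.75, Kecimen 0.25
```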
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Let’s use this command to predict the class variable from the test data using the trained QDA model.&lt;br /&gt;
&lt;br /&gt;
This predicts the class and posterior probability for the testing data.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''predicted_values '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''posterior'''&lt;br /&gt;
|| Click on '''predicted_values''' in the Environment tab&lt;br /&gt;
&lt;br /&gt;
This shows us that our predicted variable has two components.&lt;br /&gt;
&lt;br /&gt;
'''class''' contains the predicted '''classes '''of the testing data.&lt;br /&gt;
&lt;br /&gt;
'''posterior''' contains the '''posterior probability''' of an observation belonging to each class.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us compute the accuracy of our model.&lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion &amp;lt;- confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Point to the confusion in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels of testing data.&lt;br /&gt;
&lt;br /&gt;
And it is stored in the confusion variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run the command. &lt;br /&gt;
&lt;br /&gt;
The confusion matrix list is shown in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click '''confusion '''to load it in the''' Source '''window.&lt;br /&gt;
&lt;br /&gt;
'''confusion '''list contains a component table containing the required confusion matrix.&lt;br /&gt;
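What the '''table''' component holds can be mimicked with base '''table()''' on toy labels (a sketch; '''confusionMatrix()''' from caret adds accuracy statistics on top of this cross-tabulation):&lt;br /&gt;

```r
# Cross-tabulation of predicted vs actual labels: the core of a confusion matrix
actual    = factor(c("Besni", "Besni", "Kecimen", "Kecimen"))
predicted = factor(c("Besni", "Kecimen", "Kecimen", "Kecimen"))
tab = table(Prediction = predicted, Reference = actual)
sum(diag(tab))  # 3 correct predictions lie on the diagonal
```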
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| Now let’s plot the confusion matrix from the table.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''to display the confusion matrix from the confusion matrix list created.&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''GGPlot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are using the created '''plot_confusion_matrix()''' function to generate the visual plot of the confusion matrix stored in the '''confusion''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''plot window'''&lt;br /&gt;
|| Drag boundary to see the plot window clearly &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
22 samples of class Kecimen have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
11 samples of class Besni have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
Overall, the model has misclassified only '''33''' out of '''270 '''samples.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| This block of code first creates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc '''features in the dataset.&lt;br /&gt;
&lt;br /&gt;
It stores it in a variable ''''grid''''. &lt;br /&gt;
&lt;br /&gt;
Then, it uses the QDA model to predict the class of each point in this grid.&lt;br /&gt;
&lt;br /&gt;
It stores these predictions as a new column ''''class' '''in the '''grid '''dataframe. &lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted classes' string labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
The resulting grid of points and their predicted classes will be used to visualize the decision boundaries of the QDA model.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Click '''grid''' on the Environment tab to load the grid dataframe in the source window.&lt;br /&gt;
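The way '''expand.grid''' builds this lattice of points can be seen on a small scale. The sketch below uses a toy data frame with hypothetical '''minorAL''' and '''ecc''' values and 5-point sequences instead of 500, so it only illustrates the mechanism.

```r
# Toy stand-ins for the two predictors (hypothetical values)
data = data.frame(minorAL = c(200, 250, 300, 350, 400),
                  ecc     = c(0.60, 0.70, 0.75, 0.80, 0.90))

# Same construction as the tutorial, but with length = 5 instead of 500
grid = expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 5),
                   ecc     = seq(min(data$ecc),     max(data$ecc),     length = 5))

print(nrow(grid))   # 25: every combination of the 5 x 5 axis values
```

With length = 500 per axis, the same call produces 250,000 grid points.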
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using '''ggplot2.''' &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
'''geom_raster '''creates a colour map indicating the predicted classes of the grid points&lt;br /&gt;
&lt;br /&gt;
'''geom_point '''plots the training data points in the plot.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour''' creates the decision boundary of the QDA.&lt;br /&gt;
&lt;br /&gt;
The '''scale_fill_manual''' and '''scale_color_manual''' functions assign specific colors to the classes.&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary&lt;br /&gt;
&lt;br /&gt;
and the distribution of the training data points.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can see that the decision boundary of our model is non-linear.&lt;br /&gt;
&lt;br /&gt;
And our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Multicollinearity among predictors may lead to poor performance.&lt;br /&gt;
* The presence of outliers in data may also lead to poor performance. &lt;br /&gt;
&lt;br /&gt;
|| These are the limitations of QDA.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial, we have learned about:&lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation of QDA using''' Raisin''' Dataset'''.'''&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply '''QDA''' on the '''wine''' dataset.&lt;br /&gt;
* Measure the accuracy of the model.&lt;br /&gt;
&lt;br /&gt;
This dataset can be found in the '''HDclassif '''package. &lt;br /&gt;
&lt;br /&gt;
Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English"/>
				<updated>2024-05-31T05:14:46Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Quadratic Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, QDA, quadratic discriminant analysis, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Quadratic Discriminant Analysis in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation of QDA using''' Raisin''' Dataset'''.'''&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Quadratic Discriminant Analysis'''&lt;br /&gt;
|| * Quadratic discriminant analysis is a statistical method used for classification.&lt;br /&gt;
* QDA constructs a data-driven non-linear separator between two classes.&lt;br /&gt;
* The covariance matrices of the different classes are not assumed to be equal. &lt;br /&gt;
* A quadratic function describes the decision boundary between each pair of classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Differences between LDA and QDA'''&lt;br /&gt;
|| Now let’s see the differences between LDA and QDA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''LDA''' assumes that each class has the same covariance matrix.&lt;br /&gt;
* '''QDA''' relaxes the assumption of an equal covariance matrix for all the classes.&lt;br /&gt;
* '''LDA''' constructs a linear boundary, while '''QDA '''constructs a non-linear boundary.&lt;br /&gt;
* When the covariance matrices of different classes are the same, '''QDA '''reduces to '''LDA'''.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for QDA'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA '''is primarily used when data is multivariate Gaussian.&lt;br /&gt;
&lt;br /&gt;
'''QDA''' assumes that each class has its own covariance matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Now let us see the assumptions of QDA.&lt;br /&gt;
&lt;br /&gt;
QDA is used when data is multivariate Gaussian and each class has its own covariance matrix.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Medical Diagnosis.&lt;br /&gt;
* Bio-Imaging classification.&lt;br /&gt;
* Fraud Detection.&lt;br /&gt;
&lt;br /&gt;
|| The QDA technique is used in several applications.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of QDA'''&lt;br /&gt;
|| Let us implement '''QDA '''on the '''Raisin''' '''dataset '''with two chosen variables'''.'''&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please see the Additional Reading material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''QDA.R''' and the '''Raisin''' dataset '''raisin.xlsx'''.&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''QDA.R''' and the folder '''QDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''QDA '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''QDA''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will create a '''QDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click QDA.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to QDA.R in RStudio.&lt;br /&gt;
|| Let us open the script '''QDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''QDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''QDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;# install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
The '''MASS''' package contains the '''qda()''' function to create our classifier.&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
We will use the '''dplyr''' package to aid the visualisation of the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Then click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window and close the tab.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data and convert the variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using '''sample() '''function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It will contain 70% of the total number of rows for training; the remaining 30% will be used for testing.&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
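The same sampling step can be tried on a toy data frame. A minimal sketch, assuming a hypothetical 10-row stand-in for the dataset:

```r
set.seed(1)
toy = data.frame(id = 1:10)   # hypothetical stand-in for the Raisin data

# 70% of the row indices, drawn without replacement
index_split = sample(1:nrow(toy), size = 0.7 * nrow(toy), replace = FALSE)

# Indexing with index_split keeps those rows; negative indexing drops them
train = toy[index_split, , drop = FALSE]
test  = toy[-index_split, , drop = FALSE]

print(nrow(train))   # 7
print(nrow(test))    # 3
```

Because the indices are drawn without replacement, no row appears in both sets.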
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
  &lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let’s perform '''QDA''' on the '''training''' dataset.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window&lt;br /&gt;
&lt;br /&gt;
type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model '''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| We use this command to create the '''QDA''' model.&lt;br /&gt;
&lt;br /&gt;
We pass two parameters to the '''qda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
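The same two-argument call can be tried on a dataset that ships with R. This sketch fits a QDA model to the built-in '''iris''' data rather than the Raisin data, and only assumes the '''MASS''' package is installed:

```r
library(MASS)   # provides qda()

# Fit QDA with a formula and a data frame, as in the tutorial
toy_model = qda(Species ~ ., data = iris)

# predict() on a qda fit returns a list with components class and posterior
pred = predict(toy_model, iris)
print(names(pred))                        # "class" "posterior"
print(mean(pred$class == iris$Species))   # training accuracy, roughly 0.98
```

The printed model object similarly reports the prior probabilities of the groups and the group means.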
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Prior probabilities of group'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Group means'''&lt;br /&gt;
|| These are the parameters of our model.&lt;br /&gt;
&lt;br /&gt;
This indicates the composition of classes in the training data.&lt;br /&gt;
&lt;br /&gt;
These indicate the mean values of the predictor variables for each class.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Let’s use this command to predict the class variable from the test data using the trained QDA model.&lt;br /&gt;
&lt;br /&gt;
This predicts the class and posterior probability for the testing data.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''predicted_values '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''posterior'''&lt;br /&gt;
|| Click on '''predicted_values''' in the Environment tab&lt;br /&gt;
&lt;br /&gt;
This shows us that our predicted variable has two components.&lt;br /&gt;
&lt;br /&gt;
'''class''' contains the predicted '''classes '''of the testing data.&lt;br /&gt;
&lt;br /&gt;
'''posterior''' contains the '''posterior probability''' of an observation belonging to each class.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us compute the accuracy of our model.&lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion &amp;lt;- confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Point to the confusion in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels of testing data.&lt;br /&gt;
&lt;br /&gt;
And it is stored in the confusion variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run the command. &lt;br /&gt;
&lt;br /&gt;
The confusion matrix list is shown in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click '''confusion '''to load it in the''' Source '''window.&lt;br /&gt;
&lt;br /&gt;
The '''confusion''' list contains a component '''table''' holding the required confusion matrix.&lt;br /&gt;
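What '''confusionMatrix()''' tabulates can be reproduced in miniature with base R's '''table()'''. A sketch with hypothetical actual and predicted labels; it does not require the '''caret''' package:

```r
# Hypothetical labels for five test samples
actual    = factor(c("Besni", "Besni", "Kecimen", "Kecimen", "Kecimen"))
predicted = factor(c("Besni", "Kecimen", "Kecimen", "Kecimen", "Besni"))

# Rows: actual classes; columns: predicted classes
tab = table(Actual = actual, Predicted = predicted)
print(tab)

# Accuracy is the share of counts on the diagonal
accuracy = sum(diag(tab)) / sum(tab)
print(accuracy)   # 0.6
```

Here two of the five toy samples fall off the diagonal, giving 60% accuracy.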
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| Now let’s plot the confusion matrix from the table.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''to display the confusion matrix from the confusion matrix list created.&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''ggplot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are using the created '''plot_confusion_matrix()''' function to generate the visual plot of the confusion matrix stored in the '''confusion''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''plot window'''&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
22 samples of class Kecimen have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
11 samples of class Besni have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
Overall, the model has misclassified only '''33''' out of '''270 '''samples.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| This block of code first creates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc '''features in the dataset.&lt;br /&gt;
&lt;br /&gt;
It stores it in a variable ''''grid''''. &lt;br /&gt;
&lt;br /&gt;
Then, it uses the QDA model to predict the class of each point in this grid.&lt;br /&gt;
&lt;br /&gt;
It stores these predictions as a new column ''''class' '''in the '''grid '''dataframe. &lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted classes' string labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
The resulting grid of points and their predicted classes will be used to visualize the decision boundaries of the QDA model.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Click '''grid''' on the Environment tab to load the grid dataframe in the source window.&lt;br /&gt;
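The '''as.numeric(factor(...))''' encoding step can be seen in isolation. A sketch with hypothetical class labels:

```r
# Hypothetical predicted labels for a few grid points
class_labels = factor(c("Besni", "Kecimen", "Kecimen", "Besni"))

# Factor levels are sorted alphabetically, so Besni = 1 and Kecimen = 2
classnum = as.numeric(class_labels)
print(classnum)   # 1 2 2 1
```

'''geom_contour''' can then draw the decision boundary where this numeric coding changes value across the grid.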
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using '''ggplot2.''' &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
'''geom_raster '''creates a colour map indicating the predicted classes of the grid points.&lt;br /&gt;
&lt;br /&gt;
'''geom_point '''plots the training data points in the plot.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour''' creates the decision boundary of the QDA.&lt;br /&gt;
&lt;br /&gt;
The '''scale_fill_manual''' function assigns specific colors to the classes, as does the '''scale_color_manual''' function.&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary and the distribution of the training data points.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can see that the decision boundary of our model is non-linear.&lt;br /&gt;
&lt;br /&gt;
And our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Multicollinearity among predictors may lead to poor performance.&lt;br /&gt;
* The presence of outliers in data may also lead to poor performance. &lt;br /&gt;
&lt;br /&gt;
|| These are the limitations of QDA.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial, we have learned about:&lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation of QDA using the '''Raisin''' dataset.&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply '''QDA''' on the '''wine''' dataset.&lt;br /&gt;
* Measure the accuracy of the model.&lt;br /&gt;
&lt;br /&gt;
This dataset can be found in the '''HDclassif '''package. &lt;br /&gt;
&lt;br /&gt;
Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English"/>
				<updated>2024-05-30T12:23:53Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Quadratic Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, QDA, quadratic discriminant analysis, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this Spoken Tutorial on '''Quadratic Discriminant Analysis in R'''.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation of QDA using the '''Raisin''' dataset.&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Quadratic Discriminant Analysis'''&lt;br /&gt;
||
* Quadratic discriminant analysis is a statistical method used for classification.&lt;br /&gt;
* QDA constructs a data-driven non-linear separator between two classes.&lt;br /&gt;
* The covariance matrix for different classes is not necessarily equal. &lt;br /&gt;
* A quadratic function describes the decision boundary between each pair of classes.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Differences between LDA and QDA'''&lt;br /&gt;
|| Now let’s see the differences between LDA and QDA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''LDA''' assumes that each class has the same covariance matrix.&lt;br /&gt;
* '''QDA''' relaxes the assumption of an equal covariance matrix for all the classes.&lt;br /&gt;
* '''LDA''' constructs a linear boundary, while '''QDA '''constructs a non-linear boundary.&lt;br /&gt;
* When the covariance matrices of different classes are the same, '''QDA '''reduces to '''LDA'''.&lt;br /&gt;
&lt;br /&gt;
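As an illustrative aside (not part of the original script), both classifiers can be fit with nearly identical calls from the '''MASS''' package; the sketch below uses R's built-in iris data, not the Raisin dataset:

```r
# Sketch: LDA vs QDA fit on R's built-in iris data (illustrative only).
library(MASS)

# Same formula interface; LDA pools a single covariance matrix across classes,
# while QDA estimates one covariance matrix per class, which is what bends
# its decision boundary into a quadratic curve.
lda_model <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)
qda_model <- qda(Species ~ Sepal.Length + Sepal.Width, data = iris)
```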
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for QDA'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA '''is primarily used when data is multivariate Gaussian.&lt;br /&gt;
&lt;br /&gt;
'''QDA''' assumes that each class has its own covariance matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Now let us see the assumptions of QDA.&lt;br /&gt;
&lt;br /&gt;
QDA is used when data is multivariate Gaussian and each class has its own covariance matrix.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Medical Diagnosis.&lt;br /&gt;
* Bio-Imaging classification.&lt;br /&gt;
* Fraud Detection.&lt;br /&gt;
&lt;br /&gt;
|| The QDA technique is used in several applications.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of QDA'''&lt;br /&gt;
|| Let us implement '''QDA '''on the '''Raisin''' '''dataset '''with two chosen variables'''.'''&lt;br /&gt;
&lt;br /&gt;
For more information on Raisin data please see the Additional Reading material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use the script file '''QDA.R''' and the '''Raisin''' dataset '''raisin.xlsx'''.&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Point to '''QDA.R''' and the folder '''QDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''QDA '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''QDA''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will create a '''QDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click QDA.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to QDA.R in RStudio.&lt;br /&gt;
|| Let us open the script '''QDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''QDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''QDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the Excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
The '''MASS''' package contains the '''qda()''' function to create our classifier.&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
We will use the '''dplyr''' package to aid the visualisation of the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Then click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window and close the tab.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We now select three columns from data and convert the variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
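The column selection and factor conversion can be sketched on a toy data frame (column names mirror the script; the values are made up):

```r
# Sketch: keep two predictors plus the label, then make the label a factor.
data <- data.frame(minorAL = c(250, 260), ecc = c(0.5, 0.7),
                   class = c("Kecimen", "Besni"), extra = c(1, 2))

data <- data[c("minorAL", "ecc", "class")]  # drop unused columns
data$class <- factor(data$class)            # labels as a factor for qda()
```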
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
|| First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using '''sample() '''function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It will contain 70% of the total rows for training; the remaining 30% will be used for testing.&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
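The same split can be sketched on a small simulated data frame (variable names mirror the script; the data here is made up):

```r
# Sketch: reproducible 70/30 split by sampling row indices without replacement.
set.seed(1)
data <- data.frame(x = rnorm(100), class = factor(rep(c("A", "B"), 50)))

index_split <- sample(1:nrow(data), size = 0.7 * nrow(data), replace = FALSE)
train_data  <- data[index_split, ]    # 70 sampled rows for training
test_data   <- data[-index_split, ]   # the remaining 30 rows for testing
```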
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
  &lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let’s perform '''QDA''' on the '''training''' dataset.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window&lt;br /&gt;
&lt;br /&gt;
type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model '''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| We use this command to create the '''QDA''' model.&lt;br /&gt;
&lt;br /&gt;
We pass two parameters to the '''qda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''console '''&lt;br /&gt;
&lt;br /&gt;
Highlight the output '''Prior probabilities of groups'''&lt;br /&gt;
&lt;br /&gt;
Highlight the output '''Group means'''&lt;br /&gt;
|| These are the parameters of our model.&lt;br /&gt;
&lt;br /&gt;
This indicates the composition of classes in the training data.&lt;br /&gt;
&lt;br /&gt;
These indicate the mean values of the predictor variables for each class.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Let’s use this command to predict the class variable from the test data using the trained QDA model.&lt;br /&gt;
&lt;br /&gt;
This predicts the class and posterior probability for the testing data.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''predicted_values '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''posterior'''&lt;br /&gt;
|| Click on '''predicted_values''' in the Environment tab&lt;br /&gt;
&lt;br /&gt;
This shows us that our predicted variable has two components.&lt;br /&gt;
&lt;br /&gt;
'''class''' contains the predicted '''classes '''of the testing data.&lt;br /&gt;
&lt;br /&gt;
'''posterior''' contains the '''posterior probability''' of an observation belonging to each class.&lt;br /&gt;
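As a small self-contained sketch (using iris rather than the Raisin data), '''predict()''' on a QDA fit returns both components:

```r
# Sketch: predict() on a qda fit returns a list with $class and $posterior.
library(MASS)

fit  <- qda(Species ~ Sepal.Length + Petal.Length, data = iris)
pred <- predict(fit, iris)

head(pred$class)       # predicted class labels (a factor)
head(pred$posterior)   # per-class probabilities; each row sums to 1
```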
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us compute the accuracy of our model.&lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion &amp;lt;- confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Point to the confusion in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels of testing data.&lt;br /&gt;
&lt;br /&gt;
And it is stored in the confusion variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run the command. &lt;br /&gt;
&lt;br /&gt;
The confusion matrix list is shown in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click '''confusion '''to load it in the''' Source '''window.&lt;br /&gt;
&lt;br /&gt;
'''confusion '''list contains a component table containing the required confusion matrix.&lt;br /&gt;
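For intuition (a toy example, independent of the script), accuracy can also be read straight off a base-R confusion table as correct predictions over total:

```r
# Sketch: confusion matrix and accuracy with base R only (no caret needed).
actual    <- factor(c("Besni", "Besni", "Kecimen", "Kecimen"))
predicted <- factor(c("Besni", "Kecimen", "Kecimen", "Kecimen"))

tab <- table(Actual = actual, Predicted = predicted)
accuracy <- sum(diag(tab)) / sum(tab)   # diagonal = correct predictions
```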
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;none&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| Now let’s plot the confusion matrix from the table.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;none&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''to display the confusion matrix from the list created earlier.&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table which is suitable for plotting using '''ggplot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
We are using the created '''plot_confusion_matrix()''' function to generate a visual plot of the confusion matrix stored in the '''confusion''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''plot window'''&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
22 samples of class Kecimen have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
11 samples of class Besni have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
Overall, the model has misclassified only '''33''' out of '''270 '''samples.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| This block of code first creates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc '''features in the dataset.&lt;br /&gt;
&lt;br /&gt;
It stores it in a variable ''''grid''''. &lt;br /&gt;
&lt;br /&gt;
Then, it uses the QDA model to predict the class of each point in this grid.&lt;br /&gt;
&lt;br /&gt;
It stores these predictions as a new column ''''class' '''in the '''grid '''dataframe. &lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted classes' string labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
The resulting grid of points and their predicted classes will be used to visualize the decision boundaries of the QDA model.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Click '''grid''' on the Environment tab to load the grid dataframe in the Source window.&lt;br /&gt;
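The grid construction can be sketched on made-up feature ranges; '''expand.grid()''' returns every combination of the two sequences, i.e. a 500 x 500 lattice:

```r
# Sketch: a dense lattice of points over hypothetical minorAL and ecc ranges.
grid <- expand.grid(minorAL = seq(200, 300, length.out = 500),
                    ecc     = seq(0.4, 0.9, length.out = 500))
nrow(grid)   # one row per (minorAL, ecc) combination
```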
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using '''ggplot2.''' &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
'''geom_raster '''creates a colour map indicating the predicted classes of the grid points.&lt;br /&gt;
&lt;br /&gt;
'''geom_point '''plots the training data points in the plot.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour''' creates the decision boundary of the QDA.&lt;br /&gt;
&lt;br /&gt;
The '''scale_fill_manual''' function assigns specific colors to the classes, as does the '''scale_color_manual''' function.&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary and the distribution of the training data points.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can see that the decision boundary of our model is non-linear.&lt;br /&gt;
&lt;br /&gt;
And our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Multicollinearity among predictors may lead to poor performance.&lt;br /&gt;
* The presence of outliers in data may also lead to poor performance. &lt;br /&gt;
&lt;br /&gt;
|| These are the limitations of QDA.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial, we have learned about:&lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation of QDA using the '''Raisin''' dataset.&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
|| &lt;br /&gt;
* Apply '''QDA''' on the '''wine''' dataset.&lt;br /&gt;
* Measure the accuracy of the model.&lt;br /&gt;
&lt;br /&gt;
This dataset can be found in the '''HDclassif '''package. &lt;br /&gt;
&lt;br /&gt;
Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Govt. of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English"/>
				<updated>2024-05-30T05:22:41Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Linear Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': YATE ASSEKE RONALD OLIVERA and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''':  R, RStudio, machine learning, supervised, unsupervised, dimensionality reduction, confusion matrix, console, LDA, video tutorial.&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Linear Discriminant Analysis in R.'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
# Linear Discriminant Analysis ('''LDA''') and its implementation.&lt;br /&gt;
# Assumptions of LDA&lt;br /&gt;
# Limitations of LDA&lt;br /&gt;
# LDA on a subset of Raisin dataset&lt;br /&gt;
# Visualization of the '''LDA''' separator and its corresponding confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
&lt;br /&gt;
* Basics of '''R''' programming. &lt;br /&gt;
* Basics of '''Machine Learning '''using '''R'''. &lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on '''R '''on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Linear Discriminant Analysis'''&lt;br /&gt;
|| Linear Discriminant Analysis is a statistical method.&lt;br /&gt;
* It is used for classification. &lt;br /&gt;
* It constructs a data-driven line that best separates different classes.&lt;br /&gt;
* It is based on maximizing the likelihood function to classify two or more classes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of LDA'''&lt;br /&gt;
|| &lt;br /&gt;
* The LDA technique is used in several applications, such as&lt;br /&gt;
&lt;br /&gt;
** Fraud Detection&lt;br /&gt;
** Bio-Imaging classification&lt;br /&gt;
** Classification of patient disease state&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| Let us now understand the assumptions of LDA.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide '''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for LDA'''&lt;br /&gt;
|| '''Multivariate Normality: '''&lt;br /&gt;
&lt;br /&gt;
* All predictors are continuous and Gaussian, with an equal covariance matrix across all classes.&lt;br /&gt;
* Mean vectors for each class are different. &lt;br /&gt;
* Data records are independent and identically distributed among each class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide '''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of LDA'''&lt;br /&gt;
|| Now we will see the limitations of LDA.&lt;br /&gt;
&lt;br /&gt;
* Departure from Gaussianity can increase misclassification probability in LDA.&lt;br /&gt;
* '''LDA''' may perform poorly if the data has unequal class covariance matrices.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of LDA'''&lt;br /&gt;
|| Now let us implement '''LDA''' on the '''raisin''' dataset with two chosen variables.&lt;br /&gt;
&lt;br /&gt;
More information on '''raisin''' data is available in the '''Additional Reading material''' on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files''' &lt;br /&gt;
|| We will use a script file '''LDA.R'''&lt;br /&gt;
&lt;br /&gt;
Please download this file from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use it for practising.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Point to '''LDA.R''' and the folder '''LDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the''' LDA folder.'''&lt;br /&gt;
|| I have downloaded and moved these files to the '''LDA '''folder.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This folder is in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''LDA''' folder as my working''' directory'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the script file '''LDA.R.'''&lt;br /&gt;
|| In this tutorial, we will create an '''LDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let us switch to '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Open '''LDA.R '''in '''RStudio'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to''' LDA.R''' in '''RStudio'''.&lt;br /&gt;
|| Open the script '''LDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''LDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the '''readxl''' package.&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(MASS) '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(lattice)'''&lt;br /&gt;
&lt;br /&gt;
Highlight all the commands.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
|| The '''readxl''' package is used to load the '''Excel''' file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The''' MASS package''' contains the '''lda()''' function that we will use for our analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2 package''' is used to plot the results of our analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''caret''' package contains the '''confusionMatrix''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It is used as a measure for the performance of the classifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please note that in order to import these libraries, we need to install them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please ensure that everything is installed correctly. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can use the command '''install.packages(“package_name”)''' to install the required packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As I have already installed these packages, I will directly import them. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(lattice)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the requisite packages.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the commands.&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
|| We will read the Excel file and choose three columns: two features ('''minorAL''', '''ecc''') and one target ('''class''') variable.&lt;br /&gt;
&lt;br /&gt;
Run these commands to import the '''raisin''' dataset.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Environment '''tab clearly.&lt;br /&gt;
&lt;br /&gt;
Point to the data variable in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click the data to load the dataset.&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab under '''Data '''heading, you will see a '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click the data''' variable''' to load the dataset in the '''Source''' window. &lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Source window clearly.&lt;br /&gt;
|| Drag boundary to see the '''Source '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
Type these commands in the source window.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the below commands.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
||Here we are converting the variable '''data$class''' to a factor.&lt;br /&gt;
&lt;br /&gt;
It ensures that the categorical data is properly encoded. &lt;br /&gt;
&lt;br /&gt;
Select the command and run it. &lt;br /&gt;
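A minimal, self-contained sketch of this step, using illustrative stand-in labels for the Raisin classes rather than the actual file:&lt;br /&gt;

```r
# Converting a character target to a factor, as done for data$class.
# "Kecimen" and "Besni" are illustrative stand-ins for the Raisin classes.
class_raw = c("Kecimen", "Besni", "Kecimen")
class_fac = factor(class_raw)
levels(class_fac)  # factor levels are stored in sorted order
```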
|-&lt;br /&gt;
||Only Narration.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
Type the command in the source window.&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split=sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
||In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''replace=FALSE'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
||First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using the '''sample()''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The split will be 70% for training and 30% for testing.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement. &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
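The same split can be sketched on R's built-in '''iris''' data, which stands in here for the Raisin dataset downloaded separately:&lt;br /&gt;

```r
# Reproducible 70/30 train-test split, sketched on the built-in iris data.
set.seed(1)
index_split = sample(1:nrow(iris), size = 0.7 * nrow(iris), replace = FALSE)
train_data = iris[index_split, ]   # 70% of rows, sampled without replacement
test_data  = iris[-index_split, ]  # the remaining 30%
nrow(train_data)  # 105 rows, 70% of 150
nrow(test_data)   # 45 rows
```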
|-&lt;br /&gt;
|&lt;br /&gt;
|| The vector is shown in the''' Environment '''tab.&lt;br /&gt;
|-&lt;br /&gt;
||Point to train-test split.&lt;br /&gt;
|| We use the indices that we previously generated to obtain our train-test split.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Type the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data [index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source '''window type these commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|- &lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
|| Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the Environment tab.&lt;br /&gt;
  &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data '''and '''train_data '''to load them in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Let us train our '''LDA''' model.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''LDA_model &amp;lt;- lda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
'''LDA_model'''&lt;br /&gt;
|| In the '''Source '''window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''LDA_model &amp;lt;- lda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
'''LDA_model'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''LDA_model'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
Point to the output in the '''console '''window.&lt;br /&gt;
|| We pass two parameters to the '''lda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''console''' window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight '''output''' in the '''console.'''&lt;br /&gt;
|| Our '''model''' provides us with a lot of information.&lt;br /&gt;
&lt;br /&gt;
Let us go through them one at a time.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''Prior probabilities of groups. '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' Group means.'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Coefficients of linear discriminants '''&lt;br /&gt;
&lt;br /&gt;
|| These explain the distribution of classes in the training dataset.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These display the mean values of each '''predictor''' variable for each '''class'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These display the '''linear combination of predictor''' variables. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The given linear combinations form the decision rule of the '''LDA''' model.&lt;br /&gt;
&lt;br /&gt;
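These components can be inspected directly on any fitted '''lda''' object; a sketch using the built-in '''iris''' data in place of the tutorial's '''train_data''':&lt;br /&gt;

```r
library(MASS)  # provides lda()

# Fit an LDA model on iris (stand-in for the Raisin training data).
model = lda(Species ~ ., data = iris)
model$prior    # prior probabilities: class proportions in the data
model$means    # group means: per-class mean of each predictor
model$scaling  # coefficients of the linear discriminants
```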
|- &lt;br /&gt;
|| Drag boundary to see the Source window.&lt;br /&gt;
|| Drag boundary to see the '''Source '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us use this model to make predictions on the testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(LDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type this command and run it. &lt;br /&gt;
&lt;br /&gt;
Let us check what '''predicted_values''' contain.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click the '''predicted_values '''data in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the table.&lt;br /&gt;
|| Click the '''predicted_values '''data in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''predicted_values '''table is loaded in the '''Source''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$posterior)'''&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$x)'''&lt;br /&gt;
|| In the '''Source''' window type these commands and run them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The output is seen in the''' console''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command output of '''head(predicted_values$class) '''in the '''console.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command output of '''head(predicted_values$posterior)''' in the '''console.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command output of '''head(predicted_values$x) '''in '''console'''&lt;br /&gt;
|| It contains the class that the model has predicted for each observation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It contains the '''posterior probability''' of the observation belonging to each class.&lt;br /&gt;
&lt;br /&gt;
This contains the linear discriminants for each observation.&lt;br /&gt;
&lt;br /&gt;
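The three components returned by '''predict()''' can be sketched on the built-in '''iris''' data:&lt;br /&gt;

```r
library(MASS)

# predict() on an lda object returns a list with three components.
model = lda(Species ~ ., data = iris)
pred = predict(model, iris)
names(pred)           # "class", "posterior", "x"
head(pred$class)      # predicted class labels
head(pred$posterior)  # posterior probability for each class
head(pred$x)          # values of the linear discriminants
```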
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Now we will measure the performance of our model using the '''Confusion Matrix'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''confusion &amp;lt;-table(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''fourfoldplot(confusion, color = c(&amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;), conf.level = 0, margin=1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''Save '''and''' Run''' buttons.&lt;br /&gt;
|| In the '''Source '''window type these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Save and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusion &amp;lt;- table(test_data$class, predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''fourfoldplot(confusion, color = c(&amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;), conf.level = 0, margin=1)'''&lt;br /&gt;
&lt;br /&gt;
|| This table creates a confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''fourfoldplot()''' function generates a visual plot of the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the plot in '''plot window '''&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
Given the specific seed ('''set.seed(1)'''), LDA has misclassified 33 out of 270 observations. &lt;br /&gt;
&lt;br /&gt;
This number may change for different sets of training data. &lt;br /&gt;
&lt;br /&gt;
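Counting misclassifications from the confusion matrix can be sketched on the built-in '''iris''' data (off-diagonal cells are the errors):&lt;br /&gt;

```r
library(MASS)

# Confusion matrix and misclassification count, sketched on iris.
model = lda(Species ~ ., data = iris)
pred = predict(model, iris)$class
confusion = table(iris$Species, pred)  # rows: truth, columns: prediction
misclassified = sum(confusion) - sum(diag(confusion))
misclassified  # number of observations the model got wrong
```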
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Let us visualize how well our model separates different classes.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
[RStudio]&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(train_data$minorAL), max(train_data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(train_data$ecc), max(train_data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''min_max &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''min_max$predicted_class &amp;lt;- predict(LDA_model, newdata = min_max)$class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- predict(LDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This block of code operates as a setup for visual plotting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It creates a square grid of coordinates spanning the range of the training data, together with the model's prediction at each grid point.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''seq()''' function generates a sequence of evenly spaced values between the smallest and largest values of the '''minorAL''' and '''ecc''' variables from the training data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''grid''' variable contains the generated grid points along with the predictions of '''LDA_model''' on them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted class labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
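The grid construction can be sketched with two '''iris''' predictors standing in for '''minorAL''' and '''ecc''':&lt;br /&gt;

```r
# Building a prediction grid with seq() and expand.grid().
X = seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 100)
Y = seq(min(iris$Petal.Width),  max(iris$Petal.Width),  length.out = 100)
grid = expand.grid(Petal.Length = X, Petal.Width = Y)
nrow(grid)  # 10000 points: every combination of the 100 x 100 values
```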
|- &lt;br /&gt;
|| Point to the Environment tab.&lt;br /&gt;
|| Drag boundary to see the details in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These variables contain the data for the visualization of the linear discriminants.&lt;br /&gt;
&lt;br /&gt;
Click the '''grid''' '''data''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''grid data''' table is loaded in the '''Source''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = min_max, aes(x = minorAL, y = ecc, color = predicted_class), size = 1, alpha = 0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data=grid, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data= grid, aes(x=minorAL, y=ecc, z = classnum), colour=&amp;quot;black&amp;quot;, linewidth = 1.2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;LDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = min_max, aes(x = minorAL, y = ecc, color = predicted_class), size = 1, alpha = 0.3) +theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data=grid, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data= grid, aes(x=minorAL, y=ecc, z = classnum), colour=&amp;quot;black&amp;quot;, linewidth = 1.2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;LDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|| This command creates the decision boundary plot.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It plots the '''grid''' points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
'''geom_raster''' creates a colour map indicating the predicted classes of the grid points.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour '''creates the decision boundary of the LDA.&lt;br /&gt;
&lt;br /&gt;
The '''scale_color_manual''' function assigns specific colors to the classes, and so does the '''scale_fill_manual''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary and the distribution of training data points of the '''model'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''Plots '''window&lt;br /&gt;
|| We can see that our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Summary'''&lt;br /&gt;
|| In this tutorial we have learnt:&lt;br /&gt;
&lt;br /&gt;
* Linear Discriminant Analysis ('''LDA''') and its implementation.&lt;br /&gt;
* Assumptions of LDA&lt;br /&gt;
* Limitations of LDA&lt;br /&gt;
* LDA on a subset of Raisin dataset&lt;br /&gt;
* Visualization of the '''LDA''' separator and its corresponding confusion matrix&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assignment'''&lt;br /&gt;
|| &lt;br /&gt;
* Perform LDA on the inbuilt '''PlantGrowth''' dataset.&lt;br /&gt;
* Evaluate the model using a confusion matrix and visualize the results.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''About the Spoken Tutorial Project'''&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Workshops'''&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Forum to answer questions.'''&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question. Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Forum to answer questions'''&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Textbook Companion'''&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Acknowledgment'''&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Govt. of India.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Thank You'''&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English"/>
				<updated>2024-05-30T05:03:40Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Linear Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': YATE ASSEKE RONALD OLIVERA and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''':  R, RStudio, machine learning, supervised, unsupervised, dimensionality reduction, confusion matrix, console, LDA, video tutorial.&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Linear Discriminant Analysis in R.'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
# Linear Discriminant Analysis ('''LDA''') and its implementation.&lt;br /&gt;
# Assumptions of LDA&lt;br /&gt;
# Limitations of LDA&lt;br /&gt;
# LDA on a subset of Raisin dataset&lt;br /&gt;
# Visualization of the '''LDA''' separator and its corresponding confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
&lt;br /&gt;
* Basics of '''R''' programming. &lt;br /&gt;
* Basics of '''Machine Learning '''using '''R'''. &lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on '''R '''on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Linear Discriminant Analysis'''&lt;br /&gt;
|| Linear Discriminant Analysis is a statistical method.&lt;br /&gt;
* It is used for classification. &lt;br /&gt;
* It constructs a data driven line that best separates different classes.&lt;br /&gt;
* It is based on maximizing the likelihood function to classify observations into two or more classes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of LDA'''&lt;br /&gt;
|| &lt;br /&gt;
* The LDA technique is used in several applications, such as&lt;br /&gt;
&lt;br /&gt;
** Fraud Detection&lt;br /&gt;
** Bio-Imaging classification&lt;br /&gt;
** Classification of patient disease states&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| Let us now understand the assumptions of LDA.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide '''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for LDA'''&lt;br /&gt;
|| '''Multivariate Normality: '''&lt;br /&gt;
&lt;br /&gt;
* All features are continuous and Gaussian, with a common covariance matrix across all the classes.&lt;br /&gt;
* Mean vectors for each class are different. &lt;br /&gt;
* Data records are independent and identically distributed among each class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide '''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of LDA'''&lt;br /&gt;
|| Now we will see the limitations of LDA.&lt;br /&gt;
&lt;br /&gt;
* Departure from Gaussianity can increase misclassification probability in LDA.&lt;br /&gt;
* '''LDA''' may perform poorly if the classes have unequal covariance matrices.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of LDA'''&lt;br /&gt;
|| Now let us implement '''LDA''' on the '''raisin dataset '''with two chosen variables'''.'''&lt;br /&gt;
&lt;br /&gt;
More information on '''raisin''' data is available in the '''Additional Reading material''' on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files''' &lt;br /&gt;
|| We will use a script file '''LDA.R'''&lt;br /&gt;
&lt;br /&gt;
Please download this file from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use it for practising.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Point to '''LDA.R''' and the folder '''LDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the''' LDA folder.'''&lt;br /&gt;
|| I have downloaded and moved these files to the '''LDA '''folder.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This folder is in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''LDA''' folder as my working''' directory'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the script file '''LDA.R.'''&lt;br /&gt;
|| In this tutorial, we will create a '''LDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let us switch to '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Open '''LDA.R '''in '''RStudio'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to''' LDA.R''' in '''RStudio'''.&lt;br /&gt;
|| Open the script '''LDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''LDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the '''Readxl package.'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(MASS) '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight all the commands.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
|| '''Readxl package''' is used to load the '''Excel''' file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The''' MASS package''' contains the '''lda()''' function that we will use for our analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2 package''' is used to plot the results of our analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''caret package''' contains the&lt;br /&gt;
&lt;br /&gt;
'''confusionMatrix''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It is used as a measure for the performance of the classifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please note that in order to import these libraries, we need to install them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please ensure that everything is installed correctly. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can use the command '''install.packages(“package_name”)''' to install the required packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As I have already installed these packages, I will directly import them. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(lattice)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the requisite packages.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the commands.&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
|| We will read the Excel file and choose three columns: two feature variables ('''minorAL''', '''ecc''') and one target variable ('''class''').&lt;br /&gt;
&lt;br /&gt;
Run these commands to import the '''raisin''' dataset.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Environment '''tab clearly.&lt;br /&gt;
&lt;br /&gt;
Point to the data variable in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click the data to load the dataset.&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab under '''Data '''heading, you will see a '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click the data''' variable''' to load the dataset in the '''Source''' window. &lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Source window clearly.&lt;br /&gt;
|| Drag boundary to see the '''Source '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
Type these commands in the source window.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the below commands.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
||Here we are converting the variable '''data$class''' to a factor.&lt;br /&gt;
&lt;br /&gt;
It ensures that the categorical data is properly encoded. &lt;br /&gt;
&lt;br /&gt;
Select the command and run it. &lt;br /&gt;
|-&lt;br /&gt;
||Only Narration.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
Type the command in the source window.&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split=sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
||In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''replace=FALSE'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
||First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using '''sample() '''function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This will be 70% for training and 30% for testing.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement. &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
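The split above can be sketched on a small stand-in data frame (the data frame and its size here are illustrative, not the raisin dataset):

```r
# Sketch of the 70/30 split on a toy data frame (illustrative only).
set.seed(1)                                   # reproducible sampling
toy <- data.frame(minorAL = rnorm(10), ecc = rnorm(10))
idx <- sample(1:nrow(toy), size = 0.7 * nrow(toy), replace = FALSE)
train <- toy[idx, ]                           # 70% of rows, sampled without replacement
test  <- toy[-idx, ]                          # the remaining 30%
```

With 10 rows, this yields 7 training rows and 3 testing rows; no row appears in both sets.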
|-&lt;br /&gt;
|| The vector is shown in the''' Environment '''tab.&lt;br /&gt;
|-&lt;br /&gt;
||Point to train-test split.&lt;br /&gt;
|| We use the indices that we previously generated to obtain our train-test split.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Type the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data [index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source '''window type these commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|- &lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
|| Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the Environment tab.&lt;br /&gt;
  &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data '''and '''train_data '''to load them in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Let us train our '''LDA''' model.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''LDA_model &amp;lt;- lda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
'''LDA_model'''&lt;br /&gt;
|| In the '''Source '''window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''LDA_model &amp;lt;- lda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
'''LDA_model'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''LDA_model'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
Point to the output in the '''console '''window.&lt;br /&gt;
|| We pass two parameters to the '''lda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''console''' window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight '''output''' in the '''console.'''&lt;br /&gt;
|| Our '''model''' provides us with a lot of information.&lt;br /&gt;
&lt;br /&gt;
Let us go through it one item at a time.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight '''Prior probabilities of groups''' in the output.&lt;br /&gt;
&lt;br /&gt;
Highlight '''Group means''' in the output.&lt;br /&gt;
&lt;br /&gt;
Highlight '''Coefficients of linear discriminants''' in the output.&lt;br /&gt;
&lt;br /&gt;
|| These show the proportion of each class in the training dataset.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These display the mean values of each '''predictor''' variable for each '''class'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These display the '''linear combination of predictor''' variables. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The given linear combinations form the decision rule of the '''LDA''' model.&lt;br /&gt;
&lt;br /&gt;
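As background, the decision rule these coefficients represent is the standard LDA discriminant score from the textbook formulation (not output shown by '''lda()'''):

```latex
% Standard LDA discriminant score for class k, with shared covariance \Sigma,
% class mean \mu_k and prior probability \pi_k.
% An observation x is assigned to the class with the largest score.
\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k
  - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k
  + \log \pi_k
```

Because the covariance matrix is shared across classes, the boundary between any two classes is linear in x.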
|- &lt;br /&gt;
|| Drag boundary to see the Source window.&lt;br /&gt;
|| Drag boundary to see the '''Source '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us use this model to make predictions on the testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(LDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type this command and run it. &lt;br /&gt;
&lt;br /&gt;
Let us check what '''predicted_values''' contain.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click the '''predicted_values '''data in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the table.&lt;br /&gt;
|| Click the '''predicted_values '''data in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''predicted_values '''table is loaded in the '''Source''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$posterior)'''&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$x)'''&lt;br /&gt;
|| In the '''Source''' window type these commands and run them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The output is seen in the''' console''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command output of '''head(predicted_values$class) '''in the '''console.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command output of '''head(predicted_values$posterior)''' in the '''console.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command output of '''head(predicted_values$x) '''in '''console'''&lt;br /&gt;
|| It contains the class that the model has predicted for each observation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It contains the '''posterior probability''' of the observation belonging to each class.&lt;br /&gt;
&lt;br /&gt;
This contains the linear discriminants for each observation.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Now we will measure the performance of our model using the '''Confusion Matrix'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''confusion &amp;lt;-table(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''fourfoldplot(confusion, color = c(&amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;), conf.level = 0, margin=1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''Save '''and''' Run''' buttons.&lt;br /&gt;
|| In the '''Source '''window type these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Save and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusion &amp;lt;- table(test_data$class, predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''fourfoldplot(confusion, color = c(&amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;), conf.level = 0, margin=1)'''&lt;br /&gt;
&lt;br /&gt;
|| This table creates a confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''fourfoldplot()''' function generates a visual plot of the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the plot in '''plot window '''&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
Given the specific seed, '''set.seed(1)''', LDA has misclassified 33 out of 270 observations. &lt;br /&gt;
&lt;br /&gt;
This number may change for different sets of training data. &lt;br /&gt;
&lt;br /&gt;
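The misclassification count can be read off any confusion matrix as the sum of its off-diagonal entries; a sketch with hypothetical counts (chosen only to total 270 test rows, not the tutorial's actual table):

```r
# Misclassification count and accuracy from a 2x2 confusion matrix.
# The counts are hypothetical; row/column labels are illustrative.
confusion <- matrix(c(120, 15, 18, 117), nrow = 2,
                    dimnames = list(actual    = c("A", "B"),
                                    predicted = c("A", "B")))
misclassified <- sum(confusion) - sum(diag(confusion))   # off-diagonal sum
accuracy      <- sum(diag(confusion)) / sum(confusion)   # diagonal fraction
```

Here the diagonal holds the correct predictions, so `misclassified` is 33 of 270 and `accuracy` is 237/270.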
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Let us visualize how well our model separates different classes.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(train_data$minorAL), max(train_data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(train_data$ecc), max(train_data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''min_max &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''min_max$predicted_class &amp;lt;- predict(LDA_model, newdata = min_max)$class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- predict(LDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This block of code operates as a setup for visual plotting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It builds a square grid of coordinates spanning the range of the training data, together with the class predicted at each grid point.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''seq()''' function generates evenly spaced values between the smallest and largest values of the 'minorAL' and 'ecc' variables in the training data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''grid''' variable contains the generated data along with the predictions of '''LDA_model''' on it.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric()''' function encodes the predicted class labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the Environment tab.&lt;br /&gt;
|| Drag boundary to see the details in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These variables contain the data for the visualization of the linear discriminants.&lt;br /&gt;
&lt;br /&gt;
Click the '''grid''' '''data''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''grid data''' table is loaded in the '''Source''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = min_max, aes(x = minorAL, y = ecc, color = predicted_class), size = 1, alpha = 0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data=grid, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data= grid, aes(x=minorAL, y=ecc, z = classnum), colour=&amp;quot;black&amp;quot;, linewidth = 1.2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;LDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = min_max, aes(x = minorAL, y = ecc, color = predicted_class), size = 1, alpha = 0.3) +theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data=grid, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data= grid, aes(x=minorAL, y=ecc, z = classnum), colour=&amp;quot;black&amp;quot;, linewidth = 1.2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;LDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|| This command creates the decision boundary plot.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It plots the '''grid''' points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
'''geom_raster''' creates a colour map indicating the predicted classes of the grid points.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour''' creates the decision boundary of the LDA.&lt;br /&gt;
&lt;br /&gt;
The '''scale_color_manual''' and '''scale_fill_manual''' functions assign specific colors and fills to the classes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary and the distribution of training data points of the '''model'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''Plots '''window&lt;br /&gt;
|| We can see that our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Summary'''&lt;br /&gt;
|| In this tutorial we have learnt:&lt;br /&gt;
&lt;br /&gt;
* Linear Discriminant Analysis ('''LDA''') and its implementation.&amp;amp;nbsp;&lt;br /&gt;
* Assumptions of LDA&lt;br /&gt;
* Limitations of LDA&lt;br /&gt;
* LDA on a subset of Raisin dataset&lt;br /&gt;
* Visualization of the '''LDA''' separator and its corresponding confusion matrix&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assignment'''&lt;br /&gt;
|| &lt;br /&gt;
* Perform LDA on the inbuilt '''PlantGrowth''' dataset&lt;br /&gt;
* Evaluate the model using a confusion matrix and visualize the results&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''About the Spoken Tutorial Project'''&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Workshops'''&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Forum to answer questions.'''&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question. Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Forum to answer questions'''&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Textbook Companion'''&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Acknowledgment'''&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Govt. of India.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Thank You'''&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborthy from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English"/>
				<updated>2024-05-28T13:41:48Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Linear Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': YATE ASSEKE RONALD OLIVERA  and Debatosh Charkraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''':  R, RStudio, machine learning, supervised, unsupervised, dimensionality reduction, confusion matrix, console, LDA, video tutorial.&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Linear Discriminant Analysis in R.'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
# Linear Discriminant Analysis ('''LDA''') and its implementation.&lt;br /&gt;
# Assumptions of LDA&lt;br /&gt;
# Limitations of LDA&lt;br /&gt;
# LDA on a subset of Raisin dataset&lt;br /&gt;
# Visualization of the '''LDA''' separator and its corresponding confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
&lt;br /&gt;
* Basics of '''R''' programming. &lt;br /&gt;
* Basics of '''Machine Learning '''using '''R'''. &lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on '''R '''on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Linear Discriminant Analysis'''&lt;br /&gt;
|| Linear Discriminant Analysis is a statistical method.&lt;br /&gt;
* It is used for classification. &lt;br /&gt;
* It constructs a data driven line that best separates different classes.&lt;br /&gt;
* It is based on maximizing the likelihood function to classify observations into two or more classes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of LDA'''&lt;br /&gt;
|| &lt;br /&gt;
* The LDA technique is used in several applications, such as&lt;br /&gt;
&lt;br /&gt;
** Fraud Detection&lt;br /&gt;
** Bio-Imaging classification&lt;br /&gt;
** Classification of patient disease states&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| Let us now understand the assumptions of LDA.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide '''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for LDA'''&lt;br /&gt;
|| '''Multivariate Normality: '''&lt;br /&gt;
&lt;br /&gt;
* All features are continuous and Gaussian, with a common covariance matrix across all the classes.&lt;br /&gt;
* Mean vectors for each class are different. &lt;br /&gt;
* Data records are independent and identically distributed among each class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide '''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of LDA'''&lt;br /&gt;
|| Now we will see the limitations of LDA.&lt;br /&gt;
&lt;br /&gt;
* Departure from Gaussianity can increase misclassification probability in LDA.&lt;br /&gt;
* '''LDA''' may perform poorly if the classes have unequal covariance matrices.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of LDA'''&lt;br /&gt;
|| Now let us implement '''LDA''' on the '''raisin dataset '''with two chosen variables'''.'''&lt;br /&gt;
&lt;br /&gt;
More information on '''raisin''' data is available in the '''Additional Reading material''' on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files''' &lt;br /&gt;
|| We will use a script file '''LDA.R'''&lt;br /&gt;
&lt;br /&gt;
Please download this file from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use it for practising.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Point to '''LDA.R''' and the folder '''LDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the''' LDA folder.'''&lt;br /&gt;
|| I have downloaded and moved these files to the '''LDA '''folder.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This folder is in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''LDA''' folder as my working''' directory'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the script file '''LDA.R.'''&lt;br /&gt;
|| In this tutorial, we will create a '''LDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let us switch to '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Open '''LDA.R '''in '''RStudio'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to''' LDA.R''' in '''RStudio'''.&lt;br /&gt;
|| Open the script '''LDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''LDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the '''Readxl package.'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(MASS) '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight all the commands.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
|| '''Readxl package''' is used to load the '''Excel''' file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The''' MASS package''' contains the '''lda()''' function that we will use for our analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2 package''' is used to plot the results of our analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''caret package''' contains the&lt;br /&gt;
&lt;br /&gt;
'''confusionMatrix''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It is used as a measure for the performance of the classifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please note that in order to import these libraries, we need to install them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please ensure that everything is installed correctly. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can use the command '''install.packages(“package_name”)''' to install the required packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As I have already installed these packages, I will directly import them. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(lattice)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the requisite packages.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the commands.&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
|| We will read the '''Excel''' file and choose 3 columns: two features ('''minorAL''', '''ecc''') and one target variable ('''class''').&lt;br /&gt;
&lt;br /&gt;
Run these commands to import the '''raisin''' dataset.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Environment '''tab clearly.&lt;br /&gt;
&lt;br /&gt;
Point to the data variable in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click the data to load the dataset.&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab under '''Data '''heading, you will see a '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click the data''' variable''' to load the dataset in the '''Source''' window. &lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Source window clearly.&lt;br /&gt;
|| Drag boundary to see the '''Source '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
Type these commands in the source window.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the below commands.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
||Here we are converting the variable '''data$class''' to a factor.&lt;br /&gt;
&lt;br /&gt;
It ensures that the categorical data is properly encoded. &lt;br /&gt;
&lt;br /&gt;
Select the command and run it.&lt;br /&gt;
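As an aside, a minimal sketch of what '''factor()''' does; the class labels here are illustrative (the actual labels come from the '''Raisin''' data):

```r
# Minimal sketch of factor(): a character vector becomes a categorical
# variable with explicit levels. The labels here are illustrative.
cls = factor(c("Kecimen", "Besni", "Kecimen"))
levels(cls)   # "Besni" "Kecimen" (levels are sorted alphabetically)
nlevels(cls)  # 2
```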
|-&lt;br /&gt;
||Only Narration.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
Type the command in the source window.&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split=sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
||In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''replace=FALSE'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
||First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using the '''sample()''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These indices select 70% of the rows for training; the remaining 30% will be used for testing.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement. &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
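As a quick check, the arithmetic behind the split can be sketched as follows (assuming the '''Raisin''' data has 900 rows, as implied by the 630 and 270 row counts mentioned in this tutorial):

```r
# Sketch: sampling 70% of 900 row indices without replacement leaves
# exactly 630 rows for training and 270 for testing.
n = 900
set.seed(1)
idx = sample(1:n, size = 0.7 * n, replace = FALSE)
length(idx)      # 630 training rows
n - length(idx)  # 270 testing rows
```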
|-&lt;br /&gt;
|| Point to the '''index_split''' vector in the Environment tab.&lt;br /&gt;
|| The vector is shown in the''' Environment '''tab.&lt;br /&gt;
|-&lt;br /&gt;
||Point to train-test split.&lt;br /&gt;
|| We use the indices that we previously generated to obtain our train-test split.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Type the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data [index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source '''window type these commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|- &lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
|| Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the Environment tab.&lt;br /&gt;
  &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data '''and '''train_data '''to load them in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Let us train our '''LDA''' model.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''LDA_model &amp;lt;- lda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
'''LDA_model'''&lt;br /&gt;
|| In the '''Source '''window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''LDA_model &amp;lt;- lda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
'''LDA_model'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''LDA_model'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
Point to the output in the '''console '''window.&lt;br /&gt;
|| We pass two parameters to the '''lda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''console''' window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight '''output''' in the '''console.'''&lt;br /&gt;
|| Our '''model''' provides us with a lot of information.&lt;br /&gt;
&lt;br /&gt;
Let us go through them one at a time.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight '''Prior probabilities of groups''' in the output.&lt;br /&gt;
&lt;br /&gt;
Highlight''' Group means''' in the output.&lt;br /&gt;
&lt;br /&gt;
Highlight '''Coefficients of linear discriminants''' in the output.&lt;br /&gt;
&lt;br /&gt;
|| These explain the distribution of classes in the training dataset.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These display the mean values of each '''predictor '''variable for each '''class'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These display the '''linear combination of predictor''' variables. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The given linear combinations form the decision rule of the '''LDA''' model.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Source window.&lt;br /&gt;
|| Drag boundary to see the '''Source '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us use this model to make predictions on the testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(LDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type this command and run it. &lt;br /&gt;
&lt;br /&gt;
Let us check what '''predicted_values''' contains.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click the '''predicted_values '''data in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the table.&lt;br /&gt;
|| Click the '''predicted_values '''data in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''predicted_values '''table is loaded in the '''Source''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$posterior)'''&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$x)'''&lt;br /&gt;
|| In the '''Source''' window type these commands and run them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The output is seen in the''' console''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command output of '''head(predicted_values$class) '''in the '''console.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command output of '''head(predicted_values$posterior)''' in the '''console.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command output of '''head(predicted_values$x) '''in '''console'''&lt;br /&gt;
|| It contains the class that the model has predicted for each observation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It contains the '''posterior probability''' of the observation belonging to each class.&lt;br /&gt;
&lt;br /&gt;
This contains the linear discriminants for each observation.&lt;br /&gt;
&lt;br /&gt;
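The three components can also be seen on a toy model; this sketch uses the built-in '''iris''' data reduced to two classes to mirror the tutorial's binary setup:

```r
# Sketch: predict() on an lda model returns a list with three components:
# class (predicted labels), posterior (class probabilities), x (discriminants).
library(MASS)
d = droplevels(subset(iris, Species != "setosa"))
m = lda(Species ~ Sepal.Length + Sepal.Width, data = d)
p = predict(m, d)
names(p)  # "class" "posterior" "x"
```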
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Now we will measure the performance of our model using the '''Confusion Matrix'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''confusion &amp;lt;-table(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''fourfoldplot(confusion, color = c(&amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;), conf.level = 0, margin=1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''Save '''and''' Run''' buttons.&lt;br /&gt;
|| In the '''Source '''window type these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Save and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusion &amp;lt;- table(test_data$class, predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''fourfoldplot(confusion, color = c(&amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;), conf.level = 0, margin=1)'''&lt;br /&gt;
&lt;br /&gt;
|| The '''table()''' function creates a confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''fourfoldplot()''' function generates a visual plot of the confusion matrix. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the plot in '''plot window '''&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
Given the specific seed ('''set.seed(1)'''), LDA has misclassified 33 out of 270 observations. &lt;br /&gt;
&lt;br /&gt;
This number may change for different sets of training data. &lt;br /&gt;
&lt;br /&gt;
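The reported error count can be turned into an accuracy figure with a small sketch (the 33 misclassifications out of 270 are taken from this run of the tutorial):

```r
# Sketch: accuracy from the misclassification count reported in this run.
n_test = 270
n_wrong = 33
accuracy = (n_test - n_wrong) / n_test
round(accuracy, 3)  # 0.878
```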
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Let us visualize how well our model separates different classes.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(train_data$minorAL), max(train_data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(train_data$ecc), max(train_data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''min_max &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''min_max$predicted_class &amp;lt;- predict(LDA_model, newdata = min_max)$class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- predict(LDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This block of code sets up the data for visual plotting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It consists of square grid coordinates spanning the range of the training data, together with their predicted classes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''seq()''' function generates a sequence of 100 evenly spaced values between the smallest and largest values of the '''minorAL''' and '''ecc''' variables in the training data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''grid''' variable contains the generated grid points along with the predictions of '''LDA_model''' on them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted class labels into numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
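A minimal sketch of how '''expand.grid()''' builds the grid; with two sequences of 100 values it produces every combination, that is, 10000 rows:

```r
# Sketch: expand.grid() forms all (minorAL, ecc) combinations, so two
# 100-value sequences yield a 100 x 100 = 10000-row data frame.
X = seq(0, 1, length.out = 100)
Y = seq(0, 1, length.out = 100)
g = expand.grid(minorAL = X, ecc = Y)
nrow(g)  # 10000
```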
|- &lt;br /&gt;
|| Point to the Environment tab.&lt;br /&gt;
|| Drag boundary to see the details in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These variables contain the data for the visualization of the linear discriminants.&lt;br /&gt;
&lt;br /&gt;
Click the '''grid''' '''data''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''grid data''' table is loaded in the '''Source''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = min_max, aes(x = minorAL, y = ecc, color = predicted_class), size = 1, alpha = 0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data=grid, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data= grid, aes(x=minorAL, y=ecc, z = classnum), colour=&amp;quot;black&amp;quot;, linewidth = 1.2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;LDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = min_max, aes(x = minorAL, y = ecc, color = predicted_class), size = 1, alpha = 0.3) + theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data=grid, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data= grid, aes(x=minorAL, y=ecc, z = classnum), colour=&amp;quot;black&amp;quot;, linewidth = 1.2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;LDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|| This command creates the decision boundary plot.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It plots the '''grid''' points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
'''geom_raster '''creates a colour map indicating the predicted classes of the grid points.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour '''creates the decision boundary of the LDA.&lt;br /&gt;
&lt;br /&gt;
The '''scale_color_manual''' and '''scale_fill_manual''' functions assign specific colors to the classes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary of the '''model''' and the distribution of training data points.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the output in the '''Plots '''window&lt;br /&gt;
|| We can see that our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Summary'''&lt;br /&gt;
|| In this tutorial we have learnt:&lt;br /&gt;
&lt;br /&gt;
* Linear Discriminant Analysis ('''LDA''') and its implementation.&lt;br /&gt;
* Assumptions of LDA&lt;br /&gt;
* Limitations of LDA&lt;br /&gt;
* LDA on a subset of Raisin dataset&lt;br /&gt;
* Visualization of the '''LDA''' separator and its corresponding confusion matrix&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assignment'''&lt;br /&gt;
|| &lt;br /&gt;
* Perform LDA on the inbuilt '''PlantGrowth''' dataset&lt;br /&gt;
* Evaluate the model using a confusion matrix and visualize the results&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''About the Spoken Tutorial Project'''&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Workshops'''&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Forum to answer questions.'''&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question. Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Forum to answer questions'''&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Textbook Companion'''&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Acknowledgment'''&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Govt. of India.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Thank You'''&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English"/>
				<updated>2024-05-28T13:39:36Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Linear Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': YATE ASSEKE RONALD OLIVERA and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''':  R, RStudio, machine learning, supervised, unsupervised, dimensionality reduction, confusion matrix, console, LDA, video tutorial.&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Linear Discriminant Analysis in R.'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
# Linear Discriminant Analysis ('''LDA''') and its implementation.&lt;br /&gt;
# Assumptions of LDA&lt;br /&gt;
# Limitations of LDA&lt;br /&gt;
# LDA on a subset of Raisin dataset&lt;br /&gt;
# Visualization of the '''LDA''' separator and its corresponding confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using:&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
&lt;br /&gt;
* Basics of '''R''' programming. &lt;br /&gt;
* Basics of '''Machine Learning '''using '''R'''. &lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on '''R '''on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Linear Discriminant Analysis'''&lt;br /&gt;
|| Linear Discriminant Analysis is a statistical method.&lt;br /&gt;
* It is used for classification. &lt;br /&gt;
* It constructs a data-driven line that best separates different classes.&lt;br /&gt;
* It is based on maximizing the likelihood function to classify two or more classes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of LDA'''&lt;br /&gt;
|| &lt;br /&gt;
* The LDA technique is used in several applications, such as&lt;br /&gt;
&lt;br /&gt;
** Fraud Detection&lt;br /&gt;
** Bio-Imaging classification&lt;br /&gt;
** Classification of patient disease states&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| Let us now understand the assumptions of LDA.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide '''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for LDA'''&lt;br /&gt;
|| '''Multivariate Normality: '''&lt;br /&gt;
&lt;br /&gt;
* All features are continuous and Gaussian, with an equal covariance matrix across all classes.&lt;br /&gt;
* The mean vectors of the classes are different. &lt;br /&gt;
* Data records are independent and identically distributed within each class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide '''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of LDA'''&lt;br /&gt;
|| Now we will see the limitations of LDA.&lt;br /&gt;
&lt;br /&gt;
* Departure from Gaussianity can increase the misclassification probability of LDA.&lt;br /&gt;
* '''LDA''' may perform poorly if the data has unequal class covariance matrices.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of LDA'''&lt;br /&gt;
|| Now let us implement '''LDA''' on the '''raisin dataset''' with two chosen variables.&lt;br /&gt;
&lt;br /&gt;
More information on '''raisin''' data is available in the '''Additional Reading material''' on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files''' &lt;br /&gt;
|| We will use a script file '''LDA.R'''&lt;br /&gt;
&lt;br /&gt;
Please download this file from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use it for practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Point to '''LDA.R''' and the folder '''LDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the''' LDA folder.'''&lt;br /&gt;
|| I have downloaded and moved these files to the '''LDA '''folder.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This folder is in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''LDA''' folder as my working''' directory'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the script file '''LDA.R.'''&lt;br /&gt;
|| In this tutorial, we will create an '''LDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let us switch to '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Open '''LDA.R '''in '''RStudio'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to''' LDA.R''' in '''RStudio'''.&lt;br /&gt;
|| Open the script '''LDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''LDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the '''Readxl package.'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(MASS) '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight all the commands.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
|| '''Readxl package''' is used to load the '''Excel''' file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The''' MASS package''' contains the '''lda()''' function that we will use for our analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2 package''' is used to plot the results of our analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''caret package''' contains the&lt;br /&gt;
&lt;br /&gt;
'''confusionMatrix''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It is used as a measure for the performance of the classifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please note that in order to import these libraries, we need to install them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please ensure that everything is installed correctly. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can use the command '''install.packages(“package_name”)''' to install the required packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As I have already installed these packages, I will directly import them. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(lattice)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the requisite packages.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the commands.&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
|| We will read the '''Excel''' file and choose 3 columns: two features ('''minorAL''', '''ecc''') and one target variable ('''class''').&lt;br /&gt;
&lt;br /&gt;
Run these commands to import the '''raisin''' dataset.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Environment '''tab clearly.&lt;br /&gt;
&lt;br /&gt;
Point to the data variable in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click the data to load the dataset.&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab under '''Data '''heading, you will see a '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click the data''' variable''' to load the dataset in the '''Source''' window. &lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Source window clearly.&lt;br /&gt;
|| Drag boundary to see the '''Source '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
Type these commands in the source window.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the below commands.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
||Here we are converting the variable '''data$class''' to a factor.&lt;br /&gt;
&lt;br /&gt;
It ensures that the categorical data is properly encoded. &lt;br /&gt;
&lt;br /&gt;
Select the command and run it.&lt;br /&gt;
|-&lt;br /&gt;
||Only Narration.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
Type the command in the source window.&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split=sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
||In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''replace=FALSE'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
||First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using the '''sample()''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These indices select 70% of the rows for training; the remaining 30% will be used for testing.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement. &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| Point to the '''index_split''' vector in the Environment tab.&lt;br /&gt;
|| The vector is shown in the''' Environment '''tab.&lt;br /&gt;
|-&lt;br /&gt;
||Point to train-test split.&lt;br /&gt;
|| We use the indices that we previously generated to obtain our train-test split.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Type the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data [index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source '''window type these commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|- &lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
|| Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the Environment tab.&lt;br /&gt;
  &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data '''and '''train_data '''to load them in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Let us train our '''LDA''' model.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''LDA_model &amp;lt;- lda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
'''LDA_model'''&lt;br /&gt;
|| In the '''Source '''window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''LDA_model &amp;lt;- lda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
'''LDA_model'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''LDA_model'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
Point to the output in the '''console '''window.&lt;br /&gt;
|| We pass two parameters to the '''lda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''console''' window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight '''output''' in the '''console.'''&lt;br /&gt;
|| Our '''model''' provides us with a lot of information.&lt;br /&gt;
&lt;br /&gt;
Let us go through them one at a time.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''Prior probabilities of groups. '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' Group means.'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Coefficients of linear discriminants '''&lt;br /&gt;
&lt;br /&gt;
|| These explain the distribution of classes in the training dataset.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These display the mean values of each '''predictor '''variable for each '''class'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These display the '''linear combination of predictor''' variables. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The given linear combinations form the decision rule of the '''LDA''' model.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Source window.&lt;br /&gt;
|| Drag boundary to see the '''Source '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us use this model to make predictions on the testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(LDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type this command and run it. &lt;br /&gt;
&lt;br /&gt;
Let us check what '''predicted_values''' contain.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click the '''predicted_values '''data in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the table.&lt;br /&gt;
|| Click the '''predicted_values '''data in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''predicted_values '''table is loaded in the '''Source''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$posterior)'''&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$x)'''&lt;br /&gt;
|| In the '''Source''' window type these commands and run them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The output is seen in the''' console''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command output of '''head(predicted_values$class) '''in the '''console.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command output of '''head(predicted_values$posterior)''' in the '''console.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command output of '''head(predicted_values$x) '''in '''console'''&lt;br /&gt;
|| It contains the class that the model has predicted for each observation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It contains the '''posterior probability''' of the observation belonging to each class.&lt;br /&gt;
&lt;br /&gt;
This contains the linear discriminants for each observation.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Now we will measure the performance of our model using the '''Confusion Matrix'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''confusion &amp;lt;-table(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''fourfoldplot(confusion, color = c(&amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;), conf.level = 0, margin=1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''Save '''and''' Run''' buttons.&lt;br /&gt;
|| In the '''Source '''window type these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Save and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusion &amp;lt;- table(test_data$class, predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''fourfoldplot(confusion, color = c(&amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;), conf.level = 0, margin=1)'''&lt;br /&gt;
&lt;br /&gt;
|| This table creates a confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''fourfoldplot()''' function generates a visual plot of the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the plot in '''plot window '''&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
Given the specific seed ('''set.seed(1)'''), LDA has misclassified 33 out of 270 observations. &lt;br /&gt;
&lt;br /&gt;
This number may change for different sets of training data. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Let us visualize how well our model separates different classes.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(train_data$minorAL), max(train_data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(train_data$ecc), max(train_data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''min_max &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''min_max$predicted_class &amp;lt;- predict(LDA_model, newdata = min_max)$class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- predict(LDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This block of code operates as a setup for visual plotting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It consists of square grid coordinates in the range of training data and their predicted linear discriminants.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''seq()''' function generates evenly spaced values between the smallest and largest values of the 'minorAL' and 'ecc' variables in the training data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''grid''' variable contains the generated data along with the predictions of the LDA_model on it.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted class labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the Environment tab.&lt;br /&gt;
|| Drag boundary to see the details in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These variables contain the data for the visualization of the linear discriminants.&lt;br /&gt;
&lt;br /&gt;
Click the '''grid''' '''data''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''grid data''' table is loaded in the '''Source''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = min_max, aes(x = minorAL, y = ecc, color = predicted_class), size = 1, alpha = 0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data=grid, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data= grid, aes(x=minorAL, y=ecc, z = classnum), colour=&amp;quot;black&amp;quot;, linewidth = 1.2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;LDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = min_max, aes(x = minorAL, y = ecc, color = predicted_class), size = 1, alpha = 0.3) +theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data=grid, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data= grid, aes(x=minorAL, y=ecc, z = classnum), colour=&amp;quot;black&amp;quot;, linewidth = 1.2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;LDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|| This code creates the decision boundary plot.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It plots the '''grid''' points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
'''geom_raster '''creates a colour map indicating the predicted classes of the grid points.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour '''creates the decision boundary of the LDA.&lt;br /&gt;
&lt;br /&gt;
The '''scale_color_manual''' and '''scale_fill_manual''' functions assign specific colors to the classes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the model's decision boundary and the distribution of the training data points.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the output in the '''Plots '''window&lt;br /&gt;
|| We can see that our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Summary'''&lt;br /&gt;
|| In this tutorial we have learnt:&lt;br /&gt;
&lt;br /&gt;
* Linear Discriminant Analysis ('''LDA''') and its implementation.&lt;br /&gt;
* Assumptions of LDA&lt;br /&gt;
* Limitations of LDA&lt;br /&gt;
* LDA on a subset of Raisin dataset&lt;br /&gt;
* Visualization of the '''LDA''' separator and its corresponding confusion matrix&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assignment'''&lt;br /&gt;
|| &lt;br /&gt;
* Perform LDA on the inbuilt '''PlantGrowth''' dataset&lt;br /&gt;
* Evaluate the model using a confusion matrix and visualize the results&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''About the Spoken Tutorial Project'''&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Workshops'''&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Forum to answer questions.'''&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question. Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Forum to answer questions'''&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Textbook Companion'''&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Acknowledgment'''&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Thank You'''&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Quadratic-Discriminant-Analysis-in-R/English"/>
				<updated>2024-05-16T12:44:19Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: Created page with &amp;quot;'''Title of the script''': Quadratic Discriminant Analysis in R  '''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty  &amp;lt;div style=&amp;quot;margin-right:-1.27cm;&amp;quot;&amp;gt;'''Keywo...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Quadratic Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': Yate Asseke Ronald Olivera and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;margin-right:-1.27cm;&amp;quot;&amp;gt;'''Keywords''': R, RStudio, machine learning, supervised, unsupervised, QDA, quadratic discriminant analysis, video tutorial.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
| align=center| '''Visual Cue'''&lt;br /&gt;
| align=center| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on''' Quadratic Discriminant Analysis in R'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about:&lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation of QDA using the '''Raisin''' dataset.&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know,&lt;br /&gt;
* Basic programming in '''R'''.&lt;br /&gt;
* '''Basics of Machine Learning'''.&lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Quadratic Discriminant Analysis'''&lt;br /&gt;
||&lt;br /&gt;
* Quadratic discriminant analysis is a statistical method used for classification.&lt;br /&gt;
* QDA constructs a data-driven non-linear separator between two classes.&lt;br /&gt;
* The covariance matrix for different classes is not necessarily equal. &lt;br /&gt;
* A quadratic function describes the decision boundary between each pair of classes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Differences between LDA and QDA'''&lt;br /&gt;
|| Now let’s see the differences between LDA and QDA&lt;br /&gt;
&lt;br /&gt;
* '''LDA''' assumes that each class has the same covariance matrix.&lt;br /&gt;
* '''QDA''' relaxes the assumption of an equal covariance matrix for all the classes.&lt;br /&gt;
* '''LDA''' constructs a linear boundary, while '''QDA '''constructs a non-linear boundary.&lt;br /&gt;
* When the covariance matrices of different classes are the same, '''QDA '''reduces to '''LDA'''.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slides'''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for QDA'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA '''is primarily used when data is multivariate Gaussian.&lt;br /&gt;
&lt;br /&gt;
'''QDA''' assumes that each class has its own covariance matrix.&lt;br /&gt;
&lt;br /&gt;
|| Now let us see the assumption of QDA&lt;br /&gt;
&lt;br /&gt;
QDA is used when data is multivariate Gaussian and each class has its own covariance matrix.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Multicollinearity among predictors may lead to poor performance.&lt;br /&gt;
* The presence of outliers in data may also lead to poor performance. &lt;br /&gt;
&lt;br /&gt;
|| These are the limitations of QDA&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of QDA'''&lt;br /&gt;
&lt;br /&gt;
* Medical Diagnosis.&lt;br /&gt;
* Bio-Imaging classification.&lt;br /&gt;
* Fraud Detection.&lt;br /&gt;
&lt;br /&gt;
|| QDA technique is used in several applications.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of QDA'''&lt;br /&gt;
|| Let us implement '''QDA '''on the '''Raisin''' '''dataset '''with two chosen variables'''.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For more information on the Raisin data, please see the Additional Reading material on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files '''&lt;br /&gt;
|| We will use a script file '''QDA.R '''and '''Raisin Dataset ‘raisin.xlsx’'''&lt;br /&gt;
&lt;br /&gt;
Please download these files from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use them while practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
point to '''QDA.R''' and the folder '''QDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
|| I have downloaded and moved these files to the '''QDA '''folder. &lt;br /&gt;
&lt;br /&gt;
This folder is located in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
I have also set the '''QDA''' folder as my working directory.&lt;br /&gt;
&lt;br /&gt;
In this tutorial, we will create a '''QDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us switch to '''RStudio'''. &lt;br /&gt;
|- &lt;br /&gt;
|| Click QDA.R in RStudio&lt;br /&gt;
&lt;br /&gt;
Point to QDA.R in RStudio.&lt;br /&gt;
|| Let us open the script '''QDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''QDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''QDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(dplyr)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(&amp;quot;package_name&amp;quot;)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Point to the command.'''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select and run these commands to import the packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''readxl''' package to load the excel file of our '''Raisin Dataset'''.&lt;br /&gt;
&lt;br /&gt;
The '''MASS''' package contains the '''qda()''' function to create our classifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''caret''' package to create the '''confusion matrix.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2''' package will be used to create the '''decision boundary plot.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will use the '''dplyr''' package to aid the visualisation of the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please ensure that all the packages are installed correctly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As I have already installed the packages, I have directly imported them. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' data&amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
|| Run this command to load the '''Raisin '''dataset.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the Environment tab below Data, you will see the '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Then click on '''data '''to load the dataset in the Source window. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
Type these commands in RStudio.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window and close the tab.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command.&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button&lt;br /&gt;
|| We now select three columns from data and convert the variable '''data$class '''to a factor. &lt;br /&gt;
&lt;br /&gt;
Select and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Click on the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''data.'''&lt;br /&gt;
|| Click on '''data '''to load the modified data in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the data.&lt;br /&gt;
|| Now let us split our data into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''index_split&amp;lt;- sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using '''sample() '''function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It will be 70% of the total number of rows for training and 30% for testing.&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|-&lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Click the '''train_data '''and '''test_data '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the '''Environment '''tab.&lt;br /&gt;
&lt;br /&gt;
Click on '''train_data '''and '''test_data '''to load them in the Source window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let’s perform '''QDA''' on the '''training''' dataset.&lt;br /&gt;
|- &lt;br /&gt;
|| [Rstudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window&lt;br /&gt;
&lt;br /&gt;
type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model &amp;lt;- qda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''QDA_model '''&lt;br /&gt;
&lt;br /&gt;
Click Save and Click Run buttons. &lt;br /&gt;
|| We use this command to create the '''QDA''' model.&lt;br /&gt;
&lt;br /&gt;
We pass two parameters to the '''qda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Click Save.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console '''window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the console window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window. &lt;br /&gt;
|- &lt;br /&gt;
|| Point to the output in the '''console '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Prior probabilities of groups'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Group means'''&lt;br /&gt;
|| These are the parameters of our model.&lt;br /&gt;
&lt;br /&gt;
This indicates the composition of classes in the training data.&lt;br /&gt;
&lt;br /&gt;
These indicate the mean values of the predictor variables for each class.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Source '''window.&lt;br /&gt;
|| Drag boundary to see the '''Source''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us now use our model to make predictions on test data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(QDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''predicted_values '''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| Let’s use this command to predict the class variable from the test data using the trained QDA model.&lt;br /&gt;
&lt;br /&gt;
Printing '''predicted_values''' will give us more information about the model, such as '''class''' and '''posterior'''.&lt;br /&gt;
&lt;br /&gt;
This predicts the class and posterior probability for the testing data.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click on '''predicted_values '''in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Point to the output in the '''console'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''class'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''posterior'''&lt;br /&gt;
|| Click on '''predicted_values''' in the Environment tab&lt;br /&gt;
&lt;br /&gt;
This shows us that our predicted variable has two components.&lt;br /&gt;
&lt;br /&gt;
'''class''' contains the predicted '''classes '''of the testing data.&lt;br /&gt;
&lt;br /&gt;
'''Posterior''' contains the '''posterior probability''' of an observation belonging to each class.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us compute the accuracy of our model.&lt;br /&gt;
|- &lt;br /&gt;
|| '''confusion &amp;lt;- confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusionMatrix(test_data$class,predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Point to the confusion in the Environment Tab&lt;br /&gt;
&lt;br /&gt;
Highlight the attribute&lt;br /&gt;
&lt;br /&gt;
'''table'''&lt;br /&gt;
|| This command creates a confusion matrix list.&lt;br /&gt;
&lt;br /&gt;
The list is created from the actual and predicted class labels of testing data.&lt;br /&gt;
&lt;br /&gt;
And it is stored in the confusion variable.&lt;br /&gt;
&lt;br /&gt;
It helps to assess the classification model's performance and accuracy.&lt;br /&gt;
&lt;br /&gt;
Select and run the command. &lt;br /&gt;
&lt;br /&gt;
The confusion matrix list is shown in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click '''confusion '''to load it in the''' Source '''window.&lt;br /&gt;
&lt;br /&gt;
The '''confusion''' list contains a component '''table''' that holds the required confusion matrix.&lt;br /&gt;
|- &lt;br /&gt;
|| '''plot_confusion_matrix &amp;lt;- function(confusion_matrix){'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| Now let’s plot the confusion matrix from the table.&lt;br /&gt;
&lt;br /&gt;
Click on '''QDA.R''' in the source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
'''Highlight '''the command &lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- confusion_matrix$table'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''tab = as.data.frame(tab)'''&lt;br /&gt;
&lt;br /&gt;
'''tab$Prediction &amp;lt;- factor(tab$Prediction, levels = rev(levels(tab$Prediction)))'''&lt;br /&gt;
&lt;br /&gt;
'''tab &amp;lt;- tab %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''rename(Actual = Reference) %&amp;gt;%'''&lt;br /&gt;
&lt;br /&gt;
'''mutate(cor = if_else(Actual == Prediction, 1,0))'''&lt;br /&gt;
&lt;br /&gt;
'''tab$cor &amp;lt;- as.factor(tab$cor)'''&lt;br /&gt;
&lt;br /&gt;
'''Highlight '''the command&lt;br /&gt;
&lt;br /&gt;
'''ggplot(tab, aes(Actual,Prediction)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_tile(aes(fill= cor),alpha = 0.4) + geom_text(aes(label=Freq)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;red&amp;quot;,&amp;quot;green&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_light() +'''&lt;br /&gt;
&lt;br /&gt;
'''theme(legend.position = &amp;quot;None&amp;quot;,'''&lt;br /&gt;
&lt;br /&gt;
'''line = element_blank()) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_x_discrete(position = &amp;quot;top&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
'''}'''&lt;br /&gt;
&lt;br /&gt;
|| These commands create a function '''plot_confusion_matrix '''to display the confusion matrix from the confusion matrix list created.&lt;br /&gt;
&lt;br /&gt;
It fetches the confusion matrix table from the list.&lt;br /&gt;
&lt;br /&gt;
It creates a data frame from the table, which is suitable for plotting using '''ggplot2'''.&lt;br /&gt;
&lt;br /&gt;
It plots the confusion matrix using the data frame created.&lt;br /&gt;
&lt;br /&gt;
It represents correct and incorrect predictions using different colors.&lt;br /&gt;
&lt;br /&gt;
Select and run the commands. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the '''Source '''window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''plot_confusion_matrix(confusion)'''&lt;br /&gt;
&lt;br /&gt;
Click on''' Save '''and '''Run '''buttons.&lt;br /&gt;
|| We are using the created '''plot_confusion_matrix()''' function to generate the visual plot of the confusion matrix stored in the '''confusion''' variable.&lt;br /&gt;
&lt;br /&gt;
Select and run the command.&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''plot window'''&lt;br /&gt;
|| Drag boundary to see the plot window clearly &lt;br /&gt;
&lt;br /&gt;
Observe that: &lt;br /&gt;
&lt;br /&gt;
22 samples of class Kecimen have been incorrectly classified.&lt;br /&gt;
&lt;br /&gt;
11 samples of class Besni have been incorrectly classified. &lt;br /&gt;
&lt;br /&gt;
Overall, the model has misclassified only '''33''' out of '''270 '''samples.&lt;br /&gt;
&lt;br /&gt;
We can say that our model performs well.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the source window clearly.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = seq(min(data$minorAL), max(data$minorAL), length = 500),'''&lt;br /&gt;
&lt;br /&gt;
'''ecc = seq(min(data$ecc), max(data$ecc), length = 500)) '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''grid$class = predict(QDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
|| This block of code first creates a '''grid '''of points spanning the range of '''minorAL '''and '''ecc '''features in the dataset.&lt;br /&gt;
&lt;br /&gt;
It stores it in a variable ''''grid''''. &lt;br /&gt;
&lt;br /&gt;
Then, it uses the QDA model to predict the class of each point in this grid.&lt;br /&gt;
&lt;br /&gt;
It stores these predictions as a new column ''''class' '''in the '''grid '''dataframe. &lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted class string labels into numeric values.&lt;br /&gt;
&lt;br /&gt;
The resulting grid of points and their predicted classes will be used to visualize the decision boundaries of the QDA model.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Click '''grid''' on the Environment tab to load the grid dataframe in the source window.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| Click on '''QDA.R''' in the Source window.&lt;br /&gt;
&lt;br /&gt;
In the '''Source''' window type these commands&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data = grid, aes(x = minorAL, y = ecc, fill = class), alpha = 0.4) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class)) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data = grid, aes(x = minorAL, y = ecc, z = classnum),'''&lt;br /&gt;
&lt;br /&gt;
'''colour = &amp;quot;black&amp;quot;, linewidth = 0.7) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(x = &amp;quot;MinorAL&amp;quot;, y = &amp;quot;ecc&amp;quot;, title = &amp;quot;QDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| We are creating the decision boundary plot using '''ggplot2'''. &lt;br /&gt;
&lt;br /&gt;
It plots the grid points with colors indicating the predicted classes. &lt;br /&gt;
&lt;br /&gt;
'''geom_raster '''creates a colour map indicating the predicted classes of the grid points&lt;br /&gt;
&lt;br /&gt;
'''geom_point '''plots the training data points in the plot.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour''' creates the decision boundary of the QDA.&lt;br /&gt;
&lt;br /&gt;
The '''scale_fill_manual''' function assigns specific colors to the classes and so does '''scale_color_manual''' function.&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the decision boundary.&lt;br /&gt;
&lt;br /&gt;
And the distribution of training data points of the '''model'''.&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| We can see that the decision boundary of our model is a non-linear curve.&lt;br /&gt;
&lt;br /&gt;
And our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
|| In this tutorial we have learned about:&lt;br /&gt;
* Quadratic Discriminant Analysis (QDA).&lt;br /&gt;
* Comparison between '''QDA '''and''' LDA'''.&lt;br /&gt;
* Assumptions for QDA.&lt;br /&gt;
* Limitations of QDA&lt;br /&gt;
* Applications of QDA&lt;br /&gt;
* Implementation Of QDA using''' Raisin''' Dataset'''.'''&lt;br /&gt;
* Visualization of the '''QDA '''separator&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Here is an assignment for you.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Assignment&lt;br /&gt;
||&lt;br /&gt;
* Apply '''QDA''' on the '''wine''' dataset.&lt;br /&gt;
* Measure the accuracy of the model.&lt;br /&gt;
&lt;br /&gt;
This dataset can be found in the '''HDclassif '''package. &lt;br /&gt;
&lt;br /&gt;
Install the package and import the dataset using the '''data()''' command.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
About the Spoken Tutorial Project&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| Show slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Workshops&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Spoken Tutorial Forum to answer questions&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question.&lt;br /&gt;
&lt;br /&gt;
Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Forum to answer questions&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
&lt;br /&gt;
Show Slide&lt;br /&gt;
&lt;br /&gt;
Textbook Companion&lt;br /&gt;
&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Acknowledgment&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| Show Slide&lt;br /&gt;
&lt;br /&gt;
Thank You&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborty from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English</id>
		<title>Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php/Machine-Learning-using-R/C2/Linear-Discriminant-Analysis-in-R/English"/>
				<updated>2023-11-30T09:50:47Z</updated>
		
		<summary type="html">&lt;p&gt;Ushav: Created page with &amp;quot;'''Title of the script''': Linear Discriminant Analysis in R  '''Author''': YATE ASSEKE RONALD OLIVERA  and Debatosh Charkraborty  '''Keywords''':  R, RStudio, machine learnin...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Title of the script''': Linear Discriminant Analysis in R&lt;br /&gt;
&lt;br /&gt;
'''Author''': YATE ASSEKE RONALD OLIVERA  and Debatosh Chakraborty&lt;br /&gt;
&lt;br /&gt;
'''Keywords''':  R, RStudio, machine learning, supervised, unsupervised, dimensionality reduction, confusion matrix, console, LDA, video tutorial.&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|- &lt;br /&gt;
|| '''Visual Cue'''&lt;br /&gt;
|| '''Narration'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Opening Slide'''&lt;br /&gt;
|| Welcome to this spoken tutorial on '''Linear Discriminant Analysis in R.'''&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
&lt;br /&gt;
|| In this tutorial, we will learn about: &lt;br /&gt;
# Linear Discriminant Analysis ('''LDA''') and its implementation.&lt;br /&gt;
# Assumptions of LDA&lt;br /&gt;
# Limitations of LDA&lt;br /&gt;
# LDA on a subset of Raisin dataset&lt;br /&gt;
# Visualization of the '''LDA''' separator and its corresponding confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''System Specifications'''&lt;br /&gt;
|| This tutorial is recorded using,&lt;br /&gt;
* '''Windows 11 '''&lt;br /&gt;
* '''R '''version''' 4.3.0'''&lt;br /&gt;
* '''RStudio''' version '''2023.06.1'''&lt;br /&gt;
&lt;br /&gt;
It is recommended to install '''R''' version '''4.2.0''' or higher. &lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Prerequisites '''&lt;br /&gt;
&lt;br /&gt;
'''https://spoken-tutorial.org'''&lt;br /&gt;
|| To follow this tutorial, the learner should know:&lt;br /&gt;
&lt;br /&gt;
* Basics of '''R''' programming. &lt;br /&gt;
* Basics of '''Machine Learning '''using '''R'''. &lt;br /&gt;
&lt;br /&gt;
If not, please access the relevant tutorials on '''R '''on this website.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Linear Discriminant Analysis'''&lt;br /&gt;
|| Linear Discriminant Analysis is a statistical method.&lt;br /&gt;
* It is used for classification. &lt;br /&gt;
* It constructs a data driven line that best separates different classes.&lt;br /&gt;
* It is based on maximizing the likelihood function to classify two or more classes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide.'''&lt;br /&gt;
&lt;br /&gt;
'''Applications of LDA'''&lt;br /&gt;
|| &lt;br /&gt;
* LDA technique is used in several applications like&lt;br /&gt;
&lt;br /&gt;
** Fraud Detection&lt;br /&gt;
** Bio-Imaging classification&lt;br /&gt;
** Classify patient disease state&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| Let us now understand the assumptions of LDA.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide '''&lt;br /&gt;
&lt;br /&gt;
'''Assumptions for LDA'''&lt;br /&gt;
|| '''Multivariate Normality: '''&lt;br /&gt;
&lt;br /&gt;
* All data entries are continuous, Gaussian, with equal covariance matrix for all the classes.&lt;br /&gt;
* Mean vectors for each class are different. &lt;br /&gt;
* Data records are independent and identically distributed among each class.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide '''&lt;br /&gt;
&lt;br /&gt;
'''Limitations of LDA'''&lt;br /&gt;
|| Now we will see the limitations of LDA.&lt;br /&gt;
&lt;br /&gt;
* Departure from Gaussianity may increase misclassification probability in LDA.&lt;br /&gt;
* '''LDA''' may perform poorly if data has unequal class covariance matrix.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Implementation Of LDA'''&lt;br /&gt;
|| Now let us implement '''LDA''' on the '''raisin dataset '''with two chosen variables'''.'''&lt;br /&gt;
&lt;br /&gt;
More information on '''raisin''' data is available in the '''Additional Reading material''' on this tutorial page.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide '''&lt;br /&gt;
&lt;br /&gt;
'''Download Files''' &lt;br /&gt;
|| We will use a script file '''LDA.R'''&lt;br /&gt;
&lt;br /&gt;
Please download this file from the''' Code files''' link of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Make a copy and then use it for practicing.&lt;br /&gt;
|- &lt;br /&gt;
|| [Computer screen]&lt;br /&gt;
&lt;br /&gt;
Point to '''LDA.R''' and the folder '''LDA.'''&lt;br /&gt;
&lt;br /&gt;
Point to the''' MLProject folder '''on the '''Desktop.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the''' LDA folder.'''&lt;br /&gt;
|| I have downloaded and moved these files to the '''LDA '''folder.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This folder is in the '''MLProject''' folder on my '''Desktop'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have also set the '''LDA''' folder as my working''' directory'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Point to the script file '''LDA.R.'''&lt;br /&gt;
|| In this tutorial, we will create a '''LDA''' classifier model on the '''raisin''' dataset. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let us switch to '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Open '''LDA.R '''in '''RStudio'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to''' LDA.R''' in '''RStudio'''.&lt;br /&gt;
|| Open the script '''LDA.R''' in '''RStudio'''.&lt;br /&gt;
&lt;br /&gt;
For this, click on the script '''LDA.R.'''&lt;br /&gt;
&lt;br /&gt;
Script '''LDA.R''' opens in '''RStudio'''.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the '''Readxl package.'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(MASS) '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''library(lattice)'''&lt;br /&gt;
&lt;br /&gt;
Highlight all the commands.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;nowiki&amp;gt;#install.packages(“package_name”)&amp;lt;/nowiki&amp;gt;'''&lt;br /&gt;
|| '''Readxl package''' is used to load the '''Excel''' file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The''' MASS package''' contains the '''lda()''' function that we will use for our analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''ggplot2 package''' is used to plot the results of our analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''caret package''' contains the&lt;br /&gt;
&lt;br /&gt;
'''confusionMatrix''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It is used as a measure for the performance of the classifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please note that in order to import these libraries, we need to install them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please ensure that everything is installed correctly. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can use the command '''install.packages(“package_name”)''' to install the required packages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As I have already installed these packages, I will directly import them. &lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''library(readxl)'''&lt;br /&gt;
&lt;br /&gt;
'''library(MASS)'''&lt;br /&gt;
&lt;br /&gt;
'''library(ggplot2)'''&lt;br /&gt;
&lt;br /&gt;
'''library(caret)'''&lt;br /&gt;
&lt;br /&gt;
'''library(lattice)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|| Select and run these commands to import the requisite packages.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command''' '''&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the commands.&lt;br /&gt;
&lt;br /&gt;
'''data &amp;lt;- read_xlsx(&amp;quot;Raisin.xlsx&amp;quot;)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''data&amp;lt;-data[c(&amp;quot;minorAL&amp;quot;,&amp;quot;ecc&amp;quot;,&amp;quot;class&amp;quot;)]'''&lt;br /&gt;
&lt;br /&gt;
|| We will read the '''Excel''' file and choose 3 columns: two features ('''minorAL, ecc''') and one target ('''class''') variable.&lt;br /&gt;
&lt;br /&gt;
Run these commands to import the '''raisin''' dataset.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''Environment '''tab clearly.&lt;br /&gt;
&lt;br /&gt;
Point to the data variable in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
Click the data to load the dataset.&lt;br /&gt;
&lt;br /&gt;
|| Drag boundary to see the Environment tab clearly.&lt;br /&gt;
&lt;br /&gt;
In the Environment tab under '''Data '''heading, you will see a '''data '''variable.&lt;br /&gt;
&lt;br /&gt;
Click the data''' variable''' to load the dataset in the '''Source''' window. &lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Source window clearly.&lt;br /&gt;
|| Drag boundary to see the '''Source '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
Type these commands in the source window.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window type this command.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the below commands.&lt;br /&gt;
&lt;br /&gt;
'''data$class &amp;lt;- factor(data$class)'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
||Here we are converting the variable '''data$class''' to a factor.&lt;br /&gt;
&lt;br /&gt;
It ensures that the categorical data is properly encoded. &lt;br /&gt;
&lt;br /&gt;
Select the command and run it.&lt;br /&gt;
|-&lt;br /&gt;
||Only Narration.&lt;br /&gt;
|| Now we split our dataset into training and testing data.&lt;br /&gt;
|-&lt;br /&gt;
||[RStudio]&lt;br /&gt;
&lt;br /&gt;
Type the command in the source window.&lt;br /&gt;
&lt;br /&gt;
'''set.seed(1) '''&lt;br /&gt;
&lt;br /&gt;
'''index_split=sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
||In the '''Source''' window type these commands.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
||Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''set.seed(1)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''sample(1:nrow(data),size=0.7*nrow(data),replace=FALSE)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''replace=FALSE'''&lt;br /&gt;
&lt;br /&gt;
Select the commands and click the Run button.&lt;br /&gt;
||First we set a seed for reproducible results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We will create a vector of indices using '''sample() '''function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This will be 70% for training and 30% for testing.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The training data is chosen using simple random sampling without replacement. &lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
|-&lt;br /&gt;
|| &lt;br /&gt;
|| The vector is shown in the''' Environment '''tab.&lt;br /&gt;
|-&lt;br /&gt;
||Point to train-test split.&lt;br /&gt;
|| We use the indices that we previously generated to obtain our train-test split.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
Type the command&lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data [index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| In the '''Source '''window type these commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''train_data &amp;lt;- data[index_split, ]'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''test_data &amp;lt;- data[-c(index_split), ]'''&lt;br /&gt;
|| This creates training data, consisting of 630 unique rows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This creates testing data, consisting of 270 unique rows.&lt;br /&gt;
|- &lt;br /&gt;
|| Select the commands and click the Run button.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the sets in the Environment Tab&lt;br /&gt;
|| Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data sets are shown in the Environment tab.&lt;br /&gt;
  &lt;br /&gt;
&lt;br /&gt;
Click on '''test_data '''and '''train_data '''to load them in the Source window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Let us train our '''LDA''' model.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''LDA_model &amp;lt;- lda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
'''LDA_model'''&lt;br /&gt;
|| In the '''Source '''window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''LDA_model &amp;lt;- lda(class~.,data=train_data)'''&lt;br /&gt;
&lt;br /&gt;
'''LDA_model'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''LDA_model'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
Point to the output in the '''console '''window.&lt;br /&gt;
|| We pass two parameters to the '''lda()''' function.&lt;br /&gt;
# formula &lt;br /&gt;
# data on which the model should train.&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
The output is shown in the '''console''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the '''console''' window.&lt;br /&gt;
|| Drag boundary to see the '''console '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight '''output''' in the '''console.'''&lt;br /&gt;
|| Our '''model''' provides us with a lot of information.&lt;br /&gt;
&lt;br /&gt;
Let us go through them one at a time.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''Prior probabilities of groups. '''&lt;br /&gt;
&lt;br /&gt;
Highlight the command''' Group means.'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command '''Coefficients of linear discriminants '''&lt;br /&gt;
&lt;br /&gt;
|| These explain the distribution of classes in the training dataset.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These display the mean values of each '''predictor '''variable for each '''class'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These display the '''linear combination of predictor''' variables. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The given linear combinations form the decision rule of the '''LDA''' model.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Drag boundary to see the Source window.&lt;br /&gt;
|| Drag boundary to see the '''Source '''window clearly.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Let us use this model to make predictions on the testing data.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''predicted_values &amp;lt;- predict(LDA_model, test_data)'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source '''window type this command and run it. &lt;br /&gt;
&lt;br /&gt;
Let us check what '''predicted_values''' contain.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Click the '''predicted_values '''data in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Point to the table.&lt;br /&gt;
|| Click the '''predicted_values '''data in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''predicted_values '''table is loaded in the '''Source''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$posterior)'''&lt;br /&gt;
&lt;br /&gt;
'''head(predicted_values$x)'''&lt;br /&gt;
|| In the '''Source''' window type these commands and run them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The output is seen in the''' console''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command output of '''head(predicted_values$class) '''in the '''console.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command output of '''head(predicted_values$posterior)''' in the '''console.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Highlight the command output of '''head(predicted_values$x) '''in '''console'''&lt;br /&gt;
|| It contains the class that the model has predicted for each observation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It contains the '''posterior probability''' of the observation belonging to each class.&lt;br /&gt;
&lt;br /&gt;
This contains the linear discriminants for each observation.&lt;br /&gt;
&lt;br /&gt;
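As a side note, the three components described above can be inspected directly. This is a minimal sketch, assuming '''LDA_model''' was fitted with the '''MASS''' package's '''lda()''' function (whose '''predict()''' method returns exactly these components) and '''test_data''' is the held-out data frame from the earlier steps:

```r
# Sketch, assuming LDA_model was fitted with MASS::lda() and
# test_data is the held-out data frame from earlier steps.
library(MASS)

predicted_values <- predict(LDA_model, test_data)

str(predicted_values$class)      # factor of predicted class labels
str(predicted_values$posterior)  # matrix of per-class posterior probabilities
str(predicted_values$x)          # matrix of linear discriminant scores
```

Running '''str()''' on each component confirms its type before it is used in later steps.&lt;br /&gt;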
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Now we will measure the performance of our model using the '''Confusion Matrix'''.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''confusion &amp;lt;- table(test_data$class, predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''fourfoldplot(confusion, color = c(&amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;), conf.level = 0, margin=1)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on '''Save '''and''' Run''' buttons.&lt;br /&gt;
|| In the '''Source '''window type these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Save and run the commands.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command '''confusion &amp;lt;- table(test_data$class, predicted_values$class)'''&lt;br /&gt;
&lt;br /&gt;
Highlight the command&lt;br /&gt;
&lt;br /&gt;
'''fourfoldplot(confusion, color = c(&amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;), conf.level = 0, margin=1)'''&lt;br /&gt;
&lt;br /&gt;
|| The '''table()''' function creates a confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''fourfoldplot()''' function generates a visual plot of the confusion matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The output is seen in the '''plot''' window.&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the plot in '''plot window '''&lt;br /&gt;
|| Drag boundary to see the plot window clearly.&lt;br /&gt;
&lt;br /&gt;
Given the specific seed set with '''set.seed(1)''', LDA has misclassified 33 out of 270 observations.&lt;br /&gt;
&lt;br /&gt;
This number may change for different sets of training data. &lt;br /&gt;
&lt;br /&gt;
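The misclassification count quoted above can also be checked directly from the confusion matrix. A minimal sketch, assuming '''confusion''' is the table built earlier with true classes in rows and predictions in columns:

```r
# Sketch: derive counts from the confusion matrix built with table().
# Assumes `confusion` has true classes in rows, predictions in columns.
correct <- sum(diag(confusion))   # diagonal entries are correct predictions
total <- sum(confusion)           # all test observations
misclassified <- total - correct
accuracy <- correct / total
cat("Misclassified:", misclassified, "of", total,
    "- accuracy:", round(accuracy, 3), "\n")
```

The diagonal of the matrix holds the correctly classified observations, so accuracy follows directly from it.&lt;br /&gt;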
|- &lt;br /&gt;
|| Only Narration.&lt;br /&gt;
|| Let us visualize how well our model separates different classes.&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
'''X &amp;lt;- seq(min(train_data$minorAL), max(train_data$minorAL), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Y &amp;lt;- seq(min(train_data$ecc), max(train_data$ecc), length.out = 100)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''min_max &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''min_max$predicted_class &amp;lt;- predict(LDA_model, newdata = min_max)$class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid &amp;lt;- expand.grid(minorAL = X, ecc = Y)'''&lt;br /&gt;
&lt;br /&gt;
'''grid$class &amp;lt;- predict(LDA_model, newdata = grid)$class'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''grid$classnum &amp;lt;- as.numeric(grid$class)'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on Save and Run buttons.&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This block of code sets up the data for plotting the decision boundary.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It builds a square grid of coordinates spanning the range of the training data, together with the predicted class for each grid point.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''seq''' function generates a sequence of evenly spaced values between the smallest and largest values of the 'minorAL' and 'ecc' variables in the training data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''grid''' variable contains the generated grid points along with the '''LDA_model''' predictions on them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''as.numeric''' function encodes the predicted class labels as numeric values.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
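The grid-building pattern described above can be illustrated on a tiny example. The ranges below are illustrative only, not the actual 'minorAL' and 'ecc' ranges:

```r
# Sketch of the seq() + expand.grid() pattern used above, on a tiny range.
X <- seq(0, 1, length.out = 3)     # evenly spaced values: 0.0, 0.5, 1.0
Y <- seq(10, 20, length.out = 3)   # evenly spaced values: 10, 15, 20
grid <- expand.grid(minorAL = X, ecc = Y)

nrow(grid)   # 3 x 3 = 9 grid points, one row per (minorAL, ecc) pair
head(grid)
```

'''expand.grid''' returns every combination of the two sequences, which is what lets the model be evaluated over the whole plotting region.&lt;br /&gt;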
|- &lt;br /&gt;
|| Point to the Environment tab.&lt;br /&gt;
|| Drag boundary to see the details in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These variables contain the data for visualizing the decision boundary.&lt;br /&gt;
&lt;br /&gt;
Click the '''grid''' '''data''' in the Environment tab.&lt;br /&gt;
&lt;br /&gt;
The '''grid data''' table is loaded in the '''Source''' window.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| [RStudio]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = min_max, aes(x = minorAL, y = ecc, color = predicted_class), size = 1, alpha = 0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data=grid, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data= grid, aes(x=minorAL, y=ecc, z = classnum), colour=&amp;quot;black&amp;quot;, linewidth = 1.2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;LDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
|| In the '''Source''' window, type these commands.&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| Highlight the command &lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = min_max, aes(x = minorAL, y = ecc, color = predicted_class), size = 1, alpha = 0.3) +theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''ggplot() +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_raster(data=grid, aes(x=minorAL, y=ecc, fill = class),alpha=0.3) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_point(data = train_data, aes(x = minorAL, y = ecc, color = class), size = 2) +'''&lt;br /&gt;
&lt;br /&gt;
'''geom_contour(data= grid, aes(x=minorAL, y=ecc, z = classnum), colour=&amp;quot;black&amp;quot;, linewidth = 1.2) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_fill_manual(values = c(&amp;quot;#ffff46&amp;quot;, &amp;quot;#FF46e9&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''scale_color_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +'''&lt;br /&gt;
&lt;br /&gt;
'''labs(title = &amp;quot;LDA Decision Boundary&amp;quot;) +'''&lt;br /&gt;
&lt;br /&gt;
'''theme_minimal()'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select the commands and run them.&lt;br /&gt;
&lt;br /&gt;
|| This command creates the decision boundary plot.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It plots the '''grid''' points with colors indicating the predicted classes.&lt;br /&gt;
&lt;br /&gt;
'''geom_raster''' creates a colour map indicating the predicted classes of the grid points.&lt;br /&gt;
&lt;br /&gt;
'''geom_contour''' draws the decision boundary of the LDA model.&lt;br /&gt;
&lt;br /&gt;
The '''scale_color_manual''' and '''scale_fill_manual''' functions assign specific colors to the point and fill classes respectively.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The overall plot provides a visual representation of the model's decision boundary and the distribution of the training data points.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Select and run these commands.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Drag boundaries to see the plot window clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Point the output in the '''Plots '''window&lt;br /&gt;
|| We can see that our model has separated most of the data points clearly.&lt;br /&gt;
|- &lt;br /&gt;
|| Only Narration&lt;br /&gt;
|| With this, we come to the end of this tutorial.&lt;br /&gt;
&lt;br /&gt;
Let us summarize.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Summary'''&lt;br /&gt;
|| In this tutorial we have learnt:&lt;br /&gt;
&lt;br /&gt;
* Linear Discriminant Analysis ('''LDA''') and its implementation&lt;br /&gt;
* Assumptions of LDA&lt;br /&gt;
* Limitations of LDA&lt;br /&gt;
* LDA on a subset of the Raisin dataset&lt;br /&gt;
* Visualization of the '''LDA''' separator and its corresponding confusion matrix&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| &lt;br /&gt;
|| Now we will suggest an assignment for this Spoken Tutorial.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Assignment'''&lt;br /&gt;
|| &lt;br /&gt;
* Perform LDA on the inbuilt '''PlantGrowth''' dataset&lt;br /&gt;
* Evaluate the model using a confusion matrix and visualize the results&lt;br /&gt;
&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''About the Spoken Tutorial Project'''&lt;br /&gt;
|| The video at the following link summarizes the Spoken Tutorial project. &lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Workshops'''&lt;br /&gt;
|| We conduct workshops using Spoken Tutorials and give certificates.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Spoken Tutorial Forum to answer questions.'''&lt;br /&gt;
&lt;br /&gt;
Do you have questions in THIS Spoken Tutorial?&lt;br /&gt;
&lt;br /&gt;
Choose the minute and second where you have the question. Explain your question briefly.&lt;br /&gt;
&lt;br /&gt;
Someone from the FOSSEE team will answer them.&lt;br /&gt;
&lt;br /&gt;
Please visit this site.&lt;br /&gt;
|| Please post your timed queries in this forum.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Forum to answer questions'''&lt;br /&gt;
|| Do you have any general/technical questions?&lt;br /&gt;
&lt;br /&gt;
Please visit the forum given in the link.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Textbook Companion'''&lt;br /&gt;
|| The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.&lt;br /&gt;
&lt;br /&gt;
We give certificates to those who do this.&lt;br /&gt;
&lt;br /&gt;
For more details, please visit these sites.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Acknowledgment'''&lt;br /&gt;
|| The '''Spoken Tutorial''' project was established by the Ministry of Education, Government of India.&lt;br /&gt;
|- &lt;br /&gt;
|| '''Show Slide'''&lt;br /&gt;
&lt;br /&gt;
'''Thank You'''&lt;br /&gt;
|| This tutorial is contributed by Yate Asseke Ronald and Debatosh Chakraborthy from IIT Bombay.&lt;br /&gt;
&lt;br /&gt;
Thank you for joining.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Ushav</name></author>	</entry>

	</feed>