R/C2/Plotting-Bar-Charts-and-Scatter-Plot/English-timed
Revision as of 20:35, 1 June 2020 by Sakinashaikh
Time | Narration |
00:01 | Welcome to this tutorial on Plotting bar charts and scatter plot. |
00:08 | In this tutorial, we will learn how to: |
00:12 | Plot bar charts |
00:14 | Plot scatter plot |
00:18 | Find the correlation coefficient between two objects. |
00:22 | To understand this tutorial, you should know, |
00:27 | Data frames in R |
00:29 | Basics of Statistics |
00:33 | If not, please locate the relevant tutorials on R on this website. |
00:40 | This tutorial is recorded on |
00:43 | Ubuntu Linux OS version 16.04 |
00:48 | R version 3.4.4 |
00:51 | RStudio version 1.1.463 |
00:57 | Install R version 3.2.0 or higher. |
01:02 | For this tutorial, we will use |
01:06 | A data frame moviesData.csv |
01:10 | A script file barPlots.R. |
01:15 | Please download these files from the Code files link of this tutorial. |
01:21 | I have downloaded and moved these files to the Plots folder. |
01:28 | This folder is located in the myProject folder on my Desktop. |
01:33 | I have also set the Plots folder as my Working Directory. |
01:40 | Let us switch to RStudio. |
01:42 | Open the script barPlots.R in RStudio. |
01:49 | Run this script by clicking on Source button. |
01:53 | movies data frame opens in the Source window. |
01:58 | It has 600 observations of 31 variables. |
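The contents of barPlots.R are not reproduced in this transcript. As a minimal sketch, a script of this kind would typically read the CSV file with read.csv and check its size with dim. So that the example runs on its own, it writes a tiny made-up CSV to a temporary file; the titles and values are hypothetical stand-ins, and with the downloaded file one would pass "moviesData.csv" to read.csv instead.

```r
# Write a tiny made-up CSV to a temporary file so the sketch is self-contained.
# (Hypothetical values; the real moviesData.csv has 600 rows and 31 columns.)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("title,imdb_rating,audience_score",
             "Movie A,5.5,48",
             "Movie B,7.3,75"),
           csv_path)

# Read the file into a data frame, as one would with "moviesData.csv".
movies <- read.csv(csv_path)

# Number of observations (rows) and variables (columns).
dim(movies)
```

In RStudio, calling View(movies) afterwards opens the data frame in the Source window.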
02:04 | In the Source window, scroll from left to right. |
02:10 | This will enable us to see the remaining objects of movies data frame. |
02:17 | Now, we will learn how to draw a bar chart of the object named imdb underscore rating in movies. |
02:27 | A bar chart represents data in rectangular bars with length of the bar proportional to the value of the variable. |
02:37 | R uses the function barplot to create bar charts. |
02:42 | Let us switch to RStudio. |
02:45 | For the sake of simplicity, we are considering only the first 20 observations of movies to draw a bar chart. |
02:54 | Click on the script barPlots.R |
02:58 | In the Source window, type the following command. |
03:02 | Save the script and run the current line by pressing Ctrl + Enter keys simultaneously. |
03:11 | Let me resize the Source window. |
03:14 | moviesSub with 20 observations is loaded in the Environment. |
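The command typed at this step is not shown in the transcript. One common way to keep only the first 20 observations is row indexing with square brackets, sketched here on a stand-in data frame whose titles and ratings are made up for illustration.

```r
# Stand-in for the movies data frame (hypothetical values).
movies <- data.frame(
  title = paste("Movie", 1:30),
  imdb_rating = round(seq(4, 9, length.out = 30), 1),
  stringsAsFactors = FALSE
)

# Keep rows 1 to 20 and all columns.
moviesSub <- movies[1:20, ]

# Confirm the number of observations in the subset.
nrow(moviesSub)
```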
03:21 | Now, we draw a bar chart of imdb_rating for these movies. |
03:28 | In the Source window, type the following command. |
03:34 | Here, we have used the following arguments: |
03:39 | moviesSub dollar sign imdb underscore rating is the data for plotting |
03:46 | ylab and xlab for adding labels to the respective axes. |
03:53 | col to set the color of bars |
03:57 | ylim to set the range of values on Y-axis |
04:02 | main for adding a title to the bar chart. |
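The arguments listed above can be put together as in the following sketch. The exact command used in the video is not given in the transcript, so the label text, the color and the stand-in data (four hypothetical movies) are assumptions.

```r
# Stand-in for moviesSub (hypothetical titles and ratings).
moviesSub <- data.frame(
  title = c("Movie A", "Movie B", "Movie C", "Movie D"),
  imdb_rating = c(5.5, 7.3, 8.1, 6.0),
  stringsAsFactors = FALSE
)

# One bar per movie; bar height is the IMDB rating.
barplot(moviesSub$imdb_rating,
        ylab = "IMDB Rating",            # label for the Y-axis
        xlab = "Movies",                 # label for the X-axis
        col = "blue",                    # color of the bars
        ylim = c(0, 10),                 # range of values on the Y-axis
        main = "IMDB Rating of Movies")  # title of the chart
```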
04:07 | Run the current line. |
04:09 | The bar chart is displayed with Movies on X-axis and their imdb_rating on Y-axis. |
04:18 | In the Plots window, click on Zoom to maximize the plot. |
04:23 | This particular movie has an IMDB rating of approximately 6. |
04:31 | Similarly, this particular movie has an IMDB rating of approximately 8. |
04:39 | However, we do not know the name of the movies. |
04:44 | So, we will add more arguments in barplot function to show the names of movies on X-axis. |
04:52 | Close this plot. |
04:55 | In the Source window, type the following command. |
05:00 | Here, we have used the argument names.arg and set it to title. |
05:06 | Remember, title column in moviesSub contains the names of movies. |
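A minimal sketch of the names.arg argument, again on hypothetical stand-in data rather than the real moviesSub: passing the title column labels each bar with the name of its movie.

```r
# Stand-in for moviesSub (hypothetical titles and ratings).
moviesSub <- data.frame(
  title = c("Movie A", "Movie B", "Movie C", "Movie D"),
  imdb_rating = c(5.5, 7.3, 8.1, 6.0),
  stringsAsFactors = FALSE
)

# names.arg supplies one label per bar, drawn below the X-axis.
barplot(moviesSub$imdb_rating,
        names.arg = moviesSub$title,
        ylab = "IMDB Rating",
        xlab = "Movies",
        col = "blue",
        ylim = c(0, 10),
        main = "IMDB Rating of Movies")
```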
05:13 | Run the current line. |
05:16 | In the Plots window, click on Zoom to maximize the plot. |
05:22 | Now, the names of movies are displayed on the X-axis. |
05:27 | But not for all movies. |
05:30 | This is because the names are too long to be accommodated. |
05:36 | So, we will make these names perpendicular to the X-axis. |
05:42 | Close this plot. |
05:44 | In the Source window, type the following command. |
05:49 | Here, we have used las argument. |
05:53 | las equal to 2 produces labels which are at right angles to the axis. |
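The effect of las can be sketched as follows. The titles are hypothetical and deliberately long; the xlab argument is left out here, which is an assumption of this sketch, so that the axis label does not overlap the rotated names.

```r
# Stand-in for moviesSub with longer, hypothetical titles.
moviesSub <- data.frame(
  title = c("A Fairly Long Movie Title", "Short One",
            "Another Quite Long Movie Title", "Mid Length Title"),
  imdb_rating = c(5.5, 7.3, 8.1, 6.0),
  stringsAsFactors = FALSE
)

# las = 2 draws the axis labels at right angles to the axis.
barplot(moviesSub$imdb_rating,
        names.arg = moviesSub$title,
        ylab = "IMDB Rating",
        col = "blue",
        ylim = c(0, 10),
        main = "IMDB Rating of Movies",
        las = 2)
```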
06:01 | Run the current line. |
06:03 | In the Plots window, click on Zoom to maximize the plot. |
06:10 | Now the names for all the movies are displayed on X-axis. |
06:15 | For example, Filly Brown has an IMDB rating of approximately 6. |
06:23 | However, longer names are being truncated. |
06:28 | We can add more arguments to barplot function for adjusting labels. |
06:34 | For more information, please refer to the Additional Material section on this website. |
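The transcript does not say which extra arguments are meant. As one possible approach, the cex.names argument of barplot shrinks the bar labels and par(mar = ...) enlarges the bottom margin so that long names fit. The margin and size values below are assumptions chosen for the hypothetical stand-in data.

```r
# Stand-in for moviesSub with longer, hypothetical titles.
moviesSub <- data.frame(
  title = c("A Fairly Long Movie Title", "Short One",
            "Another Quite Long Movie Title", "Mid Length Title"),
  imdb_rating = c(5.5, 7.3, 8.1, 6.0),
  stringsAsFactors = FALSE
)

# Enlarge the bottom margin (order: bottom, left, top, right) and
# remember the previous settings so they can be restored afterwards.
old_par <- par(mar = c(12, 4, 4, 2))

barplot(moviesSub$imdb_rating,
        names.arg = moviesSub$title,
        ylab = "IMDB Rating",
        col = "blue",
        ylim = c(0, 10),
        main = "IMDB Rating of Movies",
        las = 2,
        cex.names = 0.8)  # draw the bar labels at 80% of the default size

# Restore the previous graphical parameters.
par(old_par)
```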
06:42 | Close this plot. |
06:44 | In the Source window, click on movies. |
06:47 | Let us analyze the relation between imdb underscore rating |
06:54 | and audience underscore score. |
06:58 | For this, we will draw a scatter plot with these two objects by using plot function. |
07:05 | Remember, we have already learnt how to plot a single object. |
07:11 | A scatter plot is a graph in which the values of two variables are plotted along two axes. |
07:18 | The pattern of the resulting points reveals whether the two variables are correlated. |
07:24 | Let us switch to RStudio. |
07:27 | In the Source window, click on the script barPlots.R |
07:32 | In the Source window, type the following command. |
07:39 | Here, we have kept imdb underscore rating on the X-axis and audience underscore score on the Y-axis. |
07:50 | As imdb underscore rating of any movie varies between 0 and 10, |
07:56 | we have set the range of values on X-axis from 0 to 10. |
08:02 | Similarly, we have set the range of values on Y-axis from 0 to 100. |
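A scatter plot with these axis ranges might look like the sketch below. The command from the video is not reproduced in the transcript, so the labels, title and color are assumptions, and the five data points are made up.

```r
# Stand-in for the movies data frame (hypothetical values).
movies <- data.frame(
  imdb_rating = c(4.2, 5.5, 6.1, 7.3, 8.1),
  audience_score = c(35, 48, 60, 75, 88)
)

# imdb_rating on the X-axis, audience_score on the Y-axis.
plot(x = movies$imdb_rating,
     y = movies$audience_score,
     xlab = "IMDB Rating",
     ylab = "Audience Score",
     xlim = c(0, 10),    # IMDB ratings lie between 0 and 10
     ylim = c(0, 100),   # audience scores lie between 0 and 100
     col = "blue",
     main = "Audience Score vs IMDB Rating")
```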
08:08 | Save the script and run the current line. |
08:13 | In the Plots window, click on Zoom to maximize the plot. |
08:18 | We can observe that movies with a higher imdb underscore rating tend to have a higher audience underscore score. |
08:28 | Close this plot. |
08:31 | Now we will learn how to calculate the correlation coefficient between imdb underscore rating and audience underscore score. |
08:42 | For this, we use cor function. |
08:46 | In the Source window, type the following command. |
08:50 | Save the script and run the current line. |
08:55 | The correlation coefficient between imdb underscore rating and audience underscore score is evaluated as 0.865. |
09:08 | The value of correlation coefficient is always between -1 and +1. |
09:15 | A positive value indicates that the variables are positively related, and a value close to +1, such as 0.865, indicates a strong positive linear relationship. |
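A minimal sketch of the cor function on made-up stand-in data. The value 0.865 quoted above comes from the real moviesData.csv and will not be reproduced by these hypothetical numbers. By default, cor computes the Pearson correlation coefficient.

```r
# Stand-in for the movies data frame (hypothetical values).
movies <- data.frame(
  imdb_rating = c(4.2, 5.5, 6.1, 7.3, 8.1),
  audience_score = c(35, 48, 60, 75, 88)
)

# Pearson correlation coefficient between the two columns.
r <- cor(movies$imdb_rating, movies$audience_score)
print(r)
```

Because audience_score rises with imdb_rating in this toy data, the printed coefficient is positive.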
09:21 | Let us summarize what we have learnt. |
09:25 | In this tutorial, we have learnt how to: |
09:29 | Plot bar charts |
09:31 | Plot scatter plot |
09:34 | Find the correlation coefficient between two objects |
09:39 | We now suggest an assignment. |
09:43 | Read the file moviesData.csv. |
09:48 | Create a bar chart of critics underscore score for the first 10 movies. |
09:55 | Create a scatter plot of imdb underscore rating and imdb underscore num underscore votes to see their relation. |
10:08 | Save both the plots. |
10:11 | The video at the following link summarises the Spoken Tutorial project. |
10:19 | We conduct workshops using Spoken Tutorials and give certificates. |
10:24 | Please contact us. |
10:27 | Please post your timed queries in this forum. |
10:31 | Please post your general queries in this forum. |
10:35 | The FOSSEE team coordinates the TBC project. |
10:40 | For more details, please visit these sites. |
10:43 | The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
10:50 | The script for this tutorial was contributed by Tushar Bajaj (TISS Mumbai). |
10:58 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |