Difference between revisions of "R/C2/Indexing-and-Slicing-Data-Frames/English-timed"

From Script | Spoken-Tutorial

Revision as of 15:39, 22 May 2020


Time Narration
00:01 Welcome to the spoken tutorial on Indexing and Slicing Data Frames.
00:08 In this tutorial, we will learn how to:
00:12 Extract rows or columns from a data frame
00:17 Create a subset from a data frame
00:21 Retrieve data using double square brackets
00:25 To understand this tutorial, you should have knowledge about
00:30 Data frames in R
00:33 R script in RStudio and
00:37 How to set Working directory in RStudio
00:41 If not, please locate the relevant tutorials on R on this website.
00:48 This tutorial is recorded on
00:52 Ubuntu Linux OS version 16.04
00:57 R version 3.2.3
01:01 RStudio version 1.1.456
01:07 Install R version 3.2.0 or higher.
01:14 For this tutorial, we will use the data frame CaptaincyData.csv and a script file mydataframe.R.
01:25 Download these files from the Code files link of this tutorial.
01:31 I have downloaded and moved these files to the folder myProject on my Desktop.
01:39 I have also set this folder as my Working Directory.
01:45 Let us switch to RStudio.
01:48 Let us open the script mydataframe.R in RStudio.
01:55 For this, click on the script mydataframe.R.
02:01 The script mydataframe.R opens in RStudio.
02:07 Here, we have declared a variable captaincy to read and store the contents of CaptaincyData.csv.
02:16 Also, View function is being used to see the contents of the file.
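The two lines described above could look roughly like the sketch below. It assumes the file is read with read.csv, which the narration does not state. So that the sketch runs on its own, it first writes a small stand-in CSV to a temporary folder; the rows and values in it are invented for illustration and are not the contents of the real CaptaincyData.csv.

```r
# Stand-in data: invented values, NOT the real CaptaincyData.csv.
stand_in <- data.frame(
  names  = c("CaptainA", "Dravid"),
  played = c(40, 25),
  stringsAsFactors = FALSE
)

# Write the stand-in file to a temporary folder (assumption: in the tutorial
# the real file already sits in the working directory).
csv_path <- file.path(tempdir(), "CaptaincyData.csv")
write.csv(stand_in, csv_path, row.names = FALSE)

# Read the CSV file and store it in the variable captaincy.
captaincy <- read.csv(csv_path)

# str() prints the structure in the Console.
str(captaincy)

# In RStudio, View() opens the data frame in a spreadsheet-like tab.
# View(captaincy)
```

With the real file in the working directory, the path would simply be "CaptaincyData.csv".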
02:23 Remember, you may also use the equal to sign (=) in place of the less than symbol followed by a hyphen (<-).
02:32 However, we recommend the less than symbol followed by a hyphen.
02:38 For this, there is a shortcut in RStudio.
02:42 Suppose we want to assign a value of 2 to a variable named testvar.
02:49 In the Console window, type testvar and then press
02:54 Alt and - (hyphen) keys simultaneously.
02:58 Then type 2 and press Enter.
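As a minimal sketch, the assignment typed in the Console looks like this. The second variable, testvar2, is a hypothetical name added only to show the equal to form side by side.

```r
# Recommended form: less than symbol followed by hyphen.
# In RStudio, Alt and - insert this operator.
testvar <- 2

# The equal to sign also assigns at the top level (testvar2 is hypothetical).
testvar2 = 2

print(testvar)
```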
03:01 Let us get back to the Source window.
03:06 Run the script mydataframe.R by clicking on the Source button.
03:13 Now let us extract the contents of the third row of the captaincy data frame.
03:20 Click on script mydataframe.R
03:25 In the Source window, type captaincy
03:29 Type capt and press Enter to autocomplete the variable name captaincy. Then, within square brackets, type 3 followed by a comma, and press Enter.
03:47 Press Enter key at the end of every command.
03:51 Remember, one of the most important features of RStudio is intelligent auto-completion of function names, packages, and R objects.
04:04 We use a comma within square brackets when we wish to extract a row.
04:11 Save the script and execute the current line by pressing Ctrl+Enter keys.
04:20 The third row of the captaincy data frame is seen in the Console window.
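The command just described can be sketched as below. The data frame here is a small stand-in with invented values and placeholder captain names (only the column names names, played, won and victory, and the captain Dravid, come from this tutorial; the lost column and its position are assumptions), so the output will differ from the real CaptaincyData.csv.

```r
# Stand-in for the captaincy data frame: all values are invented.
captaincy <- data.frame(
  names  = c("CaptainA", "CaptainB", "CaptainC", "Dravid", "CaptainE"),
  played = c(40, 50, 60, 25, 20),
  won    = c(10, 20, 27, 8, 5),
  lost   = c(12, 13, 14, 6, 9),
  stringsAsFactors = FALSE
)
captaincy$victory <- captaincy$won / captaincy$played

# Row index before the comma, nothing after it: all columns of row 3.
third_row <- captaincy[3, ]
print(third_row)
```

The result is still a data frame, with one row and all the columns.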
04:26 Now, let us run the same command without a comma.
04:31 In the Source window, type captaincy then within square brackets 3.
04:39 Save the script and run this line only, as shown earlier.
04:46 The contents of the third column of the data frame are displayed in the Console window.
04:54 So, to extract a column, we shouldn’t use a comma within the square brackets.
05:01 When we extract data using row number or column number, it is known as numeric indexing.
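To see the difference the comma makes, here is a short sketch on a stand-in data frame. Its values, placeholder names and column order are invented for illustration; in the real file the third column may be a different one.

```r
# Stand-in for the captaincy data frame: all values are invented.
captaincy <- data.frame(
  names  = c("CaptainA", "CaptainB", "CaptainC", "Dravid", "CaptainE"),
  played = c(40, 50, 60, 25, 20),
  won    = c(10, 20, 27, 8, 5),
  stringsAsFactors = FALSE
)

# No comma: the index selects a column, returned as a one-column data frame.
third_col <- captaincy[3]
print(third_col)

# With a comma: the index selects a row instead.
third_row <- captaincy[3, ]
print(third_row)
```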
05:09 In the Source window, click on the captaincy data frame.
05:14 Let us now extract the contents of second and third rows of the captaincy data frame.
05:22 To retrieve more than one row, we use a numeric index vector. Click on the script mydataframe.R
05:32 In the Source window, type the following command and press Enter.
05:38 The c function is used to combine the row numbers 2 and 3 into a numeric index vector.
05:45 Save the script and execute the current line.
05:51 The second and third rows of captaincy data frame are seen in the Console window.
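The narration does not spell out this command, so the line below is a likely form of it rather than a copy of the script. It is run on a stand-in data frame with invented values.

```r
# Stand-in for the captaincy data frame: all values are invented.
captaincy <- data.frame(
  names  = c("CaptainA", "CaptainB", "CaptainC", "Dravid", "CaptainE"),
  played = c(40, 50, 60, 25, 20),
  won    = c(10, 20, 27, 8, 5),
  stringsAsFactors = FALSE
)

# c(2, 3) builds a numeric index vector; the trailing comma means "rows".
rows_2_3 <- captaincy[c(2, 3), ]
print(rows_2_3)
```

For a run of consecutive rows, the shorthand 2:3 gives the same index vector.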
05:58 In the Source window, click on the captaincy data frame.
06:03 Now, we’ll find who has played 25 matches from the played column of captaincy data frame.
06:12 Extracting this type of information is known as logical indexing.
06:19 Click on the script mydataframe.R
06:23 In the Source window, type captaincy
06:27 Within square brackets captaincy dollar sign played equal to equal to 25 comma. Press Enter.
06:38 Remember, dollar sign allows you to extract elements by name.
06:44 Please note that there is no space between the two equal to signs.
06:50 Save the script and execute the current line.
06:55 The details of captain Dravid are shown in the Console window.
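Logical indexing can be sketched in two steps, as below. The stand-in data frame uses invented values; only the fact that the captain with 25 matches is Dravid is taken from this tutorial. The variable names mask and played_25 are hypothetical.

```r
# Stand-in for the captaincy data frame: all values are invented.
captaincy <- data.frame(
  names  = c("CaptainA", "CaptainB", "CaptainC", "Dravid", "CaptainE"),
  played = c(40, 50, 60, 25, 20),
  won    = c(10, 20, 27, 8, 5),
  stringsAsFactors = FALSE
)

# Step 1: the comparison gives one TRUE or FALSE per row.
mask <- captaincy$played == 25
print(mask)

# Step 2: rows where the mask is TRUE are kept; the comma means "rows".
played_25 <- captaincy[mask, ]
print(played_25)
```

Writing captaincy[captaincy$played == 25, ] on one line, as in the tutorial, does both steps at once.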
07:01 In the Source window, click on the captaincy data frame.
07:06 Now let us learn how to get the values of any particular attribute for all the players.
07:14 We will fetch the names of all the captains.
07:18 For this, we need to know the values in the first column.
07:23 Click on the script mydataframe.R
07:28 In the Source window, type captaincy and within square brackets 1.
07:35 Please note that I have not used a comma inside the square brackets.
07:41 Save the script and execute the current line.
07:46 The names of the captains are seen in the Console window.
07:51 Usually we use column names instead of column numbers.
07:57 To know the names of the captains, type captaincy.
08:01 Within square brackets inside double quotes names
08:07 Save the script and execute the current line.
08:12 Names of the captains are shown in the Console window.
08:17 Extracting data by column names is known as name indexing.
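A short sketch comparing the two ways of getting the first column is given below, on a stand-in data frame with invented values. It assumes, as in this tutorial, that names is the first column.

```r
# Stand-in for the captaincy data frame: all values are invented.
captaincy <- data.frame(
  names  = c("CaptainA", "CaptainB", "CaptainC", "Dravid", "CaptainE"),
  played = c(40, 50, 60, 25, 20),
  stringsAsFactors = FALSE
)

# Numeric indexing: first column, no comma.
by_number <- captaincy[1]

# Name indexing: the column name in double quotes.
by_name <- captaincy["names"]

print(by_name)

# Both forms return the same one-column data frame.
print(identical(by_number, by_name))
```

Name indexing keeps working even if the column order of the file changes, which is one reason to prefer it.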
08:23 Clear the Console window by clicking on the broom icon.
08:28 In the Source window, click on the captaincy data frame
08:33 Now let us view the names of the captains along with the number of matches they have won.
08:40 Click on the script mydataframe.R
08:44 In the Source window, type the following command and press Enter.
08:52 Here, the c function is used to combine the column names names and won into a vector.
08:59 Observe that names and won have been written within double quotes.
09:06 Save the script and execute this line.
09:11 The names of captains and the number of matches won are seen in the Console window.
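The command is not spelled out in the narration, so the line below is a likely form of it, shown on a stand-in data frame with invented values.

```r
# Stand-in for the captaincy data frame: all values are invented.
captaincy <- data.frame(
  names  = c("CaptainA", "CaptainB", "CaptainC", "Dravid", "CaptainE"),
  played = c(40, 50, 60, 25, 20),
  won    = c(10, 20, 27, 8, 5),
  stringsAsFactors = FALSE
)

# A character vector of column names selects several columns at once.
names_won <- captaincy[c("names", "won")]
print(names_won)
```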
09:18 Now let us extract a subset from captaincy data frame.
09:24 We will create a subset of captains, who have won more than 30% of their matches.
09:32 This is called slicing a data frame.
09:36 In the Source window, click on the captaincy data frame.
09:41 Please note that there is a column named victory in the captaincy data frame.
09:48 For the required subset of captains, the value of victory should be greater than 0.3.
09:55 For this subset, we shall only show the
09:58 names
10:00 number of matches played
10:02 number of matches won
10:05 Click on the script mydataframe.R
10:09 I will resize the Source window.
10:13 In the Source window, type the following command
10:18 Now, press Enter after the comma that follows 0 point 3.
10:25 You can press Enter after a comma for better visibility.
10:30 The select parameter is used to select the required columns, names, played, and won.
10:38 In the Source window, type print and within parentheses subData.
10:45 Save the script and run these two lines.
10:51 I am resizing the Console window to see the output properly.
10:57 The subData is shown in the Console window.
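The slicing step can be sketched as below. The narration mentions the select parameter and the variable subData; that the command uses the subset function is an assumption. The stand-in data frame has invented values, and its victory column is computed here as won divided by played, which may not match how the real file defines it.

```r
# Stand-in for the captaincy data frame: all values are invented.
captaincy <- data.frame(
  names  = c("CaptainA", "CaptainB", "CaptainC", "Dravid", "CaptainE"),
  played = c(40, 50, 60, 25, 20),
  won    = c(10, 20, 27, 8, 5),
  stringsAsFactors = FALSE
)
captaincy$victory <- captaincy$won / captaincy$played  # assumed definition

# Keep rows with victory above 0.3, and only three of the columns.
subData <- subset(captaincy,
                  victory > 0.3,
                  select = c(names, played, won))
print(subData)
```

Inside subset, column names are written without quotes and without the captaincy$ prefix.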
11:01 In the Source window, click on the captaincy data frame.
11:07 Finally, let us learn how to extract a particular entry from some column of a data frame.
11:16 We will extract the third value in the fourth column of the captaincy data frame.
11:23 Click on the script mydataframe.R.
11:26 In the Source window, type captaincy within double square brackets 4 within single square brackets 3.
11:37 Save the script and execute the current line.
11:42 The expected value 14 is seen in the Console window.
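Double square brackets can be sketched as below on a stand-in data frame. Its values are invented; the fourth column is assumed to be lost, and its third entry is set to 14 only to mirror the value seen in this tutorial.

```r
# Stand-in for the captaincy data frame: all values are invented.
captaincy <- data.frame(
  names  = c("CaptainA", "CaptainB", "CaptainC", "Dravid", "CaptainE"),
  played = c(40, 50, 60, 25, 20),
  won    = c(10, 20, 27, 8, 5),
  lost   = c(12, 13, 14, 6, 9),
  stringsAsFactors = FALSE
)

# [[4]] pulls the fourth column out as a plain vector, not a data frame.
fourth_col <- captaincy[[4]]

# [3] then picks the third element of that vector.
entry <- captaincy[[4]][3]
print(entry)

# The row, column form reaches the same cell.
same_entry <- captaincy[3, 4]
```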
11:48 For more information on indexing and slicing data frames, please refer to the Additional materials section on this website.
11:58 Let us summarize what we have learnt.
12:03 In this tutorial, we have learnt how to:
12:07 Extract rows or columns from a data frame
12:12 Create a subset from a data frame
12:16 Retrieve data using double square brackets
12:20 We now suggest an assignment.
12:24 Create a subset from the captaincy data frame with the captains who have played more than (>) 20 matches and lost fewer than (<) 14 matches.
12:36 The video at the following link summarises the Spoken Tutorial project.
12:41 Please download and watch it.
12:44 We conduct workshops using Spoken Tutorials and give Certificates.
12:50 Please contact us.
12:53 Please post your timed queries in this forum.
12:57 Please post your general queries in this forum.
13:02 The FOSSEE team coordinates the TBC project.
13:06 For more details, please visit these sites.
13:10 The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
13:16 The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).
13:24 This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Sakinashaikh