R/C2/Indexing-and-Slicing-Data-Frames/English-timed
Revision as of 16:26, 5 June 2020 by Sakinashaikh
| Time | Narration |
| 00:01 | Welcome to the spoken tutorial on Indexing and Slicing Data Frames. |
| 00:08 | In this tutorial, we will learn how to: |
| 00:12 | Extract rows or columns from a data frame |
| 00:17 | Create a subset from a data frame |
| 00:21 | Retrieve data using double square brackets |
| 00:25 | To understand this tutorial, you should have knowledge about |
| 00:30 | Data frames in R |
| 00:33 | R script in RStudio and |
| 00:37 | How to set Working directory in RStudio |
| 00:41 | If not, please locate the relevant tutorials on R on this website. |
| 00:48 | This tutorial is recorded on |
| 00:52 | Ubuntu Linux OS version 16.04 |
| 00:57 | R version 3.2.3 |
| 01:01 | RStudio version 1.1.456 |
| 01:07 | Install R version 3.2.0 or higher. |
| 01:14 | For this tutorial, we will use the data frame CaptaincyData.csv and a script file mydataframe.R. |
| 01:25 | Download these files from the Code files link of this tutorial. |
| 01:31 | I have downloaded and moved these files to the folder myProject on my Desktop. |
| 01:39 | I have also set this folder as my Working Directory. |
| 01:45 | Let us switch to RStudio. |
| 01:48 | Let us open the script mydataframe.R in RStudio. |
| 01:55 | For this, click on the script mydataframe.R. |
| 02:01 | The script mydataframe.R opens in RStudio. |
| 02:07 | Here, we have declared a variable captaincy to read and store the contents of CaptaincyData.csv. |
| 02:16 | Also, the View function is used to see the contents of the file. |
| 02:23 | Remember, for assignment you may also use the equal to sign in place of the less than symbol followed by a hyphen. |
| 02:32 | However, we recommend less than symbol followed by hyphen. |
| 02:38 | For this, there is a shortcut in RStudio. |
| 02:42 | Suppose we want to assign a value of 2 to a variable named testvar. |
| 02:49 | In the Console window, type testvar and then press Alt and -(hyphen) keys simultaneously. |
| 02:58 | Then type 2 and press Enter. |
| 03:01 | Let us get back to the Source window. |
| 03:06 | Run the script mydataframe.R by clicking on the Source button. |
| 03:13 | Now let us extract the contents of the third row of the captaincy data frame. |
| 03:20 | Click on the script mydataframe.R. |
| 03:25 | In the Source window, type captaincy. |
| 03:29 | Type capt and press Enter to autocomplete the variable name captaincy. Now, within square brackets, type 3 followed by a comma, and press Enter. |
| 03:47 | Press the Enter key at the end of every command. |
| 03:51 | Remember, one of the most important features of RStudio is intelligent auto-completion of function names, packages, and R objects. |
| 04:04 | We use a comma within square brackets when we wish to extract a row. |
| 04:11 | Save the script and execute the current line by pressing Ctrl+Enter keys. |
| 04:20 | The third row of the captaincy data frame is seen in the Console window. |
| 04:26 | Now, let us run the same command without a comma. |
| 04:31 | In the Source window, type captaincy then within square brackets 3. |
| 04:39 | Save the script and run this line only, as shown earlier. |
| 04:46 | The contents of the third column of the data frame are displayed in the Console window. |
| 04:54 | So, to extract a column, we shouldn’t use a comma within the square brackets. |
| 05:01 | When we extract data using row number or column number, it is known as numeric indexing. |
| 05:09 | In the Source window, click on the captaincy data frame. |
| 05:14 | Let us now extract the contents of second and third rows of the captaincy data frame. |
| 05:22 | To retrieve more than one row, we use a numeric index vector. Click on the script mydataframe.R |
| 05:32 | In the Source window, type the following command and press Enter. |
| 05:38 | The c function is used to combine the row numbers 2 and 3 into a numeric index vector. |
| 05:45 | Save the script and execute the current line. |
| 05:51 | The second and third rows of captaincy data frame are seen in the Console window. |
| 05:58 | In the Source window, click on the captaincy data frame. |
| 06:03 | Now, we’ll find who has played 25 matches from the played column of captaincy data frame. |
| 06:12 | Extracting this type of information is known as logical indexing. |
| 06:19 | Click on the script mydataframe.R |
| 06:23 | In the Source window, type captaincy |
| 06:27 | Within square brackets, type captaincy dollar sign played equal to equal to 25 comma. Press Enter. |
| 06:38 | Remember, the dollar sign allows you to extract elements by name. |
| 06:44 | Please note that there is no space between the two equal to signs. |
| 06:50 | Save the script and execute the current line. |
| 06:55 | The details of captain Dravid are shown in the Console window. |
| 07:01 | In the Source window, click on the captaincy data frame. |
| 07:06 | Now let us learn how to get the values of any particular attribute for all the players. |
| 07:14 | We will fetch the names of all the captains. |
| 07:18 | For this, we need to know the values in the first column. |
| 07:23 | Click on the script mydataframe.R |
| 07:28 | In the Source window, type captaincy and within square brackets 1. |
| 07:35 | Please note that I have not used a comma inside the square brackets. |
| 07:41 | Save the script and execute the current line. |
| 07:46 | The names of the captains are seen in the Console window. |
| 07:51 | Usually we use column names instead of column numbers. |
| 07:57 | To know the names of the captains, type captaincy. |
| 08:01 | Within square brackets inside double quotes names |
| 08:07 | Save the script and execute the current line. |
| 08:12 | Names of the captains are shown in the Console window. |
| 08:17 | Extracting data by column names is known as name indexing. |
| 08:23 | Clear the Console window by clicking on the broom icon. |
| 08:28 | In the Source window, click on the captaincy data frame |
| 08:33 | Now let us view the names of the captains along with the number of matches they have won. |
| 08:40 | Click on the script mydataframe.R |
| 08:44 | In the Source window, type the following command and press Enter. |
| 08:52 | Here, the c function is used to combine the column names names and won into a single vector. |
| 08:59 | Observe that names and won have been written within double quotes. |
| 09:06 | Save the script and execute this line. |
| 09:11 | The names of captains and the number of matches won are seen in the Console window. |
| 09:18 | Now let us extract a subset from captaincy data frame. |
| 09:24 | We will create a subset of captains, who have won more than 30% of their matches. |
| 09:32 | This is called slicing a data frame. |
| 09:36 | In the Source window, click on the captaincy data frame. |
| 09:41 | Please note that there is a column named victory in the captaincy data frame. |
| 09:48 | For the required subset of captains, the value of victory should be greater than 0.3. |
| 09:55 | For this subset, we shall only show the |
| 09:58 | names |
| 10:00 | number of matches played |
| 10:02 | number of matches won |
| 10:05 | Click on the script mydataframe.R |
| 10:09 | I will resize the Source window. |
| 10:13 | In the Source window, type the following command |
| 10:18 | Now, press Enter after the comma that follows 0 point 3. |
| 10:25 | You can press Enter after a comma for better visibility. |
| 10:30 | The select parameter is used to select the required columns: names, played, and won. |
| 10:38 | In the Source window, type print and, within parentheses, subData. |
| 10:45 | Save the script and run these two lines. |
| 10:51 | I am resizing the Console window to see the output properly. |
| 10:57 | The subData is shown in the Console window. |
| 11:01 | In the Source window, click on the captaincy data frame. |
| 11:07 | Finally, let us learn how to extract a particular entry from some column of a data frame. |
| 11:16 | We will extract the third value in the fourth column of the captaincy data frame. |
| 11:23 | Click on the script mydataframe.R. |
| 11:26 | In the Source window, type captaincy within double square brackets 4 within single square brackets 3. |
| 11:37 | Save the script and execute the current line. |
| 11:42 | The expected value 14 is seen in the Console window. |
| 11:48 | For more information on indexing and slicing data frames, please refer to the Additional materials section on this website. |
| 11:58 | Let us summarize what we have learnt. |
| 12:03 | In this tutorial, we have learnt how to: |
| 12:07 | Extract rows or columns from a data frame |
| 12:12 | Create a subset from a data frame |
| 12:16 | Retrieve data using double square brackets |
| 12:20 | We now suggest an assignment. |
| 12:24 | Create a subset from the captaincy data frame with the captains who have played more than (>) 20 matches and lost fewer than (<) 14 matches. |
| 12:36 | The video at the following link summarises the Spoken Tutorial project. |
| 12:41 | Please download and watch it. |
| 12:44 | We conduct workshops using Spoken Tutorials and give Certificates. |
| 12:50 | Please contact us. |
| 12:53 | Please post your timed queries in this forum. |
| 12:57 | Please post your general queries in this forum. |
| 13:02 | The FOSSEE team coordinates the TBC project. |
| 13:06 | For more details, please visit these sites. |
| 13:10 | The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
| 13:16 | The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018). |
| 13:24 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |
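
The indexing and slicing steps narrated above can be sketched in one short R script. This is only a minimal sketch under stated assumptions: the data frame below is a toy stand-in with made-up captains and numbers, not the contents of CaptaincyData.csv, and the column order (names, played, won, lost, victory) is assumed for illustration. The call to subset() is also an assumption, inferred from the narration's mention of the select parameter.

```r
# Toy stand-in for the captaincy data frame. All values are hypothetical
# and are NOT taken from CaptaincyData.csv; the column order is assumed.
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(40, 25, 30, 10),
  won    = c(20, 5, 12, 2),
  lost   = c(15, 18, 9, 6),
  stringsAsFactors = FALSE
)
captaincy$victory <- captaincy$won / captaincy$played

# Numeric indexing: a comma after the index selects a row.
row3 <- captaincy[3, ]

# Without the comma, the same index selects a column (as a data frame).
col3 <- captaincy[3]

# A numeric index vector built with c() selects several rows at once.
rows23 <- captaincy[c(2, 3), ]

# Logical indexing: keep the rows where the condition is TRUE.
played25 <- captaincy[captaincy$played == 25, ]

# Name indexing: select columns by name instead of by number.
name_col  <- captaincy["names"]
names_won <- captaincy[c("names", "won")]

# Slicing: rows with victory above 0.3, showing three chosen columns.
# subset() is assumed here; the narration only mentions "select".
subData <- subset(captaincy,
                  victory > 0.3,
                  select = c(names, played, won))
print(subData)

# Double square brackets extract a column as a plain vector;
# single brackets then pick one entry from that vector.
entry <- captaincy[[4]][3]
print(entry)
```

With the real CaptaincyData.csv loaded into captaincy, the same expressions apply unchanged, but the printed values will differ from those of this toy data.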