R/C2/Indexing-and-Slicing-Data-Frames/English-timed
From Script | Spoken-Tutorial
Edited by Sakinashaikh
Revision as of 15:38, 22 May 2020
Time | Narration |
00:01 | Welcome to the spoken tutorial on Indexing and Slicing Data Frames. |
00:08 | In this tutorial, we will learn how to: |
00:12 | Extract rows or columns from a data frame |
00:17 | Create a subset from a data frame |
00:21 | Retrieve data using double square brackets |
00:25 | To understand this tutorial, you should have knowledge of |
00:30 | Data frames in R |
00:33 | R script in RStudio and |
00:37 | How to set Working directory in RStudio |
00:41 | If not, please locate the relevant tutorials on R on this website. |
00:48 | This tutorial is recorded on |
00:52 | Ubuntu Linux OS version 16.04 |
00:57 | R version 3.2.3 |
01:01 | RStudio version 1.1.456 |
01:07 | Install R version 3.2.0 or higher. |
01:14 | For this tutorial, we will use the data file CaptaincyData.csv and a script file mydataframe.R. |
01:25 | Download these files from the Code files link of this tutorial. |
01:31 | I have downloaded and moved these files to the folder myProject on my Desktop. |
01:39 | I have also set this folder as my Working Directory. |
01:45 | Let us switch to RStudio. |
01:48 | Let us open the script mydataframe.R in RStudio. |
01:55 | For this, click on the script mydataframe.R. |
02:01 | The script mydataframe.R opens in RStudio. |
02:07 | Here, we have declared a variable captaincy to read CaptaincyData.csv and store its contents. |
02:16 | Also, the View function is used to see the contents of the data frame. |
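A minimal sketch of the reading step, assuming the script uses base R read.csv(). So that it runs anywhere, it writes a tiny made-up CSV to a temporary file instead of using the real CaptaincyData.csv; the captain names and numbers are hypothetical. View() needs the RStudio data viewer, so str() is used here as a text-only alternative.

```r
# Write a tiny made-up CSV to a temporary file (hypothetical data,
# standing in for CaptaincyData.csv in the working directory).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("names,played,won",
             "CaptainA,40,20",
             "CaptainB,25,8"), csv_path)

# Read the CSV into a data frame and store it in the variable captaincy
captaincy <- read.csv(csv_path, stringsAsFactors = FALSE)

# In RStudio, View(captaincy) opens a spreadsheet-like viewer;
# str() prints a compact text summary that also works in a terminal.
str(captaincy)
```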
02:23 | Remember, you may also use the equal to sign (=) in place of the less than symbol followed by a hyphen (<-). |
02:32 | However, we recommend using the less than symbol followed by a hyphen. |
02:38 | For this, there is a shortcut in RStudio. |
02:42 | Suppose we want to assign a value of 2 to a variable named testvar. |
02:49 | In the Console window, type testvar and then press |
02:54 | Alt and - (hyphen) keys simultaneously. |
02:58 | Then type 2 and press Enter. |
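For reference, the two assignment styles mentioned above look like this in R. The variable name othervar is a hypothetical addition used only to show the equal to form side by side.

```r
testvar <- 2    # recommended style; RStudio's Alt + - shortcut types <-
othervar = 2    # "=" also works for a top-level assignment like this one
print(testvar)  # prints [1] 2
```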
03:01 | Let us get back to the Source window. |
03:06 | Run the script mydataframe.R by clicking on the Source button. |
03:13 | Now let us extract the contents of the third row of the captaincy data frame. |
03:20 | Click on the script mydataframe.R. |
03:25 | In the Source window, type captaincy |
03:29 | Type capt and press Enter to autocomplete the variable name captaincy. Now type 3 followed by a comma within square brackets, and press Enter. |
03:47 | Press Enter key at the end of every command. |
03:51 | Remember, one of the most important features of RStudio is intelligent auto-completion of function names, packages, and R objects. |
04:04 | We use a comma within square brackets when we wish to extract a row. |
04:11 | Save the script and execute the current line by pressing Ctrl+Enter keys. |
04:20 | The third row of the captaincy data frame is seen in the Console window. |
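A minimal sketch of row extraction. It uses a small made-up stand-in for the captaincy data frame: the captain names, the numbers and the lost column are hypothetical and do not reproduce CaptaincyData.csv. The comma with nothing after it means "all columns".

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values;
# the "lost" column is an assumption for illustration).
captaincy <- data.frame(
  names   = c("CaptainA", "CaptainB", "CaptainC", "CaptainD"),
  played  = c(40, 25, 30, 18),
  won     = c(20, 8, 9, 4),
  lost    = c(12, 11, 16, 9),
  victory = c(0.50, 0.32, 0.30, 0.22),
  stringsAsFactors = FALSE
)

# Row index 3, empty column index: the whole third row as a data frame
third_row <- captaincy[3, ]
print(third_row)
```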
04:26 | Now, let us run the same command without a comma. |
04:31 | In the Source window, type captaincy then within square brackets 3. |
04:39 | Save the script and run this line only, as shown earlier. |
04:46 | The contents of the third column of the data frame are displayed in the Console window. |
04:54 | So, when the square brackets contain a single index without a comma, a column is extracted. |
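A short sketch contrasting the two forms, again on a hypothetical stand-in data frame (made-up values). A single index without a comma returns a one-column data frame; a comma with an empty row position, as in captaincy[, 3], also selects the third column but by default simplifies it to a plain vector.

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values;
# the "lost" column is an assumption for illustration).
captaincy <- data.frame(
  names   = c("CaptainA", "CaptainB", "CaptainC", "CaptainD"),
  played  = c(40, 25, 30, 18),
  won     = c(20, 8, 9, 4),
  lost    = c(12, 11, 16, 9),
  victory = c(0.50, 0.32, 0.30, 0.22),
  stringsAsFactors = FALSE
)

col_df  <- captaincy[3]     # no comma: a data frame with one column ("won")
col_vec <- captaincy[, 3]   # empty row index: the same column as a vector

print(col_df)
print(col_vec)
```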
05:01 | When we extract data using row or column numbers, it is known as numeric indexing. |
05:09 | In the Source window, click on the captaincy data frame. |
05:14 | Let us now extract the second and third rows of the captaincy data frame. |
05:22 | To retrieve more than one row, we use a numeric index vector. Click on the script mydataframe.R |
05:32 | In the Source window, type the following command and press Enter. |
05:38 | The c function is used to combine the row numbers 2 and 3 into a vector. |
05:45 | Save the script and execute the current line. |
05:51 | The second and third rows of captaincy data frame are seen in the Console window. |
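The command typed at this step is not spelled out in the narration. A minimal sketch of a numeric index vector, assuming the usual captaincy[c(2, 3), ] form and a hypothetical stand-in data frame with made-up values:

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values;
# the "lost" column is an assumption for illustration).
captaincy <- data.frame(
  names   = c("CaptainA", "CaptainB", "CaptainC", "CaptainD"),
  played  = c(40, 25, 30, 18),
  won     = c(20, 8, 9, 4),
  lost    = c(12, 11, 16, 9),
  victory = c(0.50, 0.32, 0.30, 0.22),
  stringsAsFactors = FALSE
)

# c(2, 3) builds the index vector; the trailing comma keeps all columns
rows_2_3 <- captaincy[c(2, 3), ]
print(rows_2_3)
```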
05:58 | In the Source window, click on the captaincy data frame. |
06:03 | Now, we’ll use the played column of the captaincy data frame to find which captain has played 25 matches. |
06:12 | Extracting this type of information is known as logical indexing. |
06:19 | Click on the script mydataframe.R |
06:23 | In the Source window, type captaincy |
06:27 | Within square brackets, type captaincy dollar sign played, equal to equal to, 25 followed by a comma. Press Enter. |
06:38 | Remember, the dollar sign allows you to extract elements by name. |
06:44 | Please note that there is no space between the two equal to signs. |
06:50 | Save the script and execute the current line. |
06:55 | The details of captain Dravid are shown in the Console window. |
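A sketch of what logical indexing does behind the scenes, on a hypothetical stand-in data frame (made-up values, so the matching captain here is CaptainB, not the real data). The comparison produces one TRUE or FALSE per row, and the rows marked TRUE are kept.

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values;
# the "lost" column is an assumption for illustration).
captaincy <- data.frame(
  names   = c("CaptainA", "CaptainB", "CaptainC", "CaptainD"),
  played  = c(40, 25, 30, 18),
  won     = c(20, 8, 9, 4),
  lost    = c(12, 11, 16, 9),
  victory = c(0.50, 0.32, 0.30, 0.22),
  stringsAsFactors = FALSE
)

is_25 <- captaincy$played == 25   # logical vector: FALSE TRUE FALSE FALSE
print(is_25)

played_25 <- captaincy[is_25, ]   # keep only the rows where is_25 is TRUE
print(played_25)
```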
07:01 | In the Source window, click on the captaincy data frame. |
07:06 | Now let us learn how to get the values of any particular attribute for all the players. |
07:14 | We will fetch the names of all the captains. |
07:18 | For this, we need to know the values in the first column. |
07:23 | Click on the script mydataframe.R |
07:28 | In the Source window, type captaincy and within square brackets 1. |
07:35 | Please note that I have not used a comma inside the square brackets. |
07:41 | Save the script and execute the current line. |
07:46 | The names of the captains are seen in the Console window. |
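A minimal sketch of this step on a hypothetical stand-in data frame (made-up names and values). Without a comma, the index 1 selects the first column, which holds the captains' names.

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values;
# the "lost" column is an assumption for illustration).
captaincy <- data.frame(
  names   = c("CaptainA", "CaptainB", "CaptainC", "CaptainD"),
  played  = c(40, 25, 30, 18),
  won     = c(20, 8, 9, 4),
  lost    = c(12, 11, 16, 9),
  victory = c(0.50, 0.32, 0.30, 0.22),
  stringsAsFactors = FALSE
)

first_col <- captaincy[1]   # one-column data frame holding the names
print(first_col)
```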
07:51 | Usually we use column names instead of column numbers. |
07:57 | To know the names of the captains, type captaincy. |
08:01 | Within square brackets, type names inside double quotes. |
08:07 | Save the script and execute the current line. |
08:12 | Names of the captains are shown in the Console window. |
08:17 | Extracting data by column names is known as name indexing. |
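A sketch comparing name indexing with numeric indexing, on a hypothetical stand-in data frame (made-up values). In this data frame the first column is called names, so captaincy["names"] and captaincy[1] select the same column, while the dollar sign returns it as a plain vector.

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values;
# the "lost" column is an assumption for illustration).
captaincy <- data.frame(
  names   = c("CaptainA", "CaptainB", "CaptainC", "CaptainD"),
  played  = c(40, 25, 30, 18),
  won     = c(20, 8, 9, 4),
  lost    = c(12, 11, 16, 9),
  victory = c(0.50, 0.32, 0.30, 0.22),
  stringsAsFactors = FALSE
)

by_name   <- captaincy["names"]   # name indexing: one-column data frame
by_number <- captaincy[1]         # numeric indexing of the same column
as_vector <- captaincy$names      # dollar sign: character vector

print(identical(by_name, by_number))   # TRUE
```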
08:23 | Clear the Console window by clicking on the broom icon. |
08:28 | In the Source window, click on the captaincy data frame. |
08:33 | Now let us view the names of the captains along with the number of matches they have won. |
08:40 | Click on the script mydataframe.R |
08:44 | In the Source window, type the following command and press Enter. |
08:52 | Here, the c function is used to combine the column names names and won into a vector. |
08:59 | Observe that names and won have been written within double quotes. |
09:06 | Save the script and execute this line. |
09:11 | The names of captains and the number of matches won are seen in the Console window. |
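The exact command is not reproduced in the narration. A minimal sketch, assuming the form captaincy[c("names", "won")] and a hypothetical stand-in data frame with made-up values:

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values;
# the "lost" column is an assumption for illustration).
captaincy <- data.frame(
  names   = c("CaptainA", "CaptainB", "CaptainC", "CaptainD"),
  played  = c(40, 25, 30, 18),
  won     = c(20, 8, 9, 4),
  lost    = c(12, 11, 16, 9),
  victory = c(0.50, 0.32, 0.30, 0.22),
  stringsAsFactors = FALSE
)

# A character vector of column names selects several columns at once
names_won <- captaincy[c("names", "won")]
print(names_won)
```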
09:18 | Now let us extract a subset from captaincy data frame. |
09:24 | We will create a subset of captains who have won more than 30% of their matches. |
09:32 | This is called slicing a data frame. |
09:36 | In the Source window, click on the captaincy data frame. |
09:41 | Please note that there is a column named victory in the captaincy data frame. |
09:48 | For the required subset, a captain's victory value should be greater than 0.3. |
09:55 | For this subset, we shall only show the |
09:58 | names |
10:00 | number of matches played |
10:02 | number of matches won |
10:05 | Click on the script mydataframe.R |
10:09 | I will resize the Source window. |
10:13 | In the Source window, type the following command. |
10:18 | Now, press Enter after the comma that follows 0.3. |
10:25 | You can press Enter after a comma for better readability. |
10:30 | The select parameter is used to choose the required columns: names, played, and won. |
10:38 | In the Source window, type print and, within parentheses, subData. |
10:45 | Save the script and run these two lines. |
10:51 | I am resizing the Console window to see the output properly. |
10:57 | The contents of subData are shown in the Console window. |
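The full command lives in mydataframe.R and is not shown in the narration. A minimal sketch, assuming it uses base R subset() with the select parameter, on a hypothetical stand-in data frame (made-up values). An equivalent bracket form is included for comparison.

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values;
# the "lost" column is an assumption for illustration).
captaincy <- data.frame(
  names   = c("CaptainA", "CaptainB", "CaptainC", "CaptainD"),
  played  = c(40, 25, 30, 18),
  won     = c(20, 8, 9, 4),
  lost    = c(12, 11, 16, 9),
  victory = c(0.50, 0.32, 0.30, 0.22),
  stringsAsFactors = FALSE
)

# Rows with victory greater than 0.3, keeping only three columns
subData <- subset(captaincy, victory > 0.3,
                  select = c(names, played, won))
print(subData)

# Equivalent bracket form: logical row index plus column names
alt <- captaincy[captaincy$victory > 0.3, c("names", "played", "won")]
```

Note that 0.30 is not greater than 0.3, so CaptainC is excluded from the result.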
11:01 | In the Source window, click on the captaincy data frame. |
11:07 | Finally, let us learn how to extract a particular entry from a column of a data frame. |
11:16 | We will extract the third value in the fourth column of the captaincy data frame. |
11:23 | Click on the script mydataframe.R. |
11:26 | In the Source window, type captaincy, then 4 within double square brackets, followed by 3 within single square brackets. |
11:37 | Save the script and execute the current line. |
11:42 | The expected value 14 is seen in the Console window. |
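A sketch of why this works, on a hypothetical stand-in data frame (made-up values, so the result here is 16, not the tutorial's 14). Double square brackets pull the fourth column out as a vector, and single square brackets then pick its third element; captaincy[3, 4] reaches the same entry by row and column.

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values;
# the "lost" column is an assumption for illustration).
captaincy <- data.frame(
  names   = c("CaptainA", "CaptainB", "CaptainC", "CaptainD"),
  played  = c(40, 25, 30, 18),
  won     = c(20, 8, 9, 4),
  lost    = c(12, 11, 16, 9),
  victory = c(0.50, 0.32, 0.30, 0.22),
  stringsAsFactors = FALSE
)

fourth_col <- captaincy[[4]]    # [[ ]] returns the column itself as a vector
value      <- fourth_col[3]     # [ ] then selects its third element

value2 <- captaincy[[4]][3]     # the same thing in one expression
value3 <- captaincy[3, 4]       # row 3, column 4 gives the same entry
print(value)
```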
11:48 | For more information on indexing and slicing data frames, please refer to the Additional materials section on this website. |
11:58 | Let us summarize what we have learnt. |
12:03 | In this tutorial, we have learnt how to: |
12:07 | Extract rows or columns from a data frame |
12:12 | Create a subset from a data frame |
12:16 | Retrieve data using double square brackets |
12:20 | We now suggest an assignment. |
12:24 | Create a subset from the captaincy data frame with the captains who have played more than (>) 20 matches and lost less than (<) 14 matches. |
12:36 | The video at the following link summarises the Spoken Tutorial project. |
12:41 | Please download and watch it. |
12:44 | We conduct workshops using Spoken Tutorials and give Certificates. |
12:50 | Please contact us. |
12:53 | Please post your timed queries in this forum. |
12:57 | Please post your general queries in this forum. |
13:02 | The FOSSEE team coordinates the TBC project. |
13:06 | For more details, please visit these sites. |
13:10 | The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India. |
13:16 | The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018). |
13:24 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |