Difference between revisions of "Linux-AWK/C2/Built-in-Variables-in-awk/English-timed"

From Script | Spoken-Tutorial
 
Line 6: Line 6:
 
|-
 
|-
 
| 00:01
 
| 00:01
| Welcome to the spoken tutorial on '''awk built-in variables''' and '''awk script.'''
+
| Welcome to the '''spoken tutorial''' on '''awk built-in variables''' and '''awk script.'''
  
 
|-
 
|-
 
|00:07
 
|00:07
| In this tutorial we will learn about '''Built-in variables ''',  '''awk script'''
+
| In this tutorial, we will learn about '''Built-in variables ''',  '''awk script'''.
  
 
|-
 
|-
Line 18: Line 18:
 
|-
 
|-
 
| 00:17
 
| 00:17
| To record this tutorial, I am using  '''Ubuntu Linux 16.04 Operating System '''and  '''gedit text editor''' 3.20.1
+
| To record this tutorial, I am using:  
 +
'''Ubuntu Linux 16.04 Operating System '''and  '''gedit text editor''' 3.20.1
  
 
|-
 
|-
Line 72: Line 73:
 
|-
 
|-
 
|01:44
 
|01:44
| When we are processing this '''awkdemo''' file with '''awk''' command, this becomes our '''input '''file.
+
| When we are processing this '''awkdemo''' file with ''''awk' command''', this becomes our '''input '''file.
  
 
|-
 
|-
 
| 01:51
 
| 01:51
| Observe that all the records are separated from each other by a '''newline character'''.  
+
| Observe that all the '''record'''s are separated from each other by a '''newline character'''.  
  
 
|-
 
|-
Line 86: Line 87:
 
|-
 
|-
 
| 02:08
 
| 02:08
| Notice that all the fields are separated by the '''pipe symbol'''.  
+
| Notice that all the '''field'''s are separated by the '''pipe''' symbol.  
  
 
How can we inform '''awk '''about it?  
 
How can we inform '''awk '''about it?  
Line 94: Line 95:
 
|-
 
|-
 
| 02:18
 
| 02:18
| By default, any number of '''spaces''' or a '''tabs''' separate the fields.
+
| By default, any number of '''space'''s or '''tab'''s separate the '''field'''s.
  
 
|-
 
|-
 
|02:24
 
|02:24
| We can reset this with the help of '''hyphen capital F''' option as learnt in our earlier tutorials.
+
| We can '''reset''' this with the help of '''hyphen capital F''' option as learnt in our earlier tutorials.
  
 
|-
 
|-
 
|02:33
 
|02:33
| Or else, we can reset this in the '''BEGIN section '''with the use of '''FS''' '''variable'''.  
+
| Or else, we can reset this in the '''BEGIN''' section with the use of '''FS''' '''variable'''.  
  
 
|-
 
|-
Line 116: Line 117:
 
|-
 
|-
 
| 02:57
 
| 02:57
|  Go to the folder in which you downloaded and extracted the '''Code Files''' using '''cd command.'''
+
|  Go to the '''folder''' in which you downloaded and '''extract'''ed the '''Code Files''' using '''cd command.'''
  
 
|-
 
|-
Line 124: Line 125:
 
|-
 
|-
 
| 03:08
 
| 03:08
| Here in the '''BEGIN''' section, we have assigned the value of '''FS''' as a '''pipe symbol.'''
+
| Here, in the '''BEGIN''' section, we have assigned the value of '''FS''' as a '''pipe''' symbol.
  
 
Similarly, we can modify '''RS variable.'''
 
Similarly, we can modify '''RS variable.'''
Line 130: Line 131:
 
|-
 
|-
 
| 03:19
 
| 03:19
| Press '''Enter''' to execute the command.
+
| Press '''Enter''' to '''execute''' the '''command'''.
  
 
|-
 
|-
Line 138: Line 139:
 
|-
 
|-
 
| 03:30
 
| 03:30
| Here the '''name '''field and the '''stipend '''field is separated by a blank '''space'''.
+
| Here, the '''name '''field and the '''stipend '''field are separated by a blank '''space'''.
  
 
|-
 
|-
 
|03:36
 
|03:36
| Also, all the records are separated by a '''newline character.'''
+
| Also, all the '''record'''s are separated by a '''newline character.'''
  
 
|-
 
|-
 
| 03:42
 
| 03:42
| Suppose we want '''colon '''as the '''output field separator.'''
+
| Suppose we want '''colon '''as the '''output field separator'''
  
And double '''newline '''as '''output record separator'''.
+
and double '''newline '''as '''output record separator'''.
  
 
|-
 
|-
Line 160: Line 161:
 
|-
 
|-
 
| 04:01
 
| 04:01
| Modify the command as shown here.
+
| Modify the command as shown here  
  
And then press '''Enter.'''
+
and then press '''Enter.'''
  
 
|-
 
|-
Line 178: Line 179:
 
|-
 
|-
 
| 04:27
 
| 04:27
| How can we extract the roll no. and name information from this file?
+
| How can we '''extract''' the '''roll no.''' and '''name''' information from this file?
  
 
|-
 
|-
Line 194: Line 195:
 
|-
 
|-
 
|04:47
 
|04:47
|  Capital '''NR''' gives  the '''Number of Records''' processed by '''awk'''
+
|  Capital '''NR''' gives  the '''Number of Records''' processed by '''awk'''.
  
 
|-
 
|-
 
|04:53
 
|04:53
|  Capital '''NF''' gives the '''Number of Fields '''in the current record
+
|  Capital '''NF''' gives the '''Number of Fields '''in the current record.
  
 
|-
 
|-
Line 212: Line 213:
 
|-
 
|-
 
| 05:13
 
| 05:13
| Switch to the '''terminal'''. Let me clear the terminal using '''Ctrl''' and '''L''' keys
+
| Switch to the '''terminal'''. Let me clear the terminal using '''Ctrl''' and '''L''' keys.
  
 
|-
 
|-
Line 220: Line 221:
 
|-
 
|-
 
| 05:24
 
| 05:24
| As the fields are separated by '''pipe '''symbol, set the '''FS''' value to '''pipe''' symbol in the '''BEGIN section.'''
+
| As the fields are separated by '''pipe '''symbol, set the '''FS''' value to '''pipe''' symbol in the '''BEGIN''' section.
  
 
|-
 
|-
Line 228: Line 229:
 
|-
 
|-
 
| 05:37
 
| 05:37
| This checks whether the number of fields in the current line, is not equal to 6.
+
| This checks whether the number of fields in the current line is not equal to 6.
  
 
|-
 
|-
 
| 05:43
 
| 05:43
| If true, then  '''print section '''will print the record’s line number '''NR''', along with the entire line denoted by '''$0'''.
+
| If true, then  '''print''' section will print the record’s line number '''NR''', along with the entire line denoted by '''$0'''.
  
 
Press '''Enter'''.
 
Press '''Enter'''.
Line 238: Line 239:
 
|-
 
|-
 
| 05:55
 
| 05:55
| In the output, we can see that record number 16 is the incomplete record.  
+
| In the '''output''', we can see that record number 16 is the incomplete record.  
  
 
It has only 5 '''fields '''instead of 6.
 
It has only 5 '''fields '''instead of 6.
Line 274: Line 275:
 
| We want to print the first 3 lines from each of these two files.
 
| We want to print the first 3 lines from each of these two files.
  
We can do this using '''NR variable'''.
+
We can do this using '''NR''' variable.
  
 
|-
 
|-
Line 282: Line 283:
 
|-
 
|-
 
| 07:02
 
| 07:02
| Now to display the first 3 lines from each file, type the following command on the '''terminal.'''
+
| Now, to display the first 3 lines from each file, type the following command on the '''terminal.'''
  
 
|-
 
|-
Line 308: Line 309:
 
|-
 
|-
 
| 07:39
 
| 07:39
| It is reinitialized to zero each time a new input file is started.
+
| It is re-initialized to zero each time a new input file is started.
  
 
|-
 
|-
Line 332: Line 333:
 
|-
 
|-
 
| 08:14
 
| 08:14
| In the '''Print section,''' next to '''NR,''' type '''FNR'''.  Press '''Enter.'''
+
| In the '''Print''' section, next to '''NR,''' type '''FNR'''.  Press '''Enter.'''
  
 
|-
 
|-
Line 376: Line 377:
 
| Here we have used '''space '''as a '''string concatenation operator.'''
 
| Here we have used '''space '''as a '''string concatenation operator.'''
  
Press '''Enter''' to execute the command.
+
Press '''Enter''' to '''execute''' the '''command'''.
  
 
|-
 
|-
Line 390: Line 391:
 
|-
 
|-
 
| 09:48
 
| 09:48
| Clear the '''terminal'''
+
| Clear the '''terminal'''.
  
 
Press the '''up arrow '''key to get the previously executed command.
 
Press the '''up arrow '''key to get the previously executed command.
Line 422: Line 423:
 
|-
 
|-
 
| 10:36
 
| 10:36
| In the '''terminal''' type the following command  
+
| In the '''terminal''', type the following command.
  
 
Press '''Enter'''.
 
Press '''Enter'''.
Line 430: Line 431:
 
| We can see that only one student has passed and gets stipend more than Rs.8000.
 
| We can see that only one student has passed and gets stipend more than Rs.8000.
  
And the record number is 2.
+
And, the record number is 2.
  
 
|-
 
|-
Line 468: Line 469:
 
|-
 
|-
 
| 11:42
 
| 11:42
|This code is also available in the '''Code Files''' link.
+
|This '''code''' is also available in the '''Code Files''' link.
  
 
|-
 
|-
Line 474: Line 475:
 
| Switch to the '''terminal'''.
 
| Switch to the '''terminal'''.
  
See what have we written inside '''single quotes''' of the '''command '''last executed?
+
See, what have we written inside '''single quotes''' of the '''command '''last executed?
  
 
|-
 
|-
Line 488: Line 489:
 
| To execute the file, type the following on the '''terminal-'''
 
| To execute the file, type the following on the '''terminal-'''
  
'''awk space hyphen small f space prog1.awk space awkdemo.txt '''and press''' Enter'''
+
'''awk space hyphen small f space prog1.awk space awkdemo.txt '''and press''' Enter'''.
  
 
|-
 
|-
Line 506: Line 507:
 
|-
 
|-
 
| 12:40
 
| 12:40
| In this tutorial we learnt about-  '''Built-in variables'''
+
| In this tutorial we learnt about-   
 
+
'''Built-in variables''',
'''awk script'''
+
  
 +
'''awk script'''
 
using various examples.
 
using various examples.
  
 
|-
 
|-
 
| 12:48
 
| 12:48
| As an assignment- Write an '''awk''' script to print the last field of the 5th line in '''awkdemo.txt '''file.
+
| As an assignment-  
 +
write an '''awk''' script to print the last field of the 5th line in '''awkdemo.txt '''file.
  
 
|-
 
|-
Line 538: Line 540:
 
|-
 
|-
 
| 13:28
 
| 13:28
| The video at the following link summarises the Spoken Tutorial project.
+
| The video at the following link summarises the '''Spoken Tutorial''' project.
  
 
Please download and watch it.
 
Please download and watch it.
Line 554: Line 556:
 
|-
 
|-
 
| 13:51
 
| 13:51
| Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
+
| Spoken Tutorial Project is funded by '''NMEICT, MHRD''', Government of India.
  
 
More information on this mission is available at this link.
 
More information on this mission is available at this link.
Line 560: Line 562:
 
|-
 
|-
 
| 14:03
 
| 14:03
| The script has been contributed by Antara. And this is Praveen from IIT Bombay signing off.
+
| The script has been contributed by Antara. And this is Praveen from '''IIT Bombay''', signing off.
  
 
Thanks for joining.
 
Thanks for joining.
  
 
|}
 
|}

Latest revision as of 11:30, 10 July 2019

Time
Narration
00:01 Welcome to the spoken tutorial on awk built-in variables and awk script.
00:07 In this tutorial, we will learn about Built-in variables and awk script.
00:14 We will do this through some examples.
00:17 To record this tutorial, I am using:

Ubuntu Linux 16.04 Operating System and gedit text editor 3.20.1

00:30 The files used in this tutorial are available in the Code Files link on this tutorial page.

Please download and use them.

00:40 To practice this tutorial, you should have gone through the earlier awk tutorials on this website.
00:47 If not, then please go through the corresponding tutorials on this website.
00:52 First, let us see some of the built-in variables in awk.
00:57 Capital RS specifies the record separator in an input file. By default, it is newline.
01:07 Capital FS specifies the field separator in an input file.
01:13 By default, the value of FS is whitespace.
01:18 Capital ORS defines the output record separator.

By default, it is newline.

01:27 Capital OFS defines the output field separator.

By default, it is a single space.

01:36 Let us understand the meaning of each of these.
01:40 Let us have a look at the awkdemo file now.
01:44 When we are processing this awkdemo file with 'awk' command, this becomes our input file.
01:51 Observe that all the records are separated from each other by a newline character.
01:58 Newline is the default value of the record separator variable RS.

So, there is no need to do anything else.

02:08 Notice that all the fields are separated by the pipe symbol.

How can we inform awk about it?

Let us see.

02:18 By default, any number of spaces or tabs separate the fields.
02:24 We can reset this with the help of hyphen capital F option as learnt in our earlier tutorials.
02:33 Or else, we can reset this in the BEGIN section with the use of FS variable.
02:40 Let us do this through an example.

Suppose I want to find out the names of students who are getting a stipend of more than Rs.5000.

02:51 Open the terminal by pressing CTRL, ALT and T keys.
02:57 Go to the folder in which you downloaded and extracted the Code Files using cd command.
03:04 Type the command as shown here.
03:08 Here, in the BEGIN section, we have assigned the value of FS as a pipe symbol.

Similarly, we can modify RS variable.

03:19 Press Enter to execute the command.
03:23 The output shows the list of students who are receiving more than Rs.5000 as a stipend.
03:30 Here, the name field and the stipend field are separated by a blank space.
03:36 Also, all the records are separated by a newline character.
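
The narration does not reproduce the command itself, so the following is only a minimal sketch of the idea. The file name students_sample.txt, its contents and the field order (roll no, name, email, phone, result, stipend) are assumptions made for illustration and may differ from the actual awkdemo.txt.

```shell
# Hypothetical sample data (assumed layout: roll no|name|email|phone|result|stipend)
printf '%s\n' \
  'A001|Asha|asha@example.com|9000000001|Pass|6000' \
  'A002|Ravi|ravi@example.com|9000000002|Fail|4000' \
  'A003|Meena|meena@example.com|9000000003|Pass|9000' > students_sample.txt

# Set the input field separator FS in the BEGIN section, then print the
# name and stipend of every record whose stipend (assumed field 6) is above 5000.
awk 'BEGIN { FS = "|" } $6 > 5000 { print $2, $6 }' students_sample.txt
```

With this sample data, the name and stipend are printed with a blank space between them, one record per line, because OFS and ORS still have their default values.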
03:42 Suppose we want colon as the output field separator

and double newline as output record separator.

03:52 How can we do this? Let us see.
03:55 In the terminal, press the up arrow key to get the previously executed command.
04:01 Modify the command as shown here

and then press Enter.

04:08 We get the output in the desired format.
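
As a hedged illustration of this step, the sketch below sets OFS and ORS in the BEGIN section. The sample file and its field order are hypothetical, and the exact command typed in the recording may differ.

```shell
# Hypothetical sample data (assumed layout: roll no|name|email|phone|result|stipend)
printf '%s\n' \
  'A001|Asha|asha@example.com|9000000001|Pass|6000' \
  'A002|Ravi|ravi@example.com|9000000002|Fail|4000' \
  'A003|Meena|meena@example.com|9000000003|Pass|9000' > students_sample.txt

# OFS = ":" joins the printed fields with a colon.
# ORS = "\n\n" ends every printed record with a double newline.
awk 'BEGIN { FS = "|"; OFS = ":"; ORS = "\n\n" } $6 > 5000 { print $2, $6 }' students_sample.txt
```

Note that OFS is applied only because the fields are separated by a comma in the print statement.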
04:12 Now, suppose our new input file is sample.txt.
04:18 Observe that the field separator here is newline and record separator is double newline.
04:27 How can we extract the roll no. and name information from this file?
04:32 Yes, you have guessed correctly. We have to modify both the FS and RS variables.
04:39 Pause this tutorial and do this as an assignment.
04:43 Next, let us see other built-in variables.
04:47 Capital NR gives the Number of Records processed by awk.
04:53 Capital NF gives the Number of Fields in the current record.
04:59 Let us see one example on this.

Suppose, we want to find incomplete lines in the file.

05:07 Here, incomplete line means it has fewer than the normal 6 fields.
05:13 Switch to the terminal. Let me clear the terminal using Ctrl and L keys.
05:20 Type the command as shown.
05:24 As the fields are separated by pipe symbol, set the FS value to pipe symbol in the BEGIN section.
05:33 Next, we have written NF not equal to 6.
05:37 This checks whether the number of fields in the current line is not equal to 6.
05:43 If true, then print section will print the record’s line number NR, along with the entire line denoted by $0.

Press Enter.

05:55 In the output, we can see that record number 16 is the incomplete record.

It has only 5 fields instead of 6.
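
The check described above can be sketched as follows. The sample file is hypothetical; its second line deliberately has only 5 fields so that the condition matches.

```shell
# Hypothetical sample data; the second record is missing one field.
printf '%s\n' \
  'A001|Asha|asha@example.com|9000000001|Pass|6000' \
  'A002|Ravi|ravi@example.com|Fail|4000' \
  'A003|Meena|meena@example.com|9000000003|Pass|9000' > students_sample.txt

# NF != 6 is true for any record that does not have exactly 6 fields.
# For such records, print the record number NR and the whole line $0.
awk 'BEGIN { FS = "|" } NF != 6 { print NR, $0 }' students_sample.txt
```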

06:05 Let us see one more example.

How can we print the first and last field for each student regardless of how many fields there are?

06:16 Type the command as shown here on the terminal.
06:21 Here we have used hyphen capital F option instead of setting FS variable.

Press Enter.

06:30 We get only the first and the last fields for each record in the file.
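
A minimal sketch of this idea, using hypothetical sample data in which the records have different numbers of fields. $NF refers to the last field of the current record, whatever its position.

```shell
# Hypothetical sample data; the second record has 5 fields, the others have 6.
printf '%s\n' \
  'A001|Asha|asha@example.com|9000000001|Pass|6000' \
  'A002|Ravi|ravi@example.com|Fail|4000' \
  'A003|Meena|meena@example.com|9000000003|Pass|9000' > students_sample.txt

# -F sets the field separator on the command line instead of in BEGIN.
# $1 is the first field and $NF is the last field of each record.
awk -F'|' '{ print $1, $NF }' students_sample.txt
```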
06:36 Let’s try something else now.
06:39 Suppose the student records are distributed across two files, demo1.txt and demo2.txt.
06:48 We want to print the first 3 lines from each of these two files.

We can do this using NR variable.

06:57 Here are the contents of the two files.
07:02 Now, to display the first 3 lines from each file, type the following command on the terminal.
07:11 Press Enter.
07:13 The output shows only the first 3 records of demo1.txt file.
07:20 How can we print the same for the second file also?
07:24 The solution is to use FNR instead of NR.

FNR is the current record number in the current file.

07:34 FNR is incremented each time a new record is read.
07:39 It is re-initialized to zero each time a new input file is started.
07:46 But NR is the number of input records awk has processed since the start of the program's execution.
07:55 It does not reset to zero with a new file.
07:59 Switch to the terminal.

Press the up arrow key to get the previously executed command.

08:06 Modify the previous command as follows.

Type FNR instead of NR.

08:14 In the Print section, next to NR, type FNR. Press Enter.
08:21 See, we get the correct output now.

FNR is reset to zero with each new file, but NR keeps on increasing.
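
The difference between NR and FNR can be sketched with two small files. The file names part1_sample.txt and part2_sample.txt and their contents are hypothetical stand-ins for demo1.txt and demo2.txt.

```shell
# Two hypothetical input files with four lines each.
printf '%s\n' 'a1' 'a2' 'a3' 'a4' > part1_sample.txt
printf '%s\n' 'b1' 'b2' 'b3' 'b4' > part2_sample.txt

# NR keeps counting across files, so only the first 3 lines
# of the first file satisfy NR <= 3.
awk 'NR <= 3 { print NR, $0 }' part1_sample.txt part2_sample.txt

# FNR restarts for every input file, so the first 3 lines of
# each file satisfy FNR <= 3. NR is printed alongside for comparison.
awk 'FNR <= 3 { print NR, FNR, $0 }' part1_sample.txt part2_sample.txt
```

In the second command, the lines of the second file show NR values of 5, 6 and 7, while FNR starts again from 1.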

08:31 Let us now look at some other built-in variables.

FILENAME variable gives the name of the file being read.

08:40 ARGC specifies the number of arguments provided at the command line.
08:46 ARGV represents an array that stores the command line arguments.
08:52 ENVIRON specifies the array of the shell environment variables and corresponding values.
09:00 As ARGV and ENVIRON use arrays in awk, we will look at those in subsequent tutorials.
09:09 Let us have a look at the variable FILENAME now.

How can we print the name of the current file being processed?

09:18 Switch to the terminal and type the command as shown.
09:23 Here we have used space as a string concatenation operator.

Press Enter to execute the command.

09:32 The output shows the input filename multiple times.
09:37 This is because this command prints the filename once for each row in the awkdemo.txt file.

How can we print this only once?

09:48 Clear the terminal.

Press the up arrow key to get the previously executed command.

09:55 Modify the previous command as shown here.

Press Enter.

10:02 Now, we get the filename only once.
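
The narration does not show how the command was modified. One possible way, given here only as an assumption, is to print FILENAME when FNR is 1, that is, on the first record of the file. The sample file and the message text are hypothetical.

```shell
# Hypothetical sample data with three records.
printf '%s\n' \
  'A001|Asha|asha@example.com|9000000001|Pass|6000' \
  'A002|Ravi|ravi@example.com|9000000002|Fail|4000' \
  'A003|Meena|meena@example.com|9000000003|Pass|9000' > students_sample.txt

# FNR == 1 is true only for the first record of a file, so the
# filename is printed once. The space between the string and
# FILENAME concatenates them.
awk 'FNR == 1 { print "The file name is " FILENAME }' students_sample.txt
```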
10:06 There are some other built-in variables in awk.

Please browse the internet to know more on them.

10:14 Suppose we want to find the students who have passed and have a stipend of more than Rs.8000,
10:22 use comma as the output field separator and print “The data is shown for file” and the name of the file in the footer section.

How can we do this?

10:36 In the terminal, type the following command.

Press Enter.

10:43 We can see that only one student has passed and gets stipend more than Rs.8000.

And, the record number is 2.

10:53 We can also see the name of the file in the footer, as desired.
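
A sketch of such a command is given below. The sample data, the position of the result field (assumed field 5) and of the stipend field (assumed field 6), and the value "Pass" are assumptions for illustration. With this hypothetical data the matching record number is 3, not 2.

```shell
# Hypothetical sample data (assumed layout: roll no|name|email|phone|result|stipend)
printf '%s\n' \
  'A001|Asha|asha@example.com|9000000001|Pass|6000' \
  'A002|Ravi|ravi@example.com|9000000002|Fail|9500' \
  'A003|Meena|meena@example.com|9000000003|Pass|9000' > students_sample.txt

# BEGIN sets the input and output field separators.
# The main rule selects records with result "Pass" and stipend above 8000.
# END runs after the last record and prints the footer with the filename.
awk 'BEGIN { FS = "|"; OFS = "," }
     $5 == "Pass" && $6 > 8000 { print NR, $2, $6 }
     END { print "The data is shown for file " FILENAME }' students_sample.txt
```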
10:58 We can use awk for more and more complex tasks.
11:03 In that case, it becomes more difficult to write the commands every time on the terminal.
11:09 We can instead write the awk program in a separate file.
11:14 By convention, that file is given the dot awk extension.
11:19 While executing, we can just specify this awk program filename with the awk command.
11:26 For doing so, we need to use hyphen small f option.

Let us see an example.

11:35 I have already written an awk program and saved it as prog1 dot awk.
11:42 This code is also available in the Code Files link.
11:46 Switch to the terminal.

See what we have written inside the single quotes of the last executed command.

11:55 The content of the prog1.awk file is exactly the same.
12:00 The only difference is that in the awk file, we have not enclosed the program in single quotes.
12:07 To execute the file, type the following on the terminal-

awk space hyphen small f space prog1.awk space awkdemo.txt and press Enter.

12:24 We are getting exactly the same output as we have seen before.
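
The contents of prog1.awk are not reproduced in the narration, so the sketch below uses a hypothetical script file prog_sample.awk and hypothetical sample data to show the same workflow with the hyphen small f option.

```shell
# Hypothetical sample data (assumed layout: roll no|name|email|phone|result|stipend)
printf '%s\n' \
  'A001|Asha|asha@example.com|9000000001|Pass|6000' \
  'A002|Ravi|ravi@example.com|9000000002|Fail|9500' \
  'A003|Meena|meena@example.com|9000000003|Pass|9000' > students_sample.txt

# Write the awk program to a file. Inside the file, the program
# is not enclosed in single quotes.
cat > prog_sample.awk <<'EOF'
BEGIN { FS = "|"; OFS = "," }
$5 == "Pass" && $6 > 8000 { print NR, $2, $6 }
END { print "The data is shown for file " FILENAME }
EOF

# -f tells awk to read the program from the named file.
awk -f prog_sample.awk students_sample.txt
```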
12:29 So, this way you can write awk programs and use them multiple times.
12:35 This brings us to the end of this tutorial.

Let us summarize.

12:40 In this tutorial we learnt about-

Built-in variables,

awk script using various examples.

12:48 As an assignment-

write an awk script to print the last field of the 5th line in awkdemo.txt file.

12:58 Open the system file /etc/passwd on the terminal.
13:05 Identify all the separators therein.
13:09 Now write a script to process the file from the 20th line onwards.
13:15 That too, only for the lines that contain more than 6 fields.
13:20 You should print the line number, entire line and count of fields in that particular line.
13:28 The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

13:36 The Spoken Tutorial Project team conducts workshops using spoken tutorials and gives certificates.

For more details, please write to us.

13:47 Please post your timed queries in this Forum.
13:51 Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.

More information on this mission is available at this link.

14:03 The script has been contributed by Antara. And this is Praveen from IIT Bombay, signing off.

Thanks for joining.

Contributors and Content Editors

PoojaMoolya, Sandhya.np14