Linux-AWK/C2/Built-in-Variables-in-awk/English-timed

Revision as of 11:30, 10 July 2019 by Sandhya.np14 (Talk | contribs)

Time
Narration
00:01 Welcome to the spoken tutorial on awk built-in variables and awk script.
00:07 In this tutorial, we will learn about built-in variables and awk scripts.
00:14 We will do this through some examples.
00:17 To record this tutorial, I am using:

Ubuntu Linux 16.04 Operating System and gedit text editor 3.20.1

00:30 The files used in this tutorial are available in the Code Files link on this tutorial page.

Please download and use them.

00:40 To practice this tutorial, you should have gone through the earlier awk tutorials on this website.
00:47 If not, then please go through the corresponding tutorials on this website.
00:52 First, let us see some of the built-in variables in awk.
00:57 Capital RS specifies the record separator in an input file. By default, it is newline.
01:07 Capital FS specifies the field separator in an input file.
01:13 By default, the value of FS is a single space, which makes awk split fields on runs of spaces and tabs.
01:18 Capital ORS defines the output record separator.

By default, it is newline.

01:27 Capital OFS defines the output field separator.

By default, it is a single space.
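The default values of these four variables can be seen with a small sketch. The input text here is made up for illustration and is not one of the tutorial's Code Files.

```shell
# Hypothetical input: two records separated by newlines (default RS),
# with fields separated by runs of spaces or a tab (default FS).
printf 'a  b\tc\nd e f\n' | awk '{ print $1, $3 }'
# The comma in print inserts OFS (a single space by default),
# and each print statement ends with ORS (a newline by default).
```

This should print a c and d f on separate lines, even though the input mixes two spaces and a tab between fields.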

01:36 Let us understand the meaning of each of these.
01:40 Let us have a look at the awkdemo file now.
01:44 When we process this awkdemo file with the awk command, it becomes our input file.
01:51 Observe that all the records are separated from each other by a newline character.
01:58 Newline is the default value of the record separator variable RS.

So, there is no need to do anything else.

02:08 Notice that all the fields are separated by the pipe symbol.

How can we inform awk about it?

Let us see.

02:18 By default, any number of spaces or tabs separate the fields.
02:24 We can reset this with the help of hyphen capital F option as learnt in our earlier tutorials.
02:33 Or else, we can reset this in the BEGIN section with the use of FS variable.
02:40 Let us do this through an example.

Suppose I want to find the names of students who receive a stipend of more than Rs. 5000.

02:51 Open the terminal by pressing CTRL, ALT and T keys.
02:57 Using the cd command, go to the folder in which you downloaded and extracted the Code Files.
03:04 Type the command as shown here.
03:08 Here, in the BEGIN section, we have assigned the value of FS as a pipe symbol.

Similarly, we can modify the RS variable.

03:19 Press Enter to execute the command.
03:23 The output shows the list of students who are receiving more than Rs.5000 as a stipend.
03:30 Here, the name field and the stipend field are separated by a blank space.
03:36 Also, all the records are separated by a newline character.
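The command typed at this step is not reproduced in this script. A minimal sketch of what such a command could look like is given here. The file students_sample.txt and its contents are hypothetical, and the field positions (name in field 2, stipend in field 6) are an assumption about the layout of awkdemo.txt.

```shell
# Hypothetical sample data; the real awkdemo.txt may differ.
# Assumed layout: roll no|name|stream|marks|result|stipend
cat > students_sample.txt <<'EOF'
S01|Asha|Physics|72|Pass|6400
S02|Vikram|Maths|41|Fail|4800
S03|Meena|Chemistry|88|Pass|8500
EOF

# Set FS in the BEGIN section, then print the name and stipend
# for records whose stipend (assumed to be field 6) exceeds 5000.
awk 'BEGIN { FS = "|" } $6 > 5000 { print $2, $6 }' students_sample.txt
```

With this sample data, the command should print Asha 6400 and Meena 8500, one per line, with a single space between the two fields.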
03:42 Suppose we want a colon as the output field separator

and a double newline as the output record separator.

03:52 How can we do this? Let us see.
03:55 In the terminal, press the up arrow key to get the previously executed command.
04:01 Modify the command as shown here

and then press Enter.

04:08 We get the output in the desired format.
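The modified command is not reproduced in this script. One way it could be written is sketched here, again using a hypothetical file students_sample.txt and assuming that the name is field 2 and the stipend is field 6.

```shell
# Hypothetical sample data; the real awkdemo.txt may differ.
# Assumed layout: roll no|name|stream|marks|result|stipend
cat > students_sample.txt <<'EOF'
S01|Asha|Physics|72|Pass|6400
S02|Vikram|Maths|41|Fail|4800
S03|Meena|Chemistry|88|Pass|8500
EOF

# OFS replaces the comma in print; ORS is written after each print.
awk 'BEGIN { FS = "|"; OFS = ":"; ORS = "\n\n" }
     $6 > 5000 { print $2, $6 }' students_sample.txt
```

With this sample data, the output should be Asha:6400 and Meena:8500, each followed by a blank line.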
04:12 Now, suppose our new input file is sample.txt.
04:18 Observe that the field separator here is newline and record separator is double newline.
04:27 How can we extract the roll no. and name information from this file?
04:32 Yes, you have guessed correctly. We have to modify both the FS and RS variables.
04:39 Pause this tutorial and do this as an assignment.
04:43 Next, let us see other built-in variables.
04:47 Capital NR gives the Number of Records processed by awk.
04:53 Capital NF gives the Number of Fields in the current record.
04:59 Let us see one example on this.

Suppose, we want to find incomplete lines in the file.

05:07 Here, an incomplete line is one that has fewer than the normal 6 fields.
05:13 Switch to the terminal. Let me clear the terminal using Ctrl and L keys.
05:20 Type the command as shown.
05:24 As the fields are separated by pipe symbol, set the FS value to pipe symbol in the BEGIN section.
05:33 Next we have written NF not equal to 6.
05:37 This checks whether the number of fields in the current line is not equal to 6.
05:43 If true, then the print section will print the record’s line number NR, along with the entire line denoted by $0.

Press Enter.

05:55 In the output, we can see that record number 16 is the incomplete record.

It has only 5 fields instead of 6.
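The check described above can be sketched as follows. The file incomplete_sample.txt is hypothetical; its second record deliberately has only 5 fields, so the record number reported here differs from the one in awkdemo.txt.

```shell
# Hypothetical sample data: the second record is missing its last field.
cat > incomplete_sample.txt <<'EOF'
S01|Asha|Physics|72|Pass|6400
S02|Vikram|Maths|41|Fail
S03|Meena|Chemistry|88|Pass|8500
EOF

# NF is the number of fields in the current record, NR its record number.
awk 'BEGIN { FS = "|" } NF != 6 { print NR, $0 }' incomplete_sample.txt
```

With this sample data, only the second record should be printed, prefixed by the number 2.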

06:05 Let us see one more example.

How can we print the first and last field for each student regardless of how many fields there are?

06:16 Type the command as shown here on the terminal.
06:21 Here we have used the hyphen capital F option instead of setting the FS variable.

Press Enter.

06:30 We get only the first and the last fields for each record in the file.
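A sketch of such a command is given here, assuming it relies on $NF to refer to the last field. The file fields_sample.txt is hypothetical, and one of its records is shorter than the others to show that $NF follows the actual field count.

```shell
# Hypothetical sample data: the second record has only 5 fields.
cat > fields_sample.txt <<'EOF'
S01|Asha|Physics|72|Pass|6400
S02|Vikram|Maths|41|Fail
S03|Meena|Chemistry|88|Pass|8500
EOF

# -F sets the field separator on the command line.
# $NF is the field whose number equals NF, that is, the last field.
awk -F'|' '{ print $1, $NF }' fields_sample.txt
```

With this sample data, the output should be S01 6400, S02 Fail and S03 8500, one per line.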
06:36 Let’s try something else now.
06:39 Suppose the student records are distributed across two files, demo1.txt and demo2.txt.
06:48 We want to print the first 3 lines from each of these two files.

We can do this using NR variable.

06:57 Here are the contents of the two files.
07:02 Now, to display the first 3 lines from each file, type the following command on the terminal.
07:11 Press Enter.
07:13 The output shows only the first 3 records of demo1.txt file.
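This behaviour can be reproduced with a small sketch. The files part1_sample.txt and part2_sample.txt are hypothetical stand-ins for demo1.txt and demo2.txt, and the exact command used in the tutorial may differ.

```shell
# Hypothetical stand-ins for demo1.txt and demo2.txt.
printf 'S01|Asha\nS02|Vikram\nS03|Meena\nS04|Ravi\n'  > part1_sample.txt
printf 'S05|Leela\nS06|Arjun\nS07|Kiran\nS08|Divya\n' > part2_sample.txt

# NR keeps counting across files, so NR <= 3 is true only
# for the first three records of the first file.
awk 'NR <= 3 { print NR, $0 }' part1_sample.txt part2_sample.txt
```

With these sample files, only records 1 to 3 of part1_sample.txt should be printed, because the first record of the second file already has NR equal to 5.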
07:20 How can we print the same for the second file also?
07:24 The solution is to use FNR instead of NR.

FNR is the current record number in the current file.

07:34 FNR is incremented each time a new record is read.
07:39 It is re-initialized to zero each time a new input file is started, so the first record of every file has FNR equal to 1.
07:46 But NR is the number of input records awk has processed since the start of the program's execution.
07:55 It does not reset to zero with a new file.
07:59 Switch to the terminal.

Press the up arrow key to get the previously executed command.

08:06 Modify the previous command as follows.

Type FNR instead of NR.

08:14 In the Print section, next to NR, type FNR. Press Enter.
08:21 See, we get the correct output now.

FNR is reset at the start of each new file, but NR keeps increasing.
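The difference between the two counters can be sketched as follows. The files part1_sample.txt and part2_sample.txt are hypothetical stand-ins for demo1.txt and demo2.txt, and printing NR and FNR side by side is an assumption about the tutorial's command.

```shell
# Hypothetical stand-ins for demo1.txt and demo2.txt.
printf 'S01|Asha\nS02|Vikram\nS03|Meena\nS04|Ravi\n'  > part1_sample.txt
printf 'S05|Leela\nS06|Arjun\nS07|Kiran\nS08|Divya\n' > part2_sample.txt

# FNR restarts for every input file, so FNR <= 3 selects
# the first three records of each file. NR is printed for comparison.
awk 'FNR <= 3 { print NR, FNR, $0 }' part1_sample.txt part2_sample.txt
```

With these sample files, six lines should be printed. The first column (NR) runs 1, 2, 3, 5, 6, 7, while the second column (FNR) runs 1, 2, 3 twice.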

08:31 Let us now look at some other built-in variables.

FILENAME variable gives the name of the file being read.

08:40 ARGC specifies the number of arguments provided at the command line.
08:46 ARGV represents an array that stores the command line arguments.
08:52 ENVIRON is an array of the environment variables and their corresponding values.
09:00 As ARGV and ENVIRON are arrays in awk, we will look at those in subsequent tutorials.
09:09 Let us have a look at the variable FILENAME now.

How can we print the name of the current file being processed?

09:18 Switch to the terminal and type the command as shown.
09:23 Here, writing the string and the variable next to each other, separated by a space, concatenates them; awk has no explicit concatenation operator.

Press Enter to execute the command.

09:32 The output shows the input filename multiple times.
09:37 This is because the command prints the filename once for each row in the awkdemo.txt file.

How can we print this only once?

09:48 Clear the terminal.

Press the up arrow key to get the previously executed command.

09:55 Modify the previous command as shown here.

Press Enter.

10:02 Now, we get the filename only once.
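Both commands can be sketched as follows. The file students_sample.txt and the message text are hypothetical, and moving the print statement into the END section is only one possible way to print the name once; the tutorial's command may do it differently.

```shell
# Hypothetical three-line input file.
printf 'S01|Asha\nS02|Vikram\nS03|Meena\n' > students_sample.txt

# Runs for every record, so the filename is printed three times.
awk '{ print "The filename is " FILENAME }' students_sample.txt

# Runs once after all input is read, so the filename is printed once.
awk 'END { print "The filename is " FILENAME }' students_sample.txt
```

FILENAME is not yet set in the BEGIN section, which is why this sketch uses END rather than BEGIN.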
10:06 There are some other built-in variables in awk.

Please browse the internet to learn more about them.

10:14 Suppose we want to find the students who have passed and have a stipend of more than Rs. 8000,
10:22 use comma as the output field separator, and print “The data is shown for file” followed by the name of the file in the footer section.

How can we do this?

10:36 In the terminal, type the following command.

Press Enter.

10:43 We can see that only one student has passed and gets a stipend of more than Rs. 8000.

And, the record number is 2.

10:53 We can also see the name of the file in the footer, as desired.
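The command for this task is not reproduced in this script. A minimal sketch is given here. The file students_sample.txt is hypothetical, and the sketch assumes that field 5 holds the word Pass or Fail and field 6 holds the stipend, so the record number it prints differs from the one in awkdemo.txt.

```shell
# Hypothetical sample data; the real awkdemo.txt may differ.
# Assumed layout: roll no|name|stream|marks|result|stipend
cat > students_sample.txt <<'EOF'
S01|Asha|Physics|72|Pass|6400
S02|Vikram|Maths|41|Fail|4800
S03|Meena|Chemistry|88|Pass|8500
EOF

# BEGIN sets the separators, the middle rule filters and prints,
# and END prints the footer once after all records are read.
awk 'BEGIN { FS = "|"; OFS = "," }
     $5 == "Pass" && $6 > 8000 { print NR, $2, $6 }
     END { print "The data is shown for file " FILENAME }' students_sample.txt
```

With this sample data, the output should be 3,Meena,8500 followed by the footer line naming students_sample.txt. The footer uses concatenation rather than a comma so that OFS is not inserted into the message.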
10:58 We can use awk for more and more complex tasks.
11:03 In that case, it becomes more difficult to write the commands every time on the terminal.
11:09 We can instead write the awk program in a separate file.
11:14 By convention, such a file is given the dot awk extension, although awk itself does not require it.
11:19 While executing, we can just specify this awk program filename with the awk command.
11:26 For doing so, we need to use hyphen small f option.

Let us see an example.

11:35 I have already written an awk program and saved it as prog1 dot awk.
11:42 This code is also available in the Code Files link.
11:46 Switch to the terminal.

See what we have written inside the single quotes of the last executed command.

11:55 The content of the prog1.awk file is exactly the same.
12:00 The only difference is that in the awk file, the program is not enclosed in single quotes.
12:07 To execute the file, type the following on the terminal-

awk space hyphen small f space prog1.awk space awkdemo.txt and press Enter.

12:24 We are getting exactly the same output as we have seen before.
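The same idea can be tried with a small sketch. The files passed_sample.awk and students_sample.txt are hypothetical and are not the tutorial's prog1.awk and awkdemo.txt; the program assumes that field 5 holds Pass or Fail and field 6 holds the stipend.

```shell
# Hypothetical sample data; the real awkdemo.txt may differ.
cat > students_sample.txt <<'EOF'
S01|Asha|Physics|72|Pass|6400
S02|Vikram|Maths|41|Fail|4800
S03|Meena|Chemistry|88|Pass|8500
EOF

# The awk program goes into its own file, without surrounding quotes.
cat > passed_sample.awk <<'EOF'
BEGIN { FS = "|"; OFS = "," }
$5 == "Pass" && $6 > 8000 { print NR, $2, $6 }
END { print "The data is shown for file " FILENAME }
EOF

# -f tells awk to read the program from the named file.
awk -f passed_sample.awk students_sample.txt
```

With this sample data, the output should be 3,Meena,8500 followed by the footer line, the same as when the program is typed inside single quotes on the command line.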
12:29 So, this way you can write awk programs and use them multiple times.
12:35 This brings us to the end of this tutorial.

Let us summarize.

12:40 In this tutorial we learnt about-

Built-in variables,

awk script using various examples.

12:48 As an assignment-

write an awk script to print the last field of the 5th line in awkdemo.txt file.

12:58 Open the system file /etc/passwd on the terminal.
13:05 Identify all the separators therein.
13:09 Now write a script to process the file from the 20th line onwards.
13:15 That too, only for the lines that contain more than 6 fields.
13:20 You should print the line number, entire line and count of fields in that particular line.
13:28 The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

13:36 The Spoken Tutorial Project team conducts workshops using spoken tutorials and gives certificates.

For more details, please write to us.

13:47 Please post your timed queries in this Forum.
13:51 Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.

More information on this mission is available at this link.

14:03 The script has been contributed by Antara. And this is Praveen from IIT Bombay, signing off.

Thanks for joining.

Contributors and Content Editors

PoojaMoolya, Sandhya.np14