Linux-AWK/C2/Built-in-Variables-in-awk/English-timed
00:01 | Welcome to the spoken tutorial on awk built-in variables and awk script. |
00:07 | In this tutorial, we will learn about built-in variables and awk scripts. |
00:14 | We will do this through some examples. |
00:17 | To record this tutorial, I am using:
Ubuntu Linux 16.04 Operating System and gedit text editor 3.20.1 |
00:30 | The files used in this tutorial are available in the Code Files link on this tutorial page.
Please download and use them. |
00:40 | To practice this tutorial, you should have gone through the earlier awk tutorials on this website. |
00:47 | If not, then please go through the corresponding tutorials on this website. |
00:52 | First, let us see some of the built-in variables in awk. |
00:57 | Capital RS specifies the record separator in an input file. By default, it is newline. |
01:07 | Capital FS specifies the field separator in an input file. |
01:13 | By default, the value of FS is a single space, which makes awk split fields on any run of spaces or tabs. |
01:18 | Capital ORS defines the output record separator.
By default, it is newline. |
01:27 | Capital OFS defines the output field separator.
By default, it is a single space. |
01:36 | Let us understand the meaning of each of these. |
01:40 | Let us have a look at the awkdemo file now. |
01:44 | When we are processing this awkdemo file with 'awk' command, this becomes our input file. |
01:51 | Observe that all the records are separated from each other by a newline character. |
01:58 | Newline is the default value of the record separator variable RS.
So, there is no need to do anything else. |
02:08 | Notice that all the fields are separated by the pipe symbol.
How can we inform awk about it? Let us see. |
02:18 | By default, any number of spaces or tabs separate the fields. |
02:24 | We can reset this with the help of hyphen capital F option as learnt in our earlier tutorials. |
02:33 | Or else, we can reset this in the BEGIN section with the use of FS variable. |
02:40 | Let us do this through an example.
Suppose I want to find the names of students who get a stipend of more than Rs.5000. |
02:51 | Open the terminal by pressing CTRL, ALT and T keys. |
02:57 | Go to the folder in which you downloaded and extracted the Code Files using cd command. |
03:04 | Type the command as shown here. |
03:08 | Here, in the BEGIN section, we have set the value of FS to the pipe symbol.
Similarly, we can modify the RS variable. |
03:19 | Press Enter to execute the command. |
03:23 | The output shows the list of students who are receiving more than Rs.5000 as a stipend. |
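The command typed on screen is not reproduced in this transcript. A minimal sketch of the idea, assuming the name is field 2 and the stipend is field 6, and using a small hypothetical data file in place of the real awkdemo file from the Code Files link, might look like this:

```shell
# Hypothetical sample data; the field layout is an assumption for illustration.
tmp=$(mktemp -d)
cat > "$tmp/students.txt" <<'EOF'
S01|Asha|Physics|72|Pass|6000
S02|Meena|Biology|88|Pass|9000
S03|Ravi|Maths|48|Fail|3000
EOF

# Set the input field separator in BEGIN, then print the name and the
# stipend of every record whose sixth field is greater than 5000.
out=$(awk 'BEGIN { FS = "|" } $6 > 5000 { print $2, $6 }' "$tmp/students.txt")
echo "$out"
# Asha 6000
# Meena 9000

rm -r "$tmp"
```

The comma in the print statement is what inserts the output field separator, a single space by default, between the two values.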
03:30 | Here, the name field and the stipend field are separated by a blank space. |
03:36 | Also, all the records are separated by a newline character. |
03:42 | Suppose we want colon as the output field separator
and double newline as output record separator. |
03:52 | How can we do this? Let us see. |
03:55 | In the terminal, press the up arrow key to get the previously executed command. |
04:01 | Modify the command as shown here
and then press Enter. |
04:08 | We get the output in the desired format. |
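The modified command is only shown on screen. As a hedged sketch, with the same assumed field layout and a hypothetical data file, setting OFS and ORS in the BEGIN section could be written as follows:

```shell
# Hypothetical sample data; the field layout is an assumption for illustration.
tmp=$(mktemp -d)
cat > "$tmp/students.txt" <<'EOF'
S01|Asha|Physics|72|Pass|6000
S02|Meena|Biology|88|Pass|9000
S03|Ravi|Maths|48|Fail|3000
EOF

# OFS replaces the space between printed fields with a colon.
# ORS ends every printed record with two newlines instead of one.
out=$(awk 'BEGIN { FS = "|"; OFS = ":"; ORS = "\n\n" }
           $6 > 5000 { print $2, $6 }' "$tmp/students.txt")
echo "$out"
# Asha:6000
#
# Meena:9000

rm -r "$tmp"
```

OFS only takes effect where the print statement separates its arguments with commas.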
04:12 | Now, suppose our new input file is sample.txt. |
04:18 | Observe that the field separator here is newline and record separator is double newline. |
04:27 | How can we extract the roll no. and name information from this file? |
04:32 | Yes, you have guessed correctly. We have to modify both the FS and RS variables. |
04:39 | Pause this tutorial and do this as an assignment. |
04:43 | Next, let us see other built-in variables. |
04:47 | Capital NR gives the Number of Records processed by awk so far. |
04:53 | Capital NF gives the Number of Fields in the current record. |
04:59 | Let us see one example on this.
Suppose, we want to find incomplete lines in the file. |
05:07 | Here, an incomplete line means one that has fewer than the normal 6 fields. |
05:13 | Switch to the terminal. Let me clear the terminal using Ctrl and L keys. |
05:20 | Type the command as shown. |
05:24 | As the fields are separated by pipe symbol, set the FS value to pipe symbol in the BEGIN section. |
05:33 | Next we have written NF not equal to 6. |
05:37 | This checks whether the number of fields in the current line is not equal to 6. |
05:43 | If true, the print section prints the record number NR, along with the entire line, denoted by $0.
Press Enter. |
05:55 | In the output, we can see that record number 16 is the incomplete record.
It has only 5 fields instead of 6. |
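A sketch of the check described above, run on a hypothetical three-line file in which the last record is missing its stipend field (the real awkdemo file is not reproduced here):

```shell
# Hypothetical sample data; the third record has only 5 fields.
tmp=$(mktemp -d)
cat > "$tmp/students.txt" <<'EOF'
S01|Asha|Physics|72|Pass|6000
S02|Meena|Biology|88|Pass|9000
S03|Ravi|Maths|48|Fail
EOF

# NF is the number of fields in the current record.
# Print the record number and the whole line when NF is not 6.
out=$(awk 'BEGIN { FS = "|" } NF != 6 { print NR, $0 }' "$tmp/students.txt")
echo "$out"
# 3 S03|Ravi|Maths|48|Fail

rm -r "$tmp"
```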
06:05 | Let us see one more example.
How can we print the first and last field for each student regardless of how many fields there are? |
06:16 | Type the command as shown here on the terminal. |
06:21 | Here we have used hyphen capital F option instead of setting FS variable.
Press Enter. |
06:30 | We get only the first and the last fields for each record in the file. |
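One way to write this, as a sketch on hypothetical data, is to use $NF: since NF holds the number of fields, $NF always refers to the last field, however many there are.

```shell
# Hypothetical sample data; the records deliberately differ in field count.
tmp=$(mktemp -d)
cat > "$tmp/students.txt" <<'EOF'
S01|Asha|Physics|72|Pass|6000
S02|Meena|Biology|88|Pass|9000
S03|Ravi|Maths|48|Fail
EOF

# -F sets the field separator on the command line instead of in BEGIN.
out=$(awk -F '|' '{ print $1, $NF }' "$tmp/students.txt")
echo "$out"
# S01 6000
# S02 9000
# S03 Fail

rm -r "$tmp"
```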
06:36 | Let’s try something else now. |
06:39 | Suppose, the student records are distributed across two files demo1.txt, demo2.txt. |
06:48 | We want to print the first 3 lines from each of these two files.
We can do this using NR variable. |
06:57 | Here are the contents of the two files. |
07:02 | Now, to display the first 3 lines from each file, type the following command on the terminal. |
07:11 | Press Enter. |
07:13 | The output shows only the first 3 records of demo1.txt file. |
07:20 | How can we print the same for the second file also? |
07:24 | The solution is to use FNR instead of NR.
FNR is the current record number in the current file. |
07:34 | FNR is incremented each time a new record is read. |
07:39 | It is re-initialized to zero each time a new input file is started. |
07:46 | But NR is the number of input records awk has processed since the starting of the program's execution. |
07:55 | It does not reset to zero with a new file. |
07:59 | Switch to the terminal.
Press the up arrow key to get the previously executed command. |
08:06 | Modify the previous command as follows.
In the condition, type FNR instead of NR. |
08:14 | In the print section, next to NR, type FNR as well. Press Enter. |
08:21 | See, we get the correct output now.
FNR is reset to zero at the start of each new file, so it counts from 1 again, but NR keeps on increasing. |
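The difference between NR and FNR can be sketched with two small hypothetical files, used here in place of demo1.txt and demo2.txt, whose contents are only shown on screen:

```shell
# Two hypothetical four-line input files.
tmp=$(mktemp -d)
printf 'a1\na2\na3\na4\n' > "$tmp/one.txt"
printf 'b1\nb2\nb3\nb4\n' > "$tmp/two.txt"

# NR counts records across all files, so only the first file matches.
with_nr=$(awk 'NR <= 3 { print NR, $0 }' "$tmp/one.txt" "$tmp/two.txt")
echo "$with_nr"
# 1 a1
# 2 a2
# 3 a3

# FNR restarts for each file, so the first 3 records of both files match.
with_fnr=$(awk 'FNR <= 3 { print NR, FNR, $0 }' "$tmp/one.txt" "$tmp/two.txt")
echo "$with_fnr"
# 1 1 a1
# 2 2 a2
# 3 3 a3
# 5 1 b1
# 6 2 b2
# 7 3 b3

rm -r "$tmp"
```

In the second output, NR jumps from 3 to 5 because record 4 of the first file was read but not printed.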
08:31 | Let us now look at some other built-in variables.
FILENAME variable gives the name of the file being read. |
08:40 | ARGC specifies the number of arguments provided at the command line. |
08:46 | ARGV represents an array that stores the command line arguments. |
08:52 | ENVIRON specifies the array of the shell environment variables and corresponding values. |
09:00 | As ARGV and ENVIRON use array in awk, we will look at those in subsequent tutorials. |
09:09 | Let us have a look at the variable FILENAME now.
How can we print the name of the current file being processed? |
09:18 | Switch to the terminal and type the command as shown. |
09:23 | Here, we have concatenated strings by writing them next to each other, separated by a space.
Press Enter to execute the command. |
09:32 | The output shows the input filename multiple times. |
09:37 | This is because the command prints the filename once for each row in the awkdemo.txt file.
How can we print this only once? |
09:48 | Clear the terminal.
Press the up arrow key to get the previously executed command. |
09:55 | Modify the previous command as shown here.
Press Enter. |
10:02 | Now, we get the filename only once. |
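The transcript does not show how the command was modified. One possible approach, given here only as a sketch on a hypothetical file, is to print FILENAME when FNR is 1, that is, on the first record of the file. Note that FILENAME is still empty inside the BEGIN section, because no file has been opened at that point.

```shell
# Hypothetical sample data file.
tmp=$(mktemp -d)
printf 'S01|Asha\nS02|Meena\nS03|Ravi\n' > "$tmp/students.txt"

# Without a condition, the filename is printed once per record (3 times here).
every=$(cd "$tmp" && awk '{ print "The filename is " FILENAME }' students.txt)

# With FNR == 1, it is printed only for the first record of the file.
once=$(cd "$tmp" && awk 'FNR == 1 { print "The filename is " FILENAME }' students.txt)
echo "$once"
# The filename is students.txt

rm -r "$tmp"
```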
10:06 | There are some other built-in variables in awk.
Please browse the internet to know more on them. |
10:14 | Suppose we want to find the students who have passed and have a stipend of more than Rs.8000. |
10:22 | We also want to use comma as the output field separator and print “The data is shown for file” and the name of the file in the footer section.
How can we do this? |
10:36 | In the terminal, type the following command.
Press Enter. |
10:43 | We can see that only one student has passed and gets a stipend of more than Rs.8000.
And, the record number is 2. |
10:53 | We can also see the name of the file in the footer, as desired. |
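A sketch that combines these pieces, assuming the result is stored in field 5 as the word Pass and the stipend in field 6 (both are assumptions about the file layout, and the data below is hypothetical):

```shell
# Hypothetical sample data; the field layout is an assumption for illustration.
tmp=$(mktemp -d)
cat > "$tmp/students.txt" <<'EOF'
S01|Asha|Physics|72|Pass|6000
S02|Meena|Biology|88|Pass|9000
S03|Ravi|Maths|48|Fail|3000
EOF

# BEGIN sets the separators, the pattern selects the records,
# and END acts as the footer, printed after all input is read.
out=$(cd "$tmp" && awk 'BEGIN { FS = "|"; OFS = "," }
    $5 == "Pass" && $6 > 8000 { print NR, $2, $6 }
    END { print "The data is shown for file " FILENAME }' students.txt)
echo "$out"
# 2,Meena,9000
# The data is shown for file students.txt

rm -r "$tmp"
```

The footer uses string concatenation rather than a comma, so that OFS does not put a comma between the message and the filename.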
10:58 | We can use awk for more and more complex tasks. |
11:03 | In that case, it becomes more difficult to write the commands every time on the terminal. |
11:09 | We can instead write the awk program in a separate file. |
11:14 | By convention, that file is given the dot awk extension, although awk itself does not require it. |
11:19 | While executing, we can just specify this awk program filename with the awk command. |
11:26 | For doing so, we need to use hyphen small f option.
Let us see an example. |
11:35 | I have already written an awk program and saved it as prog1 dot awk. |
11:42 | This code is also available in the Code Files link. |
11:46 | Switch to the terminal.
Look at what we wrote inside the single quotes of the last executed command. |
11:55 | Content of prog1.awk file is exactly the same. |
12:00 | The only difference is that in the awk file, the program is not enclosed in single quotes. |
12:07 | To execute the file, type the following on the terminal-
awk space hyphen small f space prog1.awk space awkdemo.txt and press Enter. |
12:24 | We are getting exactly the same output as we have seen before. |
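The contents of prog1.awk are not reproduced in this transcript. As a sketch of the same workflow, the following writes a hypothetical program file, here called prog_sketch.awk, and runs it with the -f option on hypothetical data:

```shell
# Hypothetical sample data; the field layout is an assumption for illustration.
tmp=$(mktemp -d)
cat > "$tmp/students.txt" <<'EOF'
S01|Asha|Physics|72|Pass|6000
S02|Meena|Biology|88|Pass|9000
S03|Ravi|Maths|48|Fail|3000
EOF

# The program text goes into the file as-is, without single quotes.
cat > "$tmp/prog_sketch.awk" <<'EOF'
BEGIN { FS = "|"; OFS = "," }
$5 == "Pass" && $6 > 8000 { print NR, $2, $6 }
END { print "The data is shown for file " FILENAME }
EOF

# -f tells awk to read the program from the named file.
out=$(cd "$tmp" && awk -f prog_sketch.awk students.txt)
echo "$out"
# 2,Meena,9000
# The data is shown for file students.txt

rm -r "$tmp"
```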
12:29 | So, this way you can write awk programs and use them multiple times. |
12:35 | This brings us to the end of this tutorial.
Let us summarize. |
12:40 | In this tutorial we learnt about-
Built-in variables and awk scripts, using various examples. |
12:48 | As an assignment-
write an awk script to print the last field of the 5th line in awkdemo.txt file. |
12:58 | Open the system file /etc/passwd on the terminal. |
13:05 | Identify all the separators therein. |
13:09 | Now write a script to process the file from the 20th line onwards. |
13:15 | That too, only for the lines that contain more than 6 fields. |
13:20 | You should print the line number, entire line and count of fields in that particular line. |
13:28 | The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
13:36 | The Spoken Tutorial Project team conducts workshops using spoken tutorials and gives certificates.
For more details, please write to us. |
13:47 | Please post your timed queries in this Forum. |
13:51 | Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
More information on this mission is available at this link. |
14:03 | The script has been contributed by Antara. And this is Praveen from IIT Bombay, signing off.
Thanks for joining. |