Linux-AWK/C2/Built-in-Variables-in-awk/English-timed

Time	Narration
00:01	Welcome to the spoken tutorial on awk built-in variables and awk script.
00:07	In this tutorial, we will learn about built-in variables and awk scripts.
00:14	We will do this through some examples.
00:17	To record this tutorial, I am using: Ubuntu Linux 16.04 Operating System and gedit text editor 3.20.1
00:30	The files used in this tutorial are available in the Code Files link on this tutorial page. Please download and use them.
00:40	To practice this tutorial, you should have gone through the earlier awk tutorials on this website.
00:47	If not, then please go through the corresponding tutorials on this website.
00:52	First, let us see some of the built-in variables in awk.
00:57	Capital RS specifies the record separator in an input file. By default, it is newline.
01:07	Capital FS specifies the field separator in an input file.
01:13	By default, the value of FS is a single space, which makes awk split fields on any run of spaces or tabs.
01:18	Capital ORS defines the output record separator. By default, it is newline.
01:27	Capital OFS defines the output field separator. By default, it is a single space.
01:36	Let us understand the meaning of each of these.
01:40	Let us have a look at the awkdemo file now.
01:44	When we are processing this awkdemo file with 'awk' command, this becomes our input file.
01:51	Observe that all the records are separated from each other by a newline character.
01:58	newline is the default value for record separator RS variable. So, there is no need to do anything else.
02:08	Notice that all the fields are separated by the pipe symbol. How can we inform awk about it? Let us see.
02:18	By default, any number of spaces or tabs separate the fields.
02:24	We can reset this with the help of hyphen capital F option as learnt in our earlier tutorials.
02:33	Or else, we can reset this in the BEGIN section with the use of FS variable.
02:40	Let us do this through an example. Suppose, I want to find out the name of students who are getting a stipend of more than Rs.5000.
02:51	Open the terminal by pressing CTRL, ALT and T keys.
02:57	Go to the folder in which you downloaded and extracted the Code Files using cd command.
03:04	Type the command as shown here.
03:08	Here, in the BEGIN section, we have assigned the value of FS as a pipe symbol. Similarly, we can modify RS variable.
03:19	Press Enter to execute the command.
03:23	The output shows the list of students who are receiving more than Rs.5000 as a stipend.
03:30	Here, the name field and the stipend field are separated by a blank space.
03:36	Also, all the records are separated by a newline character.
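The command typed at this step appears only on screen and is not captured in this transcript. As a rough sketch of what it could look like, the example below builds a small hypothetical file, sample_demo.txt, and assumes that the name is field 2 and the stipend is field 6. The real awkdemo.txt in the Code Files link may be laid out differently.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in for awkdemo.txt.
# Assumed layout: roll no|name|stream|marks|result|stipend
cat > sample_demo.txt <<'EOF'
S01|Asha Rao|Physics|72|Pass|6000
S02|Ravi Kumar|Maths|45|Fail|4000
S03|Meena Iyer|Biology|88|Pass|9000
EOF

# BEGIN runs before the first record is read, so FS is already "|"
# when awk splits line 1 into fields.
result=$(awk 'BEGIN { FS = "|" } $6 > 5000 { print $2, $6 }' sample_demo.txt)
printf '%s\n' "$result"
```

With this sample data, the two students whose stipend exceeds 5000 are printed, with name and stipend separated by a single space and one record per line.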
03:42	Suppose we want colon as the output field separator and double newline as output record separator.
03:52	How can we do this? Let us see.
03:55	In the terminal, press the up arrow key to get the previously executed command.
04:01	Modify the command as shown here and then press Enter.
04:08	We get the output in the desired format.
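The modified command is also shown only on screen. One way it might look is to set OFS and ORS in the BEGIN section alongside FS. This sketch assumes a hypothetical file sample_demo.txt in which the name is field 2 and the stipend is field 6.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in for awkdemo.txt.
# Assumed layout: roll no|name|stream|marks|result|stipend
cat > sample_demo.txt <<'EOF'
S01|Asha Rao|Physics|72|Pass|6000
S02|Ravi Kumar|Maths|45|Fail|4000
S03|Meena Iyer|Biology|88|Pass|9000
EOF

# OFS is printed wherever a comma appears in the print statement.
# ORS is printed after every record, so "\n\n" leaves a blank line between records.
result=$(awk 'BEGIN { FS = "|"; OFS = ":"; ORS = "\n\n" } $6 > 5000 { print $2, $6 }' sample_demo.txt)

# Note: $(...) strips the trailing newlines of the last record.
printf '%s\n' "$result"
```

Note that OFS takes effect only because the fields in the print statement are separated by a comma. Writing print $2 $6 would concatenate the two fields with nothing between them.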
04:12	Now, suppose our new input file is sample.txt.
04:18	Observe that the field separator here is newline and record separator is double newline.
04:27	How can we extract the roll no. and name information from this file?
04:32	Yes, you have guessed correctly. We have to modify both the FS and RS variables.
04:39	Pause this tutorial and do this as an assignment.
04:43	Next, let us see other built-in variables.
04:47	Capital NR gives the Number of Records processed by awk.
04:53	Capital NF gives the Number of Fields in the current record.
04:59	Let us see one example on this. Suppose, we want to find incomplete lines in the file.
05:07	Here, incomplete line means it has fewer than the normal 6 fields.
05:13	Switch to the terminal. Let me clear the terminal using Ctrl and L keys.
05:20	Type the command as shown.
05:24	As the fields are separated by pipe symbol, set the FS value to pipe symbol in the BEGIN section.
05:33	Next we have written NF not equal to 6.
05:37	This checks whether the number of fields in the current line is not equal to 6.
05:43	If true, then print section will print the record’s line number NR, along with the entire line denoted by $0. Press Enter.
05:55	In the output, we can see that record number 16 is the incomplete record. It has only 5 fields instead of 6.
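The narration above describes the command without showing it. A minimal sketch that follows the description is given below. The file sample_demo.txt and its contents are hypothetical, with one record deliberately cut short to 5 fields.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical data: the second record is missing its last field.
cat > sample_demo.txt <<'EOF'
S01|Asha Rao|Physics|72|Pass|6000
S02|Ravi Kumar|Maths|45|Fail
S03|Meena Iyer|Biology|88|Pass|9000
EOF

# NF != 6 is the pattern; the action prints the record number and the whole line.
result=$(awk 'BEGIN { FS = "|" } NF != 6 { print NR, $0 }' sample_demo.txt)
printf '%s\n' "$result"
```

In this sample, only record 2 is reported, because it is the only line whose field count differs from 6.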
06:05	Let us see one more example. How can we print the first and last field for each student regardless of how many fields there are?
06:16	Type the command as shown here on the terminal.
06:21	Here we have used hyphen capital F option instead of setting FS variable. Press Enter.
06:30	We get only the first and the last fields for each record in the file.
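The on-screen command is not part of this transcript. A possible form, assuming a hypothetical pipe-separated file sample_demo.txt, uses $NF to refer to the last field. Since NF holds the number of fields in the current record, $NF is always the last one, however many fields the record has.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical data with a different number of fields on each line.
cat > sample_demo.txt <<'EOF'
S01|Asha Rao|Physics|72|Pass|6000
S02|Ravi Kumar|Maths|4000
S03|Meena Iyer|9000
EOF

# -F sets the field separator from the command line instead of BEGIN { FS = "|" }.
result=$(awk -F'|' '{ print $1, $NF }' sample_demo.txt)
printf '%s\n' "$result"
```

The pipe symbol is quoted so that the shell does not treat it as a pipeline operator.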
06:36	Let’s try something else now.
06:39	Suppose, the student records are distributed across two files demo1.txt, demo2.txt.
06:48	We want to print the first 3 lines from each of these two files. We can do this using NR variable.
06:57	Here are the contents of the two files.
07:02	Now, to display the first 3 lines from each file, type the following command on the terminal.
07:11	Press Enter.
07:13	The output shows only the first 3 records of demo1.txt file.
07:20	How can we print the same for the second file also?
07:24	The solution is to use FNR instead of NR. FNR is the current record number in the current file.
07:34	FNR is incremented each time a new record is read.
07:39	It is reset to zero each time a new input file is started, so the first record of every file has FNR equal to 1.
07:46	But NR is the number of input records awk has processed since the start of the program's execution.
07:55	It does not reset to zero with a new file.
07:59	Switch to the terminal. Press the up arrow key to get the previously executed command.
08:06	Modify the previous command as follows. Type FNR instead of NR.
08:14	In the Print section, next to NR, type FNR. Press Enter.
08:21	See, we get the correct output now. FNR is set to zero with new file but NR keeps on increasing.
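To make the difference between NR and FNR concrete, here is a small sketch. The contents of demo1.txt and demo2.txt below are made up for illustration, and the exact commands used in the video may differ.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical contents for the two input files, four records each.
cat > demo1.txt <<'EOF'
S01|Asha Rao
S02|Ravi Kumar
S03|Meena Iyer
S04|John Dsouza
EOF
cat > demo2.txt <<'EOF'
S05|Kiran Patil
S06|Lata Nair
S07|Omar Sheikh
S08|Tara Singh
EOF

# NR keeps counting across files, so NR <= 3 matches only in demo1.txt.
with_nr=$(awk 'NR <= 3 { print NR, $0 }' demo1.txt demo2.txt)
printf '%s\n' "$with_nr"

# FNR restarts for each file, so FNR <= 3 matches the first 3 records of both.
with_fnr=$(awk 'FNR <= 3 { print NR, FNR, $0 }' demo1.txt demo2.txt)
printf '%s\n' "$with_fnr"
```

In the second output, the first record of demo2.txt has NR equal to 5 but FNR equal to 1.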
08:31	Let us now look at some other built-in variables. FILENAME variable gives the name of the file being read.
08:40	ARGC specifies the number of arguments provided at the command line.
08:46	ARGV represents an array that stores the command line arguments.
08:52	ENVIRON is an array of the shell environment variables and their corresponding values.
09:00	As ARGV and ENVIRON use array in awk, we will look at those in subsequent tutorials.
09:09	Let us have a look at the variable FILENAME now. How can we print the name of the current file being processed?
09:18	Switch to the terminal and type the command as shown.
09:23	Here we have used space as a string concatenation operator. Press Enter to execute the command.
09:32	The output shows the input filename multiple times.
09:37	This is because this command prints the filename once for each row in the awkdemo.txt file. How can we print this only once?
09:48	Clear the terminal. Press the up arrow key to get the previously executed command.
09:55	Modify the previous command as shown here. Press Enter.
10:02	Now, we get the filename only once.
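Neither command is captured in this transcript. The sketch below shows one plausible pair, using a hypothetical file sample_demo.txt. Moving the print into the END section is an assumption about how the video does it; other approaches, such as the pattern FNR == 1, would also print the name once.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical three-line input file.
cat > sample_demo.txt <<'EOF'
S01|Asha Rao
S02|Ravi Kumar
S03|Meena Iyer
EOF

# No pattern: the action runs for every record, so the name is printed 3 times.
# Placing a string next to FILENAME with a space between them concatenates them.
per_row=$(awk '{ print "The file name is " FILENAME }' sample_demo.txt)
printf '%s\n' "$per_row"

# END runs once after all input is read; FILENAME still holds the last file name.
once=$(awk 'END { print "The file name is " FILENAME }' sample_demo.txt)
printf '%s\n' "$once"
```

FILENAME is not yet set inside the BEGIN section, because no file has been opened at that point.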
10:06	There are some other built-in variables in awk. Please browse the internet to know more on them.
10:14	Suppose, we want to find the students who have passed and have a stipend of more than Rs.8000.
10:22	We also want to use comma as the output field separator and print “The data is shown for file” and the name of the file in the footer section. How can we do this?
10:36	In the terminal, type the following command. Press Enter.
10:43	We can see that only one student has passed and gets stipend more than Rs.8000. And, the record number is 2.
10:53	We can also see the name of the file in the footer, as desired.
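The command for this step is shown only on screen. A sketch that combines the pieces described above could look like the following. It assumes a hypothetical file sample_demo.txt in which the result (Pass or Fail) is field 5 and the stipend is field 6; the real awkdemo.txt may differ, and the video may test for a pass in another way.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in for awkdemo.txt.
# Assumed layout: roll no|name|stream|marks|result|stipend
cat > sample_demo.txt <<'EOF'
S01|Asha Rao|Physics|72|Pass|6000
S02|Ravi Kumar|Maths|45|Fail|9500
S03|Meena Iyer|Biology|88|Pass|9000
EOF

# BEGIN sets the separators, the middle rule filters and prints,
# and END prints the footer once after the last record.
result=$(awk 'BEGIN { FS = "|"; OFS = "," }
$5 == "Pass" && $6 > 8000 { print NR, $2, $6 }
END { print "The data is shown for file " FILENAME }' sample_demo.txt)
printf '%s\n' "$result"
```

The footer uses string concatenation rather than a comma in the print statement, so that OFS is not inserted between the message and the file name.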
10:58	We can use awk for more and more complex tasks.
11:03	In that case, it becomes more difficult to write the commands every time on the terminal.
11:09	We can instead write the awk program in a separate file.
11:14	By convention, we give that file the dot awk extension, although awk itself does not require it.
11:19	While executing, we can just specify this awk program filename with the awk command.
11:26	For doing so, we need to use hyphen small f option. Let us see an example.
11:35	I have already written an awk program and saved it as prog1 dot awk.
11:42	This code is also available in the Code Files link.
11:46	Switch to the terminal. See what we have written inside the single quotes of the last executed command.
11:55	Content of prog1.awk file is exactly the same.
12:00	The only difference is that in the awk file, we have not written inside the single quotes.
12:07	To execute the file, type the following on the terminal- awk space hyphen small f space prog1.awk space awkdemo.txt and press Enter.
12:24	We are getting exactly the same output as we have seen before.
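The contents of prog1.awk are not reproduced in this transcript. As an illustration of the same idea, the sketch below writes a hypothetical program file, sample_prog.awk, and runs it with the hyphen small f option against a hypothetical data file, sample_demo.txt. The field positions (result in field 5, stipend in field 6) are assumptions.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in for awkdemo.txt.
cat > sample_demo.txt <<'EOF'
S01|Asha Rao|Physics|72|Pass|6000
S02|Ravi Kumar|Maths|45|Fail|9500
S03|Meena Iyer|Biology|88|Pass|9000
EOF

# The program file holds exactly what would otherwise go between
# the single quotes on the command line; no quotes are needed here.
cat > sample_prog.awk <<'EOF'
BEGIN { FS = "|"; OFS = "," }
$5 == "Pass" && $6 > 8000 { print NR, $2, $6 }
END { print "The data is shown for file " FILENAME }
EOF

# -f tells awk to read the program from the named file.
result=$(awk -f sample_prog.awk sample_demo.txt)
printf '%s\n' "$result"
```

The same program file can be reused on any other input file that has the same layout.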
12:29	So, this way you can write awk programs and use them multiple times.
12:35	This brings us to the end of this tutorial. Let us summarize.
12:40	In this tutorial, we learnt about built-in variables and awk scripts using various examples.
12:48	As an assignment, write an awk script to print the last field of the 5th line in awkdemo.txt file.
12:58	Open the system file /etc/passwd on the terminal.
13:05	Identify all the separators therein.
13:09	Now write a script to process the file from the 20th line onwards.
13:15	That too, only for the lines that contain more than 6 fields.
13:20	You should print the line number, entire line and count of fields in that particular line.
13:28	The video at the following link summarises the Spoken Tutorial project. Please download and watch it.
13:36	The Spoken Tutorial Project team conducts workshops using spoken tutorials and gives certificates. For more details, please write to us.
13:47	Please post your timed queries in this Forum.
13:51	Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India. More information on this mission is available at this link.
14:03	The script has been contributed by Antara. And this is Praveen from IIT Bombay, signing off. Thanks for joining.

Contributors and Content Editors

PoojaMoolya, Sandhya.np14
