Difference between revisions of "Linux-AWK/C2/Built-in-Variables-in-awk/English"

Revision as of 15:16, 31 January 2018

Title of script: Built-in variables and awk Script

Author: Antara Roy Choudhury

Keywords: Built-in variables, RS, ORS, NR, NF, FS, OFS, FILENAME

Visual Cue	Narration
Slide 1: Introduction	Welcome to the spoken tutorial on awk built-in variables and awk script.
Slide 2: Learning Objectives	In this tutorial we will learn about built-in variables and awk scripts. We will do this through some examples.
Slide 3a: System requirement	To record this tutorial, I am using Ubuntu Linux 16.04 OS and gedit text editor 3.20.1.
Slide 3b: Code Files	The files used in this tutorial are available in the Code Files link on this tutorial page. Please download and use them.
Slide 4: Prerequisite	To practice this tutorial, you should have gone through the earlier awk tutorials on this website. If not, then please go through the corresponding tutorials on this website.
Slide 5: awk built-in variables	First, let us see some of the built-in variables in awk. Capital RS specifies the record separator in an input file. By default, it is newline. Capital FS specifies the field separator in an input file. By default, the value of FS is a single space, which makes awk split fields on any run of spaces or tabs.
Slide 5: awk built-in variables	Capital ORS defines the output record separator. By default, it is newline. Capital OFS defines the output field separator. By default, it is a single space. Let us understand the meaning of each of these.
Show awkdemo.txt in Gedit	Let us have a look at the awkdemo file now. When we are processing this awkdemo file with awk command, this becomes our input file.
Highlight appropriately	Observe that all the records are separated from each other by a newline character. Newline is the default value of the record separator variable RS. So, there is no need to do anything else.
Highlight vertical bar | character	Notice that all the fields are separated by the pipe symbol. How can we inform awk about it? Let us see.
Slide 6: How to reset value of FS variable?	By default, any number of spaces or tabs separate the fields. We can reset this with the help of the hyphen capital F option, as learnt in our earlier tutorials. Or else, we can reset this in the BEGIN section with the use of the FS variable.
	Let us do this through an example. Suppose I want to find the names of students who are getting a stipend of more than Rs.5000.
Open the terminal	Open the terminal by pressing CTRL, ALT and T keys.
cd /<saved folder>	Go to the folder in which you downloaded and extracted the Code Files using cd command.
awk 'BEGIN{FS="|"} $6>5000 {print $2,$6}' awkdemo.txt	Type the command as shown here.
Highlight {FS="|"} area	Here in the BEGIN section, we have assigned the value of FS as a pipe symbol. Similarly, we can modify the RS variable.
[Enter]	Press Enter to execute the command.
Show the output	The output shows the list of students who are receiving more than Rs.5000 as a stipend.
Highlight appropriately	Here the name field and the stipend field are separated by a blank space. Also, all the records are separated by a newline character.
Slide	Suppose we want colon as the output field separator. And double newline as output record separator. How can we do this? Let us see.
In terminal	In the terminal, press the up arrow key to get the previously executed command.
Modify the previous command awk 'BEGIN{FS="|";OFS=":"; ORS="\n\n"} $6>5000 {print $2, $6}' awkdemo.txt [Enter]	Modify the command as shown here. And then press Enter.
Show the output	We get the output in the desired format.
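The effect of FS, OFS and ORS together can be tried on a small file. This is only a sketch: the file name students.txt and its three records are made-up sample data, not the contents of the tutorial's awkdemo.txt.

```shell
# Hypothetical sample data; the real awkdemo.txt has different records.
tmp=$(mktemp -d)
printf '%s\n' \
  'S01|Asha|Maths|80|Pass|6000' \
  'S02|Ravi|Physics|45|Fail|4000' \
  'S03|Meena|Biology|72|Pass|9000' > "$tmp/students.txt"

# FS splits each input record on "|".
# OFS is placed between the comma-separated items of print.
# ORS is written after every printed record (here: a blank line follows each one).
out=$(awk 'BEGIN{FS="|"; OFS=":"; ORS="\n\n"} $6>5000 {print $2, $6}' "$tmp/students.txt")
printf '%s\n' "$out"

rm -r "$tmp"
```

Note that OFS is used only where print has a comma between its items. Items written next to each other without a comma are simply concatenated.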
Show sample.txt Point or highlight	Now, suppose our new input file is sample.txt. Observe that the field separator here is newline and record separator is double newline.
	How can we extract the roll no. and name information from this file? Yes, you have guessed correctly. We have to modify both FS and RS variables. Pause this tutorial and do this as an assignment.
Slide 7:	Next, let us see other built-in variables. Capital NR gives us the Number of Records processed by awk. Capital NF gives the Number of Fields in the current record.
Slide	Let us see one example of this. Suppose, we want to find incomplete lines in the file. Here, an incomplete line means one that has fewer than the normal 6 fields.
Switch to the terminal and clear it	Switch to the terminal. Let me clear the terminal using Ctrl and L keys.
Type awk 'BEGIN{FS="|"} NF !=6 {print NR, $0}' awkdemo.txt [Enter]	Type the command as shown.
Show the command and highlight appropriate areas as per narration	As the fields are separated by pipe symbol, set the FS value to pipe symbol in the BEGIN section. Next we have written NF!=6. This checks whether the number of fields in the current line is not equal to 6. If true, then the print section will print the record’s line number NR, along with the entire line denoted by $0. Press Enter.
Show the output	In the output, we can see that record number 16 is the incomplete record. It has only 5 fields instead of 6.
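The same check can be sketched on a few made-up records fed through a pipe. The records below are assumptions for illustration only; the second one deliberately has 5 fields.

```shell
# Hypothetical records; the second one is missing its last field.
out=$(printf '%s\n' \
  'S01|Asha|Maths|80|Pass|6000' \
  'S02|Ravi|Physics|45|Fail' \
  'S03|Meena|Biology|72|Pass|9000' |
  awk 'BEGIN{FS="|"} NF != 6 {print NR, $0}')

# Prints the record number followed by the incomplete record itself.
printf '%s\n' "$out"
```

Here NR is 2 for the incomplete record, because it is the second record read.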
Retain the same screen	Let us see one more example. How can we print the first and last field for each student regardless of how many fields there are?
Type: awk -F"|" '{print $1,$NF}' awkdemo.txt	Type the command as shown here on the terminal.
Highlight -F	Here we have used hyphen capital F option instead of setting FS variable. Press Enter.
Show the Output	We get only the first and the last fields for each record in the file.
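Since NF holds the number of fields, $NF always refers to the last field, however long the record is. A minimal sketch, again with made-up records of different lengths:

```shell
# Hypothetical records: the first has 6 fields, the second only 5.
out=$(printf '%s\n' \
  'S01|Asha|Maths|80|Pass|6000' \
  'S02|Ravi|Physics|45|Fail' |
  awk -F'|' '{print $1, $NF}')

# $NF is field 6 for the first record and field 5 for the second.
printf '%s\n' "$out"
```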
Slide 8:	Let’s try something else now. Suppose, the student records are distributed across two files demo1.txt and demo2.txt. We want to print the first 3 lines from each of these two files. We can do this using NR variable.
Show demo1.txt & demo2.txt in gedit	Here are the contents of the two files.
Type: awk 'NR<=3 {print NR, $0}' demo1.txt demo2.txt [Enter]	Now to display the first 3 lines from each file, type the following command on the terminal. Press Enter.
Show the output	The output shows only the first 3 records of demo1.txt file. How can we print the same for the second file also?
Slide 8:	The solution is to use FNR instead of NR. FNR is the current record number in the current file. FNR is incremented each time a new record is read. It is reinitialized to zero each time a new input file is started.
Slide 9:	But NR is the number of input records awk has processed since the starting of the program's execution. It does not reset to zero with a new file.
In Terminal Press up key	Switch to terminal. Press the up arrow key to get the previous command.
awk 'FNR<=3 {print NR,FNR, $0}' demo1.txt demo2.txt [Enter]	Modify the previous command as follows. Type FNR instead of NR. In the print section, next to NR, type FNR and press Enter.
Show the output and highlight appropriately	See, we get the correct output now. FNR starts again with each new file, but NR keeps on increasing.
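The difference between NR and FNR is easiest to see when both are printed side by side. The sketch below assumes two small made-up files, first.txt and second.txt, with three lines each, and keeps only the first 2 lines of each file.

```shell
# Two hypothetical input files with three lines each.
tmp=$(mktemp -d)
printf '%s\n' 'a1' 'a2' 'a3' > "$tmp/first.txt"
printf '%s\n' 'b1' 'b2' 'b3' > "$tmp/second.txt"

# FNR counts records within the current file; NR counts across all files.
out=$(awk 'FNR<=2 {print NR, FNR, $0}' "$tmp/first.txt" "$tmp/second.txt")
printf '%s\n' "$out"

rm -r "$tmp"
```

In the second file, FNR starts again at 1 while NR continues from 4, because three records of the first file have already been read.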
Slide 10:	Let us now look at some other built-in variables. FILENAME variable gives the name of the file being read. ARGC specifies the number of arguments provided at the command line.
Slide 11	ARGV represents an array that stores the command line arguments. ENVIRON specifies the array of the shell environment variables and corresponding values. As ARGV and ENVIRON use array in awk, we will look at those in subsequent tutorials.
	Let us have a look at the variable FILENAME now. How can we print the name of the current file being processed?
Type: awk '{print "We are processing input file " FILENAME}' awkdemo.txt	Switch to the terminal and type the command as shown.
Highlight the space here awk '{print "We are processing input file " FILENAME}' awkdemo.txt	Here we have used space as a string concatenation operator. Press Enter to execute the command.
Show the output.	The output shows the input filename multiple times. This is because this command prints the filename once for each row in the awkdemo.txt file. How can we print this only once?
Press Up arrow key	Clear the terminal. Press the up arrow key to get the previously executed command.
Modify the command to become: awk 'END{print "We are processing input file " FILENAME}' awkdemo.txt [Enter]	Modify the previous command as shown here. Press Enter.
Show the output	We get the filename only once.
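With a single input file, the END section prints the name once. With several files, END runs only after all of them, so FILENAME there holds the last file read. The sketch below uses two made-up files to show this. The FNR==1 variant is an alternative that is not part of this tutorial's commands; it prints each file's name once.

```shell
# Two hypothetical input files.
tmp=$(mktemp -d)
printf '%s\n' 'a1' 'a2' > "$tmp/first.txt"
printf '%s\n' 'b1' 'b2' > "$tmp/second.txt"

# END runs once, after all input; FILENAME is then the last file processed.
once=$(cd "$tmp" && awk 'END{print "We are processing input file " FILENAME}' first.txt second.txt)

# Alternative: FNR==1 is true only for the first record of each file.
each=$(cd "$tmp" && awk 'FNR==1 {print "We are processing input file " FILENAME}' first.txt second.txt)

printf '%s\n' "$once" "$each"
rm -r "$tmp"
```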
Retain same screen	There are some other built-in variables in awk. Please browse the internet to know more on them.
Slide 12	Suppose we want to find the students who have passed and have a stipend of more than Rs.8000. We also want to use comma as the output field separator, and print “The data is shown for file” and the name of the file in the footer section. How can we do this?
awk 'BEGIN{FS="|"; OFS=","} $5=="Pass" && $6>8000 {print NR, $2, $5, $6} END{print "The data is shown for file " FILENAME }' awkdemo.txt [Enter]	In the terminal, type the following command. Press Enter.
Show the output and highlight appropriately	We can see that only one student has passed and gets stipend more than Rs.8000. And the record number is 2. We can also see the name of the file in the footer, as desired.
Slide 13	We can use awk for more and more complex tasks. In that case, it becomes more difficult to write the commands every time on the terminal. We can instead write the awk program in a separate file. By convention, we give that file the dot awk extension, although awk itself does not require it.
Slide 14	While executing, we can just specify this awk program filename with the awk command. For doing so, we need to use hyphen small f option. Let us see an example.
Show prog1.awk in gedit	I have already written an awk program and saved it as prog1 dot awk. This code is also available in the Code Files link.
In the terminal show the last executed command Highlight the portion in the last executed command and also in this program	Switch to the terminal. Look at what we have written inside the single quotes of the last executed command. The content of the prog1.awk file is exactly the same. The only difference is that in the awk file, we have not enclosed the program in single quotes.
Type: awk -f prog1.awk awkdemo.txt [enter]	To execute the file, type the following on the terminal: awk space hyphen small f space prog1.awk space awkdemo.txt and press Enter.
Show the output	We are getting exactly the same output as we have seen before. So, this way you can write awk programs and use them multiple times.
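The whole round trip can be sketched as follows. The program text mirrors the one-line command used earlier, but the data file students.txt and its records are assumptions for illustration; the actual prog1.awk is available in the Code Files link.

```shell
tmp=$(mktemp -d)

# Hypothetical input data, not the tutorial's awkdemo.txt.
printf '%s\n' \
  'S01|Asha|Maths|80|Pass|6000' \
  'S02|Ravi|Physics|45|Fail|4000' \
  'S03|Meena|Biology|72|Pass|9000' > "$tmp/students.txt"

# The awk program goes into the file as it is, without surrounding single quotes.
# The quoted 'EOF' stops the shell from expanding $5, $6 and so on.
cat > "$tmp/prog1.awk" <<'EOF'
BEGIN { FS = "|"; OFS = "," }
$5 == "Pass" && $6 > 8000 { print NR, $2, $5, $6 }
END { print "The data is shown for file " FILENAME }
EOF

# -f tells awk to read the program from the named file.
out=$(cd "$tmp" && awk -f prog1.awk students.txt)
printf '%s\n' "$out"

rm -r "$tmp"
```

Keeping the program in a file also makes it easier to spread one pattern and action per line, as done here.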
	This brings us to the end of this tutorial. Let us summarize.
Slide 15 Summary	In this tutorial we learnt about built-in variables and awk scripts, using various examples.
Slide 16 Assignment 1	As an assignment- 1. Write an awk script to print the last field of the 5th line in awkdemo.txt file.
Slide 17 Assignment 2	1. Open the system file /etc/passwd on the terminal. 2. Identify all the separators therein. 3. Now write a script to process the file from the 20th line onwards. 4. That too, only for the lines that contain more than 6 fields. 5. You should print the line number, entire line and count of fields in that particular line.
Slide 18 About Spoken Tutorial project	The video at the following link summarises the Spoken Tutorial project. Please download and watch it.
Slide 19 Spoken Tutorial workshops	The Spoken Tutorial Project team conducts workshops using spoken tutorials and gives certificates. For more details, please write to us.
Slide 20 Forum for specific questions:	Please post your timed queries in this Forum.
Slide 21 Acknowledgement	Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India. More information on this mission is available at this link.
	The script has been contributed by Antara. And this is Praveen from IIT Bombay signing off. Thanks for joining.

Contributors and Content Editors

Antarade, Nancyvarkey

@@ Line 18: / Line 18: @@
 | style="background-color:#ffffff;border-top:0.035cm solid #000001;border-bottom:0.035cm solid #000001;border-left:0.035cm solid #000001;border-right:none;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"| Slide 2: Learning Objectives
 | style="background-color:#ffffff;border:0.035cm solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"| In this tutorial we will learn about
 * '''Built-in variables '''
@@ Line 29: / Line 28: @@
 | style="background-color:#ffffff;border:0.035cm solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"| To record this tutorial, I am using
-* '''Ubuntu Linux 16.04 OS '''and''' '''
+* '''Ubuntu Linux 16.04 OS '''and
-* '''gedit text editor version 3.20.1'''
+* '''gedit text editor''' 3.20.1
 |-
@@ Line 60: / Line 57: @@
 |-
 | style="background-color:#ffffff;border-top:0.035cm solid #000001;border-bottom:0.035cm solid #000001;border-left:0.035cm solid #000001;border-right:none;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"| Slide 5: awk built-in variables
-| style="background-color:#ffffff;border:0.035cm solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"| * Capital '''ORS''' defines the '''output record separator'''.
+| style="background-color:#ffffff;border:0.035cm solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"|
+* Capital '''ORS''' defines the '''output record separator'''.
 By default, it is '''newline'''.
@@ Line 350: / Line 348: @@
-It does to reset to zero with a new file.
+It does not reset to zero with a new file.
 |-
