Linux-AWK/C2/Loops-in-awk/English

Revision as of 06:41, 31 January 2018 by Antarade (Talk | contribs)


Title of script: Built-in variables and awk Script

Author: Antara Roy Choudhury

Keywords: Built-in variables, RS, ORS, NR, NF, FS, OFS, FILENAME


Visual Cue
Narration
Slide 1: Introduction Welcome to the spoken tutorial on awk built-in variables and awk script.
Slide 2: Learning Objectives In this tutorial we will learn about


  • Built-in variables
  • awk script

We will do this through some examples.

Slide 3a: System requirement To record this tutorial, I am using
  • Ubuntu Linux 16.04 OS and
  • gedit text editor version 3.20.1


Slide 3b: Code Files The files used in this tutorial are available in the Code Files link on this tutorial page.


Please download and use them.

Slide 4: Prerequisite To practice this tutorial, you should have gone through the earlier awk tutorials on this website.


If not, then please go through the corresponding tutorials on this website.

Slide 5: awk built-in variables First, let us see some of the built-in variables in awk.


  • Capital RS specifies the record separator in an input file. By default, it is newline.
  • Capital FS specifies the field separator in an input file.

By default, FS is a single space, which awk treats as any run of spaces or tabs.

Slide 5: awk built-in variables * Capital ORS defines the output record separator.

By default, it is newline.

  • Capital OFS defines the output field separator.

By default, it is a single space.


Let us understand the meaning of each of these.

Show awkdemo.txt in Gedit Let us have a look at the awkdemo file now.


When we process this awkdemo file with the awk command, it becomes our input file.

Highlight appropriately Observe that all the records are separated from each other by a newline character.


newline is the default value for record separator RS variable.


So, there is no need to do anything else.

Highlight vertical bar | character Notice that all the fields are separated by the pipe symbol.


How can we inform awk about it?


Let us see.

Slide 6: How to reset value of FS variable? By default, any number of spaces or tabs separates the fields.


We can reset this with the help of hyphen capital F option as learnt in our earlier tutorials.


Or else, we can reset this in the BEGIN section with the use of FS variable.

Let us do this through an example.


Suppose I want to find out the names of students who are getting a stipend of more than Rs.5000.

Open the terminal Open the terminal by pressing CTRL, ALT and T keys.
cd /<saved folder> Go to the folder in which you downloaded and extracted the Code Files using cd command.
awk 'BEGIN{FS="|"} $6>5000 {print $2,$6}' awkdemo.txt Type the command as shown here.
Highlight {FS="|"} area Here in the BEGIN section, we have assigned the value of FS as a pipe symbol.


Similarly, we can modify RS variable.

[Enter] Press Enter to execute the command.
Show the output The output shows the list of students who are receiving more than Rs.5000 as a stipend.
Highlight appropriately Here the name field and the stipend field are separated by a blank space.


Also, all the records are separated by a newline character.
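To try the FS assignment without the Code Files, here is a minimal sketch on made-up records. The three records and their field layout are assumptions for illustration only. They are not the contents of awkdemo.txt.

```shell
# Hypothetical records (assumed layout): roll|name|stream|marks|result|stipend
out=$(printf '%s\n' \
  'S01|Asha|Physics|72|Pass|6000' \
  'S02|Ravi|Maths|65|Pass|4500' \
  'S03|Meena|Biology|81|Pass|8200' |
  awk 'BEGIN { FS = "|" } $6 > 5000 { print $2, $6 }')

# Name and stipend are joined by the default OFS, a single space.
printf '%s\n' "$out"
```

Only the records whose sixth field is greater than 5000 are printed, one per line.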

Slide


Suppose we want colon as the output field separator.


And double newline as output record separator.


How can we do this? Let us see.

In terminal In the terminal, press the up arrow key to get the previously executed command.
Modify the previous command


awk 'BEGIN{FS="|";OFS=":"; ORS="\n\n"} $6>5000 {print $2, $6}' awkdemo.txt

[Enter]

Modify the command as shown here.


And then press Enter.

Show the output We get the output in the desired format.
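The effect of OFS and ORS can also be sketched on made-up records. The data below is hypothetical and only the separators matter here.

```shell
# Hypothetical records (assumed layout): roll|name|stream|marks|result|stipend
out=$(printf '%s\n' \
  'S01|Asha|Physics|72|Pass|6000' \
  'S02|Ravi|Maths|65|Pass|4500' \
  'S03|Meena|Biology|81|Pass|8200' |
  awk 'BEGIN { FS = "|"; OFS = ":"; ORS = "\n\n" } $6 > 5000 { print $2, $6 }')

# The comma in print inserts OFS; every print ends with ORS.
# (Command substitution trims the blank line after the last record.)
printf '%s\n' "$out"
```

Note that OFS is used only where the print items are separated by a comma. Items written side by side, without a comma, are concatenated directly.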
Show sample.txt


Point or highlight

Now, suppose our new input file is sample.txt.


Observe that the field separator here is newline and record separator is double newline.

How can we extract the roll no. and name information from this file?


Yes, you have guessed correctly.


We have to modify both FS and RS variables.


Pause this tutorial and do this as an assignment.

Slide 7: Next, let us see other built-in variables.


  • Capital NR gives us the Number of Records processed by awk
  • Capital NF gives the Number of Fields in the current record


Slide Let us see one example of this.


Suppose, we want to find incomplete lines in the file.


Here, an incomplete line means one that has fewer than the normal 6 fields.

Switch to the terminal and clear it Switch to the terminal.


Let me clear the terminal using Ctrl and L keys.

Type

awk 'BEGIN{FS="|"} NF !=6 {print NR, $0}' awkdemo.txt

[Enter]

Type the command as shown.



Show the command and highlight appropriate areas as per narration As the fields are separated by pipe symbol, set the FS value to pipe symbol in the BEGIN section.


Next we have written NF!=6.


This checks whether the number of fields in the current line is not equal to 6.


If true, then

  • print section will print the record’s line number NR,
  • along with the entire line denoted by $0.

Press Enter.

Show the output In the output, we can see that record number 16 is the incomplete record.


It has only 5 fields instead of 6.
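The same check can be sketched on a small made-up input in which the second record has lost its last field. The records are assumptions for illustration.

```shell
# Hypothetical records; the second one has only 5 fields.
out=$(printf '%s\n' \
  'S01|Asha|Physics|72|Pass|6000' \
  'S02|Ravi|Maths|65|Pass' \
  'S03|Meena|Biology|81|Pass|8200' |
  awk 'BEGIN { FS = "|" } NF != 6 { print NR, $0 }')

# Prints the record number followed by the whole incomplete record.
printf '%s\n' "$out"
```

Here NR tells us where the incomplete record sits in the input, and $0 shows the record itself.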

Retain the same screen Let us see one more example.


How can we print the first and last field for each student regardless of how many fields there are?

Type:


awk -F"|" '{print $1,$NF}' awkdemo.txt

Type the command as shown here on the terminal.



Highlight -F Here we have used hyphen capital F option instead of setting FS variable.


Press Enter.

Show the Output We get only the first and the last fields for each record in the file.
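As a small sketch of why $NF is useful, consider made-up records of unequal length. The data is hypothetical. NF holds the field count of the current record, so $NF is always its last field.

```shell
# Hypothetical records; the second one is shorter than the others.
out=$(printf '%s\n' \
  'S01|Asha|Physics|72|Pass|6000' \
  'S02|Ravi|Maths|65|Pass' \
  'S03|Meena|Biology|81|Pass|8200' |
  awk -F"|" '{ print $1, $NF }')

# First field and last field of each record, whatever its length.
printf '%s\n' "$out"
```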
Slide 8: Let’s try something else now.


Suppose, the student records are distributed across two files demo1.txt and demo2.txt.


We want to print the first 3 lines from each of these two files.


We can do this using NR variable.

Show demo1.txt & demo2.txt in gedit Here are the contents of the two files.
Type:

awk 'NR<=3 {print NR, $0}' demo1.txt demo2.txt


[Enter]

Now to display the first 3 lines from each file, type the following command on the terminal.


Press Enter.

Show the output The output shows only the first 3 records of demo1.txt file.


How can we print the same for the second file also?

Slide 8: The solution is to use FNR instead of NR.


FNR is the current record number in the current file.


FNR is incremented each time a new record is read.


It is reinitialized to zero each time a new input file is started.

Slide 9: But NR is the number of input records awk has processed since the start of the program's execution.


It does not reset to zero with a new file.

In Terminal


Press up key

Switch to terminal.


Press the up arrow key to get the previous command.

awk 'FNR<=3 {print NR,FNR, $0}' demo1.txt demo2.txt

[Enter]

Modify the previous command as follows.


Type FNR instead of NR.


In the Print section, next to NR, type FNR and press Enter.

Show the output and highlight appropriately See, we get the correct output now.


FNR restarts with each new file, but NR keeps on increasing.
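The difference between NR and FNR can be seen in a minimal sketch with two throwaway files. The file names one.txt and two.txt and their contents are assumptions made up for this illustration.

```shell
# Create two hypothetical four-line files in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'a1' 'a2' 'a3' 'a4' > "$dir/one.txt"
printf '%s\n' 'b1' 'b2' 'b3' 'b4' > "$dir/two.txt"

# FNR counts records within the current file; NR counts across all files.
out=$(awk 'FNR <= 3 { print NR, FNR, $0 }' "$dir/one.txt" "$dir/two.txt")
printf '%s\n' "$out"

rm -r "$dir"
```

For the second file, FNR starts again at 1, while NR continues from 5 because the fourth line of the first file was also read and counted.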

Slide 10: Let us now look at some other built-in variables.


FILENAME variable gives the name of the file being read.


ARGC specifies the number of arguments provided at the command line.

Slide 11 ARGV represents an array that stores the command line arguments.


ENVIRON specifies the array of the shell environment variables and corresponding values.


As ARGV and ENVIRON use array in awk, we will look at those in subsequent tutorials.

Let us have a look at the variable FILENAME now.


How can we print the name of the current file being processed?

Type:

awk '{print "We are processing input file " FILENAME}' awkdemo.txt

Switch to the terminal and type the command as shown.
Highlight the space here

awk '{print "We are processing input file " FILENAME}' awkdemo.txt

Here we have used space as a string concatenation operator.


Press Enter to execute the command.

Show the output. The output shows the input filename multiple times.


This is because the command prints the filename once for each row in the awkdemo.txt file.


How can we print this only once?

Press Up arrow key Clear the terminal


Press the up arrow key to get the previously executed command.

Modify the command to become:

awk 'END{print "We are processing input file " FILENAME}' awkdemo.txt

[Enter]

Modify the previous command as shown here.


Press Enter.

Show the output We get the filename only once.
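Here is a minimal sketch of the same idea on a throwaway file. The file name demo.txt and its three lines are assumptions for illustration.

```shell
# Create a hypothetical three-line input file in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'r1' 'r2' 'r3' > "$dir/demo.txt"

# The END block runs once, after the last record has been read,
# so the message appears a single time.
out=$(cd "$dir" && awk 'END { print "We are processing input file " FILENAME }' demo.txt)
printf '%s\n' "$out"

rm -r "$dir"
```

FILENAME holds the name exactly as it was given on the command line. In a BEGIN block no file has been opened yet, so FILENAME is typically empty there.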
Retain same screen There are some other built-in variables in awk.


Please browse the internet to know more on them.

Slide 12


Suppose, we want to
  • find the students who have passed and have stipend more than Rs.8000
  • use comma as the output field separator
  • and print “The data is shown for file” and the name of file in the footer section.

How can we do this?

awk 'BEGIN{FS="|"; OFS=","} $5=="Pass" && $6>8000 {print NR, $2, $5, $6} END{print "The data is shown for file " FILENAME }' awkdemo.txt


[Enter]

In the terminal, type the following command.


Press Enter.

Show the output and highlight appropriately We can see that only one student has passed and gets stipend more than Rs.8000.


And the record number is 2.


We can also see the name of the file in the footer, as desired.

Slide 13 We can use awk for more and more complex tasks.


In that case, it becomes more difficult to write the commands every time on the terminal.


We can instead write the awk program in a separate file.


By convention, that file is given the dot awk extension. awk itself does not require it.

Slide 14 While executing, we can just specify this awk program filename with the awk command.


For doing so, we need to use hyphen small f option.


Let us see an example.

Show prog1.awk in gedit I have already written an awk program and saved it as prog1 dot awk.


This code is also available in the Code Files link.

In the terminal show the last executed command


Highlight the portion in the last executed command and also in this program

Switch to the terminal.


See what we have written inside the single quotes of the last executed command.


Content of prog1.awk file is exactly the same.


The only difference is that in the awk file, we have not written inside the single quotes.

Type:

awk -f prog1.awk awkdemo.txt [enter]

To execute the file, type the following on the terminal-


awk space hyphen small f space prog1.awk space awkdemo.txt and press Enter

Show the output We are getting exactly the same output as we have seen before.


So, this way you can write awk programs and use them multiple times.
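The whole workflow can be sketched end to end as follows. The program file passed.awk, the data file students.txt and the three records are hypothetical names and values chosen for this illustration. They are not the prog1.awk and awkdemo.txt from the Code Files.

```shell
dir=$(mktemp -d)

# The awk program goes into its own file, without surrounding single quotes.
cat > "$dir/passed.awk" <<'EOF'
BEGIN { FS = "|"; OFS = "," }
$5 == "Pass" && $6 > 8000 { print NR, $2, $5, $6 }
END { print "The data is shown for file " FILENAME }
EOF

# Hypothetical records (assumed layout): roll|name|stream|marks|result|stipend
printf '%s\n' \
  'S01|Asha|Physics|72|Pass|6000' \
  'S02|Ravi|Maths|35|Fail|9000' \
  'S03|Meena|Biology|81|Pass|8200' > "$dir/students.txt"

# -f reads the program from the file; the data file follows as usual.
out=$(cd "$dir" && awk -f passed.awk students.txt)
printf '%s\n' "$out"

rm -r "$dir"
```

Only the third record satisfies both conditions, so it is printed with commas between the fields, followed by the footer line.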

This brings us to the end of this tutorial.


Let us summarize.

Slide 15

Summary

In this tutorial we learnt about-
  • Built-in variables
  • awk script

using various examples.

Slide 16

Assignment 1


As an assignment-


1. Write an awk script to print the last field of the 5th line in awkdemo.txt file.

Slide 17

Assignment 2

1. Open the system file /etc/passwd on the terminal.


2. Identify all the separators therein.


3. Now write a script to process the file from the 20th line onwards.


4. That too, only for the lines that contain more than 6 fields.


5. You should print the line number, entire line and count of fields in that particular line.

Slide 18

About Spoken Tutorial project

The video at the following link summarises the Spoken Tutorial project.


Please download and watch it.

Slide 19

Spoken Tutorial workshops

The Spoken Tutorial Project team conducts workshops using spoken tutorials and gives certificates.


For more details, please write to us.

Slide 20

Forum for specific questions:

Please post your timed queries in this Forum.
Slide 21

Acknowledgement

Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.


More information on this mission is available at

this link.

The script has been contributed by Antara.


And this is Praveen from IIT Bombay signing off.


Thanks for joining.

Contributors and Content Editors

Antarade, Nancyvarkey