Difference between revisions of "Linux-AWK/C2/Basics-of-awk/English"

(Created page with ''''Title of script''': The Awk tool Part 1 '''Author: Sachin Patil''' '''Keywords: Selection criteria, action, formatted printing, fields, Regular expressions, Variables''' …')
 
m (Nancyvarkey moved page Linux/C3/Basics-of-awk/English to Linux-AWK/C2/Basics-of-awk/English without leaving a redirect: New series)
 
(5 intermediate revisions by 2 users not shown)
Latest revision as of 16:32, 22 March 2018

Title of script: The Awk Command

Author: Sachin Patil and Anirban

Keywords: Selection criteria, action, formatted printing, fields, Regular expressions, Variables


Visual Cue
Narration
Display Slide 1 Welcome to the spoken tutorial on the awk command.
Display Slide 2

Learning Objective

In this tutorial we will learn about the awk command.

We will do this through some examples.

Display Slide 3

System requirement

To record this tutorial, I am using
  • Ubuntu Linux 12.04 OS
  • GNU BASH v. 4.2.24

Please note, GNU Bash version 4 or above is recommended to practice this tutorial.

Display Slide 4


Introduction

Let us start with an introduction to awk.

The awk command is a very powerful text manipulation tool.

It is named after its authors, Aho, Weinberger and Kernighan.

Continue Slide


It can perform several functions.

It operates at the field level of a record.

So, it can easily access and edit the individual fields of the record.

Let us see some examples.

For demonstration purposes, we use the awkdemo.txt file.

Let us see the contents of the awkdemo.txt file.


Now open the terminal by pressing

CTRL + ALT and T keys simultaneously on your keyboard.


Type:

Now open the terminal by pressing the CTRL + ALT and T keys simultaneously on your keyboard.
Type:


"awk '/Pass/ {print}' awkdemo.txt" [enter]

Let us see how to print using the awk command.

Type:

awk space (within single quotes) (front slash) '/Pass (front slash)/ (opening curly bracket) {print (closing curly bracket)}' (after the quotes) space awkdemo.txt


Press Enter

Here Pass is the selection criterion.


All the lines of awkdemo.txt in which Pass occurs are printed.


The action here is print.
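
The selection criterion and action described above can be tried on a small stand-in file. This is only a sketch: the three records below are made up for illustration, and the real awkdemo.txt may have different columns and values.

```shell
# Hypothetical sample data; the real awkdemo.txt may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
101|Mira Nair|civil|72|Pass
102|Arun Rao|electrical|35|Fail
103|Meera Shah|computers|81|Pass
EOF

# /Pass/ is the selection criterion; {print} is the action run on each selected line.
passed=$(awk '/Pass/ {print}' "$sample")
printf '%s\n' "$passed"

# When the action is omitted, awk prints the selected lines by default.
passed_default=$(awk '/Pass/' "$sample")

rm -f "$sample"
```

Both commands select the same lines, because printing the whole line is awk's default action.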

Type

"awk '/M[ei]*ra*/ {print}' awkdemo.txt" [enter]

We can also use regular expressions in awk.


Say we want to print the records of students named Mira.


We would type:


awk space '/M (opening square bracket) [ei (closing square bracket)]*ra*/ {print}' space awkdemo.txt


Press Enter.


"*" matches zero or more occurrences of the previous character or bracketed set.


Thus entries with repeated occurrences of i, e and a will also be listed.


For example:

  • Meera
  • Mira
  • Meeraa
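
As a quick check of how this pattern behaves, the sketch below runs it on a few made-up names (they are not taken from awkdemo.txt). Because * also allows zero occurrences, a short entry such as Mr is matched as well.

```shell
# Hypothetical list of names, one per line.
names=$(mktemp)
printf '%s\n' 'Mira' 'Meera' 'Meeraa' 'Mr' 'Mona' > "$names"

# M, then any number of e or i (including none), then r, then any number of a.
matched=$(awk '/M[ei]*ra*/ {print}' "$names")
printf '%s\n' "$matched"

rm -f "$names"
```

Mona is not selected, since no r follows the M and the optional e or i characters.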


awk supports extended regular expressions (ERE).

This means we can match multiple patterns separated by a PIPE.

Type

"awk '/civil|electrical/ {print}' awkdemo.txt" [enter]

Now type:


awk space (in single quotes) (front slash) '/civil (vertical bar)|electrical (front slash)/ {print}' space awkdemo.txt


Press Enter.


Now entries for both civil and electrical are shown.
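
The alternation can be tried on a stand-in file as follows. The records are invented for this sketch and only assume that the stream name appears somewhere on each line.

```shell
# Hypothetical sample data; the real awkdemo.txt may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
101|Mira Nair|civil|72|Pass
102|Arun Rao|electrical|35|Fail
103|Meera Shah|computers|81|Pass
EOF

# The PIPE inside the pattern means "civil OR electrical".
streams=$(awk '/civil|electrical/ {print}' "$sample")
printf '%s\n' "$streams"

rm -f "$sample"
```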

Display slide 7


Let us go back to our slides.


awk has some special parameters to identify individual fields of a line.


$1 (Dollar 1) would indicate the first field.


Similarly, we can have $2, $3 and so on for the respective fields.


$0 represents the entire line.
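
A minimal sketch of these field parameters, using a made-up line whose fields are separated by spaces (awk's default when no delimiter is specified):

```shell
# Hypothetical input line with three whitespace-separated fields.
line='Mira civil 72'

first=$(printf '%s\n' "$line" | awk '{print $1}')   # first field
third=$(printf '%s\n' "$line" | awk '{print $3}')   # third field
whole=$(printf '%s\n' "$line" | awk '{print $0}')   # entire line

printf '%s\n' "$first" "$third" "$whole"
```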

Switch to the terminal


Type: cat awkdemo.txt

Note that each field is separated by a PIPE in the file awkdemo.txt.


In this case, PIPE is called a delimiter.


A delimiter separates fields from each other.


A delimiter can also be a single whitespace.


To specify a delimiter, we have to give the - capital F flag followed by the delimiter.


Type

"awk -F "|" '/civil|electrical/ {print $0}' awkdemo.txt" [enter]


Let us go back to the terminal.


So we can write the last command as:


awk space minus capital F space, within double quotes PIPE, space, within single quotes front slash civil PIPE electrical front slash, within curly braces print space dollar 0, now outside the quotes space awkdemo.txt

Press Enter

This prints the entire line since we have used $0.
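
The effect of the -F flag can be seen by comparing $0 with $1 on a stand-in file. The data below is invented for this sketch; only the PIPE delimiter is taken from the tutorial.

```shell
# Hypothetical sample data; the real awkdemo.txt may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
101|Mira Nair|civil|72|Pass
102|Arun Rao|electrical|35|Fail
103|Meera Shah|computers|81|Pass
EOF

# $0 is the whole line, so the PIPE characters are still present in the output.
whole=$(awk -F"|" '/civil|electrical/ {print $0}' "$sample")

# With PIPE as the delimiter, $1 is the text before the first PIPE.
first=$(awk -F"|" '/civil|electrical/ {print $1}' "$sample")

printf '%s\n' "$whole" "$first"
rm -f "$sample"
```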

Type

"awk -F"|" '/Pass/ {print $2, $3}' awkdemo.txt" [enter]

Notice that the names and streams of students are the second and third fields.


Say we only want to print two fields.


We will replace $0 with $2, $3 in the above command.


Press Enter

Only two fields are shown.
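
A sketch of the same command on made-up records, assuming as in the tutorial that the name is the second field and the stream is the third:

```shell
# Hypothetical sample data; the real awkdemo.txt may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
101|Mira Nair|civil|72|Pass
102|Arun Rao|electrical|35|Fail
103|Meera Shah|computers|81|Pass
EOF

# The comma between $2 and $3 makes print separate them with a single space.
pairs=$(awk -F"|" '/Pass/ {print $2, $3}' "$sample")
printf '%s\n' "$pairs"

rm -f "$sample"
```

Since each name has a different length, the stream column does not line up, which is the jagged display mentioned in the narration.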

Though it gives the right result, the display is all jagged and unformatted.


"awk -F"|" '/Pass/ {printf "%4d %-25s %-15s \n", NR,$2,$3 }' awkdemo.txt" [enter]


We can provide formatted output by using the C-style printf statement.


We can also provide a serial number by using a built-in variable NR.


We will see more about built-in variables later.


Now type:

awk space -F"|" space '/Pass/{printf "%4d %-25s %-15s \n", NR,$2,$3 }' space awkdemo.txt


Press Enter.

We see the difference.


Here NR stands for the number of records read so far, that is, the current line number in the file.


Record numbers are integers, hence we have written %d.


Name and Stream are strings. So we have used %s.


Here 25s will reserve 25 spaces for the Name field.


15s will reserve 15 spaces for the Stream field.


The minus sign is used to left justify the output.
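
The formatted output can be sketched on invented records as shown below. Note that NR counts every line awk reads, not only the selected ones, so in this made-up file the printed numbers are 1 and 3 because line 2 does not contain Pass.

```shell
# Hypothetical sample data; the real awkdemo.txt may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
101|Mira Nair|civil|72|Pass
102|Arun Rao|electrical|35|Fail
103|Meera Shah|computers|81|Pass
EOF

# %4d   : record number, right-justified in 4 columns
# %-25s : name, left-justified in 25 columns
# %-15s : stream, left-justified in 15 columns
report=$(awk -F"|" '/Pass/ {printf "%4d %-25s %-15s \n", NR, $2, $3}' "$sample")
printf '%s\n' "$report"

rm -f "$sample"
```

Each output line has the same width, so the Name and Stream columns line up regardless of how long the individual values are.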


Display Slide 8


This brings us to the end of this tutorial.


Let us move back to our slides.


Let us summarize.

In this tutorial we learnt

To print using awk

Regular expression in awk

To list the entries for a particular stream

To list only the second and the third fields

To display a formatted output


Display Slide 9 As an assignment

Display roll no., stream and marks of Ankti Saraf


Display Slide 10

Acknowledgement Slide


Watch the video available at the link shown below

It summarises the Spoken Tutorial project

If you do not have good bandwidth, you can download and watch it

Display Slide 11

Spoken Tutorial Workshops


The Spoken Tutorial Project Team

Conducts workshops using spoken tutorials

Gives certificates to those who pass an online test

For more details, please write to

contact@spoken-tutorial.org

Display Slide 12

Acknowledgement


Spoken Tutorial Project is a part of the Talk to a Teacher project

It is supported by the National Mission on Education through ICT, MHRD, Government of India

More information on this Mission is available at: http://spoken-tutorial.org/NMEICT-Intro

No Last Slide for tutorials created at IITB

Display the previous slide only and narrate this line.

The script has been contributed by Sachin Patil.

This is Ashwini Patil from IIT Bombay signing off. Thank you for joining.

Contributors and Content Editors

Ashwini, Nancyvarkey