Linux-Ubuntu/C3/Mastering-grep/English
TITLE: Mastering Grep
Author: EduPyramids
Keywords: grep, search, pattern matching, regular expressions, extended regex, case-insensitive search, character classes, anchors, dot operator, asterisk operator, Linux, Ubuntu, Bash, text search, EduPyramids, video tutorial.
| Visual Cue | Narration |
| Slide 1
Title Slide |
Welcome to this spoken tutorial on Mastering grep. |
| Slide 2
Learning Objectives In this tutorial, we will learn to:
|
In this tutorial, we will learn to:
|
| Slide 3
System Requirements |
To record this tutorial, I am using:
Ubuntu OS version 24 point zero 4. |
| Slide 4
Pre-requisites
|
To follow this tutorial,
learners should have Ubuntu version 24 point zero 4 and should be familiar with basic Linux terminal commands. For the prerequisite Linux tutorials, please visit this website. |
| Slide 5
Code files grepdemo.txt grep-commands.txt |
The following code files are required to practice this tutorial.
These files are provided in the Code Files link of this tutorial page. |
| Let us get started with grep commands. | |
| Note: Please type the commands in the terminal. Do not paste them, as pasted curly double quotes will not work.
Type grep -e "electronics" -e "civil" grepdemo.txt Press Enter. |
Let us get the details of students from the Civil or Electronics stream.
We will use the same example file, grep demo dot t x t. We can match multiple patterns using the hyphen e option in grep. Type this command and press Enter. The output displays the records of both civil and electronics students. |
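As a rough sketch of how multiple hyphen e patterns behave, the commands below build a small hypothetical file named sample.txt. Its records are an assumption made for illustration; the actual contents of grepdemo.txt may differ.

```shell
# Hypothetical sample records; the real grepdemo.txt may look different.
cat > sample.txt <<'EOF'
A101 Mira electronics 7500
B102 Mani civil 6000
C103 Ravi mechanical 8200
EOF

# Each -e adds one pattern. A line is printed if it matches any of them.
result=$(grep -e "electronics" -e "civil" sample.txt)
printf '%s\n' "$result"
```

Here the mechanical record is left out because it matches neither pattern.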
| Type grep -ie "choudhury" -ie "chowdhari" grepdemo.txt Press Enter.
Type clear and press Enter. |
Now we wish to search for people whose title is Choudhury.
The issue is that the title may be spelled in different ways. How can we handle this? In such cases, we can perform a case-insensitive search using the hyphen i option. We can also combine it with multiple hyphen e options. This helps to match different spellings of the same word. Type this command and press Enter. The output is displayed. However, there can be many other ways to write the name. We could use many hyphen e options, but a better solution is regular expressions. Let me clear the screen. |
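The combination of hyphen i and hyphen e can be tried on a small hypothetical file. The file name names.txt and the names in it are assumptions made only for this sketch.

```shell
# Hypothetical names, written in mixed case on purpose.
cat > names.txt <<'EOF'
Anil Choudhury
Sunita CHOWDHARI
Ravi Kumar
EOF

# -i ignores case; each -e supplies one spelling to look for.
matches=$(grep -ie "choudhury" -ie "chowdhari" names.txt)
printf '%s\n' "$matches"
```

Both spellings are found regardless of upper or lower case, while the unrelated name is skipped.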
| There are several special characters used in regular expressions. | |
| Slide 6
charcter-class.png Character class |
A character class is a part of a regular expression.
It allows us to define a group of characters inside square brackets. A character class matches exactly one character from this group. For example, square bracket a b c matches either a, b, or c. Character classes are useful when more than one character is allowed at a position. To specify a range, we use the format: first character hyphen last character. For example, square bracket zero hyphen nine matches any one digit from zero to nine. Square bracket a hyphen z matches any one lowercase letter. |
| Let us look at some examples. | |
| In the terminal, type grep -i "ch[ao][uw]dh[ua]r[yi]" grepdemo.txt
Press Enter. Add annotation for this. |
To understand character classes, type this command and press Enter.
Observe the output. This matches most spelling variations of the name Chowdhury. However, it still does not match choudhuree with double e. |
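To see which spellings the character classes accept, the following sketch runs the same pattern on a hypothetical list of surnames. The file surnames.txt and its entries are assumptions for illustration only.

```shell
# Hypothetical spellings of the same surname.
cat > surnames.txt <<'EOF'
Choudhury
Chowdhari
Chaudhary
Choudhuree
EOF

# Each bracket pair matches exactly one character from its group.
found=$(grep -i "ch[ao][uw]dh[ua]r[yi]" surnames.txt)
printf '%s\n' "$found"
```

The first three spellings are printed. Choudhuree is not, because the letter after r is e, which is not in the last character class.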
| Slide 7
The asterisk (*) operator The * operator is a regular expression operator. It matches zero or more repetitions of the preceding character or pattern. This allows us to match repeated characters, or even when the character is absent. For example, ab* matches a, ab, abb, abbb, and so on. |
The asterisk operator is a regular expression operator.
It matches zero or more repetitions of the preceding character or pattern. This allows matching repeated characters or cases where the character is absent. For example, the pattern ab asterisk (*) matches a, ab, abb, abbb, and so on. |
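The slide example can be checked directly from the terminal. This is a small sketch that feeds a few made-up lines to grep through a pipe instead of reading a file.

```shell
# Four test lines: a, ab, abbb and b.
# "ab*" needs an a, followed by zero or more b characters.
matched=$(printf 'a\nab\nabbb\nb\n' | grep "ab*")
printf '%s\n' "$matched"
```

The line containing only b is not printed, because the pattern still requires the letter a. Note that grep prints every line that contains a match anywhere, so any line with an a in it would be shown.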
| Type:
grep -i "m[ei]*ra*" grepdemo.txt Press Enter. |
Let us match the student name Mira in the file.
Type this command and press Enter to see the output. We see records of Mira with 3 different spellings here. |
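As an illustration of how this pattern handles different spellings, the sketch below uses a hypothetical file students.txt. The roll numbers and names are assumptions, not the actual records in grepdemo.txt.

```shell
# Hypothetical records with three spellings of Mira and one other name.
cat > students.txt <<'EOF'
A101 Mira
B102 Meera
C103 Miraa
D104 Mani
EOF

# m, then zero or more e or i, then r, then zero or more a.
mira=$(grep -i "m[ei]*ra*" students.txt)
printf '%s\n' "$mira"
```

Mani is not printed because no r follows the m in that line.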
| Slide 8
The Dot Operator In regular expressions, the dot (.) operator is a special character. It matches any single character, except the newline character. For example: a.c matches abc, a1c, or a_c
|
The dot is a special character in regular expressions.
It matches any single character, except the newline character. For example, the pattern a dot c can match a b c, a 1 c, or a underscore c. However, it will not match a c, because one character must appear between a and c. The dot operator is used when the character is unknown but its position is fixed. |
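The a dot c example can be verified with a few made-up lines passed to grep through a pipe. This is only a quick sketch.

```shell
# Four test lines: abc, a1c, a_c and ac.
# The dot must be filled by exactly one character.
dot_matches=$(printf 'abc\na1c\na_c\nac\n' | grep "a.c")
printf '%s\n' "$dot_matches"
```

The line ac is not printed, since there is no character between a and c for the dot to match.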
| Type grep "M... " grepdemo.txt
press Enter |
For example, type this command and press Enter.
It searches for four-letter words that start with M. Each dot matches one character. The space after the dots ensures only four-letter matches. This avoids matching words longer than four letters. The output shows records for students Mani and Mira. |
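The effect of the trailing space can be seen in the sketch below. It assumes a hypothetical file records.txt in which the name is followed by a space and the stream; the real layout of grepdemo.txt may differ.

```shell
# Hypothetical records; Manish is longer than four letters.
cat > records.txt <<'EOF'
A101 Mani civil
B102 Mira electronics
C103 Manish civil
D104 Ravi mechanical
EOF

# Capital M, any three characters, then a space.
four=$(grep "M... " records.txt)
printf '%s\n' "$four"
```

Manish is skipped because the character after M a n i is s and not a space. The search is case-sensitive, so the lowercase m in mechanical does not count.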
| Slide 9
Anchors (^ and $) An anchor is a special symbol in regular expressions. It specifies where a pattern should match in a line. It does not match any character; it matches only a position. The two most common anchors are ^ and $. To match a pattern at the beginning of a line, we use the caret ^ symbol. To match a pattern at the end of a line, we use the dollar sign $. |
An anchor is a special symbol used in regular expressions.
It specifies where a pattern should match in a line. Anchors do not match any character; they match only a position. The two most common anchors are the caret and the dollar sign. We use caret to match a pattern at the beginning of a line. We use dollar to match a pattern at the end of a line. |
| At the prompt type grep "^A" grepdemo.txt
press Enter Point to the output. |
Let us use anchors.
Now, we will extract entries with roll numbers starting with A. The roll number is the first field in the file. Type this command and press Enter. Only lines with roll numbers starting with A are shown. The caret acts as an anchor for the beginning of the line. Grep matches lines where the first character is A. All other lines, starting with different characters, are ignored. |
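A minimal sketch of the caret anchor follows. It assumes a hypothetical file rolls.txt where the roll number is the first field; the entries are made up for illustration.

```shell
# Hypothetical records; the second line contains an A, but not at the start.
cat > rolls.txt <<'EOF'
A101 Mira
B102 Anil
A103 Ravi
EOF

# ^ ties the match to the beginning of the line.
starts_with_a=$(grep "^A" rolls.txt)
printf '%s\n' "$starts_with_a"
```

The record for Anil is not printed. Its line has a capital A, but the line itself begins with B.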
| Add annotation. Press Ctrl and L keys together to clear the screen. | Let me clear the screen. |
| Type: grep "1$" grepdemo.txt Press Enter.
Type: grep "[78]...$" grepdemo.txt Press Enter. |
Let us match a pattern at the end of a line.
To match a pattern at the end of a line, we use the dollar sign. Type the first command and press Enter. The output shows only the lines that end with 1. Next, let us find stipends between 7000 and 8999. Type the second command and press Enter. Here, grep searches for 7 or 8, followed by any 3 characters, at the end of each line in grep demo dot t x t. The stipend is at the end of each line, so only lines with stipends between 7000 and 8999 are shown. |
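Both dollar anchor commands can be tried on a small hypothetical file. The file stipends.txt and its values are assumptions for this sketch, with the stipend placed as the last field of each line.

```shell
# Hypothetical records with the stipend as the last field.
cat > stipends.txt <<'EOF'
A101 Mira 7500
B102 Mani 6000
C103 Ravi 8201
EOF

# $ ties the match to the end of the line.
ends_with_one=$(grep "1$" stipends.txt)
printf '%s\n' "$ends_with_one"

# 7 or 8, then any three characters, then the end of the line.
in_range=$(grep "[78]...$" stipends.txt)
printf '%s\n' "$in_range"
```

The first command prints only the record ending in 8201. The second prints the records ending in 7500 and 8201. Since each dot matches any character and not only digits, this approach relies on the last field being a four-digit number.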
| Slide 10
Summary In this tutorial, we have learnt to:
|
With this we come to the end of this tutorial.
Let us summarise. |
| Slide 11
Assignment As an assignment: Search for students whose names contain the letters "ra" in sequence.
|
As an assignment, please do the following. |
| Slide 12
Thank you |
This Spoken Tutorial is brought to you by EduPyramids Educational Services Private Limited, SINE, IIT Bombay. Thank you. |