Linux-Ubuntu/C3/Mastering-grep/English


TITLE: Mastering Grep

Author: EduPyramids

Keywords: grep, search, pattern matching, regular expressions, extended regex, case-insensitive search, character classes, anchors, dot operator, asterisk operator, Linux, Ubuntu, Bash, text search, EduPyramids, video tutorial.


Visual Cue Narration
Slide 1

Title Slide

Welcome to this spoken tutorial on Mastering grep.
Slide 2

Learning Objectives

In this tutorial, we will learn to:

  • Match more than one pattern
  • Match a word that is spelled in different ways
  • Use character classes
  • Use the * operator
  • Match any one character using the dot operator
  • Match a pattern at the beginning and end of a line
In this tutorial, we will learn to:
  • Match more than one pattern.
  • Match a word that is spelled in different ways.
  • Use character classes.
  • Use the asterisk operator.
  • Match any one character using the dot operator.
  • Match a pattern at the beginning and end of a line.
Slide 3

System Requirements

To record this tutorial, I am using:

Ubuntu OS version 24 point zero 4.

Slide 4

Pre-requisites

https://EduPyramids.org


To follow this tutorial,

Learners should have Ubuntu version 24 point zero 4.

They should also be familiar with basic Linux terminal commands.

For the prerequisite Linux tutorials, please visit this website.

Slide 5

Code files

grepdemo.txt

grep-commands.txt

The following code files are required to practice this tutorial.

These files are provided in the Code Files link of this tutorial page.

Let us get started with grep commands.
Note: Type the commands in the terminal instead of pasting them. The double quotes in this script may be curly quotes, which the shell does not treat as quotation marks.

Type grep -e "electronics" -e "civil" grepdemo.txt and press Enter.

Let us get the details of students from the Civil or Electronics stream.

We will use the same example file, grep demo dot t x t.

We can match multiple patterns using the hyphen e option in grep.

Type this command and press Enter.

The output displays the records of both the Civil and the Electronics students.
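The following shell sketch reproduces this behaviour without the real file. The four records are hypothetical stand-ins for grepdemo.txt (roll number, name, stream, stipend), so the actual file's contents and capitalisation may differ.

```shell
#!/usr/bin/env bash
# Hypothetical records standing in for grepdemo.txt: roll number, name, stream, stipend.
records='A101 Mani civil 7200
B202 Ravi mechanical 5400
A303 Mira electronics 8100
C404 Asha electrical 6900'

# Each -e adds one pattern; grep prints a line if it matches ANY of the patterns.
matches=$(printf '%s\n' "$records" | grep -e "electronics" -e "civil")
printf '%s\n' "$matches"
# Prints:
# A101 Mani civil 7200
# A303 Mira electronics 8100
```

The electrical record is not printed, because "electrical" matches neither pattern.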

Type grep -ie "choudhury" -ie "chowdhari" grepdemo.txt and press Enter.

Type clear and press Enter.

Now we wish to search for people whose title is Choudhury.

The issue is that the title may be spelled in different ways.

How can we handle this?

In such cases, we can perform a case-insensitive search using the hyphen i option.

We can also combine it with multiple hyphen e options.

This helps us match different spellings of the same word. Type this command and press Enter.

The output is displayed. However, there can be many other ways to write the name.
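Here is a small sketch of the case-insensitive search, using hypothetical names. It writes the options as -i -e, which grep treats the same way as the combined -ie form in the command above. The spelling Chaudhary is still missed, because neither pattern covers it.

```shell
#!/usr/bin/env bash
# Hypothetical names: several spellings and mixed capitalisation.
names='Rahul Choudhury
Anita CHOWDHARI
Vikram Chaudhary
Sunil choudhury'

# -i ignores case; each -e supplies one spelling to look for.
matches=$(printf '%s\n' "$names" | grep -i -e "choudhury" -e "chowdhari")
printf '%s\n' "$matches"
# Prints:
# Rahul Choudhury
# Anita CHOWDHARI
# Sunil choudhury
```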

We could use many hyphen e options, but a better solution is to use regular expressions.

Let me clear the screen.

There are several special characters used in regular expressions.
Slide 6

charcter-class.png

Character class

A character class is a part of a regular expression.

It allows us to define a group of characters inside square brackets.

The class matches exactly one character from this group.

For example, square bracket a b c matches either a, b, or c.

The pattern square bracket zero to nine matches any one digit from zero to nine.

Square bracket a to z matches any one lowercase letter.

Character classes are useful when more than one character is allowed.

To specify a larger range, we use the format: first character hyphen last character.

For example, [0-9] matches any digit.

Only one character from the range is matched at a time.
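To see that a class matches exactly one character, the sketch below runs grep with -x, which makes the pattern match a whole line. The one-token lines are hypothetical test input, and LC_ALL=C is set so that the range a-z means exactly the letters a to z.

```shell
#!/usr/bin/env bash
# Use the C locale so that [a-z] covers exactly the lowercase letters a to z.
export LC_ALL=C

# Hypothetical test input, one token per line.
tokens='a
b
d
7
x
ab'

# -x: the whole line must match, so each class must account for exactly one character.
abc=$(printf '%s\n' "$tokens" | grep -x "[abc]")    # a, b
digit=$(printf '%s\n' "$tokens" | grep -x "[0-9]")  # 7
lower=$(printf '%s\n' "$tokens" | grep -x "[a-z]")  # a, b, d, x (not ab: two characters)
printf '%s\n' "$abc" "$digit" "$lower"
```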

Let us look at some examples.
In the terminal, type grep -i "ch[ao][uw]dh[ua]r[yi]" grepdemo.txt

Press Enter. Add an annotation for this.

To understand character classes, type this command and press Enter.

Observe the output.

This matches most of the different spellings of the name Chowdhury.

However, it still does not match Choudhuree, which ends with a double e.
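The sketch below runs the same pattern over a few hypothetical spellings, which makes the limitation visible: Choudhuree is not printed, because after r the class [yi] cannot match e.

```shell
#!/usr/bin/env bash
# Hypothetical spellings of the same surname.
names='Choudhury
Chowdhari
Chaudhury
Choudhuree'

# Each [..] class matches exactly one character at that position; -i ignores case.
matches=$(printf '%s\n' "$names" | grep -i "ch[ao][uw]dh[ua]r[yi]")
printf '%s\n' "$matches"
# Prints:
# Choudhury
# Chowdhari
# Chaudhury
```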

Slide 7

The asterisk (*) operator

The * operator is a regular expression operator.

It matches zero or more repetitions of the preceding character or pattern.

This allows us to match a repeated character, or a character that is absent altogether.

For example, ab* matches a, ab, abb, abbb, and so on.

The asterisk operator is a regular expression operator.

It matches zero or more repetitions of the preceding character or pattern.

This allows matching repeated characters or cases where the character is absent.

For example, the pattern ab asterisk (*) matches a, ab, abb, abbb, and so on.
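A quick check of the ab* example follows, again using -x so that the pattern must cover the whole line. The test strings are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical test strings, one per line.
words='a
ab
abb
abbb
b
ac'

# ab* means: an a, followed by zero or more b characters.
# -x makes the pattern match the whole line, so "ac" and "b" are rejected.
matches=$(printf '%s\n' "$words" | grep -x "ab*")
printf '%s\n' "$matches"
# Prints a, ab, abb and abbb, one per line.
```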

Type:

grep -i "m[ei]*ra*" grepdemo.txt and press Enter.

Let us match the student name Mira in the file.

Type this command and press Enter to see the output.

We see records of Mira with 3 different spellings here.
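Here is the same pattern applied to hypothetical names. In m[ei]*ra*, the part [ei]* allows any number of e or i letters (including none) after m, and a* allows an optional run of a after r.

```shell
#!/usr/bin/env bash
# Hypothetical student names.
names='Mira
Meera
Mera
Mona
Ravi'

# m, then zero or more e/i, then r, then zero or more a; -i ignores case.
matches=$(printf '%s\n' "$names" | grep -i "m[ei]*ra*")
printf '%s\n' "$matches"
# Prints:
# Mira
# Meera
# Mera
```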

Slide 8

The Dot Operator

In regular expressions, the dot (.) operator is a special character.

It matches any single character, except the newline character.

For example:
  • a.c matches abc, a1c, or a_c.
  • It does not match ac, because one character is required between a and c.


The dot operator is used when the character is unknown but its position is fixed.

The dot is a special character in regular expressions.

It matches any single character, except the newline character.

For example, the pattern a dot c can match a b c, a 1 c, or a underscore c.

However, it will not match a c, because one character must appear between a and c.

The dot operator is used when the character is unknown but its position is fixed.
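The a.c example can be checked with the sketch below. The strings are hypothetical, and -x forces the pattern to cover the whole line so that the single-character rule is easy to see.

```shell
#!/usr/bin/env bash
# Hypothetical test strings.
words='abc
a1c
a_c
ac
abbc'

# The dot stands for exactly one character; -x makes the pattern cover the whole line.
matches=$(printf '%s\n' "$words" | grep -x "a.c")
printf '%s\n' "$matches"
# Prints abc, a1c and a_c; "ac" has no middle character and "abbc" has two.
```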

Type grep "M... " grepdemo.txt and press Enter.

For example, type this command and press Enter.

It searches for four-letter words that start with M.

Each dot matches one character.

The space after the dots ensures only four-letter matches.

This avoids matching words longer than four letters.

The output shows records for students Mani and Mira.
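The sketch below applies the same pattern to hypothetical records. Mohan is skipped because the character after M, o, h and a is n, not the space the pattern requires.

```shell
#!/usr/bin/env bash
# Hypothetical records: roll number, name, stream, stipend.
records='A101 Mani civil 7200
B202 Mohan mechanical 5400
A303 Mira electronics 8100
C404 Asha electrical 6900'

# M, then exactly three characters (one per dot), then a space.
matches=$(printf '%s\n' "$records" | grep "M... ")
printf '%s\n' "$matches"
# Prints:
# A101 Mani civil 7200
# A303 Mira electronics 8100
```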

Slide 9

Anchors (^ and $)

An anchor is a special symbol in regular expressions.

It specifies where a pattern should match in a line.

It does not match any character; it matches only a position.

The two most common anchors are ^ and $.

To match a pattern at the beginning of a line, we use the caret ^ symbol.

To match a pattern at the end of a line, we use the dollar sign $.

An anchor is a special symbol used in regular expressions.

It specifies where a pattern should match in a line.

Anchors do not match any character; they match only a position.

The two most common anchors are the caret and the dollar sign.

We use the caret to match a pattern at the beginning of a line.

We use the dollar sign to match a pattern at the end of a line.

At the prompt, type grep "^A" grepdemo.txt and press Enter.

Point to the output.

Let us use anchors.

Now, we will extract entries with roll numbers starting with A.

The roll number is the first field in the file.

Type this command and press Enter.

Only lines with roll numbers starting with A are shown.

The caret acts as an anchor for the beginning of the line.

Grep matches lines where the first character is A.

All other lines, starting with different characters, are ignored.
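A minimal sketch of the caret anchor on hypothetical records follows. Asha's record contains a capital A, but not as the first character, so it is not printed.

```shell
#!/usr/bin/env bash
# Hypothetical records: roll number, name, stream, stipend.
records='A101 Mani civil 7200
B202 Asha mechanical 5400
A303 Mira electronics 8100'

# ^ anchors the match to the start of the line, so only roll numbers beginning with A qualify.
matches=$(printf '%s\n' "$records" | grep "^A")
printf '%s\n' "$matches"
# Prints:
# A101 Mani civil 7200
# A303 Mira electronics 8100
```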

Add an annotation: Press Ctrl and L keys together to clear the screen. Let me clear the screen.
Type: grep "1$" grepdemo.txt and press Enter.

Type: grep "[78]...$" grepdemo.txt and press Enter.

Let us match a pattern at the end of a line.

To match a pattern at the end of a line, we use the dollar sign.

Here is the output. Only the lines that end with the digit 1 are shown.
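A sketch of the dollar anchor on hypothetical records: the roll number A101 contains a 1, but only a line whose last character is 1 is printed.

```shell
#!/usr/bin/env bash
# Hypothetical records: roll number, name, stream, stipend.
records='A101 Mani civil 7200
B202 Ravi mechanical 5400
C303 Ajay civil 4501'

# $ anchors the match to the end of the line.
matches=$(printf '%s\n' "$records" | grep "1$")
printf '%s\n' "$matches"
# Prints:
# C303 Ajay civil 4501
```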

To find stipends between 7000 and 8999, type this command and press Enter.

Only the lines whose last four characters start with 7 or 8 are shown.

That is, the numbers between 7000 and 8999.

Here, grep looks for a 7 or an 8.

It must be followed by exactly three more characters before the end of each line in grep demo dot t x t.
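The sketch below runs the stipend pattern on hypothetical records. It assumes, as the command does, that the stipend is the last field on each line. Ajay's 9750 is skipped: it contains a 7, but only two characters follow it.

```shell
#!/usr/bin/env bash
# Hypothetical records: roll number, name, stream, stipend (last field).
records='A101 Mani civil 7200
B202 Ravi mechanical 5400
A303 Mira electronics 8100
C404 Asha electrical 6900
B505 Ajay civil 9750'

# A 7 or an 8, then exactly three characters, then the end of the line.
matches=$(printf '%s\n' "$records" | grep "[78]...$")
printf '%s\n' "$matches"
# Prints:
# A101 Mani civil 7200
# A303 Mira electronics 8100
```

Because only the last four characters are checked, a five-digit stipend such as 17500 would also match; the pattern is reliable only when every stipend has exactly four digits.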

Slide 10

Summary

In this tutorial, we have learnt to:

  • Match more than one pattern
  • Match a word that is spelled in different ways
  • Use character classes
  • Use the * operator
  • Match any one character using the dot operator
  • Match a pattern at the beginning and end of a line
With this we come to the end of this tutorial.

Let us summarise.

Slide 11

Assignment

As an assignment:

  1. Search for students whose names contain the letters "ra" in sequence.
  2. Find entries where the stream is either "Mechanical" or "Electrical".
  3. List all students whose roll numbers end with the digit 5.
  4. Count how many students have a stipend greater than 5000.
As an assignment, please do the following.
Slide 12

Thank you

This Spoken Tutorial is brought to you by EduPyramids Educational Services Private Limited, SINE, IIT Bombay. Thank you.

Contributors and Content Editors

Ketkinaina