Linux-AWK/C2/MultiDimensional-Array-in-awk/English

Revision as of 09:40, 23 February 2018 by Antarade (Talk | contribs)


Title of script: Multi-Dimensional array in awk

Author: Antara Roy Choudhury

Keywords: Multi-dimensional array, SUBSEP, Multi-dimensional scanning, Creating transpose of a matrix


Visual Cue
Narration
Display Slide 1 Welcome to the spoken tutorial on creating multidimensional arrays in awk.
Display Slide 2 In this tutorial we will learn to-
  • Create multidimensional array in awk
  • and scan the multidimensional array

We will do this through some examples.

Display Slide 3

System requirement

To record this tutorial, I am using
  • Ubuntu Linux 16.04 OS and
  • gedit text editor 3.20.1

You can use any text editor of your choice.

Display Slide 4

pre-requisite

To practice this tutorial, you should have gone through the previous awk tutorials on arrays on this website.


You should have some basic knowledge of any programming language like C or C++.


If not, then please go through the corresponding tutorials on our website.

Slide 5: Code Files The files used in this tutorial are available in the Code Files link on this tutorial page.


Please download and extract them.

Display Slide 5a What is a multidimensional array in awk?


We know that in single dimensional arrays, an array element is identified by a single index.


For example, array week is identified by a single index, day.

Display Slide 5b However, in a multidimensional array, an element is identified by a sequence of multiple indices.


For example, a two dimensional array element is identified by a sequence of 2 indices.

Display Slide 6a Here, multiple indices are concatenated into a single string,


with a separator between them.


The separator is the value of the built-in variable SUBSEP.

Display Slide 6b This combined string is used as a single index for a simple one dimensional array.
Display slide 7 For example,


suppose we write multi[4,6]="value" in double quotes

Here multi is the name of multi-dimensional array.


Then, the numbers 4 and 6 are converted to strings.

Suppose,


the value of SUBSEP is hash symbol (#).


Then, those numbers are concatenated with a hash symbol (#) between them.

Display slide 8 So, the array element multi within square brackets within double quotes 4 hash 6 is set to value within double quotes.


The default value of SUBSEP is the string within double quotes backslash 034.


It is actually a nonprinting character.


It is unlikely to appear in most input data.
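
The index concatenation described above can be checked directly at the terminal. The following is a minimal sketch and is not part of the Code Files; SUBSEP is set to a hash symbol only for illustration, and the same element is then looked up through the combined string.

```shell
# Illustrative sketch: SUBSEP is set to "#" only for this demonstration.
out=$(awk 'BEGIN {
    SUBSEP = "#"
    multi[4,6] = "value"      # stored under the single string index "4#6"
    print multi["4#6"]        # same element, addressed by the combined string
    print (("4#6" in multi) ? "found" : "missing")
}')
printf '%s\n' "$out"
# prints:
# value
# found
```

With the default SUBSEP, the same element can be addressed as multi[4 SUBSEP 6].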

Display slide 9 Let us try to declare a two dimensional array as shown in the slide.


Row 1 contains two elements A and B


Row 2 has two elements C and D

Open the terminal by pressing Ctrl, Alt and T keys
cd /<saved folder> Go to the folder in which you downloaded and extracted the Code Files using cd command
Type at the terminal:

awk 'BEGIN{a[1,1]="A";a[1,2]="B";a[2,1]="C"; a[2,2]="D"}'


[Enter]

Now define the array as follows.


Type the command carefully as shown here.


Then press Enter.

Show terminal We get a command prompt back without any error.


So, the array is defined.


We do not get any output because we have not given anything to print in the code.


Let us add the print statement.

Press Up key Press the Up arrow key to get the previously executed command in the terminal.
Modify previous code to add the highlighted portion


awk 'BEGIN{a[1,1]="A";a[1,2]="B";a[2,1]="C"; a[2,2]="D";print a[2,2]}'

[Enter]

Just before the closing curly bracket, type a semicolon.


And then type print space a (within square bracket) 2 comma 2


Press Enter to execute the command.

Show the output Notice, we get the output as capital D.
Display slide 10 How to test if a particular index sequence exists in a given multidimensional array?


We can use the in operator.

We have already seen it with single-dimensional arrays earlier in this series.


We have to write the entire sequence of indices within parentheses


and separated by commas.


Let us see this in an example

show test_multi.awk in Gedit I have already written a script named test_multi.awk


The same is available in the Code Files link of this tutorial.

Highlight

a[1,1]="A";a[1,2]="B";a[2,1]="C"; a[2,2]="D"

I have defined a 2 by 2 array as seen in our previous discussion.
Highlight both the if statement Then I have written two if conditions.
Highlight 1st if statement The first if condition checks whether the element at the index one comma one, is present or not.
Highlight (1,1) We have to write the index for multidimensional array within parentheses.
Highlight print "(1,1) is present" If the condition is true, it will print one comma one is present.
Highlight print "(1,1) is absent" Else it will print one comma one is absent.
Highlight 2nd if statement

if((3,1) in a)

Similarly, we will check for the presence of the element at index three comma one.


Let us execute the file.

Type:


awk -f test_multi.awk

Switch to the terminal and type


awk space hyphen small f space test_multi.awk and press Enter.

Show output The output says one comma one is present and three comma one is absent.
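
The membership test described above can be sketched as follows. The exact contents of test_multi.awk are not reproduced here, so this is an assumed reconstruction based on the narration, written as a one-off command instead of a script file.

```shell
# Sketch of a membership test similar to the one described for test_multi.awk.
# The exact contents of the original file are an assumption.
out=$(awk 'BEGIN {
    a[1,1] = "A"; a[1,2] = "B"; a[2,1] = "C"; a[2,2] = "D"

    # The index sequence goes inside parentheses before the in operator.
    if ((1,1) in a)
        print "(1,1) is present"
    else
        print "(1,1) is absent"

    if ((3,1) in a)
        print "(3,1) is present"
    else
        print "(3,1) is absent"
}')
printf '%s\n' "$out"
# prints:
# (1,1) is present
# (3,1) is absent
```

Testing with the in operator does not create the element, whereas referring to a[3,1] directly would add an empty element at that index.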
Display slide 11 Let us take one more example.


Say, we want to create the transpose of a matrix.


The transpose of a given matrix is formed by interchanging the rows and columns of a matrix.


How can we do this?

Show 2D-array.txt in gedit I have created a two-dimensional matrix in the file 2D-array.txt.
Show transpose.awk in gedit I have written a script named transpose.awk.
Highlight the 1st block


{
    if (max_nf < NF)
        max_nf = NF
    max_nr = NR
    for (x = 1; x <= NF; x++)
        matrix[NR, x] = $x
}

First look at the action section of this awk script.
Highlight if (max_nf < NF)


max_nf = NF

Here we are calculating the maximum number of fields in a row.


And storing the calculated value in variable max_nf.

Highlight max_nr = NR As we know, NR is the number of records awk has read so far, that is, the number of the current record.


Value of NR is stored in max_nr variable.

Display slide 12 Awk will process the input file from the first record to the last record.


When awk is processing the first record, max_nr will be equal to 1.


While processing second record, max_nr will be 2 and it continues this way.


When awk is processing last record, max_nr will store the total number of records.

Highlight appropriately

for (x = 1; x <= NF; x++)

matrix[NR, x] = $x

Now we should read the data from input file and store the data in a two dimensional array.


Inside the for loop, we have iterator variable x.


x will traverse from one to NF, and x will be incremented by 1 after each iteration.


For every value of x, $x(dollar x) represents the value at field x.


That value will be stored in array matrix at index NR comma x.


For example, matrix of 1 comma 1 stores the value of the first field of the first record in the input file.

Highlight

{
    if (max_nf < NF)
        max_nf = NF
    max_nr = NR
    for (x = 1; x <= NF; x++)
        matrix[NR, x] = $x
}

So, after awk has processed the entire input file with this code,

matrix array will be completely formed.


It will store entire data of input file in two dimensional array format.

Highlight


END {
    for (col = 1; col <= max_nf; col++) {
        for (row = 1; row <= max_nr; row++)
            printf("%s ", matrix[row, col])
        printf("\n")
    }
}


Now, let us look inside the END section.


We have written a nested for loop to print the transpose of the matrix.


I assume your familiarity with basic C programming.


So, I am not explaining this portion of code in detail.


Pause the video here to look at the code in detail and understand on your own.
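
To try the two blocks together, the sketch below runs them on a small sample matrix. The sample input is hypothetical, since the contents of 2D-array.txt are not shown in this script; the awk code follows the blocks highlighted above.

```shell
# Hypothetical sample input; the real contents of 2D-array.txt are not shown here.
input='1 2 3
4 5 6'

out=$(printf '%s\n' "$input" | awk '
{
    if (max_nf < NF)
        max_nf = NF            # widest row seen so far
    max_nr = NR                # ends up as the total number of records
    for (x = 1; x <= NF; x++)
        matrix[NR, x] = $x     # store field x of record NR
}
END {
    # Columns run in the outer loop and rows in the inner loop,
    # so each input column is printed as one output row.
    for (col = 1; col <= max_nf; col++) {
        for (row = 1; row <= max_nr; row++)
            printf("%s ", matrix[row, col])
        printf("\n")
    }
}')
printf '%s\n' "$out"
```

For this sample, the three output rows are 1 4, 2 5 and 3 6, each followed by a trailing space that comes from the "%s " format.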

Display slide 13a


Now, we will learn how to scan a multidimensional array.


Awk does not have multi-dimensional arrays in the truest sense.


So, there cannot be any special for statement to scan the multidimensional array.

Display slide 13b However, you can scan an array in a multidimensional way.


You can combine the for statement with the split function for this.

Display slide 14a Let us see what the split function is.


The split function is used to chop up or split a string into pieces.


And place the various pieces into an array.

Display slide 14b The syntax is as follows.


First argument contains the string to be chopped.


The second argument specifies the name of the array in which split will put the chopped pieces.

Display slide 14c The third argument mentions the separator that will be used to chop the string up.


The first piece is stored in arr[1].


The second piece in arr[2] and so forth.
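
A short sketch of the split function on its own may help here. The string and the colon separator below are made up for illustration.

```shell
# Illustrative sketch: split a colon-separated string into the array arr.
out=$(awk 'BEGIN {
    n = split("Mon:Tue:Wed", arr, ":")   # split returns the number of pieces
    for (i = 1; i <= n; i++)
        print i, arr[i]
}')
printf '%s\n' "$out"
# prints:
# 1 Mon
# 2 Tue
# 3 Wed
```

The return value of split is handy as the loop limit, because the pieces are always stored at indices 1 to n.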

Retain same screen Suppose, we want to recover the original sequence of indices from an already created array.


How can we do this?

Show multi_scan.awk in gedit I have written a script named multi_scan.awk.
Highlight entire code Entire code is written inside the BEGIN section.
Highlight a[1,1]="A";a[1,2]="B";a[2,1]="C"; a[2,2]="D" First we have created an array named a and assigned these values to it.



Highlight for loop Then we have the for loop with an iterator.


In each iteration, the iterator is set to one of the index values of the array.

Say 1,1 then 1,2 and so on, though awk does not guarantee the order in which the indices are visited.

Highlight split(iterator,arr,SUBSEP) Then the split function breaks the iterator into pieces separated by a SUBSEP.
Highlight arr inside the split function The pieces will be stored in the array arr.
Highlight arr[1] "," arr[2] So, arr[1] and arr[2] will contain the first index and second index respectively.


Let us execute the file.

Type: awk -f multi_scan.awk

[Enter]

Switch to the terminal and type

awk space hyphen small f space multi_scan.awk

and press Enter.

Show output See the output; the original sequence of indices is recovered.
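
The scan described above can be sketched as follows. This is an assumed reconstruction based on the narration, not the exact contents of multi_scan.awk. Because the order of a for-in loop is not guaranteed in awk, the sketch pipes the result through sort so that the listing is stable.

```shell
# Sketch of the scan described for multi_scan.awk; the original file may differ.
out=$(awk 'BEGIN {
    a[1,1] = "A"; a[1,2] = "B"; a[2,1] = "C"; a[2,2] = "D"
    for (iterator in a) {
        split(iterator, arr, SUBSEP)   # break the combined index at SUBSEP
        print arr[1] "," arr[2]        # first index, comma, second index
    }
}' | sort)
printf '%s\n' "$out"
# prints:
# 1,1
# 1,2
# 2,1
# 2,2
```

The value of each element is still available inside the loop as a[iterator].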
Display Slide 15

Summary

Let us summarize.


In this tutorial we learnt to-

  • Create a multidimensional array in awk
  • and scan a multidimensional array


Display Slide 16

Assignment


As an assignment, write an awk script to
  • rotate a two dimensional array by 90 degrees
  • and print the rotated matrix.


Display Slide 17

About Spoken Tutorial project

The video at the following link summarises the Spoken Tutorial project.


Please download and watch it.

Display Slide 18

Spoken Tutorial workshops

The Spoken Tutorial Project team conducts workshops using spoken tutorials.


And gives certificates on passing online tests.


For more details, please write to us.

Display Slide 19

Forum for specific questions:

Please post your timed queries in this forum.
Display Slide 20

Acknowledgement

Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.


More information on this mission is available at

this link.

The script has been contributed by Antara.


And this is Praveen from IIT Bombay signing off.

Thank you for joining.

Contributors and Content Editors

Antarade, Nancyvarkey