Linux-AWK/C2/MultiDimensional-Array-in-awk/English-timed
00:01 | Hello and welcome to the spoken tutorial on creating multidimensional arrays in awk. |
00:07 | In this tutorial, we will learn to
create a multidimensional array in awk and scan a multidimensional array. |
00:18 | We will do this through some examples. |
00:21 | To record this tutorial, I am using:
Ubuntu Linux 16.04 Operating System and gedit text editor 3.20.1 |
00:33 | You can use any text editor of your choice. |
00:37 | To practice this tutorial, you should have gone through the previous awk tutorials on arrays on this website. |
00:45 | You should have some basic knowledge of any programming language like C or C++. |
00:52 | If not, then please go through the corresponding tutorials on our website. |
00:58 | The files used in this tutorial are available in the Code Files link on this tutorial page.
Please download and extract them. |
01:08 | What is a multidimensional array in awk? |
01:12 | We know that in single dimensional arrays, an array element is identified by a single index. |
01:19 | For example, array week is identified by a single index, day. |
01:26 | However, in a multidimensional array, an element is identified by a sequence of multiple indices. |
01:34 | For example, a two dimensional array element is identified by a sequence of 2 indices. |
01:42 | Here, multiple indices are concatenated into a single string, with a separator between them. |
01:50 | The separator is the value of the built-in variable SUBSEP. |
01:55 | The combined string is used as a single index for a simple one dimensional array. |
02:01 | For example, suppose we write multi within square brackets 4 comma 6 equal to value in double quotes. |
02:11 | Here, multi is the name of the multidimensional array.
Then, the numbers 4 and 6 are converted to strings. |
02:21 | Suppose, the value of SUBSEP is hash symbol (#). |
02:26 | Then, those numbers are concatenated with a hash (#) symbol between them. |
02:32 | So, the array element multi within square brackets within double quotes 4 hash 6 is set to value within double quotes. |
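The concatenation just described can be checked from the terminal. This is a minimal sketch, assuming SUBSEP is set to a hash symbol purely for illustration. The shell variable name out is not part of the tutorial.

```shell
# Set SUBSEP to "#" so that the combined index becomes visible.
out=$(awk 'BEGIN {
    SUBSEP = "#"
    multi[4, 6] = "value"
    # The element is stored under the single string index "4#6".
    for (key in multi)
        print key " -> " multi[key]
}')
echo "$out"
```

The loop visits only one key, because awk stores the pair 4 comma 6 as one ordinary string index.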
02:43 | The default value of SUBSEP is the string within double quotes backslash 034. |
02:50 | It is actually a nonprinting character.
It usually does not appear in input data. |
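The default separator can be confirmed without printing the character itself. The following is a small sketch. The variable names key and out are only illustrative.

```shell
# Compare SUBSEP with the octal escape \034 and build the combined key by hand.
out=$(awk 'BEGIN {
    if (SUBSEP == "\034")
        print "default SUBSEP is octal 034"
    multi[4, 6] = "value"
    # Concatenate the two indices with the separator, as awk does internally.
    key = "4" "\034" "6"
    if (key in multi)
        print "combined key found"
}')
echo "$out"
```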
02:58 | Let us try to define a two dimensional array as shown in the slide. |
03:03 | Row 1 contains two elements A and B. |
03:08 | Row 2 has two elements C and D. |
03:12 | Open the terminal by pressing Ctrl, Alt and T keys. |
03:17 | Go to the folder in which you have downloaded and extracted the Code Files using 'cd' command. |
03:24 | Now, define the array as follows. Type the command carefully as shown here.
Then press Enter. |
03:35 | We get a command prompt back without any error.
So, the array is defined. |
03:41 | We do not get any output because we have not given anything to print in the code. |
03:47 | Let us add the print statement. |
03:50 | Press the up arrow key to get the previously executed command in the terminal. |
03:56 | Just before the closing curly bracket, type: semicolon space print space a within square brackets 2 comma 2.
Press Enter to execute the command. |
04:13 | Notice, we get the output as capital D. |
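The exact command typed in the video is not reproduced in this transcript. A possible reconstruction, assuming the array is named a and holds the rows A B and C D described above, might look like this:

```shell
# Define a 2 by 2 array inside BEGIN and print the element at row 2, column 2.
out=$(awk 'BEGIN {
    a[1,1] = "A"; a[1,2] = "B"
    a[2,1] = "C"; a[2,2] = "D"
    print a[2,2]
}')
echo "$out"
```

Without the print statement, the same command runs silently and simply returns to the prompt.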
04:18 | How to test if a particular index sequence exists in a given multidimensional array? |
04:25 | We can use the in operator. |
04:28 | We have already seen it with single-dimensional arrays earlier in this series. |
04:34 | We have to write the entire sequence of indices within parentheses and separated by commas. |
04:42 | Let us see this in an example. |
04:45 | I have already written a script named test_multi.awk. |
04:51 | The same is available in the Code Files link of this tutorial page. |
04:56 | I have defined a 2 by 2 array as seen in our previous discussion. |
05:02 | Then I have written two 'if' conditions. |
05:06 | The first if condition checks whether the element at the index one comma one, is present or not. |
05:13 | We have to write the index for multidimensional array within parentheses. |
05:18 | If the condition is true, it will print one comma one is present. |
05:23 | Else, it will print one comma one is absent. |
05:28 | Similarly, we will check for the presence of the element at index three comma one.
Let us execute the file. |
05:36 | Switch to the terminal and type: awk space hyphen small f space test underscore multi dot awk and press Enter. |
05:49 | The output says: one comma one is present and three comma one is absent. |
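The script test_multi.awk itself is in the Code Files and is not shown in this transcript. The sketch below is a hypothetical version of the same idea, written inline instead of in a file. The exact message text is an assumption.

```shell
# The index sequence goes inside parentheses, before the in operator.
out=$(awk 'BEGIN {
    a[1,1] = "A"; a[1,2] = "B"
    a[2,1] = "C"; a[2,2] = "D"

    if ((1,1) in a) {
        print "(1,1) is present"
    } else {
        print "(1,1) is absent"
    }

    if ((3,1) in a) {
        print "(3,1) is present"
    } else {
        print "(3,1) is absent"
    }
}')
echo "$out"
```

Testing with the in operator does not create the element, whereas referring to a[3,1] directly would add an empty element to the array.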
05:55 | Let us take one more example.
Say, we want to create the transpose of a matrix. |
06:02 | The transpose of a given matrix is formed by interchanging the rows and columns of a matrix.
How can we do this? |
06:11 | I have created a two-dimensional array matrix in the file 2D-array.txt. |
06:19 | I have written a script named transpose.awk. |
06:24 | First, look at the action section of this awk script. |
06:29 | Here, we calculate the maximum number of fields in any row
and store the calculated value in the variable max_nf. |
06:40 | As we know, NR is the number of records awk has read so far.
The value of NR is stored in the variable max_nr. |
06:50 | Awk will process the input file from the first record to the last record. |
06:56 | When awk is processing the first record, max_nr will be equal to 1. |
07:03 | While processing the second record, max_nr will be 2 and it continues this way. |
07:11 | When awk is processing the last record, max_nr will store the total number of records. |
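How these two variables are tracked can be sketched as follows. The three input lines are made-up sample data, not the contents of 2D-array.txt.

```shell
# Feed three records of different widths and report the totals in END.
out=$(printf 'a b c\nd e\nf g h i\n' | awk '
{
    if (max_nf < NF) max_nf = NF   # widest record seen so far
    max_nr = NR                    # number of records read so far
}
END { print max_nr, max_nf }')
echo "$out"
```

Here the input has 3 records and the widest record has 4 fields.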
07:19 | Now, we should read the data from the input file and store it in a two dimensional array. |
07:26 | Inside the 'for' loop, we have an iterator variable x. |
07:31 | x will traverse from one to NF and x will be incremented by 1 after each iteration. |
07:39 | For every value of x, $x(dollar x) represents the value at field x. |
07:46 | That value will be stored in array matrix at index NR comma x. |
07:53 | For example, matrix of 1 comma 1 stores the value of field 1 in record 1 of the input file. |
08:02 | So, after awk has processed the entire input file with this code, matrix array will be completely formed. |
08:10 | It will store the entire data of the input file in a two dimensional array format. |
08:16 | Now, let us look inside the END section. |
08:20 | We have written a nested for loop to print the transpose of the matrix. |
08:25 | I assume your familiarity with basic C programming.
So, I am not explaining this portion of code in detail. |
08:34 | Pause the video here to look at the code in detail and understand on your own. |
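For reference while pausing, here is one way such a script could be put together. It is a sketch based on the description above, not the actual transpose.awk from the Code Files. The names row, col and sep and the sample input are assumptions.

```shell
# Store every field at matrix[NR, x], then print columns as rows in END.
out=$(printf '1 2 3\n4 5 6\n' | awk '
{
    if (max_nf < NF) max_nf = NF
    max_nr = NR
    for (x = 1; x <= NF; x++)
        matrix[NR, x] = $x
}
END {
    # The outer loop walks the columns and the inner loop walks the rows.
    for (col = 1; col <= max_nf; col++) {
        for (row = 1; row <= max_nr; row++) {
            sep = (row < max_nr) ? " " : "\n"
            printf "%s%s", matrix[row, col], sep
        }
    }
}')
echo "$out"
```

The 2 by 3 input becomes a 3 by 2 output, because the loops in the END section swap the roles of rows and columns.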
08:40 | Now, we will learn how to scan a multidimensional array. |
08:45 | Awk does not have a multi-dimensional array in the truest sense. |
08:50 | So, there cannot be any special 'for' statement to scan the multidimensional array. |
08:56 | However, you can get the effect of scanning an array in a multidimensional way. |
09:00 | You can combine the 'for' statement with the split function for this. |
09:05 | Let us see what the split function is.
The split function is used to chop up or split a string into pieces |
09:14 | and place the various pieces into an array. |
09:18 | The syntax is as follows. First argument contains the string to be chopped. |
09:25 | The second argument specifies the name of the array in which split() will put the chopped pieces. |
09:33 | The third argument mentions the separator that will be used to chop the string up. |
09:39 | The first piece is stored in arr[1], |
09:43 | the second piece in arr[2] and so forth. |
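A short illustration of split may help here. The date string and the hyphen separator are arbitrary sample values.

```shell
# split returns the number of pieces and fills arr[1], arr[2], and so on.
out=$(awk 'BEGIN {
    n = split("2024-01-15", arr, "-")
    print n
    print arr[1]
    print arr[2]
    print arr[3]
}')
echo "$out"
```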
09:48 | Suppose, we want to recover the original sequence of indices from an already created array.
How can we do this? |
09:56 | I have written a script named multi_scan.awk. |
10:02 | Entire code is written inside the BEGIN section. |
10:06 | First we have created an array named a and assigned these values to it. |
10:12 | Then we have the for loop with an iterator. |
10:16 | In each iteration, the iterator will be set to one of the combined index strings,
say, 1 comma 1, then 1 comma 2 and so on. |
10:27 | The split() function breaks the iterator into pieces separated by SUBSEP. |
10:34 | The pieces will be stored in the array arr. |
10:38 | So, arr[1] and arr[2] will contain the first index and second index respectively.
Let us execute this file. |
10:48 | Switch to the terminal and type- awk space hyphen small f space multi underscore scan dot awk
and press Enter. |
11:01 | See the output; the original sequence of indices is recovered. |
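The scan can be sketched as below. This is an approximation of the idea behind multi_scan.awk, not the file from the Code Files. The array values and the output format are assumptions. The result is piped through sort because awk does not guarantee the order in which a for loop visits array indices.

```shell
# Split each combined index on SUBSEP to recover the two original indices.
out=$(awk 'BEGIN {
    a[1,1] = "A"; a[1,2] = "B"
    a[2,1] = "C"; a[2,2] = "D"
    for (iterator in a) {
        split(iterator, arr, SUBSEP)
        print arr[1] "," arr[2] " -> " a[iterator]
    }
}' | LC_ALL=C sort)
echo "$out"
```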
11:07 | Let us summarize. In this tutorial, we learnt to create a multidimensional array in awk and
scan a multidimensional array. |
11:18 | As an assignment,
write an awk script to rotate a two dimensional array by 90 degrees and print the rotated matrix. |
11:28 | The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
11:36 | The Spoken Tutorial Project team conducts workshops using spoken tutorials
and gives certificates on passing online tests. |
11:45 | For more details, please write to us. |
11:49 | Please post your timed queries in this forum. |
11:53 | Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
More information on this mission is available at this link. |
12:05 | The script has been contributed by Antara. And this is Praveen from IIT Bombay, signing off.
Thank you for joining. |