Unix tutorial: All about the awk command

Awk is a powerful command. In this post, I will try to cover everything about the awk command.

What is awk in Unix?
a) awk is a pattern scanning and processing language.

b) awk is a programming language whose basic operation is to search a set of files for
patterns, and to perform specified actions upon lines or fields of lines which contain
instances of those patterns.

An awk program is a sequence of statements of the form:
pattern { action }
pattern { action }

Usage: awk -f pfile [files]
How it works
a) Awk input is divided into “records” terminated by a record separator. The default record
separator is a newline, so by default awk processes its input a line at a time. The number of
the current record is available in a variable named NR.

b) Each input record is considered to be divided into “fields.” Fields are normally
separated by white space (blanks or tabs). Fields are referred to as $1, $2, and so forth,
where $1 is the first field, and $0 is the whole input record itself. Fields may also be assigned to;
for example, to swap $5 and $6: awk '{temp=$5; $5=$6; $6=temp; print $0}' filename
The number of fields in the current record is available in a variable named NF.
We can also put all the pattern-action statements in a file and run them against a set of files.

Usage: awk 'program' [filename]*
awk -f cmdfile [filename]*
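A minimal sketch of the -f form: the commands below write a few pattern-action statements into a program file and then run that file against some input. The file name prog.awk and the sample lines are made up for illustration.

```shell
# Write a small awk program to a file (prog.awk is an example name)
cat > prog.awk <<'EOF'
# Pattern and action: report lines whose first field is "error"
$1 == "error" { print "problem:", $2 }
# Action only: count every line
{ n++ }
# Wrap-up after the last line
END { print n, "lines read" }
EOF

# Run the program file against some sample input
printf 'error disk\nok net\nerror fan\n' | awk -f prog.awk
# -> problem: disk
# -> problem: fan
# -> 3 lines read
```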

What are awk patterns?

A pattern is a selector that determines whether the action is to be executed.

A pattern can be:
a) the special token BEGIN or END
b) a regular expression
c) a relational expression using arithmetic comparison operators
d) a string-valued expression
e) an arbitrary combination of the above

BEGIN and END provide a way to gain control before and after processing, for initialization and wrap-up.
BEGIN: actions are performed before the first input line is read.
END: actions are done after the last input line has been processed.
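To make the pattern types concrete, here is a small sketch that combines BEGIN, a regular-expression pattern, a relational pattern and END in one program. The name/age input is invented sample data.

```shell
# Sample data: name and age (made up for illustration)
printf 'alice 30\nbob 17\ncarol 45\n' | awk '
BEGIN   { print "report start" }         # before the first line is read
/^c/    { print $1, "starts with c" }    # regular-expression pattern
$2 < 18 { print $1, "is under 18" }      # relational pattern
END     { print NR, "records" }          # after the last line
'
# -> report start
# -> bob is under 18
# -> carol starts with c
# -> 3 records
```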

What are awk fields?
a) Each input line is split into fields.
b) FS is the field separator; by default, fields are separated by blanks or tabs.
c) $0 is the entire line.
d) $1 is the first field, $2 is the second field, and so on up to $NF, the last field.
e) NF is a built-in variable whose value is set to the number of fields.
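A quick sketch of fields in action: with the default FS, any run of blanks separates fields, while setting FS in a BEGIN rule (the same effect as -F":") splits on another character. The input lines are examples only.

```shell
# Default FS: runs of blanks or tabs separate the fields
echo 'root   x  0' | awk '{ print NF, $1, $NF }'
# -> 3 root 0

# Set FS in a BEGIN rule to split on ":" instead
echo 'root:x:0:0:root:/root:/bin/sh' | awk 'BEGIN { FS = ":" } { print NF, $1, $NF }'
# -> 7 root /bin/sh
```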

What are awk records?
a) Newline is the default record separator.
b) So, by default, awk processes its input a line at a time.
c) NR is the variable whose value is the number of the current record.
d) RS is the variable that holds the record separator.
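As a sketch of changing the record separator, the example below sets RS to ";" so that each ";"-terminated chunk becomes one record, and prints NR and NF for each. The input string is invented; note it has no trailing newline, so the last record is just "f".

```shell
# RS set to ";" in BEGIN: each ";"-separated chunk is one record
printf 'a b;c d e;f' | awk 'BEGIN { RS = ";" } { print NR, NF, $0 }'
# -> 1 2 a b
# -> 2 3 c d e
# -> 3 1 f
```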
What is the awk working methodology?
a) Awk reads the input files one line at a time. Each line is called a record, and each record is split into fields.
b) For each line, awk tests the patterns in the order given; whenever a pattern matches, it performs the corresponding action.

cat file1 | awk 'pattern { action }'

c) If no pattern matches, no action will be performed.
d) In the above syntax, either the search pattern or the action may be omitted, but not both.
e) If the search pattern is not given, awk performs the given action for each line of the input.
cat file1 | awk '{ action }'
f) If the action is not given, awk prints every line that matches the pattern; printing is the default action.
g) Empty braces without any action do nothing; awk won't perform the default print operation.
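The three cases above (pattern only, action only, pattern with empty braces) can be compared side by side in this small sketch; the fruit names are just sample input.

```shell
# Pattern only: the default action prints each matching line
printf 'apple\nbanana\ncherry\n' | awk '/an/'
# -> banana

# Action only: the action runs for every input line
printf 'apple\nbanana\ncherry\n' | awk '{ print NR ": " $0 }'
# -> 1: apple
# -> 2: banana
# -> 3: cherry

# Pattern with empty braces: the line matches, but nothing is printed
printf 'apple\nbanana\ncherry\n' | awk '/an/ { }'
```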

Some important awk functions

length: computes the length of a string; with no argument it uses $0, e.g. { print length, $0 }

substr(s, m, n) produces the substring of s that begins at position m and is at most n characters long.
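A short sketch of both functions on a sample line. It uses the explicit length($0) form, and the second command shows that substr returns fewer than n characters when the string runs out.

```shell
echo 'hello world' | awk '{ print length($0), length($1), substr($0, 1, 5), substr($2, 2, 3) }'
# -> 11 5 hello orl

# "at most n characters": only 2 characters remain from position 2 of "abc"
awk 'BEGIN { print substr("abc", 2, 10) }'
# -> bc
```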

Various examples of awk usage

1) Suppose we want to know the names of the Oracle databases running on the server. Each instance has a PMON background process named ora_pmon_<SID>, so the command below can be used:

ps -ef | grep pmon | grep -v grep | awk '{print $NF}' | awk -F"_" '{print $3}'

2) More complex awk scripts need to be run from a file. The syntax for such cases is:

cat input1 | awk -f a.awk > output1

where input1 is the input file, output1 is the output file, and a.awk is a file containing awk commands.

3) Useful built-in variables in awk:
NR – Line number of the current input line.

NF – Number of fields in the current line.

The variable FILENAME contains the name of the current input file.

{ print NR, NF, $0 } prints the line number, the number of fields, and the line itself, separated by spaces; if the items are not separated by commas, the output is concatenated.
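A small sketch contrasting the comma and no-comma forms, plus FILENAME. It creates a throwaway file named demo.txt (an example name) so that FILENAME has something to report.

```shell
# Create a small sample file (demo.txt is an example name)
printf 'a b c\nd e\n' > demo.txt

awk '{ print NR, NF, $0 }' demo.txt    # commas: items separated by a space
# -> 1 3 a b c
# -> 2 2 d e

awk '{ print NR NF $0 }' demo.txt      # no commas: items are concatenated
# -> 13a b c
# -> 22d e

awk 'NR == 1 { print FILENAME }' demo.txt
# -> demo.txt
```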

4) awk '{print length($5)}' file – Print the length of the string in the 5th column

5) Add up the second column and print the sum and average.
This command is useful, for example, when totalling filesystem sizes:
awk '{ s += $2 } END { print "sum is", s, "average is", s/NR }'
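Here is the sum/average one-liner above run on made-up sizes. The if (NR > 0) guard is an addition in this sketch: without it, empty input would make s/NR a division by zero.

```shell
# Column 2 holds sizes (sample numbers, made up); guard against empty input
printf 'fs1 100\nfs2 200\nfs3 300\n' |
  awk '{ s += $2 } END { if (NR > 0) print "sum is", s, "average is", s/NR }'
# -> sum is 600 average is 200
```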

6) awk '{line = $0} END {print line}' – Print the last line

7) cat file1 | awk '/scott/ {tlines = tlines + 1} END {print tlines}' – Print the total number of lines that contain the word scott
8) awk '/start/, /stop/' file – Print all lines between start/stop pairs, including the start and stop lines themselves

9) awk '$1 != prev { print; prev = $1 }' file – Print all lines whose first field is different from the previous one
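Examples 8 and 9 are easiest to understand on concrete input. This sketch feeds each one a few invented lines: the range pattern keeps everything from each start line through the next stop line, and the prev variable (empty before the first line) suppresses runs of equal first fields.

```shell
# Range pattern: print from each "start" line through the next "stop" line
printf 'x\nstart\na\nstop\ny\nstart\nb\nstop\n' | awk '/start/, /stop/'
# -> start, a, stop, start, b, stop (one per line); x and y are skipped

# Print a line only when its first field differs from the previous line's
printf 'a 1\na 2\nb 3\na 4\n' | awk '$1 != prev { print; prev = $1 }'
# -> a 1
# -> b 3
# -> a 4
```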

10) awk '$2 > $1 {print $3}' file – Print column 3 if column 2 > column 1

11) Count the number of lines where column 3 > column 1
awk '$3 > $1 { n++ } END { print n + 0 }' file

awk '{$2 = ""; print}' file – Print every line after erasing the 2nd field

12) If fields are delimited by some other character, use the -F option.
For example, if the delimiter is |:

awk -F"|" '$2=="High"{print $4}' filename
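A minimal sketch of the -F"|" example on a tiny pipe-delimited table. The columns (id, priority, owner, title) are hypothetical sample data; only rows whose second field is exactly High have their fourth field printed.

```shell
# Pipe-delimited sample rows: id|priority|owner|title (made up)
printf 'id|prio|owner|title\n1|High|ann|disk full\n2|Low|bob|typo\n3|High|cy|outage\n' |
  awk -F'|' '$2 == "High" { print $4 }'
# -> disk full
# -> outage
```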
13) The following program finds the maximum and minimum values in column 5:
NR == 1 {m=$5 ; p=$5}
$5 >= m {m = $5}
$5 <= p {p = $5}
END { print "Max = " m, "Min = " p }
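One way to try the max/min program above is to pass it inline to awk and pipe in a few sample rows (the values in column 5 are invented). The first record seeds both m and p, so the comparisons never start from an empty value.

```shell
printf 'a b c d 7\na b c d 42\na b c d 3\n' | awk '
NR == 1  { m = $5; p = $5 }    # first record seeds max (m) and min (p)
$5 >= m  { m = $5 }
$5 <= p  { p = $5 }
END      { print "Max = " m, "Min = " p }
'
# -> Max = 42 Min = 3
```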

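14) Example of defining variables and putting multiple commands on one line. For each run of consecutive lines with the same value in column 4, it prints columns 1 and 2 of the run's first line, the column-4 value, and the average of $5/$6 over the run: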
NR == 1 {prev=$4; preva = $1; prevb = $2; n=0; sum=0}
$4 != prev {print preva, prevb, prev, sum/n; n=0; sum=0; prev = $4; preva = $1; prevb = $2}
$4 == prev {n++; sum=sum+$5/$6}
END {print preva, prevb, prev, sum/n}
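To see the grouping program from example 14 at work, here it is run unchanged on three invented lines. Rule order matters: when column 4 changes, the second rule prints the finished group and resets prev, so the third rule then also fires and counts the first line of the new group. With empty input the END rule would divide by zero, since n is never set.

```shell
# Columns: 1-3 labels, 4 group key, 5 and 6 numbers (made-up sample data)
printf 'x y z A 10 2\nx y z A 6 2\np q r B 9 3\n' | awk '
NR == 1 {prev=$4; preva = $1; prevb = $2; n=0; sum=0}
$4 != prev {print preva, prevb, prev, sum/n; n=0; sum=0; prev = $4; preva = $1; prevb = $2}
$4 == prev {n++; sum=sum+$5/$6}
END {print preva, prevb, prev, sum/n}
'
# -> x y A 4
# -> p q B 3
```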

15) Example of using substrings:

substr($2,9,7) picks out characters 9 through 15 of column 2

awk '{print "jockey", substr($2,1,7) " - " $3, "out." substr($2,5,3)}'


16) The program { print } emulates the Unix cat command. The output of print can also be redirected:

{ print $1 > "foo1"; print $2 >> "foo2" } – Output may be diverted to multiple files. Some awk implementations limit how many output files can be open at once (older documentation gives a limit of 10); close() releases a file that is no longer needed.

The file name can be a variable or a field as well as a constant; for example,
print $1 > $2
uses the contents of field 2 as a file name.
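A small sketch of both redirection forms. All file names are examples. The rm at the start matters because >> appends to whatever a previous run left behind. Within a single awk run, > truncates a file only when it is first opened, so later prints in the same run append to it.

```shell
rm -f col1.txt col2.txt red.txt blue.txt   # start clean; >> appends to existing files

# One field to each of two files
printf 'a 1\nb 2\n' | awk '{ print $1 > "col1.txt"; print $2 >> "col2.txt" }'

# File name taken from field 2 of each line
printf 'alpha red.txt\nbeta blue.txt\n' | awk '{ print $1 > $2 }'

cat col1.txt col2.txt red.txt blue.txt
# -> a, b, 1, 2, alpha, beta (one per line)
```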


17) length > 72 – prints all input lines whose length exceeds 72 characters.

18) cat file | awk '{ print $2, $1 }' – Print the first two fields in opposite order

19) $0 !~ /Format/ – print all lines that do not contain the word Format.


20) To double-space a file, just tell awk to print an extra blank line if the current line is not blank:

awk '{print ; if (NF != 0) print ""}' infile > outfile

In one case, the email addresses of several different groups were placed on consecutive lines in a file, with the groups separated by blank lines. To quickly and reliably determine how many people were on the distribution list, I counted the non-blank lines:

awk 'NF != 0 {++count} END {print count}' list
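Both tricks from example 20 are shown here on made-up input. Printing count + 0 rather than count is a small tweak in this sketch, so that a file with no addresses prints 0 instead of an empty line.

```shell
# Count the non-blank lines (one address per line, groups separated by blank lines)
printf 'ann@example.com\nbob@example.com\n\ncy@example.com\n' |
  awk 'NF != 0 { ++count } END { print count + 0 }'
# -> 3

# Double-space: an extra empty line after every non-blank line
printf 'one\n\ntwo\n' | awk '{ print; if (NF != 0) print "" }'
```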