Last Update: 20230923 (previously 20190611)
Local File:  "C:\DAN\HTM\GoDaddy\dansher\utut\awk\awk_fyi.txt"
Web Path:    https://www.dansher.com/utut/awk/awk_fyi.txt
See Also:    https://www.awk.dev/
             https://www.amazon.com/dp/0138269726
==================================================================
WHY AWK:

Awk is a powerful programming language available on all UNIX systems.
Since it is interpreted rather than compiled, an awk program written
on HP-UX will run without modification on Solaris, Linux, or any
other UNIX-like system. In addition to branching and loop processing,
awk offers the power of multi-dimensional array processing plus a
wealth of built-in string and math functions. It can read from and
write to multiple files, and it can execute system commands.

Good on-line resources for awk are at:
   http://www.grymoire.com/Unix/Awk.html
   https://www.tutorialspoint.com/awk
   https://www.datafix.com.au/BASHing/2019-06-07.html
==================================================================
USES FOR AWK:

The basic use is as a slice-'n-dice utility for data files. Awk can
parse any readable input (from a file, a pipe, or the keyboard) into
records and fields, and from these it can generate a nicely formatted
summary report from the bulky original. The input can be a data file,
either fixed-length or field-delimited, or more free-form, as log
files almost always are.
==================================================================
SIMPLE AWK EXAMPLES:

The lines in a readable file may be counted using either wc or awk:

   wc -l datafile.txt
   awk 'END {print NR}' datafile.txt

Awk reads input one line at a time, usually from a file or from a
pipe. A simple (but wasteful) example of the latter is:

   wc -l datafile.txt | awk '{print $1}'

Another example (also wasteful) of feeding awk through a pipe:

   date '+%m%d%Y' |
   awk '{print substr($0,1,2) "/" substr($0,3,2) "/" substr($0,5,4)}'

Note that the pipe feeds awk OK even if the pipe is on the line above
(nice for keeping program lines short enough to avoid overflow).
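The array processing mentioned under WHY AWK can be shown with a
minimal sketch: an associative array indexed by field values. The
color/word input lines below are invented sample data, not part of
any file discussed elsewhere in these notes:

```shell
# Tally how many times each first field occurs, using an awk
# associative array. The three input lines are made-up sample data;
# sort is used because awk's "for (x in array)" order is unspecified.
printf 'red apple\nblue sky\nred car\n' |
awk '{count[$1]++} END {for (c in count) print c, count[c]}' |
sort
# screen output follows:
# blue 1
# red 2
```

The same pattern (array subscripted by a field, totals printed in the
END section) is the usual awk way to build a summary report from a
bulky log.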
==================================================================
HOW AWK "SEES" A DATA FILE:

An awk data file (i.e., "input") can be any text file. Such text
files are often generated as the output of a database query, with
each line in the output file being considered a "record." Each record
(i.e., every line) usually contains the same number of "fields." From
record to record, a specific field typically contains the same type
of information. For example, if a data file contains a complete
customer name and address on every line, one could expect that the
next-to-last field would always contain the state (e.g., CA).

Awk always parses its input records into "fields", known as $1, $2,
etc. As awk passes through the data file, all data on each record is
temporarily assigned to $0 (zero). By default, awk uses whitespace
(one or more spaces or tabs) as the Field Separator (delimiter), but
other delimiter characters can be used.

Consider the three-line data file below:

1^Attention:^this^is the^caret^delimited^data^file^"datafile.txt".^
12^Every^field^is separated from the next^by^the^caret^(shift 6)^character.^
4^Even though it^has^been^challenging,^every^line^contains only nine^fields.^

In the simple one-line awk program below, note how awk counts the
Number of Fields in each record of the data file using the default
delimiter (whitespace):

   awk '{print NF}' datafile.txt
   # screen output follows:
   2
   6
   5

In the example below, awk is explicitly told (via -F"^") to use the
caret character as the Field delimiter:

   awk -F"^" '{print NF}' datafile.txt
   # screen output follows:
   10
   10
   10

The built-in awk "NF" variable always equals the actual number of
fields in the current record. It reports 10 (not 9) here because
every line of the data file ends with a trailing caret, so awk sees
an empty tenth field after that final delimiter.

   awk -F"^" '{print $9}' datafile.txt
   # screen output follows:
   "datafile.txt".
   character.
   fields.

Note the arguments to the cut command needed to achieve the same
result:

   cut -d ^ -f 9 datafile.txt
   # screen output follows:
   "datafile.txt".
   character.
   fields.

But cut cannot do math or handle column alignment. The awk printf()
function allows output to be formatted into columns:

   awk -F"^" '{printf("Record: %2d, %5s %2.3f\n", NR, $3, ($1/NR))}' datafile.txt
   # screen output follows:
   Record:  1,  this 1.000
   Record:  2, field 6.000
   Record:  3,   has 1.333

Another example, in which awk reads the colon-delimited UNIX password
file:

   awk -F":" '{printf("%8s %4d %20s\n", $1,$3,$5)}' /etc/passwd
   # screen output: one formatted line per account (varies by system)

And a final example, in which awk checks the UNIX password file for
backdoor entries (any UID-0 account that is not root):

   awk -F":" '$3 == 0 && $1 != "root" {printf("%8s %4d %20s\n", $1,$3,$5)}' /etc/passwd
==================================================================
SOMETHING MORE COMPLICATED:

An awk program can perform several tasks at once while passing
through the data file:

   awk -F"^" -v inf='datafile.txt' '
   BEGIN \
   {# awk performs all instructions in this section before reading in any data.
    printf("Input File: %s\n", inf)
   }
   # main()
   {# awk performs all instructions in this section on every record in the data file.
    totl += $1
    printf("Length of record %d is %d\n", NR, length($0))
   }
   END \
   {# awk does the things in this section only after all data has been read.
    printf("Records in input file (%s): %d\n", inf, NR)
    printf("Total value of Field #1: %d\n", totl)
   }' datafile.txt
   # screen output follows:
   Input File: datafile.txt
   Length of record 1 is 67
   Length of record 2 is 76
   Length of record 3 is 77
   Records in input file (datafile.txt): 3
   Total value of Field #1: 17

Awk commands can be entered:
   - from the Command Line (see above)
   - from a shell script (see the see_edf.ksh script)
   - from another file (as shown below)
==================================================================
AWK INVOCATION ARGUMENTS:

   -F"c"            # Sets the field delimiter to any character "c".
   -f filename      # Tells awk to read its commands from file "filename".
   -v v1="$env_var" # Populates the awk variable "v1" with the contents of "env_var".

The -v argument can be used once for each awk variable to be
populated. Consider this awk invocation example:

   awk -F"^" -f awk_com1.awk -v avr="$foo" datafile.txt
   # awk will read the file "awk_com1.awk" to get the commands with
   # which to process the data file "datafile.txt", which is
   # delimited with the caret character (^). The programmer also
   # populates the awk variable "avr" with the current contents of
   # the environment variable "foo".

Awk is aware of UNIX environment variables, as the example shows:

   echo foolie | awk '{printf("Your HOME directory is \"%s\".\n", ENVIRON["HOME"])}'
   Your HOME directory is "/home/dmarti22".

The reason for piping in the echo above is so awk will think it is
reading a file.
==================================================================
EXAMPLES AND DISCUSSION:

Note these six files (also contained within awk_fyi.zip):

   ckurf.ksh       - Korn script that uses awk to summarize a MAF Repoll log.
   ckurf_num.txt   - Line-numbered version of ckurf.ksh, with example output, for discussion.
   reseq.ksh       - Korn script that uses awk to re-sequence the first field of a datafile.
   reseq_num.txt   - Line-numbered version of reseq.ksh for discussion.
   see_edf.ksh     - Korn shell script which calls awk to summarize a lengthy log file.
   see_edf_num.txt - Line-numbered version of see_edf.ksh, with example output.
==================================================================
DISCUSSION OF reseq_num.txt:

   Lines 14 - 19: What is going on?
   Line  20:      What does it do? How does it work?
   Line  21:      Analyze the awk invocation.
   Lines 22 - 27: Analyze the awk actions.
==================================================================
DISCUSSION OF ckurf_num.txt:

   Line   6:      Why? How?
   Lines  7 - 18: Why? How?
   Lines 21 - 24: Observe the sample data.
   Lines 44 - 56: Analyze line-by-line.
   Line  57:      What do you think?
   Lines 59 - 71: Analyze line-by-line.
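Returning to the -v mechanism described under AWK INVOCATION
ARGUMENTS above, here is a minimal sketch of passing a shell value
into an awk comparison. The shell variable "threshold", the awk
variable "min", and the sample numbers are all invented for this
example:

```shell
# Pass a shell variable into awk with -v, then use it inside the
# pattern. "threshold" and the piped-in numbers are made up here.
threshold=5
printf '3\n7\n9\n' | awk -v min="$threshold" '$1 > min {print $1}'
# screen output follows:
# 7
# 9
```

This is generally safer than pasting the shell variable directly into
the quoted awk program text, because the quoting stays simple.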
==================================================================
DISCUSSION OF see_edf_num.txt:

   Lines  31 - 33:  What does each line do?
   Line   35:       awk is invoked. What do the arguments mean? How are they used in the code?
   Lines  37 - 42:  Which user-defined awk function(s) is NOT used?
   Lines  59 - 80:  Why are they there? Are they worth the space?
   Lines 105 - 106: What could have been done instead?
   Line  138:       Analyze the entire END section, line by line.

Beware a real gotcha: DO NOT use any contraction (e.g., do not use
don't) within a comment inside an embedded awk script. The shell will
see the single quote mark inside the contraction (even though it
comes after a # comment mark) as the terminator of the quoted awk
script.
==================================================================
## USEFUL AWK FUNCTIONS:

# spc(n) slides n [more] spaces along the current line:
function spc(n) {for (i = 0; i < n; i++) printf(" ")}

# dsh(n) draws a line of n dashes along the current line:
function dsh(n) {for (i = 0; i < n; i++) printf("-")}

# eql(n) draws a line of n equals signs along the current line:
function eql(n) {for (i = 0; i < n; i++) printf("=")}

# skip(n) jumps to the next line and skips down an additional (n-1) lines:
function skip(n) {for (i = 0; i < n; i++) printf("\n")}

# ctr(ll, str) centers and prints string "str" based on line-length "ll":
function ctr(ll, str) {stl = length(str)
                       lmr = int(((ll - stl)/2))
                       for (i = 0; i < lmr; i++) printf(" ")
                       printf("%s\n", str)
}
==================================================================
EOF
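As a usage sketch (not part of the library above), the ctr() and
dsh() helpers can be exercised from a BEGIN block; the 20-column
width and the "REPORT" title are invented for this example:

```shell
# Center a title over a 20-dash rule using the ctr() and dsh()
# helpers from the USEFUL AWK FUNCTIONS library. Width and title are
# made up for this sketch. A BEGIN-only program reads no input.
awk '
function dsh(n) {for (i = 0; i < n; i++) printf("-")}
function ctr(ll, str) {stl = length(str)
                       lmr = int(((ll - stl)/2))
                       for (i = 0; i < lmr; i++) printf(" ")
                       printf("%s\n", str)
}
BEGIN {ctr(20, "REPORT")
       dsh(20)
       printf("\n")
}'
# screen output follows:
#        REPORT
# --------------------
```

Note that ctr() ends its line with \n, while dsh() does not, so the
caller must print the final newline after the dashes.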