Last Update: 20230923 (previously 20190611)
Local File:  "C:\DAN\HTM\GoDaddy\dansher\utut\awk\awk_fyi.txt"
Web Path:    https://www.dansher.com/utut/awk/awk_fyi.txt
See Also:    https://www.awk.dev/
             https://www.amazon.com/dp/0138269726
==================================================================
WHY AWK:

Awk is a powerful programming language available on all UNIX systems.
Since it is interpreted rather than compiled, an awk program written
on HP-UX will run without modification on Solaris, Linux, or any
other UNIX-like system. In addition to branching and loop processing,
awk offers the power of multi-dimensional array processing plus a
wealth of built-in string and math functions. It can read from and
write to multiple files, and it can execute system commands.

Good on-line resources for awk are at:
   http://www.grymoire.com/Unix/Awk.html
   https://www.tutorialspoint.com/awk
   https://www.datafix.com.au/BASHing/2019-06-07.html
==================================================================
USES FOR AWK:

The basic use is as a slice-'n-dice utility for data files. Awk can
parse any readable input (from a file, a pipe, or the keyboard) into
records and fields, and from these it can generate a nicely formatted
summary report from the bulky original. The input can be a data file,
either fixed-length or field-delimited, or more free-form, as log
files almost always are.
==================================================================
SIMPLE AWK EXAMPLES:

The lines in a readable file may be counted using either wc or awk:

   wc -l datafile.txt
   awk 'END {print NR}' datafile.txt

Awk reads input one line at a time, usually from a file or from a
pipe. A simple (but wasteful) example of the latter is:

   wc -l datafile.txt | awk '{print $1}'

Another example (also wasteful) of feeding awk through a pipe:

   date '+%m%d%Y' |
   awk '{print substr($0,1,2) "/" substr($0,3,2) "/" substr($0,5,4)}'

Note that the pipe feeds awk OK even if the pipe is on the line above
(nice for keeping program lines short enough to avoid overflow).
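The array processing mentioned under WHY AWK can be shown with a
minimal sketch: an associative array indexed by field values. The
color/word input lines below are invented sample data, not part of
any file discussed elsewhere in these notes:

```shell
# Tally how many times each first field occurs, using an awk
# associative array. The three input lines are made-up sample data;
# sort is used because awk's "for (x in array)" order is unspecified.
printf 'red apple\nblue sky\nred car\n' |
awk '{count[$1]++} END {for (c in count) print c, count[c]}' |
sort
# screen output follows:
# blue 1
# red 2
```

The same pattern (array subscripted by a field, totals printed in the
END section) is the usual awk way to build a summary report from a
bulky log.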
==================================================================
HOW AWK "SEES" A DATA FILE:

An awk data file (i.e., "input") can be any text file. Such text
files are often generated as the output of a database query, with
each line in the output file being considered a "record." Each record
(i.e., every line) usually contains the same number of "fields." From
record to record, a specific field typically contains the same type
of information. For example, if a data file contains a complete
customer name and address on every line, one could expect that the
next-to-last field would always contain the state (e.g., CA).

Awk always parses its input records into "fields", known as $1, $2,
etc. As awk passes through the data file, all data on each record is
temporarily assigned to $0 (zero). By default, awk uses whitespace
(one or more spaces or tabs) as the Field Separator (delimiter), but
other delimiter characters can be used.

Consider the three-line data file below:

1^Attention:^this^is the^caret^delimited^data^file^"datafile.txt".^
12^Every^field^is separated from the next^by^the^caret^(shift 6)^character.^
4^Even though it^has^been^challenging,^every^line^contains only nine^fields.^

In the simple one-line awk program below, note how awk counts the
Number of Fields in each record of the data file using the default
delimiter (whitespace):

   awk '{print NF}' datafile.txt
   # screen output follows:
   2
   6
   5

In the example below, awk is explicitly told (via -F"^") to use the
caret character as the Field delimiter:

   awk -F"^" '{print NF}' datafile.txt
   # screen output follows:
   10
   10
   10

The built-in awk "NF" variable always equals the actual number of
fields in the current record. It reports 10 (not 9) here because
every line of the data file ends with a trailing caret, so awk sees
an empty tenth field after that final delimiter.

   awk -F"^" '{print $9}' datafile.txt
   # screen output follows:
   "datafile.txt".
   character.
   fields.

Note the arguments to the cut command needed to achieve the same
result:

   cut -d ^ -f 9 datafile.txt
   # screen output follows:
   "datafile.txt".
   character.
   fields.

But cut cannot do math or handle column alignment. The awk printf()
function allows output to be formatted into columns:

   awk -F"^" '{printf("Record: %2d, %5s %2.3f\n", NR, $3, ($1/NR))}' datafile.txt
   # screen output follows:
   Record:  1,  this 1.000
   Record:  2, field 6.000
   Record:  3,   has 1.333

Another example, in which awk reads the colon-delimited UNIX password
file:

   awk -F":" '{printf("%8s %4d %20s\n", $1,$3,$5)}' /etc/passwd
   # screen output: one formatted line per account (varies by system)

And a final example, in which awk checks the UNIX password file for
backdoor entries (any UID-0 account that is not root):

   awk -F":" '$3 == 0 && $1 != "root" {printf("%8s %4d %20s\n", $1,$3,$5)}' /etc/passwd
==================================================================
SOMETHING MORE COMPLICATED:

An awk program can perform several tasks at once while passing
through the data file:

   awk -F"^" -v inf='datafile.txt' '
   BEGIN \
   {# awk performs all instructions in this section before reading in any data.
    printf("Input File: %s\n", inf)
   }
   # main()
   {# awk performs all instructions in this section on every record in the data file.
    totl += $1
    printf("Length of record %d is %d\n", NR, length($0))
   }
   END \
   {# awk does the things in this section only after all data has been read.
    printf("Records in input file (%s): %d\n", inf, NR)
    printf("Total value of Field #1: %d\n", totl)
   }' datafile.txt
   # screen output follows:
   Input File: datafile.txt
   Length of record 1 is 67
   Length of record 2 is 76
   Length of record 3 is 77
   Records in input file (datafile.txt): 3
   Total value of Field #1: 17

Awk commands can be entered:
   - from the Command Line (see above)
   - from a shell script (see the see_edf.ksh script)
   - from another file (as shown below)
==================================================================
AWK INVOCATION ARGUMENTS:

   -F"c"            # Sets the field delimiter to any character "c".
   -f filename      # Tells awk to read its commands from file "filename".
   -v v1="$env_var" # Populates the awk variable "v1" with the contents of "env_var".

The -v argument can be used once for each awk variable to be
populated. Consider this awk invocation example:

   awk -F"^" -f awk_com1.awk -v avr="$foo" datafile.txt
   # awk will read the file "awk_com1.awk" to get the commands with
   # which to process the data file "datafile.txt", which is
   # delimited with the caret character (^). The programmer also
   # populates the awk variable "avr" with the current contents of
   # the environment variable "foo".

Awk is aware of UNIX environment variables, as the example shows:

   echo foolie | awk '{printf("Your HOME directory is \"%s\".\n", ENVIRON["HOME"])}'
   Your HOME directory is "/home/dmarti22".

The reason for piping in the echo above is so awk will think it is
reading a file.
==================================================================
EXAMPLES AND DISCUSSION:

Note these six files (also contained within awk_fyi.zip):

   ckurf.ksh       - Korn script that uses awk to summarize a MAF Repoll log.
   ckurf_num.txt   - Line-numbered version of ckurf.ksh, with example output, for discussion.
   reseq.ksh       - Korn script that uses awk to re-sequence the first field of a datafile.
   reseq_num.txt   - Line-numbered version of reseq.ksh for discussion.
   see_edf.ksh     - Korn shell script which calls awk to summarize a lengthy log file.
   see_edf_num.txt - Line-numbered version of see_edf.ksh, with example output.
==================================================================
DISCUSSION OF reseq_num.txt:

   Lines 14 - 19: What is going on?
   Line  20:      What does it do? How does it work?
   Line  21:      Analyze the awk invocation.
   Lines 22 - 27: Analyze the awk actions.
==================================================================
DISCUSSION OF ckurf_num.txt:

   Line   6:      Why? How?
   Lines  7 - 18: Why? How?
   Lines 21 - 24: Observe the sample data.
   Lines 44 - 56: Analyze line-by-line.
   Line  57:      What do you think?
   Lines 59 - 71: Analyze line-by-line.
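Returning to the -v mechanism described under AWK INVOCATION
ARGUMENTS above, here is a minimal sketch of passing a shell value
into an awk comparison. The shell variable "threshold", the awk
variable "min", and the sample numbers are all invented for this
example:

```shell
# Pass a shell variable into awk with -v, then use it inside the
# pattern. "threshold" and the piped-in numbers are made up here.
threshold=5
printf '3\n7\n9\n' | awk -v min="$threshold" '$1 > min {print $1}'
# screen output follows:
# 7
# 9
```

This is generally safer than pasting the shell variable directly into
the quoted awk program text, because the quoting stays simple.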
==================================================================
DISCUSSION OF see_edf_num.txt:

   Lines  31 - 33:  What does each line do?
   Line   35:       awk is invoked. What do the arguments mean? How are they used in the code?
   Lines  37 - 42:  Which user-defined awk function(s) is NOT used?
   Lines  59 - 80:  Why are they there? Are they worth the space?
   Lines 105 - 106: What could have been done instead?
   Line  138:       Analyze the entire END section, line by line.

Beware a real gotcha: DO NOT use any contraction (e.g., do not use
don't) within a comment inside an embedded awk script. The shell will
see the single quote mark inside the contraction (even though it
comes after a # comment mark) as the terminator of the quoted awk
script.
==================================================================
## USEFUL AWK FUNCTIONS:

# spc(n) slides n [more] spaces along the current line:
function spc(n) {for (i = 0; i < n; i++) printf(" ")}

# dsh(n) draws a line of n dashes along the current line:
function dsh(n) {for (i = 0; i < n; i++) printf("-")}

# eql(n) draws a line of n equals signs along the current line:
function eql(n) {for (i = 0; i < n; i++) printf("=")}

# skip(n) jumps to the next line and skips down an additional (n-1) lines:
function skip(n) {for (i = 0; i < n; i++) printf("\n")}

# ctr(ll, str) centers and prints string "str" based on line-length "ll":
function ctr(ll, str) {stl = length(str)
                       lmr = int(((ll - stl)/2))
                       for (i = 0; i < lmr; i++) printf(" ")
                       printf("%s\n", str)
}
==================================================================
EOF
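As a usage sketch (not part of the library above), the ctr() and
dsh() helpers can be exercised from a BEGIN block; the 20-column
width and the "REPORT" title are invented for this example:

```shell
# Center a title over a 20-dash rule using the ctr() and dsh()
# helpers from the USEFUL AWK FUNCTIONS library. Width and title are
# made up for this sketch. A BEGIN-only program reads no input.
awk '
function dsh(n) {for (i = 0; i < n; i++) printf("-")}
function ctr(ll, str) {stl = length(str)
                       lmr = int(((ll - stl)/2))
                       for (i = 0; i < lmr; i++) printf(" ")
                       printf("%s\n", str)
}
BEGIN {ctr(20, "REPORT")
       dsh(20)
       printf("\n")
}'
# screen output follows:
#        REPORT
# --------------------
```

Note that ctr() ends its line with \n, while dsh() does not, so the
caller must print the final newline after the dashes.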