Text processing with awk

awk is a utility for processing text files. Text files are everywhere in IC design: SPICE netlists, LEF files, DEF files, Verilog, SPEF, and so on.

Overview

This tutorial uses a small SPICE netlist (called test.sp in the commands below) to demonstrate awk. It is an example of a file encountered by many IC designers.

A quick example

Say you want to see which machine you are logged in from. First you do this:

    skew% who
    jgrad pts/1 Oct 20 19:34 (adsl-68-72-165-127.dsl.chcgil.ameritech.net)
    jgrad pts/7 Oct 20 19:44 (skew:3.0)
    jgrad pts/6 Oct 10 13:44 (skew:3.0)
    jgrad pts/8 Oct 20 20:01 (skew)

We get more information than we asked for: a detailed report for each user on the system. Really all we wanted was the 6th word of the line that contains "adsl", which is our hostname. So we use awk and tell it to give us only those lines where the 6th word contains "adsl". And of all lines that match, print only the 6th word:

    skew% who | awk '$6~/adsl/ {print $6}'
    (adsl-68-72-165-127.dsl.chcgil.ameritech.net)

We see that awk processes text line by line. It breaks each line down into words and gives them to us in the variables $1, $2, etc. Then we tell awk what it should do; in this case it's a simple "print" command.

For this example we also used the "|" symbol, which is called a "pipe" in Unix. It tells Unix not to print the output of "who" on the screen but to "pipe" it into "awk" for further processing. We could even pipe the output of "awk" into another program:

    skew% who | awk '$6~/adsl/ {print $6}' | tr '-' '.'
    (adsl.68.72.165.127.dsl.chcgil.ameritech.net)

Now the output of "awk" goes into "tr", whose name is short for "translate". We use it to replace every "-" with "." so that the string looks more like an IP address.

We can use the option "-F" in awk to give it a character at which to split the line. By default awk splits at white space, but now we use ".". That means $1 will be "(adsl" (the opening parenthesis stays attached), $2 will be "68" and so on:

    skew% who | awk '$6~/adsl/ {print $6}' | tr '-' '.' \
        | awk -F. '{print $2 "." $3 "." $4 "." $5;}'
    68.72.165.127

This is a convenient way to extract the 4 components of the IP address. We could put this line into our ".cshrc" startup script and always see our IP when we log in.

Save me some typing!

When working with busses we quite often need to write out all the bus bits one by one, for example in a Pathmill config file or in any SPICE-level tool, since SPICE doesn't support busses. Now imagine you have a 16-bit bus. That means a lot of typing (imagine 64 bits!). Thankfully, awk can do it for us:

    skew% awk 'BEGIN{for(i=0;i<16;i++) printf("a_%d ",i); print; exit}'
    a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_10 a_11 a_12 a_13 a_14 a_15

We use awk just like C. In a for-loop we count from 0 to 15 and in each iteration we print one bus bit. When we are done we use "print;" to get a new line and "exit" to exit from awk. Now all we have to do is copy and paste the text wherever we need it. Within seconds we can repeat this for bus "b" and "c". It takes the same amount of time to print a 64-bit bus: just change "16" to "64" in the for-loop. This command will save you hours of typing.

In this example we use "BEGIN" to give awk a command to execute right away. Otherwise it would sit there and wait for text input. On Solaris we need the final "exit". On Linux it is not needed.

Print all capacitors in the netlist!

    skew% awk '$1~/^C/' test.sp
    C142 VDD! 0 11.32551E-15 M=1.0
    C143 0 17 2.29684E-15 M=1.0

This command will output only those lines in "test.sp" that start with a "C". This will give us all capacitors, since in SPICE we define a capacitor by using "C" as the first character. Note that "$1" contains the first word of the line. And "/^C/" is a regular expression that means "starts with C". Putting them together means "match all lines whose first word starts with C". Since we give no command for the matching lines, awk simply prints them.

Print all the MOS transistors!

We could try the same as above:

    skew% awk '$1~/^M/' test.sp
    M172 VDD! 84 92 VDD! TSMC20P L=200E-9 W=2E-6 AD=999E-15
    M173 VDD! 2 COUT net3 TSMC20P L=200E-9 W=2E-6 AD=999E-15
    M399 0 16 96 0 TSMC20N L=200E-9 W=2E-6 AD=999E-15
    M400 96 3 8 net4 TSMC20N L=200 W=2E-6 AD=299E-15

But note that each transistor in the SPICE file spans 2 lines. With our command only the first line of each transistor is printed, because only the first line starts with an "M". We need to change our command to this: when you see an "M", start printing, and continue printing while the line starts with a "+". That is because in SPICE a "+" continues a statement from the previous line.

    skew% awk '$1!~/^\+/ {state=0} $1~/^M/ {state=1} {if(state==1) print $0}' test.sp
    M172 VDD! 84 92 VDD! TSMC20P L=200E-9 W=2E-6 AD=999E-15
    +AS=599E-15 PD=3E-6 PS=600E-9 M=1
    M173 VDD! 2 COUT net3 TSMC20P L=200E-9 W=2E-6 AD=999E-15
    +AS=599E-15 PD=3E-6 PS=600E-9 M=1
    M399 0 16 96 0 TSMC20N L=200E-9 W=2E-6 AD=999E-15
    +AS=299E-15 PD=3E-6 PS=300E-9 M=1
    M400 96 3 8 net4 TSMC20N L=200 W=2E-6 AD=299E-15
    +AS=999E-15 PD=300E-9 PS=3E-6 M=1

We use the variable "state" to remember our state between lines. We also use the expression "$1!~/^\+/". First, "!~" means "does not match". And "/^\+/" means "starts with +": the "^" ties the match to the start of the word, and because "+" has a special meaning in regular expressions we need to "escape" it by putting "\" in front of it.

So we match 2 kinds of lines: those not starting with "+" and those starting with "M". When we hit a line that does not start with "+" then we have completed whatever statement we were in, so we set "state=0". When we hit a line starting with "M" we set "state=1". This means "now we enter a part of the file we want to print". The order matters: awk tries the rules from left to right on every line, so a line starting with "M" first resets the state and then sets it again.

The actual printing of the line is done with "{if(state==1) print $0}". Note how similar awk is to C. We check our variable and if it is "1" then we print the line, which awk gives us as variable $0.

I admit it looks awkward in the beginning, but this type of command comes in extremely handy. This type of programmable filtering often cannot be done any other way than with awk (and its big brother Perl).
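The transistor filter above can be generalized to any device letter. What follows is only a sketch under some assumptions: the shell function name "print_devices", the variable name "printing" and the four-line sample netlist are made up for illustration (the netlist reuses lines from the output shown earlier). It applies the same state idea, but takes the device prefix as a parameter and ties both matches to the start of the first word:

```shell
#!/bin/sh
# print_devices PREFIX [FILE...]
# Hypothetical helper: prints every statement whose first word starts with
# PREFIX, together with the "+" continuation lines that follow it.
print_devices() {
    prefix=$1
    shift
    awk -v prefix="$prefix" '
        $1 !~ /^\+/            { printing = 0 }  # a non-continuation line ends the previous statement
        index($1, prefix) == 1 { printing = 1 }  # first word starts with the requested prefix
        printing               { print }         # print while inside a wanted statement
    ' "$@"
}

# Made-up four-line netlist, reusing lines from the output above.
netlist=$(mktemp)
cat > "$netlist" <<'EOF'
C142 VDD! 0 11.32551E-15 M=1.0
M172 VDD! 84 92 VDD! TSMC20P L=200E-9 W=2E-6 AD=999E-15
+AS=599E-15 PD=3E-6 PS=600E-9 M=1
C143 0 17 2.29684E-15 M=1.0
EOF

print_devices M "$netlist"   # prints the M172 line and its "+" continuation line
rm -f "$netlist"
```

Calling "print_devices C" on the same file would print the two capacitor lines instead. Passing the prefix with "-v" keeps the awk program itself unchanged between device types.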
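Going back to the bus example in "Save me some typing!": instead of editing the for-loop by hand for every bus, the name and the width could be passed in as parameters. This is a minimal sketch, assuming a hypothetical shell function called "bus_bits"; the awk variables "name" and "width" are likewise made up for illustration:

```shell
#!/bin/sh
# bus_bits NAME WIDTH
# Hypothetical helper: prints NAME_0 NAME_1 ... NAME_(WIDTH-1) on one line.
bus_bits() {
    awk -v name="$1" -v width="$2" 'BEGIN {
        for (i = 0; i < width; i++)
            # separate the bits with a space and end the line after the last bit
            printf("%s_%d%s", name, i, (i < width - 1) ? " " : "\n")
        exit   # the original text notes that Solaris needs this final exit
    }'
}

bus_bits a 4     # prints: a_0 a_1 a_2 a_3
bus_bits b 16    # same idea for a 16-bit bus called b
```

Unlike the one-liner in the text, this version does not leave a trailing space after the last bit, which can matter when the output is pasted into a tool that is picky about white space.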