Text processing with awk

awk is a utility for processing text files. Text files are everywhere in IC design: SPICE netlists, LEF files, DEF files, Verilog, SPEF, and so on.

Overview

This tutorial uses a small SPICE netlist, "test.sp", to demonstrate awk. It is an example of a file encountered by many IC designers.

A quick example

Say you want to see which machine you are logged in from. First you do this:

    skew% who
    jgrad pts/1 Oct 20 19:34 (adsl-68-72-165-127.dsl.chcgil.ameritech.net)
    jgrad pts/7 Oct 20 19:44 (skew:3.0)
    jgrad pts/6 Oct 10 13:44 (skew:3.0)
    jgrad pts/8 Oct 20 20:01 (skew)

We get more information than we asked for: a detailed report for each login on the system. All we really wanted was the 6th word of the line that contains "adsl", which is our hostname. So we use awk and tell it to select only those lines where the 6th word contains "adsl", and to print only the 6th word of each matching line:

    skew% who | awk '$6~/adsl/ {print $6}'
    (adsl-68-72-165-127.dsl.chcgil.ameritech.net)

We see that awk processes text line by line. It breaks each line down into words and gives them to us in the variables $1, $2 and so on. Then we tell awk what it should do with them; in this case it's a simple "print" command.

For this example we also used the "|" symbol, which is called a "pipe" in Unix. It tells Unix not to print the output of "who" on the screen but to "pipe" it into "awk" for further processing. We could even pipe the output of "awk" into another program:

    skew% who | awk '$6~/adsl/ {print $6}' | tr '-' '.'
    (adsl.68.72.165.127.dsl.chcgil.ameritech.net)

Now the output of "awk" goes into "tr", which is short for "translate". We use it to replace every "-" with ".", so that the string looks more like an IP address.

We can use the parameter "-F" to give awk the character at which to split the line. By default awk splits at white space, but now we use ".". That means $1 will be "(adsl", $2 will be "68" and so on:

    skew% who | awk '$6~/adsl/ {print $6}' | tr '-' '.' | awk -F. '{print $2 "." $3 "." $4 "." $5;}'
    68.72.165.127

This is a convenient way to extract the 4 components of the IP address. We could put this line into our ".cshrc" startup script and always see our IP when we log in.

Save me some typing!

When working with busses we quite frequently need to write out all the bus bits one by one, for example in a Pathmill config file or in any Spice level tool, since Spice doesn't support busses. Now imagine you have a 16 bit bus. That means a lot of typing (imagine 64 bits!). Thankfully, awk can do it for us:

    skew% awk 'BEGIN{for(i=0;i<16;i++) printf("a_%d ",i); print; exit;}'
    a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_10 a_11 a_12 a_13 a_14 a_15

We use awk just like C. In a for-loop we count from 0 to 15, and in each iteration we print a bus bit. When we are done we use "print;" to get a new line and "exit" to exit from awk. Now all we have to do is copy and paste the text wherever we need it. Within seconds we can repeat this for bus "b" and "c". It takes the same amount of time to print a 64-bit bus: just change the for-loop from "16" to "64". This command will save you hours of typing.

In this example we use "BEGIN" to give awk a command to execute right away. Otherwise it would sit there and wait for text input. On Solaris we need the final "exit". On Linux it is not needed.

Print all capacitors in the netlist!

    skew% awk '$1~/^C/' test.sp
    C142 VDD! 0 11.32551E-15 M=1.0
    C143 0 17 2.29684E-15 M=1.0

This command outputs only those lines in "test.sp" that start with a "C". This gives us all capacitors, since in SPICE we define a capacitor by using "C" as the first character of its name. Note that "$1" contains the first word of the line, and "/^C/" is a regular expression that means "starts with C". Putting them together means "match all those lines where the first word starts with C".

Print all the MOS transistors!

We could try the same as above:

    skew% awk '$1~/^M/' test.sp
    M172 VDD! 84 92 VDD! TSMC20P L=200E-9 W=2E-6 AD=999E-15
    M173 VDD! 2 COUT net3 TSMC20P L=200E-9 W=2E-6 AD=999E-15
    M399 0 16 96 0 TSMC20N L=200E-9 W=2E-6 AD=999E-15
    M400 96 3 8 net4 TSMC20N L=200 W=2E-6 AD=299E-15

But note that each transistor in the Spice file spans 2 lines. With our command only the first line of each transistor is printed, because only the first line starts with an "M". We need to change our command to this: when you see an "M", start printing, and continue printing while the line starts with a "+". That is because in SPICE a "+" continues a statement from the previous line.

    skew% awk '$1!~/^\+/ {state=0} $1~/^M/ {state=1} {if(state==1) print $0}' test.sp
    M172 VDD! 84 92 VDD! TSMC20P L=200E-9 W=2E-6 AD=999E-15
    +AS=599E-15 PD=3E-6 PS=600E-9 M=1
    M173 VDD! 2 COUT net3 TSMC20P L=200E-9 W=2E-6 AD=999E-15
    +AS=599E-15 PD=3E-6 PS=600E-9 M=1
    M399 0 16 96 0 TSMC20N L=200E-9 W=2E-6 AD=999E-15
    +AS=299E-15 PD=3E-6 PS=300E-9 M=1
    M400 96 3 8 net4 TSMC20N L=200 W=2E-6 AD=299E-15
    +AS=999E-15 PD=300E-9 PS=3E-6 M=1

We use the variable "state" to remember our state between lines. We also use the expression "$1!~/^\+/". First, "!~" means "does not match". And "/^\+/" is really "/^+/", but because "+" has a special meaning in regular expressions we need to "escape" it by putting "\" in front of it. The "^" anchors the match to the start of the word; without it, the patterns would also match an "M" or a "+" anywhere inside the first word.

So we match 2 kinds of lines: those not starting with "+" and those starting with "M". When we hit a line that does not start with "+" we have completed whatever statement we were in, so we set "state=0". When we hit a line starting with "M" we set "state=1". This means "now we enter a part of the file we want to print". The order matters: a line starting with "M" first resets the state and then sets it again.

The actual printing of the line is done with "{if(state==1) print $0}". Note how similar awk is to C. We check our variable, and if it is "1" we print the line, which awk gives us as the variable $0.

I admit it looks awkward in the beginning, but this type of command comes in extremely handy. This type of programmable filtering often cannot be done any other way than with awk (and its big brother Perl).
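The state-variable filter for MOS statements is easier to follow when it is written out over several lines. The following is only a minimal sketch, not the author's own code: the function name "print_mos" and the file "sample.sp" are hypothetical, and the sample netlist is assembled from lines shown in the outputs above. Both patterns are anchored with "^" so that they test the start of the first word.

```shell
#!/bin/sh
# Hypothetical helper: print every MOS statement in a netlist,
# including its "+" continuation lines.
print_mos() {
    awk '
        # A line that does not start with "+" ends the previous statement.
        $1 !~ /^\+/ { printing = 0 }
        # A line whose first word starts with "M" begins a MOS statement.
        $1 ~ /^M/   { printing = 1 }
        # Print the current line while we are inside a MOS statement.
        printing    { print }
    ' "$1"
}

# Assumed sample netlist, built from lines that appear in this tutorial.
cat > sample.sp <<'EOF'
C142 VDD! 0 11.32551E-15 M=1.0
M172 VDD! 84 92 VDD! TSMC20P L=200E-9 W=2E-6 AD=999E-15
+AS=599E-15 PD=3E-6 PS=600E-9 M=1
C143 0 17 2.29684E-15 M=1.0
EOF

print_mos sample.sp
```

Run against this sample, the sketch prints the M172 line and its "+" continuation and skips both capacitors. A bare pattern such as "printing" with an action is awk shorthand for "if the variable is non-zero, run the action".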
Are all my wells tied off?

In IC design it is very important to ensure that all PMOS devices have their body tied to VDD and all NMOS devices have theirs tied to VSS. This is done using NWELL and PWELL contacts in the layout. Miss one contact and the body of your device will float, causing all sorts of hazards. It is impractical to check all well contacts by looking at the layout. We need to automate this, for example as a mandatory check before tape-out. Awk can detect this condition very simply:

    skew% awk '$6~/TSMC20P/ && $5!~/^VDD\!$/' test.sp
    M173 VDD! 2 COUT net3 TSMC20P L=200E-9 W=2E-6 AD=999E-15

Sure enough, awk has detected a floating device. We have to check for 2 conditions: field number 6 (which contains the device model) indicates a PMOS (which is called "TSMC20P" in our netlist), and field number 5 (which is the body terminal) is not equal to "VDD!". Using "&&" we tell awk that we want all lines that match both conditions.

Note that "!" is another special character, this time for the C shell, which uses it for history substitution, so we escaped it with "\". We already know that "^" means "starts with". Now we also learn that "$" means "ends with". So "^VDD!$" means "exactly VDD!". Note that the net name "VDD!1" would match "^VDD!" but does not match "^VDD!$".

We can do the same thing for the NMOS devices. We just select all the "TSMC20N" devices and check whether the body is "0". Then we go back to the layout and fix the floating devices:

    skew% awk '$6~/TSMC20N/ && $5!~/^0$/' test.sp
    M400 96 3 8 net4 TSMC20N L=200 W=2E-6 AD=299E-15

If we want to run this check often, it would be nice to have an actual error message, not just the failing line:

    skew% awk '$6~/TSMC20P/ && $5!~/^VDD\!$/ {print "Error in " $1 ": Should be VDD! not " $5;}' test.sp
    Error in M173: Should be VDD! not net3

To produce this nice message we used the "print" command and built an error message from the different fields that awk has given us. $1 is the first word, which is the device instance name, and $5 is the body terminal.

Do all my devices have minimum length?
This is another common check, especially in digital designs. Usually all devices should have minimum length. A typo in the schematic can easily give a device a different length. This is hard to catch because designers get into the habit of looking only at the width in the schematic.

    skew% awk '$1~/^M/ && $7!~/200E-9/' test.sp
    M400 96 3 8 net4 TSMC20N L=200 W=2E-6 AD=299E-15

In our design, 200n is the minimum length. So we select all lines that start with "M" and whose field #7 does not contain "200E-9". Sure enough, we found one.

Which device widths did I use?

Here is a way to quickly get a report of the different device widths used in the design:

    skew% awk '{print $8}' test.sp | sort | uniq
    W=2E-6

In our example all devices have a width of 2u. Here we used 3 programs in sequence. We use "awk" to print only the 8th field of every line, which is the width in our netlist. It will be blank for other lines, but we will ignore that. Then we send the output to "sort" and then to "uniq". To see what "sort" does, consider this:

    skew% awk '{print $8}' test.sp | sort
    W=2E-6
    W=2E-6
    W=2E-6
    W=2E-6

We get all our width values sorted. Any blank lines in the output come from lines that have no 8th field. This would be a very large report for a typical netlist, because the width of each device would be printed, just in sorted order. When we add "uniq", each value is printed only once and repeated lines are suppressed. This gives us a nice, short report of the values we used.

Put my awk code into a script!

Usually we want to run awk more than once. Then it makes sense to put the code into a script. We can give the script a convenient name, and others can easily use it. Consider this example:

    skew% cat wellcheck
    #!/bin/awk -f
    $6~/TSMC20P/ && $5!~/^VDD\!$/ {
        print "Error in " $1 ": Should be VDD! not " $5;
    }
    $6~/TSMC20N/ && $5!~/^0$/ {
        print "Error in " $1 ": Should be 0 not " $5;
    }
    skew% chmod 755 wellcheck
    skew% ./wellcheck test.sp
    Error in M173: Should be VDD! not net3
    Error in M400: Should be 0 not net4

Here we do three things. We created a file called "wellcheck" in a text editor, then we used "cat" to display its contents. Note the first line, which tells Unix to run the file through awk. Also note how much easier the code is to read when we spread it over multiple lines. Awk command lines are hard to read, but awk scripts are very easy to read.

Then we use "chmod 755 wellcheck". This makes our script executable, meaning we can start it like a program.

Now we can start it with "./wellcheck test.sp". It looks just like any other Unix program. Anybody can run it; they don't have to know or understand the awk code behind it. Personally I have lots of these scripts for all kinds of mundane tasks.
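If the well check is meant to act as a mandatory gate before tape-out, it helps when the script also reports failure through its exit status. The following is a hedged sketch of one way to do that, not part of the original script: the function name "wellcheck_strict", the variable "errors" and the file "sample_wells.sp" are hypothetical, and the netlist lines are copied from outputs shown earlier. It swaps the regular expressions on field 5 for plain string comparisons, which avoids having to escape "!" at all.

```shell
#!/bin/sh
# Hypothetical variant of the well check: same messages as before,
# plus a non-zero exit status when any violation is found.
wellcheck_strict() {
    awk '
        # PMOS body (field 5) must be exactly "VDD!".
        $6 ~ /TSMC20P/ && $5 != "VDD!" {
            print "Error in " $1 ": Should be VDD! not " $5
            errors++
        }
        # NMOS body (field 5) must be exactly "0".
        $6 ~ /TSMC20N/ && $5 != "0" {
            print "Error in " $1 ": Should be 0 not " $5
            errors++
        }
        # Exit with status 1 if at least one error was counted, else 0.
        END { exit (errors > 0) }
    ' "$1"
}

# Assumed sample netlist, built from lines that appear in this tutorial.
cat > sample_wells.sp <<'EOF'
M172 VDD! 84 92 VDD! TSMC20P L=200E-9 W=2E-6 AD=999E-15
M173 VDD! 2 COUT net3 TSMC20P L=200E-9 W=2E-6 AD=999E-15
M399 0 16 96 0 TSMC20N L=200E-9 W=2E-6 AD=999E-15
M400 96 3 8 net4 TSMC20N L=200 W=2E-6 AD=299E-15
EOF

wellcheck_strict sample_wells.sp || echo "well check failed"
```

On this sample the sketch reports M173 and M400 and then prints "well check failed", because awk exits with status 1. A surrounding build or sign-off script could test that status instead of parsing the messages.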