Wednesday, March 07, 2007

Perform Statistical Operations on Columns in CSV Files

This is a simple but powerful way to process files in Unix, using the humble program awk.

To calculate the sum or average of a numeric column in a comma-separated file, create a text file like so:

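# Split each input line into fields at commas.
BEGIN { FS = "," }
# Add the third field of every line to the running total.
{ s += $3 }
# Print the totals; the NR check avoids dividing by zero on an empty file.
END { if (NR > 0) printf "sum = %.2f, avg = %.2f, hits = %d\n", s, s/NR, NR }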


Use your creative juices to save it with a meaningful name, say, test.awk.

Call awk with this file, like so:

awk -f test.awk mycsvfile.csv
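
If you'd rather not keep a separate script file, the same calculation can probably be passed to awk inline, with -F, setting the field separator. This is only a rough sketch: the function name col3_stats and the sample rows are made up for illustration.

```shell
# col3_stats is a hypothetical helper name; it runs the same calculation as test.awk inline.
# "$@" passes along any file names; with none given, awk reads standard input.
col3_stats() {
    awk -F, '{ s += $3 } END { printf "sum = %.2f, avg = %.2f, hits = %d\n", s, s/NR, NR }' "$@"
}

# Made-up sample data piped on standard input instead of a file.
printf '%s\n' 'a,x,10' 'b,y,20' 'c,z,30.5' | col3_stats
# prints: sum = 60.50, avg = 20.17, hits = 3
```

Calling col3_stats mycsvfile.csv should behave like the awk -f test.awk command above.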


You should see the sum, average and number of lines processed. In this example, it is assumed that the values in each line are separated by commas, and the numerical column is the third one.
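
Real CSV files often have a header row and the odd empty cell, and both get counted as lines by the simple version. Here is a hedged sketch of one way to handle that, with min and max thrown in. It assumes the first line of each file is a header and that fields never contain quoted commas; the function name col_stats and its column-number argument are my own invention, not anything awk provides.

```shell
# col_stats is a hypothetical helper: first argument is the column number,
# any remaining arguments are CSV files (standard input is read if there are none).
col_stats() {
    col=$1; shift
    awk -F, -v c="$col" '
        FNR == 1 { next }    # assumption: the first line of each file is a header
        $c == "" { next }    # skip rows where the chosen column is empty
        {
            v = $c + 0       # force a numeric value
            s += v; n++
            if (n == 1 || v < min) min = v
            if (n == 1 || v > max) max = v
        }
        END {
            if (n > 0)
                printf "sum = %.2f, avg = %.2f, min = %.2f, max = %.2f, hits = %d\n", s, s/n, min, max, n
        }
    ' "$@"
}

# Made-up sample data: a header, two numeric rows and one row with an empty value.
printf '%s\n' 'name,kind,value' 'a,x,10' 'b,y,' 'c,z,30' | col_stats 3
# prints: sum = 40.00, avg = 20.00, min = 10.00, max = 30.00, hits = 2
```

Here hits counts only the rows that actually contributed a value, so the average is not dragged down by the header or by blanks.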

For more information on awk, RTFM (man awk) or Google it.
