Monday, December 16, 2013

The Art of Unix Programming

http://www.faqs.org/docs/artu/

The join command combines two data files on a common field

http://blog.comsysto.com/2013/04/25/data-analysis-with-the-unix-shell/

You can also perform joins in the Unix shell with the join command. join assumes that both inputs are sorted on the key used for the join. Another dataset on GitHub contains countries and is also a comma-separated list. The 14th column of the country dataset holds the capital id, which corresponds to the id in the city dataset. This makes it possible to list countries together with their capitals.

bz@cs ~/data/ $ cat city | head -n 2
    1,Kabul,AFG,Kabol,1780000
    2,Qandahar,AFG,Qandahar,237500
bz@cs ~/data/ $ cat country | head -n 2
    AFG,Afghanistan,Asia,Southern and Central Asia,652090,1919,22720000,45.9,5976.00,,Afganistan/Afqanestan,Islamic Emirate,Mohammad Omar,1,AF
    NLD,Netherlands,Europe,Western Europe,41526,1581,15864000,78.3,371362.00,360478.00,Nederland,Constitutional Monarchy,Beatrix,5,NL
bz@cs ~/data/ $ join -t "," -1 1 -2 14 -o '1.2,2.2' city country | head -n 2
    Kabul,Afghanistan
    Amsterdam,Netherlands
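
Because join only pairs up lines correctly when both inputs are sorted on the join field, it can help to sort explicitly before joining. The following is a minimal sketch, not the original author's workflow: the two files are trimmed-down, illustrative copies that keep only the columns needed (an assumption made to keep the example short), so the capital id sits in column 3 rather than column 14.

```shell
#!/bin/sh
# Use the same byte-wise collation for sort and join so they agree on order.
export LC_ALL=C

workdir=$(mktemp -d)

# Illustrative city file (assumed layout: id,name,country code).
printf '%s\n' '1,Kabul,AFG' '2,Qandahar,AFG' '5,Amsterdam,NLD' > "$workdir/city"

# Illustrative country file (assumed layout: code,name,capital id).
printf '%s\n' 'NLD,Netherlands,5' 'AFG,Afghanistan,1' > "$workdir/country"

# Sort each file on its own join field: column 1 for city, column 3 for country.
sort -t ',' -k 1,1 "$workdir/city" > "$workdir/city.sorted"
sort -t ',' -k 3,3 "$workdir/country" > "$workdir/country.sorted"

# Join city id (file 1, field 1) with capital id (file 2, field 3)
# and print the city name followed by the country name.
capitals=$(join -t ',' -1 1 -2 3 -o '1.2,2.2' \
    "$workdir/city.sorted" "$workdir/country.sorted")
printf '%s\n' "$capitals"

rm -r "$workdir"
```

Note that sort and join compare keys as strings here, so an id such as 10 sorts before 2. That is fine as long as both files are sorted the same way.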

Fwd: unix tips




Sample roughly 30% of the lines of a very large file

awk 'BEGIN { srand() } rand() < 0.3' data.csv
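
The one-liner above is seeded from the clock, so it draws a different sample on every run. When a repeatable sample is wanted, one option is to pass a fixed seed instead. This is a rough sketch under that assumption; the function name sample_lines and its argument order are made up for illustration.

```shell
# sample_lines RATE SEED FILE
# Prints each line of FILE with probability RATE (a number between 0 and 1).
# SEED is an integer; the same seed gives the same sample with the same awk.
# The function name and argument order are hypothetical.
sample_lines() {
    awk -v rate="$1" -v seed="$2" 'BEGIN { srand(seed) } rand() < rate' "$3"
}

# Example call (data.csv stands for any line-oriented file):
#   sample_lines 0.3 42 data.csv > sample.csv
```

The number of lines returned is only approximately RATE times the file length, since every line is kept or dropped independently.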


Tuesday, December 3, 2013

# Print a multiplication table.


printf "%3d %3d %3d %3d %3d %3d %3d %3d %3d %3d\n" $( echo {1..10}*{1..10}\; | bc )
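
The one-liner above depends on brace expansion, which is a bash feature and not part of POSIX sh, and on bc being installed. For shells without brace expansion, a possible alternative is to let awk do both the arithmetic and the formatting. This is only a sketch; the function name mult_table and its optional size argument are assumptions added for illustration.

```shell
# mult_table [N]
# Prints an N x N multiplication table (N defaults to 10), using the same
# "%3d" columns separated by single spaces as the printf one-liner.
# The function name and the size argument are hypothetical.
mult_table() {
    awk -v n="${1:-10}" 'BEGIN {
        for (i = 1; i <= n; i++)
            for (j = 1; j <= n; j++)
                # A space between columns, a newline after the last one.
                printf "%3d%s", i * j, (j < n ? " " : "\n")
    }'
}

# Example call:
#   mult_table 10
```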