Monday, December 16, 2013

The join command combines two data files on a common field

http://blog.comsysto.com/2013/04/25/data-analysis-with-the-unix-shell/

You can also perform joins in the Unix shell with the join command. join assumes that both input files are sorted on the key used for the join. Another dataset on GitHub contains countries; like the city dataset, it is a comma-separated list. The 14th column of the country dataset holds the capital id, which corresponds to the id in the first column of the city dataset. This makes it possible to create a list of countries with their capitals.

bz@cs ~/data/ $ cat city | head -n 2
    1,Kabul,AFG,Kabol,1780000
    2,Qandahar,AFG,Qandahar,237500
bz@cs ~/data/ $ cat country | head -n 2
    AFG,Afghanistan,Asia,Southern and Central Asia,652090,1919,22720000,45.9,5976.00,,Afganistan/Afqanestan,Islamic Emirate,Mohammad Omar,1,AF
    NLD,Netherlands,Europe,Western Europe,41526,1581,15864000,78.3,371362.00,360478.00,Nederland,Constitutional Monarchy,Beatrix,5,NL
bz@cs ~/data/ $ join -t "," -1 1 -2 14 -o '1.2,2.2' city country | head -n 2
    Kabul,Afghanistan
    Amsterdam,Netherlands
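
Because join expects both inputs to be sorted on the join field, it can help to sort each file explicitly before joining. The following is a minimal sketch of that step, not the original post's method. It uses small made-up sample files trimmed to three columns (so the capital id is in column 3 rather than 14), and the temporary directory and `.sorted` file names are assumptions for illustration. Note that join compares keys as strings, so the files are sorted lexically rather than numerically.

```shell
#!/bin/sh
# Hypothetical sample data, trimmed to three columns for illustration.
workdir=$(mktemp -d)

# city: id,name,country code (deliberately unsorted)
printf '%s\n' '1,Kabul,AFG' '5,Amsterdam,NLD' '2,Qandahar,AFG' > "$workdir/city"
# country: code,name,capital id (deliberately unsorted)
printf '%s\n' 'NLD,Netherlands,5' 'AFG,Afghanistan,1' > "$workdir/country"

# Sort each file lexically on its join field; LC_ALL=C keeps the
# ordering consistent between sort and join.
LC_ALL=C sort -t ',' -k 1,1 "$workdir/city" > "$workdir/city.sorted"
LC_ALL=C sort -t ',' -k 3,3 "$workdir/country" > "$workdir/country.sorted"

# Join city field 1 with country field 3; print city name and country name.
result=$(LC_ALL=C join -t ',' -1 1 -2 3 -o '1.2,2.2' \
    "$workdir/city.sorted" "$workdir/country.sorted")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Cities without a matching capital id (Qandahar in this sample) are dropped, since join only prints lines whose keys appear in both files by default.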
