
2014-02-22

Removing Duplicate Column Fields in awk

Let's say we have a text file called test that looks like this:
tiger bear fountain
house bear kitchen
peanuts headphone car

Now, we only care whether a value in the second column has appeared at all. If bear appears twice, we don't need the second line; we just want to keep the first line on which it occurs.

We can do this with a one-line awk command:
awk '!x[$2]++' test
tiger bear fountain
peanuts headphone car

We can see that awk has gone through the file line by line, checking whether the value in the second column has already been seen. The first line containing bear is kept, and the later line with the same second field (house bear kitchen) is removed.
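The one-liner hard-codes `$2`. As a sketch, assuming you want to choose the column at run time, the column number can be passed in with awk's `-v` option and used as `$col`. The helper name `dedupe_by_column` is hypothetical, not a standard command.

```shell
#!/bin/sh
# Keep only the first line for each distinct value of column N.
# dedupe_by_column is a hypothetical helper name used for illustration.
dedupe_by_column() {
    # $1: column number; any remaining args are input files (stdin if none)
    col=$1
    shift
    # Inside awk, $col is the field whose number is stored in the variable col
    awk -v col="$col" '!seen[$col]++' "$@"
}

# Same input as the post, deduplicated on column 2
printf '%s\n' 'tiger bear fountain' 'house bear kitchen' 'peanuts headphone car' |
    dedupe_by_column 2
```

Passing 0 as the column makes `$col` refer to `$0`, the whole line, which turns this into the common remove-duplicate-lines idiom `awk '!x[$0]++'`.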

How awk achieves this is explained in the following Stack Overflow question: http://stackoverflow.com/questions/10842118/explain-this-duplicate-line-removing-order-retaining-one-line-awk-command
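As a rough sketch of the idea covered in that thread, the one-liner can be unpacked into an equivalent long-hand awk program: `x[$2]++` yields the count before incrementing it, `!` makes the test true only when that count was 0, and a pattern with no action prints the line. The function name `first_by_field2` is hypothetical and only used here for illustration.

```shell
#!/bin/sh
# Long-hand equivalent of: awk '!x[$2]++' test
# first_by_field2 is a hypothetical helper name used only for illustration.
first_by_field2() {
    awk '{
        count = x[$2]      # value BEFORE the increment: uninitialized (0) the first time this $2 is seen
        x[$2]++            # record one more occurrence of this $2 value
        if (!count)        # ! turns 0 into true and any positive count into false
            print          # the one-liner has no action, and the default action prints the line
    }' "$@"
}

# Same input as the post; prints the tiger and peanuts lines
printf '%s\n' 'tiger bear fountain' 'house bear kitchen' 'peanuts headphone car' |
    first_by_field2
```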
