How to remove repeated rows in awk

I have a large text file that looks like this example:

chr1 109472560 109472561 -4732 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1

Some lines are repeated, and I want to keep only one copy of each. For the example above, the expected output would look like this:

chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1

I am trying to do that in awk using the following command:

awk myfile.txt | uniq > uniq_file_name.txt

But the output is empty. Do you know how to fix it?

4 Answers

EDIT: As hek2mgl pointed out, if you only need to remove consecutive identical lines, try the following.

Let's say the following is Input_file:

cat Input_file
chr1 109472560 109472561 -4732 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109472560 109472561 -4732 CLCC1

Now run the following code:

awk 'prev!=$0;{prev=$0}' Input_file

The output will be as follows:

chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109472560 109472561 -4732 CLCC1

The following snippet removes all duplicate lines, not only consecutive repeats:

awk '!a[$0]++' Input_file

Append > output_file to the above command if you want to write the output to a separate file.

Explanation: The commented version below is for explanation only; to actually run the code, use the one-liner above.

awk '
!a[$0]++   ## The condition is TRUE only when the current line has not been seen before (a[$0] is still 0).
           ## The ++ then increments the count for this line, whether or not it was seen before,
           ## so the condition is FALSE the next time the same line appears and only the first copy is kept.
           ## awk works in condition/action pairs: when a condition is TRUE, its action runs.
           ## No action is given here, so the default action (printing the current line) is used.
' Input_file   ## The name of the input file.
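For readers who find the compact form hard to follow, here is a rough, longer equivalent with the test and the bookkeeping separated. It is a sketch, not a replacement for the one-liner; the array name seen and the function name dedup_all are arbitrary choices.

```shell
# Hypothetical helper: same result as the compact one-liner, written out step by step.
dedup_all() {
    awk '{
        if (!($0 in seen)) {   # first time this exact line appears
            print              # keep it
        }
        seen[$0] = 1           # remember the line for later comparisons
    }' "$@"
}

# Every later copy of a line is dropped, order of first appearance is kept: prints b, a.
printf 'b\nb\na\nb\nb\n' | dedup_all
```

Because every distinct line is stored in the array, memory use grows with the number of distinct lines in the input.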

This is to show the difference between uniq, awk '!a[$0]++' and sort -u.

uniq: removes consecutive duplicate lines, keeps order:

$ printf 'b\nb\na\nb\nb\n' | uniq
b
a
b

awk '!a[$0]++': removes all duplicates, keeps order:

$ printf 'b\nb\na\nb\nb\n' | awk '!a[$0]++'
b
a

sort -u: removes all duplicates and sorts the output:

$ printf 'b\nb\na\nb\nb\n' | sort -u
a
b

Your command:

$ awk myfile.txt | uniq > uniq_file_name.txt

and more precisely this part:

$ awk myfile.txt

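does not read the file at all: awk takes its first non-option argument as the program text, so myfile.txt is parsed as an awk program rather than opened as an input file. Depending on the awk implementation, that either fails with a syntax error or leaves awk waiting for input on stdin; in both cases nothing reaches uniq. The minimum you need to do to print all the lines is: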

$ awk 1 myfile.txt

But since you had no awk script, I assume you don't need awk at all. Just use uniq, in one of these two forms depending on your need:

$ uniq myfile.txt
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1

$ sort myfile.txt | uniq

For that input, both commands produce the same output.

Update:

Regarding the discussion in the comments about why sort is there: if "repeated lines" means every duplicated record anywhere in the file, sort the file first. If it means only consecutive duplicated lines, leave out the sort.
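To make the difference concrete, here is a small sketch that runs the three variants on data modelled on the question, with one non-adjacent repeat added on purpose. The scratch directory and the output names consecutive_only.txt and sorted_unique.txt are assumptions for the demo; myfile.txt and uniq_file_name.txt are the names from the question.

```shell
# Use a scratch directory so no real file is overwritten.
tmp=$(mktemp -d)

# Sample input: lines 1 and 2 are adjacent repeats, line 4 repeats line 1 non-adjacently.
printf '%s\n' \
    'chr1 109472560 109472561 -4732 CLCC1' \
    'chr1 109472560 109472561 -4732 CLCC1' \
    'chr1 109477498 109477499 206 CLCC1' \
    'chr1 109472560 109472561 -4732 CLCC1' > "$tmp/myfile.txt"

# Consecutive repeats only: 3 lines remain, because the last repeat is not adjacent.
uniq "$tmp/myfile.txt" > "$tmp/consecutive_only.txt"

# All duplicates removed, output sorted: 2 lines remain.
sort -u "$tmp/myfile.txt" > "$tmp/sorted_unique.txt"

# All duplicates removed, original order kept: 2 lines remain.
awk '!seen[$0]++' "$tmp/myfile.txt" > "$tmp/uniq_file_name.txt"
```

On the question's own sample, where every repeat is adjacent, all three commands keep the same two lines.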

Using Perl

> cat user106.txt
chr1 109472560 109472561 -4732 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1
> perl -ne ' print if $kv{$_}++ == 0 ' user106.txt
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
>

To remove only consecutive repeated lines:

> printf 'a\nb\nb\nb\nc\nc\nd\na\n' | perl -ne ' print if $prev ne $_ ; $prev=$_ ' -
a
b
c
d
a
>
