How to remove repeated rows in awk

I have a big text file like this example:

chr1 109472560 109472561 -4732 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1

There are some repeated lines, and I want to keep only one copy of each. For the example above, the expected output would look like this:

chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1

I am trying to do that in awk using the following command:

awk myfile.txt | uniq > uniq_file_name.txt

But the output is empty. Do you know how to fix it?

4 Answers

EDIT: As hek2mgl pointed out, if you only need to remove consecutive identical lines, try the following.

Let's say the following is Input_file:

cat Input_file
chr1 109472560 109472561 -4732 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109472560 109472561 -4732 CLCC1

Now run the following code:

awk 'prev!=$0;{prev=$0}' Input_file

The output will be as follows:

chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109472560 109472561 -4732 CLCC1
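
A slightly more defensive variant of the adjacent-line filter, offered as a sketch and not as a replacement for the one-liner. It assumes you want to match uniq as closely as possible. The one-liner starts with prev empty, so an empty first line is dropped, and some awk implementations may compare lines that look like numbers (for example 1 and 1.0) numerically. The function name dedupe_adjacent is made up for this example.

```shell
# dedupe_adjacent: hypothetical helper name, not part of the original answer.
# Prints a line unless it is identical, as a string, to the previous line.
dedupe_adjacent() {
  awk '
    # NR == 1 always prints the first line, even when it is empty.
    # Appending "" forces a string comparison instead of a numeric one.
    NR == 1 || ($0 "") != (prev "")
    { prev = $0 }
  ' "$@"
}

# Same sample as the uniq comparison in this thread: prints b, a, b.
printf 'b\nb\na\nb\nb\n' | dedupe_adjacent
```

Like the one-liner, it reads from standard input or from the files passed as arguments, for example dedupe_adjacent Input_file.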


The following snippet removes all duplicate lines, not only consecutive repeats:

awk '!a[$0]++' Input_file

Append > output_file to the command above if you want to write the output to a separate file.

Explanation: here is the same command with comments added. This form is for explanation only; to actually run the code, use the one-liner above.

awk '
## a[$0] counts how often the current line has been seen; it is 0 on the first occurrence, so !a[$0] is TRUE.
## The post-increment then raises the count, so the condition is FALSE for every later occurrence of that line.
## awk works on condition/action pairs; no action is given here, so the default action (print the current line) runs whenever the condition is TRUE.
!a[$0]++
' Input_file ## Input_file is the input file name.
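
For readers who find the compact form hard to follow, the same idea can be spelled out with an explicit membership test. This is only a sketch of equivalent logic, not the answer's own code; the names dedupe_all and seen are invented for illustration.

```shell
# dedupe_all and seen are illustrative names, not from the original answer.
dedupe_all() {
  awk '
    {
      if (!($0 in seen)) {  # this exact line has not appeared before
        print               # so print it once
      }
      seen[$0] = 1          # remember the line for later comparisons
    }
  ' "$@"
}

# Prints b, then a: every later copy of b is suppressed.
printf 'b\nb\na\nb\nb\n' | dedupe_all
```

Both forms keep one array entry per distinct line, so memory use grows with the number of unique lines in the file.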

This is to show the difference between uniq, awk '!a[$0]++' and sort -u.

uniq: removes consecutive duplicate lines, keeps order:

$ printf 'b\nb\na\nb\nb\n' | uniq
b
a
b

awk '!a[$0]++': removes all duplicates, keeps order

$ printf 'b\nb\na\nb\nb\n' | awk '!a[$0]++'
b
a

sort -u: removes all duplicates and sorts the output

$ printf 'b\nb\na\nb\nb\n' | sort -u
a
b

Your command:

$ awk myfile.txt | uniq > uniq_file_name.txt

and more precisely this part:

$ awk myfile.txt

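will not read myfile.txt at all. awk treats its first non-option argument as the program text, so myfile.txt is parsed as an awk program (and rejected as a syntax error) instead of being opened as input. Nothing reaches uniq, so the output file is empty. The minimum you need to do to print all the lines is: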

$ awk 1 myfile.txt

But since you had no awk script, I assume you don't need awk at all; just use uniq, in one of two forms depending on your need:

$ uniq myfile.txt
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1

or

$ sort myfile.txt | uniq

which for that input will produce the same output.

Update:

Regarding the discussion in the comments about why sort is there: if "repeated lines" means all duplicated records anywhere in the file, use sort. If it means only consecutive duplicated lines, leave out the sort.
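
To make that distinction concrete, here is a small sketch with input in which the duplicate is not adjacent to its first occurrence. The rows are taken from the question; the helper name sample_rows is hypothetical.

```shell
# sample_rows is a hypothetical helper that prints three rows in which the
# duplicate of the first row is NOT adjacent to it.
sample_rows() {
  printf '%s\n' \
    'chr1 109472560 109472561 -4732 CLCC1' \
    'chr1 109477498 109477499 206 CLCC1' \
    'chr1 109472560 109472561 -4732 CLCC1'
}

# All three rows survive: no two equal rows are next to each other.
sample_rows | uniq

# Two rows remain: sorting first makes the duplicates adjacent.
sample_rows | sort | uniq
```

The second form changes the order of the rows, which is the price of catching non-adjacent duplicates this way.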

Using Perl

> cat user106.txt
chr1 109472560 109472561 -4732 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1
chr1 109477498 109477499 206 CLCC1
> perl -ne ' print if !$kv{$_}++ ' user106.txt
chr1 109472560 109472561 -4732 CLCC1
chr1 109477498 109477499 206 CLCC1
>

To remove only consecutive repeated lines

> printf 'a\nb\nb\nb\nc\nc\nd\na\n' | perl -ne ' print if $prev ne $_ ; $prev=$_ ' -
a
b
c
d
a
>