[Trisquel-users] Sort and Uniq fail to remove all duplicates from a list of hostnames and their IPv4 addresses

amenex at amenex.com
Thu Feb 14 21:45:30 CET 2019


As Apache server software on shared servers routinely performs hostname
lookups on data requests made to the domains hosted on those servers, I'm
compiling a database of the thousands of example.com hostnames that are on
the Internet.

I've reached an impasse: LibreOffice's Calc spreadsheet will filter _most_ of
the many duplicated lines in my lists, but a great many pairs, triplicates,
and quadruplicates of the lines in my lists still remain. There are enough of
them that their manual removal is tedious.

I've tried uniq -d to print one copy of each duplicated line, followed by
uniq -u to print only the unique lines, but the output still retained these
duplicated lines.
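
For reference, here is a minimal sketch of how the three uniq modes differ.
The sample() helper and its three lines are made up for illustration (the
addresses come from a documentation range), and the input is already sorted,
which matters because uniq only compares adjacent lines.

```shell
# Hypothetical sample data: sorted, with exactly one repeated line.
sample() {
    printf '%s\n' \
        'a.example.com 192.0.2.1' \
        'a.example.com 192.0.2.1' \
        'b.example.com 192.0.2.2'
}

sample | uniq       # one copy of every distinct line
sample | uniq -d    # one copy of each line that occurs more than once
sample | uniq -u    # only the lines that occur exactly once
```

If the goal is a list with every distinct line kept exactly once, plain uniq
on sorted input (or sort -u) is probably the mode to reach for; -d and -u
each discard part of the list.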

Here's a sample of my predicament:

jaholper1.example.com 	95.182.79.24
jaholper1.example.com 	95.182.79.24
jaholper1.example.com 	95.182.79.33
jaholper1.example.com 	95.182.79.33
jaholper4.example.com 	109.248.200.4
jaholper7.example.com 	109.248.203.131
jaholper7.example.com 	109.248.203.188
jaholper7.example.com 	109.248.203.189
jaholper7.example.com 	109.248.203.191
jaholper7.example.com 	109.248.203.198
jaholper7.example.com 	185.186.141.79
jaholper7.example.com 	185.186.142.10
jaholper7.example.com 	185.186.142.10
jaholper7.example.com 	185.186.142.100
jaholper7.example.com 	185.186.142.100
jaholper7.example.com 	185.186.142.101
jaholper7.example.com 	185.186.142.101

uniq -d returns only one line: jaholper7.example.com 	185.186.142.101
uniq -u keeps everything _but_ the last two lines.
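
One possible explanation, which is only an assumption since the raw bytes of
the file are not shown here, is that lines which look identical differ in
invisible characters: a trailing space, a space before the tab, or a carriage
return picked up from a spreadsheet export. The sketch below normalizes each
line to "hostname<TAB>address" before deduplicating. The function name
dedupe_hosts and the file name hosts.txt are hypothetical.

```shell
# Hypothetical helper: normalize whitespace, then keep one copy of each line.
dedupe_hosts() {
    # tr strips carriage returns; awk splits on any run of blanks and
    # re-joins the first two fields with a single tab; sort -u sorts and
    # drops exact duplicates in one step.
    tr -d '\r' < "$1" | awk 'NF >= 2 { print $1 "\t" $2 }' | sort -u
}

# Usage (hosts.txt is a placeholder for the real list):
#   dedupe_hosts hosts.txt > hosts-unique.txt
```

Running cat -A on a few of the stubborn lines (GNU coreutils) would show
whether such hidden characters are actually present.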

Reversing the positions of the two columns in LibreOffice only makes
matters worse: I get either single-line output or complete erasure of the
file.

It's been suggested that the IPv4 addresses can each be presented as a
single decimal number, but the thought of doing that for my thousands
of IPv4 addresses makes manual editing look pretty good.
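
The decimal conversion itself need not be done by hand. As a rough sketch,
assuming every line has the form "hostname address" with a well-formed
dotted-quad IPv4 address in the second field, awk can append the decimal
value as a third column. The function name add_decimal_column is
hypothetical.

```shell
# Hypothetical helper: append each IPv4 address as a single decimal number.
add_decimal_column() {
    awk '{
        # Split the dotted quad into its four octets.
        split($2, o, ".")
        # Weight the octets by 256^3, 256^2, 256 and 1; %.0f avoids
        # exponent notation for large values in some awk versions.
        printf "%s\t%s\t%.0f\n", $1, $2,
            o[1] * 16777216 + o[2] * 65536 + o[3] * 256 + o[4]
    }' "$1"
}

# Usage (hosts.txt is a placeholder for the real list):
#   add_decimal_column hosts.txt | sort -k3,3n
```

Sorting on the third column then orders the list numerically by address.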

George Langford


