[Trisquel-users] Grep consumes all my RAM and swap ding a big job

Wed Jul 24 16:17:12 CEST 2019

Magic Banana wonders:

 > However, your input looks wrong: on line 674 of HNs.bst_.lt_.txt, the
 > second column only contains the character 0...
 > and your 'grep' selects (among others) all the lines that include this
 > character. I assume you want whole domain matches.

I checked the original webalizer data. Right after the "ws-68.oscsbras.ru"
entry there is a hostname "." that also appears in my own Recent Visitor
data, because my shared server gratuitously performs hostname lookups on
every IPv4 address that appears on its doorstep. I have complained to my
ISP, and a couple of times they turned off that "feature," but the server
quickly reverts to doing hostname lookups, which Apache actually advises
against. Something in Leafpad or in LibreOffice Calc is converting that "."
to a zero character. Maybe that's a warning to process the webalizer data
sets without resorting to LibreOffice Calc. Some day I'll try searching my
voluminous Nmap results to see if I can match IPv4 address(es) to those "."
hostnames.

Regarding the second half of the second part of the proposed solution:

 >> join -1 2 - HNusage/HNs.bst.lt/temp0

Grep would have included the file names of the data sets matched by *.txt
in the current directory (read through "-" in this script), but this join
syntax does not. Is there a way of maintaining the association between the
"hits" in the joined file and the data set in which they reside?
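
One possible approach, offered only as a sketch: have awk append each line's
file name as an extra field before the data reaches join, so the name travels
along with every joined line. The helper name tag_with_filename, the
two-column "hits hostname" layout and the LIST file below are my assumptions,
not something taken from the actual script.

```shell
# Sketch only: assumes each data set holds "hits hostname" on every line.
# Print every line of the given files with its file name appended as a last
# field, sorted on the hostname (field 2) so the result can be fed to join.
tag_with_filename() {
    awk '{ print $0, FILENAME }' "$@" | sort -k 2,2
}

# Hypothetical usage, with LIST sorted on its first field:
#   tag_with_filename *.txt | join -1 2 - LIST
# Each output line then reads: hostname hits filename [rest of the LIST line]
```

Note that sort and join must run under the same locale (for instance both
with LC_ALL=C), otherwise join may complain that its input is not sorted.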

