[Trisquel-users] Grep consumes all my RAM and swap during a big job

amenex at amenex.com
Sun Jul 28 14:29:21 CEST 2019


Magic Banana requested clarification of my future plans:

In reply to what I said:
 > Repeating it for the other 43 combinations should now be a breeze, as I can  
switch the file names around with Leafpad.

 >> I am not sure I understand what you want to do (join every file with the  
union of all other files?) ...

Yes: Join each file in turn with all the others, so all the repeatedly used  
hostnames are represented. Once the 45
results are in hand, I'll concatenate them _without_ using one final join,  
thereby keeping the baby from getting
thrown out with the bathwater. I have a script that does a nice job of  
grouping the duplicated hostnames, but it
won't separate them with blank lines ... (yet). See 142510.
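The blank-line grouping described above can be sketched in awk. This is only a sketch under assumptions: the joined output is sorted on the hostname in field 1, and "joined.txt" and its contents are made-up placeholder names, not the actual files from the script in 142510.

```shell
# Made-up sample input: sorted lines, hostname in field 1.
printf 'a.example 203.0.113.5\na.example 198.51.100.7\nb.example 192.0.2.9\n' > joined.txt

# Print a blank line whenever the hostname changes, so each group of
# duplicated hostnames stands apart from the next.
awk 'NR > 1 && $1 != prev { print "" } { print; prev = $1 }' joined.txt
```

The one-liner only compares each line's first field with the previous line's, so it costs a single pass over input that is already sorted.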

 >> ... but Leafpad is certainly not the best solution.

Understood, but it's visual, and I can use "undo" nearly without end. Also, I  
want to guard against double-counting,
as with 01j01.txt or 01j02.txt vs 02j01.txt, and that requires some  
heavy-duty concentration. My non-geek work will
not take more than an hour or so, and the next stage will take just 45 blinks  
of an eye. I'll retain all the stages
of the processing script (> time awk --file  
Joins/Script-Joins-sorted-07272019.txt; see 142538) so they won't have  
to be re-created when (or if) more Webalizer data comes to light, at which  
point the new data can simply be appended.
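The double-counting guard (keeping 01j02 but never 02j01) can be done mechanically instead of by concentration. A sketch, assuming the 45 combinations come from 10 numbered files and that the NNjMM names follow the pattern above; the file names here are placeholders:

```shell
# Emit each unordered pair of the 10 data sets exactly once, in
# zero-padded lexicographic order, so 01j02 appears but 02j01 never does.
list_pairs() {
  for i in $(seq -w 1 10); do
    for j in $(seq -w 1 10); do
      # String comparison works because seq -w zero-pads to equal width.
      [[ "$i" < "$j" ]] && printf '%sj%s\n' "$i" "$j"
      # e.g.: join "${i}.txt" "${j}.txt" > "${i}j${j}.txt"
    done
  done
  return 0
}
list_pairs
```

Requiring i < j yields exactly C(10,2) = 45 combinations, so nothing is joined with itself (no 01j01) and no pair appears twice.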

At the end, there's still the task of identifying the often-proliferating  
IPv4 address(es) that go with each hostname.

I've separated all (?) the untranslated IPv4 addresses beforehand, but the  
nMap scans are taking too long. I'll need  
to perform our "join" magic on those 45 data sets as well, to reduce the sheer  
quantity of addresses to be nMap'ed.
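A cheap first cut at reducing the address count is to merge the lists and keep each address once before any scan starts. A sketch with made-up file names and addresses (addrs-*.txt stands in for the per-set address lists):

```shell
# Made-up per-set address lists with one overlapping address.
printf '203.0.113.5\n198.51.100.7\n' > addrs-01.txt
printf '198.51.100.7\n192.0.2.9\n'  > addrs-02.txt

# Merge and deduplicate, so nmap touches each address exactly once.
sort -u addrs-*.txt > unique-addrs.txt

# nmap -sL -iL unique-addrs.txt   # list scan of the reduced set (no probes)
```

nmap's -iL reads targets from a file, and -sL is a list scan that only enumerates targets, which is a quick way to sanity-check the reduced set before committing to a slow full scan.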


More information about the Trisquel-users mailing list