[Trisquel-users] Grep consumes all my RAM and swap doing a big job

amenex at amenex.com amenex at amenex.com
Fri Jul 26 20:14:53 CEST 2019


Taking Magic Banana's cue, I applied the join command in round-robin fashion:

 > HNs.Ed.tropic.ssec.wisc.edu.txt  join -1 2 -2 2  join -1 2 -2 2  join -1 2  
-2 2  join -1 2 -2 2  join -1 2 -2 2  time cat `ls  
/home/george/Desktop/June2019/DataSets/HNs-TwoColumn/HNusage/HNs.Ed.tropic.ssec.wisc.edu`  
 >  
/home/george/Desktop/June2019/DataSets/HNs-TwoColumn/HNusage/HNs.Ed.tropic.ssec.wisc.edu.visitors.txt

This produces 1.4 MB of output, but without any filenames. That can be
remedied by adding the filename to each file in turn, either with a print
statement or (shudder) in Leafpad, one file at a time, 44 times over ...
but there's another way:

See this link:  
https://unix.stackexchange.com/questions/117568/adding-a-column-of-values-in-a-tab-delimited-file

 >> awk '{print $0, FILENAME}' file1 file2 file3 ...

 > awk '{print FILENAME,"\t",$0}' 01.txt 02.txt 03.txt ... which works because  
I renamed the files to fit this format.
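
One detail worth checking in the second form: awk joins comma-separated print arguments with OFS, which is a space by default, so the tab ends up with a space on either side. Here is a minimal sketch of the difference, assuming two small made-up sample files in a temporary directory (the file contents and the variable names are hypothetical, not from my data):

```shell
# Hypothetical sample files standing in for 01.txt and 02.txt.
tmpdir=$(mktemp -d)
printf 'alpha 1\n' > "$tmpdir/01.txt"
printf 'beta 2\n'  > "$tmpdir/02.txt"

# Comma-separated print arguments are joined with OFS (a space by default),
# so each line is: file name, space, tab, space, original line.
loose=$(cd "$tmpdir" && awk '{print FILENAME,"\t",$0}' 01.txt 02.txt)

# Setting OFS to a tab puts exactly one tab between file name and data.
tight=$(cd "$tmpdir" && awk -v OFS='\t' '{print FILENAME, $0}' 01.txt 02.txt)

printf '%s\n' "$loose" "$tight"
```

Either form is readable by eye; the single-tab form is probably the safer choice if the result is later split on tabs.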

I kept the roster list ...

Here's the whole awk script, which concatenates all 44 files and inserts the  
file name associated with the data as desired:
 > time awk '{print FILENAME,"\t",$0}' 01.txt 02.txt 03.txt 04.txt 05.txt  
06.txt 07.txt 08.txt 09.txt 10.txt 11.txt 12.txt 13.txt 14.txt 15.txt 16.txt  
17.txt 18.txt 19.txt 20.txt 21.txt 22.txt 23.txt 24.txt 25.txt 26.txt 27.txt  
28.txt 29.txt 30.txt 31.txt 32.txt 33.txt 34.txt 35.txt 36.txt 37.txt 38.txt  
39.txt 40.txt 41.txt 42.txt 43.txt 44.txt >  
Backup/ProcessedVisitorLists/FILENAME.txt
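
Since the files were renamed to two-digit names, a shell glob could probably stand in for the 44 typed-out names. What follows is a sketch under assumptions, not the command I actually ran: it uses three made-up sample files in a temporary directory and the single-tab OFS variant of the print statement.

```shell
# Hypothetical stand-ins for the renamed files 01.txt .. 44.txt.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/Backup/ProcessedVisitorLists"
for n in 01 02 03; do
    printf 'host%s.example 5\n' "$n" > "$tmpdir/$n.txt"
done

# [0-9][0-9].txt matches only two-digit names and expands in sorted order,
# so the list of input files does not have to be typed by hand.
# The output lives in a subdirectory, so the glob cannot pick it up as input.
(
    cd "$tmpdir" || exit 1
    awk -v OFS='\t' '{print FILENAME, $0}' [0-9][0-9].txt \
        > Backup/ProcessedVisitorLists/FILENAME.txt
)

cat "$tmpdir/Backup/ProcessedVisitorLists/FILENAME.txt"
```

Run in the real directory, the same awk line with the same glob should pick up all 44 files, provided nothing else there has a two-digit .txt name.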

It's 1.8 MB and in a pretty format, not yet processed as mentioned elsewhere  
... a luncheon date beckons.

Timing? The join commands took about an hour to set up and ca. 1 or 2
seconds real time each to run (after copy and paste into the console, about
15 seconds for each of the 44 join commands ==> 11-1/2 minutes), and this
last monstrosity took 0.05 second real time, not to mention all morning
struggling with a prettier method of reading what's been in the current
directory all along. Repeating it for the other 43 combinations should now
be a breeze, as I can switch the file names around with Leafpad.

All because I haven't yet spent those ten hours that've been mentioned every  
so often ...


