[Trisquel-users] Re : Grep consumes all my RAM and swap ding a big job

lcerf at dcc.ufmg.br lcerf at dcc.ufmg.br
Sun Jul 28 23:34:29 CEST 2019


You write a lot but the problem is still unclear to me.  My last script is a  
solution to the following problem:

"Given text files where each line is a number followed by a tab character and
a hostname (unique within a given file), take the files one by one and list,
for each hostname in the file, how many other files contain it, which files
those are, and the number each of them relates the hostname to.  The
hostnames are sorted in decreasing order of the number of other files listing
them.  Any hostname found in only one file (no other file has it) is not
listed."
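
To make that statement concrete, here is a minimal Python sketch of it.  It is not the script linked below, only an illustration under assumptions of mine: the function name multi_join is hypothetical, every non-empty line contains a tab, all files fit in memory, output fields are separated by single spaces, the other files are listed in alphabetical order, and hostnames with the same count keep the order in which they were first read.

```python
from collections import defaultdict


def multi_join(paths):
    """Return {input path: list of output lines} for the problem stated above."""
    # hostname -> {path: number that this file relates the hostname to}
    index = defaultdict(dict)
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if not line:
                    continue
                # Assumption: "<number><TAB><hostname>" on every non-empty line.
                number, hostname = line.split("\t", 1)
                index[hostname][path] = number

    results = {}
    for path in paths:
        rows = []
        for hostname, by_file in index.items():
            if path not in by_file:
                continue
            others = sorted((f, n) for f, n in by_file.items() if f != path)
            if not others:
                continue  # hostname in one single file: not listed
            fields = [str(len(others)), hostname]
            for other_file, number in others:
                fields += [number, other_file]
            rows.append((len(others), " ".join(fields)))
        # Decreasing number of other files; the sort is stable, so ties
        # keep the order in which the hostnames were first read.
        rows.sort(key=lambda row: -row[0])
        results[path] = [text for _, text in rows]
    return results
```

Writing each list in the result to a file named after its key with "-joins" appended would reproduce the naming described below.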

If you copy the script from
https://trisquel.info/forum/grep-consumes-all-my-ram-and-swap-ding-big-job#comment-142554
into "/usr/local/bin/multi-join" (for instance) and make that file executable
('sudo chmod +x /usr/local/bin/multi-join'), then you can run it on all the
files ('multi-join *.txt', if the files are all those with the txt suffix in
the working directory).  You get one new file per input file, bearing the
same name suffixed with "-joins".  For instance, giving as input the three
files in the original post, you get, within 0.1s, three new files,
"HNs.bst_.lt_.txt-joins", "HNs.www_.barcodeus.com_.txt-joins" and
"HNs.www_.outwardbound.net_.txt-joins".  "HNs.bst_.lt_.txt-joins", for
instance, contains:

2 webislab40.medien.uni-weimar.de 1 HNs.www_.barcodeus.com_.txt 2 HNs.www_.outwardbound.net_.txt
2 vmi214246.contaboserver.net 2 HNs.www_.barcodeus.com_.txt 2 HNs.www_.outwardbound.net_.txt
(...)
1 107-173-204-16-host.colocrossing.com 15 HNs.www_.outwardbound.net_.txt
1 104-117-158-51.rev.cloud.scaleway.com 1 HNs.www_.barcodeus.com_.txt

The first line means:

2 files, besides "HNs.bst_.lt_.txt", have "webislab40.medien.uni-weimar.de":  
"HNs.www_.barcodeus.com_.txt", which relates this hostname with the number 1,  
and "HNs.www_.outwardbound.net_.txt", which relates this hostname with the  
number 2.
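
The same reading can be done mechanically.  The following Python sketch splits one line of a "-joins" file into its parts.  The function name parse_join_line is hypothetical, and the sketch assumes that fields are separated by whitespace and that neither hostnames nor file names contain whitespace.

```python
def parse_join_line(line):
    """Split one "-joins" line into (count, hostname, [(number, file), ...])."""
    fields = line.split()
    count, hostname, rest = int(fields[0]), fields[1], fields[2:]
    # After the hostname, the fields alternate: number, file, number, file...
    if len(rest) != 2 * count:
        raise ValueError("expected %d number/file pairs: %r" % (count, line))
    pairs = list(zip(rest[0::2], rest[1::2]))
    return count, hostname, pairs


line = ("2 webislab40.medien.uni-weimar.de "
        "1 HNs.www_.barcodeus.com_.txt 2 HNs.www_.outwardbound.net_.txt")
count, hostname, pairs = parse_join_line(line)
for number, other_file in pairs:
    print("%s relates %s with the number %s" % (other_file, hostname, number))
```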

Since there are three files here, 2 is the maximum number of other files
(all files have the hostname) and 1 is the minimum (because any hostname in
one single file is unlisted): in this example, every *-joins file lists
first the lines starting with 2, then the lines starting with 1.

If that is not the problem, please express it clearly.  As I did above.  Not  
with ten paragraphs.  Not with digressions.

