[Trisquel-users] Re : Grep consumes all my RAM and swap ding a big job

lcerf at dcc.ufmg.br lcerf at dcc.ufmg.br
Sun Jul 28 17:49:55 CEST 2019


> When I collapsed the script into a one-liner

Why would you do that?  Copy-paste the script into a file with a meaningful  
name.  Then make that file executable (e.g., using your file manager or with  
'chmod +x').

> it would start ... but nothing ensued for about ten minutes.

It works in less than 0.1 s on the three files you gave in the original post.

> awk '{print FILENAME,"\t",$0}'

That prints the file name followed by a space, a tab, another space and the  
original line.  I doubt it is what you want.  Also, the tiny AWK program I  
proposed changed the order of the fields so that the join field comes  
first and no option is needed to alter the default behavior of 'join'.
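
To illustrate the point about field order, here is a minimal sketch on two made-up files (a.txt and b.txt, with hypothetical contents, created in a temporary directory).  It assumes, as in your data, that the join field is the second tab-separated column:

```shell
#!/bin/sh
# Hypothetical sample data: two columns, the join key is in column 2.
dir=$(mktemp -d)
printf 'x\tk1\ny\tk2\n' > "$dir/a.txt"
printf 'z\tk2\nw\tk3\n' > "$dir/b.txt"
# Keys searched for: column 2 of a.txt, sorted as 'join' requires.
cut -f 2 "$dir/a.txt" | sort > "$dir/keys"
# Put the key first so that plain 'join' (no -1/-2 option) matches on it.
reordered=$(cd "$dir" && awk '{ print $2, $1, FILENAME }' b.txt | sort -k 1,1 | join - keys)
printf '%s\n' "$reordered"
rm -r "$dir"
```

Only k2 appears in both files, so the single output line is "k2 z b.txt": the key, the other field and the name of the file where it was found.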

>> Can't you write *.txt?!
> FILENAME has specific meaning in awk, so I was sure that I would be getting  
> that for which I was asking.

I am talking about the arguments given to awk: 01.txt 02.txt 03.txt 04.txt  
05.txt 06.txt 07.txt 08.txt 09.txt 10.txt 11.txt 12.txt 13.txt 14.txt 15.txt  
16.txt 17.txt 18.txt 19.txt 20.txt 21.txt 22.txt 23.txt 24.txt 25.txt 26.txt  
27.txt 28.txt 29.txt 30.txt 31.txt 32.txt 33.txt 34.txt 35.txt 36.txt 37.txt  
38.txt 39.txt 40.txt 41.txt 42.txt 43.txt 44.txt.  Typing that is a waste of  
time and is prone to error.
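
A quick way to convince yourself, in a scratch directory holding a few made-up one-line files (the names below are only an example), that the shell expands *.txt to the same sorted list before awk even starts:

```shell
#!/bin/sh
# Hypothetical files in a temporary directory, one line each.
dir=$(mktemp -d)
for n in 01 02 03 10
do
    echo "some line" > "$dir/$n.txt"
done
# The shell replaces *.txt with the sorted list of matching names;
# awk then sees them as ordinary arguments and sets FILENAME as usual.
names=$(cd "$dir" && awk 'FNR == 1 { print FILENAME }' *.txt)
printf '%s\n' "$names"
rm -r "$dir"
```

With zero-padded names, as here, the expansion is in numeric order too.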

Also, if, in my previous post, I have understood what you wanted to do with  
Leafpad and 43 manual executions (i.e., "join every file with the union of  
all other files"), here is a slightly modified script that does everything in  
one single execution:

#!/bin/sh
if [ -z "$2" ]
then
     printf 'Usage: %s file1 file2 [file3 ...]\n' "$0"
     exit
fi
TMP=$(mktemp -u)
trap "rm $TMP 2>/dev/null" 0
mkfifo $TMP
for searched in "$@"
do
     shift
     cut -f 2 "$searched" | sort > $TMP &
     awk '{ print $2, $1, FILENAME }' "$@" | sort -k 1,1 | join - $TMP > "$searched-joins"
     set "$@" "$searched"
done
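
The for/shift/set idiom in the loop above may be easier to follow in isolation.  In this sketch (the function name rotate_demo and the arguments a, b and c are made up for the illustration), each iteration prints the current argument followed by "all the others":

```shell
#!/bin/sh
rotate_demo() {
    # The word list of 'for' is expanded once, before the first iteration,
    # so changing "$@" inside the body does not disturb the loop.
    for current in "$@"
    do
        shift                     # drop $current from the front
        echo "$current: $*"       # "$@" now holds all the other arguments
        set -- "$@" "$current"    # put it back, at the end
    done
}
rotation=$(rotate_demo a b c)
printf '%s\n' "$rotation"
```

It prints "a: b c", then "b: c a", then "c: a b".  In the script, "$searched" plays the role of $current and "$@" is what awk receives.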

Again: if you want to reorder the fields and change the spaces into tabs, you  
can pipe the output of every join to something like:
awk '{ print $2 "\t" $1 "\t" $3 }'

Notice the absence of commas (which would insert OFS, a space by default).
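
A small check of the difference, on a made-up two-field line.  The tabs are turned into '|' with tr so that they are visible; the third command is an alternative (not what I wrote above) that keeps the commas but sets OFS to a tab:

```shell
#!/bin/sh
# Commas: the arguments of print are separated by OFS (a space by default),
# hence a space on each side of the tab.
with_commas=$(echo 'a b' | awk '{ print $2, "\t", $1 }' | tr '\t' '|')
# Juxtaposition: plain string concatenation, nothing is inserted.
concatenated=$(echo 'a b' | awk '{ print $2 "\t" $1 }' | tr '\t' '|')
# Alternative: keep the commas but make OFS a tab.
tab_ofs=$(echo 'a b' | awk -v OFS='\t' '{ print $2, $1 }' | tr '\t' '|')
echo "$with_commas"     # b | a
echo "$concatenated"    # b|a
echo "$tab_ofs"         # b|a
```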

