[Trisquel-users] The join command is missing the IPv4 addresses in long mixed lists of strings

amenex at amenex.com amenex at amenex.com
Fri Jul 5 16:36:11 CEST 2019


In the process of proving myself wrong, I did the following experiment:

1. Make a shuffled version of the file GBsmt-front.txt.
2. Make its number of lines divisible by four (by temporarily deleting one  
line).
3. Divide GBsmt-front-shuf into four parts (A, B, C, D); put the deleted
line back into part D.
4. Sort each of the four parts with the console (previously I had
trusted the sorted output of LibreOffice Calc).
5. Sort GBsmt-front.txt (again, just to be sure, with the console).
6. Run the join command four times, joining each of the A, B, C, and D
portions of GBsmt-front-shuf against GBsmt-front.txt.
7. Interim reality check: The sum in kB of the four output files equals the  
size of the original GBsmt-front.txt file.
8. For-sure reality check: Concatenate the four outputs of the above join  
command, sort, and compare to the original GBsmt-front.txt list.
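
A minimal sketch of the eight steps above, for anyone who wants to repeat the check. It runs on a small made-up list rather than on GBsmt-front.txt, and all file names in it are hypothetical. It also departs from the steps in one respect: it assumes GNU split, whose "-n l/4" option cuts a file into four parts without breaking lines, so no line needs to be deleted and put back. LC_ALL=C is set so that sort and join agree on the collation order.

```shell
#!/bin/sh
# Sketch of the round-trip check on a small stand-in list (hypothetical data).
set -eu
export LC_ALL=C            # same collation order for sort and join

work=$(mktemp -d)
cd "$work"

# Stand-in for GBsmt-front.txt: one key per line, hostnames mixed with IPv4 addresses.
printf '%s\n' \
  mail.example.org 192.0.2.10 host-a.example.net 198.51.100.7 \
  10.0.0.5 smtp.example.com 203.0.113.99 zeta.example.org \
  192.0.2.2 > front.txt

shuf front.txt > front-shuf.txt            # step 1: shuffled copy
split -n l/4 front-shuf.txt part-          # steps 2-3: four parts, whole lines only
sort front.txt > front-sorted.txt          # step 5: sort the original

for p in part-aa part-ab part-ac part-ad; do
  sort "$p" > "$p.sorted"                          # step 4: sort each part
  join "$p.sorted" front-sorted.txt > "$p.joined"  # step 6: join against the original
done

cat part-*.joined | sort > roundtrip.txt   # step 8: concatenate and sort
cmp roundtrip.txt front-sorted.txt && echo "identical"
```

If cmp reports a difference, the usual suspect is an input that was sorted under a different locale or by a different program than the one join runs under.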

After all this manipulation, the two files (GBsmt-front.txt and
GBsmt-front-shuf-(A,B,C,D)-join-concatenate-sort.txt) are identical.

Then I tried the original task again, and now the IPv4 addresses appear in
the joined output. As long as I sort each of the files to be joined right
before the join operation, the command doesn't complain, and the IPv4 data
appear in droves.
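
Sorting right before the join can be sketched as follows. The two input files and their contents are invented for illustration; the only real points are that both inputs are sorted on the join field under the same locale that join runs in, and that join accepts "-" to read one of its inputs from standard input.

```shell
#!/bin/sh
# Sketch: sort both inputs under one locale immediately before joining them.
set -eu
export LC_ALL=C

work=$(mktemp -d)
# Hypothetical two-column inputs, keyed on the first field.
printf '%s\n' 'mail.example.org A' '192.0.2.10 B' '10.0.0.5 C' > "$work/left.txt"
printf '%s\n' '10.0.0.5 x' 'zeta.example.org y' '192.0.2.10 z' 'mail.example.org w' > "$work/right.txt"

# Sort the second file to a temporary name, then pipe the sorted first file into join.
sort -k1,1 "$work/right.txt" > "$work/right.sorted"
sort -k1,1 "$work/left.txt" | join - "$work/right.sorted" > "$work/joined.txt"

cat "$work/joined.txt"
# 10.0.0.5 C x
# 192.0.2.10 B z
# mail.example.org A w
```

GNU join also has a --check-order option that makes it fail loudly on unsorted input, which may help when a list has been sorted by some other program.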

Thanks to Magic Banana for confirming that join doesn't have undisclosed  
limitations.

Another tidbit: "sort [file]" alone sends the sorted output to the
console, and "sort [file] > [file]" (redirecting to the file itself) gives
0 bytes of output. My two-step "solution", "sort [file] > [file-newname]"
followed by "mv [file-newname] [file]", kept the original file name and left
no residue. The correct way to do this in line with the join command is "join
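
The 0-byte result and a one-step alternative to the sort-then-mv workaround can be reproduced with the sketch below. It is not a reconstruction of the command cut off above; it only shows sort's standard -o option on a made-up file. The redirect fails because the shell empties the target before sort ever reads it, whereas "sort -o" reads all of its input before it writes the output file, so input and output may be the same file.

```shell
#!/bin/sh
# Sketch: why "sort f > f" leaves an empty file, and "sort -o" as an in-place sort.
set -eu
export LC_ALL=C

f=$(mktemp)                     # hypothetical file to be sorted
printf '%s\n' zeta.example.org 192.0.2.10 mail.example.org > "$f"
cp "$f" "$f.bak"

sort "$f" > "$f"                # the shell truncates $f before sort opens it
wc -c < "$f"                    # reports 0 bytes

cp "$f.bak" "$f"                # restore the contents
sort -o "$f" "$f"               # sorted in place, no temporary name needed
cat "$f"
# 192.0.2.10
# mail.example.org
# zeta.example.org
```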


