[Trisquel-users] The join command is missing the IPv4 addresses in long mixed lists of strings

amenex at amenex.com amenex at amenex.com
Fri Jul 5 22:40:28 CEST 2019


Magic Banana wrote:

 > I did not know it was OK to redirect twice the standard input; to avoid
 > touching the disk I would have created named pipes,
 > as in this short (untested) script ...

After spending the intervening time making about 150 joins, combining
eighteen sets of four-month Webalizer data every which way, I see that
your suspicions may be well founded. The smaller output files (30 to
several hundred kB) look clean, but the largest ones (1500 kB down to
~700 kB) are made up almost entirely of duplicated rows. The duplicates
will have to be removed during post-processing ... and the results
checked for errors.
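
A minimal sketch of that post-processing step, assuming the duplicates are exact, whole-line repeats and that row order in the output does not matter. The function names dedupe_rows and count_dropped are hypothetical helpers, not part of any existing script in this thread:

```shell
# Hypothetical helper: remove exact duplicate rows from a join output file.
# Usage: dedupe_rows INPUT OUTPUT
dedupe_rows() {
    # LC_ALL=C gives a byte-wise, locale-independent ordering;
    # sort -u keeps a single copy of each distinct line.
    LC_ALL=C sort -u -- "$1" > "$2"
}

# Hypothetical helper: report how many rows the clean-up dropped.
# Usage: count_dropped INPUT OUTPUT
count_dropped() {
    echo $(( $(wc -l < "$1") - $(wc -l < "$2") ))
}
```

Comparing the dropped-row count against the size of the original file gives a quick check on how badly a given join output was affected.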

Before I start that processing, I'll see if I can try out your script;
the extra steps won't be any drag on the joining, as the longest time
for any join was still in the blink-of-an-eye category (0.044 sec.
system time).

I've been pairing up the most recent data with all of the prior data,
one pair at a time, and that's getting tedious. The "info join" page
says that one of the two input files (but not both !) can be read from
standard input. In these repetitive joins that I'm doing now, can one of
the inputs be taken from a file that lists the other target files ? I've
got fifteen more sets of data, so this file list can grow ... up to
thirty-two now, but almost without end if one looks at the number of
Webalizer data sets that are available.
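
As far as I know, join itself has no option for reading a list of file names, but a shell loop can feed it one listed file at a time. The sketch below assumes the join field is the first whitespace-separated field of every file and that the list holds one path per line; join_against_list and the ".joined" output naming are hypothetical:

```shell
# Hypothetical helper: join one reference file against every file in a list.
# Usage: join_against_list REFERENCE LIST OUTDIR
join_against_list() {
    ref=$1 list=$2 outdir=$3
    mkdir -p "$outdir"
    # join expects both inputs sorted on the join field (field 1 by default).
    LC_ALL=C sort -k 1b,1 "$ref" > "$outdir/.ref.sorted"
    while IFS= read -r other; do
        [ -n "$other" ] || continue    # skip blank lines in the list
        out="$outdir/$(basename "$ref")-$(basename "$other").joined"
        # "-" tells join to read its second input from standard input.
        LC_ALL=C sort -k 1b,1 "$other" |
            LC_ALL=C join "$outdir/.ref.sorted" - > "$out"
    done < "$list"
    rm -f "$outdir/.ref.sorted"
}
```

Each pass sorts its input inside the pipeline, so only the sorted copy of the reference file and the results touch the disk, and adding a new data set only means appending one line to the list.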


