[Trisquel-users] Grep consumes all my RAM and swap doing a big job

amenex at amenex.com amenex at amenex.com
Mon Jul 29 16:27:32 CEST 2019


The scholar, focusing on the mathematics, admonishes the "pragmatic  
idealist":

 >> That is not a problem. That is an algorithm whose first step is not even clear:
 >> a Google search on {stats "view all sites"} returns a "normal response", a
 >> list of 224,000 pages from different websites.

OK: ["usage statistics" "view all sites"] returns nearly 100% viable results.  
Open them one at a time; change the URL from
".../stats/usage_201904.html" to ".../stats/site_201906.html" from which the  
first and last columns should be selected and
saved as a two-column table.
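
The two steps above (rewrite the URL, then keep the first and last columns) can
be sketched in Python. This is only an illustration under stated assumptions:
the function names are made up here, the site page is assumed to have already
been saved as plain text with whitespace-separated columns, and the sample row
and example.org URL are invented placeholders, not real data.

```python
import re


def usage_to_site_url(url, month=None):
    """Turn .../usage_YYYYMM.html into .../site_YYYYMM.html.

    If `month` (a hypothetical parameter, e.g. "201906") is given, it
    replaces the month found in the original URL.
    """
    def swap(match):
        return "site_" + (month or match.group(1)) + ".html"

    return re.sub(r"usage_(\d{6})\.html$", swap, url)


def first_and_last_columns(lines):
    """Keep only the first and last whitespace-separated field of each row."""
    table = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or single-field lines
            table.append((fields[0], fields[-1]))
    return table


# Invented sample input, for illustration only.
site_url = usage_to_site_url(
    "http://example.org/stats/usage_201904.html", month="201906"
)
two_columns = first_and_last_columns(["1200 900 4500 30 host-a.example.net"])
```

Fetching the page and stripping its HTML is left out on purpose; how best to do
that depends on how each site lays out its statistics pages.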

We've developed this to a stage at which one could probably repeat the  
exercise as soon as each month's data becomes
available ... makes a good homework problem. Getting all the IPv4 addresses  
makes it a dissertation.

Here's another stab at stating the problem:

Too many visitors to our Internet domains are up to no good, as evidenced by  
the many unresolvable hostnames appearing in
the lists of Recent Visitors that we find in our own domains' logs. Internet  
traffic is identified by a numerical protocol,
either IPv4 (currently) or IPv6 (future, to gain more address space). Our  
ISPs gratuitously allow these numerical addresses
to be translated into more easily remembered hostnames. Unfortunately,  
operators of Internet servers are allowed to
assign arbitrary names to each numerical address in the server's address  
space, known as "pointers" a.k.a. PTRs. Each server
address can host any number of subdomains, whose names must be registered;  
these are known as A records. Sadly, during the
process of transmitting information on the Internet, expedient nameservers  
are used to cache repeatedly accessed requests
for the IP addresses of hostnames and thereby accumulate both A records and  
PTR records which are treated equivalently. If
a PTR is named the same as more than one other PTR, the A records at either  
one's IP address become unresolvable unless one
knows the IP address of its official nameserver. Therefore, we can protect our  
own Internet domains by blocking all the
offending IP addresses that we can identify from the repetitive status of the  
hostnames that appear to be most
indiscriminately accessing similar domains to our own.
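
One way to read "repetitive status of the hostnames" is: a hostname that shows
up for more than one IP address. A minimal Python sketch of that reading
follows. It assumes the visitor data is already available as (IP address,
hostname) pairs; the function name is hypothetical, and the sample pairs use
reserved documentation addresses and example domains rather than real
visitors.

```python
from collections import defaultdict


def addresses_sharing_hostnames(pairs):
    """Map each hostname seen with more than one IP address to those IPs."""
    by_name = defaultdict(set)
    for ip, hostname in pairs:
        # Normalise case and any trailing dot so equal names compare equal.
        by_name[hostname.lower().rstrip(".")].add(ip)
    return {
        name: sorted(ips) for name, ips in by_name.items() if len(ips) > 1
    }


# Invented sample data, for illustration only.
sample = [
    ("192.0.2.10", "mail.example.com"),
    ("198.51.100.7", "MAIL.example.com."),
    ("203.0.113.5", "unique.example.org"),
]

repeated = addresses_sharing_hostnames(sample)

# Candidate block list: every address that shares its hostname with another.
block_list = sorted({ip for ips in repeated.values() for ip in ips})
```

Whether every address on such a list deserves blocking is a separate judgment;
the sketch only does the grouping.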

