
About those "AUDITORY list edit URL" messages



Dear List -

This morning, you will each have received a message giving the URL to
the web page for changing your AUDITORY list registration details.
These messages were sent accidentally, but *nothing bad has happened*.
Below are the details of how these messages came to be sent, which
I include for the curious...

First some background: If you visit your auditory registration
information page (from the audindex.html page off the main
www.auditory.org web site) *without* specifying the identity
confirmation field (the ID=... part of the URL), you are given a
read-only page.  This is the point of collecting this information - so
other list members can refer to it, via the audindex.html page.

However, each such page includes a link which can be clicked to send
an email -- including the ID=... tag -- back to the email address listed
in the record.  This way, if you want to edit your page, but don't
have the full edit URL, you can find your record via audindex.html,
click the link, and get your full edit URL emailed back to you.  This
is all automatic, so that manual intervention doesn't slow down the
process.

Now, to a computer, this link used to look like a plain URL.  When I
first put this in place, we had a problem when the page was crawled by
a web search engine, which followed every one of these links.  To
prevent this from happening again, I specified a variety of patterns
in the site's "robots.txt" file, which is used to tell web crawling
engines which pages to ignore.
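For the curious, here is a minimal sketch of how a well-behaved
crawler consults robots.txt before fetching a page, using Python's
standard urllib.robotparser module.  The rules, the /cgi-bin/ path,
the example.org host and the "ExampleBot" agent name are all
assumptions made for illustration; the actual patterns in the site's
robots.txt are not reproduced here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only (not the site's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the hypothetical rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's contents as an iterable of lines.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # A polite crawler performs this check before every request.
    print(allowed("http://www.example.org/cgi-bin/sendurl?rec=1"))
    print(allowed("http://www.example.org/index.html"))
```

The check is entirely voluntary: nothing on the server enforces it,
so a crawler that never performs it will simply follow every link.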

Unfortunately, this morning the site was crawled from the IP address
211.92.138.3 (which appears to be somewhere in China) without any
regard for the robots.txt file: every link was tried, which is
why each of you was sent your edit URL.

Note that the main index page, audindex.html, isn't actually linked
from the main web page - in an attempt to protect the privacy of list
members, I have sought to avoid this page becoming well known.
However, I *have* mentioned the full URL in several postings to the
list.  Moreover, the web-based archive of postings on www.auditory.org
automatically converts URLs embedded in messages to links.  So, when
the rogue web-crawling robot from 211.92.138.3 crawled the entire
postings archive, it also found links to the audindex.html page, and
hence crawled that too.  I hadn't realized that the page was being
'published' in this way.

So, in the hope of preventing future recurrences of this event, I have
made two changes:

 - the "Email me my edit URL" link on the member info display page is
   now a button rather than a plain link.  I think most web crawling
   programs will ignore buttons.  I should have done this from the
   beginning; it was just a design flaw.

 - I've modified the archived messages that refer to the audindex
   page so that the URLs are no longer well-formed, and hence they
   won't be converted to links when the archive pages are viewed.
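
To illustrate the first change, here is a minimal sketch of the link
extraction step of a simple crawler, using Python's standard
html.parser module.  The HTML snippets, the /cgi-bin/sendurl path and
the "rec" field are hypothetical stand-ins, not the site's real
markup.  The sketch assumes a crawler that only follows href
attributes on anchor tags and never submits forms, which is typical
but not guaranteed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, as a simple crawler might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors are treated as followable; forms are ignored.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawlable_links(html: str) -> list:
    """Return the list of URLs this simple crawler would follow."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical "before" markup: a plain link that any crawler can follow.
OLD_PAGE = '<a href="/cgi-bin/sendurl?rec=1">Email me my edit URL</a>'

# Hypothetical "after" markup: a form button, which must be submitted.
NEW_PAGE = (
    '<form method="post" action="/cgi-bin/sendurl">'
    '<input type="hidden" name="rec" value="1">'
    '<input type="submit" value="Email me my edit URL">'
    "</form>"
)

if __name__ == "__main__":
    print(crawlable_links(OLD_PAGE))
    print(crawlable_links(NEW_PAGE))
```

A crawler of this kind finds one URL to follow in the old markup and
none in the new one, so merely visiting the page no longer triggers
the email.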

I have also refrained from giving the full URL to the audindex page in
this message.  It's just www.auditory.org followed by audindex.html,
but if you've read this far, you'll understand why I haven't included
the slashes to make it into a full URL!
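
As a rough sketch of why leaving out the slashes helps, suppose the
archive's link conversion works by matching text that begins with
http:// or https://.  That is an assumption; the archive software's
actual rule is not described here, and the pattern and example.org
addresses below are purely illustrative.

```python
import re

# Assumed rule: only text starting with an explicit scheme is a URL.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def linkify(text: str) -> str:
    """Wrap every match of URL_PATTERN in an HTML anchor tag."""
    def to_anchor(match):
        url = match.group(0)
        return '<a href="{0}">{0}</a>'.format(url)

    return URL_PATTERN.sub(to_anchor, text)


if __name__ == "__main__":
    # Well-formed URL: converted into a clickable (and crawlable) link.
    print(linkify("see http://www.example.org/page.html for details"))
    # No scheme and no slashes: left as plain text.
    print(linkify("it is www.example.org followed by page.html"))
```

Under that assumption, text without the scheme and slashes passes
through unchanged, so the archive page contains nothing for a crawler
to follow.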

Anyway, my apologies for the confusion, and I hope we can avoid
any repeat of it.

Best wishes to you all,

-- DAn Ellis <dpwe@ee.columbia.edu> http://www.ee.columbia.edu/~dpwe/
   Dept. of Elec. Eng., Columbia Univ., New York NY 10027 (212) 854-8928