About those "AUDITORY list edit URL" messages (Dan Ellis)


Subject: About those "AUDITORY list edit URL" messages
From:    Dan Ellis  <dpwe(at)EE.COLUMBIA.EDU>
Date:    Sun, 14 Apr 2002 13:30:17 -0400

Dear List -

This morning, you will each have received a message giving the URL to the web page for changing your AUDITORY list registration details. These messages were sent accidentally, but *nothing bad has happened*. Below are the details of how these messages came to be sent, which I include for the curious...

First some background: If you visit your auditory registration information page (from the audindex.html page off the main www.auditory.org web site) *without* specifying the identity confirmation field (the ID=... part of the URL), you are given a read-only page. This is the point of collecting this information - so other list members can refer to it, via the audindex.html page.

However, each such page includes a link which can be clicked to send an email -- including the ID=... tag -- back to the email address listed in the record. This way, if you want to edit your page, but don't have the full edit URL, you can find your record via audindex.html, click the link, and get your full edit URL emailed back to you. This is all automatic, so that manual intervention doesn't slow down the process.

Now, to a computer, this link used to look like a plain URL. When I first put this in place, we had a problem when the page was crawled by a web search engine, which followed every one of these links. To prevent this from happening again, I specified a variety of patterns in the site's "robots.txt" file, which is used to tell web crawling engines which pages to ignore.

Unfortunately, this morning the site was crawled from the IP address 211.92.138.3 (which appears to be somewhere in China) without any regard for the robots.txt file: every link was tried, which was why each of you were sent your edit URL.

Note that the main index page, audindex.html, isn't actually linked from the main web page - in an attempt to protect the privacy of list members, I have sought to avoid this page becoming well known. However, I *have* mentioned the full URL in several postings to the list. Moreover, the web-based archive of postings on www.auditory.org automatically converts URLs embedded in messages to links. So, when the rogue web-crawling robot from 211.92.138.3 crawled the entire postings archive, it also found links to the audindex.html page, and hence crawled that too. I hadn't realized that the page was being 'published' in this way.

So, in the hope of preventing future recurrences of this event, I have made two changes:

- the "Email me my edit URL" link on the member info display page is now a button rather than a plain link. I think most web crawling programs will ignore buttons. I should have done this from the beginning; it was just a design flaw.

- I've modified the archived messages that refer to the audindex page so that the URLs are no longer well-formed, and hence they won't be converted to links when the archive pages are viewed.

I have also refrained from giving the full URL to the audindex page in this message. It's just www.auditory.org followed by audindex.html, but if you've read this far, you'll understand why I haven't included the slashes to make it into a full URL!

Anyway, my apologies for the confusion, and I hope we can avoid further repeats in the future.

Best wishes to you all,

-- DAn Ellis <dpwe(at)ee.columbia.edu> http://www.ee.columbia.edu/~dpwe/
   Dept. of Elec. Eng., Columbia Univ., New York NY 10027 (212) 854-8928
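
For readers curious about the robots.txt mechanism mentioned in the message, here is a minimal sketch in Python of how a well-behaved crawler might consult such a file before following a link. The rules, the /cgi-bin/ path, the example.org URLs, and the "ExampleBot" name are all hypothetical stand-ins; the message does not give the actual patterns or script paths used on the site. A crawler that skips this check, as the one described above apparently did, will simply follow every link it finds.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the real patterns used on the site are not given
# in the message, so this only illustrates the general idea.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

# Hypothetical URLs standing in for the "email me my edit URL" link
# and for an ordinary public page.
SEND_LINK = "http://www.example.org/cgi-bin/member?send_edit_url=1"
PUBLIC_PAGE = "http://www.example.org/index.html"


def allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler would skip the first URL and fetch the second.
    print(allowed(SEND_LINK))
    print(allowed(PUBLIC_PAGE))
```

Note that robots.txt is purely advisory: nothing in the protocol stops a crawler from ignoring it, which is why moving the action behind a button is the more robust of the two safeguards.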


This message came from the mail archive
http://www.auditory.org/postings/2002/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University