Calling archiving the web for researchers "dirty work" is a bit much.
Unless something has changed since I was there, the crawler didn't intentionally bypass any paywalls.
The crawler obeyed robots.txt, throttled itself on slow sites to avoid overloading them, and clearly announced its user agent with a URL explaining what it was and how to block it if desired.
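For what that politeness amounts to in practice, here's a minimal sketch of those three behaviors (robots.txt compliance, throttling, a descriptive user agent) using only the Python standard library. The user agent string, info URL, and delay heuristic are hypothetical, not the actual crawler's code.

```python
import time
import urllib.robotparser
import urllib.request
from urllib.parse import urljoin, urlparse

# Hypothetical bot identity with a URL explaining the crawler and how to block it.
USER_AGENT = "ExampleResearchBot/1.0 (+https://example.org/bot-info)"

def polite_fetch(url, min_delay=1.0):
    """Fetch a URL only if robots.txt allows it, and back off on slow hosts."""
    parts = urlparse(url)
    robots_url = urljoin(f"{parts.scheme}://{parts.netloc}", "/robots.txt")
    rp = urllib.robotparser.RobotFileParser(robots_url)
    rp.read()
    if not rp.can_fetch(USER_AGENT, url):
        return None  # site owner has opted out via robots.txt

    # Respect an explicit Crawl-delay directive if the site sets one.
    delay = rp.crawl_delay(USER_AGENT) or min_delay

    start = time.monotonic()
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = resp.read()
    elapsed = time.monotonic() - start

    # Throttle: slow responses earn a longer pause before the next request
    # to the same host (assumed heuristic, not the real crawler's policy).
    time.sleep(max(delay, elapsed * 2))
    return body
```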