1.3 hours
Gates Computer Science Building Room 119
Free Tickets Available
Wed, 22 Oct, 2025 at 12:00 pm to 01:15 pm (GMT-07:00)
353 Serra Mall, Stanford, United States
Preserving Humanity's Knowledge and Making it Accessible | Addressing Challenges of Public Web Data
The Common Crawl Foundation is dedicated to preserving humanity's knowledge and making it accessible through its free public web dataset, a vital resource since 2008. As AI development accelerates, concerns have emerged about the accessibility and transparency of public web data. Three developments in particular affect open datasets: robots.txt exclusions, legal demands, and "bot defenses." Two of these are not publicly visible and are poorly understood. We will present insights from a new data product that uses Common Crawl's crawl metadata to visually explore these three problems, and we will advocate for greater transparency and informed solutions for the future of public web data.
Details:
Time: 12:00 pm - 1:15 pm PT
Location: Gates Computer Science Building, Room 119, 353 Jane Stanford Way, Stanford, CA 94305.
Tickets for HAI Seminar with Common Crawl can be booked here.
Ticket type | Ticket price |
---|---|
In-Person Ticket | Free |
Stanford Institute for Human-Centered Artificial Intelligence (HAI)