28th June 09:20 – 09:40
Speaker: Nicole Coleman
Abstract: Innovative technological solutions in information retrieval can sometimes obscure simpler upstream solutions. Stanford Libraries and the Center for Spatial and Textual Analysis at Stanford have been working together on a project to discover data within California Law Enforcement Agency policy manuals and make it publicly accessible. In that process, the team has applied large language models in a retrieval augmented generation (RAG) workflow to study concepts like "reasonableness". It has also attempted to extract semi-structured data from the documents into a form that is easier to analyze and access. The conclusion is that applying AI to information retrieval of this kind is not easy and requires costly verification work to ensure accuracy.
The policy manuals that are the subject of this study were made public in 2020 under California legislation. The legislation aims to educate the public by making such information easily accessible, to increase communication and community trust, to enhance transparency, and to save costs and labor by reducing the number of individual information requests. In practice, compliance with the law as written does not necessarily make the materials easily accessible online. The strategy of Simple Sabotage, outlined in a pamphlet from the Office of Strategic Services (the wartime precursor to the CIA), has served as a useful reference in designing an alternative system for information access. That system aims to limit intentional or inadvertent interference with public access to government documents.
Anticipating the ongoing need for access to this material, the Stanford project began work on document-design recommendations. These recommendations would make the materials more accessible and align them with the FAIR data principles (Findable, Accessible, Interoperable, Reusable) and the CARE data principles (Collective Benefit, Authority to Control, Responsibility, Ethics). Importantly, they would also make the materials more compatible with information retrieval by computational agents.
Bio: Nicole is Digital Research Architect for the Stanford University Libraries and Research Director for Humanities+Design, a research lab at the Center for Spatial and Textual Analysis. Nicole works at the intersection of the digital library and digital scholarship as a lead architect in the design and development of practical research services. She is currently leading an initiative within the Library to identify and enact applications of artificial intelligence – machine perception, machine learning, machine reasoning, and language recognition – to make the collections of maps, photographs, manuscripts, data sets and other assets more easily discoverable, accessible, and analyzable.
At Humanities + Design she has led the design and development of numerous tools for data visualization and analysis, including Palladio, Breve, and, most recently, Data Pen. The lab encourages and supports collaboration between researchers in the humanities and in design to encode interpretive methods in tools for data analysis. Lessons learned in that work have proven essential to designing human-centered applications of machine intelligence that support research.