Data Protection and AI: how to apply the data protection principles to the use of information in AI systems.

Speaker: Alister Pearson, Principal Policy Advisor for AI and Data Science, Information Commissioner’s Office (ICO).

Abstract: Artificial Intelligence (AI) has the potential to bring significant benefits for researchers. LUSTRE has found that AI can be used to identify sensitive material in large volumes of born-digital records, so that non-sensitive material can be made accessible, and that it can also be used to search vast amounts of data where keyword searches would not be effective. However, the use of AI can also introduce or exacerbate risks to people’s right to privacy, as well as to other rights relating to the processing of their personal information. In this presentation, I will discuss some of the main data protection risks of using AI in a research context, and ways to mitigate them, during both the development and the deployment of AI. My presentation will also cover the ways in which the Information Commissioner’s Office can further help researchers through the different services we offer, including our recently launched Innovation Advice service. One of the aims of the presentation is to illustrate that data protection provides a framework for processing people’s personal data, rather than a barrier to it.

Bio: Alister Pearson is a Principal Policy Advisor for the AI and Data Science Team at the Information Commissioner’s Office (ICO). He was part of the team that produced the ICO’s Guidance on AI and Data Protection and led on the development of the AI and data protection risk toolkit, which won the Accountability prize at the Global Privacy Assembly Awards 2022.


Using AI to improve Excel skills

Speaker: Angie Campbell, Preservation Manager, Public Record Office of Northern Ireland (PRONI).

Abstract: Using AI (ChatGPT) as a tool to assist with Excel problems and, hopefully, to improve skills. By engaging in a conversation with ChatGPT, non-technical users can get step-by-step guidance on troubleshooting Excel issues, learn new features and functions, and gain practical knowledge to enhance their Excel proficiency.

Bio: Angie Campbell is a Preservation Manager within PRONI. Angie has over 12 years’ experience in collections management, including preservation, storage, and oversight of the document production service. Angie’s team includes a mix of professional/curatorial, administrative and support grade staff. Her current role involves the assessment of collections held in out-storage.


I, Historian: Researching the past in the age of Artificial Intelligence

Speaker: Dr David Brown, Senior Researcher at the Virtual Treasury of Ireland

Abstract: Although Artificial Intelligence (AI) burst into the public consciousness towards the end of 2022 with the launch of ChatGPT, a chatbot made by a company with close links to Microsoft, AI has been a core technology for the Virtual Treasury of Ireland since the early days of the project. Since 2018, the Virtual Treasury of Ireland has used perfectly curated transcriptions to train a suite of deep-learning ‘models’ that enable an AI system to read digital images of historical sources relating to Ireland and convert them into searchable text files. The automatic conversion of handwritten historical documents into searchable text is the latest major step in the digitisation of our written cultural heritage. Digital images can be expensive to create, costly to store, and can degrade over time. Moreover, because digital images are so easy to produce, they tend to proliferate, yet they are no more searchable than the original records they were intended to stand in for. AI enables us to preserve all of the benefits of digital images while eliminating these drawbacks: converting the images into text makes them searchable, portable and easy to store. This new ability to find a person, place or event among thousands or millions of pages of handwritten documents is the first step into an exciting AI-enabled world in which ever more complex historical hypotheses can be tested and questions answered. Emerging technologies such as Large Language Models (LLMs), on which ‘human-seeming’ interfaces such as ChatGPT are based, promise to make ever more complex research questions answerable at the touch of a button. Current research at the VRTI is demonstrating that LLMs can improve the accuracy of our transcriptions, summarise the most important information contained in dense and lengthy documents, extract people and places from text for incorporation into our Knowledge Graph, and even sort documents into chronological order.

Bio: Dr David Brown is Senior Researcher at the Virtual Treasury of Ireland, a project based at Trinity College Dublin that aims to recreate, as a digital model, the Public Record Office of Ireland and its contents, which were destroyed in 1922. David has published widely on early modern empires, with a particular focus on trade and finance, and has been involved in digitisation projects, in various guises, for over 30 years.


RAISE: Responsible AI iS an Enabler 

Speaker: Michaela Black, Professor of Artificial Intelligence at Ulster University

Abstract: The presentation will look at what Responsible AI is and how it can help build trust in the adoption of AI.

With that trust in place, more data providers will be keen to contribute their data, and so more benefit can be gained from what AI learns.

Speaker’s bio: Michaela Black is Professor of Artificial Intelligence at Ulster University. Her career has seen her apply Artificial Intelligence and gamification to a wide range of domains, including telecoms, healthcare, finance, education and marketing, with 60+ publications and £7M+ secured in research and pedagogic funding. She has been actively engaged in a wide range of AI projects supported by many different funders, including coordinating the EU-funded project MIDAS (Meaningful Integration of Data, Analytics and Services), 2016-2020. This 40-month project secured €4.5m in funding from the European Union’s Horizon 2020 programme. It delivered Data Science hardware and software solutions to connect fragmented health data and enable policymakers across 4+ European countries to analyse data and enhance policies on diverse topics such as mental health, obesity, diabetes and looked-after children, as well as sharing, and ensuring uptake of, excellent data access practices such as the MIDAS Honest Broker Service. She is also a member of a consortium that has just secured European funding for LUCIA (Understanding Lung Cancer related risk factors and their Impact), launched in January 2023. Michaela is currently co-director of engage, a proven and effective stakeholder participation platform that uses real-time digital technologies to gather participant input and build consensus on relevant topics. engage has delivered successful engagements in partnership with organisations across government, industry, professional bodies, PPIs, NGOs and academia, on the island of Ireland and across Europe. This work has influenced topics such as Brexit, workforce planning in the Department of Health, and the Matrix NI Policy on Women in STEM. An active STEM ambassador, Michaela strives to encourage more women to join the field. She is also an active member of the Athena Swan (AS) team at UU, promoting diversity and equality.
She secured EPSRC funding, with the University of Bath Inclusion Matters Scheme, to host a datathon aimed at improving equality, diversity and inclusion within the engineering and physical sciences.


Two alternative pathways for the application of AI for recordkeeping purposes in live systems

Speaker: James Lappin

Abstract: Originating organisations have digital material at all stages of the records lifecycle: in live systems, in legacy systems, and awaiting appraisal for possible transfer to a historical archive. Organisations will need help from AI at each stage, but they will face different challenges at each stage: in developing AI models, in deploying them, and/or in basing decisions and actions on them.

This talk will look at the application of AI models in live systems such as the Microsoft 365 cloud suite, where end-users are working and adding content into a wide variety of aggregations (email accounts, SharePoint sites, OneDrive accounts, Teams, Chat accounts etc.). It will look at the special challenges and opportunities that deploying AI in live systems poses.

Two alternative pathways for the application of AI models in live systems will be compared and contrasted:

  • One pathway uses the existing way that content is aggregated as its starting point, working within existing aggregations to make them more precise, useful and manageable; 
  • The other pathway works across the entirety of an organisation’s ‘digital heap’ (or over the entirety of the organisation’s M365 implementation) to identify and label important content.

The talk uses both recordkeeping theory and experience with digital records over the past thirty years to arrive at some predictions as to which of the two pathways is likely to provide the safer and more predictable route to using AI to achieve improvements in recordkeeping.

Speaker Bio: James Lappin has worked in the field of archives and records management for thirty years as a practitioner, consultant, researcher, policy advisor, presenter, blogger, podcaster and cartoonist.


LUSTRE Workshops 1 & 2 report

Over the course of nine months, the AHRC-funded LUSTRE project has made significant progress in its exploration of using Artificial Intelligence (AI) to unlock born-digital and digitised government archives. The project has already delivered a range of activities, including more than 50 semi-structured interviews with professionals in the GLAM sector, computer scientists, and scholars. Additionally, there have been four online lunchtime talks and two hybrid workshops. 

In this blog post, we will focus on the two hybrid workshops organised in collaboration with our project partner, the Cabinet Office. These workshops took place at the Central Digital and Data Office in Whitechapel, London, as well as online. The talks from both workshops have been recorded and can be accessed on the LUSTRE website through the following links: 

LUSTRE Workshop 1 

LUSTRE Workshop 2 

The primary objective of these workshops was to help establish a network of archival professionals and academic researchers who share an interest in exchanging knowledge about the opportunities, challenges and potential risks of using AI to access, manage and use born-digital archives.

Summary of Workshop 1

Invited speakers to the first workshop in January 2023 were Professor Stephanie Decker and Dr Adam Nix from the University of Birmingham; Dr Jenny Bunn from The National Archives; and Dr Tony Russell-Rose from Goldsmiths, University of London.

The day began with a presentation by Professor Stephanie Decker and Dr Adam Nix, who discussed the practices of digital archival discovery, focusing specifically on the use of AI to connect context and content in emails. Their talk provided valuable insights from a user perspective, emphasising the significance of meaningful access to born-digital archives for researchers, particularly those in the humanities and social sciences.

Next, Dr Jenny Bunn delved into the ethical dimension of using advanced technologies, which are often categorised under the umbrella term of AI. She highlighted the challenges that the introduction of AI presents to the delicate balance between transparency, accountability, and fairness in recordkeeping activities. Dr Bunn explored new approaches that can maintain transparency and accountability in recordkeeping practices amidst the use of AI, emphasising the importance of good governance. 

The third talk of the day was given by Dr Tony Russell-Rose, and it showed how the LUSTRE project brings together researchers and professionals from the GLAM sector, digital humanities, and computer science. Dr Russell-Rose’s presentation focused on access to databases and proposed approaches that move away from traditional command-line query builders. His talk, ‘Searching, fast and slow: rethinking the query builder paradigm’, centred on a platform called 2Dsearch, which redefines ‘advanced search’ by reducing syntactic errors, enhancing semantic transparency, and providing support for reuse and optimisation.

The day concluded with a thought-provoking roundtable discussion featuring Lise Jaillant, Adam Nix, Stephanie Decker, and Jenny Bunn, centred around ethical issues in AI and government records. The speakers further expanded on the ethical dimension of conducting research on digital archives using AI-assisted technologies. They unanimously emphasised the need for transparent, explainable, and accountable processes and technologies. 

Adam Nix drew attention to the risks and contingent liabilities of already available collections when novel forms of search surface new material. Stephanie Decker reflected on the challenges of anonymisation and privacy, highlighting the delicate balance between anonymising data and preserving valuable contextual research information. Jenny Bunn emphasised the active effort archivists must make to avoid embracing the ‘myth of objectivity’ when adopting new technologies, asserting that archival practice is inherently subjective and requires the archivist’s engagement as a gatekeeper. She also discussed the increased sensitivity risks associated with larger and more complex datasets.

Overall, the first workshop provided valuable insights into the intersection of AI and born-digital archives, exploring both the practical and ethical dimensions of using AI technologies for archival research.

Summary of Workshop 2

The second LUSTRE workshop, titled ‘AI and Born-Digital Archives in the Government Sector and Beyond: Challenges and Opportunities’, was once again held in Whitechapel. The workshop took place in May 2023 and, like the first, was a hybrid event, with five presentations, two of which were delivered remotely.

Invited speakers included Dr Keegan McBride from the Oxford Internet Institute, James Lappin from Loughborough University, Dr Lise Jaillant from Loughborough University (PI of the LUSTRE project), John Sheridan from The National Archives, and Professor Jason R Baron from the University of Maryland. 

Dr Keegan McBride commenced the day with his talk, ‘Artificial Intelligence in the Public Sector: Separating Myth from Reality’. He focused on differentiating the development of AI in the private sector from its applications in the public sector. Dr McBride highlighted the necessary building blocks for effective AI usage in the public sector, including technical, infrastructural, and legislative aspects. He also presented best practice case studies to illustrate AI implementation. 

James Lappin followed with a presentation on ‘AI and the Management of Email Accounts Over Time’, drawing on his doctoral research. He explored the impact of AI on access permissions and retention rules within digital corporate systems, particularly focusing on email. His talk covered various aspects, including the impact of email on recordkeeping, the effective organisation of correspondence, and key strategic choices when applying AI to email management. 

Dr Lise Jaillant, the Principal Investigator of the LUSTRE project, provided her perspective on the challenges that the transition from traditional to digital archives poses for professionals and academic researchers. She shared results from interviews conducted with GLAM professionals, researchers, and archive users. The participants highlighted two pressing obstacles to accessing digital records: mistrust between stakeholders and mistrust of technology. Dr Jaillant emphasised the importance of facilitating knowledge transfer, fostering collaborations between GLAM professionals, academics, and other stakeholders, and enabling both the discovery of archival materials and access to them.

Following a brief question and answer session involving both in-person and online participants, the workshop broke for lunch. The afternoon session commenced with a talk by John Sheridan, the Digital Director of The National Archives, titled ‘Navigating the Maelstrom (Looking for Lighthouses)’. He reflected on the rapid advancement of AI and large language models and the uncertainties these technologies bring. He presented strategic interventions that digital archives can implement to leverage AI’s opportunities. These interventions encompassed content classification and organisation, new text generation (including summaries and transcriptions), image/video generation based on instruction, and data transformation/manipulation based on instruction. 

The day concluded with a remote guest speaker, Professor Jason R Baron (University of Maryland). His talk focused on the process which should lead to the transfer of permanent federal government records to the National Archives and Records Administration (NARA) solely in electronic or digital formats by June 2024. 
He discussed the challenges NARA faces in meeting the 2024 deadline and the limitations of the Freedom of Information Act (FOIA) in providing timely access to archival records. Professor Baron also explored how AI tools can help ensure real public access to government archives. He emphasised the role of AI in supporting archivists, records managers, and lawyers in filtering sensitive content from public records, and recommended continued experimentation with Technology Assisted Review methods for efficiently discovering records and accurately segregating personal content. He also suggested exploring how generative AI could provide narratives explaining why documents, or portions of documents, have been withheld under freedom of information laws. Professor Baron concluded by encouraging archivists to embrace AI, treating it not merely as a black box but as a ‘gift’ that can improve access to vast digital collections.


Navigating the maelstrom (looking for lighthouses)

13:30 – 14:00

Speaker: John Sheridan (The National Archives)

Abstract: We are navigating a maelstrom. The pace of advancement in AI, in particular of large language models, is bewildering, exhilarating, and terrifying. The scale of the advances made over just the last few months seems immense. No-one seems to know the limits of what is now possible, or might be.

For digital archives, this is a moment of enormous opportunity. There are also risks and threats. To quote Clausewitz, “A sensitive and discriminating judgment is called for; a skilled intelligence to scent out the truth.” (1). So, where are the lighthouses to try to navigate by, or, if there are none apparent, where at least might they be built? This presentation will suggest there are some beacons of light to guide us. Borrowing ideas from maths, computing, law and economics, it will posit what they are and what they mean for archivists. From there it will extrapolate some strategic interventions that digital archives can make to take advantage of the enormous opportunities ahead of us in the brave new world of artificial intelligence.

John Sheridan is the Digital Director at The National Archives.

1 – Carl von Clausewitz, Vom Kriege, Book 1, Chapter 3.


Is NARA Ready Yet? The Newly Extended 2024 Start Date For Accessioning Records into the US National Archives Only in Electronic and Digital Formats, and What That Means

14:00 – 14:30 (online)

Speaker: Jason R. Baron, Professor of the Practice, College of Information Studies, University of Maryland 

Abstract: For over a decade, the US Archivist has been attempting to mandate a future start date for the accessioning of all permanent federal government records into the National Archives and Records Administration (NARA) solely in electronic or digital formats. Although the “no more hard copy records” date has been extended twice, much has been accomplished in the interim in promoting governmental policies for managing hundreds of millions, if not billions, of electronic records in individual agencies, most prominently the archiving of e-mail records. Three questions we should be asking: how will NARA meet the challenge of accessioning records, especially born-digital records, when the deadline finally arrives? Are freedom of information laws up to the task of providing timely access to those records in the future? And how can AI tools help ensure that public access to the government’s archives remains real and not just aspirational?

Bio: Jason R. Baron is a Professor of the Practice in the College of Information Studies at the University of Maryland. Between 2013 and 2020, he acted as Of Counsel to the eDiscovery & Information Governance group at Faegre Drinker, LLP. During his prior 33 years in government service, he served as the first appointed director of litigation at the National Archives and Records Administration, and before that as a trial lawyer and senior counsel at the Department of Justice. In those capacities, Mr. Baron acted as lead counsel on landmark lawsuits involving the preservation of White House email, and also played a leading role in improving federal electronic recordkeeping policies. He is a recipient of the international Emmett Leahy Award for his career contributions in records and information management. Mr. Baron received his B.A. magna cum laude with honors from Wesleyan University, and his J.D. from the Boston University School of Law.


The Impact of AI on Records Management 

11:00 – 11:30

Speaker: James Lappin, Loughborough University

Abstract: It seems likely that AI will, at some point in the relatively near future, provide originating organisations with the capability to re-aggregate records in any (or all) of their systems.  The organisations concerned could then, if they wished, begin to apply access permissions and retention rules to records on the basis of an entirely different set of aggregations to those used by the end-users that created and received them. 

This opens up a theoretical question: 

  • will the coming of such a capability mean that the way that records are aggregated in the day-to-day systems that end-users used to create and receive them (document management systems, collaboration systems, email systems, chat systems etc.) cease to have anything more than a purely temporary significance in recordkeeping? OR 
  • will that ‘original order’ of records instead place constraints and limits on the ways that originating organisations can make use of artificial intelligence capabilities for improving the precision of the application of access permissions and retention rules?

This talk will explore the impact of AI on the application of access permissions and retention rules within digital corporate systems, on the basis of a set of thought experiments carried out in James’ doctoral research project. 

Bio: James Lappin is a doctoral researcher with Loughborough University’s Centre for Information Management. He has worked in the field of archives and records management for thirty years as a practitioner, consultant, researcher, policy advisor, presenter, blogger, podcaster and cartoonist. 


Artificial Intelligence in the Public Sector: Separating Myth from Reality

10:30 – 11:00

Speaker: Dr Keegan McBride

Abstract: Artificial Intelligence (AI) developments in the private sector have quickly permeated all levels of society. Daily we are – either directly or indirectly – interacting with AI. Acknowledging the benefits that AI has brought to the private sector, particularly regarding efficiency gains, many have started to talk about the potential benefits that AI may have when applied in and by the public sector. Certainly, some benefits do exist. However, it is also important to realize that the public sector and the private sector are different, with different requirements, goals, and aims. While the private sector may be concerned with customer acquisition, maximizing profits, or market growth, the public sector does not have the same motivations. Thus, the reasons for applying AI, the ways in which AI is applied, and the areas where AI can be applied will be similarly different. To maximize the impact that AI can have, these differences must be acknowledged. Starting from this need for differentiation, this talk will help to demystify AI by defining and outlining what AI is and is not. Following this, the talk will highlight the necessary building blocks for the usage of AI in the public sector: technically, infrastructurally, and legislatively. Finally, the talk will conclude by discussing how the public sector can best use and build AI, covering procurement best practices and presenting best practice case studies.

Bio: Keegan McBride is a departmental research lecturer in AI, Government, and Policy and Course Director of the Social Science of the Internet MSc program at the Oxford Internet Institute. He is an expert on the topic of digital government and has published research in leading journals and conferences on topics such as Artificial Intelligence in the public sector, open government data, government digitalization, and government interoperability and data exchange systems. In his research he aims to develop an understanding of the future trajectory of the state in the digital age by exploring the complex and co-evolutionary relationships between technology, society, and the state.