From 99ad3967d5b21392a94b29fac0622d87cbd15906 Mon Sep 17 00:00:00 2001 From: vak2ve Date: Mon, 6 Jan 2025 13:35:38 -0500 Subject: [PATCH 1/3] fix: september update @joewiz ready for html insertion of presentation --- hac.xml | 781 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 781 insertions(+) diff --git a/hac.xml b/hac.xml index 9b13a3d..b51cea2 100644 --- a/hac.xml +++ b/hac.xml @@ -309,6 +309,787 @@

Please select a meeting from the list in the left sidebar.

+
+ September 2024 +

+ Advisory Committee on Historical Diplomatic Documentation + September 9-10, 2024 +

+

Minutes

+ + + Committee Members + + James Goldgeier, Chair + Kristin Hoganson + Sharon Leon + Nancy McGovern + Timothy Naftali + Deborah Pearlstein + Elizabeth Saunders + Kori Schake + Sarah Snyder + + + + Office of the Historian + + Kristin Ahlberg + Carl Ashley + Margaret Ball + Forrest Barnum + Sara Berndt + Josh Botts + Tiffany Cabrera + Mandy Chalou + Elizabeth Charles + Kathryn David + Cynthia Doell + Thomas Faith + Stephanie Freeman + David Geyer + Renée Goings + Ben Greene + Michelle Guzman + Charles Hawley + Kerry Hite + Adam Howard + Richard Hulver + Alina Khachtourian + Virginia Kinniburgh + Laura Kolar + Aaron Marrs + Michael McCoyer + Brad Morith + Christopher Morrison + David Nickles + Nicole Orphanides + Paul Pitman + Alexander Poster + John Powers + Kathleen Rasmussen + Matthew Regan + Amanda Ross + Seth Rotramel + Daniel Rubin + Ashley Schofield + Nathaniel Smith + Douglas Sun + Claudia Swain + Brooks Swett + Melissa Jane Taylor + Chris Tudda + Dean Weatherhead + Joseph Wicentowski + Alex Wieland + Tristan Williams + James Wilson + Louise Woodroofe + + + + Bureau of Administration + + Jeff Charlston + Corynne Gerow + Timothy Kootz + Mallory Rogoff + + + Department of Defense + J.D. Smith + + + + National Archives and Records Administration + + William Bosanko + William Fischer + David Langbart + Don McIlwain + Mark Sgambettera + + + Public + Over 50 members of the public + +

Open Session, September 9

+

Presentation on the Office of the Historian’s Experiments + Using Today’s Artificial Intelligence Tools for Historical + Inquiry

+

James Goldgeier opened the session by introducing himself and welcoming
+ all attendees, in person and online. He then noted that Adriane Lentz-Smith
+ had rotated off the Committee after the June meeting and introduced
+ Elizabeth Saunders of Columbia University as her replacement. He then turned
+ the session over to Adam Howard.

+

Howard introduced Joe Wicentowski’s presentation on AI. He noted that
+ Wicentowski’s talk followed previous presentations on artificial
+ intelligence (AI) by the Department of State and other interagency partners.
+ Howard said he would moderate the Q&A following the presentation.

+

Wicentowski began his presentation on the Office of the Historian’s + experiments using today’s artificial intelligence tools for historical + inquiry, including the use of large language models and retrieval augmented + generation interfaces to published documents from the Foreign Relations of + the United States (FRUS) series and the use of multimodal models to + transcribe handwritten historical documents.

+

The presentation began with a preface celebrating the 15th anniversary of
+ OH’s public website, history.state.gov. Through a partnership with the
+ University of Wisconsin-Madison, the Office digitized and converted
+ publications to TEI (Text Encoding Initiative) format, making all printed
+ volumes and legacy publications available online. The history.state.gov
+ website receives over 10 million visitors annually, with significant
+ international traffic, and plans are underway to make FRUS digital editions
+ accessible via the Libby e-book lending app.

+

Wicentowski began his introduction to AI, which focused on generative AI
+ tools and LLMs, not the broader field of machine learning. Wicentowski said
+ he was initially skeptical about the reliability and provenance of
+ AI-generated information, as these tools are trained on vast, undisclosed
+ internet data. However, practical applications have since emerged: tools
+ like ChatGPT and Claude can be used to interrogate documents by asking
+ questions in plain English. Despite limitations like context window limits,
+ which restrict the amount of information the tool can process at once, these
+ tools showed potential for asking questions about specific documents.

+

Wicentowski went on to note that an AI tool developed by Amazon was tested
+ for reviewing annotations against a complex style guide. The tool correctly
+ identified some mistakes, flagged other issues worth checking, and revealed
+ some inconsistencies in the style guide itself. This led to the exploration
+ of using AI to improve the style guide, making it more reliable for
+ humans.

+

Wicentowski then shared that the most impressive development was the use of + multimodal AI models, particularly Google’s Gemini 1.5 Pro, to transcribe + challenging handwritten documents. Traditional Optical Character Recognition + (OCR) struggles with handwritten documents, but multimodal AI models can + transcribe handwritten text from images. The Gemini 1.5 Pro model + successfully transcribed a challenging set of handwritten index cards, + significantly improving accessibility and searchability. The tool processed + each card in 15–20 seconds and completed all 8,600 scanned images in 48 + hours, producing usable draft transcriptions that will need to be + reviewed.

+

Wicentowski concluded that the results of these experiments show that
+ generative AI tools have great potential for some portions of the work of
+ transcribing, annotating, and querying historical documents and sources. The
+ tools exhibit clear shortcomings, making them inappropriate for some tasks,
+ but for other tasks, the office was able to mitigate these issues and derive
+ utility through persistent and careful experimentation and close review.

+

Howard then opened the floor for questions and noted that James Wilson would + moderate any online questions and comments.

+

David Nickles noted that users from India made up the second-largest group
+ of visitors to the OH website and that China was not even in the top 10. He
+ asked: if governments won’t allow their citizens to access the OH site, how
+ can the office know who cannot access the website?

+

In response, Wicentowski shared that this is very difficult to answer, since
+ the office’s analytics tools only report the visits the site does
+ receive.

+

Goldgeier said it was great that Wicentowski and OH are doing this kind of
+ work on the possibilities of using AI. He noted that the Consular Records
+ cards reminded him of his research in the Anthony Lake Papers at the Library
+ of Congress. He noted how many handwritten notes were on index cards and how
+ he was stunned by how many references there were to Haiti and Somalia,
+ rather than Russia and/or China, and wondered if AI could analyze those
+ cards and determine that number. He then asked how receptive Google and
+ Amazon personnel have been to Wicentowski’s efforts to use AI, and whether
+ they give feedback on their tools.

+

Wicentowski replied that Google and Amazon engineers have not only been + receptive to his feedback, questions, and suggestions, but also encouraged + it and are eager for feedback.

+

Bill McAllister asked online whether a researcher could ask an AI tool to
+ find all the important documents on a given subject and thereby eliminate
+ the need for FRUS compilers.

+

Wicentowski replied that this is a difficult question to answer. Since
+ compilers research and select documents differently, it’s unclear how an AI
+ tool could be trained to perform the work of a compiler. He invited Kathleen
+ Rasmussen to offer her thoughts.

+

Goldgeier said he would like to hear Rasmussen’s response to the question and
+ shared that he believes that AI can help, not replace, compilers. Can a
+ compiler ask an AI tool about assessing the importance of a set of documents
+ being considered for inclusion in a FRUS volume?

+

Rasmussen said it’s a great question: how can AI help rather than overtake or
+ replace historians? Can AI help with research in giant documentary databases
+ such as State’s e-records? Some historians have indeed experimented along
+ these lines, but the results have been unclear so far.

+

Sharon Leon asked Wicentowski whether enriching published FRUS volumes with + technologies such as Named Entity Recognition (NER) and topic modeling might + offer concrete benefits to readers of FRUS. Wicentowski agreed that these + technologies are promising and worth investigating. Until the OH website is + able to offer such capabilities, he encouraged any interested parties to + access the FRUS source files on GitHub and apply their own tools to these + materials. FRUS is in the public domain and can be used by anyone for data + mining, and OH would welcome such experimentation and sharing of results. + Leon noted that running topic modeling software live on public-facing + websites can require extensive resources and investments; therefore, rather + than adding live, dynamic topic modeling capabilities, OH could pre-process + FRUS volumes and offer static views of the data instead.
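By way of illustration, the offline pre-processing Leon describes might look
like the following minimal sketch, which fetches one volume’s TEI source from
the HistoryAtState/frus GitHub repository and saves a static topic-model view
as JSON. The repository is real, but the volume path, parameter choices, and
output file are illustrative, and the sketch assumes the requests and
scikit-learn packages are installed.

    import json
    from xml.etree import ElementTree

    import requests
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    # Fetch one FRUS volume's TEI source (volume path is illustrative).
    URL = ("https://raw.githubusercontent.com/HistoryAtState/frus/"
           "master/volumes/frus1969-76v01.xml")
    root = ElementTree.fromstring(requests.get(URL).content)

    # FRUS TEI encodes each document as a div of type "document".
    TEI = "{http://www.tei-c.org/ns/1.0}"
    docs = [" ".join(div.itertext())
            for div in root.iter(f"{TEI}div") if div.get("type") == "document"]

    # Fit a small topic model offline and save a static view for the website.
    vectorizer = CountVectorizer(max_df=0.95, min_df=2, stop_words="english")
    lda = LatentDirichletAllocation(n_components=10, random_state=0)
    lda.fit(vectorizer.fit_transform(docs))
    vocab = vectorizer.get_feature_names_out()
    topics = [[vocab[i] for i in topic.argsort()[-10:][::-1]]
              for topic in lda.components_]
    with open("frus1969-76v01-topics.json", "w") as out:
        json.dump(topics, out, indent=2)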

+

Saunders said she’d learned more about AI from this presentation than any
+ other she’d experienced. She noted her first writing project relied on
+ trying to interpret President Kennedy’s handwriting, so she understands how
+ difficult it is for any tool, including AI, to decipher handwritten
+ documents. She then asked whether AI will get “stupider” over time; in other
+ words, how can an AI tool overcome or interpret the biases, egos, and the
+ like that are reflected in documents that are, after all, created by human
+ beings? She then mentioned an upcoming journal article in which the author
+ examines Presidential Daily Briefs and identifies the use of racial tropes
+ in the material submitted to the Presidents. Can the FRUS corpus of
+ documents also be analyzed by an AI tool to uncover such biased language?
+ How do human biases get reflected in FRUS documentation?

+

Wicentowski replied that he recognizes that there could be a problem with the
+ language used in original documents, and since the text of all FRUS volumes
+ is online, it has almost certainly been included in the set of materials
+ used to train most large language models. However, the entire text of
+ published FRUS is so small compared to the total size of the corpus needed
+ to train large language models that it alone would not likely contribute
+ meaningfully to the bias encoded in a large language model.

+

Saunders asked whether the AI tool accepts such language as “how people
+ talk.” Wicentowski replied that his hunch is that researchers using AI have
+ to ask the right research questions so that the tool can analyze the
+ documents the way the researcher wants. For example, you could ask the tool
+ to “show me an example of racist imagery.” But we don’t know if the tool can
+ distinguish or recognize such language. Thus, the more specifically a
+ question can be posed to the AI tool, the more likely it is that the tool
+ will return an accurate answer. The fine-tuning process used to train the AI
+ tool to respond in a “pleasing” way to the questioner can lead it to skew
+ its answers to meet the user’s preconceptions, so we need to both craft our
+ questions explicitly and judge the tool’s responses critically.

+

Snyder asked about the Consular Records cards that had been scanned and + when/if those will be made available on OH’s website.

+

Wicentowski said that OH doesn’t have a timeline for making them available + but would like to once they can be reviewed.

+

Snyder noted that there is a lot of manual work involved in augmenting AI + work.

+

The session ended at 10:58 a.m.

+

Opening of the Meeting

+

Goldgeier opened the session by asking for and receiving approval of the + minutes from the June meeting. He then welcomed new FSI Deputy Director + Maria Brewer and invited her to offer comments to the Committee and meeting + attendees.

+

Remarks from FSI Deputy Director Maria Brewer

+

Brewer noted she was pleased to attend and was looking forward to getting to
+ know the Committee members better. She said FSI and the Department are
+ committed to continuing to make positive progress on the issues under the
+ Committee’s purview. Brewer went on to provide information on her own
+ background and her current responsibilities at FSI.

+

Brewer welcomed Elizabeth Saunders to the HAC and also welcomed new FSI/OH + historians hired since the last HAC meeting. She noted that FSI/OH had made + 15 hires in the last year and two more were expected before the December HAC + meeting, while progress was also being made on filling the recently vacated + Assistant to the General Editor position. She also called attention to the + recent release of the latest volume, Foreign Relations of the United States, + 1981–1988, Volume XXXVIII, International Economic Development; International + Debt; Foreign Assistance.

+

Brewer explained to the Committee that FSI was in the process of reviewing + and working through how to act on the latest HAC report’s recommendations. + She closed with a brief discussion of FSI’s recent creation of and hiring + for a new Provost position and their supporting team.

+

Goldgeier thanked Brewer for her remarks and invited Howard to make comments. + Howard welcomed the new historians that FSI/OH had recently hired and called + attention to the release of the latest FRUS volume. He explained that this + volume was one of the first produced under the FRUS modernization initiative + and that Renée Goings deserved great credit for her leadership in making + FRUS modernization happen.

+

Remarks from the General Editor

+

Rasmussen discussed the recently released FRUS volume in greater detail. She + highlighted the significance of the topics it covers and described some of + the volume’s structure and specific content. Rasmussen also congratulated + the expansive team that contributed to the research, compilation, review, + declassification, editing, and publication of the volume.

+

Remarks from the Director of Declassification, Publishing, + and Digital Initiatives

+

Powers welcomed two new members to the Declassification, Publishing, and + Digital Initiatives (DPD) team that he leads. He also highlighted how many + staff members contribute to getting each volume published.

+

Report from the Office of Information Programs and + Services

+

Mallory Rogoff, Agency Records Officer and Division Chief of the Records and
+ Archives Management Division in the Office of Information Programs and
+ Services (IPS), described progress and developments in the Department’s
+ records management program. She explained that they had established an
+ Overseas Records Branch six months ago in recognition of the unique records
+ management challenges at overseas posts. Consequently, the Department now
+ has a team specifically dedicated to better engaging and serving overseas
+ posts, and this initiative will further enhance the Department’s
+ transparency efforts.

+

Rogoff explained that there are over 200 overseas missions that have a wide
+ range of challenges. Some have incredibly high turnover, while others are
+ “micro-missions.” The new Overseas Records Branch is learning each mission’s
+ operating environment to better assist them with records management. Rogoff
+ noted that in the 1980s and 1990s, records management staff engaged in some
+ overseas travel to assist posts, but that ended with cutbacks in the 1990s.
+ Now, overseas records management travel is returning, and staff are also
+ making use of virtual platforms to assist posts. Rogoff noted that posts
+ greatly appreciate this new level of support. She concluded by providing
+ some examples of the issues they address, including disposition of very old
+ paper records, emergency preparedness regarding records, and the transition
+ to electronic records management.

+

Kristin Hoganson asked Rogoff how they were addressing overseas posts’ + practices in relation to the Office of Management and Budget (OMB) + electronic records mandate. Rogoff responded that the Department had + submitted an “exception request” to NARA to keep some permanent paper + records in place but noted that records management staff often finds that + posts’ paper records are temporary and thus eligible for destruction. Rogoff + said that NARA had not yet responded to the request.

+

Hoganson also asked Rogoff for an update on the status of the transfer of the + 1982 Central Foreign Policy Files to NARA. Rogoff responded that Agency + Records Officer Tim Kootz would address this in depth during the Committee’s + closed session.

+

Closed Session

+

Report from Information Programs and Services + (IPS)

+

Deputy Assistant Secretary Timothy Kootz stated that the number one problem
+ the Department faces is not the process of transferring the records to NARA
+ but digitizing the content. He wondered whether a public-private partnership
+ might be feasible, since in the near term the Department does not have
+ enough funding to digitize these records. The United Nations and NATO had
+ recently accomplished similar digitization projects, and a public-private
+ partnership had created State’s own Diplomacy Center.

+

David Langbart commented that a major roadblock is that State had not + completed review of the P reel index and had no capability to create + withdrawal slips as they had done in the past. Also, no meeting had been + scheduled between the Department and NARA.

+

Timothy Naftali asked if State had a foundation, or another way for outside
+ organizations to fund the digitization. The project could be presented as a
+ public-private transparency initiative and generate positive publicity for
+ contributors. Kootz replied that this was a great idea.

+

Hoganson noted that these ventures would be expensive and that historical
+ organizations lack the resources to help. A concern is that contributors to
+ these projects might place digitized public records behind private paywalls.
+ Kootz agreed that this would be a dealbreaker and observed that DOD was
+ currently working with the University of Maryland. He restated that State
+ would fully coordinate the records transfer with NARA. For instance, State
+ was working with NARA to solve an open-ended problem with the P reel
+ index.

+

Kootz reported that the FOIA staff in Charleston, South Carolina, was finally
+ in place and in full operation. The staff closed 5,000 more FOIA cases than
+ the previous year, but is also receiving more FOIA cases than ever before.
+ IPS policy is to tackle the new requests first, then deal with the backlog.
+ Bot requests are skewing the volume of FOIA requests, and this is happening
+ across the federal government. State is working with the DOJ and other
+ agencies to resolve this challenge.

+

Snyder asked whether FOIA cases are closed through resolution or through
+ non-response, and whether there is any indication of the bot creators’
+ agenda, if IPS believes the bot activity is deliberate.

+

Kootz confirmed it was deliberate. The requests are overly broad and use the
+ FOIA requester page, which currently has no CAPTCHA filter. IPS issues
+ “still interested” letters, and often older requests have been overtaken by
+ events. With limited resources and a large FOIA backlog, where should IPS
+ devote its efforts? One initiative IPS is using is more personal contact
+ with requesters. This way staff can often narrow the scope of the request
+ and close the case faster. It helps that there are many new and very
+ motivated FOIA employees. IPS is also going through a reorganization, and
+ the plan is to have this completed by January 2025.

+

Naftali concurred that there should not be a perpetual paywall, but an
+ arrangement might be made for immediate closed digital access to those
+ records at the presidential libraries, NARA, and the Bunche Library, with
+ online access behind a paywall for five years and free online access
+ thereafter. Kootz agreed there was merit to these ideas.

+

Langbart added that this was essentially what NARA did with its digital
+ partner initiatives 15–20 years ago. Ancestry digitized genealogy records
+ and put them behind a paywall, but the records were free at any NARA
+ facility. After 5–7 years, Ancestry gave the digitized records (sans
+ metadata) to NARA.

+

Leon stated that when Ancestry partnered with individual states, they also + did not supply them with the metadata. With only images, it is more + difficult to navigate the records. It is important to get everything you + need out of these partnerships.

+

McGovern noted the cultural clash between the library and archival + communities and the difference between library skills and archival skills. + Kootz acknowledged that federal records staffers are better at collecting + records than providing access. State’s goal with their FOIA reading room is + to deliver better access to the public.

+

Goldgeier asked about the Department’s progress reviewing documents using
+ machine learning and AI, given reports that other countries are developing
+ similar capabilities. Kootz agreed that there was a problem because it was
+ possible to compare redactions across thousands of documents and use their
+ inconsistencies to reveal excised information in other documents.

+

Kootz disagreed that the solution was to release less or create unclassified + summaries of the affected documents.

+

Kootz then introduced J.D. Smith of the Department of Defense (DOD).

+

Smith described the DOD approach to using AI for declassification. Smith + opened by noting that AI could contribute to exceptionally grave threats to + national security. He pointed out that the interagency discussion on the use + of AI for declassification has been led by Kootz at State and Smith at DOD + and stressed the need for a federal approach.

+

Goldgeier asked whether the revision of the Executive Order addressed AI.
+ Smith stated that the draft EO contained language on AI but that there was
+ no money for it. Kootz added that the draft EO included language about what
+ to do but not how to do it. Powers added that the EO will represent an
+ interim compromise. The federal government needs to know about its
+ vulnerabilities before adversaries do, and while some see AI as a threat,
+ Powers suggested that AI represents an opportunity. Kootz noted that the
+ government needs to provide the public with tools to counter deep fakes, and
+ that pulling things back may send people to false sources.

+

Report from the Department of Defense

+

Smith began his official report, and Goldgeier noted the HAC’s appreciation
+ of the Records and Declassification Division’s (RDD) successful effort to
+ overcome the backlog in FRUS declassification reviews. Smith stated that it
+ “took a village” to take on the backlog. Within RDD, Scott Beaton worked out
+ a procedure for processing the 1,800 backlog cases; RDD also developed an
+ electronic system to track cases and generate response letters. Smith also
+ praised OH declassification staffers who helped RDD understand the FRUS
+ mission and provided invaluable support. DOD has 26 separate
+ declassification offices. RDD can handle equities from the Office of the
+ Secretary of Defense (OSD) and the Joint Staff (JS) directly, but it must
+ refer out documents that contain other equities. RDD’s FRUS reviews release
+ about 70% in full and 30% in part; only about 1% is denied in full. Direct
+ informal exchanges with compilers may make it possible to release additional
+ historically significant material. RDD’s goal is to produce reviews that
+ generate no appeals, making it possible to get FRUS volumes out sooner.
+ Overall, Smith and the RDD team seek to work as partners, rather than
+ impediments.

+

Naftali asked how DOD could apply the results of FRUS reviews to other + declassification reviews. Smith said that at present the results are + provided to human reviewers.

+

Howard asked how RDD could get more waivers from other DOD components. Smith + said that it was a matter of building relations and earning trust.

+

At the end of the discussion, Smith underlined that RDD’s success was due in + large part to input from OH for which he remains grateful.

+

September 10

+

Closed Session

+

Meeting with the Archivist of the United States

+

The Archivist of the United States, Dr. Colleen J. Shogan, made a brief + presentation to the committee discussing various initiatives that NARA is + undertaking. After her presentation, the members of the committee and Dr. + Shogan had a question-and-answer period covering various aspects of NARA’s + new initiatives, records issues, staffing, and more.

+

Transcript of Wicentowski’s Presentation on AI

+

WICENTOWSKI: Thank you, Adam. Good morning, everyone. I always appreciate the + opportunity to speak at this forum about the Office of the Historian’s + digital initiatives.

+

First, I’d like to briefly note an anniversary before moving on to the main + topic. Next slide, please. This year, the office’s public website, + history.state.gov, turned 15 years old. I recall presenting the website at a + HAC meeting shortly after its launch in 2009. As I said then in introducing + the website, our goal in creating history.state.gov was to uphold the best + traditions of diplomatic documentary editing while leveraging the + flexibility and power of the internet to give readers new tools for + researching with FRUS and our other publications and datasets that were + impossible or impractical with print. Next slide, please.

+

I just have a couple of slides here showing the website. Here is a landing
+ page of a FRUS volume, with the ability to search within the volume and to
+ download PDF and eBook versions of the volume. Next slide. Here is a
+ document view showing the text of the document, links to the original page
+ images, and a virtual table of contents helping you know where you are in
+ the volume however you get to it. Next slide. This is continuing in the
+ document view, showing the persons sidebar, which is a dynamic list of the
+ people who are mentioned in the document, with the descriptions of the
+ people drawn from the persons list in the volume’s front matter.

+

The key to achieving that vision was adopting a new electronic format for our + volumes that could capture the content, structure, and semantics of our + volumes. The new format couldn’t be too rigid or it wouldn’t have been able + to accommodate the natural variations that marked FRUS over its 150 years, + now 163 years, in print. Among the various choices for digital formats, we + adopted the Text Encoding Initiative, or TEI, the de facto standard for + digital text projects in the humanities. And here is an example. We’ll stay + on this slide for the next paragraph or so, showing the underlying TEI + behind the document that we just viewed.

+

Having selected TEI as our format, we turned to digitizing and releasing our
+ volumes. Thanks to a partnership with the University of Wisconsin-Madison,
+ which had already scanned 100 years’ worth of our publications, 1861 through
+ 1960, we were able to create TEI editions of our volumes without needing to
+ re-scan these books. Next slide. Month by month, year by year, our
+ digitization vendors gradually converted all of our printed publications to
+ the new format, which we, namely Virginia Kinniburgh here, reviewed for
+ accuracy. Next slide. Our website and GitHub repositories, shown here, now
+ offer all printed volumes in the FRUS series, as well as all of our legacy
+ electronic-only publications and five of 13 legacy microfiche supplements,
+ the remaining eight of which are in the pipeline for release as resources
+ allow. Each volume encoded in TEI can be browsed like a book or searched
+ like a database. Next slide, please. Taken together, these let readers
+ search across the entire corpus, now over 310,000 documents, using keywords
+ and dates, as shown here.

+

Each year—next slide, please—each year, the website receives over 10 million + visitors and is among the Department of State’s top five public engagement + websites. It’s notable that nearly half of our visitors come from outside + the United States. In the coming months, we are excited to make the FRUS + digital edition even more accessible by bringing FRUS to Libby, the e-book + lending app used by many libraries. More to come on that soon.

+

Today, I am presenting on an area of great promise and great uncertainty,
+ artificial intelligence, or AI. To be precise, I am discussing the so-called
+ generative AI tools and large language models, LLMs, made popular by
+ OpenAI’s ChatGPT, not the larger field of machine learning, or ML. I will
+ describe our experiments with these tools, which began toward the end of
+ last year. First, a caveat. For the rest of my talk, my mentions of ChatGPT
+ and the other tools in this field don’t constitute an endorsement of these
+ products.

+

ChatGPT was released in November 2022, and it was quickly joined by a robust
+ group of competitors. Next slide, please. But for a full year—possibly
+ because I suffer from a lack of imagination—I couldn’t see a direct
+ application of this technology to our work. When you ask one of these tools
+ a question, they present a confident answer. But where did this answer come
+ from? How were these tools trained? As Kathy Rasmussen, the General Editor
+ of the FRUS series, put the question to a team of engineers we were meeting
+ with: “Where did this AI go to school?” The answer is that ChatGPT and
+ similar generative AI tools are trained on a vast swath of the internet at
+ the cost of tens of millions of dollars for each generation of the tool. But
+ the companies that produce these tools don’t list the precise sources that
+ they consult. When you ask the tool to provide citations for an answer, they
+ will. But as with any factual questions you may put to these tools, the
+ answers may well be made up or hallucinated. So obtaining reliable
+ provenance information about the training inputs and generated outputs is a
+ major question for generative AI tools.

+

It’s easy to fall under the illusion that the tool producing words on the + screen is a thinking sentient being containing the sum of all human + knowledge. But despite their impressive abilities, these generative AI tools + don’t actually think at all. They merely use a statistical model to generate + an answer, one word at a time, each next word being what the tool judges to + have the highest statistical likelihood of being used in the context of your + conversation with it and its vast training set and data from the internet. + This is why the tools are called large language models or LLMs. They are + remarkable, but they are models of linguistic probability produced by + training on large amounts of text.
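The “one word at a time” mechanism can be illustrated with a toy example. The
sketch below builds a bigram table from a tiny corpus and repeatedly emits
the most likely next word; real LLMs use neural networks over tokens and far
larger training sets, but the generation loop has this same shape (the corpus
and seed word are illustrative):

    from collections import Counter, defaultdict

    corpus = ("the committee reviewed the volume and "
              "the committee approved the volume").split()

    # Count which word follows which: a toy stand-in for a language model.
    nexts = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        nexts[a][b] += 1

    # Generate by always emitting the statistically most likely next word.
    word, output = "the", ["the"]
    for _ in range(6):
        word = nexts[word].most_common(1)[0][0]
        output.append(word)
    print(" ".join(output))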

+

To address the training and provenance problems, we could theoretically train + our own model on our own documents, but the costs of training models make + this option prohibitive, not to mention that truly vast amounts of data are + required to train a model from scratch, far larger than say the corpus of + FRUS.

+

So short of training our own model, what could we use these generative AI + tools for? Many reports in the media praise ChatGPT’s ability to write in + the style of Shakespeare or in the voice of a pirate. Should we instruct + ChatGPT to compose a sonnet in the style of Henry Kissinger? This was hardly + a compelling idea, and I haven’t tried. These are the reasons why I + initially dismissed tools like ChatGPT as an impressive technical feat, but + ultimately a parlor trick without direct applications for historical + inquiry. What could you do with a tool whose output, you have to assume, is + a hallucination that might some of the time be accurate?

+

My thinking began to change one year later, this past November, when four + developments came to my attention in rapid succession. These discoveries led + me and several colleagues to dedicate time since then to investigate the + potential uses of AI for historical inquiry in general and in documentary + editing specifically.

+

What capabilities grabbed our attention? Next slide, please. Tools like
+ ChatGPT and Claude began to allow users to upload a file containing, say, a
+ document or an article, and ask the tool questions about the file. What is
+ this article about? What does the author argue? Finally, we could focus the
+ tool on our own data, the data of interest to us. Instead of relying on the
+ tool’s undisclosed training materials for answers of dubious origin, we
+ could do something never possible before: interrogate our own documents
+ using plain English, thanks to the LLM’s ability to process natural
+ language. This was starting to get interesting. If you haven’t done it
+ before, I would encourage you to go to ChatGPT, Claude, or Gemini, upload an
+ article, and ask the tool to summarize the article, or ask a question about
+ some of the arguments in the article.
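The same pattern works programmatically. A minimal sketch using the Anthropic
Python SDK follows; the model name, file name, and prompt are illustrative,
and an ANTHROPIC_API_KEY is assumed to be set in the environment:

    import anthropic

    client = anthropic.Anthropic()
    document = open("memo_1975.txt").read()  # illustrative document

    # Put the document and a plain-English question in a single message.
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (f"Here is a document:\n\n{document}\n\n"
                        "What does the author argue, in two sentences?"),
        }],
    )
    print(response.content[0].text)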

+

Still, this technique has some limits. Next slide, please. These tools can
+ only keep a certain amount of information in their heads during a
+ conversation. Specifically, they have a fixed limit on the number of words
+ they can keep in their short-term memory. This limit is called the context
+ window limit. The longer you talk to the tool, the more likely it is that
+ the sum of the words in your conversation will exceed this limit, and the
+ tool will literally forget the beginning of your conversation. As of my last
+ survey a few months ago, ChatGPT’s limit was approximately 25,000 words, or
+ 80 pages. The practical effect of this context window limit is that you
+ might upload an article or a document that is so long that the tool can’t
+ keep the whole thing in its memory, or quickly exhausts the limit after a
+ few questions. As a result, it won’t be able to provide a comprehensive
+ answer. Competing tools offer higher limits than ChatGPT: Anthropic’s Claude
+ Opus model boasts a context window six times higher, 150,000 words, or 500
+ pages. Google’s Gemini 1.5 Pro offers a limit three times higher still,
+ 750,000 words, or 2,500 pages. That is finally enough to encompass even the
+ largest FRUS volume and is probably enough room for two average-sized FRUS
+ volumes.
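A rough pre-flight check against these word-count figures might look like the
sketch below; the numbers are the approximations quoted above, and note that
real models meter tokens rather than words:

    # Approximate context window limits, in words, as cited in the talk.
    LIMITS_WORDS = {
        "ChatGPT": 25_000,          # ~80 pages
        "Claude Opus": 150_000,     # ~500 pages
        "Gemini 1.5 Pro": 750_000,  # ~2,500 pages
    }

    volume = open("frus-volume.txt").read()  # illustrative plain-text volume
    n_words = len(volume.split())
    for model, limit in LIMITS_WORDS.items():
        verdict = "fits" if n_words <= limit else "too long"
        print(f"{model}: {verdict} ({n_words:,} words vs {limit:,})")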

+

Besides limits on the amount of data that the tools can accept as input, the + tools also limit the number of words they can produce as output. ChatGPT is + capped at 1,500 words, or about five pages. Claude doubles that, 3,000 + words, or about 10 pages. Gemini doubles that again, 6,000 words, or about + 20 pages. But no matter which you use, the length of the answers to your + questions is finite. Nonetheless, the possibilities are intriguing.

+

Can you think of any uses for asking questions about articles or documents?
+ One political scientist who uploaded all of his published papers to Google’s
+ Gemini wrote that he was impressed with the quality of the tool’s answers
+ and confirmed that they reflected the conclusions he had reached in his
+ works. But for the Office of the Historian, with our collection of over 550
+ FRUS volumes, even the largest models lack a sufficiently large context
+ window limit to be able to answer arbitrary questions about our corpus, or
+ to ask questions about even larger collections of data, such as the archives
+ we do our research in. So for now, the context window limit prevents us from
+ directly interrogating large corpora. On the other hand, we are confident
+ that since historians are hardly the only profession that would benefit from
+ the ability to ask questions of a large corpus of data, the companies that
+ produce these tools have a massive commercial incentive to serve these
+ markets. So we expect these context window limits to be eased more or less
+ gradually.

+

In the meantime, engineers have developed a technique to partially address + the context window limit, which enables an expanded range of historical + inquiry. Next slide, please. The technique is called Retrieval Augmented + Generation, or RAG. This was the second development I learned about that + caused me to get excited about the possibilities of generative AI tools.

+

The idea behind RAG is to pair a large language model with a database + containing your corpus of articles or documents. When you ask a question, + the tool first searches the database for portions of the documents that were + semantically most relevant to your question. Then it presents these excerpts + of the matching documents to the AI tool, and the LLM uses these excerpts to + compose an answer. This technique sidesteps the context window limit by + feeding excerpts to the model instead of complete documents. The idea has + the potential to be able to select the right information to answer your + question—potential.
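In code, the core RAG loop is short. The sketch below uses the OpenAI SDK for
both the embedding and generation steps; the model names, sample chunks, and
question are illustrative, and an OPENAI_API_KEY is assumed in the
environment:

    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(texts):
        out = client.embeddings.create(model="text-embedding-3-small",
                                       input=texts)
        return np.array([d.embedding for d in out.data])

    # Your corpus, already split into excerpt-sized chunks (illustrative).
    chunks = ["...document excerpt one...", "...document excerpt two..."]
    chunk_vecs = embed(chunks)

    question = "What was the rationale for the policy shift?"
    q_vec = embed([question])[0]

    # Retrieve: rank chunks by cosine similarity and keep the top 10,
    # mirroring the 10-excerpt cap of Google's tool described below.
    sims = chunk_vecs @ q_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
    top = [chunks[i] for i in np.argsort(sims)[::-1][:10]]

    # Generate: answer using only the retrieved excerpts.
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": ("Answer using only these excerpts:\n\n"
                               + "\n---\n".join(top)
                               + f"\n\nQuestion: {question}")}],
    )
    print(answer.choices[0].message.content)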

+

After experimenting with many tools, which couldn’t handle a corpus the size
+ of FRUS, an engineer from Google introduced me to one of their tools that
+ was able to index all of FRUS. It’s called Google Vertex AI Search Agent.
+ That’s a mouthful. I set it up, pointed it at history.state.gov, and after
+ several hours during which the tool indexed every FRUS document, I was able
+ to begin submitting questions. Next slide, please. It limits its answers to
+ the top 10 most relevant document segments and composes answers four to five
+ paragraphs long with citations to each document it used for its claims. Here
+ you see a question that I asked at the very top and the AI tool’s
+ multi-paragraph answer. There’s another paragraph not shown here. At certain
+ points along the way, you’ll see a circle with a link icon. When you click
+ on that, it reveals what’s shown at the end of that first pink arrow, which
+ is a card summarizing or displaying the source FRUS document that it drew
+ that information from. If you follow that link to the original document
+ (I’ve taken a screenshot of its text), you can verify that the tool drew its
+ answer from this document rather than from its general internet training
+ set. This also means that answers to questions requiring more than 10
+ document fragments will be unavoidably incomplete. So this answer shows
+ three little citation blocks. Each one of those might have one or more
+ links, but you’re only going to see 10 citations.

+

So this RAG approach with Google’s implementation has a limit of 10 links, or
+ 10 excerpts, that it can draw from to formulate an answer. So to use it
+ effectively, you really have to think: could the question I’m asking be
+ answered with 10 fragments or fewer, or am I asking a question that’s much
+ broader and would require consulting a larger base of documents? Besides its
+ multi-paragraph answers and citations, it also presents a list of all the
+ documents that it thinks are relevant, which may run to more than 10,
+ thousands maybe. You can go look for yourself for more information about the
+ question you asked, but the summary answer only draws on 10 fragments.

+

In my preliminary testing, it has been quite impressive, with better results + than competing tools I had tried. I wouldn’t advise copying and pasting any + AI answer directly into an email or paper, but I think that with training in + the tool’s limitations, my colleagues here could already use this tool to + help them with their own research. Extensive testing and refinement would be + needed though before we could offer such a tool on our public website.

+

The third development emerged from a fortuitous meeting with an Amazon + engineer. I explained that we have a complex style guide, a lengthy manual + for annotating FRUS documents. My colleague James Wilson championed the idea + of an AI tool that could review our draft annotations for FRUS volumes + against the rules in the style guide. The engineer happened to have created + a proof of concept for just such a tool for a different style guide and + offered to show it to us and adapt it to our style guide. We provided him + with an excerpt of our guide and samples of FRUS annotations containing + intentional mistakes we had introduced to test the tool’s ability to catch + those mistakes. Next slide, please.

+

I don’t have a screenshot of the resulting tool, but this slide just shows
+ you a sample annotation sheet and the deletion that James inserted where he
+ removed some words in the document heading. In the lower portion of the
+ screenshot, I have a cutout of the style guide which indicates how the
+ headings should be formulated for meeting minutes. The tool did successfully
+ tell us that the document heading was lacking the “Minutes of a…” prefix. It
+ was able to apply that rule based on the examples listed in the style guide.
+ The tool correctly identified some of the mistakes, such as missing required
+ components of archival citations. In other cases, it flagged some issues
+ that were not mistakes but that our editors deemed worth checking. The tool
+ also revealed some inconsistencies in the style guide itself that inevitably
+ slipped in over the years and that we had not detected. As a result, we are
+ exploring using AI not only to check annotations against our style guide but
+ also to improve the style guide itself, so it’s more reliable for humans,
+ too. Still, we’re quite some distance from being able to use such a tool in
+ our internal systems.
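As a sketch of the general pattern (not the Amazon proof of concept itself),
checking one draft heading against one style-guide rule might look like the
following; the SDK choice, model name, rule wording, and sample heading are
all illustrative:

    import anthropic

    client = anthropic.Anthropic()

    # One style-guide rule and one draft annotation (both illustrative).
    style_rule = ("Headings for meeting minutes must begin with 'Minutes of "
                  "a ...' followed by the body that met and the date.")
    draft_heading = "National Security Council Meeting, June 4, 1970"

    check = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        messages=[{"role": "user",
                   "content": (f"Style rule: {style_rule}\n\n"
                               f"Draft heading: {draft_heading}\n\n"
                               "Does the draft violate the rule? If so, "
                               "explain how and propose a corrected "
                               "heading.")}],
    )
    print(check.content[0].text)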

+

The fourth and final development that piqued my interest was the appearance
+ of tools that could transcribe handwritten historical documents. For
+ decades, we’ve enjoyed the use of digital scanning technologies and optical
+ character recognition or OCR technology for recognizing the letters and
+ words in typed and printed documents and allowing us to search and mine
+ them. As good as OCR technology is, it still produces output that can be
+ riddled with errors. Such levels of error may be acceptable for certain use
+ cases, but for others, we have to laboriously proofread OCR output to
+ achieve the needed level of quality. Next slide, please.

+

In this slide, I have a scanned image of a document from the Reagan Library. + It looks very clean. You would think this would be amenable to OCR, but one + of the leading OCR tools produced the result in the bottom. That is the raw + OCR output. It’s not hard to see problems. If traditional OCR still has room + for improvement on typed or printed documents, the situation is much worse + for handwritten texts. Traditional OCR struggles or completely fails to + recognize and extract handwritten text. Although many documents produced + over the last century have been typed, we still have enormous quantities of + handwritten documents and continue to produce them. If text can’t be + extracted, we can’t search its contents and readers with visual disabilities + are hindered from accessing the information. Being able to effectively + digitize paper records and make them accessible and ready for research is a + challenge not just for historians, but for any organization seeking to take + advantage of the power of digital tools for searching and analyzing + text.

+

There are two ways in which generative AI tools are improving on traditional
+ OCR. First, generative AI tools are able to take error-laden OCR text, such
+ as that shown here, and try to fix the errors in the text. Next slide,
+ please. So in the top portion, we see the original OCR output riddled with
+ errors, and the bottom text shows the result of a prompt asking ChatGPT to
+ correct obvious errors.
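A minimal sketch of this first approach, using the OpenAI SDK with an
illustrative file name and prompt, and deliberately instructing the model to
correct conservatively:

    from openai import OpenAI

    client = OpenAI()
    raw_ocr = open("reagan-memo-ocr.txt").read()  # illustrative raw OCR text

    fixed = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": ("Correct only obvious OCR errors in the text "
                               "below. Do not rephrase, summarize, or add "
                               "anything.\n\n" + raw_ocr)}],
    )
    print(fixed.choices[0].message.content)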

+

Still, I wished for a tool that could perform its own OCR that wouldn’t be + limited to cleaning up bad OCR. In the last few months, ChatGPT and its + peers began releasing models with this capability. These models are called + “multimodal” meaning that they can take as their input not only text, but + also images, audio, and video. To trigger the tool’s document vision + capabilities, you can’t upload a PDF. You have to upload an image of a + document, a JPEG or a PNG. Once you upload one of these types of images, you + can ask the tool for a transcription.
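A transcription request of this kind, sketched with the google-generativeai
SDK (the API key handling, model name, file name, and prompt are
illustrative):

    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key="...")  # assumes a Google AI Studio API key
    model = genai.GenerativeModel("gemini-1.5-pro")

    # Per the caveat above: upload an image (JPEG/PNG), not a PDF.
    card = Image.open("consular-card-0001.png")  # illustrative scan
    result = model.generate_content(
        [card, "Transcribe this handwritten index card, preserving the "
               "table structure and keeping marginal notes separate."])
    print(result.text)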

+

But how good are the results? Next slide, please. As a test, I chose a + particularly challenging set of documents, a set of 6,500 handwritten index + cards that we had scanned a decade ago, but could not effectively exploit + because OCR tools could not decipher the handwriting. The index cards + contain listings of consular officials at U.S. diplomatic and consular posts + from 1789 to 1960. Next slide, please. I have a slightly zoomed-in version + so you can see more detail. The cards are nearly all handwritten in ornate + cursive. Each card contains a mix of tables and marginal notes and + headings.

+

After experimenting with ChatGPT and Claude with mixed results, I finally
+ derived very impressive results from Google’s Gemini 1.5 model. Next slide,
+ please. It transcribed the text of many cards perfectly and was able to
+ capture the cards’ mixture of tabular and non-tabular comments and
+ marginalia, a feat that no other model matched. So here we see the results
+ of Gemini’s transcription of this card. If you look closely, you will see a
+ few mistakes or variations, like the middle initial of the person. But it
+ did a very respectable job with the text, and it captured the structure of
+ the card: the heading of the card, the column headings, the cell boundaries.
+ It did quite a good job. Next slide, please.

+

And unlike other tools, this is the transcription of the bottom portion of
+ the card where there’s a new heading and several comments that are not part
+ of the table but are sort of inserted over it. It captured those non-tabular
+ remarks perfectly. It wasn’t flawless. For some cards, it merged the
+ contents of adjoining cells or omitted certain columns. For about 5 percent
+ of the cards, it produced scrambled results for reasons we have not yet had
+ the chance to investigate. Next slide, please. But in most cases, it
+ correctly or nearly correctly transcribed the names, birthplaces, and dates
+ and places of service of the officials listed on the cards. The tool took
+ around 15 to 20 seconds per card and completed all 8,600 scanned images in
+ 48 hours, overnight as I slept. We were astounded. I was able to sleep
+ longer because of this tool, yes.
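Those throughput figures are internally consistent, as a quick
back-of-envelope check shows:

    cards = 8_600  # scanned images processed
    for sec_per_card in (15, 20):
        hours = cards * sec_per_card / 3600
        print(f"{sec_per_card} s/card -> {hours:.1f} hours")
    # 15 s/card -> 35.8 hours; 20 s/card -> 47.8 hours (roughly 48 hours)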

+

We were astounded that after such a short time, we could search the cards in
+ ways that we had been unable to for a full decade, or even before in their
+ original paper form. The cards will need to be reviewed, but this review
+ will be starting from a respectable first draft. And here in this image,
+ you’ll see the transcription of one row, George H. Jackson, and we noticed
+ (this experiment wasn’t the first to notice this phenomenon) that on some
+ cards, appended to an individual’s name in red pen, was the designation
+ “colored.” Having the searchable form, we were able to search for all
+ instances of that word in the cards. The 25 cards that had that designation,
+ sometimes in ditto marks underneath, revealed the names of employees we did
+ not previously know about for the lists we have been compiling of Black
+ Americans who served in or worked for the State Department. Next slide,
+ please.

+

In conclusion, the results of these experiments show that generative AI tools + have great potential for some portions of the work of transcribing, + annotating, and querying our historical documents and sources. We can use + natural language to query individual documents or articles. We can perform + semantic search across large corpuses of data and obtain draft answers to + questions based on short excerpts of relevant texts. We can receive + automated feedback on our annotations, and we can produce usable draft + transcriptions of complex historical documents. The tools exhibit clear + shortcomings, making them inappropriate for some tasks. But for other tasks, + we were able to mitigate these issues and derive utility through persistent + and careful experimentation and close review.

+

In addition, we noticed these tools improving during the course of our + experiments. So if you begin experimenting and hit some disappointing + results, you might put your experiments down and wait a few weeks or months. + By the time you try again, a new model may have emerged that addressed the + flaws of the previous generation. Given the massive and growing scale of + commercial investment that I mentioned before, we can anticipate that many + of the limitations that we see today will dissipate, and new paradigms will + quickly replace today’s offerings. New capabilities are sure to come to + these tools, and we should be ready to evaluate them. In the meantime, we + are finding valid use cases for these tools in certain limited scenarios, + when paired with a healthy dose of caution and skepticism.

+

Thank you.

+
June 2024

From 57c06c32e3096b02ac5067cd3be0517bd2d6fffe Mon Sep 17 00:00:00 2001 From: Joe Wicentowski Date: Tue, 14 Jan 2025 20:00:07 -0500 Subject: [PATCH 2/3] Add video of September 2024 public presentation --- hac.xml | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/hac.xml b/hac.xml index b51cea2..07ce19a 100644 --- a/hac.xml +++ b/hac.xml @@ -415,8 +415,9 @@

Open Session, September 9

Presentation on the Office of the Historian’s Experiments - Using Today’s Artificial Intelligence Tools for Historical - Inquiry

+ Using Today’s Artificial Intelligence Tools for Historical Inquiry + (see video and transcript + below)

James Goldgeier opened the session by introducing himself and by welcoming all attendees, in person and online. He then noted that Adriane Lentz-Smith had rotated off the Committee after the June meeting and introduced @@ -750,7 +751,25 @@ undertaking. After her presentation, the members of the committee and Dr. Shogan had a question-and-answer period covering various aspects of NARA’s new initiatives, records issues, staffing, and more.

-

Transcript of Wicentowski’s Presentation on AI

+

Below is a video edition and transcript of the lecture presented by Dr. + Joseph Wicentowski during the public session.

+

+ Experiments using artificial intelligence tools for + historical inquiry (Video and transcript)

+ +
+
+