
The Herculaneum papyri (#9 of 2024)

2024-05-24

👋👋 Hi, I'm Domingo!

Here we are, another Friday, with the surprise I promised in my last issue: a special issue.

Why a special issue? Because of my usual habit of digging deeper and piling up browser tabs. In this case, I started looking into the Herculaneum papyri as a possible news item for the late-April issue. But one website led to another, and another, and another, and in the end the whole thing got out of hand.

Let's get to it, and thank you very much for reading.

On April 25 I read an intriguing headline: “A Herculaneum papyrus reveals the precise location of Plato's tomb” [Deciphered Herculaneum papyrus reveals precise burial place of Plato - arstechnica.com]. I had already read or heard about these papyri, for example in Maria Ribes's explanation in episode 451 of Coffee Break, but I had never looked into them in depth. I wanted to understand the story properly, so I started searching the web and ended up finding a fascinating story that I had to share: how an artificial intelligence competition has made it possible to read writing inside papyri carbonized by the eruption of Vesuvius almost 2,000 years ago.

And, by the way, it turned out that the original news item had nothing to do with the papyri we are going to talk about here.

Nat Friedman's post on X about the news of Plato's tomb and the Herculaneum papyri.

Origin

What are these papyri? What happened in Herculaneum?

The story begins in the 1st century BC, in the Roman city of Herculaneum. A wealthy Roman nobleman, Lucius Calpurnius Piso Caesoninus, who would become the father-in-law of Julius Caesar, built a luxurious villa on the outskirts of the city, the so-called Villa of the Papyri [Villa of the Papyri - wikipedia.org].

It was a large residence, full of artistic elements such as frescoes, mosaics, and sculptures, and equipped with a large library containing a vast collection of classical texts, in Greek and Latin, covering a wide range of philosophical and literary topics. These texts were written on rolled papyri, stacked horizontally on shelves.

All that wealth would have disappeared had it not been for the eruption of Mount Vesuvius in 79 AD. The eruption buried Pompeii under ash, and Herculaneum as well. Paradoxically, the volcano acted as both destroyer and preserver: the pyroclastic flows of ash and hot volcanic material that buried the villa created a time capsule that preserved the papyri along with other valuable objects. That is what allowed archaeologists and specialists, almost two millennia later, to rediscover those treasures.

The following video shows how the flood of ash and hot volcanic material covered the rolls and carbonized them.

Discovery and first attempts to read the papyri

The 1908 book by archaeologist Ethel Ross Barker, Buried Herculaneum [Buried Herculaneum - archive.org], describes in detail the history of the excavations at Herculaneum and the discovery of its buried treasures and papyri. Excavation of the villa and its surroundings began in the mid-18th century, and in the autumn of 1752 the first finds appeared: 21 rolls and fragments on two wooden shelves. They looked like lumps of charcoal, and some were mistaken for exactly that and thrown away. In the following years many more were found, up to a total of 1,806: 341 complete rolls, 500 fragments, and 965 pieces in an intermediate state of preservation.

The following photographs show different examples of these rolls and fragments.

In many of the remains, the writing contained in the papyri could still be seen, as in the example shown in the next image.

Other complete rolls were opened by cutting them in half or by carefully unrolling them, which turned them into fragments that researchers then tried to classify and reorder, as happened with papyrus no. 10.

Analysis of all these remains established that the great majority of the papyri are Greek texts by Philodemus of Gadara, an Epicurean philosopher of the 1st century BC who lived in the region. These works deal extensively with ethics, poetry, music, and logic. The collection also includes around twenty Latin papyri that have not yet been deciphered.

Scrolls still awaiting decipherment

More than 1,000 rolls and fragments still remain to be deciphered. Many of them are completely carbonized rolls that look like lumps of charcoal. That is the case with PHerc 1667, an intact part of the interior of a roll, with an approximate diameter of 3 cm and a length of 8.5 cm. The outer parts of the roll were separated from it in an attempt to “unroll” it.

Papyrus PHerc 1667, image taken from the technical notes for the 2023 data capture.

Another example is roll PHerc 332, where you can make out the individual rolled layers that compose it. This papyrus measures 7.7 cm in length and 2.6 cm in diameter.

Papyrus PHerc 332, image taken from the technical notes for the 2023 data capture.

And one last example of another roll, deformed and solidified by carbonization.

Image taken from Brent Seales's video: Herculaneum scrolls: A 20-year journey to read the unreadable.

At first glance it seems impossible to extract the slightest information from these carbonized blocks, and it hardly seems plausible that the ink could have survived carbonization. But what if we scanned the papyrus with X-ray computed tomography? Could we reconstruct its interior and examine it without damaging it? Could we find traces of ink and decipher the writing?

First tomography of one of the papyri

Brent Seales, professor of computer science at the University of Kentucky [Brent Seales - uky.edu], tried to solve the problem in 2009. Together with his team, he traveled to the Institut de France to perform the first micro-computed tomography of one of the rolls. The result is a sequence of scanned images like the one below, obtained at a resolution of 14 micrometers, 0.014 mm.

The inside of the roll seems to have been preserved, and the layers of the rolled papyrus can be observed, but not with enough definition to separate them automatically and find ink. The problem was simply too complex. In their 2011 article [Analysis of Herculaneum papyri with x-ray computed tomography - scholar.google.com], they conclude by saying:

We have encountered serious difficulties in analyzing the data because of the complex nature of the papyrus's internal structure. Automatic separation of the papyrus layers has proved virtually impossible. A manual reconstruction of a small region was attempted, but it was not possible to make the ink visible.

But Seales was not discouraged and remained convinced that the approach was the right one and would eventually work. What was needed was higher resolution and better algorithms, and the approach also had to be validated first on a simpler problem.

Brent Seales deciphers the En-Gedi scroll

In 2015 Brent Seales and his team showed that the method could work, using the En-Gedi scroll [En-Gedi scroll - wikipedia.org]. The scroll is made of animal skin, specifically leather, unlike the Herculaneum rolls, which are papyrus. It was discovered in 1970 in a synagogue in En-Gedi, Israel, and dates from the 3rd or 4th century AD.

The En-Gedi scroll had also been carbonized and was found in a very fragile state. Even so, Seales's team managed to apply its method, demonstrating that it was possible to virtually unroll it from its three-dimensional tomographic image.

Professor Seales explains it very well in the following video.

I have taken a few frames from the video to illustrate the phases of the process.

  1. First, a three-dimensional scan is made using a micro-computed X-ray tomography technique, with micrometer resolution. This yields a three-dimensional volume of the scroll and its interior.

  2. Then a line is selected in the three-dimensional image corresponding to a section containing one sheet of the parchment, and a patch of the parchment is reconstructed. In that reconstruction the writing can already be seen.

  3. The same process is repeated for all possible patches. Once they have been obtained, overlapping regions are checked and fitted together, completing a kind of puzzle that reconstructs as much of the parchment as possible.
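The steps above can be illustrated with a deliberately simplified Python sketch (not Seales's software). It assumes the scan is a nested list `volume[z][y][x]` of grey values and that, for each slice, someone has traced the chosen layer as a hypothetical polyline of `(x, y)` points. Each traced line is resampled at equal arc-length steps and the voxel under each sample is read (nearest neighbour), so every slice becomes one row of a flat image.

```python
import math

Point = tuple[float, float]        # (x, y) position inside one slice
Volume = list[list[list[float]]]   # volume[z][y][x] -> grey value


def resample_by_arc_length(curve: list[Point], step: float = 1.0) -> list[Point]:
    """Return points spaced `step` apart along a traced polyline (needs >= 2 points)."""
    if len(curve) < 2:
        raise ValueError("a traced line needs at least two points")
    # Cumulative distance from the first point to each vertex of the polyline.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))

    samples: list[Point] = []
    seg = 0  # index of the segment containing the current target distance
    for k in range(int(cum[-1] // step) + 1):
        t = k * step
        # Advance to the segment [cum[seg], cum[seg + 1]] that contains t.
        while seg < len(curve) - 2 and cum[seg + 1] < t:
            seg += 1
        length = cum[seg + 1] - cum[seg]
        f = 0.0 if length == 0 else (t - cum[seg]) / length
        (x0, y0), (x1, y1) = curve[seg], curve[seg + 1]
        samples.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return samples


def flatten(volume: Volume, lines: dict[int, list[Point]], step: float = 1.0) -> list[list[float]]:
    """One output row per traced slice z, one column per `step` of arc length."""
    image = []
    for z, line in sorted(lines.items()):
        row = [volume[z][round(y)][round(x)]  # nearest-neighbour lookup
               for x, y in resample_by_arc_length(line, step)]
        image.append(row)
    return image


# Tiny synthetic volume: 2 slices of 3x5 voxels, value = 100*z + 10*y + x.
vol = [[[100 * z + 10 * y + x for x in range(5)] for y in range(3)] for z in range(2)]
print(flatten(vol, {0: [(0, 1), (4, 1)], 1: [(0, 1), (2, 1), (4, 1)]}))
# → [[10, 11, 12, 13, 14], [110, 111, 112, 113, 114]]
```

Flattening each slice independently is the main simplification here: real pipelines join the traced lines into a 3D mesh and flatten the whole mesh at once, which keeps neighbouring rows aligned and makes it possible to fit overlapping patches together like pieces of a puzzle.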

The success of the project led to the publication of several scientific articles [From damage to discovery via virtual unwrapping: Reading the scroll from En-Gedi - science.org] and to a $14 million grant from the National Science Foundation [UK Awarded $14 Million NSF Grant to Launch World-Class Cultural Heritage Lab - uky.edu] to create the EduceLab center, led by Seales himself, with the fundamental goal of reconstructing and preserving texts from antiquity and, specifically, the Herculaneum papyri.

Detecting ink in a papyrus fragment

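The techniques that worked on the En-Gedi scroll, however, did not produce good results on the Herculaneum papyri. For one thing, the material was different. Papyrus is made from strips of the papyrus plant's pith laid in crossed layers and pressed together, and until then no one had succeeded in recovering ink from carbonized papyrus. The ink was also a problem: the En-Gedi ink contained metal, which stands out clearly in an X-ray scan, whereas the Herculaneum texts were written in carbon-based ink, which is almost indistinguishable from the carbonized papyrus beneath it. In addition, the papyri were much more tightly rolled than the En-Gedi parchment, and the layers to be virtually unfolded were much more intricate.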

In 2016 two recent computer science graduates, Seth Parker and Stephen Parsons, joined the project, and they ended up being decisive for its success. The first specialized in processing the data obtained from X-ray tomography, while the second focused on 3D reconstruction and machine learning with neural networks.

Parsons's goal was to obtain a convolutional neural network capable of extracting preserved ink in papyrus from the volumetric data of a 3D scan. To train the neural network, they used separated papyrus fragments in which the ink was visible, together with their volumetric 3D scans.

Training data for the neural network that recognizes ink in papyrus. Stephen Parsons's doctoral thesis.

After numerous attempts, Parsons managed to develop a model which, when applied to these papyrus fragments, produced promising initial results. The following figure shows the final output of the neural network on the left, where the white points represent ink. There is still a lot of noise, but some letters are correctly identified.

In 2019 a new scan of a complete roll was performed at Diamond Light Source, the United Kingdom's synchrotron particle accelerator. Using more energetic X-rays, the team obtained a scan with a resolution of 8 micrometers (0.008 mm), almost twice the resolution of the first scan.
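To put the jump from 14 to 8 micrometers in perspective, here is a back-of-the-envelope calculation. It assumes, purely for illustration, a roll with the dimensions of PHerc 332 given above (7.7 cm long, 2.6 cm in diameter), a scan covering the roll's square bounding box, and 2 bytes (16 bits) per voxel; none of these figures describes the actual 2019 scan.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers (avoids float rounding)."""
    return -(-a // b)


def bounding_box_voxels(length_um: int, diameter_um: int, voxel_um: int) -> int:
    """Voxels in the box around a roll: one square slice per voxel step along its length."""
    side = ceil_div(diameter_um, voxel_um)   # voxels across one slice
    slices = ceil_div(length_um, voxel_um)   # number of slices along the roll
    return side * side * slices


LENGTH_UM, DIAMETER_UM = 77_000, 26_000      # 7.7 cm x 2.6 cm (PHerc 332, as an example)
BYTES_PER_VOXEL = 2                           # assumption: 16-bit grey values

for voxel_um in (14, 8):
    n = bounding_box_voxels(LENGTH_UM, DIAMETER_UM, voxel_um)
    print(f"{voxel_um} um: {n:,} voxels, about {n * BYTES_PER_VOXEL / 1e9:.0f} GB")

# Linear detail improves by 14 / 8 = 1.75x ("almost twice"); data volume grows by 1.75**3.
print(f"data growth factor: {(14 / 8) ** 3:.2f}")
```

Under these assumptions, 14 µm gives about 19 billion voxels (roughly 38 GB) and 8 µm about 102 billion (roughly 203 GB). The finer detail is what makes thin papyrus layers separable, but it costs more than five times as much data for the same roll.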

Using Parker's software, the team improved the tools used to process the data and worked out a possible workflow for reconstructing part of the roll. It is very similar to the one used for the En-Gedi parchment.

  1. The layer of the roll to be virtually unwrapped is selected manually.

  2. After that layer is selected in consecutive slices, a surface corresponding to a fragment of papyrus is obtained.

  3. That surface is then thickened with the data above and below it in the original scan, producing a surface with some volume.

  4. The neural-network model is applied to that volume, in the hope of detecting the ink points.
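The thickening and ink-detection steps can be sketched in Python too, again only as an illustration under assumptions: the selected surface is a hypothetical grid of points, each with a unit normal; `thicken` samples the volume at whole-voxel offsets along each normal; and `toy_ink_score`, a simple mean-density threshold, is a placeholder for the neural network. The real model looks at a 3D neighbourhood around each point and has to learn the very faint carbon-ink signal, which no fixed threshold can capture.

```python
from collections.abc import Callable

Vec3 = tuple[float, float, float]
Volume = list[list[list[float]]]          # volume[z][y][x] -> grey value
Surface = list[list[tuple[Vec3, Vec3]]]   # surface[i][j] = (point, unit normal)
Layers = list[list[list[float]]]          # layers[d][i][j] -> sampled grey value


def thicken(volume: Volume, surface: Surface, half_depth: int) -> Layers:
    """Sample the volume at whole-voxel offsets -half_depth..+half_depth along each normal.

    Each offset d gives one flattened image, so the result is a thin slab of data
    above and below the selected papyrus layer.
    """
    layers: Layers = []
    for d in range(-half_depth, half_depth + 1):
        layers.append([
            [volume[round(z + d * nz)][round(y + d * ny)][round(x + d * nx)]  # nearest voxel
             for (x, y, z), (nx, ny, nz) in row]
            for row in surface
        ])
    return layers


def predict_ink(layers: Layers, score: Callable[[list[float]], bool]) -> list[list[bool]]:
    """Run a per-pixel classifier on the column of samples behind each surface point."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[score([layer[i][j] for layer in layers]) for j in range(cols)]
            for i in range(rows)]


def toy_ink_score(column: list[float], threshold: float = 3.0) -> bool:
    """Hypothetical stand-in for the neural network: 'ink' if the column is dense on average."""
    return sum(column) / len(column) > threshold


# Synthetic volume of 5 slices (2x2 voxels each) whose value is the slice index z,
# plus one dense "ink" voxel just above the surface at x=0, y=0.
vol = [[[float(z) for _x in range(2)] for _y in range(2)] for z in range(5)]
vol[3][0][0] = 10.0
# A flat 2x2 patch of surface lying in slice z=2, with normals pointing along +z.
surf = [[((float(j), float(i), 2.0), (0.0, 0.0, 1.0)) for j in range(2)] for i in range(2)]
print(predict_ink(thicken(vol, surf, half_depth=1), toy_ink_score))
# → [[True, False], [False, False]]
```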

The problem was that, despite all the work and the high resolution of the data, the model did not perform well enough to extract a complete word. Even so, it represented a robust starting point from which that goal could eventually be reached. As Parker and Parsons wrote at the end of the article they published in 2019 [From invisibility to readability: Recovering the ink of Herculaneum - plos.org]:

With the proven ability of our machine-learning pipeline to detect the carbon-ink signal and render it photorealistically, the scholarly community may be one step closer to witnessing “a bursting forth of genius from the dust”1 from Herculaneum.

Although they did not manage to decipher any words from the roll, Parsons and Parker laid the groundwork for the next advances. And, most importantly, they also finished their doctoral theses: Parsons in 2023 [Hard-Hearted Scrolls: A Noninvasive Method for Reading the Herculaneum Papyri] and Parker in 2024 [Flexible Attenuation Fields: Tomographic Reconstruction From Heterogeneous Datasets].

The Vesuvius Challenge competition

At the end of 2022, the computer scientist and entrepreneur Nat Friedman [Nat Friedman - wikipedia.org, nat.org] learned about the project, got in touch with Seales, and proposed making all the datasets public and organizing a competition to improve the neural-network model and, eventually, to read complete scrolls.

Nat Friedman's first post on X referring, without naming it, to what would become the Vesuvius Challenge.

The idea of organizing a competition to obtain or improve an AI model is a common one in the field. For example, the website Kaggle, founded in 2010 [Kaggle - wikipedia.org], has organized hundreds of competitions in which tens of thousands of enthusiasts and specialists have participated.

Friedman, who had led major software projects and companies such as Ximian, Xamarin, and GitHub, knew that the competition had to be organized very carefully, monitoring its progress and structuring it so as to encourage collaboration among participants and the sharing of results. Offering an attractive prize (initially $500,000: $250,000 from him and $250,000 from entrepreneur Daniel Gross [dcgross.com]) was not enough; the competition also had to be managed meticulously, with every detail closely supervised.

In November, Friedman published a call looking for a technical lead for the project [Hiring tech lead to help solve major archaeological puzzle - nat.org] and ended up hiring JP Posma [I can announce it now - x.com], who organized the competition website and set up its presence across the different social networks where it would be launched.

Finally, on March 15, 2023, the Vesuvius Challenge website was launched.

Current homepage of the Vesuvius Challenge competition website.

That same day the competition was also launched on social platforms (Discord and X), a newsletter was created, and the competition opened on Kaggle.

In a very short time the competition became highly popular, donations rose to more than a million dollars, and many participants were drawn to work with the tools and data made available.

To win the grand prize of $700,000, competitors had to decipher, before December 31, 2023, four separate passages of text, each containing at least 140 characters of continuous text. To foster cooperation, “progress prizes” of between $1,000 and $10,000 were also offered every two months. To win these prizes, participants had to publish their code or research openly, so that the whole community benefited.

  • On April 15 the first four open-source prizes were awarded, $2,500 each, for the creation and improvement of tools and contributions to the community.

  • On June 27 progress prizes were awarded for segmentation contributions and the automatic extraction of papyrus patches, one of the most difficult problems. The organizers also decided to hire “segmenters” to extract different patches and make them available to the community.

  • In the following months quite a few progress prizes were awarded, but clear progress did not seem to be coming. That changed on October 13, when 21-year-old student Luke Farritor found the first word in one of the papyrus rolls: ΠΟΡΦΥΡΑϹ, porphyras. It means “purple,” and it is a rather rare word in ancient texts.

    First word found in one of the Herculaneum papyrus rolls.

    Luke himself tells the story of the excitement of the discovery in this video.

  • After Luke's success, Youssef Nader, an Egyptian PhD student in Berlin, applied a new neural-network model to the same patch where Luke had found the word. He used Luke's results to reinforce the model's training and obtained a surprising result: an image in which the previously found word appeared much more clearly, and two more words could be read, one above it and one below it.

    Image resulting from applying Youssef's neural network to the same fragment in which Luke found the first word.

  • On February 5, 2024, the winners of the $700,000 grand prize were announced: a super-team formed by the previous winners, Youssef and Luke, together with the young Swiss participant Julian Schilliger, managed to recover 15 columns, 11 more than were required, and more than 2,000 characters in total. Even so, this represents only about 5% of the text the papyrus is estimated to contain.
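Youssef's use of Luke's results to improve his own model can be read as an instance of a general machine-learning idea, usually called self-training or pseudo-labeling: train on a few trusted labels, let the model label more data, keep only its confident predictions, and retrain. The toy sketch below shows just that loop; it is not the winning team's code. Each sample is reduced to a single hypothetical "ink score", the "model" is a threshold halfway between the mean scores of ink and blank samples, and `margin` is an assumed parameter that decides which predictions count as confident.

```python
def fit_threshold(labeled: list[tuple[float, int]]) -> float:
    """Toy 'model': a threshold halfway between the mean ink (1) and blank (0) scores."""
    ink = [x for x, y in labeled if y == 1]
    blank = [x for x, y in labeled if y == 0]
    return (sum(ink) / len(ink) + sum(blank) / len(blank)) / 2


def self_train(labeled: list[tuple[float, int]], unlabeled: list[float],
               rounds: int = 3, margin: float = 1.0):
    """Grow the training set with the model's own confident predictions (pseudo-labels)."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        t = fit_threshold(labeled)                       # retrain on everything labeled so far
        confident = [x for x in pool if abs(x - t) >= margin]
        labeled += [(x, int(x > t)) for x in confident]  # accept confident predictions
        pool = [x for x in pool if abs(x - t) < margin]  # leave uncertain samples for later
    return fit_threshold(labeled), labeled, pool


# Two trusted labels (think: a word someone has already confirmed) and five unlabeled scores.
threshold, labels, undecided = self_train([(0.0, 0), (10.0, 1)], [1.0, 2.0, 8.0, 9.0, 5.2])
print(threshold, len(labels), undecided)  # → 5.0 6 [5.2]
```

The score of 5.2 stays undecided because it sits too close to the threshold; keeping such ambiguous samples out of the training set is what stops the model from reinforcing its own mistakes.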

The challenge had been met. Nearly two thousand years after the eruption of Vesuvius buried them, three young people had, for the first time, read one of the carbonized papyrus rolls. They had shown that the project Brent Seales had begun more than fifteen years earlier was viable: carbonized papyri could be virtually unwrapped and read with a neural network.

The following figure is the image submitted by the winning team.

Image with the reconstruction of the papyrus text provided by the winning team.

A team of experts assembled by the Vesuvius Challenge studied the text, transcribed it, and translated it, validating that its content is related to the collection of Epicurean texts found in the Villa. On the competition page you can find the fragments transcribed into Greek and translated.

Column from the reconstructed image, with the text highlighted after processing, and its Greek transcription on the right.

The three winners of the challenge received the main prize, but there were also many additional awards: besides the “open source” and “progress” prizes, three other finalist teams received $50,000 each.

Winning team of the final 2023 prize in the competition.

The prizes awarded so far, up to April 2024, total $1,236,500. On the competition website you can find the full list, with links to the GitHub pages corresponding to each award.

Celebration and future

On March 16, 2024, an event was held at the Getty Villa in Malibu [Getty Villa - wikipedia.org] where the Vesuvius Challenge prizes were awarded. The villa, built by billionaire J. Paul Getty and modeled on the Villa of the Papyri, is a museum devoted to the study of Greek, Roman, and Etruscan antiquities. The museum is also one of the collaborators in Brent Seales's EduceLab project.

Historians specializing in the Herculaneum papyri took part in the event, along with Nat Friedman and Brent Seales himself. It celebrated a success achieved by combining traditional and innovative approaches: on the one hand, collaboration among academic institutions and public funding; on the other, a rather radical idea promoted by an entrepreneurial technologist and backed by Silicon Valley donations, the open-source community, and the enthusiasm of young specialist hobbyists connected online.

As for the future, the Vesuvius Challenge is still ongoing, led by Nat Friedman and now with Stephen Parsons as technical director. The challenge newsletter is still active, and you can subscribe to receive updates.

They continue to award progress prizes every two months and have set a major challenge for 2024: $100,000 for the first team capable of reading 90% of the scrolls. There are also, as in 2023, $30,000 prizes for the first letters of scrolls 2, 3, and 4.

The deadline for submissions is December 31, 2024. Just like last year, results are taking time, and nobody has yet managed to win any of these prizes. A certain pessimism is beginning to show, but there is still plenty of time left before the end of the year. If the grand prize is won, it will be a historic success and a giant step toward the broader plan of scanning the remaining 300 or so unopened scrolls, most of them in Naples.

Nat Friedman has a long-term vision which he calls “The Master Plan”. Its final part, phase four, is to excavate the Villa of the Papyri again in order to recover the entire library, with the thousands of papyri that may still be buried there.

A very ambitious vision. We will see what can be achieved, and whatever happens, we will keep telling the story here.

Until next time, see you then! 👋👋

1

The quote "a bursting forth of genius from the dust" comes from the poem September, 1819 by the famous English Romantic poet William Wordsworth (1770-1850). In that poem, Wordsworth reflects on nature and the passage of time, evoking poetic images of resurgence and discovery. The phrase appears in a passage where the poet speaks about the excitement of discovering ancient literary fragments, specifically those buried by the eruption of Vesuvius at Herculaneum.