Data Mining in the Humanities
Mar 1, 2022

What I learned about XML

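XML is a markup language similar to HTML, but instead of being limited to a fixed set of tags, you can define your own tags for whatever you need. To me, it felt like a marriage between HTML and Python: HTML-style tags, combined with the freedom to decide what your structure means.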
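To make the idea of defining your own tags concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The tag names (`<letter>`, `<sender>`, `<recipient>`, `<body>`, `<p>`) and the `summarize_letter` helper are hypothetical, invented for illustration; they are not the schema or tools used in the lab, and the bracketed paragraphs are placeholders rather than real letter text.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcription of one letter. These tag names are made up
# for illustration: XML lets you choose whatever tags fit your material.
transcription = """
<letter>
  <sender>Carl Maar</sender>
  <recipient>Charles Maar</recipient>
  <body>
    <p>[first paragraph of the transcribed letter]</p>
    <p>[second paragraph of the transcribed letter]</p>
  </body>
</letter>
"""

def summarize_letter(xml_text):
    """Parse a letter and return (sender, recipient, number of paragraphs)."""
    root = ET.fromstring(xml_text.strip())
    # findtext returns the text of the first matching child element.
    sender = root.findtext("sender")
    recipient = root.findtext("recipient")
    # findall with a path collects every <p> inside <body>.
    paragraphs = root.findall("body/p")
    return sender, recipient, len(paragraphs)

sender, recipient, count = summarize_letter(transcription)
print(f"From {sender} to {recipient}, {count} paragraphs")
```

Because the tags describe what each piece of text is, a program can pull out the sender or count the paragraphs directly instead of guessing from the page layout.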

In the lab, we transcribed century-old letters into XML. The aim was to show how old documents can be preserved. Transcription has existed since ancient times: because paper can't last forever, people copied historical texts onto new paper. Nowadays, people save them into databases.

One pattern was that most of the letters were about receiving other letters. One long letter was a diary of the writer's experiences in the First World War. There was also a letter between Carl Maar and a relative, Charles Maar. Charles could be a brother, a father, or a cousin. Carl seems to address Charles with respect, so Charles could be an older relative.

I find the results compelling. It was interesting to read the letters and see the language of a century ago, and to transcribe them so that the record of this person, Carl Maar, is preserved.

One persistent problem was the "could not create schema" error. The error banner popped up no matter what I typed, but in the end it didn't affect the final product, the XML transcription.

My finding about the texts is that Carl Maar, the writer, seems like a very upbeat and jolly man. He isn't someone who argues with anyone; he's the kind of person who would get along with everyone. He writes with enthusiasm and reports his experiences in detail, like a skilled writer. He also seems very courteous when writing to family elders or his superiors.


Guest post by: Pranav G.