Data Mining in the Humanities
Mar 1, 2022 • 1 min read

XML Encoding Lab!

Coding in XML!

    For this lab, students were required to encode letters related to Rutgers alumni using TEI-XML. Encoding the letters in XML makes it easier to store a lot of information in a convenient, compact space; you could even carry a room full of letters on a thumb drive!

    I’ve had some experience with coding in the past, and “coding” in XML gave me déjà vu. It is surprisingly similar to HTML, and after some further research, the resemblance makes sense. Both HTML and XML are derived from the same language: SGML, or Standard Generalized Markup Language!
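
    To give a sense of what the markup looks like, here is a minimal sketch in Python that reads a tiny TEI-style fragment and lists the people tagged in it. The fragment is not from the lab: the names and wording are hypothetical placeholders, and it is not a complete TEI document (a full one would also carry a header). `persName`, `placeName`, and the namespace are standard TEI; the helper `list_people` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment. The names and wording are invented
# placeholders, not text from the letters encoded in the lab.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>Dear <persName>Jane Doe</persName>, I am writing from
         <placeName>New Brunswick</placeName>.</p>
      <p>Yours, <persName>John Doe</persName></p>
    </body>
  </text>
</TEI>"""

# TEI elements live in this namespace, so searches must name it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def list_people(xml_text):
    """Return the text of every <persName> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//tei:persName", TEI_NS)]


people = list_people(SAMPLE)
print(people)
```

    Just like in HTML, each piece of text sits between an opening and a closing tag; the difference is that the tags describe what the text is (a person, a place) instead of how it should look.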

    The results of the lab were very compelling, and entering all the correct information in XML felt very satisfying. The process went smoothly for the most part, though I did hit one hiccup. When we were required to research the people mentioned in the letter, I wanted to find more information about Daniel Smart’s sister, Elizabeth Smart, but I could not find anything through Google. Luckily, with Professor Giannetti’s assistance, I was able to find enough information using Ancestry and Find a Grave.

    After finishing the project, I can firmly say that my findings confirm the information presented in the assigned reading. The XML file turned out to be only around 2 kilobytes, which is almost nothing on a modern flash drive! For reference, a single gigabyte is 1,048,576 kilobytes, so at about 2 kilobytes per letter, one gigabyte could hold more than five hundred thousand letters! Now that’s a lot of space saved!
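
    The arithmetic behind that estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes every letter takes about 2 kilobytes, like the one from the lab, and it ignores any overhead from the drive's file system. The function name `letters_per_gigabyte` is made up for this example.

```python
# 1 gigabyte = 1024 megabytes, and 1 megabyte = 1024 kilobytes.
KB_PER_GB = 1024 * 1024  # 1,048,576 kilobytes

# Assumption: each encoded letter is about 2 KB, like the one in the lab.
LETTER_SIZE_KB = 2


def letters_per_gigabyte(letter_kb=LETTER_SIZE_KB):
    """Return how many letters of the given size fit in one gigabyte."""
    return KB_PER_GB // letter_kb


print(letters_per_gigabyte())  # 524288
```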

Below is an image of the storage space that the XML and PDF files of the same letter take up. As you can see, the XML file is significantly smaller than the PDF file!

File Size Comparisons

Guest post by: Andy Mao