Data Mining in the Humanities
Mar 1, 2022 • 1 min read

Second Blog Post

My experience with TEI-XML Encoding

For this lab assignment, students were tasked with learning how to use and implement XML. Using TEI-XML encoding, we digitized letters related to Rutgers alumni. One advantage of encoding these letters is that the digital copies are not physical objects, so they are not subject to the loss or damage that paper is. Another advantage is the convenience of storing a large amount of data compactly and efficiently. This allows thousands of letters to be stored on a device much smaller than a storage room.

When I was in high school I learned how to “code” a little using HTML. Through my experience with XML I noticed some similarities between it and HTML. The syntax is very similar, with opening and closing tags. The major difference between the two languages is that HTML is oriented toward displaying data, while XML is oriented toward storing and transferring it. A little research makes it clear why the two are so similar: both derive from the same language, SGML (Standard Generalized Markup Language). The process of finding and transcribing all the data using XML went smoothly. The only hiccups were finding information on some people mentioned in the PDFs, such as Pauline White and Robert H. Marr, but with the assistance of a classmate these issues were quickly resolved.
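To illustrate the point about storage and transfer, here is a minimal sketch of how a program might read structured data back out of an encoded letter. The TEI fragment is invented for illustration (the names, date, and text are placeholders, not taken from the Rutgers letters), it omits the header a complete TEI file would have, and the helper names `extract_people` and `extract_date` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI fragment: placeholder names and date, no teiHeader.
# It is only meant to show the tag structure, not a complete TEI file.
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="letter">
        <opener>
          <dateline><date when="1918-05-01">May 1, 1918</date></dateline>
          <salute>Dear <persName>Jane Doe</persName>,</salute>
        </opener>
        <p>Thank you for your letter.</p>
        <closer><signed><persName>John Smith</persName></signed></closer>
      </div>
    </body>
  </text>
</TEI>"""

# TEI elements live in this XML namespace, so lookups must name it.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_people(xml_text):
    """Return the text of every <persName> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//tei:persName", NS)]


def extract_date(xml_text):
    """Return the machine-readable 'when' attribute of the first <date>."""
    root = ET.fromstring(xml_text)
    date = root.find(".//tei:date", NS)
    return date.get("when") if date is not None else None


people = extract_people(TEI_SAMPLE)
letter_date = extract_date(TEI_SAMPLE)
print(people, letter_date)
```

Because the tags describe what each piece of text is (a person, a date) rather than how it should look, the same file can be searched or indexed without anyone rereading the letter.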

My findings agreed with the arguments in the readings, as the final file was only a few kilobytes. For reference, an iPhone with roughly 125 gigabytes of storage holds 125,000,000 kilobytes. At about 2.5 kilobytes per letter, that means an iPhone can store roughly 50,000,000 (50 million) letters. Much more than a storage room…
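The estimate above can be checked with a few lines of arithmetic. This is a rough sketch: it uses decimal units (1 gigabyte = 1,000,000 kilobytes) and assumes about 2.5 kilobytes per encoded letter, which is the size implied by the figures in this post rather than a measured average. The function name `letters_that_fit` is just for illustration.

```python
KB_PER_GB = 1_000_000  # decimal units: 1 GB = 1,000,000 KB


def letters_that_fit(storage_gb, letter_kb):
    """Estimate how many files of letter_kb kilobytes fit in storage_gb gigabytes."""
    return int(storage_gb * KB_PER_GB // letter_kb)


# Assumed values: 125 GB of storage, about 2.5 KB per encoded letter.
estimate = letters_that_fit(125, 2.5)
print(f"{estimate:,} letters")
```

The real number would be lower, since the phone's operating system and file system overhead take up part of the storage, but the order of magnitude is the point.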

[Image: code]

This image demonstrates the ease of inputting information using XML and the similarity of its syntax to HTML.

Guest post by: Christopher Ng