Very satisfying!

I have made significant progress on harvesting EAD files and preparing them for ingestion into the "Catholic Portal". This posting outlines the successes.

Assuming Catholic Research Resources Alliance members place their EAD files in an HTTP-accessible directory, and those files have a .xml extension, the following Perl scripts enable me to harvest and prepare them for indexing:

  • harvest-ead.pl - reads remote HTTP-accessible directories and copies all of the .xml files found there to a local cache
  • validate.pl - makes sure the cached XML files are well-formed and conform to the EAD DTD, and moves any files that fail to a different directory
  • transform.pl - reads the validated XML files, adds id attributes to all unitid elements through the use of a stylesheet (addunitid.xsl), transforms the resulting XML into HTML using another stylesheet (ead2html.xsl), and saves the result to an HTTP-accessible directory
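
To make the three steps above more concrete, here is a rough sketch of the core logic in Python. It is not the Perl code from my scripts, just an illustration under a few assumptions: the remote directory returns an ordinary HTML listing with links, the EAD files use no XML namespace, and the function names and the "unitid-" id prefix are made up for this example. It also only checks well-formedness; validating against the EAD DTD needs a validating parser, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin


class XmlLinkParser(HTMLParser):
    """Collect links ending in .xml from a directory-listing page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().endswith(".xml"):
            # Resolve relative links against the directory URL.
            self.links.append(urljoin(self.base_url, href))


def find_xml_links(listing_html, base_url):
    """Harvest step (hypothetical helper): list the .xml files to copy."""
    parser = XmlLinkParser(base_url)
    parser.feed(listing_html)
    return parser.links


def is_well_formed(xml_text):
    """Validation step, well-formedness only (no DTD check here)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def add_unitid_ids(xml_text, prefix="unitid-"):
    """Transform step, first half: give every unitid element an id.

    Existing id attributes are left alone. The prefix is an assumption
    of this sketch, not what addunitid.xsl actually generates.
    """
    root = ET.fromstring(xml_text)
    for number, element in enumerate(root.iter("unitid"), start=1):
        if element.get("id") is None:
            element.set("id", f"{prefix}{number}")
    return ET.tostring(root, encoding="unicode")
```

Actually fetching each link and writing it to the local cache, plus the XSLT transformation to HTML, are left out to keep the sketch short.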

What was really cool and a huge time-saver was the use of ead2html.xsl. Originally named AAAv2002-HTML.xsl, found on a page called User Contributed Stylesheets, and submitted by Stephanie Ashley, this stylesheet took my id attributes and automatically made named anchors for me. Boy, did I get lucky. "Thank you, Stephanie!"

My next step is to revisit my indexing routines.
