VUFind record drivers and templates

This posting documents how I wrote and edited a couple of VUFind record drivers and Smarty templates for the "Portal" of the Catholic Research Resources Alliance. In writing this posting I hope to support any developer coming behind me as well as inform the wider open source community on how VUFind works.

The Problem

Read More

Text mining Catholic pamphlets

This is the quickest of blog postings outlining how I am initially providing a text mining interface to digitized Catholic pamphlets.

Jean McManus used a scanner to create PDF versions of a few Catholic pamphlets. Along the way, she also had the software do a bit of OCR. She then gave the PDF documents to me with filenames matching MARC 001 fields.

Read More

CRRA Update October 2010

CRRA Update OCTOBER 2010

  • CRRA Welcomes Dominican University and University of San Francisco
  • Spotlight on Portal Development: Progress on EAD; VuFind Announces 2.0 Roadmap
  • CRRA January 6, 2011 in San Diego – Make your plans to join us!
  • Call for Proposals: IMLS National Leadership Grants

New Member Highlights

Read More

Internet Archive content, VUFind (Solr), and text mining

This posting outlines how I have: 1) mirrored metadata and full text content from the Internet Archive, 2) made the mirrored content accessible through VUFind, and 3) implemented a rudimentary text mining interface against the mirror.


The "Catholic Portal" is intended to be a research tool centered around "rare, unique, and uncommon" materials of a Catholic nature. Many of these sorts of things are older as opposed to newer, and therefore, many of these things are out of copyright. Projects such as Google Books and the Open Content Alliance specialize in the mass digitization of out of copyright materials. By extension we can hope some of the things apropos to the Portal have been digitized by one or more of these projects.

Read More

Names & addresses

This posting outlines how the names & addresses of the "Catholic Portal" are made available. The purpose of this posting is mostly documentation. Documentation for myself, since I always forget. And documentation so somebody else can do the work after I win the lottery and move to the beach to drink cocktails with umbrellas in them.

Here goes:

Read More

Digital Access Committee (DAC) Meeting

Today we had a CRRA Digital Access Committee (DAC) meeting via the telephone. Attendees included:

  • Ann Hanlon
  • Demian Katz
  • Eric Frierson
  • Eric Morgan
  • Kevin Cawley
  • Pat Lawton
  • Susan Leister
  • Thomas Leonhardt

I did a bit of "Portal" show & tell demonstrating the work done to date on indexing EAD files. (See the previous blog posting.) We then discussed ways the indexing/display could be improved. Suggestions included:

Read More

Indexing MARC and EAD in VUFind with Solr for the CRRA

This posting outlines how I am currently indexing MARC and EAD files in VUFind with Solr for the CRRA. (Boy, there are a lot of acronyms in that sentence!)


The Catholic Research Resources Alliance (CRRA) is a member-driven organization with the purpose of making available "rare, unique, and uncommon" research materials for Catholic scholarship. Presently the membership is primarily made up of libraries and archives who pool together their metadata records, have them indexed, and provide access to the index. My responsibility is to build and maintain the technical infrastructure supporting this endeavor.

Read More

Very satisfying!

I have made significant progress in the process of harvesting EAD files and preparing them for ingestion into the "Catholic Portal". This posting outlines the successes.

Assuming Catholic Research Resources Alliance members place their EAD files in an HTTP-accessible directory, and those files have a .xml extension, then the following Perl scripts enable me to harvest and prepare them for indexing:
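
The Perl scripts themselves are not reproduced in this excerpt. As a stand-in, here is a minimal Python sketch of the harvesting idea only: given the HTML of a directory listing, collect the links that end in .xml. It is not the Perl code the posting refers to. The class name, the function name, and the example.org URLs are all hypothetical, and the sketch assumes the member's web server returns a plain HTML index page with ordinary anchor tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XmlLinkParser(HTMLParser):
    """Collect absolute URLs for every <a href> that ends in .xml."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Keep only links with a .xml extension (case-insensitive).
            if name == "href" and value and value.lower().endswith(".xml"):
                self.links.append(urljoin(self.base_url, value))


def find_ead_urls(listing_html, base_url):
    """Return the .xml URLs found in a directory-listing page (hypothetical helper)."""
    parser = XmlLinkParser(base_url)
    parser.feed(listing_html)
    return parser.links


# Hypothetical listing, standing in for what a member's directory might return.
listing = '<a href="a.xml">a</a> <a href="readme.txt">r</a> <a href="sub/B.XML">b</a>'
for url in find_ead_urls(listing, "http://example.org/ead/"):
    print(url)
```

In practice the listing would be fetched first (for example with urllib.request.urlopen) and each resulting URL downloaded in turn; those steps are left out so the sketch stays runnable without a network connection.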

Read More

EAD @ Marquette 4 CRRA

This is the briefest of travelogues reporting on a meeting about EAD files at Marquette University for the Catholic Research Resources Alliance on September 20, 2010.


Read More

Today I indexed some of the metadata I extracted yesterday using a script. Of all the scripts I've written so far, this one is the most straightforward. Read a locally developed XML file. Extract the unique identifier, title, and date. Associate each with VUFind/Solr fields. Commit.

You can (temporarily) see the fruits of these labors because all of the records have been associated with the Eric Lease Morgan Foo Bar Library. The result is a list of container-level records with very little additional information.

Read More

CRRA Update September 2010

CRRA Update

In this update …

Read More

Preparing EAD files for indexing

This posting outlines how I plan to prepare EAD files for indexing with Solr, the underlying indexing technology of VUFind.

The problem

I am aggregating sets of EAD files from Catholic Research Resources Alliance members. I am expected to index these files at the most granular level possible -- meaning at the did level. In order to satisfy both human and computer requirements, each indexed record needs at least a unique identifier, a human-readable descriptor, and a location code. The unique identifier can be taken from the unitid element. The human-readable descriptor can come from the unittitle. The location code can be inferred from the url attribute of the eadid element.
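
To make the three required values concrete, here is a minimal Python sketch that walks every did element in an EAD file and pulls out the unitid, the unittitle, and the url attribute of the eadid. It is an illustration only, not the approach the rest of the posting describes. The sample document, the function name, and the record keys are hypothetical, and the sketch assumes EAD without an XML namespace. It returns the raw url rather than a derived location code, since the mapping from URL to code is not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated EAD file (assumes no XML namespace).
SAMPLE = """<ead>
  <eadheader><eadid url="http://example.org/ead/abc.xml">abc</eadid></eadheader>
  <archdesc>
    <did><unitid>abc</unitid><unittitle>Sample collection</unittitle></did>
    <dsc>
      <c01><did><unitid>abc-001</unitid><unittitle>Letters</unittitle></did></c01>
      <c01><did><unittitle>Photographs</unittitle></did></c01>
    </dsc>
  </archdesc>
</ead>"""


def did_records(ead_xml):
    """Return one dict per did element: id, title, and the eadid url."""
    root = ET.fromstring(ead_xml)
    eadid = root.find(".//eadid")
    location = eadid.get("url") if eadid is not None else None
    records = []
    for did in root.iter("did"):
        records.append({
            "id": did.findtext("unitid"),        # None when the did has no unitid
            "title": did.findtext("unittitle"),  # None when the did has no unittitle
            "location": location,
        })
    return records


for record in did_records(SAMPLE):
    print(record)
```

Note that the last did in the sample has no unitid, so its id comes back as None. A record like that could not be indexed as-is, because every record needs a unique identifier.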

Read More

Adding unitid elements to did elements

This posting outlines how I believe I will add unitid elements to did elements of EAD files.

The problem

As the CRRA matures, I expect a greater amount of the metadata ingested into the "portal" will come from EAD files. In order to index EAD files meaningfully, I need to extract unique identifiers from each container-level element, a human-readable description of the container, and a location code. The identifier and human-readable description can easily come from unitid and unittitle elements of did elements.

Read More

VuFind 2.0 Conference

VUFind is the technical backbone of the "Catholic Portal", and this posting documents my experiences at the VuFind 2.0 Conference held at the Villanova Conference Center on September 15 & 16, 2010. In short, it provided an opportunity for the community to share successes, challenges, and visions for the future.

Read More

CRRA in San Diego, January 2011

We invite you to attend the CRRA reunion and discussions in San Diego on Thursday afternoon, January 6, 2011. We are scheduling this meeting before the ALA Midwinter Meeting sessions begin on Friday in hopes that many of you who are attending the ALA meetings will be able to join in the CRRA discussions as well.

At this time, we are putting together what promises to be a set of lively and informative discussions. This will be an opportunity to talk about CRRA activities taking place at your library, to discuss progress to date on the 2010/11 goals in the strategic plan, and to explore our readiness to promote the Catholic portal to librarians and scholars. VuFind 1.0 will be very near to being ready for implementation and this will be an opportunity to explore its functionality. Also, we will take a look at how the contents of the portal are growing, particularly in regard to adding rare, unique and uncommon archival collections and other materials. The outlines of the proposal to be submitted to the NEH Challenge Grant will be ready for discussion. And we want to hear from everyone – new and continuing members – how things are going at your library. Very importantly, this is an occasion to network and socialize with your CRRA colleagues.

Read More

VUFind "Midwest" User's Group Meeting

An inaugural VUFind "Midwest" User's Group Meeting was held Friday, September 3, and this posting outlines my perceptions of what happened there.

Read More

Collection Policy Statement for the Catholic Portal

(The following is the current collection policy for the Catholic Portal.)

Collection Policy Statement for the Catholic Portal

Read More

CRRA Update August 2010

CRRA Update AUGUST 2010

We are pleased to announce that St. Joseph’s University’s (Philadelphia) Francis A. Drexel Library, under the leadership of Evelyn Minick, is the newest and twelfth member of the CRRA. St. Joseph’s brings a host of resources to the CRRA, including the Jesuitica Collection.

Read More

Where in the world is the CRRA?

Pat and I are in the process of mapping the locations of CRRA members, below:

View CRRA Members in a larger map

Harvesting, updating, and re-indexing

This posting describes the automated process I am currently using to harvest, update, and re-index the MARC records of the "Catholic Portal".

Step #1 - Make a list

Librarians love lists, and I am no exception. The process begins with a list (databases) of CRRA members who have MARC metadata to share. Each item in the list includes the following fields:

Read More