BLOG

CRRA in San Diego January 6, 2011

From left to right: Eric Morgan (ND), Eric Frierson (St. Ed's), Marta Deyrup (Seton Hall), Clay Stalls (Loyola Marymount), Kris Brancolini (Loyola Marymount), Jennifer Younger (CRRA), Tyrone Cannon (Univ of San Francisco), Janice Welburn (Marquette), Jean Zanoni (Marquette), Pat Lawton (CRRA), Alma Ortega (Univ of San Diego), Theresa Byrd (Univ of San Diego), Susan Ohmer (Notre Dame), Laverna Saunders (Duquesne), Diane Maher (U San Diego), Ed Starkey (U San Diego)

The San Diego meeting provided an opportunity for new and continuing CRRA members and friends to look at the enhanced portal, discuss future directions for the CRRA, and, last but not least, get to know one another.

CRRA in San Diego Jan. 6, 2011

We look forward to seeing many of you in San Diego for our upcoming meeting. Full details follow and are on the web at http://tinyurl.com/crra-jan2011.

Portal development is a focal point for this meeting. Many milestones have been met, and Eric will demonstrate new portal functionality, including the Web 2.0 features of VuFind, an EAD indexing and display tool, and text mining techniques to facilitate discovery and the creation of new knowledge.


CRRA in San Diego

This is a simple annotated list of links used as an outline for a presentation to the CRRA in San Diego:

  1. CRRA website - The good ol' look & feel but wrapped around new content and functionality. ("Thank you, Eric Frierson!")
  2. Web 2.0 - All the Web 2.0 links (cite this, email this, favorite this) that did not work previously now function correctly.
  3. EAD viewer - It is now possible to view EAD files locally or from the originating institution.
  4. Item-level indexing - The content of EAD files is indexed at the item level making for finer-grained searching.
  5. PDF display - Records linking to digitized versions of books now enable a person to get the full text. Examples include content from St. Michael's and the University of Notre Dame.
  6. Text mining - After extracting the full text from the PDF documents, it is possible to apply concordancing techniques to the full text for analysis.
  7. Automated updating - The "Portal" can be updated automatically by harvesting metadata from member institutions, massaging it for the Portal, and re-indexing it on a regular basis.
  8. Use statistics - Rudimentary Web server log file analysis as well as Google Analytics reports illustrate how the Portal is being used.
  9. Blog - A running commentary on what's happening with Portal development.
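Item 6 mentions applying concordancing techniques to the extracted full text. A concordance is usually shown as keyword-in-context (KWIC) lines. The following is a minimal Python sketch of that idea. It is not the Portal's actual implementation. The function name and the 30-character context width are illustrative assumptions.

```python
import re

def concordance(text, keyword, width=30):
    """Return keyword-in-context lines for every whole-word occurrence of keyword.

    Matching is case-insensitive; each hit is shown with up to `width`
    characters of context on either side.
    """
    # \b anchors keep "church" from matching inside "churches"
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    # collapse whitespace so OCR line breaks do not distort the context
    flat = " ".join(text.split())
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # right-justify the left context so the keywords line up in a column
        lines.append(f"{left:>{width}} {match.group(0)} {right}")
    return lines
```

Printing `"\n".join(concordance(full_text, "pilgrimage"))` would stack every occurrence in a single column, which is the view that makes patterns of usage easy to scan.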

Simple log file analysis

Today I did a bit of simple log file analysis against the Portal's Apache log file. Specifically, I wanted to extract the queries people have been using.

Naturally, I wrote a program to do this work: parse.pl. It is rather brain-dead and certainly not 100 percent accurate, but it does generate a report of some value.
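parse.pl itself is a Perl script and is not reproduced here. Below is a rough Python sketch of the same kind of analysis. It makes two assumptions. First, the log is in Apache's common or combined format. Second, searches arrive as VuFind-style requests carrying the search terms in a `lookfor` query-string parameter. The function name is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# The quoted request field of an Apache common/combined log line, e.g.
# "GET /Search/Results?lookfor=mission&type=AllFields HTTP/1.1"
REQUEST = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

def extract_queries(lines, param="lookfor"):
    """Count the search terms found in the given query-string parameter."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue  # malformed line or an HTTP method we ignore
        # parse_qs decodes "+" and %-escapes and drops blank values
        query = parse_qs(urlsplit(match.group(1)).query)
        for term in query.get(param, []):
            term = term.strip().lower()
            if term:
                counts[term] += 1
    return counts
```

Something like `extract_queries(open("access_log")).most_common(25)` would then list the most frequent queries. Like parse.pl, it will miss searches made any other way.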


ND/CRRA Forum on Digital Humanities

This message outlines an upcoming event tentatively called the Notre Dame/CRRA Forum on Digital Humanities:

    Who: Anybody and everybody across the University
   What: A set of presentations and workshops on
         digital humanities
   When: Thursday afternoon (February 24) and Friday
         morning (February 25)
  Where: (probably) Geddes Hall
    Why: Because it is about more than find and
         access, it is also about use and
         understanding

The Hesburgh Libraries, the Center for Research Computing (CRC), and the Catholic Research Resources Alliance (CRRA) are jointly sponsoring a set of presentations and workshops on the digital humanities Thursday afternoon (February 24) and Friday morning (February 25). While all of the details have yet to be ironed out, we expect there to be at least two presenters on Thursday:


Blogpost about CRRA - DePaul Univ Law Library

CRRA is getting some press! Thanks to the DePaul University Rinn Law Library for their recent blog post, “Catholic Research Resources Alliance Helps Locate Canon Law Titles”: http://depaullaw.typepad.com/library/2010/09/catholic-research-resources-alliance-helps-locate-canon-law-titles.html

DePaul is the CRRA's newest member. Welcome, and thank you!

Catholic Portal look & feel

Thanks to the good work done by Eric Frierson of St. Edwards University, the "sandbox" of "Catholic Portal" now sports the look & feel of our public view:

screen shot


CRRA Update November 2010

CRRA Update NOVEMBER 2010

  • Welcome to the University of Dayton and to DePaul University, the Alliance's newest members! Watch for more information about Dayton and DePaul in the December Update.
  • Committee News:
    (1) The Collections Committee reaffirmed the collecting focus and provided a suggested rubric for “rare, uncommon.”
    (2) The Digital Access Committee (DAC) is working on a number of issues related to portal development. Please see more committee news below.
  • Mark your calendars! CRRA in San Diego (January 2011), CRRA/Notre Dame Forum on Digital Humanities (February 2011), CRRA in Philadelphia (March 2011). Details below.




Catholic pamphlets and the "Catholic Portal"

This posting outlines a possible workflow for getting digitized versions of Notre Dame's Catholic pamphlets into the "Catholic Portal".

The problem

The University of Notre Dame owns a significant number of Catholic pamphlets. These materials have been cataloged and denoted as destined for the "Portal" in their MARC records with the letters "CRRA" in field 590$u.
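Selecting the pamphlets destined for the Portal amounts to filtering on that 590$u value. The following is a minimal Python sketch of the selection step. It assumes the records have been exported as MARCXML, though the actual workflow may well use binary MARC. It returns the 001 field of every record whose 590 subfield u contains "CRRA". The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

MARC = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC}

def crra_record_ids(marcxml):
    """Return the 001 of every record whose 590$u contains 'CRRA'."""
    root = ET.fromstring(marcxml)
    ids = []
    for record in root.iter(f"{{{MARC}}}record"):
        subfields = record.findall(
            "marc:datafield[@tag='590']/marc:subfield[@code='u']", NS)
        if any("CRRA" in (sf.text or "") for sf in subfields):
            control = record.find("marc:controlfield[@tag='001']", NS)
            if control is not None and control.text:
                ids.append(control.text.strip())
    return ids
```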


VUFind record drivers and templates

This posting documents how I wrote and edited a couple of VUFind record drivers and Smarty templates for the "Portal" of the Catholic Research Resources Alliance. In writing this posting I hope to support any developer coming behind me as well as inform the wider open source community on how VUFind works.

The Problem


Text mining Catholic pamphlets

This is the quickest of blog postings outlining how I am initially providing a text mining interface to digitized Catholic pamphlets.

Jean McManus used a scanner to create PDF versions of a few Catholic pamphlets. Along the way, she also had the software do a bit of OCR. She then gave the PDF documents to me with filenames matching their MARC 001 fields.
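Because the filenames match the MARC 001 fields, linking each catalog record to its PDF can be a simple lookup. Here is a small Python sketch under the assumption that every file is named `<001>.pdf` and sits in a single directory. The function name is hypothetical.

```python
from pathlib import Path

def pdfs_by_record_id(directory):
    """Map each MARC 001 value to its PDF, assuming files are named <001>.pdf."""
    mapping = {}
    # glob is case-sensitive on most filesystems, so "*.PDF" would need its own pass
    for path in sorted(Path(directory).glob("*.pdf")):
        # the stem (filename without ".pdf") is taken to be the 001 value
        mapping[path.stem] = path
    return mapping
```

Records without an entry in the mapping are the ones that still lack a digitized copy.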


CRRA Update October 2010

CRRA Update OCTOBER 2010

  • CRRA Welcomes Dominican University and University of San Francisco
  • Spotlight on Portal Development: Progress on EAD; VuFind Announces 2.0 Roadmap
  • CRRA January 6, 2011 in San Diego – Make your plans to join us!
  • Call for Proposals: IMLS National Leadership Grants

New Member Highlights


Internet Archive content, VUFind (Solr), and text mining

This posting outlines how I have: 1) mirrored metadata and full text content from the Internet Archive, 2) made the mirrored content accessible through VUFind, and 3) implemented a rudimentary text mining interface against the mirror.

Background

The "Catholic Portal" is intended to be a research tool centered on "rare, unique, and uncommon" materials of a Catholic nature. Many of these materials are older rather than newer, and therefore many of them are out of copyright. Projects such as Google Books and the Open Content Alliance specialize in the mass digitization of out-of-copyright materials. By extension, we can hope that some of the items relevant to the Portal have been digitized by one or more of these projects.
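The posting does not say exactly how the mirror was built. As a hedged illustration only, the Python sketch below builds the URLs such a mirror might request from the Internet Archive:

* an advanced-search query returning JSON,
* an item's metadata record, and
* its OCR plain text.

The search query in the usage note is a made-up example. The `<identifier>_djvu.txt` filename is a common Internet Archive convention for OCR'd texts, but not every item has one.

```python
from urllib.parse import urlencode

def search_url(query, rows=50, page=1):
    """Build an Internet Archive advanced-search URL that returns JSON."""
    params = [("q", query), ("fl[]", "identifier"), ("fl[]", "title"),
              ("rows", rows), ("page", page), ("output", "json")]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)

def item_urls(identifier):
    """URLs for an item's metadata and its (assumed) OCR plain-text derivative."""
    return {
        "metadata": f"https://archive.org/metadata/{identifier}",
        # many OCR'd texts expose <identifier>_djvu.txt; not every item does
        "fulltext": f"https://archive.org/download/{identifier}/{identifier}_djvu.txt",
    }
```

A harvester could fetch `search_url('subject:"Catholic Church"')` with `urllib.request.urlopen` and read the identifiers from the JSON's `response.docs` list. It would then download each item's metadata and full text for local indexing.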


Names & addresses

This posting outlines how the names & addresses of the "Catholic Portal" are made available. The purpose of this posting is mostly documentation. Documentation for myself, since I always forget. And documentation so somebody else can do the work after I win the lottery and move to the beach to drink cocktails with umbrellas in them.

Here goes:


Digital Access Committee (DAC) Meeting

Today we had a CRRA Digital Access Committee (DAC) meeting via the telephone. Attendees included:

  • Ann Hanlon
  • Demian Katz
  • Eric Frierson
  • Eric Morgan
  • Kevin Cawley
  • Pat Lawton
  • Susan Leister
  • Thomas Leonhardt

I did a bit of "Portal" show & tell demonstrating the work done to date on indexing EAD files. (See the previous blog posting.) We then discussed ways the indexing/display could be improved. Suggestions included:


Indexing MARC and EAD in VUFind with Solr for the CRRA

This posting outlines how I am currently indexing MARC and EAD files in VUFind with Solr for the CRRA. (Boy, there are a lot of acronyms in that sentence!)

Background

The Catholic Research Resources Alliance (CRRA) is a member-driven organization with the purpose of making available "rare, unique, and uncommon" research materials for Catholic scholarship. Presently the membership is primarily made up of libraries and archives who pool together their metadata records, have them indexed, and provide access to the index. My responsibility is to build and maintain the technical infrastructure supporting this endeavor.
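Indexing EAD at the item level means turning one finding aid into many small records, one per component. The following is a rough Python sketch of that flattening step. It is not the actual indexing code. It handles both namespaced (EAD 2002) and un-namespaced files and treats `<c>` and `<c01>` through `<c12>` as components. The `<collection_id>-<n>` identifier scheme is a hypothetical choice.

```python
import re
import xml.etree.ElementTree as ET

# <c> and numbered <c01>..<c12> are EAD's component elements
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")

def local(tag):
    """Drop any '{namespace}' prefix so namespaced and plain EAD both work."""
    return tag.rsplit("}", 1)[-1]

def components(ead_xml, collection_id):
    """Flatten an EAD finding aid into one dict per component, in document order."""
    root = ET.fromstring(ead_xml)
    comps = (e for e in root.iter()
             if isinstance(e.tag, str) and COMPONENT.match(local(e.tag)))
    records = []
    for n, elem in enumerate(comps):
        fields = {"id": f"{collection_id}-{n}", "title": "", "date": ""}
        # only the component's own <did>, not those of nested components
        did = next((ch for ch in elem if local(ch.tag) == "did"), None)
        if did is not None:
            for ch in did:
                text = " ".join("".join(ch.itertext()).split())
                if local(ch.tag) == "unittitle":
                    fields["title"] = text
                elif local(ch.tag) == "unitdate":
                    fields["date"] = text
        records.append(fields)
    return records
```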


Very satisfying!

I have made significant progress in the process of harvesting EAD files and preparing them for ingestion into the "Catholic Portal". This posting outlines the successes.

Assuming Catholic Research Resources Alliance members place their EAD files in an HTTP-accessible directory, and those files have a .xml extension, the following Perl scripts enable me to harvest and prepare them for indexing:
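The Perl scripts are not reproduced here. The first step they imply is finding the .xml files in a member's directory. The following Python sketch does this by reading the HTML directory listing a web server such as Apache generates. Its assumption is that such a listing is enabled and that the EAD files appear as plain `<a href>` links. Names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class XmlLinkParser(HTMLParser):
    """Collect href values ending in .xml from an HTML directory listing."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".xml"):
            # resolve relative links against the directory's own URL
            self.links.append(urljoin(self.base_url, href))

def ead_links(listing_html, base_url):
    """Return absolute URLs of the .xml files linked from a directory listing."""
    parser = XmlLinkParser(base_url)
    parser.feed(listing_html)
    parser.close()
    return parser.links
```

In practice the listing would be fetched with `urllib.request.urlopen(base_url)`, and each returned URL downloaded in turn for the preparation step.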


EAD @ Marquette 4 CRRA

This is the briefest of travelogues reporting on a meeting about EAD files at Marquette University for the Catholic Research Resources Alliance on September 20, 2010.

marquette sights


index-ead.pl

Today I indexed some of the metadata I extracted yesterday using a script called index-ead.pl. Of all the scripts I've written so far, this one is the most straightforward: read a locally developed XML file, extract the unique identifier, title, and date, associate each with VUFind/Solr fields, and commit.
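The four steps of index-ead.pl can be sketched in Python as follows. This rests on several assumptions:

* The locally developed XML looks like `<record><id/><title/><date/></record>`, which is a hypothetical shape.
* `id`, `title`, and `publishDate` are the target VuFind field names.
* Solr listens at `http://localhost:8080/solr/biblio`.
* That Solr's `/update` handler accepts a JSON array of documents.

Check each of these against the local schema.xml and Solr version.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

def to_solr_doc(xml_text):
    """Map a (hypothetical) <record><id/><title/><date/></record> to Solr fields."""
    rec = ET.fromstring(xml_text)
    doc = {
        "id": rec.findtext("id", "").strip(),
        "title": rec.findtext("title", "").strip(),
    }
    date = rec.findtext("date", "").strip()
    if date:
        # assumed VuFind field name; publishDate is multi-valued, hence the list
        doc["publishDate"] = [date]
    return doc

def post_and_commit(docs, solr="http://localhost:8080/solr/biblio"):
    """Send documents to Solr's update handler and commit in the same request."""
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        solr + "/update?commit=true", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Committing once per batch, rather than once per record, keeps Solr from reopening its searcher after every document.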

You can (temporarily) see the fruits of these labors because all of the records have been associated with the Eric Lease Morgan Foo Bar Library. The result is a list of container-level records with very little additional information.


CRRA Update September 2010

CRRA Update
SEPTEMBER 2010

In this update …
