Monday, October 5, 2009

DSpace, Tomcat 5 and PostgreSQL 8

Okay, DSpace is up and running publicly for the Southwest Collection/Special Collections Library.

Things that I have learned in this process:

1) DSpace can use a Storage Resource Broker from SDSC, the DICE SRB. I was unable to get it working with our network's security settings, but I think it is one of the most elegant solutions available for heterogeneous file storage. I would recommend it to institutions whose domain settings differ from ours.

2) If you are setting up DSpace for the first time, a few things that you absolutely MUST remember:
A) The user that owns Tomcat (Tomcat 5 or Tomcat 6, it doesn't matter) MUST also own the DSpace home directory. If it doesn't, you will not be able to upload to the DSpace assetstore. I've read this in one or two forums or blogs, but I'm restating it here for the hapless DSpace installer who doesn't know it. The most helpful resource for me was the Gentoo installation guide by the nice man from the University of Texas, and I have found that much of what he describes as undocumented in the DSpace documentation is dead on not only for Gentoo but for other OSes as well, including Red Hat Enterprise Linux.

B) The instructions at http://wiki.dspace.org/index.php/SymlinkDSpace were useful, but follow them precisely, without forgetting any slashes or characters, or your installation won't work at all. (It will be very frustrating and irritating, and you'll nearly start pulling your hair out, until you realize you forgot a trailing slash on one of the AJP settings.) Or maybe that's just me.

C) Storage: I would recommend at least 1TB of storage for DSpace, even for the smallest projects. That might seem like a lot, but with storage getting cheaper all the time, 1TB is a drop in the ocean. No matter how small your digitization project is, you're going to use up this space over time.
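For point 2A above, here is a small Python sketch that lists anything under a directory that is not owned by a given user, which is a quick way to spot the ownership problem before you try uploading. It assumes a Unix-like system (it relies on the `pwd` module), and the `files_not_owned_by` helper is hypothetical, written just for this post.

```python
import os
import pwd


def files_not_owned_by(root: str, user: str) -> list[str]:
    """Return paths under `root` (including `root` itself) not owned by `user`.

    Unix-only: uses the `pwd` module to turn a user name into a UID.
    Raises KeyError if `user` does not exist on this system.
    """
    uid = pwd.getpwnam(user).pw_uid
    mismatched = []
    # Check the top-level directory itself first.
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    # Then walk everything beneath it (os.walk does not follow symlinks).
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat checks a symlink's own owner, not its target's.
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched
```

For example, `files_not_owned_by("/dspace", "tomcat")` would list stray files if your DSpace home were `/dspace` and Tomcat ran as `tomcat` (both hypothetical; substitute your own). If anything shows up, running `chown -R` as root on the DSpace home, with your Tomcat user, is the usual fix.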
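To put the 1TB recommendation in point 2C in perspective, here is some back-of-the-envelope arithmetic in Python. The scan settings are illustrative assumptions, not our actual specifications: a letter-size page (8.5 × 11 inches) scanned at 600 dpi in 24-bit color and stored uncompressed, ignoring file headers.

```python
def uncompressed_image_bytes(width_in: float, height_in: float, dpi: int,
                             bytes_per_pixel: int = 3) -> int:
    """Size of an uncompressed raster scan (no file headers, no compression)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def pages_per_terabyte(page_bytes: int, terabyte: int = 10**12) -> int:
    """How many such page scans fit in one (decimal) terabyte."""
    return terabyte // page_bytes


# Hypothetical example: letter-size page, 600 dpi, 24-bit color (3 bytes/pixel).
page = uncompressed_image_bytes(8.5, 11, 600)
print(page)                      # 100980000
print(pages_per_terabyte(page))  # 9902
```

Under those assumptions each master is about 101 MB, so 1TB holds only about 9,900 pages, before counting thumbnails, access copies, or backups.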

3) Other questions now being asked at my institution include:
A) Should we watermark? In my opinion, rather than watermarking, we can provide a low-resolution version of any image we're worried about, which discourages downloading and printing. Watermarking is an old technique that is easily circumvented with programs like Photoshop, so I don't really see the point of it. If you are worried about an image being downloaded or printed, set its resolution low enough that it looks fine online and can still be zoomed in on, but looks terrible when printed or enlarged.

B) Who should create the metadata? At the Southwest Collection, we're discussing having the cataloging department review all metadata before items are made publicly available. This would give us a controlled vocabulary and institution-wide priorities for access points and terms.

C) What should be OCRed? We have a lot of older materials (pre-1800) that use special characters and fonts, which will make OCR very difficult. So how do we make these collections accessible to people with low vision, who either use screen readers or have to zoom in very closely? Do we make transcripts of these items, and if so, how long will that take? Or do we OCR the items as well as we can, and then have student assistants clean up any OCR errors?
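On the watermarking question (3A), the case for low-resolution access copies comes down to simple arithmetic: printed size is pixel dimensions divided by print resolution. The Python sketch below uses a hypothetical 800 × 600 pixel access copy and a 300 dpi print target; 300 dpi is a common rule of thumb for good-quality prints, not a figure from our own policy.

```python
def print_size_inches(width_px: int, height_px: int, dpi: float) -> tuple[float, float]:
    """Printed dimensions (inches) of an image output at `dpi` pixels per inch."""
    return width_px / dpi, height_px / dpi


def effective_dpi(width_px: int, print_width_in: float) -> float:
    """Pixels per inch left if the image is stretched to `print_width_in` inches wide."""
    return width_px / print_width_in


# Hypothetical 800 x 600 pixel access copy.
w, h = 800, 600
pw, ph = print_size_inches(w, h, 300)
print(f"Sharp print at 300 dpi: {pw:.2f} x {ph:.2f} inches")
print(f"Stretched to 8 inches wide: {effective_dpi(w, 8.0):.0f} dpi")
```

For this hypothetical image, a sharp 300 dpi print is only about 2.67 × 2 inches, and stretching it to 8 inches wide leaves just 100 pixels per inch, which is why it can look fine on screen but poor on paper.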

That's all for now.
