Monday, October 12, 2009

What course will digitization take?

I received this over the Digital Medievalist listserv:

----------------------------------------------------------
WORKSHOP: Host your texts on Google in one day

The Center For Hellenic Studies will conduct a one-day workshop at the Center’s Washington, D.C., campus, on Monday, Jan. 11, 2010, with the subject: “Host your texts on Google in one day”. Bring one or more XML texts to the workshop in the morning, and leave in the afternoon with a running Google installation of Canonical Text Services serving your texts to the internet (http://chs75.chs.harvard.edu/projects/diginc/techpub/cts).

For more information, including how to apply, please see http://chs75.harvard.edu/CTSWorkshop.html.

Feel free to forward this announcement to anyone who might be interested.

Posted by: Roberto Rosselli Del Turco (rosselli at ling dot unipi dot it)

URL: http://digitalmedievalist.wordpress.com/2009/10/12/workshop-host-your-texts-on-google-in-one-day/



------------------------------------------

I'm afraid, good or bad, that this is the path that digitization is going to take over the next ten years. Here's the problem:
1) Archives and Libraries have something that Google needs: artifacts, whether those are books, photographs, etc.
2) Google has what Archives and Libraries don't have: Money. Lots and lots of money.

Google wants the materials to digitize them; Libraries and Archives need the technological foundation (money, hardware, human labor) to digitize their materials. It's a very simple solution for Google to dangle a pile of money and resources in the faces of Archives or Libraries to say, "Look, we've got what you can't possibly get in these hard times."

And researchers simply want to be able to access materials. In reality they couldn't care less who has digitized them or who has made them available online. All that matters to researchers is that the materials exist in the digital environment, and that they are easy to access.

Although I'm in charge of digitization in an archive, I have to admit, Google has a lot going for it in this capitalist economy where archives and libraries don't generate a lot of revenue.

Wednesday, October 7, 2009

Wanted: Someone who knows how to build, program, and maintain a server; all other things unspecified

I recently had a conversation with a friend who was discussing a vacancy in her office. She mentioned that everyone and his/her dog has applied for this vacancy, and even singled out one person who had applied with no relevant degrees, experience, or--most important--pertinent skills for the job. (He'd written on his application something along the lines of: the work he does in his own field is so good that he'd be great in this vacant position, too.)

Later that same day, my dad and I were talking about what a hard time my unemployed lawyer-sibling is having finding a job out in Dallas. My dad jokingly said to me, "I worry about the day you say you'll have to move back in with us because you've been laid off and couldn't find another job." My comforting response to him was that I'm in a field similar to the undertaker's--someone will always be needed to plug that computer in, at minimum.

It got me thinking about my own office and my own job. I run a group of servers for a living; in addition, I design digital projects for presentation on the web. What I find hilarious about these exchanges is that most people only half understand what the IT person does. Even more so than the desktop support IT person, the server IT person gets mixed reactions--some people think that the server manager is being lazy because s/he is not the one to drop off a new keyboard; they often don't realize that the server person is sitting in his/her office with the door closed because s/he's trying to finish bringing something online from the server, or trying to change something that is currently online.

I think it helps my position that I'm also working on a degree in Technical Communication and Rhetoric; I'm a fairly good communicator anyway, and I regularly try to explain what I'm working on to people at different skill levels. I am also aware that one of the risks of maintaining a company's servers is always going to be criticism for "being lazy," i.e., criticism for not regularly talking to people about what I'm doing.

But I don't think anyone can easily look at what I do in my job and say, "I could easily do that because the work I do in my profession is so good." Because, though it makes me a little sad to say this, not very many people even remotely understand what a person who maintains servers actually does.

Monday, October 5, 2009

DSpace, Tomcat5 and Postgresql8

Okay, DSpace is up and running publicly for the Southwest Collection/Special Collections Library.

Things that I have learned in this process:

1) DSpace uses a Storage Resource Broker, hosted by SDSC, called the DICE SRB. I was unable to get it working with our network's security settings, but I think it is the most elegant solution available for heterogeneous file storage. I would recommend it to users whose domain settings differ from the ones we are running.

2) If you are setting up DSpace for the first time, a few things that you absolutely MUST remember:
A) The Tomcat owner (whether it's Tomcat5 or Tomcat6 doesn't matter) MUST also be set as the DSpace home directory owner. I've read this in one or two forums or blogs, but I'm restating it here for the hapless DSpace installer who is unaware of this fact. I got the most help from the Gentoo installation instructions by the nice man from the University of Texas, and I have found that a lot of what he describes as undocumented in the DSpace documentation is not only accurate for Gentoo, but also dead on for other OSes, including Red Hat Enterprise. (If you don't have the Tomcat owner set to own DSpace, you will not be able to upload to the assetstore on DSpace.)

B) These instructions were useful: http://wiki.dspace.org/index.php/SymlinkDSpace But follow them precisely, without forgetting any slashes or characters, or your installation won't work at all. (And it will be very frustrating and irritating to you...and you'll nearly start to pull your hair out...until you realize how stupid you were to forget an ending slash on one of the AJPs.) Or maybe that's just me.

C) Storage: I would recommend at least 1TB of storage for DSpace users, even with the most minimal projects; while this might seem like a lot, in this day and age of cheapening storage, 1TB is a teaspoon in an ocean. No matter how small your digitization project might be, you're going to wind up utilizing this space over time.
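
The ownership rule in 2A can be checked before you ever try an upload. What follows is a minimal Python sketch, not anything that ships with DSpace: the helper names are my own, and the path /dspace and the account name tomcat in the usage note are assumptions that you should replace with your own install directory and Tomcat user.

```python
import os
import pwd


def uid_for(username):
    """Look up the numeric uid for a local account name (Unix only)."""
    return pwd.getpwnam(username).pw_uid


def paths_not_owned_by(root, uid):
    """Return every path under root (root included) whose owner is not uid."""
    mismatched = []
    # Check the top-level directory itself first.
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    # Then walk everything beneath it, without following symlinks.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched
```

Calling paths_not_owned_by("/dspace", uid_for("tomcat")) should return an empty list on a correctly configured install; anything it does return is a candidate for the upload failure described in 2A.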

3) Other questions now being asked at my institution include:
A) Should we watermark? In my opinion, we can provide a low-res version of an image that we're worried about, to discourage download and printing. I think watermarking is an old technology that is easily circumvented by programs like Photoshop. There really isn't a point to watermarking, but if you are worried about an image being downloaded or printed, then you could set the image resolution low enough so that it looks fine online, can be zoomed in on, but looks terrible when printed or resized.

B) Who should do metadata? At the Southwest Collection, we're discussing the possibility of clearing all metadata with the cataloging department prior to allowing items to be made publicly available. This will provide controlled vocabulary and institutional prioritization of access points and terms.

C) What should be OCRed? We have a lot of older materials (pre-1800) that use special characters and fonts. OCR will be very difficult, so the question arises of how to make these collections accessible to people with low vision, who either use screen readers or have to zoom in very closely. Do we make transcripts of these items--and if so, how long will that take? Or do we let the items appear OCRed as much as possible, and then have student assistants clean up any OCR problems?
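
For the low-res approach in 3A, a little arithmetic shows how pixel dimensions translate into print quality. This is only a sketch: the function names are mine, and the 300 dpi default is a common rule of thumb for a good print, not an institutional standard.

```python
def print_size_inches(width_px, height_px, dpi=300):
    """Largest print size, in inches, that an image supports at the given dpi."""
    return (width_px / dpi, height_px / dpi)


def effective_dpi(width_px, print_width_in):
    """Dots per inch you actually get when the image is printed at a given width."""
    return width_px / print_width_in
```

Under these assumptions, an image 1200 pixels wide prints well at only about 4 inches across, and a 1024-pixel-wide image stretched to 8 inches drops to 128 dpi, which is where it starts to look terrible on paper.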

That's all for now.