Tuesday, February 24, 2009

Bandwidth, Money, and Other Countries: We are so spoiled

Link: http://news.bbc.co.uk/2/hi/technology/7907529.stm Article: "Lord Carter Defends Digital Plan"

"'The truth is that not a single media company knows what its model will be in ten year's time,' he said."

This article provides an interesting, though not as in-depth as I'd like, look at the state of Internet availability in the UK, as well as the state of record digitization there.

I actually had to look again at the article's publish date because I was surprised by a few statements, such as: "Lord Carter also used the meeting to criticise what he described as a 'superficial misunderstanding' of how the UK would roll out next-generation networks (offering speeds of up to 100Mbps)."

I have traveled the UK, mostly staying in London, though also venturing over to Dublin for a bit. I'm not surprised that so many people are left out of the loop access-wise--things are so dang expensive there--but I am quite surprised that so few connections come anywhere near 100 Mbps, and that there are still a number of "notspots," to quote the article.

Friends in South Africa tell me that they are charged for a fixed allotment of megabytes per month. The folks I know from there are computer programmers, so they run out of bandwidth by about the middle of the third week of the month.

Friday, February 20, 2009

How to Build a Good Digitization Team

I now have a fully functional DSpace test server that will be running a small dark archive. We are going to keep this DSpace instance private and contribute to the dark archive until we know the archive is ready for public consumption. (Of course, "ready" also means resolving copyright issues and license agreements for the items that have been digitized and placed on this system.)

Dr. Diane Warner--and her graduate student assistant--have been tremendously helpful in preparing an exemplary digital collection for the test server.

These two people don't seem to understand what a luxury it has been for me to work with them over the past three weeks as we have prepared the collection. It is an IT person's dream job to work on a team with colleagues who demonstrate high competence, sharp intellect, good imagination, and strong logic skills. I can honestly say that both Dr. Warner and her grad assistant repeatedly have demonstrated these skills, as well as an ability to surpass my own knowledge concerning the functions and capabilities of DSpace.

My recommendation this week for anyone considering building a digitization project on open source software: build your first project with a brilliant, enthusiastic team, and establish a good team dynamic from the beginning. Although I have now been thoroughly spoiled by working with such a great team--I regret that I most likely won't have the opportunity to work with them again--I have also learned far more than I would have had I started with a team I had to lead by the nose.

Based upon what I have learned over the past few weeks I'd recommend composing a digitization team of:

1) A strong, smart faculty member (or members): this person proposes the digitization project and is your subject matter expert. (I was fortunate in that Dr. Warner brings to the table an organized thought process and expertise in cataloging.)

2) A research assistant with out-of-the-box ingenuity who takes initiative without prompting.

3) A flexible server manager who can adapt to changing project demands: I'm striving for this, and I am trying to listen better to the needs of my users without sacrificing efficiency.

4) A desktop support person who runs interference when things have to get done on the server and only the server admin can do them.


If you can complete your first digital project with a good team, you'll have more time to focus on the software, digitized content, and metadata than on team communication problems. I wish I could show the project here, just to be able to brag, "Look what we did!" I told the folks I worked with that getting digitization running here was going to be an uphill battle, but they made it easier by building steps into the hill.


My next post might be about the role of copyright in digital collections. Over the past few weeks, I've been questioning the scope of my position here at the Southwest Collection. Although I don't think I should be the advocate or approver of all things copyright, I do believe that, as digital initiatives coordinator, I should be concerned with communicating to all of my colleagues the importance of making a good-faith effort to locate permissions for digitization. (Dr. Warner and I had a long discussion about this, too.)

Sunday, February 15, 2009

Do We Need a New Internet? (Article)

Article: "Do We Need a New Internet?"

"The idea is to build a new Internet with improved security and the capabilities to support a new generation of not-yet-invented Internet applications, as well as to do some things the current Internet does poorly — such as supporting mobile users."

http://www.nytimes.com/2009/02/15/weekinreview/15markoff.html?pagewanted=2&_r=1

Stanford engineers are looking at building a more secure Internet. One security specialist quoted in the article describes the current Internet as a Pearl Harbor waiting for the planes to land.

I wonder what the future of open source software would be on an Internet built by Stanford; of course, that's a mildly cynical question. (Look, this is a job for a technical communicator!)

To quote the article again with something that most interests me as a dissertation topic:

"Proving identity is likely to remain remarkably difficult in a world where it is trivial to take over someone’s computer from half a world away and operate it as your own. As long as that remains true, building a completely trustable system will remain virtually impossible."

What will force us into Web 3.0 may actually be Internet security; Dell already ships laptops with thumbprint scanners--and anyone who watches "Burn Notice" knows that THAT isn't very hard to get around. But if we're going to imagine an absolutely secure Internet with some sort of driver's-license-style verification, there's going to have to be a retinal scan database somewhere, perhaps a sub-dermal DNA database, and an agreement that Someone (capital "S") monitors that database to protect security interests. Of course, a centralized database presents a two-part problem: 1) Who monitors it, and 2) How much is it going to cost?

And really, now we're getting into my favorite consideration: can the Capitalist game that recognizes money as the points system viably survive when an individual is required to give physical evidence to prove that it is he who plans to buy that caparisoned wooden elephant figurine from, say, Thailand? And, in return, does that individual really need the Someone who Monitors the DNA Database to know he also has Addison's disease, via the DNA he supplied as his Internet driver's license?

I know this sounds like sci-fi, but really, folks, don't be naive. Ten years ago we could argue that this dramatic shift would not happen, that we had nothing to worry about from an unstable electronic identity narrative, but that was during Web 1.0--you couldn't make a web page where people left footprints. Now, electronic footprints are as easy to follow as if someone were collecting the hair falling from your head as you walked down the street.

Wednesday, February 11, 2009

Article

Link: http://www.infoworld.com/article/09/02/09/06FE-shrinking-operating-system_2.html Article: "The Incredible Shrinking Operating System"

This is so true, and so pertinent to what we've discussed in the Online Publishing course I'm taking here at Texas Tech University.

Monday, February 9, 2009

I didn't have to eat my words--what a pleasant surprise!

I think last week I said something about time constraints and how an open source server running DSpace can be reinstalled in three hours' time, given no interruptions. Last week, actually the very day that I posted that, I did something to crash the test server running DSpace. Although the change was something I'd planned to make, I was hoping to let the server run a little longer before breaking it.

I decided to test my three-hour theory, and, as it turns out, when I block everyone from speaking to me, and when I let my desktop support person, Jason Price (a conscientious, invaluable colleague), run interference against distractions, I was indeed able to keep my three-hour promise.

I decided that instead of messing around with all the crazy PostgreSQL permissions (I chowned where I shouldn't have), I would upgrade to the newest version of DSpace and simply replace the existing data tables. I also had a few items on the prior test server (services, software, etc.) that I really didn't need. Instead of using a fine-toothed comb to clean up the things I didn't like, I realized it was a good opportunity to format the server, reinstall the newest release of DSpace (something I had been putting off, but once I saw what crashed the system, I realized I couldn't keep using the older release), and get everything running well.
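
Next time, I will script a minimal backup before a rebuild like this one. Here is a sketch of what that could look like, assuming the default /dspace install directory and a database named "dspace"; adjust the paths and names to your own layout.

pg_dump -U dspace dspace > /backup/dspace-db-$(date +%Y%m%d).sql     # metadata, users, workflow state
tar czf /backup/dspace-assetstore-$(date +%Y%m%d).tar.gz /dspace/assetstore    # the bitstreams themselves
cp /dspace/config/dspace.cfg /backup/                                # local configuration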

What have I learned, then, since beginning this DSpace project? I have learned a valuable lesson about working with open source software: open source documentation is very poor until you understand how the writers are speaking; in other words, it gives you the bare minimum, without contingencies or good FAQ support for when errors crop up. Once I'd installed DSpace the first time using Carlos Ovalle's instructions (http://sentra.ischool.utexas.edu/~i312co/dspace/), I understood how all the pieces of the puzzle fit together, enough that I could then navigate the official DSpace instructions without confusion. I also realized what I have to change to run DSpace on Gentoo. I'm a little nervous about switching OSes to the Dell-supported Red Hat server, but really, one open source OS isn't all that different from another, at least not at the nuts-and-bolts level. (Insert smirk from the open source audience here.)
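
For anyone wondering what "all the pieces of the puzzle" amounts to, here is the rough shape of the install as I remember it. The package names, paths, and build steps below are illustrative and vary by DSpace release, so treat this as a sketch rather than a recipe.

emerge --ask dev-db/postgresql www-servers/tomcat       # Gentoo package names are illustrative
createuser -U postgres -d -P dspace                     # the database role DSpace expects by default
createdb -U dspace -E UNICODE dspace
# Edit [dspace-source]/dspace/config/dspace.cfg (hostname, db.url, db.password),
# then build and install; this follows the Maven/Ant layout of the 1.5-era releases:
cd dspace-source/dspace && mvn package
cd target/dspace-*-build.dir && ant fresh_install
# Hand the webapps to Tomcat and restart it (the webapp path is illustrative):
cp -r /dspace/webapps/* /var/lib/tomcat-6/webapps/
/etc/init.d/tomcat-6 restart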

Since this is only Monday, I plan to test Archivists' Toolkit alongside this DSpace installation. If you're wondering what I'm referring to, please visit http://www.archiviststoolkit.org/ to find out!


That's all for this week!

Wednesday, February 4, 2009

Digitization: Open Source vs. M$, and Time for Servers

Many people assume that digitization of an archive or records system means scanning things into a computer. If digitization were as simple as someone using the flatbed Epson to scan in the photographic archives of the Texas Tech football team playing their way toward the 1939 Cotton Bowl, then most digitization specialists would be grossly overpaid.

Digitization processes, at a minimum, require three components to be successful. For this example, I will continue with my photograph scenario. First, the scanning process must follow a set of accepted standards so that digital files remain consistent across years and technologies. In other words, these standards provide for digital preservation with accessibility as the primary goal.

Second, a metadata standard must be agreed upon by all project managers. DSpace, the program that I am testing at the Southwest Collection, uses the Dublin Core metadata standard by default. Although Dublin Core can be swapped out for another standard, it is important to note that moving away from the default means a fair bit of extra work on the server backend. A metadata librarian chooses how to represent a digitized item within the metadata system, and this representation, whether in Dublin Core or something else, gives the librarian a great deal of control over how the item will be searched, which digital collections it belongs to, and so on.
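
To make that concrete, here is a rough sketch of what Dublin Core looks like when an item is packaged for DSpace's batch importer (the simple archive format). The field choices, file names, and import command here are illustrative, not our local policy.

mkdir -p item_001
cat > item_001/dublin_core.xml <<'EOF'
<dublin_core>
  <dcvalue element="title" qualifier="none">Texas Tech football team, 1939 Cotton Bowl</dcvalue>
  <dcvalue element="contributor" qualifier="author">Photographer unknown</dcvalue>
  <dcvalue element="date" qualifier="issued">1939</dcvalue>
  <dcvalue element="subject" qualifier="none">Texas Tech University -- Football</dcvalue>
</dublin_core>
EOF
echo "cottonbowl_1939.tif" > item_001/contents        # one line per bitstream to attach
# Then something like: /dspace/bin/import -a -e admin@example.edu -c <collection-handle> -s . -m mapfile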

The third component, and perhaps the most interesting one in a digital project, is good editorial judgment. A digitization project manager, often a faculty member in the archive here at the Southwest Collection, must consider how best to represent a photograph. Do well-taught students using markup determine how the photograph is searched? Does the faculty member guide student assistants in the keywords they are allowed to use to describe an item? If the back of a photograph carries any sort of information, should that be scanned in as well? These judgments must adhere both to standards and to editorial representation. If a university archive is providing a digitized version for research access, the archive must consider what best preserves the state of the information as represented in the physical photograph. If a museum is presenting the same digitized photograph, the museum will have to use different metadata standards to represent the photo as part of a wider collection, for instance in an electronic exhibit format.

Where do DSpace and open source operating systems fit into these questions? Let me start with operating systems: M$ provides easy-to-use, quick-and-dirty server products that rapidly integrate information onto the Internet, onto network shares, into databases, and so forth. The problem, however, is that much of the M$ software is proprietary, expensive, and bulky. The M$ operating systems come prepackaged with hundreds of unnecessary services that really could be disabled, as long as the administrator knows which ones are unnecessary. Additionally, open source software is often difficult to run on M$ platforms because of compatibility issues, and it is usually much easier to run from open source servers.

Open source operating systems provide a free and quick means of manipulating and serving data, as long as the administrator knows what s/he is doing with the software. A lot of people run screaming when they hear “Unix” or “Linux,” and most don’t even know what “FreeBSD” means; it is important for admins interested in Unix or Linux to realize that running an open source operating system gives them an enormous amount of control over what actually runs on that server.
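
As one small illustration of that control, on a Gentoo box you can see exactly which services start at boot and prune the ones you don't need. The service names below are placeholders, not a recommended list.

rc-update show default              # list everything in the default runlevel
rc-update del cupsd default         # drop a service this server will never use
rc-update add postgresql default    # make sure the services the repository needs are there
rc-update add tomcat-6 default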

An open source server administrator who knows what s/he is doing can format, rebuild, and reinstall an operating system in approximately three hours, including reinstalling all necessary software packages--given the opportunity for unwavering attention to the matter at hand. When an open source content management system like Plone crashes on a server administrator, all that is really necessary is a single service restart: /zope/bin/runzope restart, for instance, without having to rely on five other dependencies to perform that command. Another example is how DSpace runs. If a service interruption occurs with DSpace, for instance on my Gentoo test server, all I have to do is /etc/init.d/tomcat-6 restart and /etc/init.d/postgresql restart, and that is that. The most important thing on the server end of running a digitization server, in fact, is watching where your database is stored—keep DSpace on a different disk from your database. (A consideration of spindle placement is important, too, in building a larger digitization server. But more on that later.)
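
For completeness, here is the whole recovery check as I type it on the Gentoo test box. The init script names match the ones above; the log path is illustrative and may differ on your install.

/etc/init.d/postgresql restart
/etc/init.d/tomcat-6 restart
/etc/init.d/postgresql status       # confirm both services actually came back up
/etc/init.d/tomcat-6 status
tail -n 50 /var/log/tomcat-6/catalina.out     # if the web UI still fails, the Tomcat log usually says why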

As I test DSpace, I need to learn more about PostgreSQL, which brings up my next point. A serious danger in IT is allowing oneself a very wide breadth of knowledge without a much-needed depth. The job itself demands adaptability in learning new software, new hardware, and even new user learning styles; unfortunately, the greatest limitation on any server administrator is time. Between constant interruption and poor funding, new software and hardware that should be put into production as soon as it is purchased or built often sits in a corner, rotting into obsolescence, simply because the administrator has had no time to build, test, and secure the new product, let alone train users on it. This, at least, has been my observation of friends in IT in academic settings, and it has certainly been my own experience. Unless one can focus without interruption, depth of knowledge gets traded away for flexibility of learning.

Test servers offer admins the opportunity to understand the nuts and bolts of what goes on. Also, in testing with Linux or Unix (like FreeBSD), I highly recommend that you avoid using the GUI entirely. If you are going to learn the operating system, learn the command line first. Otherwise, you’re shortchanging yourself on a good opportunity to know exactly how things function.