Sunday, December 6, 2009

It's hard not to love governmental control prior to attacks

Link via Slashdot; you can read the discussion there. This reminds me a little of the Russia-Georgia war, in which Russia cut off network access in Georgia just prior to the invasion.

It's funny; I base my career on Internet access, and I always wonder what would happen if I wound up no longer having access to the Internet. Since I'm also a good editor and a subscriber to pragmatism, I'd no doubt earn money somehow, but then I wonder what happens to people whose trainable or mercenary skills have no relevance outside of Internet work.

Monday, October 12, 2009

What course will digitization take?

I received this over the Digital Medievalist listserv:

----------------------------------------------------------
WORKSHOP: Host your texts on Google in one day

The Center For Hellenic Studies will conduct a one-day workshop at the Center’s Washington, D.C., campus, on Monday, Jan. 11, 2010, with the subject: “Host your texts on Google in one day”. Bring one or more XML texts to the workshop in the morning, and leave in the afternoon with a running Google installation of Canonical Text Services serving your texts to the internet (http://chs75.chs.harvard.edu/projects/diginc/techpub/cts).

For more information, including how to apply, please see http://chs75.harvard.edu/CTSWorkshop.html.

Feel free to forward this announcement to anyone who might be interested.

Posted by: Roberto Rosselli Del Turco (rosselli at ling dot unipi dot it)

URL: http://digitalmedievalist.wordpress.com/2009/10/12/workshop-host-your-texts-on-google-in-one-day/



------------------------------------------

I'm afraid, for good or bad, that this is the path digitization is going to take over the next ten years. Here's the problem:
1) Archives and Libraries have something that Google needs: artifacts, whether those are books, photographs, etc.
2) Google has what Archives and Libraries don't have: Money. Lots and lots of money.

Google wants the materials so it can digitize them; libraries and archives need the technological foundation (money, hardware, human labor) to digitize their materials themselves. It's very simple for Google to dangle a pile of money and resources in front of an archive or library and say, "Look, we've got what you can't possibly get in these hard times."

And researchers simply want to be able to access materials. In reality they couldn't care less who has digitized them or who has made them available online. All that matters to researchers is that the materials exist in the digital environment and that they are easy to access.

While I'm in charge of digitization in an archive, I have to admit, Google has a lot going for it in a capitalist economy where archives and libraries don't generate much revenue.

Wednesday, October 7, 2009

Wanted: Someone who knows how to build, program, and maintain a server; all other things unspecified

I recently had a conversation with a friend who was discussing a vacancy in her office. She mentioned that everyone and his/her dog has applied for this vacancy, and she even highlighted one person who had applied with no relevant degrees, no experience, and--most important--no pertinent skills for the job. (He'd listed on his application something along the lines of: the work he does in his own field is so good that he'd be great in this vacant position, too.)

Later that same day, my dad and I were talking about what a hard time my unemployed lawyer-sibling is having finding a job out in Dallas. My dad jokingly said to me, "I worry about the day you tell us you'll have to move back in because you've been laid off and couldn't find another job." My comforting response was that my field is a bit like the undertaker's--at minimum, there's always going to need to be someone available to plug that computer in.

It got me thinking about my own office and my own job. I run a group of servers for a living; in addition, I design digital projects for presentation on the web. What I find hilarious about these exchanges is that most people only half understand what the IT person does. Even more than the desktop-support IT person, the server IT person is met with a lot of mixed emotions. Some people think the server manager is being lazy because s/he is not the one dropping off a new keyboard; they often don't realize that the server person is sitting in his/her office with the door closed because s/he's trying to finish bringing something online from the server, or trying to change something that is currently online.

I think it helps my position that I'm also working on a degree in Technical Communication and Rhetoric; I'm a fairly good communicator anyway, and I regularly try to explain what I'm working on to audiences of different skill levels. I am also aware that one of the risks of maintaining a company's servers is always going to be criticism for "being lazy," i.e., criticism for not regularly talking to people about what I'm doing.

But I don't think anyone can easily look at what I do in my job and say, "I could easily do that because the work I do in my profession is so good." Because, though it makes me a little sad to say this, not very many people even remotely understand what a person who maintains servers actually does.

Monday, October 5, 2009

DSpace, Tomcat5 and Postgresql8

Okay, DSpace is up and running publicly for the Southwest Collection/Special Collections Library.

Things that I have learned in this process:

1) DSpace can use a Storage Resource Broker, hosted by SDSC, called the DICE SRB. I was unable to get it working with our network's security settings, but I think it is the most elegant solution available for heterogeneous file storage. I would recommend it to users whose domain settings differ from what we are running.

2) If you are setting up DSpace for the first time, a few things that you absolutely MUST remember:
A) The Tomcat owner (whether it's Tomcat 5 or Tomcat 6 doesn't matter) MUST also be set as the owner of the DSpace home directory. I've read this in one or two forums or blogs, but I'm restating it here for the hapless DSpace installer who is unaware of this fact. I got the most help from the Gentoo installation instructions by the nice man from the University of Texas, and I have found that a lot of what he discusses as being undocumented in the DSpace documentation is not only accurate for Gentoo but also dead on for other OSes, including Red Hat Enterprise. (If you don't have the Tomcat owner set to own DSpace, you will not be able to upload to the assetstore on DSpace.)
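Since this is the mistake that bit me, here's a quick sanity check--a minimal Python sketch, with placeholder paths and usernames, that compares the owner of the DSpace home directory against the Tomcat user before you go hunting for assetstore errors:

```python
import os
import pwd

def owner_of(path):
    """Return the username that owns the file or directory at path."""
    return pwd.getpwuid(os.stat(path).st_uid).pw_name

def check_dspace_ownership(dspace_home, tomcat_user):
    """Warn if the DSpace home directory isn't owned by the Tomcat user."""
    actual = owner_of(dspace_home)
    if actual != tomcat_user:
        return f"WARNING: {dspace_home} is owned by {actual}, not {tomcat_user}"
    return "ownership OK"

# Example (adjust the path and user to your installation):
# print(check_dspace_ownership("/dspace", "tomcat"))
```

If the check warns, a `chown -R` of the DSpace home to the Tomcat user (as root) is the usual fix.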

B) These instructions were useful: http://wiki.dspace.org/index.php/SymlinkDSpace. But follow them precisely, without forgetting any slashes or characters, or your installation won't work at all. (And it will be very frustrating and irritating to you...and you'll nearly start to pull your hair out...until you realize how stupid you were to forget an ending slash on one of the AJP connectors.) Or maybe that's just me.

C) Storage: I would recommend at least 1TB of storage for DSpace users, even with the most minimal projects; while this might seem like a lot, in this day and age of cheapening storage, 1TB is a teaspoon in an ocean. No matter how small your digitization project might be, you're going to wind up utilizing this space over time.

3) Other questions now being asked at my institution include:
A) Should we watermark? In my opinion, watermarking is an old technology that is easily circumvented by programs like Photoshop, so there isn't much point to it. If you are worried about an image being downloaded or printed, you can instead provide a low-res version: set the image resolution low enough that it looks fine online and can be zoomed in on, but looks terrible when printed or resized.
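To put rough numbers on that idea (the dpi figures here are the usual rules of thumb, not exact): an image downsampled to screen resolution still fills a browser window nicely, but the same pixels spread across print resolution shrink to a fraction of the original size.

```python
def screen_safe_pixels(width_in, height_in, screen_dpi=72):
    """Pixel dimensions that render acceptably on screen at ~72 dpi."""
    return round(width_in * screen_dpi), round(height_in * screen_dpi)

def printed_size(pixels_w, pixels_h, print_dpi=300):
    """Physical size (in inches) those pixels cover at print resolution."""
    return pixels_w / print_dpi, pixels_h / print_dpi

# An 8x10-inch photograph served at screen resolution:
w, h = screen_safe_pixels(8, 10)   # (576, 720) pixels
print(printed_size(w, h))          # only about 1.9 x 2.4 inches at 300 dpi
```

So the low-res derivative stays useful online while a printout of it comes out either tiny or visibly blocky.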

B) Who should do metadata? At the Southwest Collection, we're discussing the possibility of clearing all metadata with the cataloging department prior to allowing items to be made publicly available. This will provide controlled vocabulary and institutional prioritization of access points and terms.

C) What should be OCRed? We have a lot of older materials (pre-1800) that use special characters and fonts. OCR will be very difficult, so the question arises of how to make these collections accessible to people with low vision, who either use screen readers or have to zoom in very closely. Do we make transcripts of these items--and if so, how long will that take? Or do we OCR the items as well as we can, and then have student assistants clean up any OCR problems?
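Part of that cleanup could be automated. A sketch of the kind of post-processing pass I have in mind--the character table here is illustrative, not exhaustive--replaces archaic glyphs like the long s and typographic ligatures with modern equivalents so that screen readers and full-text search behave sensibly:

```python
# Illustrative mapping of archaic/ligature glyphs to modern equivalents.
ARCHAIC_GLYPHS = {
    "\u017f": "s",   # long s (looks like an f without the crossbar)
    "\ufb00": "ff",  # ff ligature
    "\ufb01": "fi",  # fi ligature
    "\ufb02": "fl",  # fl ligature
}

def normalize_ocr(text):
    """Replace archaic glyphs so screen readers and search behave sensibly."""
    for old, new in ARCHAIC_GLYPHS.items():
        text = text.replace(old, new)
    return text

print(normalize_ocr("Congre\u017fs"))  # -> Congress
```

Student assistants would still need to proofread, but a pass like this knocks out the systematic errors first.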

That's all for now.

Friday, September 18, 2009

DSpace and Storage Resource Broker

My real-life DSpace server is live now, at http://collections.swco.ttu.edu . There is some back-end tweaking I need to do, but we have enough up now that we can make it available to the public.

My next task is to set up SRB (https://libnet.ucsd.edu/nara/) to give me the ability to work with heterogeneous storage devices. This way I will, hopefully, be able to use a fully populated MD3000 with my new server, and then later add more storage drives onto the server. I'm pretty impressed with what I've read about SRB so far, so I really hope it will do what it advertises, particularly on the university system with its wide variety of restrictions.

Tuesday, September 8, 2009

It's been a long while

It's been a long while. I took a hiatus for the summer, focusing strictly on work, contemplating whether I want to continue this blog outside of the assignment for class, which was the original reason I started it.

I've decided I do want to keep it, though I don't know how useful my blog will necessarily be to IT people or to researchers of technical communication or rhetoric.

Then again, I think that a blog's subject matter relies much more on how much the author likes the sound of his/her own fingers typing thoughts than on how many readers s/he receives.

Things going on right now:
My digitization server is up and running DSpace. I'm planning to migrate all current digitized collections from the test server over to the production server this week. It's all running Postgresql, which seems to be sufficient for the requirements of this digitization process. DSpace itself installed fine, no problems, based on the documentation I'd created from the test server. I'm going to provide my documentation to the folks at the Law School here on campus because they are planning to run DSpace, as well.
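For anyone following the same documentation path, the database hookup lives in dspace.cfg. From memory, the Postgresql section looks roughly like this--the database name, username, and password are placeholders, and the exact keys may differ between DSpace versions:

```
# dspace.cfg -- database section (illustrative values only)
db.name = postgres
db.driver = org.postgresql.Driver
db.url = jdbc:postgresql://localhost:5432/dspace
db.username = dspace
db.password = changeme
```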

I'm going to run Archivists' Toolkit on the production server soon. A select group of users and I have been testing it on my end, and we are happy with the results, so it's time to put it up and let my users work with it. Archivists' Toolkit, like a lot of open source software, is tremendously forward-thinking as far as how materials in archives should be managed, and I am very impressed by the tracking options available in this program. It uses MySql, so it's a fairly stable program, and users can connect to it from their computers relatively simply. Using AT, we can track every item that is accessioned, processed, and digitized, and track every step as it happens.

My next test project is to install WordPress on the test server. I have some faculty who are interested in blogging capabilities, and a few other users who would like to explore the possibility of creating a journal of place. I'm going to give them access to work with WordPress, and if it is sufficient for their needs, I'll install it on the production server, too. (Like AT, WordPress uses MySql, so it won't require another database program install.)

Thursday, April 30, 2009

If everything was made by Microsoft

Link: http://www.cracked.com/article_17323_if-everything-was-made-by-microsoft.html

Title: "If Everything Was Made by Microsoft"

Heh. I love this. Had to share it. My favorite is the kitchen safety graphic.

I like posting something about Microsoft on here. I'm not normally one of those anti-M$ Mac users, though sometimes I do show my Mac colors, times like this. I still think Google is the company M$ needs to watch out for, but I do heart Google, despite its potential for evil.

I've resigned myself to the likelihood that I'd also gleefully relish the order imposed by Emperor Palpatine.


Second Link: http://www.thewrap.com/ind-column/2679

Apparently, Google is striving to be on the cutting (bleeding?) edge of Web 3.0. To quote the article, "Under this latest iteration of advanced search, users will be automatically served the kind of news that interests them just by calling up Google’s page. The latest algorithms apply ever more sophisticated filtering – based on search words, user choices, purchases, a whole host of cues – to determine what the reader is looking for without knowing they’re looking for it." Hmm...so Google is using the AP's challenge as an excuse for attempting Web 3.0. Because, clearly, if I do a search on news concerning Oracle, that means I'm always going to want that and only that...And of course, what bothers ME the most is: Two people cannot perform the same search and get the same results.

This will be a research hell.

Tuesday, April 28, 2009

Are good phone manners important?

As I pretended to watch television about a month ago, I overheard my brother and niece get into a fight about phone manners. She, at eleven years old, is certain that she'll never have to use "please" and "thank you" when speaking on the telephone. She made her assertion by screaming at him at the top of her lungs, in the manner that is often accompanied by an eye roll.

Anyway, the more I thought about it, the more I considered how we're sending a generation of kids out into the world with no knowledge of how to conduct even the simplest functions necessary for a telephone call: asking to speak to someone, giving a polite response back, leaving a good message, or even answering the telephone itself.

I decided, for a class project on Captivate, to make a phone manners tutorial. This is fairly basic, intended for kids eleven to twelve years old. When I spoke with parents of my niece's friends, a lot of them expressed interest in using the phone manners tutorial on their own kids, so I got the impression that there is a need for this type of thing.

Just click on the linked title here. It's a Flash video, so your browser might need to download a plugin. Otherwise, it should work fine.

Software Review: Google Chrome

Have you been wondering about this crazy new web browser all the kids are talking about? Are you feeling 'net listlessness because you're bored with the old web browsers? The linked software review might help you decide if Google Chrome is the answer for you.

I decided to write a software review on Google Chrome because a lot of people have asked me why Google would come out with a web browser when there is a plethora of useful browsers out there. (Heck, I wondered the same thing.) After using Chrome for a week straight (without letting myself touch Firefox, which was hell to me!), I came up with a few answers, more questions, and a fair bit of information.

Enjoy!

Tuesday, April 21, 2009

Thursday is Talk Like Shakespeare Day

Linketh thee here: http://www.cnn.com/2009/US/04/21/talk.like.shakespeare/index.html
Title: "Unleash thy inner bard on 'Talk Like Shakespeare Day'"

From CNN: Mayor Daley of Chicago hath proclaimed Thursday, April 23rd, "Talk Like Shakespeare Day," in celebration of the great bard's 445th whelping day.

Methinks this day will be secondarily proclaimed the Day of Meritorious Locution.

Monday, April 20, 2009

Oracle and Sun Microsystems: A Reality that will Benefit DSpace and Archivists' Toolkit

Link: http://www.nytimes.com/2009/04/21/technology/companies/21sun.html?partner=rss&emc=rss
Article title: "Oracle Agrees to Acquire Sun Microsystems"


Okay, it's now a reality that one of the best database companies in the industry is acquiring a company that, yes, sells rocking servers, and yes, offers its Java developer kits to adoring open-source masses. But, much more important to archivists and IT managers who run open source digital access products: the company that makes one of the supported backend database systems (Oracle) for DSpace is now acquiring the company that makes the backend database system (MySQL) for Archivists' Toolkit. WOO-HOO.

This should be huge news to the open source community that uses, programs, and installs archiving software. If you're involved in any sort of digitization project for archives, you can't walk down a sidewalk without tripping and face-planting onto DSpace in the first elevated crack in the concrete and then, two-feet later, skinning your knees as you fall over Archivists' Toolkit.

The biggest roadblock to DSpace's success has historically been that no one could figure out an easy way to move metadata from record-creation software into DSpace. Archivists' Toolkit--record-creation software that tracks acquired items from initial accession into a collection through to who has edited and accessed the digital item throughout its digital life--can build metadata records for archival items. It runs on MySQL, while DSpace runs on Oracle or Postgresql. If someone could find an _easy_ way (because there are ways, but none is simple) to move information between the two programs, digitization would be a thousand times easier for everyone.
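One plausible bridge is DSpace's batch-import layout (the "Simple Archive Format"), in which each item is a directory containing a dublin_core.xml file plus a contents file listing the bitstreams. Getting the records _out_ of Archivists' Toolkit's MySQL tables is the hard, un-simple part; but once the metadata is in hand, building the XML side is mechanical. A sketch in Python--the record below is hypothetical, not an actual AT export:

```python
import xml.etree.ElementTree as ET

def to_dublin_core_xml(metadata):
    """Build a Simple Archive Format dublin_core.xml document from a
    dict mapping (element, qualifier) tuples to values."""
    root = ET.Element("dublin_core")
    for (element, qualifier), value in metadata.items():
        dcvalue = ET.SubElement(root, "dcvalue",
                                element=element, qualifier=qualifier)
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical record, as it might come out of Archivists' Toolkit:
record = {
    ("title", "none"): "Broadside Collection, item 42",
    ("date", "issued"): "1887",
    ("contributor", "author"): "Unknown",
}
print(to_dublin_core_xml(record))
```

Generate one such directory per item and DSpace's batch importer can pull the whole set in at once.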

I suppose this is down the road, but this is great news for archives interested in digitization.

Tuesday, April 14, 2009

How should law enforcement be trained in technology?

Article Title: "Computer Science Student Targeted for Criminal Investigation for Allegedly Sending Email"
Link is: http://www.eff.org/press/archives/2009/04/13

To quote from this article: "Boston - A Boston College computer science student has asked a Massachusetts court to quash an invalid search warrant for his dorm room that resulted in campus police illegally seizing several computers, an iPod, a cell phone, and other technology."

This brings up an important question and hammers home a point I'm repeatedly encountering: How should law enforcement officials be trained to deal with technology? In a past job, I had to call in a specially-trained police officer to look at the computer of a user who had been charged with a very serious crime; this officer was clearly a geek who had been hired onto the police force to do this specific job specialization--computer crime investigation. I live in a medium-sized city, and I have to wonder if there are more officers than that one available now, or if there were more back then.

But either way: if law enforcement is called to investigate a technological crime, the people you really want available to determine whether a crime was committed are white hat hackers or well-learned open source programmers. These are the people who can track the footprints of cyber-criminals. And it's pathetic, in my opinion, that campus, city, and state law enforcement don't get better training. I realize that this one incident doesn't speak for the technological representation in all police departments, but I'm guessing that, with a legal system unable to create laws that really touch Internet transactions and transgressions, there isn't a really active police force that can recognize them, either.

Thursday, April 9, 2009

UK Looks to Mobile Broadband to Achieve Its Goal

This is a follow-up of an article I'd posted in February or March about the 2Mbps initiative in Great Britain. Interestingly (not surprisingly), they're looking toward mobile broadband to help bridge the gap:

"Similarly the 2Mbps pledge wouldn't strictly have to come from land-line services, with wireless (Wi-F) and Mobile Broadband solutions being touted too."

Wednesday, April 8, 2009

"Google warns newspapers not to anger readers"

http://news.bbc.co.uk/2/hi/technology/7988561.stm

We are in the foothills of a burgeoning mountain of a conflict. This is quite fun. I suspect it's time for us to see some laws change around intellectual property, copyright, advertising--general information ownership. The legal system has been unwilling to touch the Internet with a ten-foot pole, but the capitalist system can no longer survive under this kind of laissez-faire attitude. I'm wondering how much the new presidential administration in the U.S. has affected this.

Hm.

I must admit: I'm having a _lot_ of fun with all of this fighting between giants--who wouldn't? It's Rupert Murdoch; it's Google! Two of my favorite multi-billion dollar entities...(well, maybe not in this economy). I commented about that to a friend, who replied, "That's not my idea of fun." It's a fun time to be a geek.

Sigh--pearls to swine, folks.

Tuesday, April 7, 2009

Is Web 2.0 Fraying at the Edges?

Man, between the article I posted yesterday and this one today, I'm really excited to see what legal precedents get set this year:

To quote from the article, ironically, "Neither Mr. Singleton nor a statement released by The A.P. mentioned any adversary by name. But many news executives, including some at The A.P., have voiced concern that their work has become a source of revenue for Google and other sites that can sell search terms or ads on pages that turn up articles."

The article, titled "Associated Press Seeks More Control of Content on Web," discusses how smaller websites are profiting from using large portions of AP articles (or entire articles), while advertising money drawn from views of those articles goes to sources other than the AP. Now, I don't think they have a prayer in court of saying that people aren't allowed to quote their articles online, but maybe they do. Sites that use articles wholesale and reap profit from views, however, will probably have to pay.

And the irony is perpetually in my head that I'm posting these things on a Google-run website.

I _love_ Google. Don't get me wrong. I totally subscribe to the Death Star, daycare and all, but Google is the John Rockefeller, Andrew Carnegie, and J.P. Morgan of this age, and laws need to be adjusted to deal with this behemoth, just as they were changed to deal with the industrialists and bankers of yesteryear.

Monday, April 6, 2009

Uh-oh, Google is being an evil empire again. Villagers, get your pitchforks to beat them back!

Article title, "Google’s Plan for Out-of-Print Books Is Challenged," http://www.nytimes.com/2009/04/04/technology/internet/04books.html?_r=1 , to quote:

"The settlement, 'takes the vast bulk of books that are in research libraries and makes them into a single database that is the property of Google,' said Robert Darnton, head of the Harvard University library system. 'Google will be a monopoly.'"

As far as digitization is concerned, it's amazing how many things one must consider when managing digitization projects in an archive. For instance: who owns this? Google is now trying to gain possession of the digital rights to as many orphaned items as it can. As the digital initiatives coordinator in an archive, I'm a little worried about the impact this will have on institutions that technically have donor agreements for their items (prior to digitization) but that never _actually_ looked into the rights-ownership issue: If we are sloppy about our rights ownership, and if Google looks into it and catches this sloppiness, I have to question whether they could just steal a digitized (or heck, even non-digitized) item from under our noses. (Hence the anxiety revealed by the speakers quoted in the article, as well.)

I might have to list, "Talk to an attorney" at the top of my digitization checklist soon.

Monday, March 30, 2009

This academic political controversy brought to you by DSpace

Article title, "MIT to make all faculty publications open access"

To quote, "If there were any doubt that open access publishing was setting off a bit of a power struggle, a decision made last week by the MIT faculty should put it to rest. Although most commercial academic publishers require that the authors of the works they publish sign all copyrights over to the journal, Congress recently mandated that all researchers funded by the National Institutes of Health retain the right to freely distribute their works one year after publication (several foundations have similar requirements). Since then, some publishers started fighting the trend, and a few members of Congress are reconsidering the mandate. Now, in a move that will undoubtedly redraw the battle lines, the faculty of MIT have unanimously voted to make any publications they produce open access."

and

"The faculty will have to prepare an appropriately formatted copy of their works to the provost for hosting. MIT plans to place them on its DSpace system, a content hosting system it developed with HP and distributes under a BSD license."

This is actually huge news, particularly for academic researchers AND for users and admins of DSpace. I'm hoping this will mean that a lot more funding gets kicked toward DSpace, Inc. Granted, MIT is the institution that built DSpace, but this is the core reason it was created. I'm really eager to see what the future will bring in regard to this.

Friday, March 13, 2009

This is just cool

And this link: gopher://gopher.std.com/11/The%20Online%20Book%20Initiative

This is just cool. Although it's fun to consider the potential implications of Gopher having taken over, it would be sheer speculation!

Tuesday, March 10, 2009

DSpace, Part Dos

Now that I've let the DSpace project sit for a week and a half, my thoughts about using this program as an archival digitization content management system have distilled.

First, let me make a comment about another program that is out there. One program, ContentDM, carries an exorbitant annual price even though it is a fairly good program. In paying that licensing price, the user receives excellent support from OCLC, the Online Computer Library Center, to run ContentDM. The drawback of ContentDM for archival digitization management, however, is that it was designed specifically for libraries: libraries do not host unique materials, their items are offered for check-out, and, most obviously, they mainly stock books and journals. So ContentDM is geared toward library materials, and researchers looking at ContentDM sites will be using different access points through the metadata than researchers seeking archival materials will.

Dublin Core: Dublin Core is an internet standard, and a cataloger worth his/her salt will be able to create the access points necessary to bring researchers to the collection. Therefore, the most important aspect of creating an archival content management system is to create access points that state plainly where materials are--if something is in an archive, it is rare or unique--so you really need your researchers following your trail of breadcrumbs.

DSpace itself: I've become increasingly fond of DSpace. I wish I could post my test website here, but unfortunately, this is not possible. Perhaps I'll post a screenshot eventually. As I said earlier, I'd originally tried it on FreeBSD, and that was just a little too unwieldy for me. Gentoo proved surprisingly stress-free to run, and Tomcat on this box has been pretty stable. (I'd be curious to hear about others' experience with Tomcat. Most of the discussion I've read about Tomcat relates to DSpace, and typically the errors people are having are standard administrator errors--errors that I ran into myself, too, until I realized what I was doing.)

So once everything was built, it took me about an hour to create graphics to brand our DSpace test box--make it look like it belonged to the TTU SWCO. The file formats DSpace will display are many and varied. I posted a music collection (mp3) from our archive, as well as a broadside collection (thank you again, Dr. W). Finally, once things were in DSpace, I indexed everything to be searchable, and things have continued running smoothly. I've not had to reboot the system or anything.

Things DSpace is not: It is not a typical content management system. You can't add a wiki or a forum to let users comment on your content. (Though it would not surprise me at all if there were modules for that. It is, after all, open source.) I question the accessibility of DSpace for users with disabilities; I should run some tests to make sure that DSpace properly tags everything for these users. DSpace has a good variety of bin files to execute the most common tasks (backup, index, create users, etc.); I haven't had any problem with those. DSpace can use either Oracle or Postgresql. I'm running this version on Postgresql because of the small sample of material and the speed at which Postgresql was configurable.

If the collection decides to work with DSpace, things I will have to consider/get:

-Server: Probably just a PE RedHat
-Server room: In the process of cleaning out an old storage room. This entails everything from throwing garbage away to removing building floor and ceiling tiles to surplussing old equipment.
-Building support: Once people understand that digitization will happen HERE (x marks the spot), Jason, I, and a few other folks will provide training and recommendations for tactical support on the metadata creation side.

This is essential for a simple reason: no archival content management system is easy. I think some folks expect digitization to mean "scan this in," but in reality, a lot goes into it, not the least of which is how to make a graphic (or other file) searchable--how to index it. DSpace (and every other digitization program) provides us with this opportunity. It's important for everyone committed to using DSpace (whether at my office or elsewhere) to understand that DSpace isn't easy, and that it's going to take careful planning. As I explained to one of my bosses, I can't in good conscience recommend anything without strong planning first.

And, finally, the other thing I'll be doing this week is installing Archivists' Toolkit onto the server to be used with DSpace.

Friday, March 6, 2009

Government, Powered by Google


Okay, two links to share before I go into my diatribe:

http://news.slashdot.org/article.pl?sid=09/03/06/1326247&from=rss "America's New CIO Loves Google."

and

http://www.slideshare.net/domainlabs/building-more-transparent-effective-government-presentation "Building More Transparent and Effective Government: The Case Study of Washington D.C."

Okay, for one thing, any case study that uses the word "transparent" should automatically seem suspect. Transparency is the word "experts" use to make their audience think that all the information they need has been revealed when, in reality, the reverse is the truth.

Second: I am terrified by the slide that shows the "Business/Gov't Tech Satisfaction" as being low UNTIL it has been powered by Google. When you're an "expert," and you are affiliated with a company AND have been asked by the government to research a solution about something--well, no flipping fig newton, of COURSE you're going to say that the company with which you're affiliated is the solution for fixing all the problems in the government. (And anyone who reads through this with an intelligent eye will recognize these arguments/anxieties.)

Implications: What are the implications for the American economic system of allowing a private corporation to be in charge of the digitization of America's records? I'm thinking about examples we've seen, past and present, of companies to which the American government (read: the American economy) has been too intimately tied. One example is Ford Motor Company; another is Halliburton. I think of these two for different reasons, though I'd argue that each had multiple impacts.

What do we have to plan for electronic records management at a state university level, for instance, if the federal government is creating a digitization and ERM model through Google? Since I work at the archive for TTU campus, I recognize that this is a weighty question for all archivists to consider. Should Google be allowed to plug itself into all governmental archival practices? (And, another important question: has it already?)

I'm not trying to say that Google _IS_ the evil empire--though in some ways it is. I think the Google CEOs have demonstrated a lot of foresight in their digitization business model--and of course they're getting sued, but why shouldn't they be?--but I also feel a lot of anxiety for obvious reasons. Google might BE the new Ford Motor Company: it is demonstrating incredible foresight; it is providing innovative solutions to problems that are only going to get worse without at least a modicum of remediation (and really, A LOT worse, with a lot more than a modicum needed); and it has demonstrated the ability to deliver good solutions.

Even still, I can't stop the internal shudder when I see that "Business/Gov't Tech User Satisfaction: Powered by Google." And really, I'm using Blogspot, which also is . . . Powered by Google.

Tuesday, February 24, 2009

Bandwidth, Money, and Other Countries: We are so spoiled

Link: http://news.bbc.co.uk/2/hi/technology/7907529.stm Article: "Lord Carter Defends Digital Plan"

"'The truth is that not a single media company knows what its model will be in ten year's time,' he said."

This article provides a very interesting, though not as in-depth as I'd like, look at the state of Internet availability in the UK, as well as the state of record digitization in the UK.

I actually had to look again at the article's publish date because I was surprised by a few statements, such as: "Lord Carter also used the meeting to criticise what he described as a 'superficial misunderstanding' of how the UK would roll out next-generation networks (offering speeds of up to 100Mbps)."

I have traveled around the UK, mostly staying in London, though also venturing to Dublin for a bit. Though I'm not surprised that so many people are left out of the loop access-wise--things are so dang expensive there--I am quite surprised that so few connections reach even 100Mbps, and that there are still a number of "notspots," to quote the article.

Friends in South Africa tell me that they are charged by megabytes per month. The folks I know from there are computer programmers, so they run out of bandwidth by about the middle of the third week of the month.

Friday, February 20, 2009

How to Build a Good Digitization Team

I now have a fully-functional DSpace test server that will be running a small dark archive. We are going to keep this DSpace instance private and contribute to the dark archive until we reach the point at which we know the archive is ready for public consumption. (Of course, "ready" also covers copyright issues and license agreements for the items that have been digitized and placed on this system.)

Dr. Diane Warner--and her graduate student assistant--have been tremendously helpful in preparing an exemplary digital collection for the test server.

These two people don't seem to understand what a luxury it has been for me to work with them over the past three weeks as we have prepared the collection. It is an IT person's dream job to work on a team with colleagues who demonstrate high competence, sharp intellect, good imagination, and strong logic skills. I can honestly say that both Dr. Warner and her grad assistant repeatedly have demonstrated these skills, as well as an ability to surpass my own knowledge concerning the functions and capabilities of DSpace.

My recommendation this week for someone who is considering building a digitization project on open source software: Build your first project with a brilliant, enthusiastic team, and establish a good team dynamic from the beginning. Although I have now been completely spoiled from having worked with such a great team--I regret that I most likely won't have the opportunity to work with them again--I have also learned so much more than I would have had I started with a team that I had to lead by the nose from the beginning.

Based upon what I have learned over the past few weeks I'd recommend composing a digitization team of:

1) Strong, smart faculty member(s): this person proposes the digitization project and is your subject matter expert. (I was fortunate in that Dr. Warner brings to the table an organized thought-process and expertise with cataloging.)

2) A research assistant who demonstrates out-of-the-box ingenuity and can take initiative without prompting.

3) Flexible server manager who can adapt to changing project demands: I'm striving for this, and I am striving to listen better to the needs of my users without diminishing efficiency.

4) A desktop support person who runs interference when things have to get done on the server that only the server admin can do.


If you can get your first digital project completed with a good team, you'll have more time to focus on the software, digitized content, and metadata than on team communication problems. I wish I could show the project here, just to be able to brag, "Look what we did!" I told the folks I worked with that getting digitization running here was going to be an uphill battle, but they made it all easier by building steps into the hill.


My next post might be about the role of copyright in digital collections. Over the past few weeks, I've been questioning the scope of my position here at the Southwest Collection. Although I don't think I should be the advocate or approver of all things copyright, I do believe that, as digital initiatives coordinator, I should be concerned with communicating to all of my colleagues the importance of good faith effort in locating permission for digitization. (Dr. Warner and I had a long discussion about this, too.)

Sunday, February 15, 2009

Do We Need a New Internet? (Article)

Article: "Do We Need a New Internet?"

"The idea is to build a new Internet with improved security and the capabilities to support a new generation of not-yet-invented Internet applications, as well as to do some things the current Internet does poorly — such as supporting mobile users."

http://www.nytimes.com/2009/02/15/weekinreview/15markoff.html?pagewanted=2&_r=1

Stanford engineers are looking at building a more secure Internet. One security specialist quoted in the article discusses how the current Internet is Pearl Harbor waiting for the planes to land.

I wonder what the future of open source software would be on an Internet built by Stanford; of course my previous statement is mildly cynical. (Look, this is a job for a technical communicator!)

To quote the article again with something that most interests me as a dissertation topic:

"Proving identity is likely to remain remarkably difficult in a world where it is trivial to take over someone’s computer from half a world away and operate it as your own. As long as that remains true, building a completely trustable system will remain virtually impossible."

What will force us into Web 3.0 may actually be Internet security; Dell already ships laptops with thumbprint scanners--and anyone who watches "Burn Notice" knows that THAT isn't very hard to get around. But if we're going to imagine an absolutely secure Internet with some sort of driver's license verification, there's going to have to be a retinal scan database somewhere, perhaps a sub-dermal DNA database, and an agreement that Someone (capital "S") monitors that database to protect security interests. Of course there is a two-part problem with a centralized database: 1) who monitors it, and 2) how much is it going to cost?

And really, now we're getting into my favorite consideration: can the Capitalist game that recognizes money as the points system viably survive when an individual is required to give physical evidence to prove that it is he who plans to buy that caparisoned wooden elephant figurine from, say, Thailand? And, in return, does that individual really need the Someone who Monitors the DNA Database to know he also has Addison's disease, via the DNA he supplied as his Internet driver's license?

I know this sounds sci-fi, but really, folks, don't be naive. Ten years ago we could argue that this dramatic shift would not happen, that we have nothing to worry about with unstable electronic identity narrative, but that was during Web 1.0--you couldn't make a web page where people left footprints. Now, electronic footprints are as easy to follow as if someone were collecting hair falling out of your head as you walked down a street.

Wednesday, February 11, 2009

Article

http://www.infoworld.com/article/09/02/09/06FE-shrinking-operating-system_2.html

"The Incredible Shrinking Operating System."

This is so true, and so pertinent to what we've discussed in the Online Publishing course I'm taking here at Texas Tech University.

Monday, February 9, 2009

I didn't have to eat my words--what a pleasant surprise!

I think last week I said something about time constraints and how an open source server running DSpace can be reinstalled in three hours, given no interruptions. Last week, actually the very day I posted that, I did something to crash the test server running DSpace. It was something I'd planned to do, but I had hoped to let the server run a little longer before the crash.

I determined to test my three-hour theory, and, as it turns out, when I blocked everyone from speaking to me and let my desktop support person, Jason Price (a conscientious, invaluable colleague), run interference against distractions, I was indeed able to uphold my three-hour promise.

I decided that instead of trying to untangle all the crazy PostgreSQL permissions (I chowned where I shouldn't have), I would upgrade to the newest version of DSpace and simply replace the extant data tables. I also had a few items on the prior test server (services, software, etc.) that I really didn't need, and instead of picking through the things I didn't like with a fine-toothed comb, I realized it was a good opportunity simply to format the server, reinstall the newest release of DSpace (something I had been waiting on, but when I saw what crashed the system, I realized I couldn't keep using the older release), and get everything running well.
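For what it's worth, the chown mistake taught me to look before I leap. Here is a minimal sketch of the kind of sanity check I could have run first; the PGDATA path is an assumption (Gentoo, Red Hat, and FreeBSD all put the PostgreSQL data directory somewhere different), so treat it as illustration, not gospel:

```shell
#!/bin/sh
# Sketch only: confirm who owns the PostgreSQL data directory before
# chown-ing anything near it. The default PGDATA path below is an
# assumption; adjust it for your own distro.
PGDATA="${PGDATA:-/var/lib/postgresql/data}"

if [ -d "$PGDATA" ]; then
    # ls -ld keeps this portable across GNU and BSD userlands
    owner=$(ls -ld "$PGDATA" | awk '{print $3}')
    msg="PGDATA $PGDATA is owned by $owner (expected owner: postgres)"
else
    msg="no data directory at $PGDATA on this box; nothing to check"
fi
echo "$msg"
```

Thirty seconds of checking ownership up front would have saved me a full rebuild, three-hour promise or not.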

What have I learned, then, since beginning this DSpace project? A most valuable lesson about working with open source software: open source documentation is very poor until you understand how the writers are speaking; in other words, it gives you the bare minimum, without contingencies or good FAQ support for when errors crop up. Once I'd installed DSpace the first time using Carlos Ovalle's instructions (http://sentra.ischool.utexas.edu/~i312co/dspace/), I understood how all the pieces of the puzzle fit together, enough that I could then navigate the DSpace instructions without confusion. I also realized what I have to change to run DSpace on Gentoo. I'm a little nervous about switching OSes to the Dell-supported Red Hat server, but really, one open source OS isn't all that different from another, not at the nuts-and-bolts level. (Insert smirk from the open source audience here.)

Since this is only Monday, I plan to test Archivists' Toolkit on this Dspace installation. If you're wondering what I'm referring to, please visit http://www.archiviststoolkit.org/ to find out!


That's all for this week!

Wednesday, February 4, 2009

Digitization: Open Source vs. M$, and Time for Servers

Many people assume that digitization of an archive or records system means scanning things into a computer. If digitization were as simple as someone using the flatbed Epson to scan in the photographic archives of the Texas Tech football team playing their way toward the 1939 Cotton Bowl, then most digitization specialists would be grossly overpaid.

Digitization processes, at a minimum, require three components to be successful. For this example, I will stick with photographs. First, the process of scanning must follow a set of accepted standards to provide consistency of digital development across years and technologies. In other words, these standards provide for digital preservation with the primary intent of accessibility.

Second, a metadata standard must be agreed upon by all project managers. DSpace, the program that I am testing at the Southwest Collection, uses the Dublin Core metadata standard by default. Although Dublin Core can be swapped out for another standard, it is important to note that moving away from the default entails a fair bit of extra work on the server backend. A metadata librarian chooses how to represent a digitized item within the metadata system, and this representation, whether in Dublin Core or something else, gives the metadata librarian a great deal of control over how the item can be searched, which digital collections it belongs to, and so on.
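To make the Dublin Core default concrete: DSpace's batch-import layout (the "Simple Archive Format") pairs each item with a dublin_core.xml file of element/qualifier values. Here is a hedged sketch of one such item; every value below is invented for illustration, not taken from a real collection:

```shell
#!/bin/sh
# Sketch of one item's metadata in DSpace's Simple Archive Format.
# All field values are invented examples; a real collection's choices
# belong to the metadata librarian, as described above.
mkdir -p archive/item_001
cat > archive/item_001/dublin_core.xml <<'EOF'
<dublin_core>
  <dcvalue element="title" qualifier="none">Football team photograph, 1938 season</dcvalue>
  <dcvalue element="contributor" qualifier="author">Photographer unknown</dcvalue>
  <dcvalue element="date" qualifier="issued">1938</dcvalue>
  <dcvalue element="subject" qualifier="none">Texas Tech University -- Football</dcvalue>
</dublin_core>
EOF
echo "wrote $(grep -c '<dcvalue' archive/item_001/dublin_core.xml) Dublin Core values"
```

The choice of elements and qualifiers in that little file is exactly where the librarian's control over searching and collection membership lives.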

The third component, and perhaps the most interesting in a digital project, is good editing judgment. A digitization project manager, often a faculty member in the archive here at the Southwest Collection, must consider how best to represent a photograph: do well-taught students using markup determine how the photograph is searched? Does the faculty member guide student assistants in the keywords they are allowed to use to describe an item? If the back of a photograph has any information on it, should that be scanned in as well? These judgments must adhere both to standards and to editorial representation. If a university archive is providing a digitized version for research access, the archive must consider what best preserves, to the greatest extent possible, the state of the information as represented in the physical photograph. If a museum is presenting the same digitized photograph, the museum will have to use different metadata standards to represent the photo as part of a wider collection, for instance in an electronic exhibit format.

Where do DSpace and open source operating systems fit into these questions? Let me start with operating systems: M$ provides an easy-to-use, quick-and-dirty server platform that rapidly integrates information onto the Internet, onto network shares, into databases, and so forth. The problem, however, is that much of the M$ software is proprietary, expensive, and bulky. The M$ operating systems come prepackaged with hundreds of services that really could be disabled, provided the administrator knows which ones are unnecessary for performance. Additionally, open source software is often difficult to run on M$ because of various compatibility issues; it is usually much easier to run it on open source servers.

Open source operating systems provide a free and quick means of manipulating and serving data, as long as the administrator knows what s/he is doing with the software. A lot of people run screaming when they hear “Unix,” or “Linux,” and most don’t even know what “FreeBSD” means; it is important for admins interested in Unix or Linux to realize that the act of running the open source operating system will give them an enormous amount of control over what actually runs on that server.

An open source server administrator who knows what s/he is doing can format, rebuild, and reinstall an operating system in approximately three hours, including reinstalling all necessary software packages--given the opportunity for unwavering attention to the matter at hand. When an open source content management system like Plone crashes, all that is really necessary is a single service restart--/zope/bin/runzope restart, for instance--without relying on five other dependencies to perform the command. DSpace runs the same way: if a service interruption occurs, for instance on my Gentoo test server, all I have to do is /etc/init.d/tomcat-6 restart and /etc/init.d/postgresql restart, and that is that. The most important thing on the server end of running a digitization server, in fact, is monitoring where your database is stored--keep DSpace on a different disk than your database. (A consideration of spindle placement is important, too, in building a larger digitization server. But more on that later.)
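That restart routine is simple enough to wrap in a small guard script. This is a sketch under my own setup's assumptions: the init-script paths are the Gentoo-style ones from my test box, and the loop just skips any service that isn't installed on the machine it runs on.

```shell
#!/bin/sh
# Restart the services DSpace depends on, skipping any init script
# that doesn't exist on this box. Paths assume Gentoo-style
# /etc/init.d scripts; run as root on a real server.
status=""
for svc in /etc/init.d/tomcat-6 /etc/init.d/postgresql; do
    if [ -x "$svc" ]; then
        echo "restarting $svc"
        "$svc" restart || echo "restart of $svc failed"
        status="$status tried:${svc##*/}"
    else
        echo "skipping $svc (not installed here)"
        status="$status skipped:${svc##*/}"
    fi
done
echo "summary:$status"
```

On a multi-service box, having one dumb script like this beats remembering which two init scripts matter at 2 a.m.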

As I test DSpace, I need to learn more about PostgreSQL, which brings up my next point. A serious danger in IT is allowing oneself a very wide breadth of knowledge without much-needed depth. The job itself requires adaptability in learning new software, new hardware, and even new user learning styles; unfortunately, the greatest limitation on any server administrator is time. Between constant interruption and poor funding, new software and hardware that should be put into production as soon as it is purchased or built often sits in a corner, rotting into obsolescence, simply because the administrator has had no time to build, test, and secure the new product, nor time to train users on it. This, at least, has been my observation of friends in academic IT, and it has certainly been my own experience. Unless one can focus without interruption, depth of knowledge gets traded away for flexibility of learning.

Test servers offer admins the opportunity to understand the nuts and bolts of what goes on. Also, in testing with Linux or Unix (like FreeBSD), I highly recommend that you avoid the GUI entirely. If you are going to learn the operating system, learn the command line first. Otherwise, you're shortchanging yourself on a good opportunity to know exactly how things function.

Monday, January 26, 2009

DSpace, FreeBSD, and Gentoo

My name is Ana, and I work at the Southwest Collection at Texas Tech University. I was hired into my position six months ago, and my role is to build and test digitization software and servers for archival digitization purposes.

My first open source endeavor was with FreeBSD, back in 2005. I determined to learn only the command-line interface for the simple reason that the command line provides a very thorough understanding of how an open source, Unix-based operating system functions. As I learned more about FreeBSD, I became enamored with its flexibility, ease of control, and very strong security features.

I built my first production FreeBSD server to serve Plone and Zope for the Landscape Architecture program at Texas Tech University. Zope itself was a little clumsy, but working with the Plone CMS proved extremely easy, and I was able to give professors and student organizations access to quickly build their own classroom and organizational content management systems without deterioration of skill over time. (You can view one example of a departmental announcement CMS served from this box here: http://cms.larc.ttu.edu/TTULarc .)

I am now testing a product called DSpace, an open source digital repository system with an extremely well-designed search feature (http://www.dspace.org). I'm impressed with how easy DSpace is to operate, although, with my own minimal database experience, I'm going to have to figure out how to replace and manage databases in PostgreSQL. DSpace runs on Apache Tomcat, a Java servlet container from the same Apache project as the familiar web server, but a distinct product with its own utilities. I tried to run DSpace, Tomcat, and PostgreSQL on FreeBSD, but I couldn't get it working. Not only could I not get it working, I also couldn't locate one thing about DSpace and FreeBSD. (Give it a shot. Google the following, exactly as I type it here: "Dspace"+"FreeBSD"+install.) Although a ton of websites appear in the results, and the sites look promising, they actually offer more problems than they solve. For instance, there is a very helpful-looking Japanese page that comes up second or third on the list, but after using Google to translate it, I quickly discovered that the translation wasn't very accurate. I think that page could be very helpful, but I am still not sure.

So, since I could find little help on installing DSpace on FreeBSD (that DSpace installation diary for FreeBSD is the biggest letdown), I determined to try another open source OS--this time a Linux brew.

The Linux brew I went with was Gentoo, which seems like a solid, stable OS, as flexible as FreeBSD but with a less elitist group of developers. (I won't go further with this. I love FreeBSD. I hated moving to Linux myself!)

Either way, I found a really well-written DSpace install how-to for Gentoo. I tried to use DSpace 1.5 (the newest version) with these instructions, but it just didn't work. The DSpace installation instructions themselves are fairly shoddy, and I couldn't waste any more time trying to work around installing version 1.5. Instead, I moved a notch down to the older DSpace 1.4.2. I'd rather use the newer release, but I figured as long as I got it put together quickly, I'd be willing to work the kinks out later.

Anyway, here is the link to that website:
http://sentra.ischool.utexas.edu/~i312co/dspace/

I keep meaning to track this guy down and thank him. He did a great job--really helpful. I had to modify the instructions a little because I used newer versions of PostgreSQL and Tomcat (Tomcat 6), but it worked out fine. I will soon try to update to 1.5, just to find out how complicated the update process is.
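For anyone walking the same path, here is the condensed sequence I ended up with, printed as a checklist rather than executed: the paths, the database name, and the Tomcat version are all assumptions from my own test box, merged from Carlos Ovalle's Gentoo how-to and the official DSpace 1.4.x instructions, so double-check every line against your setup.

```shell
#!/bin/sh
# Condensed DSpace 1.4.x install checklist. Printed, not executed:
# every path and name below is an assumption from my own test box.
SRC=/usr/local/src/dspace-1.4.2-source
WEBAPPS=/usr/share/tomcat-6/webapps

checklist=$(cat <<EOF
1. createdb -U postgres -E UNICODE dspace   # empty database for DSpace
2. edit $SRC/config/dspace.cfg              # db.url, db.password, dspace.dir
3. cd $SRC && ant fresh_install             # build and populate the install dir
4. cp $SRC/build/*.war $WEBAPPS/            # deploy the web applications
5. /dspace/bin/create-administrator         # first admin account
6. /etc/init.d/tomcat-6 restart             # pick up the new webapps
EOF
)
echo "$checklist"
```

Six lines, once you know them; the trick, as with all open source documentation, is extracting those six lines from thirty pages in the first place.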

The more I think about it, the more it nags me that there is such poor material out there about installing DSpace on a FreeBSD box. I think I'll go ahead and try again on my FreeBSD box, perhaps at my house instead of at work so that I'm not wasting work time, and write up some instructions. That way there will at least be something out there. (I even sent a question to the FreeBSD mailing list, but there were no responses.) I guess DSpace and FreeBSD are each individually mildly obscure, and the two combined are simply an aberration. Ha, world, no longer!

The other thing that bothers me is how shoddy the DSpace installation instructions are. The documentation goes on for thirty pages (or so, maybe less) about how to create metadata fields for tagging, but there are only three pages on installation, and the product will not work when you follow them. Granted, I've always been better at learning by doing than at learning from a set of instructions. I know my FreeBSD install attempts will improve now that I've actually gotten DSpace working on Gentoo, so perhaps soon I can practice enough to help write better documentation for the DSpace project itself.

That's what the open source community is all about: contributing what you can, when you can, in hopes of helping someone else who might want to someday contribute.

That's all for now. Tune in later--at the same bat time, same bat place!