Archive for the ‘Matt's Adventures and Musings’ Category

Agile Languages & Fedora — Update from OR09

Wednesday, May 20th, 2009

Leading up to this year’s Open Repositories, it became clear that there was demand for a BOF (Birds of a Feather) session focused on agile languages and Fedora.  I pitched the idea in an email to a couple of colleagues beforehand and then announced the BOF at my presentation on Monday morning.  Rather than restricting it to Fedora projects, I billed it as Agile Languages and Repositories.  About 30 people showed up.  The split was pretty even between Ruby, Python, and PHP developers.  About a third seemed to be Java developers in the process of defecting.  In addition to people doing stuff with Fedora, there were a handful of DSpace developers and possibly a couple who maintain EPrints repositories.

For the first half of the BOF we sat in mixed groups, eating our lunches and each talking about the work we do.  We then split up by language (Ruby, Python, PHP) and discussed language-specific topics.  For that second half I sat at the Ruby table where we talked about ActiveFedora, JRuby, RDF support for Ruby, MODS support for Ruby, Solr (solr-ruby and RSolr), and how Blacklight fits into the mix. 

I closed the conversation by asking if we should set up email lists for collaboration.  It seemed reasonable to set up a general mailing list for the solutions community as well as a list specifically for people doing stuff with Ruby, Fedora repositories, and (most likely) ActiveFedora.  I also resolved to encourage the creation of Python-oriented and PHP-oriented equivalents.  For now I have created two lists on Google Groups.  The first one, Fedora Commons Create, is for general discourse about creating client applications for Fedora.  The second, ActiveFedora / Ruby + Fedora Commons, is for Ruby-specific collaboration.

In the end, I was really pleased to realize that for the first time we had a substantial group of people interested in each of the main interpreted languages (Ruby, Python, PHP) and each group had at least one open source Fedora-based project to use as a starting point for their conversations.  The Ruby group had ActiveFedora, the Python group had Ben O’Steen’s work and Peter Herndon’s Django integration, and the PHP/Drupal people had Islandora & Fez to start from. 

This was a comfortable step forward from the scenario as it was a year ago.


Trainspotting as an explanation of the Semantic Web

Saturday, May 9th, 2009

I just came across this post on Russell Davies’ blog titled I like things to be numbered.  It’s an extract from an episode of the BBC Radio Show Museum of Curiosity. In it, a railway enthusiast named Chris Donald explains the beauty that trainspotters find in the fact that the railway companies assign numbers to absolutely everything, including clocks. [listen to the mp3]

Listening to Chris Donald speak, I couldn’t resist seeing the connection to Linked Data and the Semantic Web.  He nails the key concepts of the beauty and the loaded possibilities that come from being able to trace the connections between things.  The three-minute account even draws out the aspect of Semantic Web that tends to make people squeamish:

[...] I like things to be numbered.  I don’t know why; I just do.  The idea that every bridge had a number attached to it appeals to me and it appeals to a lot of people. [...]  It’s all quantifiable.  They know how many trains there are because they’re all numbered.  They have a book with all the numbers in it.  It’s all very controlled and they can understand it and it’s very two-dimensional. [...] with trains - you stand on the platform and you look at the track and you know that that metal bit of track on the floor is touching every train that you’re looking for and you understand that it’s a puzzle that can be solved.

I frequently find myself trying to adequately characterize the distinction between Semantic Web and Linked Data.  Is it just a re-branding of the concepts?  Is it an offshoot of the greater phenomenon?  In this little account by an avid trainspotter, I see a wonderful way to point out the distinction.  

The past 15 years of noise about Semantic Web have had the ring of this trainspotter’s “I like things to be numbered [...]  It’s all very controlled and [I] can understand it. [...] it’s a puzzle that can be solved.”  While there is nothing wrong with this per se, it is only going to motivate certain types of people.  Further, it lends itself to visions of grandeur that quickly wander into a quagmire of failed logic and, to be honest, treads close to the intellectual foundations of fascism.

Meanwhile, the burgeoning Linked Data movement is much more akin to railroad engineers saying “Well, we numbered everything out of necessity.  Might as well let the rest of the world make sense of those connections too.  Who knows what they’ll get out of it, but it certainly doesn’t hurt us to share the data.”

There’s one key place where I wish to differ with Mr. Donald, and I think a lot of Linked Data people will agree.  He describes this world of connections as being very orderly, controlled, and two-dimensional.  He says this because he is only looking at a single set of data from a single perspective.  As soon as you open your eyes to the growing cloud of linked open data, the landscape becomes much more akin to a wilderness, or possibly a garden, where the surface may seem simple and pretty while the world underneath is thriving with the complex, messy stuff of life.

Ads deserve permalinks too

Monday, April 27th, 2009

Twice already this weekend I have wanted to reference an advertisement in a blog post and been unable to link to that ad.  I do all of my television watching online, usually on sites like Hulu that provide permalinks for shows, episodes, comments, etc.  However, I have yet to find a permalink for any of the advertisements on those sites.  In our media-conscious world, we are almost as likely to discuss an advert as we are to discuss an episode of a television show or whatever other video content the ad is attached to.  Let’s say the mantra: If it’s valuable, it deserves a permalink.  If people might discuss it, it deserves a permalink.

As our modes of media consumption change, advertisers are being forced to adapt.  The most effective adapters have taken to creating ad content that is independently valuable in the eyes of their target consumers.  This is a great thing and it should be encouraged.  We should and do reward innovative advertisers by talking about and sharing their ads.  This would be so much easier to do if they gave us permalinks to use.

You may ask how this is different from viral video advertising, or you may point out that many ads (especially funny ones) already show up on YouTube.  Well, there is certainly a strong connection, but there is an important difference in attitude.  The notion of viral videos has built up in the venue of YouTube and social networks.  It carries a connotation of low production values, simplistic themes, and playing to the finicky, meme-obsessed banality of the crowd.  The stuff of viral videos gets dumped into the swamp that is YouTube (or one of its clones) and is left to fester.  Those who watch the videos are treated like little more than flies.  I, the consumer, am reduced to a tick on the view count and possibly a comment on the generic video viewer page.  By allowing a YouTube URL to be the de facto identifier of your content, you’re basically conceding that the content doesn’t deserve to be treated with any distinction.

Our cultural shortsightedness regarding the web, its possibilities, and its future usage is comical.  

Note: In the sense that I’m using it here, permalink == URI (Uniform Resource Identifier).  Yes, this post is basically an argument that ads, like everything else, should be treated as nodes in the semantic web (aka Linked Data Cloud).
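
As a rough illustration of what treating an ad as a node could look like, here is a minimal Python sketch. It assumes plain subject/predicate/object tuples rather than any real RDF library, and every URI and predicate name in it is made up for the example.

```python
# Each triple is (subject, predicate, object).
# All URIs and predicate names below are hypothetical, for illustration only.
AD = "https://video.example.org/ads/42"
EPISODE = "https://video.example.org/episodes/7"
POST = "https://blog.example.org/posts/ads-deserve-permalinks"

triples = [
    (AD, "shownDuring", EPISODE),
    (POST, "discusses", AD),
    (POST, "discusses", EPISODE),
]


def links_to(uri, graph):
    """Return (subject, predicate) pairs for every triple whose object is uri."""
    return [(s, p) for s, p, o in graph if o == uri]


# Once the ad has its own URI, anything that references it can be found.
for subject, predicate in links_to(AD, triples):
    print(subject, predicate, AD)
```

The point of the sketch is only that the ad needs its own identifier before a blog post, a comment, or anything else can point at it.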

Account from OGF25 Repositories Workshop: Creating a Repository Standard?

Friday, March 20th, 2009

04 March 2009

Catania, Sicily, Italy

Open Grid Forum 25th Conference (OGF25)

 

It’s not entirely clear when I figured out that I was sitting on a standards body panel discussing the creation of a standard for digital repositories.  I’m pretty sure it finally clicked sometime after the session was over, once I had consumed a couple glasses of wine.

 

I still don’t see what I contributed to the conversation, though the other participants assured me that my comments were useful.  The experience reminds me of my friend, let’s call him Josh, a community organizer who was recently pulled onto one of the Obama administration’s advisory panels.  Shortly after joining the advisory panel, Josh confessed that at the end of most calls he has to follow up with a friend and ask “Ok. So what exactly did we just decide and who is responsible for doing what?”

 

The panel discussion started by making observations that we’re all familiar with:

  • the importance, and associated challenges, of unique identifiers and persistent URIs
  • search, retrieval, and management are separate concerns, each with appropriate standards associated (e.g. SWORD, OpenSearch, etc.)
  • cloud computing is very different from cloud storage

After this, I quickly found myself in completely unfamiliar waters when conversation abruptly turned to the creation of a standard for digital repositories.  I thought “Pshaw.  We don’t need Yet Another Standard.  Where did this come from?”  In fact, the whole field of repositories is so new that the prospect of a repository standard seems absurdly premature to me.  Discussion on the panel homed in on the two obvious contenders for a standard: 1) metadata requirements, and 2) functionality profiles (a list of features necessary in order for a repository system to be deemed compliant and interoperable).  From my perspective, repositories already swim in a glut of metadata standards (as well as non-standard, ad-hoc metadata) and, by nature, must embrace heterogeneous metadata.  The second notion, that of functionality profiles, sounds like something that few will read and none will understand.  To be honest, the entire discussion confused me.  I did my best to contribute to the discussion where I could.

 

After the workshop ended, I had a chance to catch my breath and discuss the panel with a couple of people.  Eventually, I came to look at the whole scenario from a different perspective and had a mild change of heart.  In a discussion with Neil Chue Hong, a very smart guy from Edinburgh, I started thinking about all the informal conclusions that frame discussions between developers at conferences like Dev8D, Code4Lib and RepoCamp.  I then thought about all the little architectural wins and failures that I see in software like Flickr, YouTube, Hulu and (whoa) ABC’s full episode player.  After all, these are repositories too.  Within a few moments of pondering, an initial list of obvious basic guidelines shone through quite clearly.

  • give permanent, unique URIs to all content you expose, even if you intend to limit access to that content based on geography or time of access
  • support linking with versioning or datetime info
  • expose a RESTful API
  • give preference to AtomPub
  • consider ORE when you need to express aggregations of data
  • provide linked data (RDF) endpoints
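
To make the first two guidelines concrete, here is a minimal Python sketch of one way a repository might mint permanent URIs and then derive version-specific and datetime-specific links from them. The host name, the path layout, and the function names are all assumptions made up for this example; no particular repository system works exactly this way.

```python
from datetime import datetime, timezone
from urllib.parse import quote

BASE = "https://repo.example.org"  # hypothetical host


def permalink(item_id: str) -> str:
    """Permanent URI for an item. It encodes identity only, not access rules."""
    return f"{BASE}/items/{quote(item_id, safe='')}"


def version_link(item_id: str, version: int) -> str:
    """Link to one specific version of the item."""
    return f"{permalink(item_id)}/versions/{version}"


def datetime_link(item_id: str, when: datetime) -> str:
    """Link to the item as it stood at a given moment, normalized to UTC."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{permalink(item_id)}/as-of/{stamp}"


print(permalink("demo:1"))
print(version_link("demo:1", 3))
print(datetime_link("demo:1", datetime(2009, 3, 4, 12, 0, tzinfo=timezone.utc)))
```

In this sketch the permalink stays the same whether or not a given visitor is allowed to see the content; restrictions based on geography or time would be enforced when the URI is resolved, not baked into the URI itself.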

Some open topics also seem like obvious fodder for discussion:

  • what query language(s) to use in search APIs
  • navigating the difference between standards and interoperability
  • leveraging standards where possible (e.g. SWORD)

These are merely the things that seem obvious to me right away.  What would happen if we got the really smart people talking in this vein?  

 

I think this warrants further exploration and, strange though it is, I expect that the outputs of such exploration might resemble the stuff of standards bodies (be it a recommendation, a community document, or a standard).  Possibly I have been infected with that odd standards-wonk bug, or possibly I’m just catching up with the rest of the world in acknowledging the inevitable.

The Wave Builds: Thinkers beyond the library world suddenly start talking about digital curation.

Wednesday, February 4th, 2009

To give you a sense of the sudden traction that our area of expertise has deservedly gained, check out the Snarkmarket Book Project which was posted only yesterday. It has already garnered over 100 pitches of subject matter for the “New Liberal Arts” and more than a third of them concern Digital Curation and/or Internet Archivists.

The Librarian Avengers in the crowd will especially relish this comment by Matt Thompson:

“Library science” is a fusty old term that increasingly fails to fit an ever-expanding and ever-more-important range of skills. “Knowledge management” is weighed down by the awful word “management.” In Matt University, we’d rebrand it “knowledge mastery” or something similarly grandiose. After all, this is becoming critical. How do we capture, structure, sift and preserve enormous bodies of information?

A different kind of long tail

Thursday, November 20th, 2008

This morning I was reading a bio of Ezra Koenig in Salon’s Sexiest Men Living Series.  The bio had a link to an entry about Koenig’s band, Vampire Weekend, in NPR’s Global Hit Podcast.  After listening to the NPR entry about Vampire Weekend, I habitually added the podcast to my RSS reader.  I was impressed to discover that the feed has 765 entries.  That’s five entries a week going back to 29 November 2005; three years of trends in global music culture, right there at my fingertips.
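
The “five entries a week” figure can be sanity-checked with a few lines of Python. This sketch assumes the podcast publishes one entry per weekday (Monday to Friday) and uses the date of this post as the end of the range; both are assumptions, not facts taken from the feed itself.

```python
from datetime import date, timedelta


def weekdays_between(start: date, end: date) -> int:
    """Count Monday-to-Friday dates from start to end, inclusive."""
    days = (end - start).days + 1
    return sum(1 for i in range(days) if (start + timedelta(days=i)).weekday() < 5)


# First entry in the feed through the date of this post.
count = weekdays_between(date(2005, 11, 29), date(2008, 11, 20))
print(count)
```

The weekday count comes out slightly above the feed’s 765 entries, which is about what one would expect if a handful of days were skipped over three years.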

To my eyes, this seems like a wonderful new meaning of “long tail”.  It’s something that we’ve seen pending on the horizon but it’s now finally beginning to manifest.  Our civilization documents itself so thoroughly that we can grab a detailed background on nearly any topic.  This has been true in a cursory way for a while, manifesting in sites like Wikipedia, but the internet is quickly reaching an information saturation point and architectural maturity that allows us to view the entire web as a living, self-documenting wiki.

Up until now, the long tail has mainly referred to an economic construct: by reducing barriers to entry into markets, by radically reducing distribution costs, and by increasing the opportunities for direct engagement between producer and consumer, the internet has made it profitable (or at least financially tenable) to cater to the countless minority niche interests in any given market.

I see a different kind of long tail coming to prevalence now.  Where the economic long tail is far reaching, the long tail of information runs deep.  As with the NPR podcast, we can look back in time and find a wealth of source material.  Armed with 20/20 hindsight, we can view and review the many ways that our civilization has chosen to express itself.  Published materials no longer die a day or a week after their creation; instead they stay alive for us to find them, or find new meaning in them, in the future.  Even better, we have begun to resurrect the materials that might have been presumed dead, destined to spend eternity on a dark dusty shelf.

When I look at that NPR podcast, I see context.  I see one thread in a complex history that I get to explore and rearrange at my own leisure.  Each of us sees this ocean of information differently, and each time we dip our hands into its depths we return with our own fresh version of the story, woven from the many disparate threads (and the gems upon them) that lie beneath the surface.

RIRI Day Two: Richard Green on Institutional Repositories

Tuesday, August 12th, 2008

At the moment, I’m witnessing Richard Green from University of Hull masterfully dissecting the notion of an Institutional Repository.  It’s a treat to have someone spell this stuff out step by step from such a grounded perspective.  One wonderful element of his presentation was to simply leave some time for people to explore EPrints and DSpace repositories [1][2][3] (from the perspective of public end users).  He made the point that people, myself included, often work with only one repository system (or no repository system) and neglect to simply explore the existing options.

In the midst of his presentation about the RepoMMan project, Richard posed an interesting pair of questions regarding the prospect of giving users a private “My Repository” space for managing their stuff.  He asked us:

  1. What might a user want to get from “My Repository”?
  2. What might a user want to put into “My Repository”?

He allowed the room to ponder these questions for a while.  I must admit that I was left doubting my knee-jerk responses and in turn thinking a bit further about what users really want from systems like this.  Richard then reported that a survey of his users at University of Hull provided a resounding response.  His users wanted:  Storage (safe, backed up), Access (easy and from anywhere), Management (full version control), and Preservation (to know stuff is there when they want it, short and long term).  I found this to be much more straightforward than the responses I expected.

Richard then gave us a tour of the RepoMMan interface.  Two characteristics of the system stand out.  First, the web interface, which is implemented in Flex, mimics an FTP client to provide familiarity.  Second, the metadata editor uses Data Fountains to pre-populate objects with automatically generated metadata, so that users can review and revise existing metadata rather than starting from a blank form.

The presentation will continue this afternoon.  By the end of the week, Richard’s full slide deck for the presentation will be up in the RIRI repository.

At RIRI: The Red Island Repository Institute fires up

Tuesday, August 12th, 2008

The Red Island Repository Institute (RIRI), hosted by the University of Prince Edward Island (UPEI), has started with a bang.  Sandy Payette spent an entire day feeding the room with a wonderful mix of vision, software architecture, social context, and technical details.

Mark Leggott has put together a great event. There are people here from all over North America, and even one visitor from Australia.  Everyone has been enjoying the beautiful environs of Prince Edward Island and the quality of information being exchanged is top notch.  I particularly like the fact that Mark is “drinking his own kool-aid” by setting up a Drupal/Fedora site for the institute.

This should be a great week.

In Boston, reading The Register

Wednesday, August 6th, 2008

I’m in Boston at the moment.  I’m hanging out with my sister’s pitbull today while I prepare for the Red Island Repository Institute.

This morning I added The Register to my RSS subscriptions.  I’m a bit intimidated by the volume of content that the feed puts out, but the info is just so darn tasty.

Fedora Solutions Integration Council

Thursday, July 3rd, 2008

Picking up from the ideas in The Missing Sync for Fedora Commons, I’ve been talking with Thorny and Sandy at Fedora Commons about creating a Fedora Solutions Integration Council.  We haven’t quite figured out the structure of it, but the ideas are coming together pretty quickly. Bottom line, the council’s responsibility is to help everyone make informed decisions and support each other’s work.  

As a first stab, I’m putting effort into three things:

  1. bring together the streams of communication (e.g. blogs, IRC, etc.)
  2. help projects find and connect with others who are doing similar work
  3. identify the major themes: problem areas, innovations, exciting solutions, etc.

Ultimately, I hope this will allow us to shed light on the various avenues of exploration in Fedora-centric application development.  So many people are doing such interesting and exciting work.  It’s time for us to talk more openly and enthusiastically about it.

The other Fedora Solutions Councils are organized around themes like eScience, Museums, and Education.   In contrast, the Integration Council is aimed at addressing the cross-cutting concerns of application development.  We all have to deal with things like access controls, scalability, and workflow.  The best solutions to these types of challenges are often applicable in many contexts, regardless of whether you are an eScience project or a small humanities archive.  Our aim is to get as much information flowing between developers as possible.  I want to let developers decide for themselves which ideas apply to their work.

Watch this space.