Code4Lib & Dev8D: How they measured up last year.

February 8th, 2010
Written by Matt Zumwalt

In early 2009, I happened to become the only person to attend both Dev8D in London and Code4Lib 2009 in Providence, Rhode Island. While at Code4Lib, I had every intention of posting a comparison of the two conferences, but eventually decided that nobody would care to hear about it. The blog post sat unfinished for a year until the news came out that both conferences would be happening simultaneously this year. Suddenly a side-by-side comparison seemed much more meaningful. I dug into my notebooks and found the table you see here.

Further down the page, for the uninitiated, I’ve included an off-the-cuff description of each conference.

Code4Lib & Dev8D 2009 - Quick Comparison

Category | Dev8D | Code4Lib
Primary Area of Focus | Repositories & Academic Computing | Libraries
Dominant Vendors | Blackboard, Microsoft | ExLibris, OCLC, Talis
Relationship with Vendors | undecided | "make it good or we'll throw bacon at you."
Preferred Channel of Ongoing Communication | twitter! everywhere! | irc: the only true channel
Programming Languages | python, ruby, java, php | python, ruby, java, php
Presentation Structure | multiple simultaneous sessions | single track
Presentation Format | very little powerpoint, more conversation | lotsa powerpoint, but fun
Worst Technical Glitch | troublesome wifi | troublesome wifi
After-Hours | organized evening activities | "cliques"
People in Attendance | 250? | 90?
Popular Terms | "stories" | "Z39.50", "METS", "LCSH"
Word Most Likely to Elicit Booing | "xml" | "xml"
Biggest Buzzword | Linked Data | Linked Data
Data Interfaces & Formats Most Often Mentioned | SWORD, ORE, RDF | Jangle, METS
Dominant Software Products | DSpace, Fedora, EPrints | WorldCat, Blacklight, VuFind
How Established Is It? | might have become a one-off | third year and going strong
Gender Balance | 3 women? | 15-20% women
Pigeonhole | geeks? | nerds?
Snark Figurehead | repohate | anarchivist
Digs | Palmer's Lodge (free rooms in a crowded hostel) | Renaissance Providence (a restored Masonic temple)
Location | London, UK | Providence, RI, USA
Organizers | David Flanders - a true dynamo from JISC | host committee, most decisions made via community voting
Mood Around Lightning Talks | easy to sign up, low pressure audience | competitive signup, intimidating audience
Hygiene | occasionally stinky | I was the unkempt one.
Scheduling | spare time - by design | packed schedule
My Most Memorable Moment | The Dragons' Den - Pitching software ideas to impressive judges, with real cash money reward at stake. | LinkedData Pre-Conference with Ed Summers, Dan Chudnov and Michael Giarlo

Code4Lib: the big annual download for coders in libraries

Code4Lib has a large, established community based mostly in North America. As the name implies, the community consists of “computer programmers and library technologists who largely work for and with libraries”[from Code4Lib wiki]. The Code4Lib Journal and the #code4lib irc channel are hubs of intense discourse year round. The Code4Lib conferences are run as a single track where every presenter delivers their work before the entire group of conference attendees. The lineup of presenters, along with the location and dates for the conference, is chosen each year by means of an open, public voting process.

The annual Code4Lib conferences are the primary place for this community of extremely smart, innovative people to exchange notes about the work they’re doing.  It’s very intense, very technical, and extremely informative.  There is very little hand-holding; everyone is expected to keep up with the flow of information or get out of the way.  For the past two years (possibly longer), registration has reached capacity within 48 hours of opening up.

Taken as a group, the Code4Lib crowd can seem intimidating. They have a very strong meritocracy, which is one of their biggest strengths. The only sure-fire way to get attention in this community is to make great software and to share it. If you do that, you will be acknowledged and socially rewarded. I heard complaints about “cliques” forming, mainly because everyone was left to fend for themselves in the evenings: people who already knew each other clumped together and swiftly disappeared. As an outsider, I found this unnerving at first, but in the end everyone was actually very friendly and eager to trade ideas with anyone who could keep up with the intellectual pace.

My most memorable moment at Code4Lib was the Linked Data preconference hosted by Ed Summers, Dan Chudnov and Michael Giarlo from the Library of Congress. As part of the day-long workshop, they set up one of the best hands-on lessons I’ve ever been part of. The premise was simple: use FOAF profiles as the means for entering into a raffle. This relatively simple exercise forced everyone in the room to, at the very least, publish a FOAF document (anywhere on the web) with a couple of required assertions included. Those with more experience helped those with less. Everyone gained something and one key concept hit its mark: the linked data movement of 2009 has many, many things in common with the early world wide web, and everyone is invited along for the ride.
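
For readers who have never seen one, here is a rough sketch of what a minimal FOAF document might look like, generated with a few lines of Python and serialized as Turtle. The person, homepage, and acquaintance are made up, and the particular assertions are my assumption; I am not reproducing the exact assertions the raffle required.

```python
FOAF_NS = "http://xmlns.com/foaf/0.1/"  # the FOAF vocabulary namespace


def foaf_profile(name, homepage, knows=()):
    """Return a minimal FOAF profile as a Turtle document.

    Values are inserted as-is; a real tool would escape quotes in `name`.
    """
    pairs = [
        ("a", "foaf:Person"),
        ("foaf:name", '"%s"' % name),
        ("foaf:homepage", "<%s>" % homepage),
    ]
    # One foaf:knows assertion per acquaintance URI.
    pairs += [("foaf:knows", "<%s>" % uri) for uri in knows]
    body = " ;\n    ".join("%s %s" % pair for pair in pairs)
    return "@prefix foaf: <%s> .\n\n<#me> %s .\n" % (FOAF_NS, body)


# Hypothetical entrant and acquaintance, for illustration only.
profile = foaf_profile(
    "Alice Example",
    "http://example.org/alice",
    knows=["http://example.org/bob#me"],
)
print(profile)
```

Saving that output to a file anywhere on the web is, in spirit, all it took to enter: the document names a person and links that person to other things by URI.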

Code4Lib 2010 will be held in Asheville, NC February 22-26.  It has been sold out since December.

Dev8D: making developer happiness in the UK

Dev8D is the brainchild of David Flanders at JISC. The name, Developer Happiness Days, harks back to the meme of developer happiness that became one of the core mantras for “lightweight application framework” evangelists around 2005/2006. As I understand it, David looked around at all of the developers working on JISC-funded projects across the UK and saw a massive pool of technical talent hampered by the lack of an ongoing community for the exchange of ideas, skills, criticism and praise. He convinced JISC to fund a small conference specifically for software developers, and specifically aimed at fostering a developer community. They made him promise to provide proof of “innovation” happening and, tada, Dev8D was born.

I’m tempted to describe Dev8D as zany.  David threw in as many activities, challenges, rewards, and discussion topics as he could possibly muster.  His goal was simple — get people talking, encourage them to dream up new technical possibilities, reward them for sharing those ideas, reward them even more for executing those ideas, and have fun in the process.

It would have been hard for an enthusiastic software developer not to have fun at Dev8D.  We were all basically along for the ride.  Every time you turned a corner, someone had cooked up a new nifty little web app or come up with some novel way to use an existing service to augment our Dev8D free-for-all.  David used wordle to generate tag cloud “name tags” from everyone’s blogs.  By lunch on the first day, someone started projecting a twitter aggregation of #dev8d on the wall.  By tea time, we had two or three custom apps floating around that did wacky stuff based on codes in your tweets.  (In February 2010, this all may sound a bit tired, but remember that a mere year ago all things twitter were brand new to nearly everyone.)  The name of the game was “dream it up and do it”, and we certainly did that.

My most memorable moment at Dev8D was the Dragons’ Den. This was David’s solution to JISC’s requirement that he provide proof of innovation as outputs from the event. JISC set up a Developers Challenge, providing real-world users from UK academia and soliciting the Dev8D attendees to submit their best ideas for new software to solve those users’ technological needs. The best team would win a wad of cash in exchange for documenting their idea. The way you submitted your idea was by presenting it in the Dragons’ Den, where a panel of judges from JISC and local industry would hear your idea, ask questions, and provide feedback. Though my submission didn’t win the challenge, I learned a lot from those judges and would participate again in a heartbeat simply for the feedback.

Dev8D 2010 will be held in London 24-27 February.  They may still be accepting new registrations, though all of the free accommodations have been slurped up.

Crosstalk in 2010

Later this month, Code4Lib 2010 and Dev8D 2010 will have three days of overlap.  I will be in Asheville, NC for Code4Lib and Eddie Shin will be in London for Dev8D, so MediaShelf will be plugged in on both sides of the pond.  We have promised to encourage crosstalk between the events, though we haven’t sorted out the details.  When we settle on a plan, we will post an update on this blog.

RIRI Year Two Redux

July 25th, 2009
Written by matt

This week the University of Prince Edward Island hosted the second installment of the Red Island Repository Institute (RIRI). Participants came from North America and Europe to get a week-long intensive immersion in all things Fedora. The institute is organized by Mark Leggott and the team at UPEI who created Islandora, a plugin that integrates Fedora into Drupal. Thorny Staples (Product Director for Fedora Commons), Chris Wilper (Technical Lead for Fedora Commons), and I were the primary instructors.

Due to budget cuts across the board, there were fewer people in attendance at RIRI this year. Over the next few years, broad community collaboration is certain to involve much more remote interaction online and less face-to-face exchange at large conferences. I have a hunch that this will actually be beneficial for the growth of technical communities as long as it doesn’t last too long or become too restrictive. After all, there is something potent about the ethos of Less Talk, More Code. In the meantime, those of us who were able to make it to PEI had a really productive week.

Though structured as an intensive week-long training exercise, RIRI has taken on an element of information exchange between all of the participants.  The predominant flow of information was, as you would expect, from the three instructors to the 15 attendees, but I think that Thorny and Chris will agree with me that we have all picked up some good information and ideas along the way.  If anything, the institute has started to migrate in the direction of that quasi-mythical ideal conference where you get the conference-type info about who is doing what in your field while also getting your hands dirty with workshops and in-depth detail and discussion of how they are doing it and how you might build on their efforts.

Learning Curve

Those of us who were attending RIRI for the second time agreed that this year had a much more gratifying learning curve, with sufficient continuity carrying through all five days. This was predominantly thanks to how much Islandora and ActiveFedora have matured in the past twelve months. We had everyone up and running, writing and running code against Fedora, in the multiple hands-on sessions through the week.

In contrast with last year, when we spent quite a bit of time installing and configuring Fedora, this year we were able to use a VirtualBox image that the Islandora team set up. The image runs Debian with Fedora, Islandora and Drupal pre-installed. On Tuesday I installed Ruby, ActiveFedora and Solr on there too. By Wednesday, we were able to pass around a couple of USB sticks and have everyone running identical development environments for the hands-on sessions.

I’m sure we will soon post the VirtualBox image for anyone to download.  Next year, everyone will probably have it running on their laptops before they even arrive on the Island.

Islandora

The Islandora team at UPEI have been very busy setting up Fedora-based Virtual Research Environments for the variety of scholars at their University.  It seems that once they had hammered out the module that integrates Fedora into Drupal, they fearlessly dove into spinning off deployments for numerous disciplines and projects.  I was thoroughly impressed by how many VREs they have set up, the diversity of content they are collecting, and the range of tools that they are setting up to manage, display, and operate on that content.  Four years ago, I looked at Fedora and saw a tidal wave of potential uses behind it.  That wave has finally begun to break and it is as exciting as we had imagined.

ActiveFedora

We used ActiveFedora as the basis for the hands-on sections of the institute. On Wednesday we basically ran through the console tour that you can find on the ActiveFedora project site. In the developer breakout on Thursday, we defined some ActiveFedora models in a Rails app, created some Fedora objects based on those models, and explored their RDF relationships.

“That really makes sense.” seemed like the main response to ActiveFedora.  I can’t think of a better confirmation that designing the Domain Specific Language (DSL) for ActiveFedora’s models was worth it.

As part of the workshops, I also ran through the improvements I have planned for ActiveFedora version 1.1.
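
To give a feel for the RDF relationships we explored in the developer breakout, here is a language-neutral sketch in Python. This is not ActiveFedora's API (which is Ruby); it only pictures the underlying idea that Fedora objects point at each other with subject-predicate-object assertions. The predicate comes from Fedora's relations-external (RELS-EXT) namespace; the object identifiers are hypothetical.

```python
# Fedora's RELS-EXT relationship namespace and the "info:fedora/" URI prefix.
RELS_EXT = "info:fedora/fedora-system:def/relations-external#"


def fedora_uri(pid):
    """Turn an object identifier such as 'demo:doc1' into its Fedora URI."""
    return "info:fedora/" + pid


# Hypothetical objects: two documents that belong to one collection.
triples = [
    (fedora_uri("demo:doc1"), RELS_EXT + "isMemberOfCollection", fedora_uri("demo:coll")),
    (fedora_uri("demo:doc2"), RELS_EXT + "isMemberOfCollection", fedora_uri("demo:coll")),
]


def related(triples, predicate, obj):
    """Subjects that point at `obj` via `predicate` (an inbound lookup)."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


members = related(triples, RELS_EXT + "isMemberOfCollection", fedora_uri("demo:coll"))
print(members)
```

The assertions live on the member objects, so finding a collection's members means asking who points at it. A model layer exists to hide exactly that kind of bookkeeping.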

Hydra

Thorny gave a really great presentation on the background, goals, and status of the Hydra project.    Hydra and Islandora stand as wonderful complements to each other.  Islandora, written in PHP, takes Drupal as the starting point from which it reaches into the wide open space of a Fedora repository.  Hydra takes nearly the opposite approach.  Hydra does not assume an overarching system as its operating context.  Instead it builds outwards from ActiveFedora, which in turn builds upon Fedora’s internal flexibilities and strengths.   Where Islandora functions as a component that you plug into Drupal, Hydra apps are free-standing solutions that in turn rely on a core whose functionality can be integrated into potentially any system.  One is a stalactite, the other is a stalagmite.  Both have grown naturally out of real-world needs.

FeSL

Projects seeking to adopt Fedora right now have great options before them.  Different technologies will suit different projects while the overall vision and best practices pollinate across solutions.  The best example of this cross pollination is the FeSL Project, which Islandora and Hydra are both participating in along with MediaShelf and DuraSpace.  This effort will result in a complete replacement of Fedora’s security implementation so that it can be used more effectively and more flexibly by any of the client applications we write — Hydra, ActiveFedora, Islandora, Muradora, or otherwise.  We are still seeking additional community funding for FeSL.  For more info, see the Fedora Enhanced Security Layer page on the Fedora wiki.

Workflow

One of the more exciting themes for me this year was workflow for Fedora Repositories. I’ve been actively interested in the topic since Richard Green and Chris Awre presented RepoMMan at OpenRepositories in 2007. Despite my interest, I have trod softly in that realm because the topic, and particularly technologies like BPEL, have always seemed dangerously askew from the actual problem(s) at hand. I’ve always felt that we were framing the entire problem incorrectly. In the past nine months, that exact line of thinking has come to a head, leading groups like the Hydra project (including the former RepoMMan team) to explore a variety of approaches. This resulted in a well-received presentation at OpenRepositories this year in Atlanta, which Thorny presented again for the RIRI participants this week.

The topic of workflow finally congealed for me in an actionable way here at RIRI when Mark Leggott started talking about supporting Taverna in the Islandora VREs.  Taverna is one of a handful of “workflow engines” designed to allow scientists to chain together batches of computational operations that they want to perform on data in their labs.  From an architectural perspective, Taverna isn’t all that different from BPEL tools.  The difference is that Taverna and its ilk are designed with the assumption that the scientist who owns the data will create the workflows.  Consider this in contrast to the idea of a developer or repository manager creating workflows on behalf of the end users.  This simple assumption, that workflows will be constructed and run by the people who own/create the underlying content, leads the technology down a very different path of development.  Until now, we have mostly thought of workflows as services that a repository provides for end users.  It’s time for us to flip that around and instead think of ways to expose repositories and their supporting services as nodes that end users can tie into their own workflows as they see fit.  This approach has the ring of good engineering.  It’s a simpler, loosely-coupled, user-centric solution to the problem.  It has the additional benefit of putting the repository engineer alongside the content owner/creator while they both create and share workflows to perform their corresponding tasks.
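
Here is a toy sketch of that flipped arrangement, in Python. Everything in it is hypothetical: the ToyRepository class is an in-memory stand-in rather than a Fedora client, and it has nothing to do with how Taverna actually works. The point is only the shape: the repository hands out nodes, and the content owner decides how to chain them with their own steps.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], dict]  # a workflow node: takes data, returns data


class ToyRepository:
    """In-memory stand-in for a repository (hypothetical, not a Fedora client)."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def fetch_node(self, pid: str) -> Step:
        """Return a node that loads the object stored under `pid`."""
        def step(data: dict) -> dict:
            return dict(self._objects[pid])
        return step

    def store_node(self, pid: str) -> Step:
        """Return a node that saves the incoming data under `pid`."""
        def step(data: dict) -> dict:
            self._objects[pid] = dict(data)
            return data
        return step


def run_workflow(steps: List[Step], data: dict = None) -> dict:
    """Run the steps in order, feeding each one the previous result."""
    data = data or {}
    for step in steps:
        data = step(data)
    return data


def word_count(data: dict) -> dict:
    """A step written by the content owner, not by the repository team."""
    return {**data, "words": len(data["text"].split())}


repo = ToyRepository()
repo.store_node("demo:1")({"text": "raw lab notes from tuesday"})

# The owner composes the workflow; the repository is just two of its nodes.
result = run_workflow(
    [repo.fetch_node("demo:1"), word_count, repo.store_node("demo:1-counted")]
)
print(result)
```

Notice that the repository code knows nothing about word counts, and the word count knows nothing about storage. That loose coupling is the whole appeal.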

Solutions Integration Community

RIRI has also brought focus back onto the utility of the Solutions Integration Community that we have been gradually building this past year. I’ve now set up a fedora-solutions Google Group. Join up if you want to be a part of the stuff described on the solutions integration community page on the Fedora wiki.

A Great Year to Come

Over the next 12 months we are going to see a brilliant array of advances around Fedora Repositories.  MediaShelf will certainly touch many parts of those advances.  We’re here to help you make your mark in this space, and we’re here to make Fedora work for your users.  Contact us if you want to put MediaShelf to work on your repository efforts.

Agile Languages & Fedora — Update from OR09

May 20th, 2009
Written by Matt Zumwalt

Leading up to this year’s Open Repositories, it became clear that there was demand for a BOF (Birds of a Feather) session focused on agile languages and Fedora. I pitched the idea in an email to a couple of colleagues beforehand and then announced the BOF at my presentation on Monday morning. Rather than restricting it to Fedora projects, I billed it as Agile Languages and Repositories. About 30 people showed up. The split was pretty even between Ruby, Python, and PHP developers. About a third seemed to be Java developers in the process of defecting. In addition to people doing stuff with Fedora, there were a handful of DSpace developers and possibly a couple who maintain EPrints repositories.

For the first half of the BOF we sat in mixed groups, eating our lunches and each talking about the work we do.  We then split up by language (Ruby, Python, PHP) and discussed language-specific topics.  For that second half I sat at the Ruby table where we talked about ActiveFedora, JRuby, RDF support for Ruby, MODS support for Ruby, Solr (solr-ruby and RSolr), and how Blacklight fits into the mix. 

I closed the conversation by asking if we should set up email lists for collaboration.  It seemed reasonable to set up a general mailing list for the solutions community as well as a list specifically for people doing stuff with Ruby, Fedora repositories, and (most likely) ActiveFedora.  I also resolved to encourage the creation of Python-oriented and PHP-oriented equivalents.  For now I have created two lists on Google Groups.  The first one, Fedora Commons Create, is for general discourse about creating client applications for Fedora.  The second, ActiveFedora / Ruby + Fedora Commons, is for Ruby-specific collaboration.

In the end, I was really pleased to realize that for the first time we had a substantial group of people interested in each of the main interpreted languages (Ruby, Python, PHP) and each group had at least one open source Fedora-based project to use as a starting point for their conversations.  The Ruby group had ActiveFedora, the Python group had Ben O’Steen’s work and Peter Herndon’s Django integration, and the PHP/Drupal people had Islandora & Fez to start from. 

This was a comfortable step forward from the scenario as it was a year ago.


Trainspotting as an explanation of the Semantic Web

May 9th, 2009
Written by Matt Zumwalt

I just came across this post on Russell Davies’ blog titled I like things to be numbered.  It’s an extract from an episode of the BBC Radio Show Museum of Curiosity. In it, a railway enthusiast named Chris Donald explains the beauty that trainspotters find in the fact that the railway companies assign numbers to absolutely everything, including clocks. [listen to the mp3]

Listening to Chris Donald speak, I couldn’t resist seeing the connection to Linked Data and the Semantic Web.  He nails the key concepts of the beauty and the loaded possibilities that come from being able to trace the connections between things.  The three-minute account even draws out the aspect of Semantic Web that tends to make people squeamish:

[...] I like things to be numbered.  I don’t know why; I just do.  The idea that every bridge had a number attached to it appeals to me and it appeals to a lot of people. [...]  It’s all quantifiable.  They know how many trains there are because they’re all numbered.  They have a book with all the numbers in it.  It’s all very controlled and they can understand it and it’s very two-dimensional. [...] with trains - you stand on the platform and you look at the track and you know that that metal bit of track on the floor is touching every train that you’re looking for and you understand that it’s a puzzle that can be solved.

I frequently find myself trying to adequately characterize the distinction between Semantic Web and Linked Data.  Is it just a re-branding of the concepts?  Is it an offshoot of the greater phenomenon?  In this little account by an avid trainspotter, I see a wonderful way to point out the distinction.  

The past 15 years of noise about Semantic Web have had the ring of this trainspotter’s “I like things to be numbered [...]  It’s all very controlled and [I] can understand it. [...] it’s a puzzle that can be solved.”  While there is nothing wrong with this per se, it is only going to motivate certain types of people.  Further, it lends itself to visions of grandeur that quickly wander into a quagmire of failed logic and, to be honest, treads close to the intellectual foundations of fascism.

Meanwhile, the burgeoning Linked Data movement is much more akin to railroad engineers saying “Well, we numbered everything out of necessity.  Might as well let the rest of the world make sense of those connections too.  Who knows what they’ll get out of it, but it certainly doesn’t hurt us to share the data.”

There’s one key place where I wish to differ with Mr. Donald, and I think a lot of Linked Data people will agree.  He describes this world of connections as being very orderly, controlled, and two-dimensional.  He says this because he is only looking at a single set of data from a single perspective.  As soon as you open your eyes to the growing cloud of linked open data, the landscape becomes much more akin to a wilderness, or possibly a garden, where the surface may seem simple and pretty while the world underneath is thriving with the complex, messy stuff of life.

Ads deserve permalinks too

April 27th, 2009
Written by Matt Zumwalt

Twice already this weekend I have wanted to reference an advertisement in a blog post and been unable to link to that ad.  I do all of my television watching online, usually watching on sites like Hulu that provide permalinks for shows, episodes, comments, etc.  However, I have yet to find a permalink for any of the advertisements on those sites.   In our media-conscious world, we are almost as likely to discuss an advert as we are to discuss an episode of a television show or any other video content that you’ve attached the ad to.   Let’s say the mantra: If it’s valuable, it deserves a permalink.  If people might discuss it, it deserves a permalink.

As our modes of media consumption change, advertisers are being forced to adapt.  The most effective adapters have taken to creating ad content that is independently valuable in the eyes of their target consumers.  This is a great thing and it should be encouraged.  We should and do reward innovative advertisers by talking about and sharing their ads.  This would be so much easier to do if they gave us permalinks to use.

You may ask how this is different from viral video advertising, or you may point out that many ads (especially funny ones) already show up on YouTube.  Well, there is certainly a strong connection, but there is an important difference in attitude.  The notion of viral videos has built up in the venue of YouTube and social networks.  It carries a connotation of low production values, simplistic themes, and playing to the finicky, meme-obsessed banality of the crowd. The stuff of viral videos gets dumped into the swamp that is YouTube (or one of its clones) and is left to fester.  Those who watch the videos are treated like little more than flies.  I, the consumer, am reduced to a tick on the view count and possibly a comment on the generic video viewer page.  By allowing a YouTube URL to be the de facto identifier of your content, you’re basically conceding that the content doesn’t deserve to be treated with any distinction.

Our cultural shortsightedness regarding the web, its possibilities, and its future usage is comical.  

Note: In the sense that I’m using it here, permalink == URI (Uniform Resource Identifier).   Yes, this post is basically an argument that ads, like everything else, should be treated as nodes in the semantic web (aka Linked Data Cloud).

Account from OGF25 Repositories Workshop: Creating a Repository Standard?

March 20th, 2009
Written by matt

04 March 2009

Catania, Sicily, Italy

Open Grid Forum 25th Conference (OGF25)

It’s not entirely clear when I figured out that I was sitting on a standards body panel discussing the creation of a standard for digital repositories.  I’m pretty sure it finally clicked sometime after the session was over, once I had consumed a couple glasses of wine.

I still don’t see what I contributed to the conversation, though the other participants assured me that my comments were useful.  The experience reminds me of my friend, let’s call him Josh, a community organizer who was recently pulled onto one of the Obama administration’s advisory panels.  Shortly after joining the advisory panel, Josh confessed that at the end of most calls he has to follow up with a friend and ask “Ok. So what exactly did we just decide and who is responsible for doing what?”

The panel discussion started by making observations that we’re all familiar with:

  • the importance, and associated challenges, of unique identifiers and persistent URIs
  • search, retrieval, and management are separate concerns, each with appropriate standards associated (e.g. SWORD, OpenSearch)
  • cloud computing is very different from cloud storage

After this, I quickly found myself in completely unfamiliar waters when conversation abruptly turned to the creation of a standard for digital repositories. I thought “Pshaw.  We don’t need Yet Another Standard.  Where did this come from?”  In fact, the whole field of repositories is so new that the prospect of a repository standard seems absurdly premature to me.  Discussion on the panel homed in on the two obvious contenders for a standard: 1) metadata requirements, and 2) functionality profiles (a list of features necessary in order for a repository system to be deemed compliant and interoperable).  From my perspective, repositories already swim in a glut of metadata standards (as well as non-standard, ad-hoc metadata) and, by nature, must embrace heterogeneous metadata.  The second notion, that of functionality profiles, sounds like something that few will read and none will understand.  To be honest, the entire discussion confused me.  I did my best to contribute to the discussion where I could.

After the workshop ended, I had a chance to catch my breath and discuss the panel with a couple of people.  Eventually, I came to look at the whole scenario from a different perspective and had a mild change of heart.  In a discussion with Neil Chue Hong, a very smart guy from Edinburgh, I started thinking about all the informal conclusions that frame discussions between developers at conferences like Dev8D, Code4Lib and RepoCamp.  I then thought about all the little architectural wins and failures that I see in software like Flickr, YouTube, Hulu and (woah) ABC’s full episode player.  After all, these are repositories too. Within a few moments of pondering, an initial list of obvious basic guidelines shone through quite clearly.

  • give permanent, unique URIs to all content you expose, even if you intend to limit access to that content based on geography or time of access
  • support linking with versioning or datetime info
  • expose a RESTful API
  • give preference to AtomPub
  • consider ORE when you need to express aggregations of data
  • provide linked data (RDF) endpoints
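
As a rough illustration of the first two guidelines above, here is one way a permanent URI scheme with version and datetime variants could be laid out. The host name and path layout are entirely my assumptions for the sake of the sketch; no standard prescribes them.

```python
from datetime import datetime, timezone

BASE = "http://repository.example.org"  # hypothetical host


def permalink(item_id):
    """The permanent, unique URI for an item, regardless of access rules."""
    return "%s/items/%s" % (BASE, item_id)


def versioned_link(item_id, version=None, as_of=None):
    """Link to a specific version, or to the item as it stood at a moment.

    `as_of` must be a timezone-aware datetime; it is normalized to UTC.
    """
    url = permalink(item_id)
    if version is not None:
        return "%s/versions/%d" % (url, version)
    if as_of is not None:
        stamp = as_of.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        return "%s/at/%s" % (url, stamp)
    return url


print(permalink("demo-42"))
print(versioned_link("demo-42", version=3))
print(versioned_link("demo-42", as_of=datetime(2009, 3, 4, 12, 0, tzinfo=timezone.utc)))
```

The design choice worth noticing is that the versioned forms extend the permalink rather than replace it, so a plain link keeps working when new versions appear.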

Some open topics also seem like obvious fodder for discussion:

  • what query language(s) to use in search APIs
  • navigating the difference between standards and interoperability
  • leveraging standards where possible (e.g. SWORD)

These are merely the things that seem obvious to me right away.  What would happen if we got the really smart people talking in this vein?  

I think this warrants further exploration and, strange though it is, I expect that the outputs of such exploration might resemble the stuff of standards bodies (be it a recommendation, a community document, or a standard).  Possibly I have been infected with that odd standards-wonk bug, or possibly I’m just catching up with the rest of the world in acknowledging the inevitable.

Session Hopping, LinkedData, and Data APIs at OGF25 in Sicily

March 20th, 2009
Written by matt

04 March 2009

Catania, Sicily, Italy

Open Grid Forum 25th Conference (OGF25)

[Note: I'm posting my backlog of updates from the past 2 months of travel.  An update specifically about the OGF Repositories Workshop will follow shortly]

I made it to the conference center in Catania, Sicily a few hours before the OGF Repositories Workshop.  Immediately upon arriving I met Nick Ferguson, coordinator of the workshop, and had a nice chat with Neil Chue Hong about repositories, ORE, and grid computing vs. cloud computing.   After that I was left to kill time until the workshop by sitting in on one of the OGF sessions.  At first, I stepped into what I thought was the Earth Sciences session, but it turned out to be the Computational Chemistry session and went way over my head.  I then passed through a handful of other random presentations before settling on a room where about 30 people were having a discussion about XQuery.

I soon discerned that this group was hammering out the spec for some sort of standard data systems interface.  When I arrived, they had been debating the strengths and demerits of XPath/XQuery vs. SQL as a query language.  The conversation quickly stumbled into the pit of interoperability hell.  Standard interjections abounded: “Some implementations won’t have that data to return…”, “you will have to expose user info in order to support that…” mumble mumble “… we didn’t do it that way because one unnamed vendor couldn’t support it…”  I nearly laughed out loud when an attendee from the back of the room interrupted the discussion declaring “But in most situations, you should only be returning items owned by the current user.”

I still had no idea what data they were attempting to expose.  (I later learned that it was the RUS-WG, who are defining a standard interface for retrieving job usage records … Obscure indeed.)    The 90-minute discussion ended up having nearly nothing to do with the actual data these people want to work with.  Instead, the conversation was entirely dominated by the travails of navigating the strange space of Data API design.
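
To make the XPath-versus-SQL debate a little more concrete, here is a small sketch that asks the same question ("which jobs belong to this user?") both ways, using only Python's standard library. The records and field names are invented for illustration; they are not taken from the RUS-WG's actual usage record format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical job usage records; the field names are illustrative only.
RECORDS = [
    {"job": "j1", "owner": "alice", "cpu_hours": 4.0},
    {"job": "j2", "owner": "bob", "cpu_hours": 1.5},
    {"job": "j3", "owner": "alice", "cpu_hours": 2.5},
]


def jobs_via_xpath(owner):
    """Build an XML tree of records and select by attribute with XPath."""
    root = ET.Element("usage")
    for rec in RECORDS:
        ET.SubElement(root, "record", {k: str(v) for k, v in rec.items()})
    # ElementTree supports a limited XPath subset, including attribute
    # predicates. The owner is interpolated naively; fine for a sketch only.
    path = ".//record[@owner='%s']" % owner
    return sorted(el.get("job") for el in root.findall(path))


def jobs_via_sql(owner):
    """Load the same records into an in-memory table and select with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usage_records (job TEXT, owner TEXT, cpu_hours REAL)")
    conn.executemany(
        "INSERT INTO usage_records VALUES (:job, :owner, :cpu_hours)", RECORDS
    )
    rows = conn.execute(
        "SELECT job FROM usage_records WHERE owner = ? ORDER BY job", (owner,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(jobs_via_xpath("alice"))
print(jobs_via_sql("alice"))
```

Both functions return the same jobs, which is rather the point: on a query this simple the choice of query language matters far less than deciding what the interface should expose and to whom.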

 

Meanwhile, serendipitously, I was using this downtime (and the conference wifi access) to finally read George Thomas’s slides about recovery.gov publishing open data.  Though I missed the presentation, the slides spell out the project’s intentions pretty clearly.  They’re full of references to REST, ATOM, RDFa and the LOD cloud.  I experienced such a fascinating contrast between the exposition before my eyes and the discussion filtering in through my ears.  In particular, one of Thomas’s slides jumped out at me.  The slide, titled “Follow the dollar, not the person”, showed a semantic model for users, user groups, and posts in a bulletin-board style Community Forum system.  It was totally readable, totally understandable, precise, flexible, and using an ontology that lends itself to re-use.  

 

Over the past year, I have satirically placed a golden halo above “linked data” in my mind.  As I sat in the RUS-WG session, light fell upon that halo and it glowed.

 

This experience, as well as subsequent discussions at OGF, has left me with a distinct sense that there’s a pattern here. We are all, of our own accord and in our own little techno-fiefdoms, attempting to do the same things and running into the same challenges. I think that the previously obscure field of digital repositories has valuable perspective to provide and many pieces of wisdom to share in this domain. I hope to see more public discourse about these topics, and I know who to start prodding to speak up. Watch this space.

 

 

Post Script:

 

The morning following my OGF session-hopping experience, I realized that the track I had passed over, innocuously titled “HEP”, was a meeting of the High Energy Physics community.  In particular, it was primarily a discussion about how they are going to handle processing the data outputs from the LHC experiments when they fire up the collider later this year.  /me kicks himself for missing this.

Even the NYTimes is Noticing DAM

February 10th, 2009
Written by Matt Zumwalt

Following from last week’s post about the Snarkmarket Book Project, here’s even stiffer evidence of the sudden increase in mainstream attention that real content management has garnered.  In Digital Archivists, Now in Demand, the New York Times Jobs section talks about our “nascent” discipline of Digital Asset Management and discusses the career possibilities in the field.

A friend of mine sent me a link to the article asking “Is this the kind of thing your company works on? Sounds interesting.” It will be really amusing when everyone talks about this stuff like they have always dealt with it.

… in Honor of SOA

February 4th, 2009
Written by matt

In a clever marketing move, The Burton Group have held a wake in honor of Service Oriented Architecture (SOA).  They’ve also set up one of those cute custom shortened URLs: http://tinyurl.com/SOAWake. From the announcement:

[...] It’s time to declare that SOA is dead and move on to more the practical matter of bringing up its offspring: Services.

This great find was brought to our attention by Ben O’Steen’s twitter feed.

The Wave Builds: Thinkers beyond the library world suddenly start talking about digital curation.

February 4th, 2009
Written by matt

To give you a sense of the sudden traction that our area of expertise has deservedly gained, check out the Snarkmarket Book Project, which was posted only yesterday. It has already garnered over 100 pitches of subject matter for the “New Liberal Arts”, and more than a third of them concern Digital Curation and/or Internet Archivists.

The Librarian Avengers in the crowd will especially relish this comment by Matt Thompson:

“Library science” is a fusty old term that increasingly fails to fit an ever-expanding and ever-more-important range of skills. “Knowledge management” is weighed down by the awful word “management.” In Matt University, we’d rebrand it “knowledge mastery” or something similarly grandiose. After all, this is becoming critical. How do we capture, structure, sift and preserve enormous bodies of information?