
Account from OGF25 Repositories Workshop: Creating a Repository Standard?

Friday, March 20th, 2009

04 March 2009

Catania, Sicily, Italy

Open Grid Forum 25th Conference (OGF25)

 

It’s not entirely clear when I figured out that I was sitting on a standards body panel discussing the creation of a standard for digital repositories.  I’m pretty sure it finally clicked sometime after the session was over, once I had consumed a couple of glasses of wine.

 

I still don’t see what I contributed to the conversation, though the other participants assured me that my comments were useful.  The experience reminds me of my friend, let’s call him Josh, a community organizer who was recently pulled onto one of the Obama administration’s advisory panels.  Shortly after joining the advisory panel, Josh confessed that at the end of most calls he has to follow up with a friend and ask “Ok. So what exactly did we just decide and who is responsible for doing what?”

 

The panel discussion started with observations that we’re all familiar with:

  • the importance, and associated challenges, of unique identifiers and persistent URIs
  • search, retrieval, and management are separate concerns, each with its own associated standards (e.g. SWORD, OpenSearch, etc.)
  • cloud computing is very different from cloud storage

After this, I quickly found myself in completely unfamiliar waters when the conversation abruptly turned to the creation of a standard for digital repositories. I thought “Pshaw.  We don’t need Yet Another Standard.  Where did this come from?”  In fact, the whole field of repositories is so new that the prospect of a repository standard seems absurdly premature to me.  Discussion on the panel homed in on the two obvious contenders for a standard: 1) metadata requirements, and 2) functionality profiles (a list of features that a repository system must have in order to be deemed compliant and interoperable).  From my perspective, repositories already swim in a glut of metadata standards (as well as non-standard, ad-hoc metadata) and, by nature, must embrace heterogeneous metadata.  The second notion, that of functionality profiles, sounds like something that few will read and none will understand.  To be honest, the entire discussion confused me.  I did my best to contribute where I could.

 

After the workshop ended, I had a chance to catch my breath and discuss the panel with a couple of people.  Eventually, I came to look at the whole scenario from a different perspective and had a mild change of heart.  In a discussion with Neil Chue Hong, a very smart guy from Edinburgh, I started thinking about all the informal conclusions that frame discussions between developers at conferences like Dev8D, Code4Lib and RepoCamp.  I then thought about all the little architectural wins and failures that I see in software like Flickr, YouTube, Hulu and (whoa) ABC’s full episode player.  After all, these are repositories too. Within a few moments of pondering, an initial list of obvious basic guidelines emerged quite clearly.

  • give permanent, unique URIs to all content you expose, even if you intend to limit access to that content based on geography or time of access
  • support linking with versioning or datetime info
  • expose a RESTful API
  • give preference to AtomPub
  • consider ORE when you need to express aggregations of data
  • provide linked data (RDF) endpoints
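To make the first two guidelines a little more concrete, here is a minimal sketch of my own (not something discussed on the panel). It assumes a hypothetical base URL and a hypothetical `/items/<id>/versions/<n>` layout. Each item gets one permanent URI, each version gets its own URI beneath it, and a datetime can be resolved to the version that was current at that moment.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical base URL; a real repository would use its own stable hostname.
BASE = "https://repo.example.org"


@dataclass(frozen=True)
class Version:
    number: int
    created: datetime


def item_uri(item_id: str) -> str:
    """Permanent URI for the item as a whole (assumed URL layout)."""
    return f"{BASE}/items/{item_id}"


def version_uri(item_id: str, version: Version) -> str:
    """Permanent URI for one specific version of the item."""
    return f"{item_uri(item_id)}/versions/{version.number}"


def version_at(versions: List[Version], when: datetime) -> Optional[Version]:
    """Return the newest version created at or before `when`, if any."""
    candidates = [v for v in versions if v.created <= when]
    return max(candidates, key=lambda v: v.created) if candidates else None


# Usage: two versions of one item, resolved by datetime.
history = [
    Version(1, datetime(2009, 1, 1, tzinfo=timezone.utc)),
    Version(2, datetime(2009, 3, 1, tzinfo=timezone.utc)),
]
current_in_february = version_at(history, datetime(2009, 2, 1, tzinfo=timezone.utc))
print(version_uri("abc", current_in_february))
```

The point of the layout is that the URIs never change even if access to the content behind them is later restricted. Access control belongs at resolution time, not in the identifier.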

Some open topics also seem like obvious fodder for discussion:

  • what query language(s) to use in search APIs
  • navigating the difference between standards and interoperability
  • leveraging standards where possible (e.g. SWORD)

These are merely the things that seem obvious to me right away.  What would happen if we got the really smart people talking in this vein?  

 

I think this warrants further exploration and, strange though it is, I expect that the outputs of such exploration might resemble the stuff of standards bodies (be it a recommendation, a community document, or a standard).  Possibly I have been infected with that odd standards-wonk bug, or possibly I’m just catching up with the rest of the world in acknowledging the inevitable.

Session Hopping, LinkedData, and Data APIs at OGF25 in Sicily

Friday, March 20th, 2009

04 March 2009

Catania, Sicily, Italy

Open Grid Forum 25th Conference (OGF25)

 

[Note: I'm posting my backlog of updates from the past 2 months of travel.  An update specifically about the OGF Repositories Workshop will follow shortly]

 

I made it to the conference center in Catania, Sicily, a few hours before the OGF Repositories Workshop.  Immediately upon arriving I met Nick Ferguson, coordinator of the workshop, and had a nice chat with Neil Chue Hong about repositories, ORE, and grid computing vs. cloud computing.  After that I was left to kill time until the workshop by sitting in on one of the OGF sessions.  At first, I stepped into what I thought was the Earth Sciences session, but it turned out to be the Computational Chemistry session and went way over my head.  I then passed through a handful of other random presentations before settling on a room where about 30 people were having a discussion about XQuery.

 

I soon discerned that this group was hammering out the spec for some sort of standard data systems interface.  When I arrived, they had been debating the strengths and weaknesses of XPath/XQuery vs. SQL as a query language.  The conversation quickly stumbled into the pit of interoperability hell.  Standard interjections abounded: “Some implementations won’t have that data to return…”, “you will have to expose user info in order to support that…” mumble mumble “… we didn’t do it that way because one unnamed vendor couldn’t support it…”  I nearly laughed out loud when an attendee from the back of the room interrupted the discussion, declaring “But in most situations, you should only be returning items owned by the current user.”

 

I still had no idea what data they were attempting to expose.  (I later learned that it was the RUS-WG, who are defining a standard interface for retrieving job usage records.  Obscure indeed.)  The 90-minute discussion ended up having almost nothing to do with the actual data these people want to work with.  Instead, the conversation was entirely dominated by the travails of navigating the strange space of Data API design.

 

Meanwhile, serendipitously, I was using this downtime (and the conference wifi access) to finally read George Thomas’s slides about recovery.gov publishing open data.  Though I missed the presentation, the slides spell out the project’s intentions pretty clearly.  They’re full of references to REST, Atom, RDFa and the LOD cloud.  The contrast between the exposition before my eyes and the discussion filtering in through my ears was fascinating.  In particular, one of Thomas’s slides jumped out at me.  The slide, titled “Follow the dollar, not the person”, showed a semantic model for users, user groups, and posts in a bulletin-board style Community Forum system.  It was totally readable, totally understandable, precise, and flexible, and it used an ontology that lends itself to re-use.

 

Over the past year, I have satirically placed a golden halo above “linked data” in my mind.  As I sat in the RUS-WG session, light fell upon that halo and it glowed.

 

This experience, as well as subsequent discussions at OGF, has left me with a distinct sense that there’s a pattern here.  We are all, of our own accord and in our own little techno-fiefdoms, attempting to do the same things and running into the same challenges.  I think that the previously obscure field of digital repositories has a valuable perspective to provide and many pieces of wisdom to share in this domain.  I hope to see more public discourse about these topics, and I know who to start prodding to speak up.  Watch this space.

 

 

Post Script:

 

The morning following my OGF session-hopping experience, I realized that the track I had passed over, innocuously titled “HEP”, was a meeting of the High Energy Physics community.  In particular, it was primarily a discussion about how they are going to handle processing the data outputs from the LHC experiments when they fire up the collider later this year.  /me kicks himself for missing this.

RIRI Day Two: Richard Green on Institutional Repositories

Tuesday, August 12th, 2008

At the moment, I’m witnessing Richard Green from the University of Hull masterfully dissecting the notion of an Institutional Repository.  It’s a treat to have someone spell this stuff out step by step from such a grounded perspective.  One wonderful element of his presentation was to simply leave some time for people to explore EPrints and DSpace repositories [1][2][3] (from the perspective of public end users).  He made the point that people, myself included, often work with only one repository system (or no repository system) and neglect to simply explore the existing options.

In the midst of his presentation about the RepoMMan project, Richard posed an interesting pair of questions regarding the prospect of giving users a private “My Repository” space for managing their stuff.  He asked us:

  1. What might a user want to get from “My Repository”?
  2. What might a user want to put into “My Repository”?

He allowed the room to ponder these questions for a while.  I must admit that I was left doubting my knee-jerk responses and in turn thinking a bit further about what users really want from systems like this.  Richard then reported that a survey of his users at the University of Hull provided a resounding response.  His users wanted:  Storage (safe, backed up), Access (easy and from anywhere), Management (full version control), and Preservation (to know stuff is there when they want it, short and long term).  I found this to be much more straightforward than the responses I expected.

Richard then gave us a tour of the RepoMMan interface.  Two key characteristics of the system stand out.  The web interface, which is implemented in Flex, mimics an FTP client (to provide familiarity).  The metadata editor uses Data Fountains to pre-populate objects with automatically generated metadata, so that users can review and revise existing metadata rather than starting from a blank form.

The presentation will continue this afternoon.  By the end of the week, Richard’s full slide deck for the presentation will be up in the RIRI repository.

At RIRI: The Red Island Repository Institute fires up

Tuesday, August 12th, 2008

The Red Island Repository Institute (RIRI), hosted by the University of Prince Edward Island (UPEI), has started with a bang.  Sandy Payette spent an entire day feeding the room with a wonderful mix of vision, software architecture, social context, and technical details.

Mark Leggott has put together a great event. There are people here from all over North America, and even one visitor from Australia.  Everyone has been enjoying the beautiful environs of Prince Edward Island and the quality of information being exchanged is top notch.  I particularly like the fact that Mark is “drinking his own kool-aid” by setting up a Drupal/Fedora site for the institute.

This should be a great week.

Thinking about Developer Happiness at JA-SIG

Monday, April 28th, 2008

Five years ago developers spent a lot of time speaking SQL when they talked about writing a database-driven app. Since then, we have enjoyed the arrival of modern webapp frameworks with good ORM. Now developers spend very little time talking about SQL. Instead, they talk about higher level problems and application-specific challenges. In other words, we are able to spend developer resources in more potent ways. This has played a major role in the recent upsurge of innovative, user-driven apps.
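As a rough illustration of that shift, here is a minimal sketch of my own, using only Python's standard `sqlite3` module. The `Table` mapper and `Photo` model are hypothetical and far simpler than any real ORM. The SQL lives in one small class, and the application code at the bottom only talks about photos.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Photo:
    id: int
    title: str


class Table:
    """Tiny hypothetical mapper: keeps the SQL in one place.

    Column names are interpolated into the SQL text, so this sketch must
    only be used with trusted model definitions and keyword names.
    """

    def __init__(self, conn, model):
        self.conn = conn
        self.model = model
        self.name = model.__name__.lower()
        self.cols = [f.name for f in fields(model)]
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.name} ({', '.join(self.cols)})"
        )

    def add(self, obj):
        marks = ", ".join("?" for _ in self.cols)
        self.conn.execute(
            f"INSERT INTO {self.name} VALUES ({marks})",
            [getattr(obj, c) for c in self.cols],
        )

    def find(self, **where):
        # With no filters, "1" makes the WHERE clause match every row.
        clause = " AND ".join(f"{k} = ?" for k in where) or "1"
        rows = self.conn.execute(
            f"SELECT {', '.join(self.cols)} FROM {self.name} WHERE {clause}",
            list(where.values()),
        )
        return [self.model(*row) for row in rows]


# Application code: no SQL in sight.
conn = sqlite3.connect(":memory:")
photos = Table(conn, Photo)
photos.add(Photo(1, "Catania harbour"))
photos.add(Photo(2, "Mount Etna"))
found = photos.find(title="Mount Etna")
```

Everything below the "Application code" comment is what developers get to talk about once the mapping layer exists. Everything above it is the part they used to rewrite for every app.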

Right now I’m sitting in Christopher Brown’s JA-SIG presentation about writing a Fedora App in ColdFusion. Christopher has done valiant work. He’s a trailblazer. More importantly, he has a functioning application that is now in active use. However, I can’t help but feel like we’ve backpedaled five years in terms of developer experience. Christopher’s slides are dominated by Fedora-specific structures and the terminology from Fedora’s APIs. I feel like I’m back in SQL land. Being forced to think about this boilerplate code is an unnecessary burden for developers. It prevents them from fully taking advantage of Fedora’s power.

Now that we’ve had RubyFedora in hand for a few weeks and have been playing with ActiveFedora for a while, it’s really encouraging to be reminded what the alternative is. I’m so eager to set developers like Christopher free, to let them forget about the boilerplate code, so that instead they can invent new ways of helping users do crazy stuff with their digital content.