Archive for the ‘Matt's Adventures and Musings’ Category

Final Videos from Code4Lib 2010

Sunday, February 28th, 2010

I’ve posted my last videos to the Code4Lib & Dev8D 2010 channel on Vimeo. (NB: the Dev8D responses are all on YouTube, not Vimeo).  We only had a half day on Thursday but I managed to run around and gather enough video for one montage of hello messages and three mini-interviews.

I convinced Thursday’s emcee to let me conclude the morning announcements by showing Dev8D’s video saying hello to us and recording a quick video of the whole conference saying “Hi Dev8D. How’s the weather in London?”.  Lesson learned: when recording 200 people saying hello, keep the microphone away from your mouth.

A snowstorm was threatening to disrupt everyone’s travel plans so we spread to the winds pretty swiftly once the figurative gong rang at noon, but I managed to grab lunch at Asheville’s infamous 12 Bones Smokehouse with some friends before departing.

Code4Lib 2010 Last Day from Matt Zumwalt on Vimeo.

Mark Matienzo at Code4Lib 2010 Recommends “Ask Anything” from Matt Zumwalt on Vimeo.

Matt Cordial at Code4Lib 2010 - Lightweight Tools and Robust Testing from Matt Zumwalt on Vimeo.

A Little Blacklight Lovefest at Code4Lib 2010 from Matt Zumwalt on Vimeo.

Code4Lib 2010 Day 2: Afternoon Session

Thursday, February 25th, 2010

It’s extremely gratifying to attend a conference where super smart presenters don’t shy away from showing real code.  It’s even better to be at a conference where the presenters manage to show complex code while making it understandable even to people who don’t use the related systems.  Every presenter today has managed to do that really well.

This is what happens when smart people come together, communicate often (year round), and respect each other.

Building a Better Advanced Search

Naomi Dushay & Jessie Keck’s presentation on building a better advanced search for SearchWorks is a key example of this unabashedly technical yet clear and concise communication.  They opened with a clear presentation of the real-world situation, including usability goals, etc.  They followed this with a breakdown of the desirable features and the possible ways to implement them.  Naomi then plunged into seriously complex solr config details that she managed to make intelligible step-by-step.  Hint: she used screenshots of the code with the salient items circled.

Drupal 7: A more powerful platform for building library applications

Cary Gordon from the Cherry Hill Company showed us the new features in Drupal 7.  I expected to be bored, since I don’t use Drupal and can’t stand the sight of PHP code. Instead, I was pleasantly impressed and intrigued by the features Cary walked through.  I keep hearing that Drupal has matured substantially in recent years, but I didn’t believe it until now.

Enhancing Discoverability With Virtual Shelf Browse

Andreas Orphanides, Cory Lown, and Emily Lynema from NCSU showed us their snazzy virtual bookshelf with an “infinite shelf” allowing you to scroll through book covers five at a time ad infinitum.  One wonderful idea is the use of “faux covers”, where they generated something that looks like a real book cover whenever a real cover image couldn’t be retrieved for a search result.

As with almost any Code4Lib presentation, this would have been intriguing if they only gave a demo of the software but they didn’t stop there.  80% of the time was spent explaining how they did it, what technologies they used, how they thought about the problem, what worked well and what went wrong.  Gosh I love hackers who share.

How to Implement A Virtual Bookshelf With Solr

In their third presentation of the day (remember, presentations were chosen by open popular vote), Naomi Dushay and Jessie Keck showed us their work around implementing the much-desired support for browsing through digital collections using a virtual shelf organized by call number.  This feature sounds simple, but the implementation proved more challenging.  To start with, the librarians couldn’t come to a consensus about what constitutes “correct” ordering.  Solution: ignore the arguing librarians and go straight to the users!

The real challenge lay in sorting out 8 million call numbers from numerous libraries using a variety of call number systems (e.g. LC, Dewey, SUDOC, Thesis).  Who knew that call numbers were so complex? Not I, until now.
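To give a sense of why call numbers resist naive sorting, here is a minimal Python sketch that builds a sort key for simple LC-style call numbers. This is not the SearchWorks implementation: the pattern, the sample values, and the name `lc_sort_key` are illustrative assumptions, and real call numbers have far more edge cases than this handles.

```python
import re

# Hypothetical, simplified pattern: class letters, class number, then the rest.
_LC_PATTERN = re.compile(r"^\s*([A-Za-z]{1,3})\s*(\d+(?:\.\d+)?)\s*(.*)$")


def lc_sort_key(call_number):
    """Return a tuple that sorts simple LC-style call numbers in shelf order."""
    match = _LC_PATTERN.match(call_number)
    if match is None:
        # Values that do not parse sort last, in plain string order.
        return (1, call_number.strip().upper(), 0.0, "")
    letters, number, rest = match.groups()
    # Compare the class number numerically so that QA9 shelves before QA76.
    return (0, letters.upper(), float(number), rest.strip().upper())


# Made-up sample data for illustration.
call_numbers = ["QA76.9 .D3", "QA9 .B5", "Q180 .A1", "QA76.73 .R83"]
shelf_order = sorted(call_numbers, key=lc_sort_key)
```

A plain string sort would put "QA9 .B5" after both QA76 entries, because it compares character by character; the key above compares the class number as a number instead.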

I knew these guys were good - I’ve collaborated with them, I’ve seen their code - but wow.  I didn’t realize how remarkably good they were until this third presentation.

Lightning Talks & Breakout Sessions

The day closed with 13 Lightning Talks followed by 6 Breakout Sessions.

Lightning Talks:

  • LibX Update - Godmar Back
  • How to build a Virtual Bookshelf Without Solr (or MySQL) - Maccabee Levine
  • VIVO, an interdisciplinary national network - Paul Albert
  • WolfWalk, two ways - Jason Casden
  • Custom metasearch widgets - Alex Smith
  • Node.js development - Gabriel Farrell
  • Catalog Auto-suggest using SOLR - Jill Sexton
  • EmeraldView, a PHP frontend for Greenstone - Yitzchak Schaffer
  • Faceted browse on the cheap - Tom Keays
  • EAD, APIs, and Cooliris: providing access to digitized archival materials. - Tim Shearer
  • Kill the Search Button - Michael Nielsen, Jørn Thøgersen [facilitated by Roy Tennant]
  • You Heard It Here First… - Roy Tennant
  • File Information Tool Set (FITS) - Spencer McEwen

Breakout Sessions:

  • Mobile Dev
  • Blacklight - Bess Sadler
  • Tools4Lib - the best tools used in your job - Devon Smith/decasm
  • Let’s Link Our Data - If you have data, metadata, vocabularies, or just about anything else you want to link, show up and we’ll figure out what to do with it! - Ryan Scherle/ryscher
  • Homebrew - building your own instead of using Fedora/DSpace/Blacklight/whatever - Esmé Cowles/escowles
  • API queries to vocabularies - Ya’aqov Ziso

Code4Lib Day 2: Closing out the Morning

Wednesday, February 24th, 2010

Media, Blacklight, and Viewers Like You

Chris Beer from WGBH (Boston’s Public Television; producers of Nova, Julia Child, etc.) walked us through the inner workings of the new WGBH website that runs on Blacklight, Fedora, and lighttpd.  Without getting bogged down in the details, he touched on many of the technical and organizational challenges they faced and how they addressed them.  Some of the technologies in the mix:

  • Blacklight
  • Fedora
  • PBCore: Metadata Standard for Media based on Dublin Core
  • ODRL: Open Digital Rights Language
  • jQuery

Chris did some amazing things with jQuery and FlowPlayer in his customized Blacklight views.  One example: You can browse to a point in a video by clicking in the transcript. If you then play the video, the transcript automatically scrolls with it, accurately.  This was made possible by the fact that WGBH already has detailed timecodes embedded in their transcripts.
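The lookup behind that kind of synchronized scrolling can be sketched in a few lines. This is a Python illustration of the idea only, not WGBH's browser-side code; the transcript data and the function names `segment_at` and `seek_time_for` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical transcript: (start time in seconds, text), sorted by start time.
transcript = [
    (0.0, "Welcome to the program."),
    (4.5, "Tonight we look at the archive."),
    (12.0, "Let's begin with the earliest footage."),
]
start_times = [start for start, _ in transcript]


def segment_at(playback_seconds):
    """Return the index of the transcript segment active at the given time."""
    # bisect_right counts the segments that start at or before the given time.
    index = bisect_right(start_times, playback_seconds) - 1
    # Times before the first timecode fall back to the first segment.
    return max(index, 0)


def seek_time_for(segment_index):
    """Return the video position to jump to when a segment is clicked."""
    return transcript[segment_index][0]
```

Clicking a transcript line maps to `seek_time_for`, and each playback tick maps to `segment_at`, which tells the page which line to scroll into view. Both directions depend on the timecodes already being present in the transcript.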

The data model for a single episode in the WGBH archive fills an entire page — raw video, edited video, music, images, post-processing, transcripts, etc. with rights and permissions varying for each part.  This highlights the grace of Blacklight, which allowed him to simply leave that system in place and merely expose the public parts with a slick, faceted search & discovery interface.

One detail that Chris called out was the fact that Fedora has trouble with large (5GB+) datastreams.  He pulled up the corresponding (unresolved) tickets in Fedora’s Jira to establish the fact that this is a long-known unresolved problem.  He then added that WGBH’s own internal (proprietary) DAM system has the same problems.  At least with open source software, addressing a bug like this is a matter of public process and community initiative rather than being subject to the whims of a single organization.

WGBH will soon be launching a number of public websites based on this Fedora+Blacklight combination.

Becoming Truly Innovative: Migrating from Millennium to Koha

Ian Walls from NYU gave a humorous account of finally ditching proprietary Integrated Library Systems, which he represented as battling kittens, in favor of the open source ILS called Koha.  He went through the details of how they handled the challenge of migrating their data from iii (the proprietary ILS) to Koha and closed by telling the room that the process he used should work for anyone with a similar proprietary install.  The NYU migration took a total of 3-4 months without any all-nighters.

All of the code he used is available at contribs.koha.org. I don’t personally work with ILS systems, so I can’t say I have a solid read on the impact of this information, but I got the distinct impression that this was very encouraging news for many of the people in the room.  When someone from the audience asked for a show of hands “How many people would like to migrate away from iii?” more than a third of the room raised their hands.

Ask Anything! (aka. Human Search Engine)

I know the Dev8D people will dig this one.

Dan Chudnov from Library of Congress facilitated an open-ended session where anyone with a question/request/missed connection could grab a mic and announce their desire.  In short: “If there’s anything that you want to ask everybody, now is your chance to speak up.” The core idea was to give people an opportunity to tap the collective knowledge of the room.

The process was to hear a question, identify people who want to respond, and then move on to the next question.  The full-room discussion was kept to a minimum.  Here are most of the questions asked (I missed a few).

  • Switching from Perl: What language should I switch to?
  • How do I make an API accessible only from specific domains?
  • Who is using my library (pymarc) and how are you using it?  Please tell me so I can make it better.
  • How do I extract files from Internet Archive .arc format?
  • Is anyone interested in helping my organization port a custom staff app from Millennium ILS to Aleph ILS?
  • Is there any interest in a 1-2 day Blacklight meeting?
  • Has anyone used the California Digital Library digital curation microservices?
  • Please tell us: What do Librarians (non-techies) need to know about software development?
  • What’s the best way to do complex log analysis? (answer: splunk)
  • I want to allow people to add data to my site without maintaining user info for them.  I want to use something like OAuth, but it has to work with something like curl as the client. (recommendation: RPX)
  • How do I model dates & date ranges in RDF?
  • Proposal for a breakout session on building homebrew digital repositories (instead of Fedora, DSpace, etc)
  • Request for 10 minute tutorial on 3D graphics programming
  • Anyone interested in modeling archival description in RDF? If so, join http://groups.google.com/group/semantic-archives
  • Is anyone using OpenCalais module for Drupal?
  • Is anyone here going to be working on the OLE project?
  • I’ve created software for hiding borrower information in library systems, but how can I share the code without making it vulnerable to “the man”?
  • Is anyone doing work with HTML5? Especially geodata services, etc?
  • Are there any libraries out there using Plone for their website?
  • Who else is developing Facebook apps for their libraries?  Any ideas how to make a useful Facebook app for a library?
  • Is anyone using MarkLogic or eXist in production systems?
  • Since this seems to be really working, should we start something like a Stackoverflow for library hackers? (answer: Yes.  Awesome idea.)
  • We’re working on search & discovery with non-roman scripts in a solr/lucene context.  Please provide suggestions.
  • Is anyone working on natural language processing implementations for machine learning?
  • We’re moving all of our production systems to cloud environments.  Has anyone else done that?
  • Is there a JSON library for MARC? (answer: it’s built into evergreen, OCLC is also looking into it.)

The verdict on Ask Anything: resounding applause, many smiles.

Surprise Trivia

Alex O’Neil from University of Prince Edward Island emcee’d a very fun round of trivia before we adjourned for lunch.  It was wicked fun.

Code4Lib 2010 - Day 2 Opening Presentations: Being Smart about How You Work

Wednesday, February 24th, 2010

Day Two of Code4Lib opened with three great presentations on ways to be smarter about how you approach application development.  This was such a strong set of particularly relevant presentations that I’m dedicating a whole post to it.  Coincidentally, all three presentations were led by women.

For those who were not in attendance, I strongly recommend seeking out the slides (and the video) of these presentations.  They were all engaging, concise and informative reviews of topics that are important for any developer who wants to do their job with professionalism, confidence and agility.

Iterative Development Done Simply

Emily Lynema from North Carolina State University gave a rapid-fire account of agile development in the real world.  Two years ago, at a conference like this the presentation(s) about agile development tended to be spoken in the abstract - “Did you realize we could work like this?”.  Now that agile development has properly caught on in many organizations, it’s rewarding to hear someone give a breakdown of “Here’s how it actually works for us on our projects.”

Vampires vs. Werewolves: Ending the War Between Developers and Sysadmins with Puppet

Bess Sadler from Stanford (formerly of UVA) and the Blacklight Project gave her Vampires and Werewolves rallying call for ending the war between developers and system administrators.  Developers are responsible for making new apps, implementing new features, and offering new technologies that meet users’ needs; it’s their job to innovate.  System Administrators are responsible for ensuring the stability of systems; it’s their job to be risk-averse.  This is usually a recipe for protracted pain.  Bess proposes that we can solve this by changing the way we think about the relationship between these two groups.  She also proposes using a number of industry best practices.

  • “Let go of the anger. Really listen.”
  • Take the system administrators out to coffee
  • Test your code & use continuous integration (e.g. Rspec with rcov & Hudson)
  • Use system monitoring (e.g. nagios) to monitor specific features as well as apps
  • Document your software (“You can’t RTFM if TFM doesn’t exist.”)
  • Use an automated system configuration tool (e.g. Puppet)

This last item, using automated system configuration tools like Puppet, is the lynchpin of Bess’ argument.  By allowing developers to automate the majority of deployment tasks, you gain clear paths for developers and system administrators to share the burden of keeping the “villagers with pitchforks” happy.

I am Not Your Mother: Write Your Test Code

Naomi Dushay, Willy Mene, and Jessie Keck from Stanford told us, and showed us, why testing your code is a must.  Naomi opened with the admonition “I am not your mother.  Write your test code.” and then walked us through the rcov coverage reports (in Hudson) for Blacklight and SearchWorks (Stanford’s customized version of Blacklight), both of which went from below 10% test coverage to over 90% test coverage in under a year.

Jessie Keck followed with his account of being converted to Test Driven Development (TDD), including quick real-time demos of writing rspec tests and cucumber tests.  He also gave some

Willy Mene closed out the presentation with a breakdown of the different types of tests you should be writing:

  • unit tests
  • integration tests
  • black box/functional/acceptance tests

One of the crucial points that Naomi, Willy and Jessie all mentioned was the fact that all developers already write informal tests by calling bits of code and printing out the result in order to visually confirm that the code is working properly.  The main argument in favor of automated tests is that a computer is better than you (and your LCD-strained eyes) at performing repetitive actions.
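As a small illustration of that point, here is the same check written once as a print statement to eyeball and once as an automated test. The sketch uses Python's built-in unittest rather than the rspec and cucumber shown in the talk, and `normalize_title` is a made-up function that exists only for the example.

```python
import unittest


def normalize_title(title):
    """Collapse runs of whitespace and strip a trailing slash or period."""
    return " ".join(title.split()).rstrip(" /.")


# Informal test: print the result and confirm it by eye, every single time.
print(normalize_title("  Moby   Dick /"))


# Automated test: the computer repeats the same check on every run.
class NormalizeTitleTest(unittest.TestCase):
    def test_collapses_whitespace_and_trailing_punctuation(self):
        self.assertEqual(normalize_title("  Moby   Dick /"), "Moby Dick")

    def test_leaves_clean_title_alone(self):
        self.assertEqual(normalize_title("Walden"), "Walden")
```

Saved as a module, the tests can be run with `python -m unittest`. A continuous integration server can then run them on every commit, which is the role Hudson played in the coverage reports described above.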

Video of the Presentations

All of the Code4Lib presentations are being recorded.  They will be posted on the conference website sometime in the days/weeks after the event ends.

Possibly next year we will be able to stream the presentations live.  That would certainly make the irc channel easier for outsiders to follow.

Video: Hello to Dev8D from Code4Lib

Wednesday, February 24th, 2010

We had intended to do live streaming chats on ustream, but as it turns out, that is too much of a strain on the wifi here in Asheville.  As a result, I’ve set up a channel on vimeo titled Code4Lib & Dev8D 2010.  Our first video is up there now.

Hello to Dev8D from Code4Lib from Matt Zumwalt on Vimeo.

Code4Lib 2010 - Day 1

Wednesday, February 24th, 2010

The first day of Code4Lib has drawn to a close.  The irc channel was flooded with raucous banter and twitter was very active as well.  In the evening, one person who had only followed the tweets commented to me that “people at this conference are very polite”.  I jokingly assured her that all of the sideways comments must have been reserved for the irc channel.

The conference is off to a great start.  Meanwhile, many people expressed intrigue at the prospect of Dev8D attendees remotely adding their noise to the mix tomorrow.

Keynote: Cathy Marshall on Personal Digital Information

The conference opened with an insightful and humorous keynote presentation by Cathy Marshall from Microsoft Research. The presentation delved into the technical, psychological, and sociological factors that impact management and retention of personal digital information.

Morning Sessions: Linked Data, Cloud Computing

After an impressive and RDF-laden linked data pep talk by Ross Singer, the morning’s presentations had a strong cloud computing bent, which swiftly gave rise to groans on the irc channel about cloud computing being over-hyped. The complaints were quickly challenged by others who pointed out that Code4Lib’s entire schedule was decided by popular vote.  “Where were your complaints when it was up for vote?”

Personally, in addition to watching the presentations and following the rapid-fire commentary on irc, I spent time exploring the commands that zoia the irc bot can respond to.

Afternoon Sessions: It all comes back to MARC Metadata.

The bread and butter for hackers in libraries inevitably revolves around MARC metadata and Integrated Library Systems (ILS).  MARC records are notoriously troublesome and unreliable.  It’s no surprise, then, that the afternoon’s presentations were almost exclusively about technologies and techniques for controlling, cleaning up, aggregating, de-duplicating, consolidating, and editing MARC metadata.

Lightning Talks & Breakout Sessions

The day closed with 14 Lightning Talks followed by 7 Breakout Sessions.

The Lightning Talks were:

  • UW Forward - Steve Meyer
  • MODS4Ruby & Opinionated XML - Matt Zumwalt
  • The Digital Archaeological Record - Matt Cordial
  • Hydra: Blacklight + ActiveFedora + Rails - Willy Mene
  • Why CouchDB? - Benjamin Young
  • Data integrity (cheap, fast, and easy) - Gwen Exner
  • HathiTrust Large Scale Search update - Tom Burton-West
  • EAD and MARC Sitting in a Tree: D-R-U-P-A-L - anarchivist
  • EZproxy Wondertool - Paul Joseph
  • HathiTrust APIs - Albert Bertram
  • Repository of MARC Abominations - Simon Spero and J-Rock
  • Mystery Meat - Joe Atzberger
  • Fuwatto Search - Masao Takaku

    The Breakout Sessions were:

    • Code4Lib Journal open meeting/discussion
    • Cloud4lib - next steps
    • xC - Extensible Catalog
    • VuFind
    • Solr
    • CouchDB
    • MODS for Ruby & Opinionated XML

    The lightning talks were, as always, diverse and energizing.  The breakouts drew a fairly even spread of interest across all 7 topics, allowing everyone to dive deep into their chosen subject area with 8 to 20 people.

    Tips for Code4Lib Outsiders and Newbies

    Monday, February 22nd, 2010

    Today is the day for Code4Lib pre-conferences.  Things are already roaring ahead.  A couple of items have jumped to the foreground that might not be obvious to new arrivals or to people watching from afar.

    Code4Lib Twitter List

    There is a code4lib twitter list.  This is an aggregation of all of the tweets by people who’ve listed themselves in the 2010 Twitter list page on the Code4Lib wiki.

    The Channel

    At Code4Lib, more than half of the conversation in the room actually happens in the #code4lib irc channel, aka “the channel”.  If you see someone giggle at something on their computer screen, there’s a pretty solid chance that they’re either laughing at something on the channel or they’re about to post a link to it to the channel.  If a presenter strikes a chord, positive or negative, you can see the murmurs come up in the channel.  In short, if you’re not watching the channel, you’re missing most of the story.

    As a rough rule of thumb, irc is where code4lib’s intra-conference chatter occurs, while twitter is where people banter with the rest of the world.

    Zoia & her new Twitter Feed

    Zoia is the code4lib irc bot.  She lives in the #code4lib irc channel listening for any posts that begin with ‘@’.  If she recognizes the command, she will perform the associated function.  If she does not recognize the command, she will lovingly mock you.  Conveniently, this serves as a cute nudge to any twitter users who instinctively use @… to address other users. (The irc convention for addressing other users is to preface messages with the username and a colon, e.g. “mediashelf:”.)
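The behavior described above boils down to a small command dispatcher. The following Python sketch shows the general pattern only; it is not Zoia's actual code, and the command names, the reply text, and `handle_message` are all invented for illustration.

```python
# Hypothetical command table; the real bot's commands are not listed in this post.
COMMANDS = {
    "echo": lambda args: args,
    "shout": lambda args: args.upper() + "!",
}


def handle_message(message):
    """Return the bot's reply, or None if the message is not a command."""
    if not message.startswith("@"):
        return None  # Ordinary chatter: the bot stays quiet.
    # Split "@name rest of line" into the command name and its arguments.
    name, _, args = message[1:].partition(" ")
    handler = COMMANDS.get(name)
    if handler is None:
        # Unrecognized command, e.g. a twitter-style "@username" mention.
        return "I don't know how to '{}', silly.".format(name)
    return handler(args)
```

With this table, `handle_message("@echo hi there")` returns the arguments unchanged, while a twitter-style `"@mediashelf nice talk"` falls through to the teasing reply, which is the nudge toward the "username:" convention.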

    As of 10:20am EST on the 22nd of February,  Zoia has her own twitter account - bot4lib.  People in the irc channel can command Zoia to post things on her twitter feed.  It’s not quite hive mind, but it comes pretty close to hive voice.  Who said hackers are individualists?

    Ideas Flitting Across the Pond: Crosstalk between Code4Lib and Dev8D

    Monday, February 22nd, 2010

    In order to take advantage of the fact that Code4Lib and Dev8D are happening simultaneously this year, we’re going to try to facilitate a bit of Transatlantic crosstalk. After all, how could thousands of miles, five time zones and a couple trillion gallons of water keep developers apart? Nerdy enthusiasm knows no boundaries.

    With these two crowds of innovative and skillful people, there’s no way to fully anticipate how collaboration will occur. As a baseline, we will be arranging the following: daily video updates, daily live chats, regular updates on the MediaShelf blog, and links to other blogs that are reporting on the events.

    Dates & Schedules

    The two events will only overlap for 2 days - the 24th and 25th of February.

    Code4Lib dates: 22nd-25th February
    Dev8D dates: 24th-27th February

    For more info, see the Code4Lib schedule and the Dev8D programme.

    Hashtags & irc

    There are no official twitter hashtags (nor blog/delicious tags) designated for these events. As the excitement ramps up, the tags to watch at first will be

    If you read really fast, you could watch all three together.

    We will try to keep you posted as new memes and hashtags come to the forefront.

    Of course, the real place to witness code4lib chatter is in the #code4lib irc channel.

    UStream Channels

    We’ve created two UStream channels where we will be hosting occasional live feeds and uploading the daily video updates. Check them out and watch for changes: Code4Lib 2010 on UStream & Dev8D 2010 on UStream.
    When we decide on the times for hosting the live chats and posting the daily video updates, we will announce the times here and on the UStream channels.

    If you would like to contribute your own videos to these channels, create a UStream account, add the videos to your account, and tag the videos with either code4lib2010, dev8d2010, or both. Once you’ve done that, let us know your UStream username and we will add you to the list of contributors. Your videos will then show up on the channels according to which tag you used.

    Code4Lib & Dev8D: How they measured up last year.

    Monday, February 8th, 2010

    In early 2009, I chanced upon becoming the only person to attend both Dev8D in London and Code4Lib 2009 in Providence, Rhode Island. While at Code4Lib, I had every intention of posting a comparison of the two conferences, but eventually decided that nobody would care to hear about it.  The blog post sat unfinished for a year until the news came out that both conferences would be happening simultaneously this year.  Suddenly a side-by-side comparison seemed much more meaningful.  I dug into my notebooks and found the table you see here.

    Further down the page, for the uninitiated, I’ve included an off-the-cuff description of each conference.

    Code4Lib & Dev8D 2009 - Quick Comparison

    | | Dev8D | Code4Lib |
    |---|---|---|
    | Primary Area of Focus | Repositories & Academic Computing | Libraries |
    | Dominant Vendors | Blackboard, Microsoft | ExLibris, OCLC, Talis |
    | Relationship with Vendors | undecided | "make it good or we'll throw bacon at you." |
    | Preferred Channel of Ongoing Communication | twitter! everywhere! | irc: the only true channel |
    | Programming Languages | python, ruby, java, php | python, ruby, java, php |
    | Presentation Structure | multiple simultaneous sessions | single track |
    | Presentation Format | very little powerpoint, more conversation | lotsa powerpoint, but fun |
    | Worst Technical Glitch | troublesome wifi | troublesome wifi |
    | After-Hours | organized evening activities | "cliques" |
    | People in Attendance | 90? | 250? |
    | Popular Terms | "stories" | "Z39.50", "METS", "LCSH" |
    | Word Most Likely to Elicit Booing | "xml" | "xml" |
    | Biggest Buzzword | Linked Data | Linked Data |
    | Data Interfaces & Formats Most Often Mentioned | SWORD, ORE, RDF | Jangle, METS |
    | Dominant Software Products | DSpace, Fedora, EPrints | WorldCat, Blacklight, VuFind |
    | How Established Is It? | might have become a one-off | fourth year and going strong |
    | Gender Balance | 3 women? | 15-20% women |
    | Pigeonhole | geeks? | nerds? |
    | Snark Figurehead | repohate | anarchivist |
    | Digs | Palmer's Lodge (free rooms in a crowded hostel) | Renaissance Providence (a restored Masonic temple) |
    | Location | London, UK | Providence, RI, USA |
    | Organizers | David Flanders - a true dynamo from JISC | host committee, most decisions made via community voting |
    | Mood Around Lightning Talks | easy to sign up, low pressure audience | competitive signup, intimidating audience |
    | Hygiene | occasionally stinky | I was the unkempt one. |
    | Scheduling | spare time - by design | packed schedule |
    | My Most Memorable Moment | The Dragons' Den - Pitching software ideas to impressive judges, with real cash money reward at stake. | LinkedData Pre-Conference with Ed Summers, Dan Chudnov and Michael Giarlo |

    Code4Lib: the big annual download for coders in libraries

    Code4Lib has a large, established community based mostly in North America.  As the name implies, the community consists of “computer programmers and library technologists who largely work for and with libraries”[from Code4Lib wiki]. The Code4Lib Journal and the #code4lib irc channel are hubs of intense discourse year round.  The Code4Lib conferences are run as a single track where every presenter delivers his/her work before the entire group of conference attendees. The lineup of presenters, along with the location and dates for the conference are chosen each year by means of an open, public voting process.

    The annual Code4Lib conferences are the primary place for this community of extremely smart, innovative people to exchange notes about the work they’re doing.  It’s very intense, very technical, and extremely informative.  There is very little hand-holding; everyone is expected to keep up with the flow of information or get out of the way.  For the past two years (possibly longer), registration has reached capacity within 48 hours of opening up.

    Taken as a group, the Code4Lib crowd can seem intimidating.  They have a very strong meritocracy, which is one of their biggest strengths.  The only sure-fire way to get attention in this community is to make great software and to share it.  If you do that, you will be acknowledged and socially rewarded.  I heard complaints about “cliques” forming, mainly because everyone was left to fend for themselves in the evenings - people who already knew each other clumped together and swiftly disappeared.  Arriving as an outsider, this was unnerving for me at first but in the end everyone was actually very friendly and eager to trade ideas with anyone who could keep up with the intellectual pace.

    My most memorable moment at Code4Lib was the Linked Data preconference hosted by Ed Summers, Dan Chudnov and Michael Giarlo from the Library of Congress.  As part of the day-long workshop, they set up one of the best hands-on lessons I’ve ever been part of.  The premise was simple: use FOAF profiles as the means for entering into a raffle.  This relatively simple exercise forced everyone in the room to, at the very least, publish a FOAF document (anywhere on the web) with a couple of required assertions included.  Those with more experience helped those with less.  Everyone gained something and one key concept nailed its mark: the linked data movement of 2009 has many, many things in common with the early world wide web, and everyone is invited along for the ride.

    Code4Lib 2010 will be held in Asheville, NC February 22-26.  It has been sold out since December.

    Dev8D: making developer happiness in the UK

    Dev8D is the brainchild of David Flanders at JISC.  The name, Developer Happiness Days, harkens to the meme of developer happiness that became one of the core mantras for “lightweight application framework” evangelists around 2005/2006.  As I understand it, David looked around at all of the developers working on JISC-funded projects across the UK and saw a massive pool of technical talent hampered by the fact it lacked an  ongoing community for exchange of ideas, skills, criticism and praise.  He convinced JISC to fund a small conference specifically for software developers, and specifically aimed at fostering a developer community.  They made him promise to provide proof of “innovation” happening and — tada — Dev8D was born.

    I’m tempted to describe Dev8D as zany.  David threw in as many activities, challenges, rewards, and discussion topics as he could possibly muster.  His goal was simple — get people talking, encourage them to dream up new technical possibilities, reward them for sharing those ideas, reward them even more for executing those ideas, and have fun in the process.

    It would have been hard for an enthusiastic software developer not to have fun at Dev8D.  We were all basically along for the ride.  Every time you turned a corner, someone had cooked up a new nifty little web app or come up with some novel way to use an existing service to augment our Dev8D free-for-all.  David used wordle to generate tag cloud “name tags” from everyone’s blogs.  By lunch on the first day, someone started projecting a twitter aggregation of #dev8d on the wall.  By tea time, we had two or three custom apps floating around that did wacky stuff based on codes in your tweets.  (In February 2010, this all may sound a bit tired, but remember that a mere year ago all things twitter were brand new to nearly everyone.)  The name of the game was “dream it up and do it”, and we certainly did that.

    My most memorable moment at Dev8D was The Dragon’s Den.  This was David’s solution to JISC’s requirement that he provide proof of innovation as outputs from the event.   JISC set up a Developers Challenge, providing real-world users from UK academia and soliciting the Dev8D attendees to submit their best ideas for new software to solve those users’ technological needs.  The best team would win a wad of cash in exchange for documenting their idea.  The way you submitted your idea was by presenting it in the Dragon’s Den, where a panel of judges from JISC and local industry would hear your idea, ask questions, and provide feedback.  Though my submission didn’t win the challenge, I learned a lot from those judges and would participate again in a heartbeat simply for the feedback.

    Dev8D 2010 will be held in London 24-27 February.  They may still be accepting new registrations, though all of the free accommodations have been slurped up.

    Crosstalk in 2010

    Later this month, Code4Lib 2010 and Dev8D 2010 will have three days of overlap.  I will be in Asheville, NC for Code4Lib and Eddie Shin will be in London for Dev8D, so MediaShelf will be plugged in on both sides of the pond.  We have promised to encourage crosstalk between the events, though we haven’t sorted out the details.  When we settle on a plan, we will post an update on this blog.

    RIRI Year Two Redux

    Saturday, July 25th, 2009

    This week the University of Prince Edward Island hosted the second installment of Red Island Repository Institute (RIRI).  Participants came from North America and Europe to get a week-long intensive immersion in all things Fedora.  The institute is organized by Mark Leggott and the team at UPEI who created Islandora, a plugin that integrates Fedora into Drupal.  Thorny Staples (Product Director for Fedora Commons), Chris Wilper (Technical Lead for Fedora Commons), and I were the primary instructors.

    Due to budget cuts across the board, there were fewer people in attendance at RIRI this year.  Over the next few years, broad community collaboration is certain to involve much more remote interaction online and less face-to-face exchange at large conferences.  I have a hunch that this will actually be beneficial for the growth of technical communities as long as it doesn’t last too long or become too restrictive.  After all, there is something potent about the ethos of Less Talk, More Code.  In the meantime, those of us who were able to make it to PEI had a really productive week.

    Though structured as an intensive week-long training exercise, RIRI has taken on an element of information exchange between all of the participants.  The predominant flow of information was, as you would expect, from the three instructors to the 15 attendees, but I think that Thorny and Chris will agree with me that we have all picked up some good information and ideas along the way.  If anything, the institute has started to migrate in the direction of that quasi-mythical ideal conference where you get the conference-type info about who is doing what in your field while also getting your hands dirty with workshops and in-depth detail and discussion of how they are doing it and how you might build on their efforts.

    Learning Curve

    Those of us who were attending RIRI for the second time agreed that this year had a much more gratifying learning curve, with enough continuity to carry through all five days.  This was predominantly thanks to how much Islandora and ActiveFedora have matured in the past twelve months.  We had everyone up and running, writing and executing code against Fedora, in the multiple hands-on sessions through the week.

    In contrast with last year, when we spent quite a bit of time installing and configuring Fedora, this year we were able to use a VirtualBox image that the Islandora team set up.  The image runs Debian with Fedora, Islandora and Drupal pre-installed.  On Tuesday I installed Ruby, ActiveFedora and Solr on it too.  By Wednesday, we were able to pass around a couple of USB sticks and have everyone running identical development environments for the hands-on sessions.

    I’m sure we will soon post the VirtualBox image for anyone to download.  Next year, everyone will probably have it running on their laptops before they even arrive on the Island.

    Islandora

    The Islandora team at UPEI have been very busy setting up Fedora-based Virtual Research Environments for the variety of scholars at their University.  It seems that once they had hammered out the module that integrates Fedora into Drupal, they fearlessly dove into spinning off deployments for numerous disciplines and projects.  I was thoroughly impressed by how many VREs they have set up, the diversity of content they are collecting, and the range of tools that they are setting up to manage, display, and operate on that content.  Four years ago, I looked at Fedora and saw a tidal wave of potential uses behind it.  That wave has finally begun to break and it is as exciting as we had imagined.

    ActiveFedora

    We used ActiveFedora as the basis for the hands-on sections of the institute.  On Wednesday we basically ran through the console tour that you can find on the ActiveFedora project site.  In the developer breakout on Thursday, we defined some ActiveFedora models in a Rails app, created some Fedora objects based on those models, and explored their RDF relationships.

    “That really makes sense.” seemed like the main response to ActiveFedora.  I can’t think of a better confirmation that designing the Domain Specific Language (DSL) for ActiveFedora’s models was worth it.

    As part of the workshops, I also ran through the improvements I have planned for ActiveFedora version 1.1.

    Hydra

    Thorny gave a really great presentation on the background, goals, and status of the Hydra project.    Hydra and Islandora stand as wonderful complements to each other.  Islandora, written in PHP, takes Drupal as the starting point from which it reaches into the wide open space of a Fedora repository.  Hydra takes nearly the opposite approach.  Hydra does not assume an overarching system as its operating context.  Instead it builds outwards from ActiveFedora, which in turn builds upon Fedora’s internal flexibilities and strengths.   Where Islandora functions as a component that you plug into Drupal, Hydra apps are free-standing solutions that in turn rely on a core whose functionality can be integrated into potentially any system.  One is a stalactite, the other is a stalagmite.  Both have grown naturally out of real-world needs.

    FeSL

    Projects seeking to adopt Fedora right now have great options before them.  Different technologies will suit different projects while the overall vision and best practices pollinate across solutions.  The best example of this cross pollination is the FeSL Project, which Islandora and Hydra are both participating in along with MediaShelf and DuraSpace.  This effort will result in a complete replacement of Fedora’s security implementation so that it can be used more effectively and more flexibly by any of the client applications we write — Hydra, ActiveFedora, Islandora, Muradora, or otherwise.  We are still seeking additional community funding for FeSL.  For more info, see the Fedora Enhanced Security Layer page on the Fedora wiki.

    Workflow

    One of the more exciting themes for me this year was workflow for Fedora Repositories.  I’ve been actively interested in the topic since Richard Green and Chris Awre presented RepoMMan at OpenRepositories in 2007.  Despite my interest, I have trodden softly in that realm because the topic, and particularly technologies like BPEL, have always seemed dangerously askew from the actual problem(s) at hand.  I’ve always felt that we were framing the entire problem incorrectly.  In the past nine months, that exact line of thinking has come to a head, leading groups like the Hydra project (including the former RepoMMan team) to explore a variety of approaches.  This resulted in a well-received presentation at OpenRepositories this year in Atlanta, which Thorny presented again for the RIRI participants this week.

    The topic of workflow finally congealed for me in an actionable way here at RIRI when Mark Leggott started talking about supporting Taverna in the Islandora VREs.  Taverna is one of a handful of “workflow engines” designed to allow scientists to chain together batches of computational operations that they want to perform on data in their labs.  From an architectural perspective, Taverna isn’t all that different from BPEL tools.  The difference is that Taverna and its ilk are designed with the assumption that the scientist who owns the data will create the workflows.  Consider this in contrast to the idea of a developer or repository manager creating workflows on behalf of the end users.  This simple assumption, that workflows will be constructed and run by the people who own/create the underlying content, leads the technology down a very different path of development.  Until now, we have mostly thought of workflows as services that a repository provides for end users.  It’s time for us to flip that around and instead think of ways to expose repositories and their supporting services as nodes that end users can tie into their own workflows as they see fit.  This approach has the ring of good engineering.  It’s a simpler, loosely-coupled, user-centric solution to the problem.  It has the additional benefit of putting the repository engineer alongside the content owner/creator while they both create and share workflows to perform their corresponding tasks.
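
    To make the flipped arrangement concrete, here is a minimal sketch in Python.  It is not Taverna’s or Fedora’s API; RepositoryNode, run_workflow, and the demo identifiers are all hypothetical names invented for illustration.  The repository is an in-memory stand-in that exposes fetch and store as ordinary steps, and the content owner decides how to chain them with their own operations.

```python
from typing import Callable, Dict, List


class RepositoryNode:
    """Hypothetical in-memory stand-in for a repository (not Fedora's API)."""

    def __init__(self) -> None:
        self._objects: Dict[str, str] = {}

    def fetch(self, pid: str) -> str:
        # Return the content stored under the given identifier.
        return self._objects[pid]

    def store(self, pid: str, content: str) -> str:
        # Save the content and hand back the identifier for later steps.
        self._objects[pid] = content
        return pid


# A workflow step takes one value and returns the next one.
Step = Callable[[str], str]


def run_workflow(steps: List[Step], value: str) -> str:
    """Feed the value through each step in order."""
    for step in steps:
        value = step(value)
    return value


repo = RepositoryNode()
repo.store("demo:1", "  raw field notes  ")

# The content owner assembles the chain; the repository is just two of
# the nodes in it, sitting alongside the owner's own operations.
workflow: List[Step] = [
    repo.fetch,                               # repository as a source node
    str.strip,                                # owner's own cleanup step
    str.upper,                                # owner's own transformation
    lambda text: repo.store("demo:2", text),  # repository as a sink node
]

result_pid = run_workflow(workflow, "demo:1")
```

    The point of the design is that the repository never owns the workflow.  Swapping, reordering, or sharing steps is entirely up to the person who assembled the list.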

    Solutions Integration Community

    RIRI has also brought focus back onto the utility of the Solutions Integration Community that we have been gradually building this past year.  I’ve now set up a fedora-solutions Google group.  Join up if you want to be a part of the stuff described on the solutions integration community page on the Fedora wiki.

    A Great Year to Come

    Over the next 12 months we are going to see a brilliant array of advances around Fedora Repositories.  MediaShelf will certainly touch many parts of those advances.  We’re here to help you make your mark in this space, and we’re here to make Fedora work for your users.  Contact us if you want to put MediaShelf to work on your repository efforts.