Media, Blacklight, and Viewers Like You
Chris Beer from WGBH (Boston’s Public Television; producers of Nova, Julia Child, etc.) walked us through the inner workings of the new WGBH website that runs on Blacklight, Fedora, and lighttpd. Without getting bogged down in the details, he touched on many of the technical and organizational challenges they faced and how they addressed them. Some of the technologies in the mix:
- PBCore: Metadata Standard for Media based on Dublin Core
- ODRL: Open Digital Rights Language
Chris did some amazing things with jQuery and FlowPlayer in his customized Blacklight views. One example: you can jump to a point in a video by clicking in the transcript. If you then play the video, the transcript scrolls along with it automatically and accurately. This works because WGBH already has detailed timecodes embedded in their transcripts.
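To make the transcript syncing concrete, here is a minimal sketch of the lookup behind it, not WGBH's actual code. It assumes the transcript can be reduced to a sorted list of (start time in seconds, text) pairs; the sample data and the function names `segment_at` and `seek_time_for` are made up for illustration. In the browser the same idea would sit in a player time-update callback.

```python
from bisect import bisect_right

# Hypothetical transcript segments: (start time in seconds, text),
# sorted by start time. Real data would come from the embedded timecodes.
TRANSCRIPT = [
    (0.0, "Welcome to the program."),
    (4.5, "Today we look at the archive."),
    (9.0, "Here is the first clip."),
]

# Start times pulled out once so each lookup is a binary search.
STARTS = [start for start, _ in TRANSCRIPT]


def segment_at(seconds):
    """Return the index of the transcript segment active at `seconds`."""
    # bisect_right counts the segments that start at or before `seconds`;
    # the last of those is the one currently being spoken.
    index = bisect_right(STARTS, seconds) - 1
    # Positions before the first timecode fall back to the first segment.
    return max(index, 0)


def seek_time_for(index):
    """Return the playback position to seek to when segment `index` is clicked."""
    return TRANSCRIPT[index][0]
```

Playback drives the scrolling through `segment_at`, and a click on a transcript line drives the player through `seek_time_for`, so both directions run off the same timecode list.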
The data model for a single episode in the WGBH archive fills an entire page — raw video, edited video, music, images, post-processing, transcripts, etc. with rights and permissions varying for each part. This highlights the grace of Blacklight, which allowed him to simply leave that system in place and merely expose the public parts with a slick, faceted search & discovery interface.
One detail that Chris called out was that Fedora has trouble with large (5GB+) datastreams. He pulled up the corresponding (unresolved) tickets in Fedora’s Jira to show that this is a long-known problem. He then added that WGBH’s own internal (proprietary) DAM system has the same problem. At least with open source software, addressing a bug like this is a matter of public process and community initiative rather than being subject to the whims of a single organization.
WGBH will soon be launching a number of public websites based on this Fedora+Blacklight combination.
Becoming Truly Innovative: Migrating from Millennium to Koha
Ian Walls from NYU gave a humorous account of finally ditching proprietary Integrated Library Systems, which he represented as battling kittens, in favor of the open source ILS called Koha. He went through the details of how they handled the challenge of migrating their data from iii (the proprietary ILS) to Koha and closed by telling the room that the process he used should work for anyone with a similar proprietary install. The NYU migration took a total of 3-4 months without any all-nighters.
All of the code he used is available at contribs.koha.org. I don’t personally work with integrated library systems, so I can’t say I have a solid read on the impact of this information, but I got the distinct impression that this was very encouraging news for many of the people in the room. When someone from the audience asked for a show of hands (“How many people would like to migrate away from iii?”), more than a third of the room raised their hands.
Ask Anything! (a.k.a. Human Search Engine)
I know the Dev8D people will dig this one.
Dan Chudnov from Library of Congress facilitated an open-ended session where anyone with a question/request/missed connection could grab a mic and announce their desire. In short: “If there’s anything that you want to ask everybody, now is your chance to speak up.” The core idea was to give people an opportunity to tap the collective knowledge of the room.
The process was to hear a question, identify people who want to respond, and then move on to the next question. The full-room discussion was kept to a minimum. Here are most of the questions asked (I missed a few).
- Switching from Perl: What language should I switch to?
- How do I make an API accessible only from specific domains?
- Who is using my library (pymarc) and how are you using it? Please tell me so I can make it better.
- How do I extract files from Internet Archive .arc format?
- Is anyone interested in helping my organization port a custom staff app from Millennium ILS to Aleph ILS?
- Is there any interest in a 1-2 day Blacklight meeting?
- Has anyone used the California Digital Library digital curation microservices?
- Please tell us: What do Librarians (non-techies) need to know about software development?
- What’s the best way to do complex log analysis? (answer: splunk)
- I want to allow people to add data to my site without maintaining user info for them. I want to use something like OAuth, but it has to work with something like curl as the client. How? (recommendation: RPX)
- How do I model dates & date ranges in RDF?
- Proposal for a breakout session on building homebrew digital repositories (instead of Fedora, DSpace, etc)
- Request for 10 minute tutorial on 3D graphics programming
- Anyone interested in modeling archival description in RDF? If so, join http://groups.google.com/group/semantic-archives
- Is anyone using OpenCalais module for Drupal?
- Is anyone here going to be working on the OLE project?
- I’ve created software for hiding borrower information in library systems, but how can I share the code without making it vulnerable to “the man”?
- Is anyone doing work with HTML5? Especially geodata services, etc?
- Are there any libraries out there using Plone for their website?
- Who else is developing Facebook apps for their libraries? Any ideas how to make a useful Facebook app for a library?
- Is anyone using MarkLogic or eXist in production systems?
- Since this seems to be really working, should we start something like a Stack Overflow for library hackers? (answer: Yes. Awesome idea.)
- We’re working on search & discovery with non-roman scripts in a Solr/Lucene context. Please provide suggestions.
- Is anyone working on natural language processing implementations for machine learning?
- We’re moving all of our production systems to cloud environments. Has anyone else done that?
- Is there a JSON library for MARC? (answer: it’s built into Evergreen; OCLC is also looking into it.)
The verdict on Ask Anything: resounding applause, many smiles.
Alex O’Neil from University of Prince Edward Island emcee’d a very fun round of trivia before we adjourned for lunch. It was wicked fun.