Archive for the ‘fedora’ Category

Blacklight, ActiveFedora and Shelver: Interplay between Searching, Managing and Indexing in a Repository Solution

Monday, March 1st, 2010

I submitted an abbreviated version of this proposal (limited to 4 pages) to the OR10 review committee.  Feel free to download the abbreviated version or this long version in PDF format.

OpenRepositories 2010 Presentation Proposal (Long Version)

Any repository solution provides facilities for Creation, Management, & Editing of Content as well as facilities for Searching & Browsing through that content. Experience has shown that when a solution binds these two areas of functionality together too tightly, the system becomes brittle and unworkable, discouraging innovation. Our work on the Hydra project has produced a flexible and intuitive solution that combines these two areas in an almost entirely decoupled fashion. This solution, which is already working in multiple Hydra applications, is built on a three-part pattern where Blacklight handles Search & Discovery, ActiveFedora handles Creation, Management and Editing of Content, and a small application called Shelver supplies the crossover point by indexing the content into Solr so that it will show up in Blacklight. This three-part approach reflects a strong pattern for designing and/or improving repository solutions. The main pivot of this approach is to treat indexing as its own separate part of the application and to allow the indexing process to evolve constantly as part of the application development cycle.

This work is the product of combining established best practices, best of breed software, and lessons learned from an iterative approach to application development. While our implementation is focused on Fedora Repositories, the software could be used in multiple contexts and the pattern is certainly applicable to any content-oriented application.

The anatomy of a Hydra Application

Note: This is a working model of the functional structure of a Hydra application. The complete designs for the final features and functionality of Hydra applications reach far beyond what is presented here. For more information on the greater vision around the Hydra project, please refer to the Hydra Project pages on the Fedora Commons wiki.

  • The portion of a Hydra application that handles Creation, Management, & Editing of content is provided by the Hydra Core, which consists of ActiveFedora along with a few Hydra “helpers” which integrate ActiveFedora into Ruby on Rails.
  • The Search & Discovery portion of a Hydra application is a Blacklight installation - nothing more, nothing less. As with any Blacklight installation, its behavior and appearance are likely to be customized but otherwise there is nothing Hydra-specific about it.
  • Shelver (which can be run either from within the application, from the command line, or as a JMS listener) indexes content and its salient metadata in Solr, usually pulling that content from Fedora.

These three components (Blacklight, Hydra Core, and Shelver) work in concert to present a consolidated repository solution to the end user. Meanwhile, the three components are sufficiently decoupled that each could be run as a freestanding application. They interoperate based on a minimal contract that revolves around decisions about what information should be in Solr and how it should be represented in the Solr index in order to achieve the ideal search experience.
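To make the idea of a minimal contract concrete, here is a rough sketch in plain Ruby. It does not use Blacklight, ActiveFedora, or Shelver; every name in it (RepoObjectSketch, index_object, search_by_subject, and the title_t and subject_facet field names) is an assumption made for illustration. The point is only that the managing side and the searching side never call each other; they agree on the shape of the documents in the index.

```ruby
# Hypothetical sketch: none of these names come from Blacklight,
# ActiveFedora, or Shelver. A plain Hash stands in for the Solr index.
SOLR_INDEX_SKETCH = {}

# Management side: an object shaped however the data model requires.
RepoObjectSketch = Struct.new(:pid, :title, :subjects)

# Indexing side (the role Shelver plays): decides what goes into the
# index and under which field names. The field names are assumed.
def index_object(obj, index)
  index[obj.pid] = {
    "id"            => obj.pid,
    "title_t"       => obj.title,
    "subject_facet" => obj.subjects
  }
end

# Search side (the role Blacklight plays): only ever reads the index.
def search_by_subject(index, subject)
  index.values.select { |doc| doc["subject_facet"].include?(subject) }
end

obj = RepoObjectSketch.new("demo:1", "Interview with a lighthouse keeper",
                           ["Lighthouses", "Oral history"])
index_object(obj, SOLR_INDEX_SKETCH)
puts search_by_subject(SOLR_INDEX_SKETCH, "Lighthouses").map { |doc| doc["title_t"] }.inspect
```

In this arrangement, changing how subjects are stored on the object only requires touching index_object; the search method keeps working as long as the agreed field names stay the same.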

In the process of customizing or extending a Hydra application, some changes require modifications to all three components, but most changes impact only one or two of the components at a time. This makes it very easy to iteratively improve the application and adapt to real world needs.

This structure grew naturally out of a process of exploration. In early 2009 developers at UVA and Stanford discovered that it was relatively easy to put Blacklight on top of a Solr index that had in turn been populated by ActiveFedora, effectively turning Blacklight into a search & discovery interface for that Fedora repository. Based on that, we tried dropping ActiveFedora-driven views & controllers for editing Fedora content into the same Ruby on Rails application as an existing copy of Blacklight. It worked like a charm. The two systems happily coexist. What we found was that as long as we could change and refine how the metadata percolates from Fedora into Solr, getting Blacklight to operate together with the ActiveFedora management component was completely straightforward.

With most Hydra applications, all content is stored in a Fedora Repository. However, there is nothing to prevent you from adding non-Fedora content to Solr and having it show up in the (Blacklight) search & discovery views. Of course, that content will not be editable unless you implement the code to integrate with that content’s host system.

Best of Breed: Blacklight & ActiveFedora

Blacklight is a next generation Search & Discovery tool. It was intentionally designed to serve a single purpose - Search & Discovery - without having any knowledge of indexing, cataloging, or even the location of the content it’s searching through. Whatever information you have in your Solr index, Blacklight will help you expose a rich, faceted search interface for exploring through that information and displaying detail views of individual records. This open-ended design made it very easy for us to integrate Blacklight directly into our Hydra applications as-is. The ease with which we achieved this seamless integration is a testament to the quality of Blacklight’s design.

ActiveFedora is a Ruby library that encapsulates the details of interacting with a Fedora repository and provides high-level tools for defining data models, creating Fedora objects, and modifying the data associated with those objects. While opening the door to rapid, iterative application development, ActiveFedora also attempts to expose and accentuate many of the strong design patterns inherent in Fedora. ActiveFedora’s emphasis on flexibility and design patterns provided us with many opportunities to make our Hydra applications robust and re-usable. In particular, ActiveFedora makes it possible for multiple Hydra (and non-Hydra) applications to operate on top of a single Fedora repository, thus achieving the goal of providing many lightweight views onto complex, heterogeneous repository content.

Hydra Core: the building blocks of an interface for creating & editing Fedora content

Hydra Core provides the Ruby on Rails code that handles Creation, Management, & Editing of Fedora content. This primarily consists of Rails helpers for generating edit interfaces and Rails controllers to handle the submissions from those interfaces.

Fedora allows a great amount of freedom with respect to data models and metadata. As a result, we could not simply create a single generic content management interface in Hydra. Instead, we created a number of “helpers” that allow you to deal with your Fedora content and its metadata at a high level of abstraction. For example, the editable_metadata_field helper generates the HTML for displaying an editable version of whichever metadata field you specify. All you have to know is what field you want to display and where it is stored within the object. Everything else is handled for you.
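As a rough illustration of what such a helper does, here is a sketch in plain Ruby. It is not the Hydra helper: the real editable_metadata_field signature and the markup it produces are not shown in this post, so the method name editable_field_sketch, its arguments, and the generated HTML are all assumptions. It only shows the idea that the caller names the field and where it lives, and the helper does the rest.

```ruby
require "erb"

# Hypothetical stand-in for a helper like editable_metadata_field.
# values_by_datastream is an assumed Hash of the form
#   { "datastream name" => { "field name" => value } }
def editable_field_sketch(values_by_datastream, datastream_name, field_name)
  value = values_by_datastream.fetch(datastream_name, {}).fetch(field_name, "")
  input_name = "#{datastream_name}[#{field_name}]"
  # Escape both attribute values so stored metadata cannot break the markup.
  %(<input type="text" name="#{ERB::Util.html_escape(input_name)}" ) +
    %(value="#{ERB::Util.html_escape(value.to_s)}" />)
end

document = { "dublin_core" => { "title" => "Tom & Jerry: an oral history" } }
puts editable_field_sketch(document, "dublin_core", "title")
```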

The forms generated by the Hydra helpers need somewhere to submit their data to. This is handled by the Rails controllers provided by Hydra Core.

Underneath the helpers and controllers, Hydra Core relies on ActiveFedora to handle connecting with Fedora, modeling Fedora objects & metadata, and performing the basic operations of creation, retrieval, updating and deletion.

Shelver: a script that brought unexpected freedom

When we wrote Shelver, we didn’t anticipate how integral it would become to the application development process. Shelver started out as something extremely simple. A developer at Stanford initially wrote it in order to populate a Solr index with some working data from a Fedora Repository. Over time, as needs arose, we built out the script to be more robust. It soon became apparent how crucial it is to be able to modify and/or augment the behavior of your indexing tool. In most other systems, the indexing tool is either implicit (relational databases) or external to your application and difficult to re-configure (e.g. Fedora GSearch). As a result, when working with other systems, discussion of (and changes to) the indexing strategy is kept to a minimum. In contrast, since we had Shelver at our disposal, we found ourselves constantly tweaking it to support new functionality. This ability to tweak our indexing routines gives us radical freedom to explore new features, improve the search experience, and increase the quality of search results.

Eventually, we pulled Shelver into the Hydra app itself so that we could trigger it as part of the save/update process, though we retained the hooks for running it as a command line tool as well. We also did this because we found that changes we made to Shelver often corresponded to changes in the search interface. Shelver was continually evolving in conjunction with the application, so it made sense to track the code with a single versioning system.
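The two entry points described here, indexing on save and indexing in bulk from a script, can share the same code. The sketch below is plain Ruby with invented names (IndexerSketch, RecordSketch, reindex_all); it is not Shelver's API, and a Hash stands in for Solr.

```ruby
# Hypothetical sketch of one indexer reachable from two entry points.
class IndexerSketch
  attr_reader :docs

  def initialize
    @docs = {} # stand-in for the Solr index
  end

  def index(record)
    @docs[record.id] = { "id" => record.id, "title_t" => record.title }
  end
end

class RecordSketch
  attr_reader :id
  attr_accessor :title

  def initialize(id, title, indexer)
    @id = id
    @title = title
    @indexer = indexer
  end

  # Entry point 1: indexing happens as part of save/update.
  def save
    # ... writing to the repository would happen here ...
    @indexer.index(self)
    true
  end
end

# Entry point 2: the same indexer driven in bulk, e.g. from a
# command line script that walks every object in the repository.
def reindex_all(records, indexer)
  records.each { |record| indexer.index(record) }
  records.size
end

indexer = IndexerSketch.new
record = RecordSketch.new("demo:1", "First draft", indexer)
record.save
puts indexer.docs["demo:1"]["title_t"]
```

Because both paths call the same index method, a change to the indexing rules takes effect for interactive edits and for bulk re-indexing alike.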

Approaches to Indexing: from RDBMS to Fedora + Shelver

RDBMS (data model & search index combined)

If you rely solely on a Relational Database to drive your application, your data model (the database schema) is also your indexing model — any search oriented changes necessitate changes to your data model. This makes it difficult to refine and extend the search & discovery portion of the application without impacting other areas of functionality.

RDBMS + Solr (separate search index from data without much thought to the conceptual differences)

A number of tools exist for pulling content from a relational database into Solr. This achieves the goal of separating the search index from the data itself, allowing you to have an indexing model separate from your data model. However, often with these systems the indexing methodology remains tightly bound to the data model. This is more of a conceptual stumbling block than a technical one. It’s easy to underestimate the complexity and distinctiveness of indexing for search & discovery. It is not enough to index your data in Solr; you must think differently about how and why you put it there. This “thinking” must be manifest somewhere in the application’s code, ideally separated from the rest of the application.
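One way to picture an indexing model that is deliberately different from the data model is a function that fans a single stored record out into several search-oriented fields. The sketch below is plain Ruby; the record layout and the field names (title_display, title_sort, year_facet, decade_facet) are assumptions chosen for illustration, not a recommended schema.

```ruby
require "date"

# Hypothetical sketch: one stored row becomes several index fields,
# each shaped for a specific search or display purpose.
def index_fields_for(record)
  date = Date.iso8601(record[:date_recorded])
  {
    "id"            => record[:id],
    "title_display" => record[:title],
    # Sorting ignores case and a leading English article.
    "title_sort"    => record[:title].downcase.sub(/\A(the|an|a)\s+/, ""),
    # Facets that do not exist as columns in the data model at all.
    "year_facet"    => date.year.to_s,
    "decade_facet"  => "#{date.year / 10 * 10}s"
  }
end

row = { id: "demo:1", title: "The Red Island Tapes", date_recorded: "1974-06-15" }
puts index_fields_for(row).inspect
```

None of the derived fields requires a change to the stored record, which is the separation the paragraph above argues for.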

Fedora + GSearch + Solr (freestanding tool specifically handles indexing)

Fedora is explicitly designed with the idea that you should separate your data model from your indexing solution(s). This allows us to use any variety of content models and metadata schemas to represent our content in Fedora while pulling that information into any number of indexes to suit specific searching needs. The most common indexing approach with Fedora repositories is to use Fedora GSearch to pull Fedora content into a Lucene, Solr or Zebra index. This approach has the benefits of completely separating the data from the index while also providing a freestanding, configurable tool to handle the process of indexing.

GSearch was designed with the goal of enabling 1) full-text searching of Fedora content and 2) indexing of arbitrary XML metadata from Fedora objects. It runs as a web application alongside Fedora, listening for JMS messages or REST API commands telling it to (re)index Fedora objects. The process by which GSearch indexes the content is implemented as a mix of XSLT and Java code.

GSearch establishes the strong best practice of decoupling both the search index and the indexing process from the data itself. This pattern was part of Fedora’s design all along, but thus far GSearch has been the clearest manifestation of it.

Because it was designed specifically to enable full-text indexing using XSL Transformations (XSLT), GSearch operates on the premise that you are transforming the content in order to put it in the index. In a basic system, transformations are sufficient. However, most repository solutions eventually need to actively process the data when indexing it, performing complex actions in order to decide how to populate the search index. Because XSLT does not lend itself to performing such complex processes, you must modify Java code if you want to implement this type of processing in GSearch. Modifying that code has proven daunting for most. Very few projects have taken on the challenge of modifying the GSearch code itself. Those that have modified the code have only done so in minimal and relatively stable ways.
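As an example of the kind of active processing meant here, consider an indexer that looks at more than one object and makes a decision before writing anything. The sketch below is plain Ruby over in-memory Hashes; the object layout, the sensitive flag, and the field names are all invented for illustration.

```ruby
# Hypothetical data; in a real system these values would be read
# from the repository rather than from a constant.
OBJECTS_SKETCH = {
  "demo:history" => { title: "Harbour memories", narrator: "A. Narrator", parent: nil },
  "demo:audio1"  => { title: "Tape 1, side A", parent: "demo:history", sensitive: false },
  "demo:audio2"  => { title: "Tape 1, side B", parent: "demo:history", sensitive: true }
}

# Builds the index document for one object, or returns nil when the
# object should stay out of the index.
def build_index_doc(pid, objects)
  obj = objects.fetch(pid)
  return nil if obj[:sensitive] # a decision, not a transformation

  doc = { "id" => pid, "title_t" => obj[:title] }

  # Pull searchable context from a related object.
  parent = obj[:parent] && objects[obj[:parent]]
  if parent
    doc["parent_title_t"] = parent[:title]
    doc["narrator_facet"] = parent[:narrator]
  end
  doc
end

puts build_index_doc("demo:audio1", OBJECTS_SKETCH).inspect
```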

Fedora + Shelver + Solr (allowing the indexing methodology to constantly evolve)

If you want to provide a great search & discovery experience in your application, you must make it easy to iteratively “massage” the indexes. Anyone who manages a Blacklight or VuFind installation on top of their ILS (or anyone who participates in Code4Lib) can attest to the fact that in order to achieve a truly successful search & discovery experience you must continually refine the way you index your metadata. Little changes in your indexing methodology can bear tangible results for end-users.

In building SALT, the first Hydra application to combine Blacklight with ActiveFedora, we created Shelver as an alternative to GSearch because we wanted to be able to specify our indexing process in Ruby code and, where possible, we wanted to use simple mapping files rather than being forced to use XSLT and Java to perform those actions. We assumed that Shelver would be a relatively simple application whose code rarely changed. After all, when using GSearch we rarely changed the XSLT and basically never changed the Java code. We expected that the same would be true with Shelver. We were wrong. Shelver is constantly changing because we are constantly coming up with new things that we want to do to improve the search & discovery utilities in our Hydra applications. Over time, the code of Shelver itself has stabilized, but the instructions for how to index specific data from Fedora continually morph as a regular part of the application development process. In fact, touching the Shelver code has become such an integral part of our work that we can’t imagine building a repository solution without this kind of freedom.
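The idea of a simple mapping file can be sketched with nothing but Ruby's standard YAML library. The mapping format below is an assumption made for illustration and is not Shelver's actual file format; it maps a source field name to the name of the index field that should receive its value.

```ruby
require "yaml"

# Hypothetical mapping file contents: source field => index field.
MAPPING_YAML_SKETCH = <<~YAML
  title: title_t
  narrator: narrator_facet
  notes: notes_t
YAML

# Copies each mapped source field into the index document under its
# index field name, skipping fields the object does not have.
def apply_mapping(properties, mapping)
  mapping.each_with_object({}) do |(source_field, index_field), doc|
    value = properties[source_field]
    doc[index_field] = value unless value.nil?
  end
end

mapping = YAML.safe_load(MAPPING_YAML_SKETCH)
puts apply_mapping({ "title" => "Harbour memories", "narrator" => "A. Narrator" }, mapping).inspect
```

With this split, adding or renaming an indexed field is an edit to the mapping, while anything that needs real logic stays in Ruby code.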

Conclusion, Observations and Best Practices

To review, some of the recommendations coming out of this work are to

  • use indexing as the crossover point between decoupled solutions for searching through and managing your content
  • make the indexer an explicit, evolving part of your application
  • use flexible components that were designed with iterative development in mind
  • re-use established best practices where possible
  • combine best of breed solutions for astounding results

We were pleasantly surprised to discover how easy it was to combine Blacklight and ActiveFedora into a single Fedora solution. The three-part pattern that emerged out of this effort, which now constitutes a basic Hydra application, builds on well established practices and serendipitously combines them in a stable, intuitive way. This in turn provides a strong base for us all to carry out a great amount of innovative work in the coming years.

RIRI Year Two Redux

Saturday, July 25th, 2009

This week the University of Prince Edward Island hosted the second installment of Red Island Repository Institute (RIRI).  Participants came from North America and Europe to get a week-long intensive immersion in all things Fedora.  The institute is organized by Mark Leggott and the team at UPEI who created Islandora, a plugin that integrates Fedora into Drupal.  Thorny Staples (Product Director for Fedora Commons), Chris Wilper (Technical Lead for Fedora Commons), and I were the primary instructors.

Due to budget cuts across the board, there were fewer people in attendance at RIRI this year.  Over the next few years, broad community collaboration is certain to involve much more remote interaction online and less face-to-face exchange at large conferences.  I have a hunch that this will actually be beneficial for the growth of technical communities as long as it doesn’t last too long or become too restrictive.  After all, there is something potent about the ethos of Less Talk, More Code.  In the meantime, those of us who were able to make it to PEI had a really productive week.

Though structured as an intensive week-long training exercise, RIRI has taken on an element of information exchange between all of the participants.  The predominant flow of information was, as you would expect, from the three instructors to the 15 attendees, but I think that Thorny and Chris will agree with me that we have all picked up some good information and ideas along the way.  If anything, the institute has started to migrate in the direction of that quasi-mythical ideal conference where you get the conference-type info about who is doing what in your field while also getting your hands dirty with workshops and in-depth detail and discussion of how they are doing it and how you might build on their efforts.

Learning Curve

Those of us who were attending RIRI for the second time agreed that this year had a much more gratifying learning curve, with enough continuity to carry through all five days.  This was predominantly thanks to how much Islandora and ActiveFedora have matured in the past twelve months.  We had everyone up and running, writing and running code against Fedora, in the multiple hands-on sessions through the week.

In contrast with last year, when we spent quite a bit of time on installing and configuring Fedora, this year we were able to use a VirtualBox image that the Islandora team set up.  The image runs Debian with Fedora, Islandora and Drupal pre-installed.  On Tuesday I installed Ruby, ActiveFedora and Solr on there too.  By Wednesday, we were able to pass around a couple of USB sticks and have everyone running identical development environments for the hands-on sessions.

I’m sure we will soon post the VirtualBox image for anyone to download.  Next year, everyone will probably have it running on their laptops before they even arrive on the Island.

Islandora

The Islandora team at UPEI have been very busy setting up Fedora-based Virtual Research Environments for the variety of scholars at their University.  It seems that once they had hammered out the module that integrates Fedora into Drupal, they fearlessly dove into spinning off deployments for numerous disciplines and projects.  I was thoroughly impressed by how many VREs they have set up, the diversity of content they are collecting, and the range of tools that they are setting up to manage, display, and operate on that content.  Four years ago, I looked at Fedora and saw a tidal wave of potential uses behind it.  That wave has finally begun to break and it is as exciting as we had imagined.

ActiveFedora

We used ActiveFedora as the basis for the hands-on sections of the institute.  On Wednesday we basically ran through the console tour that you can find on the ActiveFedora project site.  In the developer breakout on Thursday, we defined some ActiveFedora models in a rails app, created some Fedora objects based on those models, and explored their RDF relationships.

“That really makes sense.” seemed like the main response to ActiveFedora.  I can’t think of a better confirmation that designing the Domain Specific Language (DSL) for ActiveFedora’s models was worth it.

As part of the workshops, I also ran through the improvements I have planned for ActiveFedora version 1.1.

Hydra

Thorny gave a really great presentation on the background, goals, and status of the Hydra project.    Hydra and Islandora stand as wonderful complements to each other.  Islandora, written in PHP, takes Drupal as the starting point from which it reaches into the wide open space of a Fedora repository.  Hydra takes nearly the opposite approach.  Hydra does not assume an overarching system as its operating context.  Instead it builds outwards from ActiveFedora, which in turn builds upon Fedora’s internal flexibilities and strengths.   Where Islandora functions as a component that you plug into Drupal, Hydra apps are free-standing solutions that in turn rely on a core whose functionality can be integrated into potentially any system.  One is a stalactite, the other is a stalagmite.  Both have grown naturally out of real-world needs.

FeSL

Projects seeking to adopt Fedora right now have great options before them.  Different technologies will suit different projects while the overall vision and best practices pollinate across solutions.  The best example of this cross pollination is the FeSL Project, which Islandora and Hydra are both participating in along with MediaShelf and DuraSpace.  This effort will result in a complete replacement of Fedora’s security implementation so that it can be used more effectively and more flexibly by any of the client applications we write — Hydra, ActiveFedora, Islandora, Muradora, or otherwise.  We are still seeking additional community funding for FeSL.  For more info, see the Fedora Enhanced Security Layer page on the Fedora wiki.

Workflow

One of the more exciting themes for me this year was workflow for Fedora Repositories.  I’ve been actively interested in the topic since Richard Green and Chris Awre presented RepoMMan at OpenRepositories in 2007.  Despite my interest, I have trodden softly in that realm because the topic, and particularly technologies like BPEL, have always seemed dangerously askew from the actual problem(s) at hand.  I’ve always felt that we were framing the entire problem incorrectly.  In the past nine months, that exact line of thinking has come to a head, leading groups like the Hydra project (including the former RepoMMan team) to explore a variety of approaches.  This resulted in a well-received presentation at OpenRepositories this year in Atlanta, which Thorny presented again for the RIRI participants this week.

The topic of workflow finally congealed for me in an actionable way here at RIRI when Mark Leggott started talking about supporting Taverna in the Islandora VREs.  Taverna is one of a handful of “workflow engines” designed to allow scientists to chain together batches of computational operations that they want to perform on data in their labs.  From an architectural perspective, Taverna isn’t all that different from BPEL tools.  The difference is that Taverna and its ilk are designed with the assumption that the scientist who owns the data will create the workflows.  Consider this in contrast to the idea of a developer or repository manager creating workflows on behalf of the end users.  This simple assumption, that workflows will be constructed and run by the people who own/create the underlying content, leads the technology down a very different path of development.  Until now, we have mostly thought of workflows as services that a repository provides for end users.  It’s time for us to flip that around and instead think of ways to expose repositories and their supporting services as nodes that end users can tie into their own workflows as they see fit.  This approach has the ring of good engineering.  It’s a simpler, loosely-coupled, user-centric solution to the problem.  It has the additional benefit of putting the repository engineer alongside the content owner/creator while they both create and share workflows to perform their corresponding tasks.

Solutions Integration Community

RIRI has also brought focus back onto the utility of the Solutions Integration Community that we have been gradually building this past year.  I’ve now set up a fedora-solutions Google group.  Join up if you want to be a part of the stuff described on the solutions integration community page on the Fedora wiki.

A Great Year to Come

Over the next 12 months we are going to see a brilliant array of advances around Fedora Repositories.  MediaShelf will certainly touch many parts of those advances.  We’re here to help you make your mark in this space, and we’re here to make Fedora work for your users.  Contact us if you want to put MediaShelf to work on your repository efforts.

Agile Languages & Fedora — Update from OR09

Wednesday, May 20th, 2009

Leading up to this year’s Open Repositories, it became clear that there was demand for a BOF (Birds of a Feather) session focused on agile languages and Fedora.  I pitched the idea in an email to a couple of colleagues beforehand and then announced the BOF at my presentation on Monday morning.  Rather than constricting it to Fedora projects, I billed it as Agile Languages and Repositories.  About 30 people showed up.  The split was pretty even between Ruby, Python, and PHP developers.  About a third seemed to be Java developers in the process of defecting.  In addition to people doing stuff with Fedora, there were a handful of DSpace developers and possibly a couple who maintain ePrints repositories.

For the first half of the BOF we sat in mixed groups, eating our lunches and each talking about the work we do.  We then split up by language (Ruby, Python, PHP) and discussed language-specific topics.  For that second half I sat at the Ruby table where we talked about ActiveFedora, JRuby, RDF support for Ruby, MODS support for Ruby, Solr (solr-ruby and RSolr), and how Blacklight fits into the mix. 

I closed the conversation by asking if we should set up email lists for collaboration.  It seemed reasonable to set up a general mailing list for the solutions community as well as a list specifically for people doing stuff with Ruby, Fedora repositories, and (most likely) ActiveFedora.  I also resolved to encourage the creation of Python-oriented and PHP-oriented equivalents.  For now I have created two lists on Google Groups.  The first one, Fedora Commons Create, is for general discourse about creating client applications for Fedora.  The second, ActiveFedora / Ruby + Fedora Commons, is for Ruby-specific collaboration.

In the end, I was really pleased to realize that for the first time we had a substantial group of people interested in each of the main interpreted languages (Ruby, Python, PHP) and each group had at least one open source Fedora-based project to use as a starting point for their conversations.  The Ruby group had ActiveFedora, the Python group had Ben O’Steen’s work and Peter Herndon’s Django integration, and the PHP/Drupal people had Islandora & Fez to start from. 

This was a comfortable step forward from the scenario as it was a year ago.

Preview of ActiveFedora DSL

Monday, October 6th, 2008

We have been working hard on creating a Domain Specific Language for declaring object models in ActiveFedora.  We settled on a syntax based on DataMapper.

Here are sample Model declarations for Audio Records and Oral Histories that we are using in a current project.  Keep in mind that this is just a teaser.  The syntax is likely to change over the next few months.

require 'active-fedora'

class AudioRecord
    include ActiveFedora::Model

    relationship "parents", :is_part_of, [nil, :oral_history]
    # Also considering...
    # has n, :parents, {:predicate => :is_part_of, :likely_types => [nil, :oral_history]}
    # OR
    # is_part_of [:oral_history]

    property "date_recorded", :date
    property "file_name", :string
    property "duration", :string
    property "notes", :text

    datastream "compressed", ["audio/mpeg"], :multiple => true
    datastream "uncompressed", ["audio/wav", "audio/aiff"], :multiple => true
end
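For readers wondering how a class-level declaration such as property "file_name", :string can work at all, here is a minimal sketch of the general Ruby technique. It is not ActiveFedora's implementation; ModelSketch and AudioRecordSketch are invented names, and the sketch only records each declaration and defines plain accessors.

```ruby
# Hypothetical sketch of a declarative model DSL in plain Ruby.
module ModelSketch
  # When the module is included, add the DSL methods to the class itself.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Registry of declared properties: name => type.
    def properties
      @properties ||= {}
    end

    # The DSL keyword: records the declaration and defines a
    # reader/writer pair on instances of the class.
    def property(name, type)
      properties[name] = type
      attr_accessor name
    end
  end
end

class AudioRecordSketch
  include ModelSketch

  property "file_name", :string
  property "duration",  :string
end

record = AudioRecordSketch.new
record.file_name = "tape1.mp3"
puts record.file_name
puts AudioRecordSketch.properties.inspect
```

Keeping the declarations in a registry is what would let other code, such as an indexer, ask a model which fields it has without hard-coding them.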

Note that we are making it possible to inject custom methods into a class that search against RDF predicates.  This way, thanks to the relationship "parts" declaration in the OralHistory model below, calling oral_history.parts will return everything pointing at the oral history object with info:fedora/isPartOf.  We are also thinking of supporting constraint parameters like oral_history.parts(:type => AudioRecord), which would only return the parts that are of type AudioRecord.

require 'active-fedora'

class OralHistory
    # Imitating DataMapper ...
    include ActiveFedora::Model

    relationship "parts", :is_part_of, [:audio_record], :inbound => true

    # These are all the properties that don't quite fit into Qualified DC
    # Put them on the object itself (in the properties datastream) for now.
    property "alt_title", :string
    property "narrator",  :string
    property "interviewer", :string
    property "transcript_editor", :text
    property "bio", :string
    property "notes", :text
    property "hard_copy_availability", :text
    property "hard_copy_location", :text

    has_metadata "dublin_core", :type => ActiveFedora::MetadataDatastream::QualifiedDublinCore do |m|
      # Default :multiple => true, :refinements => :none
      #
      # on retrieval, these will be pluralized and returned as arrays
      # e.g. subject_entries = my_oral_history.dublin_core.subjects
      #
      # aiming to use method_missing to support calling methods like
      # my_oral_history.subjects  OR   my_oral_history.titles  OR EVEN my_oral_history.title whenever possible
      m.field "identifier", :string, :refinements => ["info:fedora", "info:doi"]
      m.field "title", :text, {:multiple => false, :required => true}
      m.field "subject", :text, :refinements => ["dcterms:LCSH", :none]
      m.field "date", :date
      m.field "language", :text
      m.field "location", :text
      m.field "coverage", :text, :refinements => ["dcterms:TGN"]
      m.field "temporal", :text, :refinements => ["dcterms:Period"]
      m.field "abstract", :text
      m.field "rights", :text
      m.field "type", :text
      m.field "SizeOrDuration", :text
      m.field "format", :text
      m.field "medium", :text
    end

    has_metadata "significant_passages" do |m|
      m.field "significant_passage", :text
    end

    has_metadata "sensitive_passages" do |m|
      m.field "sensitive_passage", :text
    end
end
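To illustrate what an inbound relationship lookup with an optional type constraint might do, here is a plain Ruby sketch that runs without Fedora. It is not how ActiveFedora resolves relationships; an in-memory array of triples stands in for the RDF relationships, and every class name ending in Sketch is invented for this example.

```ruby
# Hypothetical in-memory stand-in for RDF relationships in the repository.
TripleSketch = Struct.new(:subject, :predicate, :object)

class RepoThingSketch
  attr_reader :pid

  def initialize(pid)
    @pid = pid
  end
end

class AudioRecordPartSketch < RepoThingSketch; end
class TranscriptPartSketch < RepoThingSketch; end

class OralHistorySketch < RepoThingSketch
  def initialize(pid, triples)
    super(pid)
    @triples = triples
  end

  # Inbound lookup: every subject whose :is_part_of points at this
  # object, optionally narrowed to one class via :type.
  def parts(opts = {})
    found = @triples
            .select { |t| t.predicate == :is_part_of && t.object.equal?(self) }
            .map(&:subject)
    opts[:type] ? found.select { |part| part.is_a?(opts[:type]) } : found
  end
end

triples = []
history = OralHistorySketch.new("demo:history", triples)
audio = AudioRecordPartSketch.new("demo:audio")
transcript = TranscriptPartSketch.new("demo:transcript")
triples << TripleSketch.new(audio, :is_part_of, history)
triples << TripleSketch.new(transcript, :is_part_of, history)

puts history.parts.map(&:pid).inspect
puts history.parts(:type => AudioRecordPartSketch).map(&:pid).inspect
```

Note that the relationship is stored on the parts, not on the oral history; the parent only queries for whatever points at it, which is what the :inbound => true option in the model suggests.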

RIRI Day Two: Richard Green on Institutional Repositories

Tuesday, August 12th, 2008

At the moment, I’m witnessing Richard Green from University of Hull masterfully dissecting the notion of an Institutional Repository.  It’s a treat to have someone spell this stuff out step by step from such a grounded perspective.  One wonderful element of his presentation was to simply leave some time for people to explore ePrints and DSpace repositories [1][2][3] (from the perspective of public end users).  He made the point that people, myself included, often work with only one repository system (or no repository system) and neglect to simply explore the existing options.

In the midst of his presentation about the RepoMMan project, Richard posed an interesting pair of questions regarding the prospect of giving users a private “My Repository” space for managing their stuff.  He asked us:

  1. What might a user want to get from “My Repository”?
  2. What might a user want to put into “My Repository”?

He allowed the room to ponder these questions for a while.  I must admit that I was left doubting my knee-jerk responses and in turn thinking a bit further about what users really want from systems like this.  Richard then reported that a survey of his users at University of Hull provided a resounding response.  His users wanted:  Storage (safe, backed up), Access (easy and from anywhere), Management (full version control), and Preservation (to know stuff is there when they want it, short and long term).  I found this to be much more straightforward than the responses I expected.

Richard then gave us a tour of the RepoMMan interface.  Two key characteristics of the system are that the web interface, which is implemented in Flex, mimics an FTP client (to provide familiarity), and that the metadata editor uses Data Fountains to pre-populate objects with automatically generated metadata so that users can review and revise existing metadata rather than starting from a blank form.

The presentation will continue this afternoon.  By the end of the week, Richard’s full slide deck for the presentation will be up in the RIRI repository.

At RIRI: The Red Island Repository Institute fires up

Tuesday, August 12th, 2008

The Red Island Repository Institute (RIRI), hosted by the University of Prince Edward Island (UPEI), has started with a bang.  Sandy Payette spent an entire day feeding the room with a wonderful mix of vision, software architecture, social context, and technical details.

Mark Leggott has put together a great event. There are people here from all over North America, and even one visitor from Australia.  Everyone has been enjoying the beautiful environs of Prince Edward Island and the quality of information being exchanged is top notch.  I particularly like the fact that Mark is “drinking his own kool-aid” by setting up a Drupal/Fedora site for the institute.

This should be a great week.

Fedora Solutions Integration Council

Thursday, July 3rd, 2008

Picking up from the ideas in The Missing Sync for Fedora Commons, I’ve been talking with Thorny and Sandy at Fedora Commons about creating a Fedora Solutions Integration Council.  We haven’t quite figured out the structure of it, but the ideas are coming together pretty quickly. Bottom line, the council’s responsibility is to help everyone make informed decisions and support each other’s work.  

 As a first stab, I’m putting effort into three things:  

  1. bring together the streams of communication (e.g. blogs, IRC, etc.)
  2. help projects find and connect with others who are doing similar work
  3. identify the major themes: problem areas, innovations, exciting solutions, etc.

Ultimately, I hope this will allow us to shed light on the various avenues of exploration in Fedora-centric application development.  So many people are doing such interesting and exciting work.  It’s time for us to talk more openly and enthusiastically about it.

The other Fedora Solutions Councils are organized around themes like eScience, Museums, and Education.   In contrast, the Integration Council is aimed at addressing the cross-cutting concerns of application development.  We all have to deal with things like access controls, scalability, and workflow.  The best solutions to these types of challenges are often applicable in many contexts, regardless of whether you are an eScience project or a small humanities archive.  Our aim is to get as much information flowing between developers as possible.  I want to let developers decide for themselves which ideas apply to their work.

Watch this space. 

The Missing Sync for Fedora Commons

Thursday, July 3rd, 2008

Last month I picked up a Palm Centro. I quickly discovered that you can’t sync Apple’s calendar and address book applications to Palm OS without a $50 product called “The Missing Sync”. Within a week I had exchanged my Centro for a small, black Samsung dumbphone.

Since then, the topic of synchronization has come up repeatedly in my life.

On my laptop, I’m finally looking into synching Apple iCal with Google Calendar.

At home, I’ve started using Dopplr to figure out travel plans with my family and friends.

In my work life, I’ve started recognizing the fact that I actually play a sync role in the Fedora Commons community. I’m passionate about helping people use Fedora, so I’m constantly asking developers “How did you do that?” or “What went wrong? How did you fix it?”. This has naturally led me to conversations where I find myself saying “Oh! You should talk to XXX project about the work that they’re doing. It’s right up your alley.” or “I think that someone has already solved that problem. Let’s ping the fedora-users list before we reinvent a wheel.”

I like this new theme. It fits with the way I want to operate in the world.

Fedora Commons is a community-driven project. The team in Ithaca has made great strides to stabilize and facilitate community process. [In fact, the footwork and brainwork that Sandy Payette has done behind the scenes this year is fascinating, but that's a topic for another post.] They now have a Chief Architect (Daniel Davis), a Director of Communications (Carol Minton Morris), and a Director of Community Strategy (Thornton Staples). When these three talented people joined Fedora Commons, I thought “Phew! Problem solved.” What I didn’t realize was that there is still a missing link.

I’ve learned that there is only so much that a centralized organization can do to synchronize community efforts. Ultimately, you still need people who slosh around in the morass of innovations, workarounds and hacks in order to find those gems of best practices and well designed solutions. More importantly, you need those people to put momentum behind the good ideas and ensure that they filter back into the common pool.

Until this month, I had not realized how important this is to community-driven open source software development. There are tons of projects out there that are more than happy to collaborate, to share ideas and solutions, and even to contribute code. However, one thing is consistently true about these projects: their hands are full. They rarely have time to look over each others’ shoulders and trade notes, let alone figure out how to share their code.

There are, of course, notable exceptions to this rule. For example, Gert Pedersen has done an admirable job of maintaining GSearch and making it generally useful for everyone. Whenever a new use case or problem crops up, he usually has a solution on SourceForge within a few weeks.

What about all of the other work that people are doing?

As of late, projects have started inviting me to play an advisory role in their Fedora work, to be their missing sync tool. I’m really excited about this because ultimately it means that I have an opportunity to help more people play to their strengths. I hope that by playing this role, I can help ensure that more great solutions find their way directly into Fedora itself while other solutions join the constellation of tools, services, and documentation that populate the Fedora Commons galaxy.

Inroads to Application Development for Fedora Commons

Wednesday, May 14th, 2008

In recent months, I have increasingly found myself connecting different Fedora developers with each other. Due to the ongoing upsurge of interest in Fedora Commons, and the constant increase in the number of projects using Fedora, it’s difficult to keep track of who is doing what.  Last year, after a lively BoF at OpenRepositories, I created a page on the Fedora wiki listing Fedora User Interface Projects. This page is useful to read, but it doesn’t really answer the question “Where do I start if I want to create my own Fedora client app?”

The Fedora Commons team is working hard to set up stable channels of information to help people answer questions like this.  In the meantime, here is my quick rundown.  This is not meant to be a definitive reference.  I’m just rattling off the most salient points from the top of my head in the hope that someone will find it useful.

Java Solutions

Fedora’s APIs are entirely webservice based, so Java is not actually a necessity.  Nonetheless many projects choose to implement their client applications in Java since they already have a Java stack in play on their servers.  
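To make the "Java is not a necessity" point concrete, here is a minimal sketch in Python that only builds the URL for fetching a datastream over plain HTTP. The host, the `demo:1` PID, and the helper name `datastream_url` are all hypothetical, and the path layout is an assumption modelled on the REST API found in later Fedora 3.x releases, so check the documentation for your version before relying on it.

```python
from urllib.parse import quote


def datastream_url(base_url, pid, dsid):
    """Build the URL for one datastream's content (hypothetical helper).

    The path layout is an assumption; verify it against your Fedora version.
    """
    return "{}/objects/{}/datastreams/{}/content".format(
        base_url.rstrip("/"),   # tolerate a trailing slash on the base URL
        quote(pid, safe=":"),   # keep the namespace colon in the PID readable
        quote(dsid, safe=""),
    )


if __name__ == "__main__":
    # Any HTTP client, in any language, could now fetch this URL.
    print(datastream_url("http://localhost:8080/fedora/", "demo:1", "DC"))
```

Since the call is just a URL and an HTTP verb, the same request could be made from PHP, Ruby, or a shell script.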

The Fedora Client (distro)

Fedora itself is distributed with a Java Swing GUI client called fedora-admin.  This is a good place to get raw components for a client app.  If you download the source code for the fedora distribution, you can re-use the SOAP stubs from the fedora-admin client to connect your applications to Fedora.  This pretty much gives you the same code that you would get by generating Java code from Fedora’s WSDL, but I hear that it has been tweaked by the Fedora dev team to work more smoothly.

Muradora

Muradora is a Java application created by the DRAMA team in Australia.  Muradora started out as a proof of concept for the DRAMA Authentication and Authorization middleware.  Since then, it has become a prominent end-user client for Fedora.  Muradora does a good job of using Fedora’s existing features as much as possible.  It creates really clean Fedora objects, and will also recognize new Fedora objects automatically as long as you use some very simple, re-usable RELS-EXT relationships to arrange your objects into collections.

If you want to create your own client application, Muradora is a good place to look for samples of best practices for creating and using Fedora objects. 
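As an illustration of what a simple RELS-EXT relationship can look like, the sketch below builds the RDF/XML that places one object in a collection. The PIDs and the function name `rels_ext` are hypothetical, and the choice of the `isMemberOfCollection` predicate from Fedora's relations ontology is an assumption; Muradora may use a different predicate.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def rels_ext(pid, collection_pid):
    """Return RELS-EXT RDF/XML stating that `pid` belongs to `collection_pid`."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    root = ET.Element("{%s}RDF" % RDF_NS)
    # The subject of the statement is the object itself.
    description = ET.SubElement(
        root,
        "{%s}Description" % RDF_NS,
        {"{%s}about" % RDF_NS: "info:fedora/" + pid},
    )
    # The predicate is an assumption; swap in whatever your application expects.
    ET.SubElement(
        description,
        "{%s}isMemberOfCollection" % REL_NS,
        {"{%s}resource" % RDF_NS: "info:fedora/" + collection_pid},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rels_ext("demo:item1", "demo:collection1"))
```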

Struts & Spring

As the Fedora Wiki reflects, there are a number of projects using Struts and Spring to create Fedora client applications.  Muradora is one of those projects.

Grails, Wicket, etc.

I’ve been hearing a lot about the new RAD-inspired Java frameworks like Grails and Apache Wicket.  At the moment, I’m not aware of any Fedora projects that are using these frameworks.  I’m sure that will change soon. 

PHP Solutions

Fez

Fez is a prominent PHP-driven frontend solution for Fedora.  It is maintained by developers at the University of Queensland in Australia.  I haven’t looked at the Fez code in over a year, but that might be a good place to start if you want to create your own PHP client for Fedora. 

Drupal

Numerous projects are currently implementing Drupal modules for Fedora.  I’ve heard about most of them informally, so I can’t list the organizations, but I can tell you that the University of Prince Edward Island is one of them.  UPEI is hosting the Red Island Repository Institute this August.  I anticipate that there will be quite a lot of Drupal-centric skill sharing going on there. 

Python Solutions

Ben O’Steen’s work at Oxford

Ben is a rockstar.  I’ve already praised him in prior posts and, honestly, his work speaks for itself.  If you intend to do any work with Python and Fedora, definitely start by looking at his code.  I also recommend planning for the fact that you will almost definitely want to pull his innovations into your code on an ongoing basis. The best place to get up-to-date information about Ben’s work is on his blog, Less Talk, More Code.

Plone

I have heard about at least one substantial project doing work with Plone and Fedora.  They are currently in early stages of development. 

Django

I’m surprised that I still haven’t heard about anyone using Django with Fedora.  This nifty framework by Adrian Holovaty hails from the world of online publishing.  It was originally developed in-house at the Lawrence Journal-World newspaper.  Django seems like a really great framework for doing rapid application development.  Due to its history in publishing, it seems like a perfect fit for quite a few Fedora use cases.  

Though I still have not heard about any active projects using Django with Fedora, I have inspired some local developers here in Minneapolis to push for it in their own organization.  Watch this space.

Ruby Solutions

RubyFedora

I’m pleased to let you know that the RubyFedora gem is available for download from RubyForge.  This library allows any Ruby application or script to use the full REST API of Fedora 3.0.  We here at MediaShelf wrote both Fedora’s REST API and the RubyFedora gem that consumes it, so the two work together really well.

ActiveFedora

ActiveFedora is a work in progress.  When it’s ready, we will post it on RubyForge alongside RubyFedora.  The intention of ActiveFedora is to allow Ruby developers to interact with Fedora repositories in the same ways that they currently interact with relational databases.  We want developers to think of Fedora as “just another component” in their user-driven applications.  We are achieving this by imitating Ruby on Rails’ ORM and database management features. 

The ActiveFedora gem will lay the foundation for true, sustainable, rapid development of Fedora client applications.    It will also provide a structure for figuring out and re-using best practices around Fedora’s Content Model Architecture.
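To give a feel for the ORM-style experience described here, the following is a rough sketch of the Active Record pattern applied to repository objects. It is written in Python purely for illustration and is not ActiveFedora's API: every class, method, and field name is hypothetical, and an in-memory dictionary stands in for a real Fedora repository.

```python
class InMemoryRepository:
    """Stand-in for a Fedora repository; a real client would make HTTP calls."""

    def __init__(self):
        self.objects = {}

    def put(self, pid, data):
        self.objects[pid] = dict(data)

    def get(self, pid):
        return dict(self.objects[pid])


class RepositoryObject:
    """Hypothetical base class offering save/find in the Active Record style."""

    repository = InMemoryRepository()
    fields = ()  # subclasses declare their metadata fields here

    def __init__(self, pid, **values):
        self.pid = pid
        for name in self.fields:
            setattr(self, name, values.get(name))

    def save(self):
        # Persist only the declared fields; the caller never touches the API.
        data = {name: getattr(self, name) for name in self.fields}
        self.repository.put(self.pid, data)
        return self

    @classmethod
    def find(cls, pid):
        return cls(pid, **cls.repository.get(pid))


class AudioRecording(RepositoryObject):
    fields = ("title", "creator")


if __name__ == "__main__":
    AudioRecording("demo:42", title="Field recording", creator="Unknown").save()
    print(AudioRecording.find("demo:42").title)
```

The application code at the bottom never mentions datastreams or API calls; hiding that layer is the whole point of the pattern.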

acts_as_fedora

acts_as_fedora is a Rails plugin that ties Fedora into an existing Rails application.  Rather than replacing the database connection completely, it adds hooks into Fedora within your database objects’ lifecycle.  I’m not sure of the status of this plugin.  It may not be open source, and may not be actively supported.  If anyone has more information, let me know.

ColdFusion Solution

At the recent JA-SIG conference, I learned about a project at Cornell University that is using ColdFusion to create a Fedora client application.  For more info, check out the page on the JA-SIG website.

thinking about developer happiness at JA-SIG

Monday, April 28th, 2008

Five years ago developers spent a lot of time speaking SQL when they talked about writing a database-driven app. Since then, we have enjoyed the arrival of modern webapp frameworks with good ORM. Now developers spend very little time talking about SQL. Instead, they talk about higher level problems and application-specific challenges. In other words, we are able to spend developer resources in more potent ways. This has played a major role in the recent upsurge of innovative, user-driven apps.

Right now I’m sitting in Christopher Brown’s JA-SIG presentation about writing a Fedora App in ColdFusion. Christopher has done valiant work. He’s a trailblazer. More importantly, he has a functioning application that is now in active use. However, I can’t help but feel like we’ve backpedaled five years in terms of developer experience. Christopher’s slides are dominated by Fedora-specific structures and the terminology from Fedora’s APIs. I feel like I’m back in SQL land. Being forced to think about this boilerplate code is an unnecessary burden for developers. It prevents them from fully taking advantage of Fedora’s power.

Now that we’ve had RubyFedora in hand for a few weeks and have been playing with ActiveFedora for a while, it’s really encouraging to be reminded what the alternative is. I’m so eager to set free developers like Christopher, to let them forget about the boilerplate code, so that instead they can invent new ways of helping users do crazy stuff with their digital content.