Enterprise Search: Rethinking it in a Web 2.0 World
Jinfo Blog

1st November 2007

By Jayne Dutra

Abstract

In the Land of Web 1.0, we would search by looking for a small box in the top corner of a website. The user would be expected to know a magical keyword or some other bit of information that would unlock the door to a cascade of results ready to be winnowed by hand into piles of carefully hoarded treasure. Publishing to the Web was controlled by a few individuals called 'webmasters' and data was carefully guarded behind moats and firewalls in castles called database stores. Search engines were composed of spiders that crawled the Web to find pages rendered in HTML, which made them understandable only to advanced human intellect and not re-use friendly. Search had to 'stink', which always seemed a bit unsanitary.

Today in a Web 2.0 world

Today things are different. Ordinary people publish blogs and have passionate electronic conversations in wikis. Data is out and about, turning up on iPhones, navigational devices in your auto and podcasts. Bits of content recombine and transform themselves into altered beings with new formats and sexy, fashionable looks. The Web is a movable feast with Twitter <http://www.twitter.com> parties materialising spontaneously as individuals find each other in both virtual and physical space. New connections from rich social interactions on YouTube <http://www.youtube.com> and Facebook <http://www.facebook.com> create vibrant energy that renews human discourse. Wisdom is collected, syndicated and documented in Wikipedia and Wikimedia. Rich media, photos, screencasts and other visualisations are tagged for sharing on Flickr <http://www.flickr.com> and del.icio.us <http://www.del.icio.us>.

Where is it all going and how do we, as the Web's virtual cartographers, help others find their way at a time when fellow travellers are empowered beyond our wildest expectations of only a few years ago? How can we add value to information retrieval systems within our organisations that enhances user experience and meets the increased pace of daily activity and multi-tasking? Web 2.0 has raised the bar for those of us involved in enterprise search.

Search is complex

More than ever, search developers are required to understand the core foundation of the organisation's business models. Individual aspects of services, products and processes are needed in formats that can be recombined to report past performance, current status and future 'what if' scenarios. 'Business Intelligence' was once confined to statistics on last quarter's sales, but now corporations want to understand why the business performed as it did, what was successful, what didn't work and how they can develop strategies to capture and retain market share.

Enterprise search is no longer a one-size-fits-all problem. Information retrieval is a complex area that is increasingly seen as task dependent. In other words, how and why a user searches is directly related to the type of activity that user is engaged in. Search solutions must therefore be designed around specific business problems so that they provide meaningful value to the enterprise. Users have been trained by Google to expect search results with lightning speed. They also want high precision without a personal investment in lengthy exploratory research. In short, they want information to come to them no matter where they are or how they are connected, whether to the intranet or to the world outside the firewall. Indeed, the difference between the two is blurring more rapidly every day.

There is a cornucopia of new technologies available to help us reach these goals. IT departments and system developers can choose to implement company-wide authentication for seamless access to multiple repositories, enterprise messaging buses for information services, Semantic Web technologies for embedding relationships, and collaborative portals with personalisation designed by users, alone or in teams, as a natural outgrowth of work activities.

Capturing and leveraging user-generated metadata

Successful enterprise search today doesn't mean making keywords work well. It means creating a holistic information architecture designed for the enterprise that allows input and evolution by the users themselves. Ironically, this usually relies on the time-honoured and humble practice of generating metadata and controlled vocabularies that enable data connectedness and intuitive recall. For years, we've heard that users won't fill out metadata fields. Then how does one account for the phenomenal success of Flickr? If one enters a set of bookmarks in del.icio.us, doesn't that tell us something about the person's interests and background? New Web 2.0 technologies generate metadata in the wild that can be domesticated if we are wily enough to recognise the opportunity.

A revitalised corporate IT environment should provide a common entry point to multiple repositories with single sign-on capability, user qualification awareness, and a simplified interface. Metadata about people can be reconciled with metadata about objects and processes to facilitate personalised content delivery. Knowing an employee's department and role implies something about the tasks associated with that employee. Relevant applications, syndicated feeds and better portlet integration enable customisation of the activities and transactions employees need. Data should be available without regard to device or location, thereby setting the stage for recall on handheld devices or mobile units.
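The idea of reconciling metadata about people with metadata about objects can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the departments, roles, topics and item titles are all hypothetical, and a real portal would draw them from a directory service and the enterprise index rather than from hard-coded tables.

```python
# Hypothetical people metadata: (department, role) -> topics of interest.
ROLE_TOPICS = {
    ("engineering", "designer"): {"drawings", "standards"},
    ("business", "proposal manager"): {"proposals", "pricing"},
}

# Hypothetical object metadata: each item carries a set of topic tags.
ITEMS = [
    {"title": "Bracket drawing family", "topics": {"drawings"}},
    {"title": "Winning proposal archive", "topics": {"proposals"}},
    {"title": "Design standards feed", "topics": {"standards", "drawings"}},
]


def personalised_items(department, role, items=ITEMS):
    """Rank items by how many topics they share with the employee's role."""
    wanted = ROLE_TOPICS.get((department, role), set())
    scored = [(len(item["topics"] & wanted), item["title"]) for item in items]
    # Best matches first, ties broken by title; items with no overlap are dropped.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in ranked if score > 0]


designer_view = personalised_items("engineering", "designer")
```

An unknown department or role simply yields an empty list here; a production system would more likely fall back to a default view.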

The corporate information environment should be accessible to machines as well as individuals and should use a common data reference model for improved data consistency. Federated searches, contextual results and composite data sets are all possible. Using new tools, users can enter metadata right in the browser, where it can be displayed as tag clouds and saved in a personal portlet. Searches can be saved for individual or team use and subscribed to as an ongoing service. Graphical representation of results in charts or plots is a personal choice. Browsing by image, video clip or text is now interrelated, and results can be presented together for wider access by the user.

Foundation pieces and strategic approaches

In order to achieve the seamless integration of data to build our brave new world, a semantic layer that handles data reconciliation and unification of content sources is needed. Most experts recommend starting by understanding the business uses of content and creating a semantic representation of the target data that allows for recombination and presentation in a variety of outlets. The representation of enterprise data is expressed by the enterprise metadata specification and its associated taxonomy. One of the search team's foundational tasks is to work with engineering system owners to see that the metadata core specification is incorporated into the searchable index. Working with system owners to coordinate data values can be phased over time. Early phases include mapping data fields to the enterprise standard in order to give systems time to adopt standards. Opportunities for systems to incorporate the standards arise when there is a major upgrade of the system or replacement of the system's technology.
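As a rough illustration of that early mapping phase, the sketch below renames each source system's local fields to an enterprise standard before indexing. The system names, field names and the four "core" fields are hypothetical stand-ins; the actual enterprise metadata specification would define them.

```python
# Hypothetical enterprise core metadata fields (assumed for illustration).
ENTERPRISE_FIELDS = {"title", "creator", "project", "date_created"}

# Hypothetical per-system maps from local field names to the enterprise standard.
FIELD_MAPS = {
    "drawing_vault": {"dwg_name": "title", "engineer": "creator", "proj_code": "project"},
    "doc_library": {"Title": "title", "Author": "creator", "Created": "date_created"},
}


def to_enterprise_record(system, record):
    """Rename a source record's fields to the enterprise standard.

    Fields with no mapping are returned separately so they can be
    reviewed in a later phase rather than silently dropped.
    """
    mapping = FIELD_MAPS[system]
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = mapping.get(field)
        if target in ENTERPRISE_FIELDS:
            mapped[target] = value
        else:
            unmapped[field] = value
    return mapped, unmapped


mapped, unmapped = to_enterprise_record(
    "drawing_vault",
    {"dwg_name": "Antenna bracket", "engineer": "A. Example", "rev": "C"},
)
```

Keeping the map outside the source system is what lets the system carry on unchanged until its next major upgrade, at which point it can adopt the standard field names directly.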

Content resides in many places and in many formats. Unstructured data may be appropriate for natural language processing and entity data extraction that facilitate automated tagging. Folksonomies and tag clouds are examples of human tagging. The proposed solution set for an enterprise search task should encompass both these approaches. Objects will be tagged over time through both automated and human actions using the concepts around the Unstructured Information Management Architecture (UIMA).
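A minimal sketch of combining the two approaches might look like the following. Simple phrase matching against a small vocabulary stands in for real natural language processing and entity extraction, and nothing here uses UIMA itself; the vocabulary, the sample text and the user tags are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: preferred term -> phrases that signal it.
VOCABULARY = {
    "propulsion": ["thruster", "propulsion", "engine"],
    "thermal": ["thermal", "heat shield", "radiator"],
}


def automated_tags(text):
    """Return vocabulary terms whose trigger phrases occur in the text."""
    lowered = text.lower()
    found = set()
    for term, phrases in VOCABULARY.items():
        if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases):
            found.add(term)
    return found


def combined_tags(text, user_tags):
    """Merge automated tags with human tags, counting how often each was applied."""
    counts = Counter(tag.strip().lower() for tag in user_tags)
    for tag in automated_tags(text):
        counts[tag] += 1
    return counts


tags = combined_tags(
    "Test report for the thruster radiator assembly.",
    ["Propulsion", "test-report", "propulsion"],
)
```

The counts could feed a tag cloud directly, with terms that both people and the automated pass agree on rising to the top.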

Instead of implementing a Web crawler to randomly generate search results on arbitrary keywords, the approach of the modern enterprise search team is to leverage a strong information architecture infrastructure resulting in a unification layer for enterprise content. By utilising enterprise metadata standards, deploying reconciliation strategies with gold source vocabularies and building a clearinghouse for data collection, order can be brought to a chaotic information environment.
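One way to picture reconciliation against a gold source vocabulary is a lookup that turns every known variant of a value into its preferred term and flags anything it does not recognise. The sketch below assumes a tiny hand-built vocabulary; the terms and variants are illustrative only, and a real gold source would be maintained by its owning organisation.

```python
# Hypothetical gold-source vocabulary: preferred term -> known variants.
GOLD_SOURCE = {
    "Jet Propulsion Laboratory": ["JPL", "Jet Propulsion Lab"],
    "National Aeronautics and Space Administration": ["NASA"],
}

# Build a lookup from every normalised variant (and the preferred term itself).
LOOKUP = {}
for preferred, variants in GOLD_SOURCE.items():
    for name in [preferred, *variants]:
        LOOKUP[name.casefold()] = preferred


def reconcile(value):
    """Return (preferred_term, True) if the value is known, else (value, False)."""
    key = " ".join(value.split()).casefold()  # collapse whitespace, ignore case
    if key in LOOKUP:
        return LOOKUP[key], True
    return value, False
```

Values that come back flagged as unknown are candidates for review by whoever curates the vocabulary, rather than being forced into the nearest match.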

The ultimate goal is an information environment enhanced by metadata and served up through a number of rich user interactions facilitated by role-based access. Unified enterprise search at my organisation is conceived of as a set of integrated systems that use different types of technologies to provide information quickly, presented through a variety of visualisation techniques including charts, sliders for query definition, and thumbnails of engineering drawing families.

Better information re-use brings numerous benefits for the enterprise, including a higher percentage of winning proposals, shorter product development time, more effective resource management, better decision making and improved business agility. These benefits combine to make a stronger competitor in the marketplace and generate more success in the long run. That's a business case our managers can't afford to ignore.

The research described in this publication was carried out at the Jet Propulsion Laboratory, under a contract with the National Aeronautics and Space Administration.

