ELAG 2001 - Integrating Heterogeneous Resources - Prague, 6-8 June 2001

WORKSHOP #7

Trends and Tools for Integration of Resources 

Workshop Discussion Paper

Scope

The Internet is here. So is the World Wide Web (WWW). Nearly every computer system now uses the Internet protocols (IP) for communication, and the number of servers on the Internet is growing rapidly. Most large libraries, and quite a few smaller ones, are probably connected to the Internet in a way that would allow them to share their information resources with others.

These facts open the way for developing systems in which several systems and/or information resources are integrated in one way or another. Such systems may even give the user the impression of being a single system. With services built on them, the user could gain a global view of distributed information resources in an increasingly global community.

What do we mean by integration of resources? In the context of this workshop, the term describes a setting in which numerous computing systems can work together, share information objects, or act as one system towards a user. While investigating this area, we will talk about distributed as opposed to centralized systems and services. As most systems nowadays are networked and base their communication on the Internet protocols (IP), we can ignore the distinction between local and wide-area systems.

Q: How does the workshop define integration of resources?
Q: Examples?
 

Open Source Software

One of the Internet cultures that has gained new force, and which has significance for many environments, is the work done under the umbrella one might call “open source”. Among the more long-lasting Open Source groups one finds the Free Software Foundation (http://www.fsf.org), with a long tradition in the Internet community. More recently, this thinking has also been adopted within the commercial world (http://www.opensource.org). Open Source means that the source code is available to users, and may be changed and improved to meet the demands of the user's context. It also enables user communities to establish a common framework of software solutions free for all.

Q: In what way could and should the international library community take advantage of Open Source?
Q: Which are the most significant examples of Open Source in recent years, and how are they used in the library community?
 

Search engines

One of the most used, least formalized and most pragmatic forms of integration of resources is the search engine. Even a professional library user searching for information on the Internet is likely to use one of the large search engines such as AltaVista (http://www.altavista.com), AllTheWeb (http://www.alltheweb.com), Google (http://www.google.com) or Yahoo (http://www.yahoo.com). These search engines are based on information harvested by software robots (harvesters, crawlers), and searches are performed on the content of the information and on metadata generated during harvesting. Automated indexing of such collections relies on a set of techniques, only a few of which are known in the traditional library.
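
The harvest-then-index pattern described above can be illustrated with a minimal Python sketch. It is an illustration only and does not reflect how any of the named search engines actually work: the in-memory PAGES table, its example URLs and the function names are all hypothetical, and a plain inverted index stands in for the far richer techniques used in practice.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "http://a.example/": ("Library catalogue of rare books", ["http://b.example/"]),
    "http://b.example/": ("Digital library of rare maps",
                          ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("Opening hours and contact", []),
}


def harvest(seed):
    """Follow links breadth-first from `seed`; return {url: text} for each page reached."""
    seen, queue, harvested = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        harvested[url] = text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return harvested


def build_index(harvested):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in harvested.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


index = build_index(harvest("http://a.example/"))
print(search(index, "rare library"))
```

The search is done entirely on harvested content; nothing in the sketch depends on metadata supplied by the page owner, which is the contrast with the library tradition that the paragraph points to.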

Q: Can the large search engines serve as integrated or integrating resources for libraries worldwide?
 

The XML family

Typically, each hit in a search result from engines like those mentioned above is given a relevance weight based on where the search term is found within the document. When an HTML document is analyzed, words inside the <TITLE> tag will typically get a higher relevance weight than the same words in the <BODY> section. Consequently, the user's way of using HTML tags may have a significant impact on the result of a search in one of the search engines. HTML is a weak vehicle for describing semantic structure, while newer standards like XML and XSL (see the workshop on XML (http://alex.stk.cz/elag2001/Workshop/ws8.html) for more discussion and references) seem better suited.
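
To make the effect of tag position concrete, here is a small Python sketch that scores a term higher when it occurs in the title than elsewhere in the document. The weights (3.0 and 1.0) and the names TermCounter and relevance are assumptions chosen for illustration; real search engines do not publish their weighting schemes.

```python
import re
from html.parser import HTMLParser

TITLE_WEIGHT = 3.0  # assumed weight for a hit inside <TITLE>
BODY_WEIGHT = 1.0   # assumed weight for a hit anywhere else


class TermCounter(HTMLParser):
    """Collect the words of a document, keeping title words separate."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_words = []
        self.other_words = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # the parser reports tag names in lower case
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        words = re.findall(r"[a-z0-9]+", data.lower())
        target = self.title_words if self.in_title else self.other_words
        target.extend(words)


def relevance(html, term):
    """Weighted count of `term`: title hits count more than other hits."""
    parser = TermCounter()
    parser.feed(html)
    parser.close()
    term = term.lower()
    return (TITLE_WEIGHT * parser.title_words.count(term)
            + BODY_WEIGHT * parser.other_words.count(term))


doc_a = ("<html><head><TITLE>Metadata standards</TITLE></head>"
         "<body><p>About libraries.</p></body></html>")
doc_b = ("<html><head><TITLE>Home</TITLE></head>"
         "<body><p>Metadata and more metadata.</p></body></html>")
print(relevance(doc_a, "metadata"), relevance(doc_b, "metadata"))
```

Under these assumed weights, a single occurrence in the title of doc_a outranks two occurrences in the body of doc_b, which is why the choice of tags matters to the person writing the page.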

Q: What impact may the XML-family of standards have on integration of resources?
 

Metadata standards

From the libraries' point of view, the focus for integration of resources has been on metadata. Much work has gone into defining common metadata formats (MARC, Dublin Core) and frameworks (like FRBR), and also into protocols for standardized interfaces to OPACs (like Z39.50).

Q: What experience do the workshop participants have with open interfaces to their OPACs?
 

Open Archives Initiative

One of the recent and practical initiatives in the library community is the Open Archives Initiative. This initiative aims at creating a working standard for sharing metadata, with an interface that lets harvesters collect metadata in a specified format. This effort could facilitate the dissemination of metadata, enabling service providers to establish search services covering a potentially huge set of repositories. The architecture and protocol specification are based on a model in which (at least) two parties are needed to establish a search service: the metadata provider and the service provider.
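
The division of labour between the two parties can be sketched in a few lines of Python. This is a toy model of the roles only, not the OAI protocol itself: the class names, the list_records and harvest methods, the record fields and the sample data are all hypothetical, and no network communication takes place.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str
    title: str
    datestamp: str  # ISO date, e.g. "2001-05-20"


@dataclass
class MetadataProvider:
    """A repository that exposes its metadata records to harvesters."""
    name: str
    records: list = field(default_factory=list)

    def list_records(self, since=""):
        # ISO dates compare correctly as plain strings.
        return [r for r in self.records if r.datestamp >= since]


@dataclass
class ServiceProvider:
    """Harvests metadata from many providers and offers one search service."""
    index: dict = field(default_factory=dict)

    def harvest(self, provider, since=""):
        """Copy records changed on or after `since`; return how many were taken."""
        harvested = provider.list_records(since)
        for record in harvested:
            self.index[record.identifier] = (provider.name, record)
        return len(harvested)

    def search(self, word):
        """Identifiers of all harvested records whose title contains `word`."""
        word = word.lower()
        return sorted(ident for ident, (_, record) in self.index.items()
                      if word in record.title.lower().split())


library_a = MetadataProvider("Library A", [
    Record("a:1", "Maps of Prague", "2001-01-10"),
    Record("a:2", "Bohemian music", "2001-05-20"),
])
library_b = MetadataProvider("Library B", [
    Record("b:1", "Historic maps", "2001-03-02"),
])

service = ServiceProvider()
service.harvest(library_a)
service.harvest(library_b)
print(service.search("maps"))
```

The `since` argument hints at why a datestamp on each record is useful: a later harvest can ask only for what has changed, instead of copying the whole repository again.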

Q: How can the library community benefit from the OAI model?
Q: What are the pros and cons of this approach?
 

Identification

To be able to share resources of any kind, one must, obviously enough, be able to identify the resources in question.

One of the revolutionary components of the World Wide Web is the identification of services, resources and objects on the Internet. The identification mechanism is, as you all know, based on what is called a Uniform Resource Locator (URL). This mechanism has its limitations, however. Most important is the fact that it implicitly encodes the location of the resource, so the identifier must change whenever the location changes. And locations do change, which means that the identification is not persistent over time.

There is obviously a need for a persistent identification mechanism. PURL was an early bird in this arena, but the current choice seems to be between the Uniform Resource Name (URN, http://www.ietf.org/html.charters/urn-charter.html) and the Digital Object Identifier (DOI, http://www.doi.org).
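
What the persistent schemes have in common is a level of indirection between a stable name and a changeable location. The Python sketch below shows only that idea; it does not model the actual resolution machinery of PURL, URN or DOI, and the Resolver class, the name and the URLs are hypothetical.

```python
class Resolver:
    """Maps persistent names to current locations (a toy name authority)."""

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        """Record or update the current location of a named resource."""
        self._locations[name] = url

    def resolve(self, name):
        """Return the current location for a persistent name."""
        return self._locations[name]


resolver = Resolver()
name = "urn:example:report-7"  # hypothetical persistent name

resolver.register(name, "http://old-server.example/reports/7.html")
print(resolver.resolve(name))

# The document moves: only the resolver entry changes, while every
# citation that uses the name stays valid.
resolver.register(name, "http://new-server.example/archive/report-7.html")
print(resolver.resolve(name))
```

Whoever maintains such a table takes on a long-term commitment to keep it correct, which is the kind of role the question about name authorities below has in mind.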

Q: Which of the naming mechanisms above is preferred, and why?
Q: What do we consider to be the best approach(es) to integrate library services on the Internet?
Q: Could libraries play a role with respect to identification based on new standards like URN or DOI – for example as name authorities?
 
 

Content, digital objects

As the amount of information available in digital formats increases, due to digitization and the collection of born-digital content, the user will want access to information in digital formats. The information society also tends to be far more multimedia-oriented than before, which in turn means that the library eventually has to handle all kinds of information. The exchange of digital objects between libraries, and between a library and its users, must rely on a common view of the attributes of the object.

Q: How can a standardized object model help integration of resources?
Q: Do we believe in such a model?
 

WWW and the Semantic Web

The World Wide Web (http://www.w3.org), introduced just a few years ago and based on simple standards such as HTTP and HTML, has had an enormous impact on the use of the Internet. Widespread “standard” software for viewing and browsing (web browsers) makes life easier for service providers and software developers.

Q: Can primitive protocols and standards like HTTP and HTML help us integrate resources? If yes, how?

One of the ongoing development areas of the WWW is what is called the “Semantic Web” (http://www.w3.org/2001/sw/Activity). The Semantic Web work aims at creating environments for automation, integration and reuse of data across applications.

Q: What is the workshop's view on the ongoing work on the Semantic Web?
Q: In what way can the library community take part in this work?
 

Metadata embedded in digital objects

Some significant international standardization activities on digital objects include the use of metadata embedded in the objects themselves. One example is the work done by the EBU (the European Broadcasting Union) on standardization of an audio file format. This format, named the Broadcast Wave Format (BWF), has a metadata chunk at the beginning of the file. A file in BWF format carries its own metadata as it travels across networks, and the receiver of such a file can easily extract that metadata.
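
How a receiver might extract such metadata can be sketched in Python. The sketch rests on assumptions that are not stated in this paper: that a BWF file is a RIFF/WAVE file whose metadata sits in a chunk with the identifier "bext", and that this chunk begins with five fixed-width text fields (description, originator, originator reference, origination date and time), which is how the EBU specification is commonly described. The demo file, its field values and all function names are made up, and only those first five fields are read.

```python
import struct

# Assumed layout of the first five text fields of the "bext" chunk.
BEXT_FIELDS = "<256s32s32s10s8s"
FIELD_NAMES = ("description", "originator", "originator_reference",
               "origination_date", "origination_time")


def make_chunk(chunk_id, payload):
    """Build one RIFF chunk: 4-byte id, little-endian size, data, pad to even length."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def make_demo_file():
    """Build a tiny in-memory WAVE file carrying made-up embedded metadata."""
    bext = struct.pack(BEXT_FIELDS, b"Interview, reel 3", b"Example Radio",
                       b"REF-0001", b"2001-06-06", b"09:30:00")
    fmt = struct.pack("<HHIIHH", 1, 1, 48000, 96000, 2, 16)  # PCM, mono, 16-bit
    body = (b"WAVE" + make_chunk(b"bext", bext) + make_chunk(b"fmt ", fmt)
            + make_chunk(b"data", b"\x00\x00" * 4))
    return b"RIFF" + struct.pack("<I", len(body)) + body


def iter_chunks(data):
    """Yield (chunk id, payload) for each chunk in a RIFF/WAVE byte string."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size % 2)  # chunks are padded to an even length


def read_bext(data):
    """Return the embedded metadata as a dict, or None if there is no bext chunk."""
    for chunk_id, payload in iter_chunks(data):
        if chunk_id == b"bext":
            values = struct.unpack_from(BEXT_FIELDS, payload, 0)
            return {name: value.split(b"\x00", 1)[0].decode("ascii")
                    for name, value in zip(FIELD_NAMES, values)}
    return None


print(read_bext(make_demo_file()))
```

Because the metadata travels inside the file, the receiver needs no separate catalogue record or network lookup to learn what the file contains.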

Q: In what way can embedded metadata be of use in libraries?
Q: In what areas can we expect to make use of this facility?

 

