LIBR 202 – Section 18 – Dr. Bolin
Assignment # 3
December 4, 2011
Information Retrieval (IR) System Evaluation
Patricia Ayame Thomson
Evaluation of the Information Retrieval (IR) System
Underlying the J. Paul Getty Museum Web site
Home Page: Visual Impact
Bright, contrasting colors framed in an aesthetically pleasing arrangement and an asymmetrical window design are what most users will initially experience when visiting the J. Paul Getty Museum Web site. The Home page is headed with the all-encompassing umbrella title: The Getty.
The non-profit organization behind the Web site is large enough to be a conglomerate consisting of multiple museums, institutions, libraries, archival repositories, educational programs, foundations, and trusts, each of which in turn has its own collections, databases, catalogs, inventories, indexes, search portals, and search engines.
The well thought-out and complex infrastructure underlying the Web site is an example of an extremely effective and powerful information retrieval system. The functionality of the Getty information retrieval system is strengthened by various implicit and explicit features that facilitate and enhance relevant information retrieval. Morville (2005) describes features that enhance the effectiveness of information retrieval systems as follows: “Support for taxonomies, controlled vocabularies, faceted classifications, and rich semantic relationships is built into the infrastructure” (p. 132).
Any Web site’s primary purpose and usefulness is to serve as a reflection or an intermediary for a company, person, cause, location, idea, or an object that is being represented online. Although the visual appearance of the Web site is not one of the official evaluation criteria, the Getty Museum Web site exudes a classic, upscale, and striking visual image. Visually and functionally, the Getty Museum Web site succeeds admirably in capturing the essence of the priceless, timeless, and legendary art collection at the world-renowned J. Paul Getty Museum.
The J. Paul Getty Museum’s recent expansion to two different locations doubles the cost and effort of posting continuously updated, timely, and accurate information on the Web site about both museums. One of the Getty Museums, called The Getty Villa, is located at the original Getty residence in Malibu, CA, along the Pacific Coast Highway. The other, a sprawling and more recently built museum called The Getty Center, is located in Los Angeles, CA.
Since the Getty Web site has the responsibility to update and post accurate information about two sets of exhibits, activities, and events for two museums, I would imagine a full-time system designer, indexer, or programmer is employed on a regular basis. Consequently, one of the biggest challenges of the Getty Museum Web site is how best to convey two sets of information pertaining to each museum in a clear, accessible, and visually attractive way.
It is apparent from the appearance and execution of the Getty Web site that a lot of funding went into the multi-faceted and multi-layered Web design, programming, installation, and operation. Fortunately, with funding from the J. Paul Getty Trust and endowments, membership fees, admission fees, and magnanimous donations from benefactors, the non-profit organization can afford the online magnitude, depth, and scope.
The Getty Web site is a tremendously complex, expansive, and sophisticated information retrieval system with multiple databases, library catalogs, collection inventories, search engines, and archive repositories. To classify, store, and retrieve the many types of artwork (including paintings, sculptures, architecture, photographs, drawings, furniture, and many other incarnations) from antiquity to the present must be an enormously arduous and protracted endeavor.
In regard to the cost, the Getty organization is reputable, esteemed, and well-funded enough to create and copyright their own ontology of controlled vocabularies. The Web page claims that the Getty’s controlled vocabularies closely resemble the terms from the Library of Congress. The benefits and usefulness of the J. Paul Getty Museum’s online domain far outweigh the cost of creating and maintaining the Web site. In order to remain relevant in today’s age of technology, the J. Paul Getty Web site is a necessary marketing tool as well as a way to establish an invaluable online presence.
The J. Paul Getty Home page: http://www.getty.edu/index.html
Scope
The scope of the Getty Museum Web site is considerably wide and deep. Overall, the features of the Web site are multi-layered, multi-faceted, and extremely complex. Behind each tab naming a category or subcategory, there is a wealth of information stored, which is artfully presented when opened.
Dr. Bolin’s (2011) description, “The front end of the Library’s Web site includes the library catalog, but also aggregates into many other open access resources of all kinds,” also describes the front end of the Getty Museum(s) Web site, which is at least ten times larger in scope (Lecture 13, Slide 9).
Aside from the embedded databases, inventory indexes, archival repositories, and/or library catalogs, it is astonishing to see the enormous amount of information that can be stored and displayed collectively in a single Web site or domain. There is detailed information contained in layers and layers of categories and sub-categories. Even a novice user can tell that the J. Paul Getty Museum Web site is extremely well designed, heavily funded, and expertly constructed. The kind of detailed care and thoughtfulness in design is seen repeatedly and applied throughout the J. Paul Getty Museum Web site.
The J. Paul Getty Web site is incomparable in scope. For example, the Getty Web site’s platform includes multiple databases and search engines to represent every area of the Museum. There is an open-access Web page providing users with many art-related links to other databases. The special library collection is included under the main category of “The Getty Research Institute” (GRI) with its own library catalog. The collection archives also have their own database and search engine, and so on.
Just as Disneyland has so many rides that it is improbable for a visitor to ride them all in one day, it is also improbable (as well as unnecessary) to mention every single subcategory in the expansive Getty Museum Web site. In summary, the scope of the Web site is extremely large, deep, and wide.
Smart Design: One Web page Providing Information About Both Getty Museums
In his book “Ambient Findability,” Morville (2005) states that: “Design has emerged as one of the world’s most powerful forces. . . Most of the places and objects that shape our experience have been designed by intention. And the Internet has created new frontiers for interaction, information, and communication design” (p. 103).
The Web designer for the Getty Web site found a clever and effective solution to the dilemma of serving two Getty museums located at two different places. Instead of creating independent Web sites (or even Web pages) for each museum, the designer splits the Web page in half to create two vertical columns. The side-by-side columns each represent one of the two Getty museums respectively. The left-hand column on the screen is headed: The Getty Center (Los Angeles) and the right column is headed: The Getty Villa (Malibu).
The concept of dividing a single Web page in half to represent both museums is a cost-effective, efficient, and viable solution for the following reasons. Presenting exhibition information about two separate museums on one Web page saves the cost of the human resources (Web designer or indexer), labor, and time required to develop and maintain two independent Web sites (or Web pages).
More importantly, the side-by-side presentation gives users the opportunity to view relevant information about both museum exhibitions simultaneously. Viewing the information for both museums at once empowers the user to compare exhibitions, without having to take additional steps to find the information in another place online.
For example, as long as Zipf’s “Principle of Least Effort” holds true, a user who finds information online about one of the Getty museums in one place may not take the additional steps necessary to find information about the other museum. Furthermore, a user who is satisfied with the information found online about one of the museums may not realize the other museum even exists.
Additionally, if two separate sets of current exhibitions and events are posted in two different locations online (even in the same domain), user confusion is likely. Providing information side-by-side for both museums on a single Web page is therefore not only cost-effective, but also allows users to view information about both exhibitions at the same time.
The single-page compromise design is the most effective way to mitigate confusion for the user by providing information regarding both museums in a single online source. The two-column design works so well that it starts at the Home page and is repeated throughout the Web pages for each subcategory under Visit. The following categories are located one tier below Home: [Visit, Museum (the mirror image page), Research Institute, Conservation Institute, Foundation, and Trust.] Upon closer examination, I noticed that Visit is the only one of these that is not included in the main categories listed at the top of the Getty Web Home page; instead, it is added on the left side of the prominent headings listed horizontally above the large slide show window.
In conclusion, reviewing the following subcategories under Visit makes it easy to understand why the design presenting both museums together is the most efficient and clear way to impart the necessary information: [Exhibitions; Event Calendar; Hours, Directions, Parking; Things to See and Do; Groups and Families; and En Espanol.] Consequently, the side-by-side design supports disambiguation.
The J. Paul Getty Visit page (Hours/Directions/Parking): http://www.getty.edu/visit/hours/
Current Exhibitions and Installations
For example, on the Current Exhibitions and Installations page, the screen is divided down the center into two separate columns, one for each Getty museum. The information displayed in both columns is identical in format.
Under both column headings: The Getty Center and The Getty Villa, the most prominently-featured attractions are the current exhibitions. After all, a large percentage of visitors attend the museum to see the exhibitions.
Under the current exhibition in both museum columns, there is a link to an additional feature: “See the past ten exhibitions at the Getty Center.” The feature is a list of links to the past ten exhibitions. Furthermore, two more links are provided: “See all past exhibitions at the Getty Center” and “See future exhibitions at the Getty Center.” The exhibition titles are hyperlinks that users can click to access information about past and future exhibitions. Technology is amazing since users can view all past (and future) exhibitions from two museum collections from the comfort of their home.
The J. Paul Getty Exhibitions page: http://www.getty.edu/museum/exhibitions/
The Getty Museum Web site: Classification and Categories (or Facets)
The IR system in the J. Paul Getty Museum Web site has faceted classification.
Morville (2005) states the following: “Aided by the flexibility of digital information systems . . . We embrace faceted classification, using multiple fields or “facets” to describe the objects within our collection” (p. 127). For clarification, the following is the definition of faceted classification:
Faceted Classification:
A classification system developed through analysis of the fundamental characteristics of subjects by which they can be divided into subclasses. For example, in his Colon Classification, S.R. Ranganathan identifies five basic characteristics: personality, matter, energy, space, and time (abbreviated PMEST). In such a system, the notation representing a subject is created by combining the notations of its facets.
Online Dictionary for Library and Information Science (ODLIS):
http://www.abc-clio.com/ODLIS/odlis_f.aspx
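The idea of describing each object with several independent facets can be illustrated with a minimal Python sketch. The records, facet names (medium, period, place), and function names below are hypothetical and chosen only for illustration; they are not taken from the Getty system.

```python
from typing import Dict, List

# Hypothetical catalog records; facet names and values are illustrative only.
ARTWORKS: List[Dict[str, str]] = [
    {"title": "Work A", "medium": "painting", "period": "Renaissance", "place": "Italy"},
    {"title": "Work B", "medium": "sculpture", "period": "Antiquity", "place": "Greece"},
    {"title": "Work C", "medium": "painting", "period": "Impressionism", "place": "France"},
    {"title": "Work D", "medium": "photograph", "period": "Modern", "place": "United States"},
]


def filter_by_facets(records: List[Dict[str, str]], **facets: str) -> List[Dict[str, str]]:
    """Return the records whose values match every requested facet."""
    return [
        record
        for record in records
        if all(record.get(name) == value for name, value in facets.items())
    ]


def facet_counts(records: List[Dict[str, str]], facet: str) -> Dict[str, int]:
    """Count how many records fall under each value of one facet."""
    counts: Dict[str, int] = {}
    for record in records:
        value = record.get(facet, "unknown")
        counts[value] = counts.get(value, 0) + 1
    return counts


if __name__ == "__main__":
    # Combine two facets to narrow the result set.
    for record in filter_by_facets(ARTWORKS, medium="painting", place="France"):
        print(record["title"])
    print(facet_counts(ARTWORKS, "medium"))
```

Because each facet is independent, a user could narrow by medium first and place second, or the reverse, and arrive at the same records.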
Since the Web site is for an Art Museum(s), I expected the categories to be divided by the art’s chronology, country of origin, or period (e.g., Renaissance, Egyptian, Impressionism, Post-Modern, etc.). Instead, the system designer of the Getty Museum Web site selected much more profound, meaningful, and salient categories.
The categories (or facets) and subcategories are appropriate and fitting for a non-profit organization with a Mission Statement promoting the inherent and intangible value of art and culture in society. As a result, the categories selected are in alignment with the J. Paul Getty Trust’s Mission Statement copied below.
The J. Paul Getty Trust: Mission Statement
The Mission Statement below is copied from The Getty Trust page. I mention the distinction since every institution, foundation, museum, and library has its own Mission Statement.
The J. Paul Getty Trust is an international cultural and philanthropic institution that focuses on the visual arts in all their dimensions, recognizing their capacity to inspire and strengthen humanistic values. The Getty serves both the general public and a wide range of professional communities in Los Angeles and throughout the world. Through the work of the four Getty programs—the Museum, Research Institute, Conservation Institute, and Foundation—the Getty aims to further knowledge and nurture critical seeing through the growth and presentation of its collections and by advancing the understanding and preservation of the world’s artistic heritage. The Getty pursues this mission with the conviction that cultural awareness, creativity, and aesthetic enjoyment are essential to a vital and civil society.
The J. Paul Getty Trust Web page: http://www.getty.edu/about/
Six Main Categories or Facets
The six primary categories (or facets) that are the highest and broadest in the hierarchy are listed horizontally across the top of the J. Paul Getty Museum Home page. It is noteworthy that the most important institutions that comprise and support the J. Paul Getty non-profit organization and reflect the Mission Statement are selected as the main categories: Home, Museum, Research Institute, Conservation Institute, Foundation, and Trust.
Each main category (or facet) has links that connect to a set of the subcategories, and those subcategories often link to another subset of subcategories. The subcategories continue to narrow and become more specific as the search and navigation gets deeper into the system. The definition on ODLIS states the same concept as above yet described in another way:
Facet
In indexing, the entire set of subclasses generated when a class representing a subject in a classification system is divided according to a single characteristic, for example, the subclasses “children,” “adolescents,” and “adults” generated by the division of the class “people” according to the characteristic “age.” The number of subclasses depends on the specific characteristic applied.
Online Dictionary for Library and Information Science (ODLIS):
http://www.abc-clio.com/ODLIS/odlis_f.aspx
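The ODLIS example of dividing the class “people” by the characteristic “age” can be sketched in a few lines of Python. The age cutoffs (13 and 18), the sample names, and the function names are assumptions made for this illustration; the definition itself does not specify them.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Person = Tuple[str, int]  # (name, age); hypothetical sample data


def age_group(age: int) -> str:
    """Map an age to a subclass. The cutoffs 13 and 18 are assumptions."""
    if age < 13:
        return "children"
    if age < 18:
        return "adolescents"
    return "adults"


def divide_class(
    members: Iterable[Person], characteristic: Callable[[Person], str]
) -> Dict[str, List[Person]]:
    """Divide one class into subclasses according to a single characteristic."""
    subclasses: Dict[str, List[Person]] = defaultdict(list)
    for member in members:
        subclasses[characteristic(member)].append(member)
    return dict(subclasses)


PEOPLE: List[Person] = [("Ana", 8), ("Ben", 15), ("Cara", 42), ("Dev", 11)]

if __name__ == "__main__":
    facet = divide_class(PEOPLE, lambda person: age_group(person[1]))
    for subclass, members in facet.items():
        print(subclass, members)
```

Applying a different characteristic to the same class would produce a different facet with a different number of subclasses, which is the point of the last sentence of the definition.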
Additionally, the amount of information expands as the number of facets multiplies with each subclass layer, and the location of data retrieval sits further and deeper in the faceted classification hierarchy (or taxonomy). The Getty Web site notes that its collection spans from antiquity to the present. The expansive scope and amount of data classified and stored in the IR system are difficult to imagine. Presumably, the IR system’s metadata is fully and exhaustively assigned (and tagged) so that search queries can successfully retrieve results from such voluminous information, from the beginning of civilization to the present.
As an example, The Getty Research Institute Web page is one class below the main category, Home, in the taxonomy. As a result, The Getty Research Institute is considered to be a subcategory (or subclass). The subcategory The Getty Research Institute is then divided into another set of seven subheadings, as follows: [Exhibitions and Events, Special Collections, Library, Search Tools and Databases, Scholars and Projects, Publications, and About the Getty Research Institute (GRI).]
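The hierarchy described above can be modeled as a nested structure, which makes the notion of “one class lower” concrete. The following is a minimal sketch using only the category names mentioned in this paper; the empty entries are placeholders for subcategories not enumerated here, and the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Category names are taken from the paper; empty dicts are placeholders for
# subcategories that are not enumerated in this sketch.
TAXONOMY: Dict[str, dict] = {
    "Home": {
        "Visit": {},
        "Museum": {},
        "Research Institute": {
            "Exhibitions and Events": {},
            "Special Collections": {},
            "Library": {},
            "Search Tools and Databases": {},
            "Scholars and Projects": {},
            "Publications": {},
            "About the Getty Research Institute (GRI)": {},
        },
        "Conservation Institute": {},
        "Foundation": {},
        "Trust": {},
    }
}


def find_path(
    tree: Dict[str, dict], target: str, path: Tuple[str, ...] = ()
) -> Optional[Tuple[str, ...]]:
    """Return the chain of categories from the root down to the target, if found."""
    for name, children in tree.items():
        current = path + (name,)
        if name == target:
            return current
        found = find_path(children, target, current)
        if found:
            return found
    return None


if __name__ == "__main__":
    path = find_path(TAXONOMY, "Library")
    if path is not None:
        print(" > ".join(path))                       # Home > Research Institute > Library
        print("levels below Home:", len(path) - 1)    # 2
```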
The J. Paul Getty Museum Web site: User Model
The User Model for the J. Paul Getty Museum is integrated in the Getty Trust’s Mission Statement (or Statement of Purpose). The following sentence from the Mission Statement above directly defines the User Model: “The Getty serves both the general public and a wide range of professional communities in Los Angeles and throughout the world.”
Based on the User Model, the underlying philosophy of the Getty Museum is open-minded and all-inclusive. As a result, the User Model includes both the general public, as well as the professional art communities in Los Angeles and throughout the world. Overall, the User Model attempts to reach out to a wide and expansive audience.
Interface Features: Prominent Slide Show
The Getty Museum Home page interface is attractively and elaborately designed including a prominently-featured slide show. A large window takes up three-quarters of the entire screen displaying an automated and constantly changing slide show.
By presenting constant motion on an ordinarily static Web site, the slide show is intended to attract the user’s attention. Morville (2005) emphasizes the point by quoting: “Gary Marchionini of the UNC School of Information and Library Science explains: the IR problem itself has fundamentally changed and a new paradigm of information interaction has emerged” (p. 58). What could be more appropriate than the use of multi-media on a Web site representing art museums?
For example, in a highly visible large window, five slides appear in rotation at approximately ten-second intervals. Each slide corresponds to one of five distinct topics. For instance, the current slide show topics are: Pacific Standard Time, Modern Antiquity, Lyonel Feininger Photographs, Getty Search Gateway, and Joint Project of the GCI (Getty Conservation Institute) and the Getty Foundation. The first three slides provide information about current exhibitions displayed at both Getty museums, and the last two slides provide information about a search portal to multiple Getty databases and a joint project of the Getty Conservation Institute and the Getty Foundation.
One of the slides in the slide show prominently depicts the search portal called the Getty Search Gateway. Users are able to access various Getty repositories and databases through the search portal.
In the slide show, an identifiable image depicting the current exhibition: “Pacific Standard Time” appears on both the main Home and Museum pages, and the image is carried out through the Web site for artistic continuity and visual reinforcement of the theme.
Another feature worth noting is located at the bottom-right side of the slide show: five functional buttons next to a small caption saying: Exhibitions, events, and online archive. The buttons allow users to break the slide show’s automated cycle and manually jump to the slide of interest.
The convention of using a slide show on the interface is maintained throughout the subcategories, not only the main categories. From the main category named Museum, the user can link to any of the subcategories by clicking on the appropriate tab. The subcategories, listed on tabs running horizontally from left to right about one-quarter of the way down from the top of the page, are: Exhibitions, Collection, Education, Research and Conservation, Publications, Public Programs, and About the J. Paul Getty Museum. If the user follows the links (or string) starting with the main category Museum, all of the Web pages for the subcategories under Museum share the same interface, namely a large window prominently featuring a slide show.
The J. Paul Getty Home page: http://www.getty.edu/index.html
Interface Design: “Mirror-image/Reflection”
In reference to the description of Home page interface above, I found something very interesting. When I clicked on the tab named Museum located next to the one named Home, I realized the Web page for Museum is the mirror-image, reverse, or reflection of the Home page.
The concept is intriguing and I wonder what the designer’s thought process is behind the interface design. Whether the reverse image on the Home and Museum Web pages is based on the designer’s convenience of not having to create a new Web design, cost-efficiency, or any other programming reason, I found the concept unique and worth noting. If so inclined, please compare the mirror-image by clicking on the link provided above to the Home page and the link provided below for the Museum page.
The J. Paul Getty Museum page: http://www.getty.edu/museum/
One-quarter of the Web page without the Slide Show:
In contrast to the slide show’s large window, the remaining one-quarter of the width of the Home page (on the left side of the screen) holds three small windows arranged in a vertical column.
Inside each of the three windows is a black-and-white still photograph, and the caption underneath announces other activities offered at the Getty Museums. Presently, for example, the caption under the top black-and-white photograph reads: Studio Visits – (Meet the Artists), the middle photograph’s caption reads: Film Series – (Artists in Film), and the bottom window’s caption reads: Family Fun – (Free Events and Programs).
As I was examining the Web site, I discovered a quirky discrepancy between the Home and Museum Web pages, which are otherwise seemingly identical mirror images. On the Museum Web page, which is a reverse image of the Home page, the third window down on the non-slide show side of the screen carries a caption for a book titled: “Books: A Living History” by Martyn Lyons (2011).
At first glance, I thought it was a link to the Getty Library because the photograph portrayed a row of books on a shelf. However, when I clicked on the link I discovered it is a separate Web page advertising the hardcover book: “Books: A Living History” for $34.95. On closer inspection, I discovered the book is published by Getty Publishers. In addition, when I noticed the heading on the Web page it said The Getty Museum Store. Somehow, I followed the link with the books and before I knew it I was at the museum store page! What a clever way to get the user to visit the museum store and purchase their products.
In addition, I noticed The Getty Museum Store is linked to almost every page of the Web site. Not to disappoint them, I had to check out the store. When I was browsing through the products, I realized the museum store Web page also has its own implicit database and search engine. Furthermore, I also realized the enormity of the Getty Web site’s expansive scope and tremendous power. As Morville states: “Today’s marketplace offers opportunities for interaction, insight, and innovation unseen since the ancient bazaars of spices, silks, and magical stones” (p. 102).
For various other events offered at the Museum, the user can find further information by clicking on the main category: Museum followed by clicking on the Public Programs subcategory. The Public Programs link contains another subclass including: [Lectures and Conversations; Performances and Films; Artist’s Studio Visits; Courses and Demonstrations; and Tours and Gallery Talks.]
The J. Paul Getty Home page: http://www.getty.edu/index.html
Current Exhibition: Online Information
(E.g. Current Exhibition: “Pacific Standard Time” at the Getty Center)
The user can get to the Web page called “Current Exhibitions and Installations,” which presents information for the exhibitions at both museums (the Getty Center and the Getty Villa), in two ways. Navigating from the Home page, click on the Visit tab, then click on the Exhibitions tab. Alternatively, from the Home page, click on the main category Museum, then click on the Exhibitions tab. Either way, it takes two steps to arrive at the page.
For example, the current exhibition at the Getty Center is called “Pacific Standard Time” and consists of the following four independent exhibits: “Crosscurrents in L.A. Painting and Sculpture, 1950-1970,” “Greetings from L.A.: Artists and Publics, 1950-1980,” “From Start to Finish: De Wain Valentine’s Gray Column,” and “In Focus: Los Angeles, 1945-1980.”
On the “Pacific Standard Time” Web page the independent exhibits are depicted in approximately one-quarter equidistant sections horizontally across the page. The visual image of the exhibition online is black and white for an overall monochromatic theme, accented by primary colors particularly chrome yellow and fire engine red. For the sake of visual continuity, the independent exhibits are also intentionally represented in monochromatic hues.
One-quarter of the Web page is devoted to hyperlinks linking the user to the four exhibits under the main heading: “Pacific Standard Time.” For instance, if the user clicks on a particular exhibit, such as “Crosscurrents in L.A. Painting and Sculpture, 1950-1970,” the system will take the user to a Web page that provides information specifically about that exhibit.
In the bottom-right corner of the same Web page, there is a small window for viewing a short film about the exhibition. When the user clicks on the Start arrow, the film, which runs 11 minutes and 42 seconds, starts to play on the Web page and describes the exhibition. Underneath the film’s window the tiny caption says: “Exhibition Film: Learn about the birth of the L.A. art scene from the people who shaped it.”
The film about the exhibition also advocates the concept of collaboration and shared standards for the museum community (e.g., LACMA, MOCA, Metropolitan, Smithsonian, Louvre, etc.). Dr. Bolin (2011) stated: “That [shared standards] is a valuable concept for librarianship. Shared norms and standards are what keep us going. We have shared standards for cataloging, and standards like z39.50” (Lecture 13, slide 5).
Relevance
In regard to relevance, Meadow (2007) explains why conventional mathematical measurements (i.e., metrics) fail to be appropriate measures: “In information retrieval (IR), we do not have the equivalents of these physical measures . . . As we shall see, there is no universally accepted way to measure the relevance of a text to a query (another text)” (p. 317).
As a result, Meadow (2007) suggests the following: “The term evaluation may mean the same as measurement or may be used to refer to a composite metric applicable to a system as a whole” (p. 318). Meadow argues his point as follows: “To speak of evaluation of an information retrieval system is not meaningful without further definition. We can divide the information retrieval measure into three broad categories: performance, outcome, and environment” (p. 318).
According to Meadow’s (2007) measures of performance, outcome, and environment, the Getty Museum’s Web site and information retrieval system excel in almost every respect. The information retrieval system underlying the Getty Museum’s Web site stores, aggregates, discriminates, and retrieves enormous amounts of information using a powerful search engine, effective functional design, and multiple databases.
I found one of the best definitions of the term “relevance” on the Advanced Search Tips page of the Getty Museum Web site, which states: “Relevance is determined by number of matched search terms, the proximity with which terms are located, and the importance of the field in which terms are found.” Since every database and search engine has a slightly different infrastructure, the instructions and search tips for using them differ as well.
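The three factors named in that definition (number of matched terms, proximity, and field importance) can be combined in a toy scoring function. The sketch below is an illustration only and is not the Getty search engine’s actual formula: the field weights, the proximity bonus, the sample records, and all function names are assumptions, and tokenizing by whitespace is a deliberate simplification.

```python
from typing import Dict, List, Set

# Assumed field weights: a match in the title counts more than in the description.
FIELD_WEIGHTS: Dict[str, float] = {"title": 3.0, "artist": 2.0, "description": 1.0}


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a simplification; no punctuation handling)."""
    return text.lower().split()


def proximity_bonus(tokens: List[str], terms: Set[str]) -> float:
    """Return 1 / (smallest gap between two different query terms), or 0 if none."""
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in terms]
    gaps = [j - i for (i, a), (j, b) in zip(hits, hits[1:]) if a != b]
    return 1.0 / min(gaps) if gaps else 0.0


def score(record: Dict[str, str], query: str) -> float:
    """Sum, over fields, weight * (matched term count + proximity bonus)."""
    terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(record.get(field, ""))
        matched = terms & set(tokens)
        total += weight * (len(matched) + proximity_bonus(tokens, terms))
    return total


# Hypothetical records for illustration.
RECORD_1 = {"title": "garden study", "description": "a painting of a walled garden"}
RECORD_2 = {"title": "harbor view", "description": "garden scene mentioned in a long study note"}

if __name__ == "__main__":
    for record in sorted([RECORD_1, RECORD_2], key=lambda r: score(r, "garden study"), reverse=True):
        print(record["title"], round(score(record, "garden study"), 3))
```

In this sketch the first record ranks higher because both query terms appear side by side in the heavily weighted title field, while the second record matches the same terms only in the description and far apart.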
The Getty IR system is so enormous that multiple institutions and foundations within the Getty organization possess their own database and search engine. For example, The Getty Research Institute has its own database and search engine, as well as its own User Model and Mission Statement.
Precision and Recall
As Morville (2005), Meadow (2007), and other online professionals and consultants recognize, the two most important measures of an information retrieval system are precision and recall. Together they offer one method of measuring the effectiveness of searches. Precision measures the proportion of retrieved results that are relevant to the user’s query. Recall measures the proportion of all the relevant documents in the system that are retrieved in response to the user’s query.
In other words, “precision and recall” is similar in relationship to “quality” and “quantity.” Morville (2005) further describes the concept more eloquently and precisely as follows: “Precision and recall, our most basic measures of effectiveness, are built upon this common-sense definition. Precision measures how well a system retrieves only the relevant documents. Recall measures how well a system retrieves all the relevant documents” (p. 49).
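A small worked example may make the two measures concrete. The sketch below uses the standard set-based definitions (precision is relevant retrieved divided by all retrieved; recall is relevant retrieved divided by all relevant); the document identifiers and the function name are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Compute set-based precision and recall for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: the system returns four documents, two of which are
# among the five documents that are actually relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d1", "d3", "d5", "d6", "d7"]

if __name__ == "__main__":
    precision, recall = precision_recall(RETRIEVED, RELEVANT)
    print(f"precision = {precision:.2f}")  # 2 of 4 retrieved are relevant
    print(f"recall    = {recall:.2f}")     # 2 of 5 relevant were retrieved
```

Here half of what came back is useful (precision 0.5), but the search missed three of the five relevant documents (recall 0.4), which shows how the two measures can diverge for the same query.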
Morville (2005) elaborates further: “The relative importance of these metrics varies based on the type of search.” As an example, he describes: “For the sample search in which a few good documents are sufficient, precision outweighs recall” (p. 49). The above-mentioned concept again brings to mind Zipf’s “Principle of Least Effort,” in which users want the most important, convenient, and fastest access to information (Morville, 2005, p. 44). Morville continues to explain that precision is crucial when the user knows the information already exists (which is called a known-item search), and he describes the significance as follows: “Precision is even more important for the known-item or existence search in which a specific document (or web site) is desired” (p. 49). This type of search has one correct answer. Finally, Morville cites the last of the common search types: “For the exhaustive search when all or nearly all relevant documents are desired, recall is the key metric” (p. 50).
Morville (2005) claims: “The upshot of all this analysis is that while recall fails fastest, precision also drops precipitously as full-text retrieval systems grow larger” (p. 52). In other words, as the system gets larger and stores a greater number of documents, its ability to retrieve all of the relevant documents (recall) fails fastest, and the proportion of retrieved results that are relevant (precision) also drops as the system becomes inundated with more and more information. Morville conveys that: “The larger system returns too many results with too many meanings” (p. 53). In the face of these obstacles, Morville suggests there are things we can do to improve the system. For example, “That’s where metadata enters the picture. Metadata tags applied by humans can indicate aboutness thereby improving precision,” claims Morville (p. 53). Thus, the more detailed and complete the description of an entity’s aboutness in the metadata, the more precisely and efficiently the system is able to gather similar documents together (aggregate), set aside the ones that are not relevant (discriminate), and achieve successful retrieval for the user.
Getty Search Gateway
On a Web page headed: The Getty Search Gateway, a caption on the right side of the page asks: “What is The Getty Search Gateway?” The answer is stated as follows: The Getty Search Gateway allows users to search across several of the Getty repositories, including collections databases, library catalogs, collection inventories, and archival finding aids. The Getty Search Gateway contains ten related databases and they are all owned by the J. Paul Getty Trust.
Above the “Search Box” it says: “Bringing the vast, ever-expanding Getty resources to researchers, scholars and educators. Enjoy your exploration.” The subcategories under the Search Gateway category are: [Search; Gateway Home; Search History; and Help.]
Search Gateway Help Page
In her article, “Library Catalogue Users Are Influenced by Trends in Web Searching,” Susan Haigh (2006) states the following: “Because users do not generally know or care about the structure of a bibliographic record, and many have little concept of what a library catalogue is for or what it contains, Novotny suggests that user instruction needs to address these basics.”
(Web link:) http://ejournals.library.ualberta.ca/index.php/EBLIP/article/view/56
Because Haigh’s (2006) statement above rings true, I admire the extensive, useful, and detailed searching tips that the Search Gateway Help page provides. Suggestions to enhance and facilitate successful search retrievals are presented in an articulate and clear way, with visual aids as helpful references. I found the search tips on the Search Gateway Help page to be so helpful that I copied them below for future reference in librarianship.
The Getty Search Gateway Help page: http://search.getty.edu/gateway/help.html
Search Gateway Help Tips
Help Topics
- Getty Search Gateway vs. Getty.edu Website Search
- Browsing with Getty Search Gateway
- Searching with Single Terms and Phrases
- Using the Filters
- Advanced Search Tips
Getty Search Gateway vs. Getty.edu Website Search
The Getty Search Gateway tool is not the same as the search tool that is available on most pages of the getty.edu website. This search box on getty.edu only searches Web pages on the Getty website, and includes just a small portion of all resources in the Getty’s collections and databases.
Browsing with Getty Search Gateway
From Getty Search Gateway Home you can choose to browse resources by source of repository or type of resource. On the results page, a set of filters on the left side of the page can be used to further refine the results.
Searching with Single Terms and Phrases
Enter any search term in the search box. Use quotations to indicate when two or more words should be searched as a phrase, e.g., “oak tree.” The search engine will look for results that include the exact words or phrases entered, as well as words with the same root, or ‘stem.’
Example: Enter paint in the search box and the search engine will also search for painter, painting, paintings, and paints.
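The stemming behavior described in the help text can be sketched in a few lines. This is only an illustration under stated assumptions: the suffix list and the function names below are hypothetical, and the help page does not say which stemming rules the Getty search engine actually applies.

```python
# Naive suffix stripping for illustration only; this is NOT the Getty
# search engine's stemmer, whose rules are not given on the help page.
SUFFIXES = ("ings", "ing", "ers", "er", "s")  # longest suffixes are tried first


def naive_stem(word):
    """Strip one common English suffix to approximate the word's root."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so that short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def same_stem(query_term, record_word):
    """True when both words reduce to the same root."""
    return naive_stem(query_term) == naive_stem(record_word)


for word in ["paint", "painter", "painting", "paintings", "paints"]:
    print(word, "->", naive_stem(word))
```

Under these assumed rules, every word in the help page's example reduces to the root "paint," so a query for paint would match all of them.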
Using the Filters
Filters on the left side of the page can be used to refine your search.
Example: Click on the [+] sign to expand (or aggregate) filter categories.
Click on the [-] sign to narrow (or discriminate) filter categories.
Click the [+] sign next to a filter type to open the list of categories and see how many records were retrieved in various categories.
Feature: Next to each topic (or category) listed on the left-side of the page, numbers appear to indicate how many records fall into each category.
The filter types include the following categories:
- Type—the formats of the resources; e.g., book, painting, drawing, manuscript, photograph
- Topic—subject matter of the resources; e.g., portraits, natural world, George Washington, European history, plants, gods
- Name—the authors, creators, makers, or publishers associated with the resources; e.g., Paul Strand, Joris Hoefnagel, Yoshi Shirahata, Chicago University Press
- Place—geographic locations associated with the resources; e.g., Asia, France, California, Roman Empire, United Kingdom
- Source—the Getty collection which is the ‘source’ of the resource record
- Highlights—pre-selected search queries provided for specific topics or collecting areas; e.g., new acquisitions, collections about Van Gogh across the Getty
Selecting a category, by clicking the check box next to it, will filter your original search and display only those records included in that category.
You can add and remove as many category filters as you wish—all of your filters are displayed at the top of the filter list, under “Your Filters.”
Example of Filter Use: This user filtered the original search (terms entered in the search box), by selecting the filters “flowers” and “people and occupations.” The resulting 15 records displayed include the 9 records from the category “people and occupations” and the 6 records from the category “flowers.”
Advanced Search Tips
Using Multiple Search Terms
Understanding how the search engine treats multiple search terms will help you refine your search. Results with higher relevance will appear higher on the results list. Relevance is determined by number of matched search terms, the proximity with which terms are located, and the importance of the field in which terms are found.
- 1 or 2 search terms—each search result MUST contain the term/s
- 3–5 search terms—each search result MUST contain the number of terms minus one. Results containing all terms will have the highest relevance and appear at the top of the results list.
Example: You enter 4 search terms. Search results must each contain at least 3 of those terms.
- More than 5 search terms—each search result must contain at least 75% of the terms. The system rounds fractions down.
Example: You enter 10 search terms. Search results must each contain at least 7 of those terms.
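The three matching rules above can be expressed as a small function. This is a sketch of the rules as the help page states them, not the Getty search engine's actual code; the function name is hypothetical.

```python
def min_terms_required(n_terms):
    """Minimum number of query terms each result must contain,
    following the rules quoted from the Search Gateway help page."""
    if n_terms < 1:
        raise ValueError("a query needs at least one term")
    if n_terms <= 2:
        return n_terms            # 1 or 2 terms: every term is required
    if n_terms <= 5:
        return n_terms - 1        # 3 to 5 terms: all but one
    return (n_terms * 3) // 4     # more than 5 terms: 75%, rounded down


print(min_terms_required(4))   # 3
print(min_terms_required(10))  # 7
```

Integer division is used for the 75% rule so that fractions are rounded down exactly as the help text describes (75% of 10 is 7.5, which becomes 7).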
Indicate a Required Term
Use a + sign to specify a word or phrases that MUST be found in search results.
Examples:
+“apple tree” » retrieves only records that contain the phrase “apple tree”
+apple tree » all retrieved records must contain the word “apple.” Some retrieved records may also contain the word “tree”
Exclude a Term
Use a – sign to specify prohibited search terms from search results.
Examples:
print -monotype » retrieves records with the word “print,” but which do not contain the word “monotype”
irises -“van gogh” » retrieves records that contain the word “irises” but which do not contain the phrase “van gogh”
+apple -tree » retrieves records that must contain the word “apple,” but which must not contain the word “tree”
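The required-term and excluded-term notation can be sketched as a small query parser. Everything here is an assumption made for illustration: the function names, the use of straight quotation marks for phrases, whole-word matching without stemming, and the rule that an unsigned term must appear only when no + term is present. The Getty search engine's real implementation is not described on the help page.

```python
import re

# A token is an optional + or - sign followed by a quoted phrase or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse_query(query):
    """Split a query into (required, excluded, unsigned) lists of terms."""
    required, excluded, unsigned = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            unsigned.append(term)
    return required, excluded, unsigned


def _normalize(text):
    # Lower-case the text and keep only its words, separated by single spaces.
    return " ".join(re.findall(r"\w+", text.lower()))


def _contains(record, term):
    # Whole-word (or whole-phrase) containment; no stemming in this sketch.
    return f" {_normalize(term)} " in f" {_normalize(record)} "


def matches(record, query):
    """True if the record satisfies the + and - constraints of the query."""
    required, excluded, unsigned = parse_query(query)
    if any(_contains(record, term) for term in excluded):
        return False
    if not all(_contains(record, term) for term in required):
        return False
    # Assumption: with no + terms, at least one unsigned term must appear.
    if not required and unsigned:
        return any(_contains(record, term) for term in unsigned)
    return True


print(matches("An apple orchard in spring", "+apple -tree"))   # True
print(matches("Irises by Van Gogh", 'irises -"van gogh"'))     # False
```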
A Different Boolean Operator
In search engines that support them, Boolean operators allow the user to aggregate (broaden) or discriminate (narrow) the set of records retrieved for the terms in the search query. According to ODLIS, Boolean operators are defined as:
Boolean
A system of logic developed by the English mathematician George Boole (1815-64) that allows the user to combine words or phrases representing significant concepts when searching an online catalog or bibliographic database by keywords. Three logical commands (sometimes called “operators”) are available in most search software: AND, OR, and NOT.
Online Dictionary for Library and Information Science (ODLIS):
http://www.abc-clio.com/ODLIS/odlis_b.aspx
The Getty Search Gateway’s databases and search engines have their own mechanism for aggregating and discriminating the pre-indexed metadata to match the user’s query.
In order to meet the user’s specified search requirement, the Boolean operators “AND,” “OR,” and “NOT” are commands provided for users to aggregate (or broaden) and/or discriminate (or narrow) their search query.
Although the concept is the same, the primary difference between Boolean operators and the Getty Search Gateway’s enhanced search feature is the use of notations (or symbols) instead of the words (or operators) “AND,” “OR,” and “NOT” to communicate the command. For example, the symbol [+] marks a term that must appear in the results, which narrows the search much as “AND” does, and the symbol [-] excludes a term, much as “NOT” does.
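The way the three Boolean operators broaden or narrow a result set can be shown with ordinary set operations. This is a minimal sketch: the index, the record numbers, and the function names are hypothetical and are not drawn from any Getty database.

```python
# Hypothetical inverted index: each term maps to the ids of the records containing it.
INDEX = {
    "apple": {1, 2, 4},
    "tree": {2, 3, 4},
    "print": {5, 6},
    "monotype": {6},
}


def records_for(term):
    """Ids of the records that contain the term (empty set if the term is unknown)."""
    return INDEX.get(term, set())


def boolean_and(a, b):
    return records_for(a) & records_for(b)   # narrows: both terms must appear


def boolean_or(a, b):
    return records_for(a) | records_for(b)   # broadens: either term may appear


def boolean_not(a, b):
    return records_for(a) - records_for(b)   # narrows: first term without the second


print(boolean_and("apple", "tree"))      # {2, 4}
print(boolean_or("apple", "tree"))       # {1, 2, 3, 4}
print(boolean_not("print", "monotype"))  # {5}
```

In this sketch, OR is the only operator that enlarges the result set, while AND and NOT both reduce it.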
Morville (2005) quotes Alfred Korzybski stating: “Man’s achievements rest upon the use of symbols” (p. 119). Whether they are Boolean operators or plus and minus symbols, the ability to aggregate and discriminate is one of the most essential elements of successful retrieval.
In Dr. Bolin’s (2011) Introduction to Information Retrieval handout, she quotes Meadow, Boyce, and Kraft (2000) to make the following point:
Emphasize the importance of selectivity – only some of the documents in the collection are relevant to a given request, and not all the relevant documents are equally relevant. One of the important features of an information retrieval system is its ability to aggregate and discriminate – to find all and only those documents which will meet the information need.
On the Advanced Search Tips page, one of the suggestions is to “Indicate a Required Term.” The instruction states: “Use a + sign to specify a word or phrases that MUST be found in search results.” Although it is not described as such, this search method is comparable to the Weighted Search method, in that the user marks some terms as more important than others. As explained in Dr. Bolin’s (2011) Handout I under the topic Taxonomy of Search Engines, the Weighted Request System is described as follows: “Each search request may consist of multiple terms with user-assigned weights attached to the terms.”
Getty Controlled Vocabularies
As Morville (2005) states eloquently: “We develop controlled vocabularies to manage the ambiguity of language. For our preferred terms, we define equivalence relationships to handle synonyms (variant terms that are equivalent for the purposes of retrieval) and we specify associative relationships to support links” (p. 129). In addition, Dr. Bolin (2011) provides the definition as follows: “Controlled vocabularies are a list of authorized subject terms that have been developed with a certain user model – to serve a particular audience or clientele” (Lecture 4, Slide 8).
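An equivalence relationship of the kind Morville describes can be sketched as a lookup from variant terms to a preferred term. The name pairs below are borrowed from examples on the Getty vocabulary pages, but the choice of which form is "preferred," the table itself, and the function name are assumptions made only for illustration.

```python
# Hypothetical equivalence table: each variant (lower-cased) maps to a preferred term.
VARIANT_TO_PREFERRED = {
    "london": "London",
    "londinium": "London",
    "titian": "Titian",
    "tiziano vecellio": "Titian",
    "mona lisa": "Mona Lisa",
    "la gioconda": "Mona Lisa",
}


def preferred_term(query):
    """Return the preferred term for a query, or None if it is not in the vocabulary."""
    key = " ".join(query.lower().split())  # ignore case and extra spaces
    return VARIANT_TO_PREFERRED.get(key)


print(preferred_term("Tiziano Vecellio"))  # Titian
print(preferred_term("  LONDINIUM "))      # London
```

Because both variants lead to the same preferred term, records indexed under that term are retrieved no matter which name the user enters.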
Remarkably, the J. Paul Getty Trust owns four ontologies of controlled vocabularies, all of which are licensed and copyrighted. Regarding the term “ontology,” Morville (2005) quotes Berners-Lee:
In philosophy, an ontology is a theory about the nature of existence, of what types of things exist . . . Artificial intelligence and Web researchers have co-opted the term . . . The most typical kind of ontology for the Web has a taxonomy and a set of inference rules. (p. 131)
Web link: http://sciam.com
Also, the User Model for the Getty vocabularies targets a much narrower audience than the User Model for the Getty Museum. As stated in the Mission Statement, the Getty Museum serves the general audience and professionals in the art community in Los Angeles and around the world. In contrast, the Getty Vocabularies Web page indicates that the search system targets artists and art experts: “The Getty vocabularies contain structured terminology for fine art, architecture, decorative arts, archival materials, and other material culture.”
The Getty Vocabularies comprise the four ontologies listed below. All four ontologies were created by, and are owned and copyrighted (2010) by, the J. Paul Getty Trust. The four ontologies are as follows: Geographic Names, Art and Architecture Thesaurus, Artist Names, and Cultural Object Names. They are intended to help aggregate and discriminate the user’s searches and enhance retrieval results. The four ontologies function together implicitly, combining all of the controlled vocabularies into a single Getty vocabulary. As a result, when a user enters a query, the IR system responds by referring to a single ontology of Getty vocabularies.
Each of the four ontologies is intended to direct (or guide) the user’s search query from a different aspect, depending on what the user already knows and which search the user selects. For example, the user can choose to begin the search by location or country, art terminology, artists’ names, and/or cultural object names.
The Getty Search Tools and Databases: http://www.getty.edu/research/tools/
The Four Getty Ontologies:
The Getty Thesaurus of Geographic Names (TGN)®
London or Londinium? TGN contains names, variant names, and hierarchical context for current and historical cities, towns, nations, empires, physical features, and archaeological sites.
Search Tips
For the Find Name field, you may use AND and OR [e.g., 1) san carlos, 2) carlos OR charles, 3) carl* OR charl*, 4) san AND carlos, 5) carlos AND (san OR saint), 6) (carlos OR charles) AND (san OR saint)]. Boolean operators must be in all caps (AND and OR). Wildcard is the asterisk (*); right truncation only. To find an exact match rather than a key word, use quotes [e.g., “carlos”]. There is an implied AND between the Find Name, Place Type, and Nation fields.
The Art & Architecture Thesaurus (AAT)®
Catherine wheel or rose window? AAT contains terms, synonyms, definitions, and relationships for objects, styles, materials, and other topics related to art, architecture, and other material culture.
Search Tips
For the Find Term or Note field, you may use AND and OR (all in upper case) [e.g., 1) windsor chairs, 2) chairs OR rockers, 3) chairs OR rockers OR armchairs, 4) bow-back AND windsor, 5) windsor AND (rockers OR chairs), 6) (windsor OR boston) AND (rockers OR chairs)]. Wildcard is the asterisk (*); right truncation only. To find an exact match rather than a key word in the Find Term field, use quotes [e.g., “chairs”]. If you wish to search the term and note together, click on the buttons for AND or OR.
The Union List of Artist Names (ULAN)®
Titian or Tiziano Vecellio? ULAN contains names, variant names, and biographical information for artists, architects, studios, firms, and repositories of art.
Search Tips:
For the Find Name field, you may use AND and OR [e.g., 1) eldon garnet, 2) garnet OR carnet, 3) garnet OR carnet OR karnette, 4) eldon AND carnet, 5) eldon AND (garnet OR carnet), 6) (eldon OR elton) AND (garnet OR carnet)]. Boolean operators must be in all caps (AND and OR). Wildcard is the asterisk (*); right truncation only. To find an exact match rather than a key word, use quotes [e.g., “carlos”]. There is an implied AND between the Find Name, Role, and Nationality fields.
Cultural Objects Name Authority (CONA)™
Mona Lisa or La Gioconda? CONA, a new vocabulary coming in 2011, includes titles, attributions, and other information for works of art and architecture.
Accessing CONA:
The CONA online search module is under development. It is scheduled to go live in early 2012. In the meantime, please contact us at vocab@getty.edu if your institution wishes to contribute to CONA. As with the AAT, TGN, and ULAN, CONA grows through contributions from the user community.
Copyright Rules:
Copyright © 2010 The J. Paul Getty Trust. All rights reserved. The ULAN and the other Getty vocabularies are made available via the Web browsers to support limited research and cataloging efforts. Companies and institutions interested in regular or extensive use of the vocabularies should explore licensing options by reading about Obtaining and Sample Data or by contacting the Vocabulary Program.
Getty Vocabularies
The Getty vocabulary data can be obtained in several ways:
- On the Getty Web site, free of charge, for searching individual terms and names. The data is refreshed every two weeks. The Getty vocabularies are made available via the Web to support limited research and cataloging efforts only. Licensing is required for more extensive use of these tools.
- By licensing the raw data files. The AAT, ULAN, and TGN are currently available in XML and relational tables, which are released in July annually. This data includes no software. Customized versions of these files are not available. The data is also available via Web Services, for which data is refreshed every two weeks; this data is “in process,” meaning it may change with the annual release in July of each year. Please go to the Download Center for more information about licensing, Web Services, and sample data.
- As part of a collection management system or other information system from one of the following vendors: Adlib Information Systems, Cuadra Associates, Gallery Systems, KE Software/KE EMu, Luna Imaging, Questor Systems, Vernon Systems, or Willoughby Associates.
The J. Paul Getty Vocabulary License, Copyright and Download page: http://www.getty.edu/research/tools/vocabularies/obtain/download.html
In conclusion, Morville (2005) suggests two methods for improving a system’s precision and recall. To enhance precision, Morville recommends the use of controlled vocabularies: “Controlled vocabularies (organized lists of approved words and phrases) for populating metadata fields can further improve precision through their discriminatory power” (p. 53). To enhance recall, Morville suggests connecting terms through various non-linear relationships; in other words, integrating a syndetic structure into the system. Morville describes the relationships connecting terms in the following way: “The specification of equivalence, hierarchical, and associative relationships can enhance recall by linking synonyms, acronyms, misspellings, and broader, narrower, and related terms” (p. 53). In his book, “Ambient Findability,” Morville makes many insightful and valuable recommendations to facilitate information retrieval, and ultimately improve the infrastructure of the information architecture.
References:
Berners-Lee, T., Hendler, J., & Lassila, O. (2001, May). The Semantic Web. Scientific American.
Bolin, M. (2011). SJSU-SLIS: Information Retrieval Course. LIBR 202 – Section 18.
Marchionini, G. (1995). Information Seeking in Electronic Environments. New York, NY: Cambridge University Press.
Meadow, C. T., Boyce, B. R., & Kraft, D. H. (2000). Text Information Retrieval Systems (2nd ed.). San Diego: Academic Press.
Morville, P. (2005). Ambient Findability. Sebastopol: O’Reilly Media, Inc.