LIBR 202 – Section 18 – Bolin
Discussion Week 5
Reading from Morville’s Book
Patricia Ayame Thomson
“Beginnings of Information Retrieval”
Morville takes us on a fascinating journey through the beginnings of “Information Retrieval.” I’m glad to learn about Calvin Mooers, who coined the term “information retrieval” in 1948, as well as Mooers’ Law (p. 44). Both Mooers and Zipf seem to have arrived at the same discomforting conclusion: in brief, most people choose the path of least resistance. In other words, having to choose between quickly and easily accessed information and accuracy, most people will choose the former. However lofty-minded I may sound right now, given the circumstances and time constraints I would probably do the same. Therefore, Morville claims, it is critically important to understand users and their information-seeking behavior. Morville emphasizes that more research on information-seeking behavior is necessary from all aspects: geographic, psychological (evolutionary psychology), etc. In particular, Morville gives credit to a noteworthy individual, stating: “Marcia J. Bates deserves credit for shaping our understanding of information seeking behavior” (p. 59). Bates is one of the rare pioneers in understanding the importance of examining information-seeking behavior in order to enhance information collection, storage, and retrieval.
I find Morville’s description of the user as “The People Problem” amusing and true (p. 54). As he says, the user is the unpredictable variable in the equation. Morville humorously describes it this way: “Today we call this infuriating variable ‘the user’ and we recognize that research must integrate rather than isolate the goals, behaviors, and idiosyncrasies of the people who use the systems” (p. 54). I find it ironic that developers, programmers, and indexers are bending over backwards to meet consumers’ demands to make products “faster, easier, and more compact,” while at the same time catering to an audience of users who walk the path of least resistance (i.e., needing immediate gratification, for lack of a better term). I realize it’s really about streamlining the system, which translates into increased profit, influence, and growth for the company.
Presenting the challenges we encounter in the face of the technological revolution, Morville articulately states: “At the heart of these challenges and principles lies the concept of relevance. Simply put, relevant results are those which are interesting and useful to users” (p. 49). That means the programmer’s goal is to return as many “relevant hits” as possible in the user’s search results. In order to measure the effectiveness of searches, the two most important measures are “precision and recall” (p. 49). He defines the two as follows: “Precision measures how well a system retrieves only the relevant documents. Recall measures how well a system retrieves all the relevant documents” (p. 49). As I understand it, it is easier to maintain both precision and recall in a smaller collection of documents, but as the collection grows (i.e., into the billions) recall suffers.
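To make these two measures concrete, here is a minimal sketch of my own (not from Morville), assuming we somehow already know which documents in the collection are truly relevant to a query; the document names are hypothetical:

    # Hypothetical example: documents a search returned,
    # and documents that are actually relevant to the query.
    retrieved = {"doc1", "doc2", "doc3", "doc4"}
    relevant = {"doc2", "doc4", "doc5", "doc6", "doc7"}

    # Documents that are both retrieved and relevant ("relevant hits").
    hits = retrieved & relevant

    # Precision: what fraction of the retrieved documents are relevant.
    precision = len(hits) / len(retrieved)   # 2 / 4 = 0.5

    # Recall: what fraction of all the relevant documents were retrieved.
    recall = len(hits) / len(relevant)        # 2 / 5 = 0.4

    print(precision, recall)

This also suggests why recall suffers as the collection grows: the denominator for recall is every relevant document anywhere in the collection, while the system can realistically return only a small set of results.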
Therefore, the following forms of information representation are used to indicate aboutness: “Controlled vocabularies help retrieval systems manage the challenges of ambiguity and meaning inherent in language. And they become increasingly valuable as the systems grow larger” (p. 53). Another method is folksonomy, in which the public adds metadata tags: “Metadata tags applied by humans can indicate aboutness thereby improving precision” (Morville, p. 53). In addition, Morville suggests: “The specification of equivalence, hierarchical, and associative relationships can enhance recall by linking synonyms, acronyms, misspellings, and broader, narrower, and related terms” (p. 53). (According to the lecture, I think Morville forgot to include “polysemy/homonymy.”) For the above reasons, Morville claims, information retrieval is never going to be a perfect or easy system to create and use.
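As a way of picturing the equivalence relationships Morville mentions, here is a minimal sketch of my own, assuming a simple lookup table of variant terms; the words and document names are hypothetical illustrations, not anything from the book:

    # Hypothetical controlled vocabulary: variant terms (synonyms,
    # acronyms, misspellings) all map to one preferred term.
    equivalences = {
        "automobile": "car",
        "auto": "car",
        "carr": "car",   # a common misspelling
        "car": "car",
    }

    # Documents indexed only under the preferred term.
    index = {"car": ["doc1", "doc3"]}

    def search(query):
        # Normalize the user's word to the preferred term before lookup,
        # so "automobile" and "auto" retrieve the same documents as "car".
        preferred = equivalences.get(query, query)
        return index.get(preferred, [])

    print(search("automobile"))   # ['doc1', 'doc3']

Linking the variant terms to one preferred term is what lets the system retrieve relevant documents it would otherwise miss, which is how such relationships enhance recall.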
Although the basic concepts of libraries collecting, storing, and retrieving information endure, the advent of rapidly advancing technology has shifted the paradigm in how librarians, users, and the rest of the population perform the process. Morville claims: “Fast, cheap processors powered a personal computer revolution and enabled the information explosion we call the Internet” (p. 44). Based on the reading and the “Attribute Elicitation Exercise,” defining information is a difficult task. There are two separate parts: defining what information is in and of itself, and defining the representation of information. First, it is challenging to describe what information is, to define its contents and its “aboutness.” Second, it is challenging to find the exact words to define the information of an entity (i.e., the smallest unit of information). Morville also says: “Though relevance ranking algorithms can factor in the location and frequency of word occurrence, there is no way for software to accurately determine aboutness” (p. 53). In order to represent the entity of information, the programmer has to use words. Words are ambiguous and can be difficult for a program to decipher, distinguish, understand, and put into context.
The above is due to the fact that computers cannot think like the human brain. Not yet, anyway. There are prophetic and cautionary tales about what may happen once computers are able to think on their own. Stanley Kubrick’s “2001: A Space Odyssey” was an early example, and more recently numerous films about artificial intelligence have been coming out (e.g., “The Matrix,” “I, Robot,” “A.I.”). I believe it’s only a matter of time before computers will be able to think, but at this stage in the game, scientists and medical experts haven’t yet figured out how the human brain works. However, experiments with sensory, visual, and other human-like capabilities have been developing in robotics at Carnegie Mellon University, as well as at other universities. As Descartes said: “I think, therefore I am.”
In conclusion, I’ve always wondered what exactly the term “information science” means. When I refer to the degree I’m pursuing, I can proudly say “Library” because I have a pretty good handle on what that means, but I mumble through the “. . . and Information Science” part because, truth be told, I didn’t have a clear idea before. In case you harbored the same ambiguous notion as I did, Morville quotes the following as the definition of “information science”: “The science that investigates the properties and behavior of information, the forces governing the flow of information, and the means of processing information for optimum accessibility and usability. The processes include the origination, dissemination, collection, organization, storage, retrieval, interpretation, and use of information” (Harold Borko, “Information Science: What Is It?” American Documentation, 1968).