Topic 9- Indexing Techniques

Topic 9 – Indexing Techniques

Introduction

We are all familiar with indexes, they are those handy things in the back of textbooks that allow you to look up the locations of selected references to sets of keywords. They are the same thing in the computer world. Since you are in data structures, you will have to think about – how would you implement and maintain an index.

Why Indexes?

The world is sloppy and some times we simply don’t want to take all this unstructured data and rearrange it all to be neat. Indexes allow us a nice easy way to keep track of it in a separate entity. Is this all worthwhile? If you can index the material on the internet, organize it to be quickly searchable, and make it available through a nice, fast interface – you have Google. It seems to have worked out well for them, so we will take a little look at indexing.

Indexes and Structures

In reality, indexes are actually an application of the field of data structures – but they are important enough that we are covering them here. To creatively index some data/information/text/etc you need 2 things, an algorithm or (1) process by which you will index, and (2) a structure to store the index. Suppose you are going to index something like a book, you will first want to identify keywords in the book that you plan to index.

1. Need a list of keywords – meaning a data structure for the List

Now we will want to know how we will reference the keywords. Since books are sequentially sorted by pages, the page numbers make perfect sense, we simply associate page numbers with key words. Since key words can be on multiple pages, and pages can hold multiple keywords – we must realize that the association between keywords and page numbers can be many to many. We will need a data structure to store this associated page numbers and key words.

2. Need a structure (List?) – to store page numbers and association with keywords

Unless you plan to do the indexing by hand, you are going to need to go through your book and (1) find all the keywords, and (2) find associated page numbers. This will use the structures in both 1 and 2 above.

3. Create an algorithm that will create (and maintain) the index

Finally – the index really needs to be usable by people, meaning it will also need to likely be sorted in a manner where people can easily find the keyword and of course the associated pages.

4. Sorting or organization scheme for the keyword/page number structure

So as you can see indexing involves a number of basic data structures.

Indexing

Indexing is used extensively in computers, with many of the services built directly on the OS – like https://en.wikipedia.org/wiki/Indexing_Service Here are the basic computer science based indexes that you should know.

Lookup Table – https://en.wikipedia.org/wiki/Lookup_table

Associative Arrays and Dictionary – https://en.wikipedia.org/wiki/Associative_array

Database Index – https://en.wikipedia.org/wiki/Database_index