PROJECT SYNOPSIS

PROJECT TITLE
Context Based Search Engine

COLLEGE

TEAM MEMBERS

SPONSORED BY

GUIDES
Internal Guide:
External Guide:

PROBLEM DEFINITION
This is a research-oriented project. We aim to understand how search engines such as Google and MSN work, to study the concepts of data mining and data warehousing, and to survey the available search algorithms. We will also carry out a comparative analysis of the Google Search API and the Lucene framework.

TECHNOLOGY USED
Java

PLATFORM
Windows

SOFTWARE AND HARDWARE REQUIREMENTS
JDK 1.5
Microsoft Windows XP Professional SP2
512 MB RAM
80 GB HDD
Pentium 4 processor

PROJECT DESCRIPTION
Although several new operating systems attempt to provide users with content-based search capabilities, they are limited to text documents. A key challenge in implementing a content-based similarity search system for feature-rich data is that such data is noisy and complex. For example, consider two different photographs of an identical scene, or two separate recordings of a person speaking the same sentence. Despite the high degree of similarity between the two images or between the two audio recordings, their digital representations differ at the bit level.

Comparing noisy, feature-rich data therefore requires matching based on similarity rather than exact match, so searching such data requires similarity search rather than exact search. However, similarity search in high-dimensional spaces is notoriously difficult (the so-called curse of dimensionality). Hence, practical advanced search solutions, such as database tools and search engines (e.g. Google), have been limited to searching for exact matches and tend to work only for text documents and text annotations. To date, there is no practical content-based search engine for massive amounts of inherently noisy, feature-rich data.

Our application will be a code indexing and search application built on the Search API and the Lucene framework. This is a specialized domain that branches out from context-based searching.
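
To make the idea of code indexing and search concrete, the following is a minimal sketch in plain Java of an in-memory inverted index over source files. It is only an illustration under stated assumptions: the class name CodeIndexSketch, its methods, and the sample file contents are hypothetical, and it deliberately does not use the Lucene or Google Search APIs, which the project itself would rely on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: a tiny in-memory inverted index for source code.
// This is NOT the Lucene API; all names here are assumptions made for the sketch.
class CodeIndexSketch {
    // Maps each token to the sorted set of file names that contain it.
    private final Map<String, Set<String>> postings = new HashMap<String, Set<String>>();

    // Splits source text into lower-cased, identifier-like tokens.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<String>();
        for (String t : source.split("[^A-Za-z0-9_]+")) {
            if (t.length() > 0) {
                tokens.add(t.toLowerCase());
            }
        }
        return tokens;
    }

    // Indexes one file by recording it under every token it contains.
    void addFile(String fileName, String source) {
        for (String token : tokenize(source)) {
            Set<String> files = postings.get(token);
            if (files == null) {
                files = new TreeSet<String>();
                postings.put(token, files);
            }
            files.add(fileName);
        }
    }

    // Returns the files that contain every query term (AND semantics).
    List<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> files = postings.get(term);
            if (files == null) {
                return new ArrayList<String>(); // a term that matches nothing
            }
            if (result == null) {
                result = new TreeSet<String>(files);
            } else {
                result.retainAll(files);
            }
        }
        return result == null ? new ArrayList<String>() : new ArrayList<String>(result);
    }

    public static void main(String[] args) {
        CodeIndexSketch index = new CodeIndexSketch();
        index.addFile("Stack.java", "class Stack { void push(int value) {} int pop() { return 0; } }");
        index.addFile("Queue.java", "class Queue { void enqueue(int value) {} int dequeue() { return 0; } }");

        System.out.println(index.search("push"));      // [Stack.java]
        System.out.println(index.search("int value")); // [Queue.java, Stack.java]
        System.out.println(index.search("missing"));   // []
    }
}
```

This sketch performs exact token matching only, which is the limitation of existing text search noted in the description above. A framework such as Lucene would add analysis, ranking, and persistent index storage on top of the same basic structure.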