PROPOSED SYSTEM
This section presents the problem formulation, the purpose of the proposed system, the methodology followed to solve the problem, and the algorithm used to obtain the solution.
A) Problem Definition
Document streams are sequences of documents that flow continuously into a system. Unlike static document collections, document streams are dynamic, which makes processing them efficiently a challenge. Defining a similarity metric between documents and queries is another important aspect to be considered. Given a set of documents D and a set of queries Q, the problem addressed here is to process continuous top-k queries and to monitor their results as new documents arrive.
B) Purpose of the System
In the proposed system, a framework is designed and implemented to continuously monitor document streams and approximate the top-k query results of different users. The proposed system is thus intended to offer end users more utility than the existing system. Top-k queries, rather than preferences, can express the intent of users more clearly, so the filtered documents reflect the intention behind each query. An algorithm named Adaptive Identifier Ordering (AIO) is implemented to achieve this. AIO adapts to the runtime dynamics of the stream and uses top-k queries to report the most appropriate documents to users. We build a prototype application to demonstrate a proof of concept.
C) Methodology
The methodology followed in the proposed system is as follows. The set of documents that arrive as a stream is denoted as D={d1, d2, …, dn}. Each document contains a number of terms. The terms are denoted as T={t1, t2, …, tm}. Each term ti in a document is associated with a weight denoted as fi. Adaptive Identifier Ordering (AIO) is the algorithm proposed for continuous monitoring of document streams with top-k queries. Each document has an ID denoted as dID. The set of words in a dictionary is used to build a set of lists: each term ti has its own list Li containing (dID, fi) entries for the documents in which it occurs. The set of queries given by users is denoted as Q={q1, q2, …, qp}.
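The per-term lists described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `term_weights` and `add_document`, the whitespace tokenizer, and the use of plain relative term frequency as the weight fi are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def term_weights(text):
    """Return {term: weight}; the weight is a plain relative term
    frequency, which is an assumption made for this sketch."""
    terms = text.lower().split()
    counts = Counter(terms)
    total = len(terms)
    return {term: count / total for term, count in counts.items()}


def add_document(lists, d_id, text):
    """Append (dID, fi) to the list Li of every term in the document."""
    for term, f in term_weights(text).items():
        lists[term].append((d_id, f))


# term -> list Li of (dID, fi) entries, filled as documents arrive
lists = defaultdict(list)
stream = [
    (1, "stream of news documents"),
    (2, "news about stream processing"),
]
for d_id, text in stream:
    add_document(lists, d_id, text)
```

Because documents are appended in arrival order, each list stays ordered by dID as long as IDs are assigned in increasing order, which this sketch assumes.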
D) Adaptive Identifier Ordering
This algorithm produces top-k results from document streams. It takes a set of documents, a set of queries, and a dictionary of words as input, and produces the top-k results for each query in the given set of queries.
Algorithm: Adaptive Identifier Ordering (AIO)
Input: Set of documents D streamed at server, set of queries Q, Dictionary W
Output: Top k results for each query

Initialize vector for list L
Preparing for ID Ordering
For each word w in W
    For each document d in D
        For each query q in Q
            Compute TF-IDF fi for w
            Update list L with dID and fi for adapting
            Save L
        End For
    End For
End For
Finding Top-K Results
For each query q in Q
    Sort all lists based on ID
    Find the average weights
    Display top k results for query q
End For
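One possible reading of the two phases above is sketched below in Python. It is a simplified illustration rather than the authors' implementation: the names `build_lists` and `top_k`, the smoothed IDF formula, and the interpretation of "average weights" as the mean weight of the query terms per document are assumptions, and the sketch works on a fixed batch of documents instead of a live stream.

```python
import heapq
import math
from collections import Counter, defaultdict


def build_lists(docs):
    """Phase 1: one list per term holding (dID, TF-IDF weight) pairs."""
    n = len(docs)
    tf = {d_id: Counter(text.lower().split()) for d_id, text in docs.items()}
    # Document frequency: number of documents containing each term.
    df = Counter(term for counts in tf.values() for term in counts)
    lists = defaultdict(list)
    for d_id in sorted(tf):  # keeps every list ordered by dID
        length = sum(tf[d_id].values())
        for term, count in tf[d_id].items():
            idf = math.log(1 + n / df[term])  # smoothed IDF (assumption)
            lists[term].append((d_id, (count / length) * idf))
    return lists


def top_k(lists, query, k):
    """Phase 2: average the query-term weights per document, keep the k best."""
    terms = query.lower().split()
    totals = defaultdict(float)
    for term in terms:
        for d_id, f in lists.get(term, []):
            totals[d_id] += f
    scores = {d_id: total / len(terms) for d_id, total in totals.items()}
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


docs = {
    1: "cricket match report",
    2: "stock market report",
    3: "cricket world cup cricket",
}
lists = build_lists(docs)
result = top_k(lists, "cricket report", k=2)
```

In this example, document 1 contains both query terms and ranks first, followed by document 3, which mentions "cricket" twice. `heapq.nlargest` is used so that only k candidates need to be kept per query.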
The AIO algorithm takes the streamed documents, the set of user queries, and the set of dictionary words as input. For each dictionary word, it builds a list of (dID, fi) entries, and it finally finds the top-k documents for each query based on relevance. Relevance is computed using the TF-IDF approach of Okapi BM25. The similarity between a query q and a document d is computed according to the cosine similarity measure, as in Eq. 1, where the summation form assumes that q and d are normalized to unit length.
C(q, d) = \frac{q \cdot d}{\|q\| \, \|d\|} = \sum_{1 \le i \le |T|} w_i f_i \qquad (1)
With this similarity measure, documents can be ranked by their similarity to each user query. Moreover, the proposed approach is adaptive: it continuously adjusts to changes in the incoming documents and queries.
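The following Python sketch illustrates the similarity measure on a small example. It assumes that queries and documents are represented as sparse {term: weight} dictionaries; the names `normalize`, `dot`, and `cosine` and the sample weights are hypothetical. The sketch checks that the cosine of the raw vectors equals the plain sum of products of the weights once both vectors are scaled to unit length.

```python
import math


def normalize(weights):
    """Scale a {term: weight} vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in weights.values()))
    if norm == 0:
        return dict(weights)
    return {term: v / norm for term, v in weights.items()}


def dot(q, d):
    """Sum of products of the weights over the terms of q."""
    return sum(w * d.get(term, 0.0) for term, w in q.items())


def cosine(q, d):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    if nq == 0 or nd == 0:
        return 0.0
    return dot(q, d) / (nq * nd)


# Hypothetical raw weights for one query and one document.
q = {"stream": 1.0, "query": 1.0}
d = {"stream": 3.0, "document": 4.0}

direct = cosine(q, d)                        # q . d / (|q| |d|)
summed = dot(normalize(q), normalize(d))     # sum over normalized weights
```

Normalizing each document once on arrival, and each query once at registration, means that only the summation has to be evaluated when scores are updated.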
