
PROPOSED SYSTEM

This section presents the problem formulation, the purpose of the proposed system, the methodology followed to solve the problem, and the algorithm used to achieve the solution.

A) Problem Definition

Document streams are sequences of documents that flow continuously into a system. Unlike static documents, document streams are dynamic, which makes processing them efficiently a challenge. Defining the similarity metric between documents and queries is another important aspect to consider.


Given a set of documents D and a set of queries Q, the problem to be addressed is the continuous monitoring of top-k queries over the document stream.

B) Purpose of the System

In the proposed system, a framework is designed and implemented to continuously monitor document streams and match them against the top-k queries of different users. The proposed system is thus intended to offer more utility to end users than the existing system. Top-k queries express the intent of users more clearly than preferences do, so the filtered documents reflect the intention behind each query.

An algorithm named Adaptive Identifier Ordering (AIO) is implemented to achieve this. AIO adapts to the runtime dynamics of the stream and uses top-k queries to report the most appropriate documents to users. We built a prototype application as a proof of concept.

C) Methodology

The methodology followed in the proposed system is as follows. The set of documents that arrive as a stream is denoted D = {d1, d2, ..., dn}. Each document contains a number of terms, denoted T = {t1, t2, ..., tn}, and each term is associated with a weight f. Each document has an ID, denoted dID. The words in the dictionary are used to build a set of lists: each term ti has its own list Li containing (dID, fi) pairs. The set of queries issued by users is denoted Q = {q1, q2, ..., qn}. Adaptive Identifier Ordering (AIO) is the algorithm proposed for continuously monitoring document streams with top-k queries.

D) Adaptive Identifier Ordering

This algorithm is meant for producing top-k results from document streams.

AIO takes a set of documents, a set of queries, and a collection of dictionary words as input, and produces top-k results for each query in the given set.

Algorithm: Adaptive Identifier Ordering (AIO)
Input: Set of documents D streamed at the server, set of queries Q, dictionary W
Output: Top-k results for each query

Initialize vector for list L
Preparing for ID Ordering
  For each word w in W
    For each document d in D
      For each query q in Q
        Compute TF-IDF fi for w
        Update list L with dID and fi for adapting
        Save L
      End For
    End For
  End For
Finding Top-K Results
  For each query q in Q
    Sort all lists based on ID
    Find the average weights
    Display top-k results for query q
  End For

The AIO algorithm takes the streamed documents, the set of user queries, and the set of dictionary words. For each dictionary word it builds the corresponding list, and it finally finds the top-k documents for each query based on relevance. Relevance is computed from term weights obtained with a TF-IDF style scheme such as Okapi BM25. The similarity between a query q and a document d is computed with the cosine similarity measure, as in Eq. 1:

C(q, d) = \frac{q \cdot d}{\|q\| \, \|d\|} = \sum_{1 \le i \le |T|} w_i f_i    (1)

Here f_i is the weight of term t_i in d and w_i is the weight of the same term in q; the second equality holds when both weight vectors are normalized to unit length. With this similarity measure, documents can be ranked against the given user queries. Moreover, the proposed approach is adaptive: it continuously adjusts to changes in the incoming documents and queries.
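The per-term lists and the scoring in Eq. 1 can be illustrated with a small Python sketch. This is not the AIO algorithm itself: it is a naive baseline that rescans the lists for every query, and it assumes the term weights are already given as plain numbers rather than computed with TF-IDF or BM25. The function names (`normalize`, `build_lists`, `top_k`) and the sample data are hypothetical and chosen only for illustration.

```python
import heapq
import math
from collections import defaultdict


def normalize(weights):
    """Scale a term -> weight mapping to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {term: w / norm for term, w in weights.items()}


def build_lists(documents):
    """Build one list per term holding (dID, f_i) pairs, sorted by dID."""
    lists = defaultdict(list)
    for d_id, weights in documents.items():
        for term, f in normalize(weights).items():
            lists[term].append((d_id, f))
    for postings in lists.values():
        postings.sort()
    return lists


def top_k(query, lists, k):
    """Score each document as sum_i w_i * f_i (Eq. 1) and return the k best."""
    scores = defaultdict(float)
    for term, w in normalize(query).items():
        for d_id, f in lists.get(term, []):
            scores[d_id] += w * f
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Illustrative data only: dID -> {term: raw weight}.
documents = {
    1: {"stream": 3.0, "query": 4.0},
    2: {"stream": 1.0},
    3: {"index": 2.0},
}
lists = build_lists(documents)
print(top_k({"stream": 1.0}, lists, k=2))
```

In this example document 2 ranks above document 1 for the query, because all of its normalized weight lies on the query term, while document 3 shares no term with the query and receives no score. A streaming implementation would update the lists and the per-query results incrementally as documents arrive instead of rebuilding them.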

