Geração de impressão digital para recuperação de documentos similares na web

Nenhuma Miniatura disponível
Data
2004
Título da Revista
ISSN da Revista
Título de Volume
Editor
Resumo
This paper presents a mechanism for the generation of the “finger-print” of a Web document. This mechanism is part of a system for detecting and retrieving documents from the Web with a similarity relation to a suspicious do-cument. The process is composed of three stages: a) generation of a fingerprint of the suspicious document, b) gathering candidate documents from the Web and c) comparison of each candidate document and the suspicious document. In the first stage, the fingerprint of the suspicious document is used as its identifica-tion. The fingerprint is composed of representative sentences of the document. In the second stage, the sentences composing the fingerprint are used as queries submitted to a search engine. The documents identified by the URLs returned from the search engine are collected to form a set of similarity candidate do-cuments. In the third stage, the candidate documents are “in-place” compared to the suspicious document. The focus of this work is on the generation of the fingerprint of the suspicious document. Experiments were performed using a collection of plagiarized documents constructed specially for this work. For the best fingerprint evaluated, on average87.06%of the source documents used in the composition of the plagiarized document were retrieved from the Web.
Descrição
Palavras-chave
Citação
PEREIRA JUNIOR, A. R.; ZIVIANI, N. Geração de impressão digital para recuperação de documentos similares na web. In. II Workshop de Tecnologia da Informação e Linguística, II. 2004. Salvador. Anais. Salvador: Workshop de Tecnologia da Informação e Linguística, 2004. Disponível em: <http://homepages.dcc.ufmg.br/~nivio/papers/til04.pdf>. Acesso em: 18/10/2012.