on 10-13-2004 10:37 PM
Hi There,
Does anyone know if it's possible to configure an index so that certain files are excluded from the indexing process? I've created a repository with a lot of files, but I would like TREX to return only documents of the type *.doc.
Regards,
HCO
Hi,
The only way to exclude certain file types from indexing is to write an extension for TREX. In your extension you can check which MIME types are allowed for indexing. For unwanted documents, the extension should raise an exception so that these documents are not indexed.
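As a rough illustration of that idea only, here is a minimal Python sketch of the check-and-raise logic. It does not use the real TREX extension API: the names `ALLOWED_MIME_TYPES`, `DocumentRejected`, `check_mime_type` and `filter_for_indexing` are hypothetical, and `application/msword` is assumed to be the MIME type reported for *.doc files. The actual hook and signatures would have to be taken from the TREX extension documentation.

```python
# Hypothetical sketch: none of these names come from the TREX extension API.
# Assumption: *.doc files are reported with the MIME type application/msword.
ALLOWED_MIME_TYPES = {"application/msword"}


class DocumentRejected(Exception):
    """Raised to signal that a document must not be indexed."""


def check_mime_type(mime_type):
    """Return the normalized MIME type if allowed, otherwise raise."""
    # Drop parameters such as "; charset=utf-8" and ignore case.
    normalized = mime_type.split(";")[0].strip().lower()
    if normalized not in ALLOWED_MIME_TYPES:
        raise DocumentRejected(
            f"MIME type not allowed for indexing: {mime_type!r}"
        )
    return normalized


def filter_for_indexing(documents):
    """Yield only the (name, mime_type) pairs that pass the check."""
    for name, mime_type in documents:
        try:
            check_mime_type(mime_type)
        except DocumentRejected:
            continue  # skip the document entirely
        yield name, mime_type
```

Using an allow list rather than a block list means that any MIME type not named explicitly, including unknown ones, is rejected by default.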
Best regards,
Jochen
Hi,
I've checked the crawler options in EP60 and NW04 and could not find a solution that way. If you want to write an extension for TREX, please see the documentation in <%TREX_Root%>/doc/apidoc/trexext. It gives an overview of the existing extensions and includes a how-to description.
Regards,
Jochen
Hi HCO,
Have you also looked at the options for excluding files during crawling? At least in an EP-KM scenario you should...
Regards,
Karsten
Hello,
You have to edit TREXValidMimeTypes.ini in the TREX root directory. It lists all MIME types that TREX includes in the indexing process.
Regards,
Gerhard
Hi again Gerhard,
I just tried what you suggested. I created a document in KM with an "unknown" MIME type, i.e. one that is not listed in the TREXValidMimeTypes.ini file. The problem is that TREX still includes the file in the indexing process, but you can't search the content of the document.
Take this example:
Document 1
title: "This is TREX.ppt"
Mimetype = application/vnd.ms-powerpoint
Document 2
title: "This is TREX.ppt"
Mimetype = dont_index_me
Apart from their MIME types, the two documents are completely identical.
If I search for "TREX", I get 2 hits: Doc 1 and Doc 2. It's only possible to see the content of Doc 1.
If I search for words that appear in the content of both Doc 1 and Doc 2, I only get Doc 1 as a result.
The problem is that TREX returns Doc 2 even though it has an unknown MIME type. You can't read it, but the title is indexed.
So, how do you exclude documents completely, i.e. make TREX skip the document entirely during indexing?
Regards,
Hco