
Exclude files from indexing process

Former Member
0 Kudos

Hi There,

Does anyone know if it's possible to configure an index so that certain files are excluded from the indexing process? I've created a repository with a lot of files, but I would like TREX to return only documents of the type *.doc.

Regards,

HCO

Accepted Solutions (1)

Former Member
0 Kudos

Hi,

The only way to exclude specific file types from indexing is to write an extension for TREX. In your extension you can check whether a document's MIME type is allowed for indexing. For undesirable documents, the extension should raise an exception so that these documents are not indexed.
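
A minimal sketch of the kind of check described above, in Python. It does not use the real TREX extension interface (the TREX documentation is the place to look for that). `ALLOWED_MIME_TYPES`, `DocumentRejected` and `check_document` are hypothetical names, and the sketch assumes the extension is handed the document's MIME type as a string.

```python
# Hypothetical sketch: none of these names come from the TREX extension API.

# MIME types that should be indexed; everything else is rejected.
ALLOWED_MIME_TYPES = {"application/msword"}


class DocumentRejected(Exception):
    """Raised for documents that must not be indexed."""


def check_document(mime_type):
    """Return normally for allowed MIME types, raise otherwise."""
    # Normalise case and drop parameters such as "; charset=utf-8".
    base_type = mime_type.split(";", 1)[0].strip().lower()
    if base_type not in ALLOWED_MIME_TYPES:
        raise DocumentRejected("MIME type not allowed: " + base_type)
```

Using an allow list rather than a block list means that new or unknown MIME types are rejected by default, which matches the goal of returning only *.doc documents.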

Best regards,

Jochen

Former Member
0 Kudos

Hi Jochen,

That is good to hear - at least there is some way around a problem that ought to be simple to solve.

You say I'll have to write an extension for TREX. Can you elaborate on that, give me some hints on where/how to do it?

Cheers,

hco

Answers (3)

Former Member
0 Kudos

Hi,

I've checked the crawler options in EP60 and NW04, but I cannot find a solution that way. If you want to write an extension for TREX, please see the documentation in <%TREX_Root%>/doc/apidoc/trexext. It gives an overview of the existing extensions and a how-to description.

Regards,

Jochen

Former Member
0 Kudos

Hi Jochen,

That sounds great. I'll dig into that documentation and see if I can make it work.

Regards,

hco

0 Kudos

Hi HCO,

have you also looked at the options to exclude files already during crawling? At least in an EP-KM scenario you should...

Regards,

Karsten

Former Member
0 Kudos

Hello,

You have to edit the TREXValidMimeTypes.ini file in the TREX root directory. It lists all the MIME types that TREX includes in the indexing process.

Regards,

Gerhard

Former Member
0 Kudos

Hi Gerhard,

That sounds nice, but won't that exclude the MIME types from all the indexes? I only want to exclude certain MIME types from a single index.

regards,

hco

Former Member
0 Kudos

Hi again Gerhard,

I just tried what you suggested. I created a document in KM with an "unknown" MIME type, i.e. one that is not listed in the ValidMimetypes.ini file. The problem is that TREX still includes the file in the indexing process, although you can't search the content of the document.

Take this example:

Document 1

title: "This is TREX.ppt"

Mimetype = application/vnd.ms-powerpoint

Document 2

title: "This is TREX.ppt"

Mimetype = dont_index_me

The documents are completely identical apart from their MIME types.

If I search for "TREX", I get 2 hits: Doc 1 and Doc 2. It is only possible to see the content of Doc 1.

If I search for words that occur in the content of both Doc 1 and Doc 2, I only get Doc 1 as a result.

The problem is that TREX returns Doc 2 even though it has an unknown MIME type. You can't read it, but the title is indexed.
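
The behaviour described above can be reproduced with a toy model in Python. This is not how TREX is implemented. It only assumes that the title is always indexed and that the content is indexed only for known MIME types. All names are hypothetical, and the document content ("quarterly figures") is made up for the example.

```python
# Toy model of the observed behaviour, not TREX code.
KNOWN_MIME_TYPES = {"application/vnd.ms-powerpoint"}


def build_index(documents):
    """Map each lower-cased word to the set of document ids containing it."""
    index = {}
    for doc in documents:
        # Assumption: the title is always indexed.
        text = doc["title"]
        # Assumption: content is indexed only for known MIME types.
        if doc["mime_type"] in KNOWN_MIME_TYPES:
            text += " " + doc["content"]
        for word in text.lower().replace(".", " ").split():
            index.setdefault(word, set()).add(doc["id"])
    return index


documents = [
    {"id": 1, "title": "This is TREX.ppt",
     "mime_type": "application/vnd.ms-powerpoint",
     "content": "quarterly figures"},
    {"id": 2, "title": "This is TREX.ppt",
     "mime_type": "dont_index_me",
     "content": "quarterly figures"},
]

index = build_index(documents)
print(sorted(index["trex"]))       # title word: both documents match
print(sorted(index["quarterly"]))  # content word: only document 1 matches
```

In this model, keeping Doc 2 out of the results entirely would require skipping the whole document, title included, rather than only its content.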

So, how do you exclude documents completely, i.e. so that TREX simply skips the document in the indexing process?

Regards,

Hco