Indexing custom WPC forms

Former Member
0 Kudos

Hello All-

I have a WPC site which uses custom forms for much of its content. For some reason, none of the content within these forms gets indexed. Search results return only the page title and the message "No document excerpt available". If the same content is placed in the delivered WPC forms, and stored in the same site structure in KM, the contents are indexed correctly. The pages are also custom.

Can someone tell me what I need to change in order to make the content in my custom forms available to the indexer?

thanks

Tim

EP7, SPS17

Accepted Solutions (1)

Former Member
0 Kudos

Hi Tim

To help you properly, I would need some error entries from the TREX preprocessor logs.

One possible cause, though, is the use of custom layout templates. Open your custom layout templates and check the entry "Supported User Agents". This entry should be left blank so that TREX can access the web page content.

Best regards,

Martin Søgaard

Former Member
0 Kudos

Hi Martin-

I've been looking at this more today with your advice and I think you're on the right track, but I'm still having an issue.

I created 2 new WPC pages: StandardLayout (from the delivered content) and CustomLayout (based on our own jsp/template). Within both pages I added exactly the same content pieces, one custom form and one html file.

Then I reindexed the site. I followed the indexing process in the TREX Administration app, and the index completed successfully with no errors.

When searching this index, only results from StandardLayout are returned (even though it has exactly the same content as CustomLayout). This means the custom form is not causing the issue, but the custom page layout is. I compared all of the properties of our page template to the delivered templates and reset the 'Supported User Agents' to none (the -Select- option). What else could be preventing the indexing from taking place?

Many thanks,

Tim

Former Member
0 Kudos

Hi Tim

If the Supported User Agents attribute was not set to None for your custom layout template (the one that cannot be indexed), but was set to None for the standard layout template, then I think you should try deleting the entire index containing the WPC pages and recreating it from scratch.

The TREX administration tool in the portal does not report any errors because TREX is allowed to reach the document and index everything except the content. You therefore need to open the TREX Admin Tool on the TREX server and look at the trexpreprocessor and trexfilter logs.

Best regards,

Martin Søgaard

Former Member
0 Kudos

Hi Martin-

After deleting/recreating the index several times today I noticed a new anomaly. The index also did not contain any PDF info (and I included several PDFs within the site contents). What I realized is that I had created each index using the WPC crawler (according to this blog: https://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/8096). I had also added the custom property contentUrlProp: wpc:wpc_wcm_trex_url per the blog.

So instead, I created a new index using the standard crawler and no custom properties. Now I have exactly the opposite result. All of my custom content is indexed, but anything on the delivered forms is excluded. Interestingly, only the custom form's content is returned in the search result, not the containing page, which is what was returned when using the config from the blog. Perhaps this is what the contentUrlProp property does for the index?

Also, I'm wondering why my custom forms/pages are not included in the wpc crawler. The wpc crawler parameters for result filter are limited to the item ID pattern, "output.xml" so I assume this is what is restricting the crawling of my custom content. Why would my content not conform to this item ID? Is this something I need to add to the custom JSP?

Thanks

Tim

Former Member
0 Kudos

Hi Tim

Without access to your system, it is very hard to pinpoint your problem. I can give you some hints and a good tool for further analysis on your own.

-- After deleting/recreating the index several times today I noticed a new anomaly. The index also did not contain any PDF info (and I included several PDFs within the site contents). What I realized is that I had created each index using the WPC crawler (according to this blog: https://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/8096). I had also added the custom property contentUrlProp: wpc:wpc_wcm_trex_url per the blog. --

I would recommend creating a dedicated index for your WPC pages and separate indexes for your office documents etc., and then using the wpc crawler for your wpc content. The wpc crawler only lets output.xml files through to TREX because the output.xml file is the only thing that should be indexed when it comes to wpc pages. A wpc page is in reality a folder containing the elements you have added to the page along with the output.xml file. And since the output.xml file contains the rendered output, you don't want the subparts of the web page to be indexed as well.
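To illustrate the filtering idea described above, here is a minimal sketch in plain Java. It is not the WPC crawler itself (the real filter is set in the crawler parameters, not written by hand); the class name, method names and example paths are hypothetical and only mimic an "item ID equals output.xml" check.

```java
// Sketch only: mimics a result filter that lets only "output.xml" items through.
// Class, methods and example paths are hypothetical, not part of the KM/TREX API.
final class OutputXmlFilterSketch {

    // The item ID pattern mentioned in the thread.
    static final String ITEM_ID = "output.xml";

    // Returns the last path segment (the item ID) of a resource path.
    static String itemId(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // True only for resources whose item ID is exactly "output.xml".
    static boolean passesFilter(String path) {
        return ITEM_ID.equals(itemId(path));
    }

    public static void main(String[] args) {
        String[] paths = {
            "/wpccontent/Sites/demo/Web Pages/home/output.xml", // the published page output
            "/wpccontent/Sites/demo/Web Pages/home/form1.xml"   // a sub-element of the page
        };
        for (String path : paths) {
            System.out.println(path + " -> " + passesFilter(path));
        }
    }
}
```

With a filter like this, the sub-elements stored in the page folder are skipped, which is why forms never show up as separate hits in an index built with the wpc crawler.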

In order to understand the Web Page Composer architecture, I STRONGLY recommend deploying the Repository Framework Explorer (do a Google search for "RF explorer", for example) and using it to get a better picture of the architecture, especially which properties are set on the different elements and whether they have been set at all.

-- So instead, I created a new index using the standard crawler and no custom properties. Now I have exactly the opposite result. All of my custom content is indexed, but anything on the delivered forms is excluded. Interestingly, only the custom form's content is returned in the search result, not the containing page, which is what was returned when using the config from the blog. Perhaps this is what the contentUrlProp property does for the index --

Now you have created an index that will search all content in the folders, including the subcontent of the wpc page I mentioned before. But I don't think the output.xml file is indexed correctly without the custom property on the index, as I'm quite sure it contains the link TREX uses to access and process the output.xml file.

I can't explain why you see different behavior between the standard forms and the custom forms, other than that they are not the same.

-- Also, I'm wondering why my custom forms/pages are not included in the wpc crawler. The wpc crawler parameters for result filter are limited to the item ID pattern, "output.xml" so I assume this is what is restricting the crawling of my custom content. Why would my content not conform to this item ID? Is this something I need to add to the custom JSP? --

Your forms should not be indexed separately at all. Only your wpc pages should be indexed, that is, the output.xml file. Whenever you publish a wpc page, the output.xml file is created or updated. So don't worry about adding code to your custom jsp. I suggest you start from scratch and double-check the following:

- There is no entry in the "Supported User Agents" in any of your layouts

- You make totally sure that you have entered the wpc content url correctly when creating the index AND that you have pressed the Add-button before saving the index (otherwise the custom property is not added)

- You have published the wpc pages you want to be searchable

Best regards,

Martin Søgaard

Former Member
0 Kudos

Thanks Martin. Your advice has helped me solve my issue.

It turns out that there were 2 things happening that were preventing the files from being indexed. First, one of the custom layouts was not properly formed and included some erroneous code. This showed up in the TREX preprocessor log. Second, while checking all of the properties on my custom layouts, I discovered that the PCD permissions for the folder containing the custom layouts were restricted ('Everyone' did not have read access or End User permission). When I added the Everyone group to this folder's permissions, the indexing and search worked as expected. I'm not sure, however, what the correct permission setting should be for this or why it affected the indexing process.

Also, I took your advice and created 2 separate indexes and it appears to be working very well.

Thanks for your help.

Tim

Answers (1)

Former Member
0 Kudos

Tim/Martin,

With regard to TREX search for web pages created using WPC: my index and search are working fine. The issue I am facing is that when a link in the search results is clicked, it opens the output.xml of the WPC page that was returned in the search result. How do I avoid this and ensure that the actual WPC page corresponding to the output.xml is rendered on the screen?

We use a custom page layout template and custom web forms in WPC.

I have ensured that NO entry is set for the "Supported User Agents".

The pages are published.

The PCD permissions for the page layout template folder are set to READ for EVERYONE.

Let me know if I am missing anything. Thanks in advance.

Former Member
0 Kudos

Vijaya,

If I understand correctly, the links in your search result open the wrong page. I can think of 2 things that might cause this.

First, check the output.xml from your WPC page after you publish it. There should be a contentlink property that references the page's GUID. This should match the GUID in the content link property of the page itself (under details -> access links).

Second, if the output.xml looks OK, then there might be a simple issue with your search result layoutset properties. Check the collection renderer for your resultset and ensure the contentlink property is referenced.

Tim

Former Member
0 Kudos

Tim,

The links in the search result DO NOT open the wrong page. They point to the right page, and the contentlink property of the WPC page does match. There are NO issues on this front. The real issue is that when the search result is output, the links point to the KM navigation path of the output.xml (i.e. /irj/go/km/docs/wpccontent/Sites/<sitename>/Web Pages/<webpagename>/output.xml) and NOT to the actual WPC page URL.
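To make the shape of that KM path concrete, here is a small sketch in plain Java that splits such a path into site name and page name. It only does string handling and does not call the KM API; the class name, method name and the example site and page names are hypothetical.

```java
// Sketch only: splits a KM navigation path of the form
// /irj/go/km/docs/wpccontent/Sites/<sitename>/Web Pages/<webpagename>/output.xml
// into its site and page parts. Names here are hypothetical, not KM API.
final class KmPathSketch {

    static final String PREFIX = "/irj/go/km/docs/wpccontent/Sites/";
    static final String PAGES = "/Web Pages/";
    static final String SUFFIX = "/output.xml";

    // Returns {siteName, pageName}, or null when the path has a different shape.
    static String[] parse(String path) {
        if (path.length() < PREFIX.length() + SUFFIX.length()
                || !path.startsWith(PREFIX) || !path.endsWith(SUFFIX)) {
            return null;
        }
        // Everything between the fixed prefix and the trailing /output.xml.
        String middle = path.substring(PREFIX.length(), path.length() - SUFFIX.length());
        int split = middle.indexOf(PAGES);
        if (split < 0) {
            return null;
        }
        return new String[] {
            middle.substring(0, split),
            middle.substring(split + PAGES.length())
        };
    }

    public static void main(String[] args) {
        String[] parts = parse("/irj/go/km/docs/wpccontent/Sites/demo/Web Pages/home/output.xml");
        System.out.println(parts == null ? "no match" : parts[0] + " / " + parts[1]);
    }
}
```

This only identifies which page a hit belongs to. It does not produce the WPC page URL, which is what the contentlink property is for.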

How do I display the URL associated with the contentlink property for the WPC page returned in the search results? I tried using the code snippet below in my search component to retrieve the WPC URL associated with the resource, but unfortunately the lookup of the 'contentlink' property returns null.

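// Types are from the KM repository API (package com.sapportals.wcm.repository);
// "resource" is the IResource of the search hit.
IPropertyName propName = PropertyName.getPN(
    "http://sapportals.com/xmlns/cm/rendering", "contentlink");
// Look up the contentlink property on the resource
IProperty property = resource.getProperty(propName);
if (property != null) {
    // The property value is the portal navigation URL of the WPC page
    String portalNavUrl = property.getStringValue();
    .....
}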

Here, property evaluates to null.
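Since the lookup can come back as null, a result renderer has to decide what to link to in that case. A minimal sketch of one option, assuming plain strings stand in for the contentlink value and the KM path of the output.xml: fall back to the KM path when no content link is available. No KM API calls are made, and `chooseLink` and the example URLs are hypothetical.

```java
// Sketch only: picks the link to show for a search hit.
// Plain strings stand in for the contentlink property value and the KM path;
// the class and method names are hypothetical, not part of the KM API.
final class ContentLinkFallbackSketch {

    // Prefers the content link; falls back to the KM path when it is null or blank.
    static String chooseLink(String contentLink, String kmPath) {
        if (contentLink != null && !contentLink.trim().isEmpty()) {
            return contentLink.trim();
        }
        return kmPath;
    }

    public static void main(String[] args) {
        String kmPath = "/irj/go/km/docs/wpccontent/Sites/demo/Web Pages/home/output.xml";
        // Property missing: the hit still gets a link, although it is the output.xml one.
        System.out.println(chooseLink(null, kmPath));
        // Property present: the hypothetical page URL is used instead.
        System.out.println(chooseLink("/irj/portal/demo/home", kmPath));
    }
}
```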

Please advise how you are able to get the content link URL for the WPC page that is output in the search results.

Former Member
0 Kudos

Tim,

I'm not sure what the issue was. I am now able to access the contentlink property with the same code snippet. I was under the impression that it would not work because I had the same problem when developing a custom component using a JSPDynPage. Luckily, it worked for this custom search component. I still don't understand why it works here and not in the other custom component...