Friday, September 14, 2012

Attivio SharePoint 2010 search integration issues and concerns you need to know

If you implement the Attivio SharePoint 2010 search connector, users can search SharePoint content and metadata through the centralized Attivio search interface, alongside other content such as wikis, email, file shares, Documentum, and eRoom. After reviewing the Attivio architecture and the SharePoint connector guide, we have some concerns and questions that should be resolved for SharePoint search integration. Some of them are critical and need to be addressed before going to production.

1. System pages and galleries crawling - The Attivio SharePoint 2010 connector guide indicates that the SharePoint Object Model connector does not support crawling system galleries. However, this limitation is not mentioned for the SharePoint Web Services connector, which is the one we are using. Because SharePoint system pages and galleries contain many pages, excluding them from the crawl is a best practice that reduces the performance impact. The following galleries and system pages should be excluded from the crawl. There may be more that need to be excluded; you may check the screenshot of the SharePoint site.
  • Web Part Gallery
  • Site Template Gallery
  • List Template Gallery
  • Master Page Gallery
  • Theme Gallery
  • Form Template system files
  • IWConvertedForms
  • Workflow Forms
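One way to apply these exclusions, assuming the connector or a pre-processing step can accept a URL predicate (a hypothetical hook, not a documented Attivio feature), is a simple path filter. The path fragments below are our assumption of where these galleries and system lists live in a default SharePoint 2010 site and should be checked against the screenshot and our own sites:

```python
from urllib.parse import urlparse

# Hypothetical exclusion list: URL path fragments assumed to correspond to the
# galleries and system lists above in a default SharePoint 2010 site.
# Verify each entry against your own sites before relying on it.
EXCLUDED_PATH_FRAGMENTS = (
    "/_catalogs/wp/",          # Web Part Gallery
    "/_catalogs/wt/",          # Site Template Gallery
    "/_catalogs/lt/",          # List Template Gallery
    "/_catalogs/masterpage/",  # Master Page Gallery
    "/_catalogs/theme/",       # Theme Gallery
    "/formservertemplates/",   # Form Templates
    "/iwconvertedforms/",      # IWConvertedForms
    "/workflows/",             # Workflow forms (assumed location)
)


def should_crawl(url: str) -> bool:
    """Return False when the URL falls inside an excluded gallery or system list."""
    # Lower-case the path and add a trailing slash so that both the gallery
    # root ("/_catalogs/wp") and items inside it ("/_catalogs/wp/x.dwp") match.
    path = urlparse(url).path.lower() + "/"
    return not any(fragment in path for fragment in EXCLUDED_PATH_FRAGMENTS)


print(should_crawl("http://sp.example.com/sites/hr/_catalogs/masterpage/v4.master"))  # False
print(should_crawl("http://sp.example.com/sites/hr/Shared%20Documents/policy.docx"))  # True
```

Matching on whole path segments (with surrounding slashes) keeps a document library that merely contains "theme" in its name from being excluded by accident.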
 
2. SharePoint 2010 permissions crawling - The Attivio SharePoint 2010 connector guide indicates that target audiences and audience filtering are not supported; there is no way to return the target audience of an item. As a result, search results will not respect target audience restrictions, and users outside a target audience might be able to search for and view the content. This needs to be verified and addressed.
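Because the connector does not return audiences, they would have to be captured some other way, for example through a custom metadata field populated at index time (an assumption on our part, not something the guide describes). If that were possible, query-time trimming could look like the following sketch; `SearchResult`, its `audiences` field, and `filter_by_audience` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    url: str
    # Hypothetical field: audience names attached to the item.
    # An empty set means the item is not targeted and is shown to everyone.
    audiences: frozenset = frozenset()


def filter_by_audience(results, user_audiences):
    """Keep untargeted results and results targeted at one of the user's audiences."""
    user_audiences = frozenset(user_audiences)
    return [r for r in results if not r.audiences or r.audiences & user_audiences]
```

For a user in the "HR" audience, an untargeted item and an HR-targeted item would be kept, while a "Managers"-only item would be dropped.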

3. SharePoint 2010 content type crawling - The Attivio SharePoint 2010 connector guide indicates that content types are not supported, and we were concerned that content with customized content types might not be indexed. Attivio consultants have confirmed that this is not the case: all content, regardless of content type, will be indexed and searchable. This needs to be tested.

4. SharePoint 2010 crawling configuration - The Attivio SharePoint 2010 connector guide indicates that the NoCrawl property for lists and sites is not available. As a result, we cannot exclude any list or site from Attivio search. We have some secret site collections in the system that are not exposed to any users except a few restricted users. Owners of these sites might not want any of their content exposed through another search UI, even when permissions have been properly applied. We may need to identify a workaround to address this.
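A possible workaround, assuming we control the list of start addresses or can drop documents before they are indexed (neither is confirmed by the guide), is to keep our own blocklist of restricted site collections. The site collection URLs below are hypothetical placeholders:

```python
from urllib.parse import urlparse

# Hypothetical blocklist of restricted site collections whose owners do not
# want their content exposed through Attivio.
RESTRICTED_SITE_COLLECTIONS = (
    "http://sp.example.com/sites/board",
    "http://sp.example.com/sites/legal",
)


def _normalize(url: str) -> str:
    """Lower-case host and path and drop any trailing slash for comparison."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/').lower()}"


def is_restricted(url: str) -> bool:
    """True if the URL is a restricted site collection or lies beneath one."""
    target = _normalize(url)
    for site in map(_normalize, RESTRICTED_SITE_COLLECTIONS):
        if target == site or target.startswith(site + "/"):
            return True
    return False
```

Comparing on a path boundary (site plus "/") rather than a bare string prefix keeps a site such as /sites/boardroom from being blocked by the /sites/board entry.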

5. SharePoint 2010 MySite crawling - The Attivio SharePoint 2010 connector guide indicates that you should pass http://host:port/personal/username rather than http://host:port/MySite, because SharePoint treats MySites as separate repositories. We are not sure whether we need to pass each individual personal MySite URL, and there are more than 10,000 of them in our company. This needs to be addressed if MySite content needs to be searchable through Attivio.
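If each personal site really must be passed individually, the list could at least be generated rather than maintained by hand. This sketch assumes we can export account names in DOMAIN\user form (for example from the User Profile Service; the export step is not shown) and that personal site names follow the http://host:port/personal/username pattern from the guide. Whether the domain is part of the name depends on the farm's MySite naming setting, so `include_domain` is an assumption to verify:

```python
from typing import Iterable, Iterator


def personal_site_urls(host: str, port: int, accounts: Iterable[str],
                       include_domain: bool = False) -> Iterator[str]:
    """Yield one http://host:port/personal/<name> URL per account.

    Accounts are assumed to look like DOMAIN\\user or plain user names.
    """
    for account in accounts:
        # rpartition leaves domain empty when there is no backslash.
        domain, _, user = account.rpartition("\\")
        name = f"{domain}_{user}" if include_domain and domain else user
        yield f"http://{host}:{port}/personal/{name}"
```

The list would need to be regenerated whenever new MySites are created, which is one more reason to ask Attivio whether a single root URL can be crawled instead.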

6. SharePoint 2010 Meeting Workspaces crawling - The Attivio SharePoint 2010 connector guide indicates that crawling Meeting Workspaces causes the server to queue child pages, such as Workspace Pages, that do not exist, which in turn causes an exception error message during a crawl. This needs to be tested and verified.

7. SharePoint 2010 audit and logs – SharePoint stores audit logs and other logs inside the content database. At this point, we are not sure whether Attivio will index any of these. We hope they will not be indexed, to avoid performance issues. This needs to be confirmed.

8. SharePoint 2010 entitlement policy – We are implementing the Nextlabs SharePoint entitlement solution to deny certain groups of users access to selected site content, even when those users are granted permissions through SharePoint. SharePoint search will be integrated with the Nextlabs SharePoint entitlement policies and will block those users from searching or viewing the selected content. However, the Attivio SharePoint 2010 connector will not be aware of the Nextlabs entitlement policies and may expose the selected content to those users. We might need to customize Attivio search to check the Nextlabs SharePoint entitlement policies through the Nextlabs policy web services before displaying search results to end users.
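The customization described above amounts to post-filtering each result page against the policy service. We have not confirmed the Nextlabs web service interface, so this sketch hides it behind a hypothetical `check_access(user, url)` callable; the function and type names are ours:

```python
from typing import Callable, Iterable, List

# Hypothetical signature for a wrapper around the Nextlabs policy web service:
# returns True if the user may see the document at the given URL.
PolicyCheck = Callable[[str, str], bool]


def trim_results(user: str, urls: Iterable[str], check_access: PolicyCheck) -> List[str]:
    """Return only the result URLs the entitlement policy allows for this user."""
    allowed = []
    for url in urls:
        try:
            permitted = check_access(user, url)
        except Exception:
            # Fail closed: if the policy service cannot answer, hide the result.
            permitted = False
        if permitted:
            allowed.append(url)
    return allowed
```

Failing closed is the safer default for this use case, but one policy call per result may be slow, and trimming after retrieval changes result counts and paging. Whether the Nextlabs service supports batched checks would need to be confirmed with the vendor.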

9. SharePoint 2010 retention policy – We are applying retention policies to some site content. For example, if we apply a seven-year retention policy to a site, its content will be deleted automatically after seven years. SharePoint backup tapes may have a one-year retention policy and be recycled after one year. The same one-year policy should be applied to Attivio index backup tapes. In other words, anything deleted from SharePoint should not survive any longer on the Attivio side, including on backup tapes.
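The example works out to two dates per item: when SharePoint deletes it, and when the last backup copy (SharePoint or Attivio) may still exist, since a tape written just before deletion is only recycled a year later. This sketch assumes retention is counted from the item's creation date; SharePoint policies can also be based on other dates, so that is an assumption to check:

```python
from datetime import date
from typing import Tuple


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping February 29 to February 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def purge_deadlines(created: date, retention_years: int = 7,
                    tape_years: int = 1) -> Tuple[date, date]:
    """Return (date deleted from SharePoint, last date any tape copy may exist).

    The second date applies equally to SharePoint and Attivio index tapes.
    """
    deleted_on = add_years(created, retention_years)
    return deleted_on, add_years(deleted_on, tape_years)


print(purge_deadlines(date(2012, 9, 14)))
# (datetime.date(2019, 9, 14), datetime.date(2020, 9, 14))
```

The Attivio index itself should drop the item when SharePoint deletes it; the second date only bounds how long tape copies may linger.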

10. Attivio SharePoint 2010 connector web service – This web service exposes several interfaces that not only read but also update and delete SharePoint content. Although this is not a real issue now, we were surprised that a web service used for crawling contains update and change interfaces. We need to be careful to grant the Attivio SharePoint crawling account READ-only permissions; otherwise the account could use the following update interfaces.
  • CancelCheckOut
  • CheckOut
  • Checkin
  • CopyItem
  • CreateDocument
  • CreateFolder
  • DeleteItems
  • DeleteVersion
  • MoveItems
  • Promote
  • SetAttachments
  • SetPermissions
  • UpdateItem
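To keep track of which write operations the service exposes (for example after a connector upgrade), its WSDL could be checked against the list above. If it is an ASMX service, the WSDL is usually available by appending ?WSDL to the .asmx URL; the actual service URL is not given here. This sketch parses a WSDL 1.1 document with the standard library and reports any of the listed operations it declares:

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# The update interfaces listed above, compared case-insensitively.
WRITE_OPERATIONS = {name.lower() for name in (
    "CancelCheckOut", "CheckOut", "Checkin", "CopyItem", "CreateDocument",
    "CreateFolder", "DeleteItems", "DeleteVersion", "MoveItems", "Promote",
    "SetAttachments", "SetPermissions", "UpdateItem",
)}


def write_operations_in_wsdl(wsdl_xml: str) -> list:
    """Return the sorted names of write operations declared in any portType."""
    root = ET.fromstring(wsdl_xml)
    names = {
        op.get("name")
        for port_type in root.iter(f"{{{WSDL_NS}}}portType")
        for op in port_type.findall(f"{{{WSDL_NS}}}operation")
    }
    return sorted(n for n in names if n and n.lower() in WRITE_OPERATIONS)
```

This only reports what the service offers; actually preventing changes still depends on the crawling account having READ-only permissions in SharePoint.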
If you find anything else we should be concerned about with Attivio SharePoint 2010 search, please share it with us.
