Recently, I got a mail from my colleague describing a problem in SharePoint search.
Issue : We have a portal on MOSS 2007. The documents range from standard Office document types to PDF and so on. I have a very unusual situation during a search. Find attached two PPTs. One is a pptx and the other a ppt (97-2003); content-wise both are the same. When searching for “Case study I” it lists both PPTs, but when searching for autoparts (this word is content within a text box in the ppt, slide 2), it lists only the pptx. I can't figure out the reason for this behavior. Any suggestions? In fact, no keyword taken from the content of slide 2 brings up the 97-2003 format of the ppt.

Content Crawling

The SharePoint crawling process accesses and parses content and its properties, sometimes called metadata, to build a content index from which search queries can be served. When content is crawled successfully, the crawler has accessed and read the individual files or pieces of content that you want to make available to search queries. The keywords and metadata for those files are stored in the content index, sometimes simply called the index. The index consists of the keywords, which are stored in the file system of the index server, and the metadata, which is stored in the search database. The system maintains a mapping between the keywords, the metadata associated with the individual pieces of content, and the URL of the source from which the content was crawled.

Let's come to the issue ...

Crawling the content succeeds in the case of the PPTX because Office 2007 uses a new file format, Office Open XML, which has a lot of benefits. One of them is more effective search compared with the earlier MS Office formats. Text box content is stored as part of the XML, which was not the case in the previous formats.
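To see why the text box is easy to reach in a pptx, it helps to look inside the file. What follows is only a rough sketch in Python, not what the SharePoint crawler or its filters actually run. It relies on two facts about the format: a pptx is a zip package, and the text of each slide sits in `<a:t>` elements of the parts named `ppt/slides/slideN.xml`. The helper names `slide_texts` and `build_demo_package` are made up for illustration, and the demo package holds a single slide part rather than a complete presentation.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text runs in a slide are <a:t> elements in this namespace.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"

# Hypothetical, stripped-down slide part with one text box (not a full slide).
DEMO_SLIDE = (
    '<p:sld xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main" '
    'xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">'
    "<p:cSld><p:spTree><p:sp><p:txBody>"
    "<a:p><a:r><a:t>autoparts</a:t></a:r></a:p>"
    "</p:txBody></p:sp></p:spTree></p:cSld></p:sld>"
)


def slide_texts(pptx):
    """Return {slide part name: [text runs]} for a .pptx path or file-like object."""
    result = {}
    with zipfile.ZipFile(pptx) as package:
        for name in sorted(package.namelist()):
            # Only the slide parts, e.g. ppt/slides/slide2.xml
            if re.fullmatch(r"ppt/slides/slide\d+\.xml", name):
                root = ET.fromstring(package.read(name))
                result[name] = [t.text for t in root.iter(A_NS + "t") if t.text]
    return result


def build_demo_package():
    """Build an in-memory zip holding only the demo slide part (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("ppt/slides/slide2.xml", DEMO_SLIDE)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(slide_texts(build_demo_package()))
```

Pointing `slide_texts` at a real .pptx path should list the text of every slide, text boxes included. The 97-2003 .ppt is a binary file, so the same approach does not work on it.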