U2U Blog

for developers and other creative minds

Indexing and Searching Documents in Multiple Languages (Part I)

The good thing about dedicated workshops like the Search Workshops we ran last week in Brussels and Copenhagen is that you come away with a lot of answered questions that should somehow end up in blog postings, articles or whatever. Finding the time to do this is of course always a problem. But I'll do my best, certainly if it adds to the material covered in our latest book:

Inside the Index and Search Engines: Microsoft® Office SharePoint® Server 2007
by Patrick Tisseghem, Lars Fastrup

Read more about this book...

 

One of the interesting questions was about indexing and searching documents written in a specific language. I'll cover part of it in this first posting and continue later this week.

How does the crawler detect the language of the content of the document?

First of all, language detection depends on the IFilter that is used to index the content of the document. Chapter 9 of the book has a full explanation of the internals of IFilters and a guide to building your own.

The built-in IFilter that is part of the MOSS indexing architecture is capable of looking at an Office document and collecting plenty of information. This information gathering is actually the task of one of the internal plug-ins, named the Metadata Extraction plug-in. It relies on an internal language detection algorithm (developed by Microsoft Research) to find out the language of the content. Once it has determined the language (represented by a number), it stores this information in a hidden managed property called DetectedLanguage.

How do I search for a document in a specific language?

Let's have a look first at the out-of-the-box experience. I have for example here a document library storing different documents, each authored in a different language. I configured a content source that indexes all of this data.

image

The advanced search page allows us to filter on the language very easily using the language picker. By default there are a couple of options but if you open the tool pane and configure the XML that is set as the value for the Properties property of the AdvancedSearchBox Web Part, you are able to offer more choices.

In the XML you find a list of LangDef elements each one representing one language and the number for it. Note that it is not very clear how Microsoft got to these numbers (they do not match for example the LCID numbers).

<LangDefs>
    <LangDef DisplayName="Arabic" LangID="1"/>
    <LangDef DisplayName="Bengali" LangID="69"/>
    <LangDef DisplayName="Bulgarian" LangID="2"/>
    <LangDef DisplayName="Catalan" LangID="3"/>
    <LangDef DisplayName="Chinese" LangID="4"/>
    <LangDef DisplayName="Croatian/Serbian" LangID="26"/>
    <LangDef DisplayName="Czech" LangID="5"/>
    <LangDef DisplayName="Danish" LangID="6"/>
    <LangDef DisplayName="Dutch" LangID="19"/>
    <LangDef DisplayName="Finnish" LangID="11"/>
    <LangDef DisplayName="French" LangID="12"/>
    <LangDef DisplayName="German" LangID="7"/>
    <LangDef DisplayName="Greek" LangID="8"/>
    <!-- remaining LangDef elements omitted -->
</LangDefs>

The language picker will show all the languages that are defined within the Languages element:

<Languages>
    <Language LangRef="12"/>
    <Language LangRef="7"/>
    <Language LangRef="17"/>
    <Language LangRef="10"/>
    <Language LangRef="19"/>
    <Language LangRef="25"/>
    <Language LangRef="22"/>
</Languages>

A query using the language picker will result in the inclusion of the match on the DetectedLanguage managed property as shown here:

image

image

The Advanced Search Page is not the only place where you can use this managed property. You can also type it directly into the search box where you formulate your keyword syntax query. You just have to find out the number of the language (see the above XML).

image
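If you build such queries from code, a small lookup saves you from remembering the numbers. The following is only a sketch: the class and method names are invented for illustration, the LangID values are copied from the LangDefs XML above, and it assumes the usual property:value form of the keyword syntax for the filter on DetectedLanguage.

```csharp
using System;
using System.Collections.Generic;

public static class LanguageQuerySketch
{
    // LangID values taken from the LangDefs XML shown above (not LCIDs).
    private static readonly Dictionary<string, int> LangIds = new Dictionary<string, int>
    {
        { "Danish", 6 }, { "German", 7 }, { "Finnish", 11 }, { "French", 12 }, { "Dutch", 19 }
    };

    // Appends a filter on the DetectedLanguage managed property to the keywords.
    // Assumption: the filter is written as property:value in the keyword syntax.
    public static string Build(string keywords, string language)
    {
        int id;
        if (!LangIds.TryGetValue(language, out id))
        {
            throw new ArgumentException("No LangID known for " + language, "language");
        }
        return keywords + " DetectedLanguage:" + id;
    }
}
```

For example, LanguageQuerySketch.Build("sharepoint", "French") produces a query restricted to documents detected as French.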

In a next posting I'll show you how you can customize the search experience using the language information.

Norwegian Developer Conference Presentations

Last week was the first edition of the Norwegian Developer Conference, a great initiative by Microsoft and ProgramUtvikling. As some of you probably know, Norway is one of my favorite countries to travel to and I always enjoy the people (and the beer) there. However, slavery still exists I think in Norway because I was the only speaker in the SharePoint track resulting in the delivery of 6 sessions in a row. Needed a lot of drinks after that!

Anyway, I have zipped up all of the presentations and they are available for download here. Have fun with them.

Determine if MOSS is installed by checking the Registry key

Today I found a post that explains a way to determine if MOSS is installed by checking for the existence of certain folders. Another way is to check for the existence of its registry key.

image

The RegistryKey class is located in the Microsoft.Win32 namespace of the mscorlib.dll.

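// Requires: using System; using Microsoft.Win32;
bool isMossInstalled = false;
string keyname = @"SOFTWARE\Microsoft\Office Server\12.0";
// OpenSubKey returns null when the key does not exist.
using (RegistryKey key = Registry.LocalMachine.OpenSubKey(keyname))
{
    if (key != null)
    {
        // BuildVersion holds the installed build number as a string.
        string version = key.GetValue("BuildVersion") as string;
        if (version != null)
        {
            Version buildVersion = new Version(version);
            // MOSS 2007 builds have major version 12.
            if (buildVersion.Major == 12)
            {
                isMossInstalled = true;
            }
        }
    }
}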

In a similar way you can determine if WSS is installed. In the registry you can find the following information:

image

Translated into code:

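// Requires: using Microsoft.Win32;
bool isWssInstalled = false;
// Named differently from the MOSS key so both checks can live in the same method.
string wssKeyName = @"SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0";
using (RegistryKey key = Registry.LocalMachine.OpenSubKey(wssKeyName))
{
    if (key != null)
    {
        // The SharePoint value reads "Installed" when WSS is present.
        object wssvalue = key.GetValue("SharePoint");
        if (wssvalue != null && wssvalue.Equals("Installed"))
        {
            isWssInstalled = true;
        }
    }
}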

U2U Caml Query Builder Feature: new version released...

This version contains the following enhancements:

  • Code snippet generation
  • DateTime enhancements
  • Query options for SPQuery
  • Enhancements for Calendar lists

You can download the new version from here.

You can read the documentation here.

 Remark: The U2U Caml Query Builder previously was part of the U2U SharePoint Solution Package. If you installed this solution, you will have to retract and delete this solution before you can install the new solution. The other features like U2U List Properties and U2U Site Properties will be available soon in separate solutions.

Code Snippet Generation

With this feature you can build your CAML queries in a WYSIWYG way. When you click the Preview button, a Result section opens showing a text box with the query and a datagrid containing the rows resulting from the execution of the query. Until now the text box only contained the CAML, but there has always been some confusion about which tags to add where. So I thought it would be nice to generate code snippets that show you the proper use of the generated CAML. You can copy the code snippet, which is a complete function, and paste it right away in your code! You only have to pass the SPList object on which you want to execute the query.

You can now choose if you want to generate code in C# or VB.NET or get the CAML back as before. 

image

If you choose to have a code snippet in C# or VB.NET, you can choose if you want to get a DataTable returned or an SPListItemCollection. I'll show you the difference with the query I often use on the Employees table I have in my SharePoint site. The pure CAML looks like this:

image

If you ask to have a code snippet in C# returning an object of type DataTable, the code snippet looks like this:

image

If you ask to have a code snippet in VB.NET returning an object of type SPListItemCollection, the code snippet looks like this:

image

The preview page which is accessible from the item Edit Control block within the CAML Query dedicated list contains the same control.

One small note: if the Preview section doesn't open up when you click the Preview button, check if you filled out a name for the query at the top of the page.

This enhancement will also be added to the Windows version soon.

 

Date and Time enhancements

Until now I thought (and I'm sure I'm not the only one) that the time part of a date was always ignored by a CAML query when retrieving rows from a list using SPQuery or the GetListItems method of the Lists.asmx web service. I just recently read in a blog post that you CAN query time values, so I decided to include it in the feature version of the U2U CAML Query Builder. An update of the Windows version will follow soon.

Let's say you have a table with date/time values and today is the 5th of June.

image

Open the CAML Builder via the Actions button on the list. Expand the Filter on section and select the DateTime field. Choose an operator, for example Greater than. Then you have 3 options:

  • Fixed date
  • Today
  • Now

image

If you choose Today, all dates greater than today's date will be retrieved:

image 

If you choose Now, the query will take the time part into account, and today's rows with a time value in the future will also be retrieved:

image

Take a look at the CAML query: an extra attribute is added to the Value element of the DateTime field. The IncludeTimeValue attribute instructs SharePoint whether to take the time value into account or not:

<Where>
  <Gt>
    <FieldRef Name='TestDate' />
    <Value IncludeTimeValue='TRUE' Type='DateTime'>2008-06-05T11:11:42Z</Value>
  </Gt>
</Where>
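If you want to produce such a filter from your own code, the Where clause can be generated rather than typed by hand. The following is a minimal sketch and not the code the U2U CAML Query Builder emits: the class and method names are made up for illustration, and the date is written out exactly as passed in, with a literal Z suffix and no time zone conversion.

```csharp
using System;
using System.Globalization;

public static class CamlDateFilterSketch
{
    // Builds a Where clause that selects rows whose date field is greater than 'value'.
    // 'fieldName' is assumed to be an internal field name without quotes or markup.
    // When 'includeTime' is false, the IncludeTimeValue attribute is left out,
    // so only the date part is compared.
    public static string GreaterThan(string fieldName, DateTime value, bool includeTime)
    {
        // Quoted 'T' and 'Z' are emitted literally; no UTC conversion is done here.
        string iso = value.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
        string timeAttribute = includeTime ? " IncludeTimeValue='TRUE'" : "";

        return "<Where><Gt>" +
               "<FieldRef Name='" + fieldName + "' />" +
               "<Value" + timeAttribute + " Type='DateTime'>" + iso + "</Value>" +
               "</Gt></Where>";
    }
}
```

The returned string is what you would assign to the Query property of an SPQuery object, which takes the Where clause without a surrounding Query element.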

You can also work with a fixed date. In this case you can choose whether to include a time part: if you leave the 12 AM unchanged, the IncludeTimeValue attribute will not be added to the CAML and the query will not take the time value into account.

Fixed date

If you change this, the feature will add the IncludeTimeValue attribute and the query will take the time value into account when retrieving the rows.

image

 

Query options for SPQuery

Euh? you will say... yes, query options for SPQuery. Most of the query options you can pass to the GetListItems method of the Lists.asmx web service can also be set through properties of the SPQuery object. But a lot of developers don't know all the different possibilities and properties of this SharePoint object, so I decided to shed some light on it and add a section called Query Options. When you have made your choices and click the Preview button, the code snippet in C# and VB.NET will reflect these choices. The pure CAML will not, because the QueryOptions node is not part of the CAML that can be used with SPQuery.

This is the complete Query Options section:

image

You can limit the rows returned in the result set by setting the RowLimit property.

But let's start with an easy one: IncludeMandatoryColumns. Check this option in the Query Options section.

image

In the View Fields section I only indicate FirstName, LastName and Phone. As EmployeeID and EmailAddress are defined as required in the SharePoint List Settings, they are also returned in the result set:

image

There are also some options to query folders. If you don't check the Folder option, only the root folder is queried. If you also want to query the sub folders, you have to set the Look in all folders and sub folders option:

image

When clicking the Preview button this causes the query.ViewAttributes property to be set to "Scope = 'Recursive'".

You can also query a sub folder. In that case you have to check the Specify a folder option. The text box will become available and will contain the url of the root folder as initial value:

image

When clicking the Preview button this causes the query.Folder property to be set to the specified folder.

To conclude, an example of the Query option ExpandUserField:

image

If your list is of type Agenda, you can set the Meeting Instance ID to return only rows of a certain meeting instance:

image
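Put together, the options above map onto SPQuery properties roughly as in the following sketch. It is not the snippet the tool generates: the class and method names are hypothetical, the field names in ViewFields are assumed to be the internal names of the columns in my Employees list, and the code only compiles against Microsoft.SharePoint.dll and runs on a SharePoint server.

```csharp
using Microsoft.SharePoint;

public static class QueryOptionsSketch
{
    // Hypothetical helper: queries 'list' with the options discussed above.
    // Pass the URL of a sub folder in 'folderUrl', or null to look in all folders.
    public static SPListItemCollection Run(SPList list, string folderUrl)
    {
        SPQuery query = new SPQuery();

        // Assumed internal field names; adjust them to your own list.
        query.ViewFields = "<FieldRef Name='FirstName' />" +
                           "<FieldRef Name='LastName' />" +
                           "<FieldRef Name='Phone' />";

        query.RowLimit = 100;                  // limit the rows in the result set
        query.IncludeMandatoryColumns = true;  // also return the required columns
        query.ExpandUserField = true;          // expand user fields into their parts

        if (folderUrl == null)
        {
            // Look in all folders and sub folders.
            query.ViewAttributes = "Scope=\"Recursive\"";
        }
        else
        {
            // Query one specific folder only.
            query.Folder = list.ParentWeb.GetFolder(folderUrl);
        }

        // For an Agenda list you could additionally set query.MeetingInstanceId.
        return list.GetItems(query);
    }
}
```

Setting the properties on the SPQuery object, rather than adding a QueryOptions node to the CAML, is the design choice here because the QueryOptions node only applies to the Lists.asmx web service.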

When you click the Save button, the query will be saved to a dedicated list for the CAML Query Builder. The query options will be saved as an XML node. If you want to see the code snippets again, navigate to the Caml Query List, which is the list that stores the saved queries. This list does not show up on the Quick Launch but can be accessed via the View All Site Content button.

Caml Query List

... and choose Preview CAML Query from the Edit Control Block.

Preview

A page opens where you can view the code snippets. The query is executed and the resulting rows are displayed in a datagrid.

 

Enhancements for Calendar lists

CAML queries for calendar lists are a bit more complicated, especially when working with recurring events. You can execute a normal CAML query on such a list and that's ok for normal events. But you will never get back the entries that are created based on a recurring event.

If your list is a Calendar, you will have two more options in the Query Options section:

image

If you check the Expand recurrences option, the instances created for a recurring event will also be returned in the result set.

image

You can choose a fixed day from a date picker or you can choose Today. In that case the query will take the date on which the query is executed into account. You can also choose to see the recurring instances for a day, a week, a month or a year. The tool sets the CalendarDate property of the query and adds a DateRangesOverlap part to the Where clause.
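The DateRangesOverlap part can be sketched as follows. This is an illustration under assumptions rather than the tool's actual output: the class and method names are hypothetical, the field names are the standard internal names of a calendar list (EventDate, EndDate, RecurrenceID), and only the week, month and year periods are covered.

```csharp
using System;

public static class RecurrenceCamlSketch
{
    // Builds a Where clause that returns the events overlapping the given period.
    // 'period' must be "Week", "Month" or "Year"; it becomes the empty element
    // inside the Value node, for example <Month />.
    public static string Where(string period)
    {
        if (period != "Week" && period != "Month" && period != "Year")
        {
            throw new ArgumentException("Unsupported period: " + period, "period");
        }

        return "<Where><DateRangesOverlap>" +
               "<FieldRef Name='EventDate' />" +
               "<FieldRef Name='EndDate' />" +
               "<FieldRef Name='RecurrenceID' />" +
               "<Value Type='DateTime'><" + period + " /></Value>" +
               "</DateRangesOverlap></Where>";
    }
}
```

To use it, assign the result to query.Query, set query.ExpandRecurrence to true and set query.CalendarDate to the date the period should be calculated from, for example new DateTime(2008, 6, 1) for the June overview.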

The generated VB.NET code snippet then looks as follows:

image

In my example I have an event that occurs twice a week. These are the rows returned by the query when asking for a monthly overview as of the 1st June.

image

Note that each instance of a recurring event has the same ID as the recurring event that produced it. To learn more about the internals of a calendar list, read this excellent blog post.

 

Hope you like these extensions!

SharePoint @ The Island of Jersey

This week I am teaching in one of the Channel Islands called Jersey for a company called Contract5. The moment I flagged my MSN account with the info, I got questions like: Where is Jersey? SharePoint there? Well, it is a British Crown dependency just off the coast of Normandy (France). So rather small, only about 90,000 inhabitants, but SharePoint is big here since the island houses a lot of financial institutions attracted by a very favorable tax rate. And it is now on my list of places I'd like to visit once with the girls.

Image:Uk map jersey.png

Jersey has a rich history (read more on Wikipedia) and lots of nice places to visit. I am staying at the brand new Radisson near the small marina bay in St. Helier (the capital), just opposite Elizabeth Castle.

Where to deploy my Silverlight XAP file?

Tomorrow I'll leave for my annual trip to TechEd US, again in Orlando. Looking forward to spending time with my buddies Ted and Fitz, and the rest of the gang in the US. Fitz recently joined Nintex and we'll have to catch up on that. From what I read on his blog, Nintex is not all about workflows. This week they'll RTM an interesting reporting layer for SharePoint. Read more here.

Anyway, I am presenting my session first thing Thursday morning. Title is 'Light Up Your SharePoint Web Site with Microsoft Silverlight and AJAX' and I'll highlight techniques for working with Web 2.0 technologies in the SharePoint space. Focus will be on hosting the Silverlight applications, your options to transfer data back and forth, consuming Web Services and WCF services, databinding and more.

I'll also show different techniques for the deployment of the Silverlight XAP file. A XAP file is basically a ZIP file containing the Silverlight solution components such as the compiled XAML and code-behind, an application manifest and possibly one or more assemblies delivering Silverlight controls. To me there are three possible places where you can make the XAP file available, and the final choice depends more or less on your scope.

  • Drop the XAP in the ClientBin of your IIS Web Application. This is the most popular approach and also the technique we have used for the BluePrint samples. If you deploy in this location, the Silverlight application can be picked up by any SharePoint code that runs in the site collections and sites hosted on the IIS Web Application. One of the disadvantages of this approach is that you cannot directly make the copying of the XAP to the ClientBin part of the manifest of the SharePoint solution that delivers the containers for the application (e.g. a Web Part). That is why, for the BluePrint, we modified the SharePoint Solution Installer so that this extra step was taken care of.
  • Drop the XAP in the 12 Folder. If the Silverlight application needs a wider scope, for example because it is used by a custom field type (which is by default a globally deployed SharePoint solution), it is good to deploy it in a sub folder of 12\Template\Layouts or in the 12\Template\ControlTemplates folder. The 12\ISAPI folder is also a candidate. Deploying here means that you can include all of the deployment steps in your SharePoint Solution.
  • Drop the XAP in a Document Library. This is very often now my preferred choice of place to drop the XAP. You can create one central document library within your site collection (or if you want a more narrow scope, for your site) where to drop the XAP files. Just like with the previous 12 folder location, you can include this deployment also nicely in your Feature that for example makes available the Web Part hosting the Silverlight application.

Say for example that you create a Web Part with the Visual Studio Extensions for WSS 3.0. You can add the XAP file as part of the Feature folder in your Solution Explorer.

image

A Module and a File element in the element manifest file (lines 9-11) can take care of the provisioning of the XAP during the activation of the Feature in an existing document library (e.g. named XAPS).

   1: <Elements Id="c1f27c3d-0fab-46dc-b04d-a070b2713bbd" xmlns="http://schemas.microsoft.com/sharepoint/" >
   2:     
   3:   <Module Name="WebParts" List="113" Url="_catalogs/wp">
   4:       <File Path="HelloDevDays.webpart" Url="HelloDevDays.webpart" Type="GhostableInLibrary" >
   5:           <Property Name="Group" Value="DevDays Web Parts"></Property>
   6:       </File>
   7:   </Module>
   8:  
   9:     <Module Name="XAP" Url="XAPS">
  10:         <File Path="HelloDevDays.xap" Url="HelloDevDays.xap" Type="GhostableInLibrary" />
  11:     </Module>
  12:     
  13: </Elements>

In the Web Part where you create the Silverlight control, you then can point (line 6) to the XAP file using the following code:

   1: protected override void CreateChildControls()
   2: {
   3:     base.CreateChildControls();
   4:  
   5:     Silverlight ctrl = new Silverlight();
   6:     ctrl.Source = SPContext.Current.Site.RootWeb.Url + "/XAPS/HelloDevDays.xap";
   7:     ctrl.ID = "HelloDevDays";
   8:     ctrl.Width = new Unit(400);
   9:     ctrl.Height = new Unit(300);
  10:     ctrl.Version = "2.0";
  11:  
  12:     this.Controls.Add(ctrl);
  13:  
  14: }

As said, a nice and clean way to include all of the Silverlight files in your SharePoint Solution so that no more additional steps need to be taken after the deployment of the Web Part.

It is also very easy to upgrade the XAP files this way. But don't forget to clear your browser cache when you do this. Download the Internet Explorer Developer Toolbar to help you with that.

Making Business Data Searchable: Business Data Catalog or Custom Federated Search Connectors?

Assume that you are a company that has plenty of data locked up in SQL Server databases, Oracle databases, or line-of-business (LOB) systems such as SAP, Siebel or Microsoft CRM, and there is the need to make all of that data searchable. What is a good choice? BDC or the newest technology in the search space called custom federated search connectors?

I was asking myself this question during a session by Michal Gideoni I attended yesterday. Michal did a great job explaining in one hour your options (from a dev perspective) with federated search and its extensibility. Here are some interesting bullet points I wrote down:

  • Federated search is at this moment only available if you install MSS (Microsoft Search Server) 2008 or MSSX (Microsoft Search Server Express) 2008. It seems there will be a 'rollup' hotfix end of June/early July that will make federated search also available in the MOSS 2007 search centers. You might want to subscribe to the search blog on http://blogs.msdn.com/enterprisesearch to follow-up on this one.
  • She demonstrated the use of patterns in the federated location definition using a nifty little tool called Expresso. Download a trial here. I like the mashup demo btw.
  • Microsoft released protocol handlers for Documentum and FileNet. Read more here.
  • Plenty of search-related tools, wrappers and docs are available in the Search Community Toolkit available on CodePlex.
  • The Web Parts that support the federated search are not sealed, so you can inherit from them and make them do what you want them to do. This is good news. I actually wasn't aware of this because the other search Web Parts are all sealed (I'd still like to get an explanation of the why for that one).

So in all, a good session. Picked up nice new ideas for new demos. But back to the title of the posting: to BDC or to Custom FSC?

I foresee a great future for custom federated search connectors and it is so easy to build them. Basically a custom federated search connector is a layer (typically an ASPX page) sitting in front of the business data store accepting a query in the form of a template (e.g. http://litware:6500/searchindatabase.aspx?q=contoso). The query will be processed and translated internally in the ASPX and an RSS feed will be returned as output. This output is then picked up by the federated search Web Part and displayed to the user.
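To give an idea of how little is involved, here is a sketch of the part of such a connector that turns search hits into an RSS feed. Everything in it is an assumption made for illustration: the class and method names, the way hits are passed in as title/URL pairs and the channel description. The lookup in the business data store and the ASPX page that writes the feed to the response are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class RssConnectorSketch
{
    // Builds an RSS 2.0 document for the given query.
    // Each hit is a pair of title (Key) and link (Value) that your own
    // data access code would have retrieved from the business data store.
    public static string BuildFeed(string query, IEnumerable<KeyValuePair<string, string>> hits)
    {
        XElement channel = new XElement("channel",
            new XElement("title", "Results for " + query),
            new XElement("link", "http://litware:6500/searchindatabase.aspx?q=" + Uri.EscapeDataString(query)),
            new XElement("description", "Business data search results"));

        foreach (KeyValuePair<string, string> hit in hits)
        {
            channel.Add(new XElement("item",
                new XElement("title", hit.Key),
                new XElement("link", hit.Value)));
        }

        XDocument feed = new XDocument(
            new XElement("rss", new XAttribute("version", "2.0"), channel));
        return feed.ToString();
    }
}
```

In the ASPX page you would read the q parameter from the query string, run your lookup, and write the string returned by BuildFeed to the response with an XML content type.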

If the requirements are quite simple: no worries about ranking the results, well-defined queries (such as look for this in our customer database), no heavy requirements regarding security or caching, this technique of building custom connectors could be quite interesting instead of the heavy infrastructure and configurations you have to do for making that same data searchable via the Business Data Catalog. Of course, once the BDC is configured, you can do so much more with that data, but if it is only for searching... have a look at federated search.

Learn more on www.microsoft.com/enterprisesearch. I plan to come up with some additional material on this when I have the time.

Creating a Table of Contents in Reporting Services 2008

In Reporting Services 2005 we have document maps, which create an interactive table of contents (TOC): every entry is a hyperlink. Nice for interactive use, but nearly useless when you render your report for printing. In print, we want page numbers, which are very, very hard to get in SSRS 2005.

In Reporting Services 2008, it is still very hard to get a TOC with page numbers... unless you render to Word!

So, go ahead, deploy the AdventureWorks sample reports, open up the Product Catalog report and verify it has a document map in it. Next, export this to a Word file (the export uses the Word 2000 format, which can easily be opened in Office 2002, 2003 and 2007 as well). Open up the Word file. There is no TOC in there, but we can create one explicitly. Only, do not be too hasty! If you create a regular TOC, Word will use the Heading styles to build it, and the Word export does not contain any Heading-styled elements. So we have to create the TOC differently. In Word 2007, go to the References ribbon, click the Table of Contents button, and select Insert Table of Contents. Next, click the Options... button, and in there, deselect Styles and select Table entry fields. Finally, click OK twice and enjoy your TOC!

image