Indexes

An index is a data structure optimized for data retrieval – which means that it’s much faster to query an index than it is to search through a database directly. You can create the following types of indexes in DynamicWeb:

Each of these is used to index a particular type of content on a solution, except for the SQL index which is a flexible option when you need to create non-standard indexes. This won’t be necessary very often – most solutions will need a Product-index, a Content-index, and maybe a User-index. But it’s there for those edge-cases that always pop up.

An index contains the following elements:

Instances – the files being searched when a query is executed. You typically want at least 2, so there’s always one to query while the other one is being built
Build configurations – instructions detailing exactly how you want the index to be built. Various settings allow you to tweak how the builder behaves in more advanced scenarios
Fields – used to specify exactly which data to put in the index. In most cases you will use a default configuration based on a schema extender we deliver out of the box, but you will also sometimes add fields manually when you want a different behaviour than the one we have preconfigured
Field types – custom field types used when you want to analyze field content in a non-standard manner before adding it to the index

Instances and build configurations are covered in our index-specific articles.

Fields & SchemaExtenders

Lucene indexes are structured as collections of small documents divided into named fields. These fields can either hold content that's searchable or data that can be fetched later. All index fields can be stored, indexed, and/or analysed:

Stored fields have their values stored in the index
Indexed fields can be searched, the value is stored as a single value
Analysed fields have their values run through an analyzer and split into tokens (words)

To put this into context, generally speaking:

A field you want to display in frontend must be indexed
A field where you want to search for part of the value in free-text search must be analysed
A field which will be published using the Query publisher should be stored
A field where you want to display the field values as facets should be indexed but not analysed

For our standard index types – product, content, user, and file – we’ve created predefined field collections which are added via so-called SchemaExtenders:

The ProductIndexSchemaExtender adds product fields to an index
The ContentIndexSchemaExtender adds page and item fields to an index
The UserIndexSchemaExtender adds user fields and a set of calculated user-related fields to an index
The FileIndexSchemaExtender adds file-related fields to an index

To add fields to an index using a SchemaExtender:

Open the repository and switch to the Fields-tab
Click Manage and then New index field
Select Schema extender
Specify a name and a system name
Select the appropriate schema extender
Save

After saving, all fields from the schema extender are added to the index. SchemaExtenderFields

For all of these collections, we’ve had to decide how the fields are indexed - if they're stored, indexed and analysed. And while the predefined field collections should meet about 95% of your needs, there will be scenarios where you want to manually add fields to the index because you need e.g. a string field which isn't analyzed so you can use the values for facets, etc.

In those cases you can manually add fields - you have access to:

Simple fields
Summary fields
Grouping fields

Simple fields

A simple field is an index field which maps e.g. a single product attribute to an index field.

To add as simple field to an index:

Open the repository and switch to the Fields-tab
Click Manage and then New index field
Select Field
Provide a name and a system name
In the Field-section:
- Select a field type matching the data you're indexing
- Optionally set a boost value to rank search results matching this field higher than default
- Check Stored, Indexed, and Analyzed as appropriate
Select a source field
Save

You can select any source field made available by the index builder - typically standard fields - as well as custom sources made available by a builder extender.

Summary fields

Summary fields takes data from multiple source fields, maps it to a single index field, converts it to text, and splits the field into tokens on whitespace.

To create a summary field:

Open the repository and switch to the Fields-tab
Click Manage and then New index field
Select Summary field
Provide a name and a System name
In the Field section:
- Optionally set a boost value to rank search results matching this field higher than default
- Check Stored, Indexed and/or Analyzed as appropriate
Add a number of source fields
Click Save and close

Summary fields are always of the System.String-type and the content is always converted to a string.

Grouping fields

Grouping fields make it possible to group values together under a label. This can be used to create facet-friendly index entries from values not inherently suited to faceting, e.g. if a field has too many different terms.

For example, specific colors like "navy blue", "sky blue", and "electric blue" can all be grouped under the label "blue" for simplified searches and cleaner data presentation.

To create a grouping field:

Open the repository and switch to the Fields-tab
Click Manage and then New index field
Select Grouping field
Provide a name and a system name
In the Field section:
- Select the data type you want the source data to be indexes as
- Optionally set a boost value to rank search results matching this field higher than default
- Check Stored, Indexed and/or Analyzed as appropriate
Select a source field
Specify a number of groups with non-overlapping intervals befitting the data
Click Save and close

When you create a facet based on the grouping field, each facet will show the values in the selected interval.

Field types & analyzers

All (analysed) standard fields added to an index use the Lucene StandardAnalyzer to parse and tokenize the source data before adding it to the index. It behaves in the following manner:

Splits words at punctuation characters, removing punctuation. However, a dot that’s not followed by whitespace is considered part of a token
Splits words at hyphens, unless there’s a number in the token, in which case the whole token is interpreted as a product number and is not split
Recognizes email addresses and internet hostnames as one token

It also has a built-in list of stop words which will not be indexed (and cannot be searched for). The default stop words are: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with. You can override the default list by creating a file called stopwords.txt in the /Files/System/Repositories folder – the format must be 1 stop word per line.

While the StandardAnalyzer is perfectly adequate for most applications, you may occasionally need to do more advanced processing of the data before it is indexed.

To create custom field type using a different analyzer:

Open the repository and switch to the Fields-tab
In the Field-types section click Manage
Click New field type
Provide a name and select a data type
Optionally set a boost value to rank search results matching this field higher than default
Select an analyzer to use
Save