
File indexes

A file index is used to index data about files on the system – NOT the content of the files. As such, you can use a file index to create, for example, searchable media libraries of images, PDF datasheets, and so on.

It currently supports the following file formats:

  • PDF
  • GIF
  • JPG
  • JPEG
  • PSD
  • BMP
  • PNG
  • TIFF
  • TIF
  • AI

To create a file index:

  1. Go to Settings > Repositories and open/create a repository.
  2. Under the indexes section, click manage.
  3. Click New index.
  4. Provide a Name for the index.
  5. Select a Balancer
    • Dynamicweb.indexing.balancing.ActivePassive selects the next instance on the list of instances – so if instance A is unavailable (building, has failed), instance B will be used unless it’s unavailable, in which case instance C will be used, and so on.
    • Dynamicweb.indexing.balancing.LastUpdated directs operations to the most recently updated index, ensuring users interact with the freshest data.
  6. Click Save and close.

On solutions with heavy traffic and frequent product data updates we recommend using the LastUpdated mode to ensure that visitors are always shown the most recently updated product data. On solutions with only two instances (the vast majority of solutions) it is not necessary to select a balancer mode, as the “other index” will always be used when an index is unavailable.
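The difference between the two balancers can be sketched in a few lines of Python. This is only an illustration of the selection rules described above, not Dynamicweb's implementation: the `Instance` class and both functions are hypothetical, and the sketch assumes that LastUpdated only considers instances which are currently available.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass
class Instance:
    """Hypothetical stand-in for an index instance; not a Dynamicweb type."""
    name: str
    available: bool          # False while building or after a failed build
    last_updated: datetime   # time of the last successful build


def active_passive(instances: Sequence[Instance]) -> Optional[Instance]:
    """Return the first available instance in list order (A, then B, then C)."""
    for instance in instances:
        if instance.available:
            return instance
    return None


def last_updated(instances: Sequence[Instance]) -> Optional[Instance]:
    """Return the available instance that was built most recently (assumption)."""
    candidates = [i for i in instances if i.available]
    return max(candidates, key=lambda i: i.last_updated, default=None)


instances = [
    Instance("Images A", available=False, last_updated=datetime(2024, 1, 2)),
    Instance("Images B", available=True, last_updated=datetime(2024, 1, 1)),
    Instance("Images C", available=True, last_updated=datetime(2024, 1, 3)),
]
print(active_passive(instances).name)  # Images B
print(last_updated(instances).name)    # Images C
```

With instance A unavailable, the ActivePassive rule falls through to B because it is next on the list, while the LastUpdated rule picks C because it holds the freshest data.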

This creates an empty index. You should now add instances to it.

FileIndexes_01

Adding Instances

An instance refers to a specific file stored in the file archive. When a query is executed, it's this file that gets searched. It's common for instances to be rebuilt regularly to incorporate the latest changes to the indexed files. For this reason, it's recommended to maintain at least two instances. Having multiple instances ensures that while one is being updated or rebuilt, the other remains available for searches.

  1. In the Indexes section in your repository, enter the index you want to add an instance to
  2. Click the Actions button in the top right corner and select Manage instances
  3. Click New instance
  4. Provide a name – you could call the first instance ‘Images A’ and the other instance ‘Images B’
  5. Select the LuceneIndexProvider
  6. Specify a folder to place the instance file under.
  7. Click Save and close
  8. Repeat the process for the second instance

Once created, the instances will look like this:

FileIndexes_02

When an instance is built, a set of index files is generated under System > Indexes > YourIndexName > YourInstanceName. Before you can build it, however, you must create a build configuration.

Adding a Build Configuration

Now that you have two instances, you want to build them. To do so, you need to create a build configuration. Each type of index has a specific builder associated with it; for a file index, this builder is called the FileIndexBuilder.

To add the build configuration:

  1. Enter the Index in which you want to create a Build
  2. Under the Builds section, select Manage
  3. Click New build
  4. Provide a name
  5. In the Builder section, select Dynamicweb.Content.Files.FileIndexBuilder - this opens a selection of builder settings
  6. Choose the Builder action. Currently, only the Full build option, which rebuilds the entire index, is available.
  7. Review the builder settings (FileIndexes_03)
  8. Set up Notifications if appropriate
  9. Click Save and close

The following builder settings are available – please review carefully to see if any of them are relevant for your setup:

| Setting | Value | Comments |
| --- | --- | --- |
| Recursive | Boolean – defaults to checked | This setting controls whether subfolder content is indexed – by default it is. |
| StartFolder | Defaults to Files | Controls which folder to index |
| SkipMetadata | Boolean – defaults to unchecked | Check to skip metadata like EXIF, XMP, IPTC on image files |
| SkipDynamicwebMetadata | Boolean – defaults to unchecked | Check to skip Dynamicweb metadata |
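To get a feel for how these settings interact, here is a minimal Python sketch of a folder walk that honours a start folder, a recursive flag, and a skip-metadata flag. It is not the FileIndexBuilder itself: the function `plan_build` and its parameters are hypothetical, the list of extensions is taken from this article, and SkipDynamicwebMetadata is left out because the article does not describe what that metadata contains.

```python
import os
import tempfile
from typing import Iterator, Tuple

# File types for which the article says EXIF, XMP, and IPTC metadata is indexed.
METADATA_EXTENSIONS = {".pdf", ".gif", ".jpg", ".jpeg", ".psd",
                       ".bmp", ".png", ".tiff", ".tif", ".ai"}


def plan_build(start_folder: str, recursive: bool = True,
               skip_metadata: bool = False) -> Iterator[Tuple[str, bool]]:
    """Yield (relative path, read_metadata) for each file a build would visit."""
    for root, dirs, files in os.walk(start_folder):
        if not recursive:
            dirs.clear()   # emptying dirs in place stops os.walk from descending
        dirs.sort()        # visit subfolders in a predictable order
        for name in sorted(files):
            ext = os.path.splitext(name)[1].lower()
            read_metadata = not skip_metadata and ext in METADATA_EXTENSIONS
            rel = os.path.relpath(os.path.join(root, name), start_folder)
            yield rel.replace(os.sep, "/"), read_metadata


# Small demo on a throwaway folder: one text file and one image in a subfolder.
with tempfile.TemporaryDirectory() as start:
    os.makedirs(os.path.join(start, "Images"))
    for rel in ("notes.txt", os.path.join("Images", "logo.PNG")):
        open(os.path.join(start, rel), "w").close()

    everything = list(plan_build(start))
    top_only = list(plan_build(start, recursive=False))
    no_metadata = list(plan_build(start, skip_metadata=True))

print(everything)   # both files; only the image is flagged for metadata
print(top_only)     # only notes.txt
print(no_metadata)  # both files; nothing is flagged for metadata
```

In this sketch, unchecking Recursive limits the build to files directly in the start folder, and checking SkipMetadata still visits every file but reads no image metadata.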

Now you’ve specified how you want the index to be built – next, you should specify what you want to include in the index.

Adding Fields

Lucene indexes are composed of small documents, with each document divided into named fields which contain either content which can be searched or data which can be retrieved. Each field added to the index can therefore be stored, indexed, and analysed depending on what you want to use it for:

  • Stored fields have their values stored in the index
  • Indexed fields can be searched, the value is stored as a single value
  • Analysed fields have their values run through an analyser and split into tokens (words)

Generally speaking:

  • A field you want to display in frontend must be indexed
  • A field where you want to search for part of the value in free-text search must be analysed
  • A field which is to be published using the Query publisher should be stored
  • A field you want to display as facets should be indexed, but not analysed
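The three flags are easier to reason about with a toy model. The Python sketch below is a deliberately tiny index, not Lucene and not Dynamicweb code: `Field`, `TinyIndex`, and the flag choices for each field are assumptions made for illustration, and the "analyser" is just lower-casing plus splitting on word characters.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Field:
    stored: bool = False     # value can be returned with search results
    indexed: bool = False    # value can be searched
    analysed: bool = False   # value is split into lower-cased tokens first


class TinyIndex:
    """Toy index that shows what each flag controls."""

    def __init__(self, schema: Dict[str, Field]) -> None:
        self.schema = schema
        self.terms: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
        self.stored: List[Dict[str, str]] = []

    def add(self, doc: Dict[str, str]) -> None:
        doc_id = len(self.stored)
        # Only stored fields are kept for retrieval.
        self.stored.append({k: v for k, v in doc.items() if self.schema[k].stored})
        for name, value in doc.items():
            field = self.schema[name]
            if not field.indexed:
                continue
            # Analysed fields become several terms; others stay a single value.
            terms = re.findall(r"\w+", value.lower()) if field.analysed else [value]
            for term in terms:
                self.terms[(name, term)].add(doc_id)

    def search(self, name: str, term: str) -> List[Dict[str, str]]:
        return [self.stored[i] for i in sorted(self.terms.get((name, term), ()))]


index = TinyIndex({
    "FileName": Field(stored=True, indexed=True, analysed=True),
    "Extension": Field(stored=True, indexed=True),   # facet-style field
    "FileFullName": Field(stored=True),               # retrievable, not searchable
})
index.add({"FileName": "Summer Catalogue", "Extension": "pdf",
           "FileFullName": "/Files/Docs/Summer Catalogue.pdf"})
index.add({"FileName": "Team Photo", "Extension": "jpg",
           "FileFullName": "/Files/Images/Team Photo.jpg"})

# Distinct values of an indexed, non-analysed field work as facet options.
facets = sorted({term for (name, term) in index.terms if name == "Extension"})
print(index.search("FileName", "catalogue"))
print(facets)
```

Searching the analysed field for a single word finds the document, while the whole phrase "Summer Catalogue" would not match because only the individual tokens were indexed. The non-analysed Extension field keeps each value intact, which is what makes it usable for facets.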

To make things (a lot) easier for you, we’ve created a default set of fields typically used in file indexes – this default field set is defined in something called the FileIndexSchemaExtender.

To add the fields from the schema extender to the index:

  1. Click the Fields tab
  2. Under the Fields section, click Manage
  3. Click New index field and select Schema extender
  4. Provide a name
  5. Provide a system name
  6. In the Field section, select FileIndexSchemaExtender
  7. In the Settings section, select the fields you want to Include
  8. Click Save and close

This adds a whole bunch of fields to the index.

FileIndexes_04

The following standard data is indexed:

  • File name
  • Directory path (/Files/whatever/Folder/OtherFolder/)
  • Directory (OtherFolder)
  • ParentDirectory (Folder)
  • RootDirectory (Files)
  • Extension (e.g. jpg, png, txt)
  • Filesize in bytes
  • LastWriteTime

The following fields are generated:

  • FileFullName - file path and name
  • Date created time/Date created time UTC
  • Last access time/Last access time UTC
  • Last write time UTC
  • Is read only

We also index metadata (EXIF, XMP, and IPTC) for certain types of (image) files – currently .pdf, .gif, .jpg, .jpeg, .psd, .bmp, .png, .tiff, .tif, and .ai.

For each field you can see the Name, System Name, and Type, as well as whether the field is stored, indexed, and analysed. You can also add fields to the index manually – see the Indexes article.

Building the Index

Once you’ve added instances, a build configuration, and a set of fields to the index, you should build it – to do so, click the Build button beneath each instance you want to build.

FilesIndexes_05

Of course, you don’t want to do this manually every time – you want to rebuild the index on a schedule – see the article on tasks.
