Indexes are usually critical to a commerce solution: storefronts, product catalogs, search experiences, filtering, user lookups, and integrations all depend on them. When indexes are healthy, search is fast and predictable. When they are stale, incomplete, or corrupted, users can see outdated results, missing content, poor relevance, or failures in dependent features.
In production, the goal is usually not just to build indexes quickly. The goal is to keep at least one good index available at all times while updates are applied safely in the background. This article is for the person responsible for keeping a site fast, searchable, and up to date in a production environment.
It complements the other parts of the implementer documentation by focusing on operations:
- Which balancing strategy to use
- When and how often to rebuild indexes
- What load index builds place on a solution
- How to recognize when something has gone wrong
- How to recover safely without taking a good index instance offline
Balancing strategies
An index can have more than one instance. This allows one instance to serve queries while another instance is being rebuilt. How exactly this happens is controlled by the index balancing strategy - there are two possibilities:
- ActivePassive: One instance is considered online, and another instance is rebuilt in the background. This is often the easiest strategy to reason about operationally because it gives a clear "current serving instance" and a clear standby instance.
- LastUpdated: The newest healthy instance becomes the preferred instance. This can be useful when you want the most recently completed build to become active automatically.
The balancing strategy is selected when you create an index - afterwards, it can be changed via the context menu for the index in the settings tree.
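As a mental model, the two strategies can be sketched like this. This is a minimal Python sketch based on the descriptions above, not the platform's actual API; the `IndexInstance` fields and function names are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class IndexInstance:
    name: str
    healthy: bool
    last_successful_build: datetime

def preferred_instance(instances, strategy, active_name=None):
    """Pick the instance that should serve queries.

    ActivePassive: keep the designated active instance while it is healthy,
    and fail over to another healthy instance otherwise.
    LastUpdated: prefer the most recently built healthy instance.
    """
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        return None
    if strategy == "ActivePassive":
        for i in healthy:
            if i.name == active_name:
                return i
        return healthy[0]  # active instance unhealthy: fall back to a standby
    # LastUpdated: newest successful build wins
    return max(healthy, key=lambda i: i.last_successful_build)
```

Note how ActivePassive keeps serving from the designated instance even when a newer build exists elsewhere, while LastUpdated switches automatically.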
Rebuild frequencies
How often you rebuild an index depends on how quickly data changes and how expensive a rebuild is.
Small indexes are typically fast to rebuild and have limited effect on CPU, disk, and memory.
These are often suitable for:
- frequent scheduled rebuilds
- rebuilds after content changes
- daytime rebuilds if traffic is modest
Examples:
- small content indexes
- limited product assortments
- user indexes with moderate volume
The important thing is to choose a rebuild cadence that matches the freshness requirements for the index. Let the business requirements guide the schedule:
- Search must reflect changes almost immediately - rebuild often or trigger targeted updates where possible
- Search can lag by minutes - rebuild on a short schedule
- Search can lag by hours - rebuild during maintenance windows
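The mapping above can be expressed as a simple decision rule. This is a hedged sketch of the reasoning, not a platform feature; the thresholds are examples and should be tuned to your own requirements:

```python
def rebuild_cadence(max_staleness_minutes):
    """Map a freshness requirement to a rebuild approach.

    Thresholds are illustrative examples, not platform defaults.
    """
    if max_staleness_minutes < 5:
        return "frequent rebuilds or targeted partial updates"
    if max_staleness_minutes < 60:
        return "short scheduled rebuilds"
    return "maintenance-window rebuilds"
```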
Understand the operational impact of a rebuild
The right rebuild schedule is usually a balance between freshness and stability. Rebuilding too often can create unnecessary pressure on the solution without improving the user experience in a meaningful way.
An index rebuild can affect:
- application CPU usage
- database load
- disk throughput
- memory pressure
- background task concurrency
- query latency if the same machine is already under load
Watch for these symptoms:
- slow front-end responses during build windows
- increased SQL response time
- longer-than-normal build duration
- multiple heavy background jobs competing for resources
- queues of scheduled tasks falling behind
If builds are expensive, avoid stacking them with:
- imports
- synchronization jobs
- feed generation
- batch updates
- deployment operations
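A simple way to avoid stacking expensive work is to defer a rebuild while other heavy background jobs are running. The sketch below is illustrative and the job names are hypothetical placeholders for whatever jobs exist in your solution:

```python
# Hypothetical names for heavy background work in a solution
HEAVY_JOBS = {"import", "synchronization", "feed-generation", "batch-update", "deployment"}

def safe_to_start_rebuild(running_jobs, heavy_jobs=HEAVY_JOBS):
    """Return True only when no heavy background job is currently running."""
    return not any(job in heavy_jobs for job in running_jobs)
```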
When to rebuild manually
An index can be rebuilt automatically, on a schedule, or manually - in a production scenario, manual rebuilds are only appropriate when:
- an index schema changed
- relevance behavior changed significantly
- index data was deleted
- the UI shows an interrupted or failed instance that has not recovered automatically
- the build state is inconsistent after maintenance or infrastructure incidents
You should avoid manual rebuilds when a build is already running, the underlying problem is still present, or you are about to restart the environment again.
A healthy production setup
A healthy setup should aim for all of the following:
- at least one good index instance is always available for queries
- unfinished or interrupted builds are visible
- a failed or incomplete instance is repaired before other standby instances are rebuilt
- a solution restart does not leave an index permanently stuck in an ambiguous state
- operators can tell the difference between the last attempted build and the last successful build
To help you achieve this you may find the operations checklist useful.
Daily operations checklist
Check these regularly on busy solutions:
- When was the last successful build for each critical index?
- Is any index instance stuck in Starting or Running for too long?
- Is there an Interrupted or Failed instance waiting for repair?
- Are builds finishing within the expected time window?
- Has index size or duration changed significantly after imports or schema changes?
- Does one healthy instance remain available while another is rebuilding?
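Parts of this checklist can be automated. The sketch below flags common problems from per-instance state; the dictionary keys and the one-hour threshold are assumptions for illustration, not the platform's actual data model:

```python
from datetime import datetime, timedelta

def checklist_findings(instances, now, max_build_minutes=60):
    """Flag checklist violations from per-instance state dicts.

    Each dict is assumed to carry: name, state, started, last_success.
    """
    findings = []
    running = [i for i in instances if i["state"] in ("Starting", "Running")]
    for i in running:
        if now - i["started"] > timedelta(minutes=max_build_minutes):
            findings.append(f"{i['name']} stuck in {i['state']}")
    for i in instances:
        if i["state"] in ("Interrupted", "Failed"):
            findings.append(f"{i['name']} waiting for repair")
    healthy = [i for i in instances if i["state"] == "Online"]
    if running and not healthy:
        findings.append("no healthy instance available during rebuild")
    return findings
```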
Operational dashboards and alerts
Good operational dashboards usually include:
- Current lifecycle state
- Current online instance
- Last attempted build
- Last successful build
- Last heartbeat
- Build duration trend
- Failure reason or interruption reason
For business-critical indexes, you should also monitor and alert on:
- No successful build within expected time window
- An instance stuck in Running beyond expected duration
- Any Failed or Interrupted instance
- Missing healthy standby instance
- Repeated long build durations
Useful metrics include:
- Build duration
- Time since last successful build
- Active instance count
- Failure count
- Interruption count
- Index size growth over time
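The alert conditions above can be evaluated from the same per-instance state. This is a minimal sketch under assumed field names and example thresholds; real alerting belongs in your monitoring stack:

```python
from datetime import datetime, timedelta

def alerts(instance, now, max_age=timedelta(hours=24), max_duration=timedelta(hours=1)):
    """Evaluate alert conditions for one index instance.

    The keys (state, started, last_success) and thresholds are illustrative.
    """
    out = []
    if now - instance["last_success"] > max_age:
        out.append("no successful build within expected window")
    if instance["state"] == "Running" and now - instance["started"] > max_duration:
        out.append("stuck in Running beyond expected duration")
    if instance["state"] in ("Failed", "Interrupted"):
        out.append(f"instance is {instance['state']}")
    return out
```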
Signs that your indexing strategy needs improvement
Consider revisiting design or scheduling if you see any of these patterns:
- Builds regularly overlap with peak traffic
- Indexes are large enough that rebuild windows are difficult to complete
- One instance stays stale for a long time while others keep rebuilding
- Operators cannot tell whether the latest build was successful
- Troubleshooting depends on guessing rather than clear state and logs
Possible improvements include:
- Adding or using multiple index instances
- Moving rebuilds to quieter periods
- Reducing rebuild frequency where freshness requirements allow it
- Using better monitoring around duration, failures, and interruption events
- Reviewing whether ActivePassive or LastUpdated better fits the solution
Common failure modes
Index builds can fail in several ways:
- The application pool or process restarts during a build
- The server runs out of disk space
- The build process throws an exception
- The source data is temporarily unavailable
- A long-running build is interrupted by deployment or infrastructure maintenance
- A machine is recycled or restarted before the build reaches completion
- Diagnostics or state files become unreadable
The most important operational distinction is this:
- Failed means the build ended with an explicit failure
- Interrupted means the build did not reach a clean end, typically because the process stopped unexpectedly
These conditions should not be treated as normal completion.
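The distinction can be made mechanically from a build record: an explicit error with a finish time means Failed, while a build with no clean end and a stale heartbeat means Interrupted. This sketch assumes hypothetical record fields and a five-minute heartbeat timeout:

```python
from datetime import datetime, timedelta

def classify(record, now, heartbeat_timeout=timedelta(minutes=5)):
    """Distinguish an explicit failure from an interrupted build.

    Field names (finish_time, error, last_heartbeat) are illustrative.
    """
    if record.get("finish_time"):
        return "Failed" if record.get("error") else "Succeeded"
    if now - record["last_heartbeat"] > heartbeat_timeout:
        return "Interrupted"  # process stopped without reaching a clean end
    return "Running"
```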
What happens when a build fails or is interrupted
The exact behavior depends on configuration and version, but operationally you should expect the following principles:
- The last good instance should remain the safe instance for queries
- An incomplete instance should not be treated as healthy
- The next recovery attempt should target the broken or unfinished instance first
- After restart, stale running builds should be detected and moved into a recoverable state
Important
If one instance is incomplete, do not continue rotating rebuilds across the other instances as if nothing happened. Repair the incomplete instance first so you keep a clear good instance and a clear repair target.
Debugging a bad index state
When search results look stale or a build appears stuck, start with these questions:
- Is there at least one healthy instance available?
- Which instance was last built successfully?
- Which instance is currently serving traffic?
- Is another instance in Running, Failed, or Interrupted state?
- Was there a recent restart, deployment, or infrastructure event?
Then inspect:
- Index status in the UI
- Diagnostics logs and tracker history
- The durable build-state file, if available
- Recent application restarts
- Disk space and machine health
Safe recovery approach
When an index is in a bad state, use a cautious recovery sequence:
- Confirm that one healthy instance is still available for queries.
- Identify the broken or incomplete instance.
- Review the latest failure or interruption details.
- Check for environmental causes such as disk space, restarts, or data-source issues.
- Rebuild the affected instance first.
- Verify successful completion before rotating or rebuilding other instances.
If the solution was restarted during a build, a good operational model is:
- Detect stale running state
- Mark the build as interrupted
- Restart the repair on that same instance
- Keep the healthy instance serving traffic until repair is complete
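The restart recovery model above can be sketched as a single pass over instance state. This is an illustrative sketch, not the platform's implementation; the dictionary keys and timeout are assumptions:

```python
from datetime import datetime, timedelta

def recover_after_restart(instances, now, heartbeat_timeout=timedelta(minutes=5)):
    """Detect stale Running state, mark it Interrupted, and queue a repair.

    Healthy instances are left untouched so they keep serving traffic.
    Returns the names of instances queued for repair.
    """
    repairs = []
    for inst in instances:
        stale = now - inst["last_heartbeat"] > heartbeat_timeout
        if inst["state"] == "Running" and stale:
            inst["state"] = "Interrupted"
            repairs.append(inst["name"])
    return repairs
```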
Files and folders to know
The default production locations are:
- Index data: /Files/System/Indexes
- Diagnostics and tracker history: /Files/System/Diagnostics
- Durable build state: /Files/System/Diagnostics/IndexBuildState/{repository}/{index}/{instance}/state.json
Tracker history for an instance is stored under a repository, index, and instance path below the diagnostics folder, typically with dated subfolders for each run.
Operationally:
- The index folder contains the actual built index data used by the solution
- Tracker history contains build history, progress, and failure details for previous runs
- state.json contains the latest durable lifecycle state for an index instance
The durable state.json file typically contains information such as:
- repository
- index name
- instance name
- current lifecycle state
- build name
- operation id
- start time
- last heartbeat
- finish time
- last successful build time
- resume cursor, if supported
- server name
- error summary and details
This file is intended to answer operational questions like:
- Is this instance still building?
- Did it fail explicitly or get interrupted?
- When was it last known healthy?
- Should this instance be repaired before another one is rebuilt?
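Because state.json is a plain JSON file, an operator script can read it directly to answer these questions. The field names used below are illustrative guesses from the list above; check an actual file for the exact keys before relying on them:

```python
import json

def read_build_state(path):
    """Summarize a durable state.json for an operator.

    Key names (instanceName, state, lastSuccessfulBuildTime, errorSummary)
    are assumptions; verify them against a real file.
    """
    with open(path) as f:
        state = json.load(f)
    return {
        "instance": state.get("instanceName"),
        "state": state.get("state"),
        "last_success": state.get("lastSuccessfulBuildTime"),
        "error": state.get("errorSummary"),
    }
```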
Can these files be deleted?
Yes, but not casually.
Deleting tracker history removes build history and log context. It does not remove the index data itself.
This can be acceptable when:
- you want to clean up old history
- you are sure no build is currently running
- you do not need the old diagnostics for troubleshooting
This is risky when:
- a build is in progress
- you still need failure details
- you expect the system to infer recent behavior from tracker history