Searcher

Searcher is the Sitevision API entry point to query a Search Index. It is assembled using a SearcherBuilder instance. A search returns a SearchResult, which typically contains zero or more SearchHit instances.

var searchFactory = require('SearchFactory');
var searcherBuilder = searchFactory.getSearcherBuilder();

// Create a generic Searcher (no explicit SearcherBuilder state set)
// - it will query the Standard Index
// - it will use the ExtendedDismaxParser
// - it will use the default legacy permission checking strategy
// - query will not be logged
var searcher = searcherBuilder.build();

// Execute
var searchResult = searcher.search('my query', 10);

// Process the SearchResult
if (searchResult.hasHits()) {
   var hits = searchResult.getHits();
   while (hits.hasNext()) {
      var searchHit = hits.next();
      /* Process hit data */
   }
} else {
   /* Handle no hits at all */
}

The behaviour of a Searcher is determined by its assembly of components. The following components are available:

  • Parser - what parser should be used to handle the queries, i.e. what field/fields should be queried, how should the hits be boosted?
  • Filter - how should the result be limited?
  • Sort - how should the result be sorted?
  • Highlight - which fields in the result should be highlighted and how?
  • SpellCheck - should the search engine try to get suggestions (did you mean)?
  • Monitor - how should querying be monitored? (e.g. should search query logging mode be on or off?)
  • PermissionCheck - how should permission checks be performed? (Note! Only applicable when querying a sv:nodeIndex)

All components are assembled using their component-specific Builder, and Searcher itself is assembled using a SearcherBuilder. All components are optional.

What Index to use is also specified via SearcherBuilder. If no specific index is set, the built Searcher will query the Standard Index.
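Since every component is optional and independent, several of them can be combined on the same SearcherBuilder. The following is a minimal sketch of that, not a definitive implementation. The function name buildArticleSearcher is hypothetical, the field names and filter query are borrowed from the component examples on this page, and the searchFactory argument is assumed to be the SearchFactory instance.

```javascript
// Sketch: assemble one Searcher from several optional components.
// buildArticleSearcher is a hypothetical helper name.
// searchFactory is assumed to be the Sitevision SearchFactory.
function buildArticleSearcher(searchFactory) {
   // Parser: which field to query
   var parser = searchFactory.getExtendedDismaxParserBuilder()
                   .addQueryField('title.analyzed')
                   .build();

   // Filter: limit the result to articles
   var filter = searchFactory.getFilterBuilder()
                   .addFilterQuery('+svtype:article')
                   .build();

   // Sort: last published date, descending
   var sortField = searchFactory.getSearchSortField('lastpublished', false);
   var sort = searchFactory.getSortBuilder()
                 .addSortField(sortField)
                 .build();

   // Searcher: only the components that are set will deviate from the defaults
   return searchFactory.getSearcherBuilder()
             .setParser(parser)
             .setFilter(filter)
             .setSort(sort)
             .build();
}
```

In a server-side script this would presumably be called as buildArticleSearcher(require('SearchFactory')). Components that are left out (here Highlight, SpellCheck, Monitor and PermissionCheck) keep their default behaviour.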

Parser

The Parser component determines how the query string should be transformed to a format the search engine understands. A Parser can also determine what default index fields to use etc. There are two Parsers available:

  • ExtendedDismaxParser. This is the default parser and it has lots of options to tweak querying and the result. It is assembled using an ExtendedDismaxParserBuilder.
  • StandardParser. This is a simple but robust parser. It is assembled using a StandardParserBuilder.

var searchFactory = require('SearchFactory');

// Create ExtendedDismaxParser
var parser = searchFactory.getExtendedDismaxParserBuilder()
                .setMinimumShouldMatch('50%')
                .addQueryField('title.analyzed')
                .addQueryField('headings.analyzed', 1.2)
                .addQueryField('content.stemmed.analyzed', 0.5)
                .addQueryField('metadata.analyzed.keywords')
                .build();

// Create Searcher
var searcher = searchFactory.getSearcherBuilder()
                  .setParser(parser) // Set parser
                  .build();

// Execute
var searchResult = searcher.search('my query', 10);

Filter

The Filter component is used to specify the filter queries (fq) Searcher should use. A filter query is a "normal" query that is not scored (i.e. will not have any impact on hit scoring). It is typically used to limit what sections of the search index to query. A Filter is assembled using a FilterBuilder.

var searchFactory = require('SearchFactory');

// Create Filter (find "new" articles only)
var filter = searchFactory.getFilterBuilder()
                .addFilterQuery('+svtype:article')
                .addFilterQuery('+lastpublished:[NOW-12MONTHS TO *]')
                .build();

// Create Searcher
var searcher = searchFactory.getSearcherBuilder()
                  .setFilter(filter) // Set filter
                  .build();

// Execute
var searchResult = searcher.search('my query', 10);

Sort

A search result is always sorted! If no explicit sorting is specified, the hits are "sorted" by their hit score, i.e. "best hits first". Scoring is performed by the search engine, and the score can also be adjusted ("boosted") via ExtendedDismaxParser.

Searcher also allows for custom sorting. Such custom Sort is assembled using a SortBuilder, where SearchSortField instances are added.

var searchFactory = require('SearchFactory');

// Create Sort (last published date, descending)
var sortField = searchFactory.getSearchSortField('lastpublished', false);
var sort = searchFactory.getSortBuilder()
              .addSortField(sortField)
              .build();

// Create Searcher
var searcher = searchFactory.getSearcherBuilder()
                  .setSort(sort) // Set sort
                  .build();

// Execute
var searchResult = searcher.search('my query', 10);

Highlight

The Highlight component is used to mark up the query terms in the search hit excerpt. A highlighted field must be stored and typically analyzed. Highlight is assembled using a HighlightBuilder. The (potentially) highlighted value of each hit is fetched via the SearchHit.getHighlightedField method.

var searchFactory = require('SearchFactory');

// Create Highlight (for the "summary" field)
var HL_FIELD_NAME = 'summary';
var highlight = searchFactory.getHighlightBuilder()
                  .addHighlightedField(HL_FIELD_NAME)
                  .setFragmentPreString('<mark>')
                  .setFragmentPostString('</mark>')
                  .build();

// Create Searcher
var searcher = searchFactory.getSearcherBuilder()
                  .setHighlight(highlight) // Set highlight
                  .build();

// Execute
var searchResult = searcher.search('my query', 10);

// Process the SearchResult
if (searchResult.hasHits()) {
   var hits = searchResult.getHits();
   while (hits.hasNext()) {
      var searchHit = hits.next();
      
      /* Process hit data */
      var excerptText = searchHit.getHighlightedField(HL_FIELD_NAME, 200); // Get highlighted data
      /* ... */
   }
} else {
   /* Handle no hits at all */
}
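The hit-processing loop above can be wrapped in a small helper that returns the excerpts as a plain array, which is often easier to hand to a template. This is only a sketch: collectExcerpts is a hypothetical name, and it uses nothing beyond the SearchResult and SearchHit methods already shown on this page.

```javascript
// Sketch: collect the highlighted excerpt of every hit into an array.
// collectExcerpts is a hypothetical helper, not part of the Sitevision API.
function collectExcerpts(searchResult, fieldName, maxLength) {
   var excerpts = [];

   // No hits at all: return the empty array
   if (!searchResult.hasHits()) {
      return excerpts;
   }

   var hits = searchResult.getHits();
   while (hits.hasNext()) {
      var searchHit = hits.next();
      // Same call as in the example above: field name and max length
      excerpts.push(searchHit.getHighlightedField(fieldName, maxLength));
   }
   return excerpts;
}
```

With the Searcher above, a call could look like collectExcerpts(searchResult, HL_FIELD_NAME, 200).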

SpellCheck

The SpellCheck component enables the search engine to calculate potential suggestions (typically also referred to as "did you mean"). The result is a list of Suggestion instances that is fetched via the SearchResult.getSuggestions method. Note that there can be suggestions regardless of whether the search result has any hits or not. SpellCheck is assembled using a SpellCheckBuilder.

var searchFactory = require('SearchFactory');

// Create SpellCheck
var spellcheck = searchFactory.getSpellCheckBuilder().build();

// Create Searcher
var searcher = searchFactory.getSearcherBuilder()
                  .setSpellCheck(spellcheck) // Set spellcheck
                  .build();

// Execute
var searchResult = searcher.search('my query', 10);

var suggestions = searchResult.getSuggestions(); // Get spellcheck result
if (!suggestions.isEmpty()) {
   /* Handle Suggestions */
}

if (searchResult.hasHits()) {
  /* Handle search hits */
}

Monitor

The Monitor component determines if a query should be logged or not. A Searcher without Monitor will not log the query at all. The query logging shows up in the statistics of the queried index. Typically you only want such logging for user-driven queries (i.e. not for basic search-based listings and the like). Monitor is assembled using a MonitorBuilder.

var searchFactory = require('SearchFactory');

// Create Monitor
var monitor = searchFactory.getMonitorBuilder().build();

// Create Searcher
var searcher = searchFactory.getSearcherBuilder()
                  .setMonitor(monitor) // Set monitor
                  .build();

// Execute (this search will be logged)
var searchResult = searcher.search('a query from an end-user', 10);

PermissionCheck [@since 2023.09.1]

The PermissionCheck component determines the strategy to use when checking permissions for the search hits. PermissionCheck is assembled using a PermissionCheckBuilder.

Strategy/behaviour for a PermissionCheck is specified via PermissionStrategy:

  • PermissionStrategy.EARLY_CHECK:
    • Early strategy uses a user-specific filter query to restrict the hits retrieved from the search engine.
    • The default strategy for the PermissionCheck component.
  • PermissionStrategy.LATE_CHECK:
    • Late strategy filters all hits after they have been retrieved from the search engine (but before they are exposed to the caller).
    • This strategy should typically only be used when Sitevision READ permissions are based on criteria other than groups and users, typically when IP requirements or similar are needed to fulfill a role.

var searchFactory = require('SearchFactory');
var CHECK_STRATEGY = require('PermissionStrategy.EARLY_CHECK');

// Create PermissionCheck with appropriate strategy
var permissionCheck = searchFactory.getPermissionCheckBuilder()
                         .setPermissionStrategy(CHECK_STRATEGY)
                         .build();

// Note! EARLY_CHECK is default strategy for PermissionCheck, so this would be enough
// var permissionCheck = searchFactory.getPermissionCheckBuilder().build();

// Create Searcher
var searcher = searchFactory.getSearcherBuilder()
                  .setPermissionCheck(permissionCheck) // Set PermissionCheck
                  .build();

// Execute (early permission checking strategy will be used)
var searchResult = searcher.search('a query from an end-user', 10);
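The same assembly can take the strategy as a parameter, which makes it easy to switch to LATE_CHECK for the cases described above. A minimal sketch follows. buildSearcherWithStrategy is a hypothetical name, and it is an assumption that the LATE_CHECK constant can be obtained the same way as EARLY_CHECK in the example above.

```javascript
// Sketch: build a Searcher with an explicit permission strategy.
// buildSearcherWithStrategy is a hypothetical helper name.
// searchFactory is assumed to be the Sitevision SearchFactory,
// permissionStrategy a PermissionStrategy constant.
function buildSearcherWithStrategy(searchFactory, permissionStrategy) {
   // Create PermissionCheck with the given strategy
   var permissionCheck = searchFactory.getPermissionCheckBuilder()
                            .setPermissionStrategy(permissionStrategy)
                            .build();

   // Create Searcher that uses the PermissionCheck
   return searchFactory.getSearcherBuilder()
             .setPermissionCheck(permissionCheck)
             .build();
}
```

A call for late checking might look like buildSearcherWithStrategy(require('SearchFactory'), require('PermissionStrategy.LATE_CHECK')), assuming that import form also works for LATE_CHECK.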

Unspecified PermissionCheck

A Searcher without any PermissionCheck component will implicitly use the legacy default strategy (a mix of EARLY/LATE checks).

var searchFactory = require('SearchFactory');

// Create Searcher
var searcher = searchFactory.getSearcherBuilder().build();

// Execute (default permission checking strategy will be used)
var searchResult = searcher.search('a query from an end-user', 10);

PermissionCheck is only applicable when the Searcher queries a sv:nodeIndex (i.e. the Standard Index or a Custom Index).

PermissionCheck was introduced in Sitevision 2023.09.1.

Index

Searcher queries the Standard Index by default, but another index can be queried using the setIndex method of SearcherBuilder. See the documentation of the specific index for a code example.

Builder state

All Builders are stateful. A Builder can be re-used to create multiple instances of the object it is targeted to build.

It is NOT recommended to import any Builder in a WebApp2.

When a Builder is imported in two files in an app, both files will share the identical Builder instance. This can cause subtle bugs and confusion due to shared state.

Import SearchFactory instead, and use it to get the Builder you need. This will guarantee a new (non-shared) Builder instance.

Potentially dangerous:

import searcherBuilder from '@sitevision/api/server/SearcherBuilder'
...
const searcherInstance = searcherBuilder.build();

Always safe:

import searchFactory from '@sitevision/api/server/SearchFactory'
...
const searcherBuilder = searchFactory.getSearcherBuilder();
const searcherInstance = searcherBuilder.build();
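Why shared Builder state is risky can be illustrated without Sitevision at all. The sketch below uses a hypothetical stand-in builder (createFakeBuilder, addFilter) that is NOT the Sitevision API. It only mimics a stateful Builder to show how state set by one caller leaks into what another caller builds.

```javascript
// Hypothetical stand-in for a stateful Builder (NOT the Sitevision API)
function createFakeBuilder() {
   var filters = [];
   return {
      addFilter: function (query) { filters.push(query); return this; },
      build: function () { return { filters: filters.slice() }; }
   };
}

// One shared instance, as when two files import the same Builder
var sharedBuilder = createFakeBuilder();

// "File A" wants articles only
var fromFileA = sharedBuilder.addFilter('+svtype:article').build();

// "File B" wants pages only, but inherits the state set by file A
var fromFileB = sharedBuilder.addFilter('+svtype:page').build();

// A fresh builder per use (like one obtained from a factory) has no leaked state
var fromFresh = createFakeBuilder().addFilter('+svtype:page').build();

console.log(fromFileA.filters.length); // 1
console.log(fromFileB.filters.length); // 2 (article filter leaked in)
console.log(fromFresh.filters.length); // 1
```

Getting the Builder from SearchFactory each time corresponds to the createFakeBuilder() call in this sketch: every caller starts from a clean state.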