Stopwords

Some neutral, complementary words occur very frequently in texts written in a human language, for example the English words are, it and was. Such neutral words can distort a search result completely:

  • A neutral word in the query can produce many irrelevant hits.
  • Because the word occurs so frequently in texts, it can also score high, so many of the "top hits" are completely irrelevant.

Such words are called stopwords. An index field that is stopword-aware ignores these words completely: the field does not index them at all, so they are not part of the field's queryable data.
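
The behaviour described above can be illustrated with a minimal sketch. It is not the product's implementation: the class StopwordAwareField, the tokenizer and the short stopword set are hypothetical, and filtering the query as well as the indexed text is an assumption made for the example.

```python
import re
from collections import defaultdict

# Illustrative subset of the English stopwords; not the product's actual configuration.
STOPWORDS = {"a", "an", "and", "are", "is", "it", "the", "was"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class StopwordAwareField:
    """Hypothetical index field that never stores stopwords."""

    def __init__(self, stopwords):
        self.stopwords = set(stopwords)
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def index(self, doc_id, text):
        for term in tokenize(text):
            if term not in self.stopwords:  # stopwords are skipped, not stored
                self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every non-stopword query term."""
        # Assumption: the query is filtered with the same stopword set.
        terms = [t for t in tokenize(query) if t not in self.stopwords]
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


field = StopwordAwareField(STOPWORDS)
field.index(1, "It was a red car")
field.index(2, "The car is blue")

print(sorted(field.postings))        # ['blue', 'car', 'red']
print(field.search("the red car"))   # {1}
print(field.search("it was"))        # set()
```

In this sketch a query that consists only of stopwords matches nothing, because none of its terms were ever indexed.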

Index language is important

Stopwords are language-dependent. The index language determines which stopwords are used.

Swedish Stopwords

  • alla, allt, att, av
  • blev, bli, blir, blivit
  • de, dem, den, denna, deras, dess, dessa, det, detta, dig, din, dina, ditt, du, då, där
  • efter, ej, eller, en, er, era, ert, ett
  • från, för
  • ha, hade, han, har, henne, hennes, hon, honom, hur, här
  • i, icke, ingen, inom, inte
  • jag, ju
  • kan, kunde
  • man, med, men, mellan, mig, min, mina, mitt, mot, mycket
  • ni, nu, någon, något, några, när
  • och, om, oss
  • samma, sedan, sig, sin, sina, sitta, själv, skulle, som, så, sådan, sådana, sådant
  • till
  • under, upp, ut, utan
  • vad, var, vara, varför, varit, varje, vars, vart, vem, vi, vid, vilka, vilkas, vilken, vilket, vår, våra, vårt
  • åt
  • än, är
  • över

English Stopwords

  • a, an, and, are, as, at
  • be, but, by
  • for
  • if, in, into, is, it
  • no, not
  • of, on, or
  • such
  • that, the, their, then, there, these, they, this, to
  • was, will, with
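
To show why the index language matters, here is a small sketch that filters the same text with two different stopword sets. The language codes "sv" and "en", the function index_terms and the abbreviated sets are assumptions for illustration; the sets are short subsets of the lists above.

```python
import re

# Abbreviated subsets of the lists above, keyed by an assumed language code.
STOPWORDS_BY_LANGUAGE = {
    "sv": {"att", "av", "det", "en", "i", "och", "som", "är"},
    "en": {"a", "an", "and", "in", "is", "it", "the", "was"},
}


def index_terms(text, language):
    """Return the terms a stopword-aware field would keep for the given index language."""
    stopwords = STOPWORDS_BY_LANGUAGE[language]
    return [t for t in re.findall(r"\w+", text.lower()) if t not in stopwords]


print(index_terms("Det är en röd bil", "sv"))  # ['röd', 'bil']
print(index_terms("Det är en röd bil", "en"))  # ['det', 'är', 'en', 'röd', 'bil']
print(index_terms("I was in the car", "en"))   # ['i', 'car']
```

With the wrong index language, the Swedish stopwords det, är and en are indexed as ordinary terms. The last line also shows that a word can be a stopword in one language (i in Swedish) and a regular term in another.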

When the stopwords are modified, you should typically re-index all data, so that already indexed data corresponds to and behaves according to the new stopword setup.
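
The need to re-index can be seen in a small sketch. It assumes a simple in-memory index built by a hypothetical build_index function; a real index will differ in detail, but the effect is the same in principle: terms indexed under the old setup stay in the index until the data is indexed again.

```python
import re
from collections import defaultdict


def build_index(documents, stopwords):
    """Build a term -> document ids mapping, skipping stopwords."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"\w+", text.lower()):
            if term not in stopwords:
                postings[term].add(doc_id)
    return postings


documents = {1: "Such a fast car", 2: "No such luck"}

old_stopwords = {"a", "no"}
index = build_index(documents, old_stopwords)

# "such" is added to the stopword list, but the existing index is left as-is.
new_stopwords = old_stopwords | {"such"}
print("such" in index)    # True: stale entry from the old setup

# Re-indexing all data brings the index in line with the new setup.
index = build_index(documents, new_stopwords)
print("such" in index)    # False
print(sorted(index))      # ['car', 'fast', 'luck']
```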