Copyright© 2008-2022 Sitevision AB, all rights reserved.
@Requireable(value="QueryStringUtil") public interface QueryStringUtil
An instance of the Sitevision class implementing this interface can be obtained via SearchFactory.getQueryStringUtil(). See SearchFactory for how to obtain an instance of the SearchFactory interface.
Modifier and Type | Field and Description |
---|---|
static String | MATCH_ALL_QUERY: The "match all" query string. |
Modifier and Type | Method and Description |
---|---|
String | getDateAsString(Date aDate): Returns a date formatted according to the Solr date string representation. |
String | getFieldQuery(String aFieldName, String aValueExpression): Returns a field query that is properly grouped. |
String | removeQuerySyntaxChars(String aQueryString): Removes all query syntax characters from a query string and trims the result. |
String | removeQuerySyntaxChars(String aQueryString, boolean aLenientRemove): Removes query syntax characters from a query string and trims the result. |
String | smartWildcard(String aQueryString): Gets a prefix/wildcard query that potentially will be scored. |
String | splitCollectionToQueryParts(Collection<String> aStringsToSplit, String aSplitExpression): Transforms multiple strings with delimiters to a string that can be used in a field-grouped query expression. |
String | splitToQueryParts(String aStringToSplit, String aSplitExpression): Transforms a string with delimiters to a string that can be used in a field-grouped query expression. |
String | stripLocalParams(String aQueryString): Strips Local params from a query string. |
String | stripTrailingAnyChars(String aQueryString): Strips all trailing "any" chars. |
static final String MATCH_ALL_QUERY
This is the special query syntax ("*:*") to use when querying "everything".
A common misunderstanding is that a single wildcard (i.e. "*") would also query "everything". That is a false assumption. A single wildcard is less efficient, and it only matches documents that have data in the default query fields of the parser (i.e. a single wildcard will potentially not include "everything").
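As a small illustration of when the constant matters, the hypothetical helper below falls back to the "match all" query when a user submits an empty search. `queryOrMatchAll` is not part of the Sitevision API, and the constant is copied locally only so the sketch compiles without Sitevision; in real code you would reference QueryStringUtil.MATCH_ALL_QUERY directly.

```java
// Hypothetical helper (not part of the Sitevision API) that falls back to
// the "match all" query when there is nothing searchable in the input.
final class MatchAllExample {

    // Same value as QueryStringUtil.MATCH_ALL_QUERY, copied so this sketch compiles on its own.
    static final String MATCH_ALL_QUERY = "*:*";

    static String queryOrMatchAll(String userQuery) {
        if (userQuery == null || userQuery.trim().isEmpty()) {
            // Use "*:*" rather than "*": a lone wildcard only matches documents
            // that have data in the parser's default query fields.
            return MATCH_ALL_QUERY;
        }
        return userQuery.trim();
    }

    public static void main(String[] args) {
        System.out.println(queryOrMatchAll("   "));          // *:*
        System.out.println(queryOrMatchAll(" sitevision ")); // sitevision
    }
}
```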
String stripTrailingAnyChars(String aQueryString)
The question mark character is a query syntax char (the "any" char) and can cause problems when querying (i.e. the parser may fail to parse the query or return unexpected results). This method removes all trailing "any" chars (i.e. all trailing question marks).
aQueryString | Returned |
---|---|
"when is halloween" | "when is halloween" |
"when is halloween?" | "when is halloween" |
"when is halloween??" | "when is halloween" |
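The table can be reproduced with the minimal sketch below. It is a hypothetical re-implementation written only to make the documented behaviour concrete, not Sitevision's code; in real code you would call stripTrailingAnyChars on the instance returned by SearchFactory.getQueryStringUtil(). The sketch removes trailing '?' chars only and does not trim whitespace, and its null handling is an assumption since the documentation does not cover null.

```java
// Hypothetical re-implementation of the documented stripTrailingAnyChars behaviour.
final class StripTrailingAnyCharsSketch {

    // Removes every trailing '?' (the "any" char); everything else is left untouched.
    static String stripTrailingAnyChars(String aQueryString) {
        if (aQueryString == null) {
            return null; // assumption: null handling is not documented
        }
        int end = aQueryString.length();
        while (end > 0 && aQueryString.charAt(end - 1) == '?') {
            end--;
        }
        return aQueryString.substring(0, end);
    }

    public static void main(String[] args) {
        for (String q : new String[] {"when is halloween", "when is halloween?", "when is halloween??"}) {
            System.out.println("\"" + q + "\" -> \"" + stripTrailingAnyChars(q) + "\"");
        }
    }
}
```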
aQueryString - the query string
String stripLocalParams(String aQueryString)
Local params are a query string prefix that starts with "{!" and ends with "}".
Local params can override, sidestep or otherwise affect the desired search behaviour. This method strips Local params to prevent that.
Leading whitespace before the Local params is also stripped.
aQueryString | Returned |
---|---|
null | null |
"" | "" |
"hello query" | "hello query" |
"{!}" | "" |
"{!}hey" | "hey" |
"{!whatever}foo" | "foo" |
"{!whatever} bar" | " bar" |
" {! whatever }baz" | "baz" |
"{!whatever" | "{!whatever" |
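The rules in the table (a "{!" prefix, optionally preceded by whitespace, up to the first "}") can be expressed as the short sketch below. It is a hypothetical re-implementation derived only from the table above, not Sitevision's code; use stripLocalParams on the QueryStringUtil instance in real code.

```java
// Hypothetical re-implementation of the documented stripLocalParams behaviour.
final class StripLocalParamsSketch {

    static String stripLocalParams(String aQueryString) {
        if (aQueryString == null) {
            return null;
        }
        // Skip leading whitespace; it is stripped together with the Local params.
        int start = 0;
        while (start < aQueryString.length() && Character.isWhitespace(aQueryString.charAt(start))) {
            start++;
        }
        // Local params must be a prefix starting with "{!"; otherwise return the input unchanged.
        if (!aQueryString.startsWith("{!", start)) {
            return aQueryString;
        }
        int end = aQueryString.indexOf('}', start + 2);
        if (end < 0) {
            return aQueryString; // no closing brace: not Local params, keep as-is
        }
        // Whatever follows "}" is returned without trimming, e.g. "{!x} bar" -> " bar".
        return aQueryString.substring(end + 1);
    }

    public static void main(String[] args) {
        System.out.println("[" + stripLocalParams(" {! whatever }baz") + "]"); // [baz]
        System.out.println("[" + stripLocalParams("{!whatever} bar") + "]");   // [ bar]
    }
}
```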
aQueryString - the query string
String removeQuerySyntaxChars(String aQueryString)
Current query syntax characters are:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
Note! This is a legacy shortcut for (strict/non-lenient) removeQuerySyntaxChars(aQueryString, false).
aQueryString - a non-null query expression
See also removeQuerySyntaxChars(String, boolean).
String removeQuerySyntaxChars(String aQueryString, boolean aLenientRemove)
Current query syntax characters are:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
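Processing:
- Syntax characters are removed, e.g. "ma?nus" -> "manus" and "This is *so* funny!" -> "This is so funny".
- The two-character operators && and || are reduced to a single character, e.g. "ma&&nus" -> "ma&nus" and "ma||nus" -> "ma|nus".
- Dashes are handled differently when aLenientRemove is true. Lenient behaviour will try to keep all dashes that can be interpreted as "word separators" ("bindestreck" in Swedish).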
aQueryString | aLenientRemove | Returned |
---|---|---|
"(Site?vision: *Enterprise) !?" | true / false | "Sitevision Enterprise" |
"Anna-Karin?" | true | "Anna-Karin" |
"Anna-Karin?" | false | "Anna Karin" |
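The sketch below reproduces the examples above. It is a hypothetical re-implementation, not Sitevision's code. Two of its rules are assumptions inferred from the examples rather than documented behaviour: a dash becomes a space in strict mode (as in "Anna Karin"), and in lenient mode a dash counts as a "word separator" when it sits between two letters or digits. For real queries, call removeQuerySyntaxChars on the QueryStringUtil instance.

```java
// Hypothetical re-implementation of removeQuerySyntaxChars, inferred from the documented examples.
final class RemoveQuerySyntaxCharsSketch {

    // Single-char syntax characters from the documented list; '-', '&' and '|' are handled separately.
    private static final String SINGLE_SYNTAX_CHARS = "+!(){}[]^\"~*?:\\";

    // Legacy one-argument form: strict (non-lenient) removal.
    static String removeQuerySyntaxChars(String aQueryString) {
        return removeQuerySyntaxChars(aQueryString, false);
    }

    // aQueryString must be non-null, as documented.
    static String removeQuerySyntaxChars(String aQueryString, boolean aLenientRemove) {
        StringBuilder out = new StringBuilder(aQueryString.length());
        int i = 0;
        while (i < aQueryString.length()) {
            char c = aQueryString.charAt(i);
            char next = i + 1 < aQueryString.length() ? aQueryString.charAt(i + 1) : '\0';
            if ((c == '&' || c == '|') && next == c) {
                out.append(c); // "&&" -> "&", "||" -> "|"
                i += 2;
                continue;
            }
            if (c == '-') {
                // Assumed rule: lenient mode keeps a dash between two letters/digits,
                // otherwise the dash becomes a space (as in the strict "Anna Karin" example).
                char prev = i > 0 ? aQueryString.charAt(i - 1) : '\0';
                boolean wordSeparator = Character.isLetterOrDigit(prev) && Character.isLetterOrDigit(next);
                out.append(aLenientRemove && wordSeparator ? '-' : ' ');
            } else if (SINGLE_SYNTAX_CHARS.indexOf(c) < 0) {
                out.append(c); // not a syntax char: keep it
            }
            i++;
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(removeQuerySyntaxChars("Anna-Karin?", true));  // Anna-Karin
        System.out.println(removeQuerySyntaxChars("Anna-Karin?", false)); // Anna Karin
    }
}
```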
aQueryString - a non-null query expression
aLenientRemove - whether or not to handle syntax chars in a lenient manner
String smartWildcard(String aQueryString)
The general purpose/advantage of a raw wildcard query (i.e. a prefix query) is that it also returns hits for a partial word. This is typically a good thing for all "live-search/type-ahead" solutions. The downside is that the search result of such a query can be a real mess, since all wildcard hits are scored exactly the same ("constant scoring"). In practice, this means that the hits of such a search result can show up in random order.
This method returns a "smart" wildcard query that combines the prefix-matching advantage of a raw wildcard query with potential scoring capabilities. This is achieved by expanding the word into multiple terms, adding the wildcard to one of them and combining them with an implicit OR. In other words: "build a query that matches the exact word or the wildcarded word".
The query string "Car", transformed into the smart wildcard query "+(Car car*)", could conceptually result in a search result where hits for the exact word "Car" are scored above hits that only match the prefix.
The word that is wildcarded is also lowercased for better matching (the query parser is typically set up to query analyzed fields, i.e. fields that are typically lowercased).
A word with a dash is potentially duplicated further for increased matching (the dash is a query syntax char, but it is handled leniently here).
A word that ends with a syntax character will typically not be wildcarded at all. A word that contains a syntax character will typically get a raw wildcard as-is.
aQueryString | Returned |
---|---|
null | null |
" " | null |
"Car" | "+(Car car*)" |
"Car*" | "+(Car car*)" |
"Car?" | "Car?" |
"title:Car" | "title:Car*" |
"Anna-Carin" | "+(Anna-Carin AnnaCarin (+Anna +carin*) anna-carin* annacarin*)" |
"019-173030" | "+(019-173030 019173030 (+019 +173030*) 019-173030* 019173030*)" |
The downside of a smart wildcard query is that the actual query is more complex. This increased complexity will typically distort the pattern matching of the Solr Elevation component, i.e. "elevated/sponsored" hits will typically never work for smart wildcard queries.
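For single-word input, the rows of the table can be reproduced by the sketch below. It is a hypothetical re-implementation, not Sitevision's algorithm, and several rules are assumptions inferred from the table: a trailing "*" is dropped before processing, and the dash expansion only covers a single dash between two alphanumeric parts (the "Anna-Carin" and "019-173030" rows). Multi-word input is not documented, so the sketch rejects it; use smartWildcard on the QueryStringUtil instance for real queries.

```java
import java.util.Locale;

// Hypothetical single-word re-implementation of smartWildcard, inferred from the documented table.
final class SmartWildcardSketch {

    // Query syntax characters as listed for removeQuerySyntaxChars.
    private static final String SYNTAX_CHARS = "+-&|!(){}[]^\"~*?:\\";

    static String smartWildcard(String aQueryString) {
        if (aQueryString == null || aQueryString.trim().isEmpty()) {
            return null;
        }
        String word = aQueryString.trim();
        if (word.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("This sketch only handles a single word");
        }
        // Assumption: trailing '*' chars are dropped first, so "Car*" is treated like "Car".
        while (word.length() > 1 && word.endsWith("*")) {
            word = word.substring(0, word.length() - 1);
        }
        String lower = word.toLowerCase(Locale.ROOT);
        // Ends with a syntax char: not wildcarded at all ("Car?" -> "Car?").
        if (SYNTAX_CHARS.indexOf(word.charAt(word.length() - 1)) >= 0) {
            return word;
        }
        // Assumed dash rule: exactly one dash between two alphanumeric parts.
        String[] parts = word.split("-", -1);
        if (parts.length == 2 && isAlphanumeric(parts[0]) && isAlphanumeric(parts[1])) {
            String joined = parts[0] + parts[1];
            return "+(" + word + " " + joined
                    + " (+" + parts[0] + " +" + parts[1].toLowerCase(Locale.ROOT) + "*) "
                    + lower + "* " + joined.toLowerCase(Locale.ROOT) + "*)";
        }
        // Contains another syntax char: raw wildcard as-is ("title:Car" -> "title:Car*").
        if (word.chars().anyMatch(ch -> SYNTAX_CHARS.indexOf(ch) >= 0)) {
            return word + "*";
        }
        // Plain word: exact term OR lowercased prefix term, inside one required group.
        return "+(" + word + " " + lower + "*)";
    }

    private static boolean isAlphanumeric(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        System.out.println(smartWildcard("Car"));        // +(Car car*)
        System.out.println(smartWildcard("Anna-Carin")); // +(Anna-Carin AnnaCarin (+Anna +carin*) anna-carin* annacarin*)
    }
}
```

The single required group "+( ... )" means a document must match at least one of the alternatives, while documents matching the exact term as well as the prefix can score higher than prefix-only hits.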
aQueryString - the query string
String splitToQueryParts(String aStringToSplit, String aSplitExpression)
This is a convenience method for when you want to query something based on items in a string that are delimited by some token. A typical example is a "keyword" metadata that contains multiple keywords delimited by a comma char.
This method splits aStringToSplit with aSplitExpression. Each part is then trimmed and appended to the resulting string, separated with a space. Parts that contain a space char are quoted.
aStringToSplit | aSplitExpression | Returned |
---|---|---|
"one" | "," | one |
"one,two" | "," | one two |
"one, two" | "," | one two |
"one, two, three four" | "," | one two "three four" |
"one" | "aNonMatchingExpression" | one |
"one,two" | "aNonMatchingExpression" | one,two |
"one, two" | "aNonMatchingExpression" | "one, two" |
"one, two, three four" | "aNonMatchingExpression" | "one, two, three four" |
null | "," | null |
null | null | null |
"one" | null | one |
"one,two" | null | one,two |
"one, two" | null | one, two |
"one, two, three four" | null | one, two, three four |
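The documented splitting rules map closely onto String.split with a regular expression, as the sketch below shows. It is a hypothetical re-implementation, not Sitevision's code, and skipping empty parts (for input such as "a,,b") is an assumption because the table does not cover that case. The field name "keywords" in the usage line is likewise hypothetical.

```java
import java.util.StringJoiner;

// Hypothetical re-implementation of the documented splitToQueryParts behaviour.
final class SplitToQueryPartsSketch {

    static String splitToQueryParts(String aStringToSplit, String aSplitExpression) {
        if (aStringToSplit == null) {
            return null;
        }
        if (aSplitExpression == null) {
            return aStringToSplit; // returned untouched, not even trimmed
        }
        StringJoiner result = new StringJoiner(" ");
        for (String part : aStringToSplit.split(aSplitExpression)) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: empty parts such as in "a,,b" are skipped
            }
            // Quote parts with a space so they stay one phrase in the query.
            result.add(trimmed.contains(" ") ? "\"" + trimmed + "\"" : trimmed);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Hypothetical field name "keywords", used only for illustration.
        String parts = splitToQueryParts("news, press release", ",");
        System.out.println("keywords:(" + parts + ")"); // keywords:(news "press release")
    }
}
```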
aStringToSplit - the string that should be transformed
aSplitExpression - the regular expression used to split aStringToSplit into parts
If aStringToSplit is null, null will always be returned.
If aSplitExpression is null, aStringToSplit will always be returned.
If aSplitExpression is a non-matching expression, a trimmed aStringToSplit will always be returned, and it will be quoted if aStringToSplit contains a space char.
String splitCollectionToQueryParts(Collection<String> aStringsToSplit, String aSplitExpression)
This is a convenience method that executes splitToQueryParts(String, String) for a collection of strings and appends each returned value to a combined result, separated with a space. Whitespace-only or null values are ignored.
See splitToQueryParts(String, String) for how each string of the collection is transformed.
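A minimal sketch of the combining step, under stated assumptions: it is a hypothetical re-implementation rather than Sitevision's code, it carries a compact copy of the splitToQueryParts sketch so it compiles on its own, and returning null when every value is null or whitespace only is an assumption (the documentation only covers a null or empty collection).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.StringJoiner;

// Hypothetical re-implementation of the documented splitCollectionToQueryParts behaviour.
final class SplitCollectionSketch {

    static String splitCollectionToQueryParts(Collection<String> aStringsToSplit, String aSplitExpression) {
        if (aStringsToSplit == null || aStringsToSplit.isEmpty()) {
            return null;
        }
        StringJoiner combined = new StringJoiner(" ");
        for (String value : aStringsToSplit) {
            if (value == null || value.trim().isEmpty()) {
                continue; // null and whitespace-only values are ignored
            }
            String part = splitToQueryParts(value, aSplitExpression);
            if (part != null && !part.isEmpty()) {
                combined.add(part);
            }
        }
        // Assumption: null when nothing usable was found (undocumented case).
        return combined.length() == 0 ? null : combined.toString();
    }

    // Compact copy of the splitToQueryParts sketch so this block compiles on its own.
    static String splitToQueryParts(String s, String regex) {
        if (s == null) return null;
        if (regex == null) return s;
        StringJoiner joined = new StringJoiner(" ");
        for (String part : s.split(regex)) {
            String t = part.trim();
            if (!t.isEmpty()) joined.add(t.contains(" ") ? "\"" + t + "\"" : t);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Collection<String> keywords = Arrays.asList("sitevision, portal", null, "  ", "content management");
        System.out.println(splitCollectionToQueryParts(keywords, ",")); // sitevision portal "content management"
    }
}
```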
aStringsToSplit - a collection of strings
aSplitExpression - the regular expression used to split the strings in the aStringsToSplit collection into parts
Returns the combined result of the splitToQueryParts(String, String) operation for all strings in aStringsToSplit. If aStringsToSplit is null or empty, null will always be returned.
See also splitToQueryParts(String, String).
String getFieldQuery(String aFieldName, String aValueExpression)
This method trims aValueExpression and analyzes its space-separated parts, quoted and unquoted. The result is a grouped field query if there are multiple parts in aValueExpression, and a non-grouped field query if there is only one part in aValueExpression.
Note that this is a convenience method only. Neither the field nor the value is syntactically checked in any way. The caller of this method is responsible for passing values that the query parser used later on will accept.
aFieldName | aValueExpression | Returned |
---|---|---|
content.analyzed | sitevision | content.analyzed:sitevision |
+content.analyzed | sitevision* | +content.analyzed:sitevision* |
-content.analyzed | enterprise | -content.analyzed:enterprise |
content.analyzed | "sitevision enterprise" | content.analyzed:"sitevision enterprise" |
content.analyzed | sitevision enterprise | content.analyzed:(sitevision enterprise) |
content.analyzed | portal "sitevision enterprise" | content.analyzed:(portal "sitevision enterprise") |
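The grouping decision in the table comes down to counting parts, where a quoted phrase counts as one part. The sketch below is a hypothetical re-implementation written from the table, not Sitevision's code. Like the documented method, it does no syntax checking of the field or the value, and it leaves the field name untrimmed, which is an assumption.

```java
// Hypothetical re-implementation of the documented getFieldQuery behaviour.
final class FieldQuerySketch {

    static String getFieldQuery(String aFieldName, String aValueExpression) {
        if (aFieldName == null || aFieldName.trim().isEmpty()
                || aValueExpression == null || aValueExpression.trim().isEmpty()) {
            return null;
        }
        String value = aValueExpression.trim();
        // Several parts (a quoted phrase counts as one part) are grouped in parentheses.
        return countParts(value) > 1
                ? aFieldName + ":(" + value + ")"
                : aFieldName + ":" + value;
    }

    // Counts whitespace-separated parts, treating a "quoted phrase" as a single part.
    private static int countParts(String value) {
        int parts = 0;
        boolean inQuotes = false;
        boolean inPart = false;
        for (char c : value.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (Character.isWhitespace(c) && !inQuotes) {
                inPart = false; // unquoted whitespace ends the current part
            } else if (!inPart) {
                inPart = true;  // first char of a new part
                parts++;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(getFieldQuery("content.analyzed", "portal \"sitevision enterprise\""));
        // content.analyzed:(portal "sitevision enterprise")
    }
}
```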
aFieldName - the field expression
aValueExpression - the value expression
null will be returned if aFieldName or aValueExpression is null or whitespace only.
String getDateAsString(Date aDate)
All dates in Solr (Lucene) are stored in UTC (Zulu time, 'Z'). When a date is converted to a string that should be sent to Solr (for example as part of a query), the timezone must be taken into consideration, since the query parser will not perform any adjustments.
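To illustrate the timezone point with plain JDK classes: a java.util.Date represents an instant, and DateTimeFormatter.ISO_INSTANT always renders that instant in UTC with a trailing 'Z', which is the form Solr uses for date values. This is a sketch of the concept, not Sitevision's implementation; whether getDateAsString includes fractional seconds is not documented, and the field name "lastpublished" is hypothetical.

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Sketch: formatting a Date as a UTC ISO-8601 string suitable for a Solr query.
final class SolrDateSketch {

    static String toSolrDate(Date aDate) {
        // ISO_INSTANT formats the instant in UTC with a 'Z' suffix, so no
        // manual timezone adjustment is needed; fractional seconds appear only if non-zero.
        return DateTimeFormatter.ISO_INSTANT.format(aDate.toInstant());
    }

    public static void main(String[] args) {
        Date date = new Date(0L);
        // Hypothetical field name "lastpublished", used only for illustration.
        System.out.println("lastpublished:[" + toSolrDate(date) + " TO NOW]");
        // lastpublished:[1970-01-01T00:00:00Z TO NOW]
    }
}
```

A common mistake this avoids is formatting the date with a local-time pattern (for example a SimpleDateFormat without an explicit UTC timezone) and appending "Z", which silently shifts the date by the server's UTC offset.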
aDate - the date