Payloads are a good way of controlling the scores in SOLR/Lucene.
This post by Grant Ingersoll gives a good introduction to payloads; I also found http://www.ultramagnus.org/?p=1 pretty useful.
What I will describe here is how to use payloads while keeping the functionality of the DisMaxQParser in SOLR.
SOLR already has a field type for analysing payloads:
<fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField" >
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!--
    The DelimitedPayloadTokenFilter can put payloads on tokens... for example,
    a token of "foo|1.4" would be indexed as "foo" with a payload of 1.4f
    Attributes of the DelimitedPayloadTokenFilterFactory :
     "delimiter" - a one character delimiter. Default is | (pipe)
     "encoder" - how to encode the following value into a payload
        float -> org.apache.lucene.analysis.payloads.FloatEncoder,
        integer -> o.a.l.a.p.IntegerEncoder
        identity -> o.a.l.a.p.IdentityEncoder
            Fully Qualified class name implementing PayloadEncoder, Encoder must have a no arg constructor.
     -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
  </analyzer>
</fieldtype>
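To make the "foo|1.4" example in the comment above more concrete, here is a rough, plain-Java sketch of the idea: split the field value on whitespace, cut each token at the delimiter, and store the weight as four bytes. This is not Lucene's code. The class and method names are made up for illustration, and cutting at the first delimiter and using big-endian bytes are assumptions of the sketch.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Conceptual sketch only: hypothetical names, not the Lucene/SOLR implementation. */
class DelimitedPayloadSketch {

    /** A token with an optional payload (null when the token carries none). */
    static final class Token {
        final String text;
        final byte[] payload;

        Token(String text, byte[] payload) {
            this.text = text;
            this.payload = payload;
        }
    }

    /** Encodes a float as 4 bytes, big-endian (ByteBuffer's default order). */
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    /** Decodes the 4 bytes starting at offset back into a float. */
    static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    /** Splits on whitespace, then cuts each token at the first delimiter (assumption). */
    static List<Token> analyze(String fieldValue, char delimiter) {
        List<Token> tokens = new ArrayList<Token>();
        for (String raw : fieldValue.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input yields a single empty string
            }
            int pos = raw.indexOf(delimiter);
            if (pos < 0) {
                tokens.add(new Token(raw, null)); // no payload on this token
            } else {
                float weight = Float.parseFloat(raw.substring(pos + 1));
                tokens.add(new Token(raw.substring(0, pos), encodeFloat(weight)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : analyze("foo|1.4 bar|0.5 baz", '|')) {
            String payload = t.payload == null
                    ? "none" : String.valueOf(decodeFloat(t.payload, 0));
            System.out.println(t.text + " -> " + payload);
        }
    }
}
```

In other words, a document whose payload field contains "foo|1.4 bar|0.5" is searchable on "foo" and "bar", and each term carries its own weight that the scoring code can read back later.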
and we can also define a custom Similarity to use with the payloads:
package com.digitalpebble.solr;

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.search.DefaultSimilarity;

public class PayloadSimilarity extends DefaultSimilarity
{
    @Override public float scorePayload(int docId, String fieldName, int start, int end, byte[] payload, int offset, int length)
    {
        if (length > 0) {
            return PayloadHelper.decodeFloat(payload, offset);
        }
        return 1.0f;
    }
}
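The parser later in this post combines the per-occurrence payload scores with a MaxPayloadFunction. As a simplified model of what that means (my own plain-Java approximation, not Lucene's implementation), each occurrence of a term that carries a payload is decoded to a float and the largest value wins; a document where no payload was seen falls back to 1.0. The class name, the helper names and the `length >= 4` guard are assumptions of this sketch.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Simplified model of max-based payload scoring; hypothetical names throughout. */
class MaxPayloadSketch {

    /** Test helper: encodes a float as 4 big-endian bytes. */
    static byte[] encode(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    /** Same idea as scorePayload above: decode a float if one is there, else 1.0f. */
    static float scorePayload(byte[] payload, int offset, int length) {
        if (length >= 4) { // assumption: a float payload is exactly 4 bytes
            return ByteBuffer.wrap(payload, offset, 4).getFloat();
        }
        return 1.0f;
    }

    /** Keeps the maximum score over all occurrences that carry a payload. */
    static float maxPayloadScore(List<byte[]> occurrences) {
        int seen = 0;
        float best = 0f;
        for (byte[] payload : occurrences) {
            if (payload == null) {
                continue; // occurrence without payload is ignored
            }
            float score = scorePayload(payload, 0, payload.length);
            best = (seen == 0) ? score : Math.max(best, score);
            seen++;
        }
        return seen > 0 ? best : 1.0f;
    }

    public static void main(String[] args) {
        List<byte[]> occurrences = new ArrayList<byte[]>();
        occurrences.add(encode(0.5f));
        occurrences.add(null);
        occurrences.add(encode(1.4f));
        System.out.println(maxPayloadScore(occurrences));
    }
}
```

With a max function, a term that appears several times in a field counts with its strongest weight rather than an average, which is usually what you want when the payload expresses a confidence or importance value.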
then specify this in the SOLR schema
<!-- schema.xml -->
<similarity class="com.digitalpebble.solr.PayloadSimilarity" />
So far so good. We now need a QueryParser plugin in order to use the payloads in the search and, as mentioned above, I want to keep the functionality of the DisMaxQParser.
The problem is that we need PayloadTermQuery objects instead of TermQuery objects. These are created deep down in the object hierarchy and cannot, AFAIK, be changed simply from DisMaxQParser. I have implemented a modified version of DisMaxQParser which rewrites the main part of the query (a.k.a. userQuery in the implementation) and substitutes the TermQueries with PayloadTermQueries.
First we'll create a QParserPlugin:
package com.digitalpebble.solr;

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class PLDisMaxQParserPlugin extends QParserPlugin {

    public void init(NamedList args) {
    }

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
            SolrParams params, SolrQueryRequest req) {
        return new PLDisMaxQParser(qstr, localParams, params, req);
    }
}
The plugin does not do much; it simply exposes PLDisMaxQParser, a modified version of the standard DisMaxQParser that builds payload query objects (PayloadTermQuery and PayloadNearQuery):
package com.digitalpebble.solr;

import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.payloads.MaxPayloadFunction;
import org.apache.lucene.search.payloads.PayloadFunction;
import org.apache.lucene.search.payloads.PayloadNearQuery;
import org.apache.lucene.search.payloads.PayloadTermQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.solr.common.params.DisMaxParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.DisMaxQParser;
import org.apache.solr.util.SolrPluginUtils;

/**
 * Modified query parser for dismax queries which uses payloads
 */
public class PLDisMaxQParser extends DisMaxQParser {

    public static final String PAYLOAD_FIELDS_PARAM_NAME = "plf";

    public PLDisMaxQParser(String qstr, SolrParams localParams,
            SolrParams params, SolrQueryRequest req) {
        super(qstr, localParams, params, req);
    }

    protected HashSet<String> payloadFields = new HashSet<String>();

    private static final PayloadFunction func = new MaxPayloadFunction();

    float tiebreaker = 0f;

    protected void addMainQuery(BooleanQuery query, SolrParams solrParams)
            throws ParseException {
        Map<String, Float> phraseFields = SolrPluginUtils
                .parseFieldBoosts(solrParams.getParams(DisMaxParams.PF));
        tiebreaker = solrParams.getFloat(DisMaxParams.TIE, 0.0f);

        // get the comma separated list of fields used for payload
        String[] plfarray = solrParams.get(PAYLOAD_FIELDS_PARAM_NAME, "")
                .split(",");
        for (String plf : plfarray)
            payloadFields.add(plf.trim());

        /*
         * a parser for dealing with user input, which will convert things to
         * DisjunctionMaxQueries
         */
        SolrPluginUtils.DisjunctionMaxQueryParser up = getParser(queryFields,
                DisMaxParams.QS, solrParams, tiebreaker);

        /* for parsing sloppy phrases using DisjunctionMaxQueries */
        SolrPluginUtils.DisjunctionMaxQueryParser pp = getParser(phraseFields,
                DisMaxParams.PS, solrParams, tiebreaker);

        /* * * Main User Query * * */
        parsedUserQuery = null;
        String userQuery = getString();
        altUserQuery = null;
        if (userQuery == null || userQuery.trim().length() < 1) {
            // If no query is specified, we may have an alternate
            altUserQuery = getAlternateUserQuery(solrParams);
            query.add(altUserQuery, BooleanClause.Occur.MUST);
        } else {
            // There is a valid query string
            userQuery = SolrPluginUtils.partialEscape(
                    SolrPluginUtils.stripUnbalancedQuotes(userQuery))
                    .toString();
            userQuery = SolrPluginUtils.stripIllegalOperators(userQuery)
                    .toString();

            parsedUserQuery = getUserQuery(userQuery, up, solrParams);

            // recursively rewrite the elements of the query
            Query payloadedUserQuery = rewriteQueriesAsPLQueries(parsedUserQuery);
            query.add(payloadedUserQuery, BooleanClause.Occur.MUST);

            Query phrase = getPhraseQuery(userQuery, pp);
            if (null != phrase) {
                query.add(phrase, BooleanClause.Occur.SHOULD);
            }
        }
    }

    /** Substitutes original query objects with payload ones **/
    private Query rewriteQueriesAsPLQueries(Query input) {
        Query output = input;
        // rewrite TermQueries
        if (input instanceof TermQuery) {
            Term term = ((TermQuery) input).getTerm();
            // check that this is done on a field that has payloads
            if (payloadFields.contains(term.field()) == false)
                return input;
            output = new PayloadTermQuery(term, func);
        }
        // rewrite PhraseQueries
        else if (input instanceof PhraseQuery) {
            PhraseQuery pin = (PhraseQuery) input;
            Term[] terms = pin.getTerms();
            int slop = pin.getSlop();
            boolean inorder = false;
            // check that this is done on a field that has payloads
            if (terms.length > 0
                    && payloadFields.contains(terms[0].field()) == false)
                return input;
            SpanQuery[] clauses = new SpanQuery[terms.length];
            // phrase queries : keep the default function i.e. average
            for (int i = 0; i < terms.length; i++)
                clauses[i] = new PayloadTermQuery(terms[i], func);
            output = new PayloadNearQuery(clauses, slop, inorder);
        }
        // recursively rewrite DJMQs
        else if (input instanceof DisjunctionMaxQuery) {
            DisjunctionMaxQuery s = ((DisjunctionMaxQuery) input);
            DisjunctionMaxQuery t = new DisjunctionMaxQuery(tiebreaker);
            Iterator<Query> disjunctsiterator = s.iterator();
            while (disjunctsiterator.hasNext()) {
                Query rewrittenQuery = rewriteQueriesAsPLQueries(disjunctsiterator
                        .next());
                t.add(rewrittenQuery);
            }
            output = t;
        }
        // recursively rewrite BooleanQueries
        else if (input instanceof BooleanQuery) {
            for (BooleanClause clause : (List<BooleanClause>) ((BooleanQuery) input)
                    .clauses()) {
                Query rewrittenQuery = rewriteQueriesAsPLQueries(clause
                        .getQuery());
                clause.setQuery(rewrittenQuery);
            }
        }
        output.setBoost(input.getBoost());
        return output;
    }

    public void addDebugInfo(NamedList<Object> debugInfo) {
        super.addDebugInfo(debugInfo);
        if (this.payloadFields.size() > 0) {
            Iterator<String> iter = this.payloadFields.iterator();
            while (iter.hasNext())
                debugInfo.add("payloadField", iter.next());
        }
    }
}
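The heart of the parser is the recursive substitution in rewriteQueriesAsPLQueries. To see the pattern in isolation, here is a small plain-Java sketch on a toy query tree. The Node, Term, PayloadTerm and Group types are hypothetical stand-ins for Lucene's Query classes and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy query tree; the types are hypothetical stand-ins, not Lucene classes. */
class RewriteSketch {

    interface Node {
    }

    /** Stand-in for a TermQuery. */
    static final class Term implements Node {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /** Stand-in for a PayloadTermQuery wrapping the same term. */
    static final class PayloadTerm implements Node {
        final Term term;

        PayloadTerm(Term term) {
            this.term = term;
        }

        @Override
        public String toString() {
            return "payload(" + term + ")";
        }
    }

    /** Stand-in for a composite query (boolean or disjunction max). */
    static final class Group implements Node {
        final List<Node> children;

        Group(List<Node> children) {
            this.children = children;
        }

        @Override
        public String toString() {
            return children.toString();
        }
    }

    /** Replaces terms on payload fields, recursing into composite nodes. */
    static Node rewrite(Node input, Set<String> payloadFields) {
        if (input instanceof Term) {
            Term t = (Term) input;
            return payloadFields.contains(t.field) ? new PayloadTerm(t) : t;
        }
        if (input instanceof Group) {
            List<Node> rewritten = new ArrayList<Node>();
            for (Node child : ((Group) input).children) {
                rewritten.add(rewrite(child, payloadFields));
            }
            return new Group(rewritten);
        }
        return input; // unknown node types are left untouched
    }

    public static void main(String[] args) {
        Node query = new Group(Arrays.<Node>asList(
                new Term("title", "solr"),
                new Group(Arrays.<Node>asList(new Term("payloads", "solr")))));
        Set<String> payloadFields = new HashSet<String>(Arrays.asList("payloads"));
        System.out.println(rewrite(query, payloadFields));
    }
}
```

One difference worth noting: the sketch builds a new tree for composite nodes, whereas the BooleanQuery branch in the parser above replaces the clauses of the existing query in place. Either works here because the parsed user query is not reused elsewhere.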
Once these 3 classes have been compiled, jarred and put in the classpath of SOLR, we must add
<queryParser name="payload" class="com.digitalpebble.solr.PLDisMaxQParserPlugin" />
to solrconfig.xml.
Then specify the following for the requestHandler:
<str name="defType">payload</str>
<!-- plf : comma separated list of field names -->
<str name="plf">payloads</str>
The fields listed in the parameter plf will be queried with Payload query objects. Remember that you can use &debugQuery=true to get the details of the scores and check that the payloads are being used.
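For reference, this is roughly how the plf value is turned into a set of field names: split on commas and trim each entry, as the parser does. The sketch below is standalone Java with hypothetical names, and it additionally skips empty entries and null input, which the parser in this post does not do.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Standalone sketch of plf parsing; class and method names are hypothetical. */
class PayloadFieldsParamSketch {

    /** Splits a comma separated list into trimmed, non-empty field names. */
    static Set<String> parse(String plf) {
        Set<String> fields = new LinkedHashSet<String>();
        if (plf == null) {
            return fields; // parameter not supplied
        }
        for (String name : plf.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) { // addition of this sketch: drop blanks
                fields.add(trimmed);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse(" payloads "));
        System.out.println(parse("payloads, title_payloads"));
    }
}
```

Because the names are trimmed, spaces around the field names in the plf value are harmless. A field that is not listed in plf keeps its ordinary TermQuery and is scored as usual.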
Awesome post. This was exactly what I needed, well done.
Great post. It works well for DisMax. Would it be possible to use the same technique for Extended DisMax (edismax)?
Thanks. Am pretty sure the same could be done with edismax (note : I haven't looked at the edismax code). Please post a comment if you manage to get it to work
Hi great post, but i can't get the whole thing working.
I compiled the classes in a jar and added to my classpath (lib dir)
i've a field with payloads and debugging the query i get:
PLDisMaxQParser
so the modified parser is used however the payloads are not used
checking the debugQuery...
I've not specified a request handler but i'm running the query using
select/?q=&plf=&defType=payload&qf=
Am i doing something wrong?
Sorry, the request is not the one above but the following
select/?q=something&plf=fieldwithpayload&defType=payload&qf=fields
Furthermore i forgot to say that my payload
value is a multivalued field.
May this be the problem?
The last bit "then specify for the requestHandler :" does it mean I have to create a new one or just modify the dismax requestHandler to payload
It's been a long time since I wrote this and I can't remember the details. Maybe try modifying the existing one to see if it works? Sorry not to be more helpful