java - Lucene search of two or more words not working on Android -
I am using Lucene 3.6.2 on Android. The code used and the observations made are below.
Indexing code:
public void indexBookContent(Book book, File externalFilesDir) throws Exception {
    IndexWriter indexWriter = null;
    NIOFSDirectory directory = null;
    directory = new NIOFSDirectory(new File(externalFilesDir.getPath() + "/indexfile", book.getBookId()));
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(LUCENE_36, new StandardAnalyzer(LUCENE_36));
    indexWriter = new IndexWriter(directory, indexWriterConfig);
    Document document = createFieldsForContent();
    String pageContent = Html.fromHtml(decryptedPage).toString();
    ((Field) document.getFieldable("content")).setValue(pageContent);
    ((Field) document.getFieldable("content")).setValue(pageContent);
    ((Field) document.getFieldable("content")).setValue(pageContent.toLowerCase());
}

private Document createFieldsForContent() {
    Document document = new Document();
    Field contentFieldLower = new Field("content", "", YES, NOT_ANALYZED);
    document.add(contentFieldLower);
    Field contentField = new Field("content", "", YES, ANALYZED);
    document.add(contentField);
    Field contentFieldNotAnalysed = new Field("content", "", YES, NOT_ANALYZED);
    document.add(contentFieldNotAnalysed);
    Field recordIdField = new Field("recordid", "", YES, ANALYZED);
    document.add(recordIdField);
    return document;
}

public JSONArray searchBook(String bookId, String searchText, File externalFieldsDir, String filter) throws Exception {
    List<SearchResultData> searchResults = null;
    NIOFSDirectory directory = null;
    IndexReader indexReader = null;
    IndexSearcher indexSearcher = null;
    directory = new NIOFSDirectory(new File(externalFieldsDir.getPath() + "/indexfile", bookId));
    indexReader = IndexReader.open(directory);
    indexSearcher = new IndexSearcher(indexReader);
    Query finalQuery = constructSearchQuery(searchText, filter);
    TopScoreDocCollector collector = TopScoreDocCollector.create(100, false);
    indexSearcher.search(finalQuery, collector);
    ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
}

private Query constructSearchQuery(String searchText, String filter) throws ParseException {
    QueryParser contentQueryParser = new QueryParser(LUCENE_36, "content", new StandardAnalyzer(LUCENE_36));
    contentQueryParser.setAllowLeadingWildcard(true);
    contentQueryParser.setLowercaseExpandedTerms(false);
    String wildcardSearchText = "*" + QueryParser.escape(searchText) + "*";
    // query parser used.
    Query contentQuery = contentQueryParser.parse(wildcardSearchText);
    return contentQueryParser.parse(wildcardSearchText);
}
I have gone through this: "Lucene: Multi-word phrases as search terms", and my logic didn't seem to be any different.
My doubt is whether the fields are getting overwritten. Also, I need Chinese language support, which works with this code except for the problem of two-or-more-word support.
One note, up front:
Seeing your search implementation, it seems a bit strange. It looks like an overly complicated way to do a linear search through all the available strings. I don't know exactly what you need to accomplish, but I suspect you would be better served by working on appropriate analysis of your text, rather than doing a double wildcard on keyword-analyzed text, which will perform poorly and will not provide much flexibility in the search.
Moving on to more specific issues:
You are analyzing the same content in the same field multiple times with different analysis methods.
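To make the contrast concrete, here is a minimal plain-Java sketch of what token-based matching looks like once the text has been analyzed. It does not use Lucene at all: `analyze` and `containsPhrase` are hypothetical helpers, and the tokenizer (lowercase, split on anything that is not a letter or digit) is only an assumed stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TokenPhraseSketch {

    // Toy analysis step (an assumption, not Lucene's StandardAnalyzer):
    // lowercase the text and split on runs of non-letter, non-digit characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // True if the analyzed phrase occurs as consecutive tokens in the analyzed content.
    static boolean containsPhrase(String content, String phrase) {
        List<String> contentTokens = analyze(content);
        List<String> phraseTokens = analyze(phrase);
        if (phraseTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(contentTokens, phraseTokens) >= 0;
    }

    public static void main(String[] args) {
        String page = "The Quick, brown fox";
        System.out.println(containsPhrase(page, "quick brown"));
        System.out.println(containsPhrase(page, "brown quick"));
    }
}
```

Because matching happens on tokens, case and punctuation stop mattering and a multi-word search needs no wildcards. A real index would do this lookup against stored term positions rather than re-scanning the text.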
Field contentFieldLower = new Field("content", "", YES, NOT_ANALYZED);
document.add(contentFieldLower);
Field contentField = new Field("content", "", YES, ANALYZED);
document.add(contentField);
Field contentFieldNotAnalysed = new Field("content", "", YES, NOT_ANALYZED);
document.add(contentFieldNotAnalysed);
Instead, if you need all of these analysis methods available for searching, you should index them in distinct fields. Searching them together doesn't make sense, so they shouldn't be in the same field.
Then you have this sort of pattern:
Field contentField = new Field("content", "", YES, ANALYZED);
document.add(contentField);
// somewhat later
((Field) document.getFieldable("content")).setValue(pageContent);
Don't do this, it doesn't make sense. Just pass your content into the constructor, and add it to your document:
Field contentField = new Field("content", pageContent, YES, ANALYZED);
document.add(contentField);
Especially if you opt to continue analyzing in multiple ways in the same field, there is no way to get a particular one among the different field instances (getFieldable will always return the first one added).
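The effect of that lookup rule on the three setValue calls in the question can be sketched with a toy model. This is plain Java, not Lucene: `ToyField` and `ToyDocument` are hypothetical stand-ins that only mimic the "first field with that name wins" behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstFieldLookup {

    // Hypothetical stand-in for a named, mutable field (not a Lucene class).
    static final class ToyField {
        final String name;
        String value;

        ToyField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Hypothetical stand-in for a document that allows repeated field names.
    static final class ToyDocument {
        private final List<ToyField> fields = new ArrayList<>();

        void add(ToyField field) {
            fields.add(field);
        }

        // Assumed lookup rule: return the first field added under this name.
        ToyField getFirst(String name) {
            for (ToyField field : fields) {
                if (field.name.equals(name)) {
                    return field;
                }
            }
            return null;
        }

        // All values stored under the name, in insertion order.
        List<String> values(String name) {
            List<String> result = new ArrayList<>();
            for (ToyField field : fields) {
                if (field.name.equals(name)) {
                    result.add(field.value);
                }
            }
            return result;
        }
    }

    // Three same-named fields, then three "set by name" calls, as in the question.
    static List<String> demo() {
        ToyDocument doc = new ToyDocument();
        doc.add(new ToyField("content", ""));
        doc.add(new ToyField("content", ""));
        doc.add(new ToyField("content", ""));
        doc.getFirst("content").value = "Page Text";
        doc.getFirst("content").value = "Page Text";
        doc.getFirst("content").value = "page text";
        return doc.values("content");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [page text, , ]
    }
}
```

Under that assumption only the first field ever receives a value, and it ends up holding whatever was set last. The other two stay empty, which is one more reason to give each variant its own field name.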
And this query:
String wildcardSearchText = "*" + QueryParser.escape(searchText) + "*";
As you mentioned, this won't work well with multiple terms. It runs afoul of the QueryParser syntax. What you end up with is something like:
*two terms*
which will be searched as:
field:*two field:terms*
which won't generate any matches against a keyword field (presumably). The QueryParser won't work well with this sort of query at all. You'll need to build the wildcard query yourself here:
WildcardQuery query = new WildcardQuery(new Term("field", "*two terms*"));
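Why the parsed form fails while the single wildcard term works can be shown with a small plain-Java model. It is a sketch under assumptions, not Lucene code: `splitClauses` only imitates the whitespace splitting described above (the real parser grammar does much more), and `wildcardMatches` is a hypothetical glob matcher where `*` matches any run of characters, `?` matches one character, and the pattern must cover the entire indexed term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardTermSketch {

    // Simplified model of the parser: whitespace separates clauses,
    // and each clause is applied to the default field.
    static List<String> splitClauses(String field, String queryText) {
        List<String> clauses = new ArrayList<>();
        for (String part : queryText.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                clauses.add(field + ":" + part);
            }
        }
        return clauses;
    }

    // Hypothetical glob matcher: the pattern has to match the whole term.
    static boolean wildcardMatches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    public static void main(String[] args) {
        // A not-analyzed (keyword) field holds the whole content as one term.
        String keywordTerm = "one two terms three";

        System.out.println(splitClauses("field", "*two terms*"));
        System.out.println(wildcardMatches("*two", keywordTerm));
        System.out.println(wildcardMatches("terms*", keywordTerm));
        System.out.println(wildcardMatches("*two terms*", keywordTerm));
    }
}
```

In this model neither split clause can match, because the keyword term neither ends with "two" nor starts with "terms". The unsplit pattern does match, which is what constructing the WildcardQuery directly achieves by keeping the parser away from the search text.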
java android search lucene