node.js - Search keywords in files stored in MongoDB
I have stored a .txt file in MongoDB using GridFS with node.js. Can I store .pdf and other formats as well? When I tried to store a .pdf and retrieve its content on the console, it displayed the text in the document along with some junk values. I used this line to retrieve it: "GridStore.read(db, id, function(err, fileData)". Is there a better way to do it?
Can I run a text search directly on the content of files stored in MongoDB? If so, how can I do that?
Also, can you please tell me in what format the data of the files is stored in MongoDB? Any help in this regard would be great. --Thanks
What you seem to want here is "text search" capability, which in MongoDB requires you to store the "text" in a field or fields within a document. Putting "text" into MongoDB is simple: you supply the "text" content for a field and MongoDB stores it. The same goes for data of any other type, which is simply stored under the field you specify.
So the general case here is that you seem to want "text search", and for that you must store the "text" of your data. But before implementing that, let's talk about what GridFS is and what it is not, and how it is probably not what you think it is.
GridFS
GridFS is not software or a special function of MongoDB. It is in fact a specification for functionality that the available drivers implement, with the sole intent of enabling you to store content that exceeds the 16MB BSON document size limit.
For this purpose, the implementation uses two collections. By default these are named fs.files and fs.chunks, but they can in fact be whatever you tell your driver implementation to use. These collections store what their default names indicate: one holds the unique identifier and metadata for the "file", and the other stores the chunks of the file's data.
Here is a quick snippet of what happens to the data you send via the GridFS API, shown as a document in the "chunks" collection:
    {
        "_id" : ObjectId("539fc66ac8b5e6dc058b4568"),
        "files_id" : ObjectId("539fc66ac8b5e6dc058b4567"),
        "n" : NumberLong(0),
        "data" : BinData(2,"agqaadw/cghwcgokzgj....")
    }
For context, that data belongs to a "text" file sent via the GridFS API functions. As you can see, despite the actual content being text, what is displayed here is an encoded (Base64) form of the raw binary data, not readable text.
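To make that encoded form less mysterious, here is a minimal node.js sketch of turning such a payload back into text. The helper name decodeBinData and the sample payload are made up for illustration. The handling of subtype 2 assumes the BSON specification's "old binary" layout, where the payload is preceded by a 4-byte little-endian length.

```javascript
// Hypothetical helper: decode the Base64 string shown inside BinData(subtype, "...").
function decodeBinData(subtype, base64) {
  const raw = Buffer.from(base64, "base64");
  // Assumption: subtype 2 ("old binary") carries an extra int32 length before the payload.
  const payload = subtype === 2 ? raw.subarray(4) : raw;
  return payload.toString("utf8");
}

// Build a made-up subtype 2 payload so the example is self-contained.
const text = "hello gridfs";
const prefix = Buffer.alloc(4);
prefix.writeInt32LE(Buffer.byteLength(text), 0);
const sample = Buffer.concat([prefix, Buffer.from(text, "utf8")]).toString("base64");

console.log(decodeBinData(2, sample)); // hello gridfs
```

This only works because the made-up payload really is text. For a PDF the decoded bytes are still a binary format, which is where the "junk values" on the console come from.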
This is in fact what the API functions do: they read the data you provide as a stream of bytes and submit that binary stream in manageable "chunks", so in all likelihood the parts of your "file" are not kept in the same document. That is the point of the implementation.
To MongoDB these are just ordinary collections, and you can treat them as such for all general operations such as find, delete and update. The GridFS API spec, as implemented by your driver, gives you functions to "read" from all of the chunks and return that data as if it were a file. But it is in fact just data in a collection, in a binary format, and split across documents. None of that is going to help you with performing a "search", as the data is neither "text" nor contained in a single document.
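The splitting can be sketched in plain node.js without touching a database. This is an illustration only, not the driver's actual code: toChunkDocuments is a hypothetical helper, and the 255 KiB default is an assumption based on the commonly documented GridFS chunk size (drivers let you override it).

```javascript
// Assumption: 255 KiB is the commonly documented default GridFS chunk size.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Hypothetical helper mimicking how a driver splits a file into "chunks" documents.
function toChunkDocuments(filesId, buffer, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n += 1) {
    chunks.push({
      files_id: filesId, // points back at the metadata document in fs.files
      n,                 // position of this chunk within the file
      data: buffer.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// A tiny chunk size makes the split visible: "mongodb" straddles two chunks.
const chunks = toChunkDocuments("file-1", Buffer.from("search mongodb files"), 10);
const matches = chunks.filter((c) => c.data.toString("utf8").includes("mongodb"));

console.log(chunks.map((c) => c.data.toString("utf8"))); // [ 'search mon', 'godb files' ]
console.log(matches.length); // 0
```

Even with plain text, a word that crosses a chunk boundary is not present in any single chunk document, so per-document matching cannot find it.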
Text Search
So what you seem to want here is "text search", which allows you to find the words you are searching for. If you want to store the "text" from a PDF file, for example, then you need to extract that text externally and store it in documents. Or otherwise use an external text search system, which will do much the same.
For the MongoDB implementation, the extracted text is stored in a document, or possibly several documents, and a "text index" is then created to enable the search functionality. You create one on a collection like this:
db.collection.createIndex({ "content": "text" })
Once the field or "fields" on the documents in your collection are covered by a text index, you can search using the $text operator with .find():
db.collection.find({ "$text": { "$search": "word" } })
This form of query matches documents on the terms you specify in your search, and it can also determine the relevance to your search and "rank" the documents accordingly.
More information can be found in the tutorials section on text search.
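From node.js, the ranked form of that query might look roughly like the sketch below. It assumes `collection` is a Collection object from the official "mongodb" driver and that a text index on "content" already exists. The function name searchText and the field name "score" are illustrative choices, not anything from the question.

```javascript
// Sketch only: `collection` is assumed to be a "mongodb" driver Collection
// whose "content" field is already covered by a text index.
async function searchText(collection, words) {
  return collection
    .find(
      { $text: { $search: words } },
      // Ask the server to attach the relevance score to each result.
      { projection: { score: { $meta: "textScore" } } }
    )
    // Most relevant documents first.
    .sort({ score: { $meta: "textScore" } })
    .toArray();
}
```

Calling `await searchText(collection, "word")` should then resolve to the matching documents, ordered by their text score.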
Combined
There is nothing stopping you from in fact taking a combined approach. Here you would store your original data as documents using the GridFS API methods, and then store the extracted "text" in another collection that is aware of, and contains a reference to, the original fs.files document for your big text document, PDF file or whatever.
But you need to extract the "text" from the original "documents" and store it within MongoDB documents in your collection. Otherwise a similar approach can be taken with an external text search solution, where it is quite common to provide interfaces that can do things such as extract text from things like PDF documents.
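One possible shape for that extra collection is sketched below. Everything named here is hypothetical: the field names file_id, filename and content, the helper names, and the sample values. The text extraction itself is left to whatever external tool you use. The only driver call assumed is insertOne on a Collection.

```javascript
// Hypothetical shape for a searchable document that points back at its GridFS file.
function buildSearchDocument(fileDoc, extractedText) {
  return {
    file_id: fileDoc._id,       // reference to the document in fs.files
    filename: fileDoc.filename,
    content: extractedText,     // the field a text index would cover
  };
}

// `searchCollection` is assumed to be a "mongodb" driver Collection.
async function saveSearchDocument(searchCollection, fileDoc, extractedText) {
  const doc = buildSearchDocument(fileDoc, extractedText);
  await searchCollection.insertOne(doc);
  return doc;
}

// Placeholder values only; a real _id would be the ObjectId returned by the GridFS upload.
const example = buildSearchDocument(
  { _id: "placeholder-file-id", filename: "report.pdf" },
  "text pulled out of the PDF by an external tool"
);
console.log(example.file_id); // placeholder-file-id
```

A search hit then gives you file_id, which is what you would hand to the GridFS read functions to deliver the original file.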
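With that external solution you would also send a reference to the GridFS form of the document, to allow that data to be retrieved from any search request if your intention is to deliver the original content.
So ultimately you see that the two methods are in fact for different things. You can build your own approach around "combining" functionality, with "search" doing the searching and "chunk" storage doing the storing, as you want.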
Of course, if your content is always under 16MB, then just store it in a document as you normally would. And of course, if that is binary data and not text, it is no good for search unless you explicitly extract the text.
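As a rough illustration of that size decision, a check along these lines could pick between the two. The helper name chooseStorage and the 64 KiB headroom for the document's other fields are arbitrary assumptions, not recommendations from MongoDB.

```javascript
// 16 MB BSON document size limit, in bytes.
const BSON_LIMIT = 16 * 1024 * 1024;

// Hypothetical helper: leave some headroom for the other fields in the document.
function chooseStorage(content, headroomBytes = 64 * 1024) {
  const size = Buffer.isBuffer(content)
    ? content.length
    : Buffer.byteLength(content, "utf8");
  return size + headroomBytes < BSON_LIMIT ? "document" : "gridfs";
}

console.log(chooseStorage("a short note"));                 // document
console.log(chooseStorage(Buffer.alloc(20 * 1024 * 1024))); // gridfs
```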