mongodb - GridFS

DB/MongoDB 2013. 6. 18. 12:49



GridFS

GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16MB.

Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, [1] and stores each of those chunks as a separate document. By default GridFS limits chunk size to 256k. GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.
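
As a rough illustration of the splitting described above, the following sketch cuts an in-memory buffer into chunk documents. It is not driver code: splitIntoChunks, the "file-1" identifier, and the shape of fileDoc are hypothetical, and a real driver writes the equivalent documents to the two collections for you. The files_id and n field names are the ones the chunks collection uses.

```javascript
// Hypothetical in-memory sketch; a real driver performs this step internally.
const CHUNK_SIZE = 256 * 1024; // default chunk size stated in the text

function splitIntoChunks(fileId, data, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId, // points back to the metadata document
      n,                // sequence number, starting at 0
      data: data.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// A 600 KB payload yields two full chunks and one partial chunk.
const payload = Buffer.alloc(600 * 1024, 1);
const fileDoc = { _id: "file-1", length: payload.length, chunkSize: CHUNK_SIZE };
const chunkDocs = splitIntoChunks(fileDoc._id, payload);
console.log(chunkDocs.map((c) => [c.n, c.data.length]));
```

The metadata document records the total length and chunk size, which is enough to work out how many chunks a file has without reading any of them.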

When you query a GridFS store for a file, the driver or client reassembles the chunks as needed. You can perform range queries on files stored through GridFS. You can also access information from arbitrary sections of files, which allows you to “skip” into the middle of a video or audio file.
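
The “skip” behaviour comes down to arithmetic on the chunk size. The sketch below is an assumption about how a client could do it, not any driver's actual implementation: chunksForRange and readRange are hypothetical helpers, and the Map stands in for the chunks collection. It uses 4-byte chunks so the result is easy to check by eye.

```javascript
// Which chunk numbers cover the byte range [start, end)?
function chunksForRange(start, end, chunkSize) {
  if (start < 0 || end <= start) return [];
  const first = Math.floor(start / chunkSize);
  const last = Math.floor((end - 1) / chunkSize);
  const wanted = [];
  for (let n = first; n <= last; n++) wanted.push(n);
  return wanted;
}

// Read [start, end) from a map of n -> Buffer, touching only the needed chunks.
function readRange(chunkByN, start, end, chunkSize) {
  const parts = chunksForRange(start, end, chunkSize).map((n) => {
    const base = n * chunkSize;                        // file offset of this chunk
    const from = Math.max(start, base) - base;         // slice start inside the chunk
    const to = Math.min(end, base + chunkSize) - base; // slice end inside the chunk
    return chunkByN.get(n).subarray(from, to);
  });
  return Buffer.concat(parts);
}

// Toy data: 10 bytes stored in 4-byte chunks.
const store = new Map([
  [0, Buffer.from("abcd")],
  [1, Buffer.from("efgh")],
  [2, Buffer.from("ij")],
]);
console.log(chunksForRange(3, 9, 4));              // [ 0, 1, 2 ]
console.log(readRange(store, 3, 9, 4).toString()); // defghi
```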

GridFS is useful not only for storing files that exceed 16 MB but also for storing any file that you want to access without loading the entire file into memory. For more information on when GridFS is appropriate, see “When should I use GridFS?” below.

[1] The use of the term chunks in the context of GridFS is not related to the use of the term chunks in the context of sharding.

Implement GridFS

To store and retrieve files using GridFS, use either of the following:

  • A MongoDB driver. See the drivers documentation for information on using GridFS with your driver.
  • The mongofiles command-line tool, which you run from the system command line rather than from inside the mongo shell. See mongofiles.

GridFS Collections

GridFS stores files in two collections:

  • chunks stores the binary chunks.
  • files stores the file’s metadata.

GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, GridFS uses two collections with names prefixed by the fs bucket name:

  • fs.files
  • fs.chunks

You can choose a different bucket name than fs, and create multiple buckets in a single database.
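
The naming rule is simple enough to state as a function. This is only a sketch of the convention described above; bucketCollections is a hypothetical helper, not part of any driver.

```javascript
// Derive the two collection names for a bucket; "fs" is the default bucket.
function bucketCollections(bucketName = "fs") {
  return {
    files: `${bucketName}.files`,   // file metadata
    chunks: `${bucketName}.chunks`, // binary chunks
  };
}

console.log(bucketCollections());            // { files: 'fs.files', chunks: 'fs.chunks' }
console.log(bucketCollections("contracts")); // { files: 'contracts.files', chunks: 'contracts.chunks' }
```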

Each document in the chunks collection represents one chunk of a file in the GridFS store. Each chunk is identified by the unique ObjectId stored in its _id field.

For descriptions of all fields in the chunks and files collections, see GridFS Reference.

GridFS Index

GridFS uses a unique, compound index on the chunks collection for the files_id and n fields. The files_id field contains the _id of the chunk’s “parent” document. The n field contains the sequence number of the chunk. GridFS numbers all chunks, starting with 0. For descriptions of the documents and fields in the chunks collection, see GridFS Reference.

The GridFS index allows efficient retrieval of chunks using the files_id and n values, as shown in the following example:

cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});
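
What a client does with that sorted cursor can be sketched without a database. The reassemble function and the sample documents below are hypothetical; they only mirror the filter on files_id and the sort on n from the query above, plus a check that no sequence number is missing.

```javascript
// Rebuild one file from chunk documents that may arrive in any order.
function reassemble(chunkDocs, fileId) {
  const ordered = chunkDocs
    .filter((c) => c.files_id === fileId) // same as find({files_id: myFileID})
    .sort((a, b) => a.n - b.n);           // same as sort({n: 1})
  // Chunks are numbered from 0, so position i must hold n === i.
  ordered.forEach((c, i) => {
    if (c.n !== i) throw new Error(`missing chunk ${i} for file ${fileId}`);
  });
  return Buffer.concat(ordered.map((c) => c.data));
}

const unordered = [
  { files_id: "a", n: 2, data: Buffer.from("!") },
  { files_id: "b", n: 0, data: Buffer.from("other") },
  { files_id: "a", n: 0, data: Buffer.from("Grid") },
  { files_id: "a", n: 1, data: Buffer.from("FS") },
];
console.log(reassemble(unordered, "a").toString()); // GridFS!
```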

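See the relevant driver documentation for the specific behavior of your GridFS application. If your driver does not create this index, issue the following operation using the mongo shell (ensureIndex is the helper in shells of this era; since MongoDB 3.0 the equivalent helper is createIndex):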

db.fs.chunks.ensureIndex( { files_id: 1, n: 1 }, { unique: true } );
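
The effect of the unique option can be pictured with a small in-memory stand-in. UniqueChunkIndex is a hypothetical class written for this illustration; it says nothing about how MongoDB implements indexes, only what the constraint allows and rejects.

```javascript
// Hypothetical stand-in for the unique { files_id: 1, n: 1 } index.
class UniqueChunkIndex {
  constructor() {
    this.seen = new Set();
  }

  // Throws if this (files_id, n) pair was already inserted.
  insert(filesId, n) {
    const key = JSON.stringify([filesId, n]);
    if (this.seen.has(key)) {
      throw new Error(`duplicate key: files_id=${filesId}, n=${n}`);
    }
    this.seen.add(key);
  }
}

const chunkIndex = new UniqueChunkIndex();
chunkIndex.insert("file-1", 0);
chunkIndex.insert("file-1", 1); // same file, next chunk: accepted
chunkIndex.insert("file-2", 0); // same n, different file: accepted

let rejected = false;
try {
  chunkIndex.insert("file-1", 0); // same pair again: rejected
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
```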

Example Interface

The following is an example of the GridFS interface in Java. The example is for demonstration purposes only. For API specifics, see the relevant driver documentation.

By default, the interface must support the default GridFS bucket, named fs, as in the following:

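// imports assumed: java.io.File, com.mongodb.gridfs.GridFS, com.mongodb.gridfs.GridFSInputFile
// returns the default GridFS bucket (i.e. the "fs" bucket)
GridFS myFS = new GridFS(myDatabase);

// creates a file in the "fs" GridFS bucket; with the legacy Java driver,
// the file is written only when save() is called
GridFSInputFile input = myFS.createFile(new File("/tmp/largething.mpg"));
input.save();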

Optionally, interfaces may support additional GridFS buckets, as in the following example:

// returns GridFS bucket named "contracts"
GridFS myContracts = new GridFS(myDatabase, "contracts");

// retrieve GridFS object "smithco"
GridFSDBFile file = myContracts.findOne("smithco");

// saves the GridFS file to the file system
file.writeTo(new File("/tmp/smithco.pdf"));



Source - http://docs.mongodb.org/manual/core/gridfs/






When should I use GridFS?

For documents in a MongoDB collection, you should always use GridFS for storing files larger than 16 MB.

In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.

  • If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
  • If you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities, you can use geographically distributed replica sets: MongoDB then distributes files and their metadata automatically to a number of mongod instances and facilities.
  • If you want to access information from portions of large files, you can use GridFS to recall sections of a file without reading the entire file into memory.

Do not use GridFS if you need to update the content of the entire file atomically. As an alternative you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates “latest” status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.
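
One way to read the versioning advice is sketched below with plain in-memory maps. This is an assumption about the design, not a prescribed schema: versions stands in for the uploaded GridFS files, current stands in for the metadata that marks the latest version, and publish and readLatest are hypothetical names. The point is the ordering: the new version is stored completely before the single "latest" field changes.

```javascript
const versions = new Map(); // fileId -> content (stand-in for stored files)
const current = new Map();  // filename -> fileId marked as "latest"

function publish(filename, fileId, content) {
  versions.set(fileId, content);  // 1. upload the complete new version
  const previous = current.get(filename);
  current.set(filename, fileId);  // 2. switch "latest" in one small update
  return previous;                // caller may remove this version later
}

function readLatest(filename) {
  return versions.get(current.get(filename));
}

publish("report.pdf", "v1", "first draft");
const superseded = publish("report.pdf", "v2", "final");
console.log(readLatest("report.pdf")); // final

versions.delete(superseded);           // optional cleanup of the old version
console.log(versions.has("v1"));       // false
```

Readers that resolve the file through the "latest" marker see either the old version or the new one, never a half-written mix of chunks.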

Furthermore, if your files are all smaller than the 16 MB BSON Document Size limit, consider storing each file manually within a single document. You may use the BinData data type to store the binary data. See your driver’s documentation for details on using BinData.
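
The guidance in this section can be condensed into a small decision helper. This is a sketch only: chooseStorage and its needsPartialReads flag are hypothetical, and the choice of >= instead of > is an assumption made because the document must also hold field names and metadata alongside the file bytes.

```javascript
const BSON_DOCUMENT_LIMIT = 16 * 1024 * 1024; // 16 MB limit stated in the text

// Returns "gridfs" or "bindata" for a file of the given size.
// needsPartialReads: hypothetical flag for reading sections of a file
// without loading all of it into memory.
function chooseStorage(sizeBytes, needsPartialReads = false) {
  if (sizeBytes >= BSON_DOCUMENT_LIMIT || needsPartialReads) return "gridfs";
  return "bindata";
}

console.log(chooseStorage(200 * 1024));            // bindata
console.log(chooseStorage(50 * 1024 * 1024));      // gridfs
console.log(chooseStorage(5 * 1024 * 1024, true)); // gridfs
```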

For more information on GridFS, see GridFS.



Source - http://docs.mongodb.org/manual/faq/developers/#faq-developers-when-to-use-gridfs

Posted by linuxism


Data Type Fidelity

JSON does not have the following data types that exist in BSON documents: data_binary, data_date, data_timestamp, data_regex, data_oid, and data_ref. As a result, any tool that decodes BSON documents into JSON suffers some loss of fidelity.

If maintaining type fidelity is important, consider writing a data import and export system that does not force BSON documents into JSON form as part of the process. The following list shows how MongoDB represents each of these BSON types in JSON.

  • data_binary

    { "$binary" : "<bindata>", "$type" : "<t>" }
    

    <bindata> is the base64 representation of a binary string. <t> is the hexadecimal representation of a single byte indicating the data type.

  • data_date

    Date( <date> )
    

    <date> is the JSON representation of a 64-bit signed integer for milliseconds since epoch.

  • data_timestamp

    Timestamp( <t>, <i> )
    

    <t> is the JSON representation of a 32-bit unsigned integer for seconds since epoch. <i> is a 32-bit unsigned integer for the increment.

  • data_regex

    /<jRegex>/<jOptions>
    

    <jRegex> is a string that may contain valid JSON characters and unescaped double quote (i.e. ") characters, but may not contain unescaped forward slash (i.e. /) characters. <jOptions> is a string that may contain only the characters g, i, m, and s.

  • data_oid

    ObjectId( "<id>" )
    

    <id> is a 24 character hexadecimal string. These representations require that data_oid values have an associated field named “_id.”

  • data_ref

    DBRef( "<name>", "<id>" )
    

    <name> is a string of valid JSON characters. <id> is a 24 character hexadecimal string.
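
The loss of fidelity is easy to reproduce with plain JSON in Node, and the data_binary shape from the list shows how a tagged representation avoids it. The sketch below is illustrative only: binaryToJson and binaryFromJson are hypothetical helpers, not MongoDB tooling, and the default subtype 00 is the generic binary subtype.

```javascript
// Plain JSON round trip: the Date comes back as a string.
const original = { when: new Date(0) };
const roundTripped = JSON.parse(JSON.stringify(original));
console.log(original.when instanceof Date); // true
console.log(typeof roundTripped.when);      // string

// Tagged representation of binary data, following the data_binary shape.
function binaryToJson(buf, subtype = 0x00) {
  return {
    $binary: buf.toString("base64"),              // <bindata>
    $type: subtype.toString(16).padStart(2, "0"), // <t>, one byte in hex
  };
}

function binaryFromJson(doc) {
  return Buffer.from(doc.$binary, "base64");
}

const tagged = binaryToJson(Buffer.from("hi"));
const wire = JSON.stringify(tagged);
console.log(wire); // {"$binary":"aGk=","$type":"00"}
console.log(binaryFromJson(JSON.parse(wire)).toString()); // hi
```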


Source - http://docs.mongodb.org/manual/core/import-export/#bson-json-type-conversion-fidelity






Should I use GridFS or binary data to store & retrieve images from MongoDB?


I was wondering which is better/faster:

  1. Having a separate collection of documents that just contain the image saved as binary data, and possibly some metadata.
  2. Or using GridFS to store the images.

If your images are small you can store them as binary data in the documents in your collection. Just consider that you will be retrieving them every time you query your document (unless you exclude the 'image' field from your queries).

However, if your images are larger I would use GridFS. GridFS has some features that make it very good at handling images that you should consider:

  • Larger images stored in GridFS are split into chunks, so you can store very large files. If you try to store images in your document, you are constrained by the 16 MB maximum size of a document, and you are consuming space that is needed for your actual document.
  • You can add metadata to the image itself and run queries against these attributes, as if you were doing it from a regular document in a collection. So GridFS is as good as a document for metadata about the image.
  • I really like that I get MD5 hash calculated on the images. (It is very useful for some of my cases).
  • By storing images in GridFS you save yourself the preprocessing of the image into binary format (not a big deal, but a convenience of GridFS)
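
As a sketch of why a stored MD5 hash is handy, the following computes one with Node's built-in crypto module and uses it to detect a changed payload. The storedFile document and the matchesStoredHash helper are hypothetical; they only illustrate comparing a hash recorded at upload time with one computed later.

```javascript
const crypto = require("crypto");

// Hex-encoded MD5 digest of a buffer.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Hypothetical metadata document holding the hash recorded at upload time.
const imageBytes = Buffer.from("hello");
const storedFile = { filename: "greeting.bin", md5: md5Hex(imageBytes) };

function matchesStoredHash(fileDoc, bytes) {
  return fileDoc.md5 === md5Hex(bytes);
}

console.log(storedFile.md5);                                     // 5d41402abc4b2a76b9719d911017c592
console.log(matchesStoredHash(storedFile, imageBytes));          // true
console.log(matchesStoredHash(storedFile, Buffer.from("hellO"))); // false
```

MD5 is adequate here as a checksum against accidental change or duplication; it is not a defence against deliberate tampering.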

In terms of performance, reading/writing against a regular document should be no different than doing it against GridFS. I would not consider performance to be a differentiator in choosing either one.

My personal recommendation is to go with GridFS, but you need to analyze for your particular use case.

Hope this helps.



Source - http://stackoverflow.com/questions/7806674/should-i-use-gridfs-or-binary-data-to-store-retrieve-images-from-mongodb














BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.
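
To make “binary-encoded” concrete, here is a deliberately tiny encoder that handles only documents whose values are all strings. It is a sketch for illustration, not a BSON library: encodeStringDoc is a hypothetical name, and anything beyond the string type (0x02) is out of scope. The layout is a little-endian int32 total length, then one element per field (type byte, null-terminated key, int32 string length including the terminator, null-terminated value), then a closing 0x00.

```javascript
// Minimal BSON encoder for documents with string values only.
function encodeStringDoc(obj) {
  const parts = [];
  for (const [key, value] of Object.entries(obj)) {
    if (typeof value !== "string") {
      throw new TypeError("this sketch handles string values only");
    }
    const str = Buffer.from(value + "\0", "utf8"); // value plus terminator
    const len = Buffer.alloc(4);
    len.writeInt32LE(str.length, 0);               // length includes the terminator
    parts.push(Buffer.from([0x02]), Buffer.from(key + "\0", "utf8"), len, str);
  }
  const body = Buffer.concat([...parts, Buffer.from([0x00])]); // document terminator
  const total = Buffer.alloc(4);
  total.writeInt32LE(body.length + 4, 0);          // total size counts its own 4 bytes
  return Buffer.concat([total, body]);
}

const bytes = encodeStringDoc({ hello: "world" });
console.log(bytes.length);          // 22
console.log(bytes.toString("hex")); // 160000000268656c6c6f0006000000776f726c640000
```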


Source - http://bsonspec.org/






BSON

A serialization format used to store documents and make remote procedure calls in MongoDB. “BSON” is a portmanteau of the words “binary” and “JSON”. Think of BSON as a binary representation of JSON (JavaScript Object Notation) documents. For a detailed spec, see bsonspec.org.

See also the Data Type Fidelity section.



Source - http://docs.mongodb.org/manual/reference/glossary/





