Should binary files be stored in the database?

Solutions:


  1. Store in the database with a blob

    A disadvantage is that it makes your database files quite large, possibly too large to back up with your existing setup. An advantage is integrity and atomicity.

  2. Store on the filesystem with a link in the database

    I’ve come across such horrible disasters doing this, and it scares me that people keep suggesting it. Some of the disasters included:

    • One privileged user who would rearrange the files and frequently break the links between the paths in the DB and where they now are (but somehow this became my fault).
    • When moving from one server to another, the ownership of some of the files was lost: the SID of the old machine’s administrator account (which the old website ran as) was not part of the domain, so the copied files had ACLs that could not be resolved, and users were presented with a username/password/domain login prompt.
    • Some of the paths ended up being longer than 256 characters from the C: all the way to the .doc and not all versions of NT were able to deal with long paths.
  3. Store in the filesystem but rename to a hash of the contents and store the hash in the database

    The last place I worked at did this, based on my explanation of the above scenarios. They saw it as a compromise between the organization’s inability to obtain experience with large databases (anything larger than about 40G was ordained to be “too big”), the corporate inability to purchase large hard drives, the inability to purchase a more modern backup solution, and the need to get away from risks #1 and #3 that I identified above.
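
As a rough illustration of option 3, here is a minimal sketch in Python. It assumes SHA-256 as the hash and uses SQLite from the standard library as a stand-in for the real database; the `documents` table and the `init_db`, `store_file`, and `load_file` helpers are hypothetical names chosen for this example, not anything from the setup described above.

```python
import hashlib
import sqlite3
from pathlib import Path


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the database holds only the name and the content hash.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "name TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )


def store_file(conn: sqlite3.Connection, store_dir: Path, name: str, data: bytes) -> str:
    """Write data to a file named after its SHA-256 digest and record the digest."""
    digest = hashlib.sha256(data).hexdigest()
    target = store_dir / digest
    if not target.exists():  # identical contents are stored only once
        tmp = store_dir / (digest + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(target)  # rename into place so a partial file never has the final name
    with conn:  # commits on success, rolls back if the INSERT fails
        conn.execute(
            "INSERT INTO documents (name, sha256) VALUES (?, ?)", (name, digest)
        )
    return digest


def load_file(conn: sqlite3.Connection, store_dir: Path, name: str) -> bytes:
    """Read a file back by name and verify it still matches the recorded hash."""
    row = conn.execute(
        "SELECT sha256 FROM documents WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    data = (store_dir / row[0]).read_bytes()
    if hashlib.sha256(data).hexdigest() != row[0]:
        raise ValueError("stored file does not match its recorded hash")
    return data
```

Because every stored name is a fixed-length hex digest in a flat directory, path length stops depending on what users call their documents, and a file that has been altered is detected on read. The file write and the database insert are still two separate steps, so this does not give the atomicity of option 1.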

My opinion is that storing in the DB as a blob is a better solution and more scalable in a multi-server scenario, especially with failover and availability concerns.

Option 1, for complete data integrity. Use the other options if you don’t care about data quality. It’s that simple.

Most RDBMSs have optimizations for storing BLOBs (e.g. SQL Server FILESTREAM) anyway.

If you are going with Oracle, take a look at DBFS and SecureFiles.

SecureFiles says it all: keep ALL your data safe in the database. The data is organized in LOBs. SecureFiles is a modernized version of LOBs that should be activated.

DBFS is a filesystem in the database. You can mount it on a Linux host, much like a network filesystem. It is really powerful, and it has a lot of options to tune to your specific needs. Being a DBA, given a filesystem (based in the database, mounted on Linux), I created an Oracle Database on it without any problems (a database, stored in a … database). Not that this would be very useful, but it does show the power.

More advantages are availability, backup, and recovery, all read-consistent with the other relational data.

Sometimes size is given as a reason not to store documents in the database. That data probably has to be backed up anyway, so that’s not a good reason to keep it out of the database. Especially where old documents are to be considered read-only, it is easy to make big parts of the database read-only. Those parts of the database then no longer need frequent backups.

A reference in a table to something outside the database is unsafe. It can be manipulated, is hard to check, and can easily get lost. How about transactions? The database offers solutions for all these issues. With Oracle DBFS you can give your docs to non-database applications and they wouldn’t even know they are poking around in a database.
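
To make the transaction point concrete, here is a minimal sketch in Python, again using SQLite from the standard library as a stand-in rather than Oracle or SQL Server. The `documents` and `audit_log` tables and the `add_document` helper are hypothetical. The idea it shows is that a document stored as a BLOB commits or rolls back together with the relational rows that describe it.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the document body lives in the same database as its metadata.
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, body BLOB NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE audit_log (doc_name TEXT NOT NULL, action TEXT NOT NULL)"
    )


def add_document(conn: sqlite3.Connection, name: str, body: bytes) -> None:
    """Insert the audit row and the BLOB in one transaction."""
    with conn:  # commit if both statements succeed, roll back both if either fails
        conn.execute(
            "INSERT INTO audit_log (doc_name, action) VALUES (?, 'created')",
            (name,),
        )
        conn.execute(
            "INSERT INTO documents (name, body) VALUES (?, ?)", (name, body)
        )
```

If the second insert fails, for example because the name already exists, the audit row is rolled back as well, so the two tables cannot disagree. With a path stored in a table and the bytes on a filesystem, the application would have to arrange that cleanup itself.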

A last, big surprise: the performance of a DBFS filesystem is often better than that of a regular filesystem. This is especially true if the files are larger than a few blocks.
