Direct Storage vs. SAN

Direct storage (direct-attached storage) is when you connect hard drives to your server through a controller built into the motherboard or on an add-in card. The alternative is to use an external interface, typically Fibre Channel, and put your disks in a Storage Area Network (SAN). The one fact you can't argue with is that disks in a SAN cost considerably more than directly attached ones; beyond that, the details are controversial.

You can attach a surprisingly large number of drives directly to a server nowadays, but in general it's easier to manage large numbers of them on a SAN. There are also significant redundancy improvements with a SAN that are worth quite a bit in some enterprise environments. Connecting all of the drives, however many there are, to two or more machines at once is typically much easier to set up on a SAN than with direct storage.

Basically the trade-offs break down like this:

  • Going through the SAN interface (Fibre Channel etc.) introduces some latency and a potential write bottleneck compared with direct storage, everything else being equal. This can become a real problem if you've got a poor SAN vendor or interface issues you can't sort out; "The problem with iSCSI" describes an extreme case here. A simple way to measure this on your own hardware is shown in the sketch after this list.
  • It can be easier to manage a large number of disks in the SAN, so for situations where aggregate disk throughput is the limiting factor the SAN solution might make sense.
  • At the high end, you can get SANs with more cache than any direct controller I'm aware of, which for some applications gives them a measurable lead over direct storage. It's easy (albeit expensive) to get an EMC array with 16GB of memory for caching, for example (and with 480 drives). And since such arrays have a more robust power setup than a typical server, you might even be able to enable all of the individual drive caches usefully (that's 16-32MB each nowadays, so at say 100 disks you've potentially got another 1.6GB of cache right there); this may not be possible with every SAN vendor. On a typical server you can end up needing to turn off the write caches on individual direct-attached drives, because they may not survive a power cycle even with a UPS, and rely on the controller write cache alone.
  • SANs tend to have features that make backups, mirroring, and disk snapshots easier than they are with your typical direct storage solution.
  • The flexibility of a SAN also means complexity, and a hidden cost can be that you need consultants to do some operations (and having an external vendor involved like that always makes your company a bit less nimble).
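
If you want to put a number on that interface latency for your own setup, the most direct test is to time small write-and-fsync cycles on each kind of storage; PostgreSQL ships the pg_test_fsync utility for exactly this kind of measurement. Below is a minimal Python sketch of the same idea; the mount points and iteration count are just placeholders, not anything from a particular vendor's setup:

 # Time repeated 8kB write+fsync cycles on a given filesystem, as a rough proxy
 # for commit latency on whatever storage (direct or SAN) backs that path.
 import os
 import time

 def fsync_latency(path, iterations=1000, block=b"x" * 8192):
     """Return average seconds per write+fsync cycle on the given directory."""
     fname = os.path.join(path, "fsync_probe.tmp")
     fd = os.open(fname, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
     try:
         start = time.perf_counter()
         for _ in range(iterations):
             os.write(fd, block)
             os.fsync(fd)  # force the write through OS and drive caches to stable storage
         elapsed = time.perf_counter() - start
     finally:
         os.close(fd)
         os.unlink(fname)
     return elapsed / iterations

 if __name__ == "__main__":
     # Hypothetical mount points for a direct-attached volume and a SAN volume
     for mount in ("/mnt/direct", "/mnt/san"):
         if os.path.isdir(mount):
             print("%s: %.2f ms per 8kB write+fsync" % (mount, fsync_latency(mount) * 1000))

If the per-cycle time comes out implausibly low (far less than a disk rotation would allow), something in the path is acknowledging writes before they're actually durable, which is the same drive write cache issue described above.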

There's no universal advantage on either side here, just a different set of trade-offs. Certainly you'll never come close to the performance per dollar of direct storage if you buy the same capacity in SAN form instead, but at higher budgets or with more demanding feature requirements a SAN may make sense anyway. The break-even point is heavily influenced by how many servers you're able to put on the SAN and how often you need to do tricky backup and upgrade procedures that the SAN makes easier.