Wednesday, January 7, 2009

Insight 6 - Features I Believe Will Bring New Benefits of Virtualization

This post is one of a series of insights I wrote for the Techdirt Insight Community, which were published on the Virtualization Conversation site, sponsored by IBM and Intel.

Server virtualization has grown from the groundbreaking technologies of years ago to a point where vendors tend to implement similar features with only minor innovation. I have two virtualization features in mind that could be the next move in IT virtualization:

Application-Centric Virtualization

Virtualization systems today work with operating systems. They deploy virtual machines, fail them over to other physical hardware when disaster strikes, optimize physical memory utilization and handle many more OS-related tasks. What is more, the virtualization platform can deploy a new operating system automagically, according to preset scenarios. So where’s the problem?

It’s the application, not the operating system, that end users really need to work with. They need the application interface to be available on demand, to respond fast, and to stay available should one of the company’s facilities fail. That application interface depends on many services such as databases, file servers and mail stores. And the operating system sits at the end of this application dependency chain.

This means we have a mature technology available to cover the last link of the chain. Now it’s time to move the operating system’s role to the background, where it will provide a simple runtime environment for applications. Let’s move the focus to applications and let them be managed through the same interfaces we use to manage virtual servers.

That’s not enough, however. We need to keep those applications highly available and their data consistent. There’s one problem with the high availability solutions of current OS virtualization systems: if one piece of physical hardware fails, the application stops until the OS is automatically restarted on another. Not to mention that data consistency is usually broken in such events.

To move closer to 100% available applications, there are two options:

  1. Redesign each (suitable) application separately to cluster its transactions among multiple running instances, so it is ready to fail over transparently when a single piece of hardware fails.

  2. Create a generic API layer in the operating system to take care of applications’ HA features, similar to what Volume Shadow Copy did for applications’ consistent snapshots on the Windows platform.

Server-bound storage

Most current server virtualization features depend on central storage connected to many physical servers. Few companies realize this is a single point of failure: that array’s failure disrupts all connected virtual machines and the applications they run. Mirroring that array or, better yet, clustering it using technologies like those from LeftHand or EqualLogic can eliminate this SPOF issue.

There’s another option, however – use the same x86 box for both the virtualization system and storage. Just imagine a common server used for virtualization – it boots from an internal flash drive and still has some 6 drive slots available to accommodate SAS or SATA drives.

By implementing a synchronous block replication feature – or adopting an existing one – to replicate data between boxes, we could start building an autonomous virtualization cluster. Should one box fail, the other one will transparently take over both its applications and storage.

Do you worry about the performance of such storage? You might be right: you need tens of SAS drives to satisfy some applications’ IOps needs. Soon, however, a single SSD will deliver such performance in both read and write operations.

Putting the right virtualization platform, a block replication system, 2 SSD drives and some large-capacity SATA drives into a single box is a dream solution for most server administrators. We have all the technologies available to build such a system; the time when some server vendor designs it is coming, I believe.

Thursday, June 12, 2008

Insight 5 - Storage Virtualization - Where Should It Go?

This post is one of a series of insights I wrote for the Techdirt Insight Community and which were published on The Future of Storage webpage, sponsored by Dell.

Writing about storage virtualization, I should start by defining which virtualization I mean. Generally, a simple RAID volume can be called “virtual” too as it is a logical representation of some more complex logic behind it. Don’t worry, I’m not going to write about RAID. Instead, my mind is full of mirrors, snapshots, clusters, recovery sites and a single question: At which layer of SAN infrastructure should these features live?

Today, we can find storage virtualization implemented mostly in two places:

  1. Built into array controllers, or
  2. running as software installed in a hardware appliance or Fibre Channel switch placed between the arrays and SAN clients.

The first usually doesn’t go far beyond a semi-working mirroring feature you receive a huge bill for. Also, a common problem with these vendors is that their scope ends at the block level; they don’t really care about host applications. Sure, the primary function of a storage controller is different, and mixing it with the complete stack of virtualization features in one box might create more trouble both in design and during operation.

Many people see the second method as an in-path obstacle wearing another vendor’s label: a box they have to learn how to manage, pay maintenance fees for, etc. Don’t bother explaining to them how it’s full of features, how it’s not necessarily a single point of failure or how it adds just a minor latency.

As a result, there is a large set of SAN installations lacking modern virtualization features. Is it bad? It is, I think. Safety features like mirroring with transparent failover or consistent snapshot replication should be an obligatory part of each SAN installed in 2008. At least, they should be available as part of storage solutions from SMB up through all the marketing labeled levels.

What’s a way to avoid these drawbacks and bring storage virtualization to more SAN users? As always, I believe it’s through simplicity and standardization. Let’s divide each virtualization feature into two parts: one that inevitably needs to be implemented at the controller level, and a second that would reside at the host. No need for any third in-band level in this design.

Suppose you set up synchronous mirroring in such a design. The host would then send the blocks it writes to all arrays configured to be part of such mirroring. The benefit is that there is no retransmission from the primary controller to the mirror one and no central in-band appliance. In case of array failure the host itself selects another array. From such a perspective, it could be just an improved MPIO driver. I’m an optimist, so I believe there is a way to write such drivers to be vendor-independent. Thus you could mirror your HP to IBM, LeftHand to EqualLogic ;-).
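The host-side mirroring idea can be illustrated with a minimal sketch. Everything here is hypothetical: `InMemoryArray` stands in for a real array target, and `HostMirror` is an invented name for the "improved MPIO driver" role, not any vendor's API. The sketch also ignores resynchronizing an array that comes back after a failure.

```python
class ArrayOffline(Exception):
    """Raised by a (hypothetical) array target that cannot serve I/O."""


class InMemoryArray:
    """Stand-in for one storage array; a real driver would talk to a LUN."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def write(self, lba, data):
        if not self.online:
            raise ArrayOffline(self.name)
        self.blocks[lba] = data

    def read(self, lba):
        if not self.online:
            raise ArrayOffline(self.name)
        return self.blocks[lba]


class HostMirror:
    """Host-side synchronous mirror: each write goes to every healthy array."""

    def __init__(self, arrays):
        self.healthy = list(arrays)

    def write(self, lba, data):
        survivors = []
        for array in self.healthy:
            try:
                array.write(lba, data)
                survivors.append(array)
            except ArrayOffline:
                pass  # drop the failed array from the mirror set
        if not survivors:
            raise ArrayOffline("no array accepted the write")
        self.healthy = survivors
        # Returning normally is the acknowledgement to the application.

    def read(self, lba):
        # The host itself picks another array when one fails.
        for array in list(self.healthy):
            try:
                return array.read(lba)
            except ArrayOffline:
                self.healthy.remove(array)
        raise ArrayOffline("no array could serve the read")
```

With two arrays in the mirror, a write lands on both, and a read still succeeds after one of them goes offline, without any appliance sitting in the data path.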

There is already a similar implementation of such out-of-band virtualization in the Fibre Channel world: LSI StoreAge. Most of its features work on Windows only, however, and it still requires a hardware appliance to be set up in the SAN. There are no similar implementations in the iSCSI world that I know of.

Having the host part of the storage virtualization brings another advantage: It’s close to the applications. It’s application data we need to protect, not low-level blocks. Application support is essential for creating snapshots and replicating data to remote sites. We could manage SAN data much more safely and in a simpler way if the SAN border moved closer to applications. Of course, some work on standardization has yet to be done here.

To summarize, I see current storage virtualization as too in-band-ish. Although there are some rare efforts to put selected tasks on SAN hosts (e.g. FalconStor’s DiskSafe agent), they stand alone without further plans to replace the central appliance. If I were an array vendor I would consider pairing with FalconStor to strengthen the market of interoperable, application-centric SANs, bringing more ways to use “my” arrays.

Thursday, June 5, 2008

Insight 4 - Will Non-Rotational Drives Create A New SAN Era?

This post is one of a series of insights I wrote for the Techdirt Insight Community and which were published on The Future of Storage webpage, sponsored by Dell.

This community has forced me to think of what the next big things in storage areas could be. At first, I thought of protocols, so I wrote an insight about mirroring and some feature-related things. What I forgot completely was the core of current SANs — drives.

Last year, I had a great opportunity to evaluate a DRAM-based array. Testing new arrays and running IOmeter tests on them reminds me of my childhood feelings after being given a new Lego box. This time (with the array, not the Lego) I was excited by sequential throughput close to 900MBps through a pair of fibres. We changed the test to a 100% random pattern and guess what: the performance was almost the same! Sure, it makes sense: no heads are seeking their blocks on disk platters, so everything works at the speed of memory chips. Yet, it was a feeling of crossing a bridge to a new era.

Ask yourself: What factors do you consider when setting up a new storage system? You probably don’t forget to mention performance, expected traffic pattern (random/sequential), calculations of IOps and number of drives to satisfy the database needs. Storage vendors have undertaken great efforts to design their controllers so they sequentialize the data streams as much as they can, utilizing various caching and disk interface algorithms. What would the world be without these considerations? What new challenges will we face?
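The drive-count calculation mentioned above is usually done as a rule of thumb. The sketch below is only that: the function name and parameters are mine, the per-drive IOps figure is an assumption you would take from your own measurements or the drive's data sheet, and the write penalty is the common approximation of back-end I/Os per front-end write (for example 2 for mirroring, 4 for RAID 5).

```python
import math


def drives_needed(read_iops, write_iops, iops_per_drive, write_penalty):
    """Estimate how many rotational drives a random workload needs.

    Each front-end write is assumed to cost `write_penalty` back-end I/Os;
    each read is assumed to cost one. Controller cache effects are ignored.
    """
    backend_iops = read_iops + write_iops * write_penalty
    return math.ceil(backend_iops / iops_per_drive)


# Hypothetical database workload: 3000 reads/s and 1000 writes/s,
# on drives assumed to deliver 150 random IOps each.
mirrored = drives_needed(3000, 1000, iops_per_drive=150, write_penalty=2)
raid5 = drives_needed(3000, 1000, iops_per_drive=150, write_penalty=4)
```

With a memory-based device whose random performance matches tens or hundreds of such drives, this whole calculation mostly disappears, which is the point of the question above.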

The first challenge might be the latency of the connection chain. The HBAs and controller interfaces create significantly higher delays than the storage media itself. Does it matter? I’m not sure. I’ve met a company that couldn’t accept the latency of the Fibre Channel infrastructure connecting to that DRAM array. Their application made time-sensitive decisions based on data stored on the array. Let’s suppose this is not a typical usage, though….

The first adopters of memory-based storage media are known already. They’re database systems. The advantage in seek times for random patterns is great. The memory device provides the performance of tens or hundreds of traditional rotational drives. Today, as always before, the problem is that capacity is many times lower and price many times higher than with traditional drives. And, as always before, both will settle to reasonable levels. (I’m an optimist.)

It’s an open question if there will be any effective usage for non-rotational arrays other than databases. Will we ever use memory devices for sequential, say file-serving applications? Now, that seems like wasting money. Today’s drives with powerful caching and read-ahead algorithms perform well in sequential transfers. The bottleneck is most often at the media level or file-sharing protocol design.

To conclude, I will suggest a tip for a storage vendor: What if you put in some memory devices as an addition to traditional drives in a single box? I don’t mean just increasing a storage controller’s memory, rather I mean a fast disk space configurable as a LUN. Maybe you can go even further and create an intelligent feature, moving the most active random-access disk areas between rotational and memory disk spaces.
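To make the last idea a bit more concrete, here is a minimal sketch of how an array might pick which disk areas to place on the memory tier. The class and method names are hypothetical, and a real controller would also age its counters and actually move the data; this only shows the selection step.

```python
from collections import Counter


class HotExtentTracker:
    """Counts random accesses per extent and picks the busiest ones."""

    def __init__(self, extent_size_blocks, fast_tier_extents):
        self.extent_size = extent_size_blocks
        self.capacity = fast_tier_extents  # extents that fit on the memory tier
        self.hits = Counter()

    def record_random_access(self, lba):
        # Map the block address to its extent and count the hit.
        self.hits[lba // self.extent_size] += 1

    def extents_for_fast_tier(self):
        # The most frequently hit extents are the promotion candidates.
        return {extent for extent, _ in self.hits.most_common(self.capacity)}
```

An array would feed `record_random_access` from its I/O path and periodically compare `extents_for_fast_tier()` with what currently lives on the memory devices, migrating the difference in the background.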

Thursday, May 29, 2008

Insight 3 - I Don’t Believe In FCoE

This post is one of a series of insights I wrote for the Techdirt Insight Community and which were published on The Future of Storage webpage, sponsored by Dell.

I don’t believe in FCoE. I don’t believe in a storage protocol built from the first layer up. I don’t believe in people needing another set of adapters, switches and controllers. I just see a bunch of vendors building a new playground for their old machines. Why? Maybe they missed a train of innovation. Do they need a new protocol? I don’t think so. Rather they need to fight against the iSCSI newcomers who proved there was more to show than just a dual controller.

I’ve seen many articles celebrating the Fibre Channel protocol for its all-layers-storage concept. I was celebrating too. Then we received our first iSCSI array based on a hardware iSCSI implementation. We ran our standard benchmarks and found the array performed better than most of the fibre arrays we had seen before. We got 96% of that cable’s throughput in sequential operations. That was when I stopped believing that an all-storage protocol matters.

Although it might not look that way, I still think there is space for protocol innovation. Actually, I would like to see more application support built in to storage protocols. The trend I see with our customers is building highly available applications. As the applications store their data through storage protocols, storage protocols could transfer information about completed transactions, flushed buffers, etc., helping to keep data consistent and available in multiple places.

Thursday, May 22, 2008

Insight 2 - Mirroring vs. Replication for Disaster Recovery

This post is one of a series of insights I wrote for the Techdirt Insight Community and which were published on The Future of Storage webpage, sponsored by Dell.

As I wrote in my first insight, I’m a technician at a VAR company working with many clients on their SAN setups. We’ve been quite successful selling storage solutions with synchronous mirroring to our clients in recent months. The winning argument has always been the transparent failover in the case of an array failure. Hearing that they don’t have to touch anything to keep their apps running when the array breaks is always a pleasure for the administrators attending our meetings.

As synchronous mirroring is simple in logical design, it’s the best option for a highly-available SAN in my opinion. Still, I meet many people who say: “We’re not a bank. We can afford a break of one or two hours, so asynchronous replication within a single server room is good enough for us.” They usually expect asynchronous replication to be cheaper to purchase and simpler to install.

To me, as the guy involved in the recovery, the idea of doing a disaster recovery of asynchronously mirrored data appears close to a nightmare. Asynchronous always means the risk of losing some data in the case of an array failure. During the recovery process the administrators have to check if they can recover somehow from the primary storage to rescue as much data as they can. They can’t immediately redirect the storage traffic to a replica without ensuring there is no other way first. This phase is quite difficult to manage and depends heavily on the administrators’ skills. And that’s a big risk factor.

From this perspective I only see a place for non-synchronous mirroring in remote replication, where the connection between primary storage and its replica does not have enough bandwidth for synchronous transfers. In all other cases, synchronous mirroring should be the only option.

How do I see the future of HA/DR? In line with my ideas of simplicity and automation, I would like to see SANs better integrated with the applications stored on them. I haven’t said many nice words about asynchronous replication. Nevertheless I’m sure there will always be cases where async will be the only choice. For these cases, storage vendors should create some generic application-aware storage layer or tools to help admins get through the recovery process. Maybe even allow the recovery process to run unattended.
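The difference between the two modes comes down to when a write is acknowledged. The toy model below is only an illustration under simplified assumptions (one primary, one replica, no network errors), and the class name is hypothetical; it shows why an array failure can cost acknowledged writes in the asynchronous case but not in the synchronous one.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary volume with one replica."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # acknowledged writes not yet on the replica

    def write(self, lba, data):
        self.primary[lba] = data
        if self.synchronous:
            # Acknowledge only after both copies hold the data.
            self.replica[lba] = data
        else:
            # Acknowledge immediately; ship the write to the replica later.
            self.pending.append((lba, data))

    def replicate_pending(self, max_writes):
        """Ship up to max_writes queued writes to the replica."""
        for _ in range(min(max_writes, len(self.pending))):
            lba, data = self.pending.popleft()
            self.replica[lba] = data

    def writes_lost_if_primary_fails(self):
        # Whatever is still queued exists only on the failed primary.
        return len(self.pending)
```

In the asynchronous case, the queue that remains at the moment of failure is exactly the data administrators try to rescue from the primary before they dare to redirect traffic to the replica.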

Saturday, May 17, 2008

Lost Datastore - Have You Seen A Snapshot Warning?

I saw a sad story last week. A client starting with VMware ESX called us to help them find their lost virtual machines. They used to store all their virtual machines in a single VMFS datastore on a Fibre Channel array. Usually, when we see such a problem, it is caused by some change in LUN mapping or FC paths that makes the datastore invisible in ESX. In this case it was different, however. The datastore appeared to be present in a healthy state, but the virtual machines were gone. Completely. No directory remaining there. Just a few common hidden metafiles of the VMFS system. Strange.

By exploring the ESX logs - files like /var/log/vmkwarning, /var/log/vmware/hostd.log - I found that a few days earlier the following message had been logged (usually it also appears in Tasks/Events visible through the GUI client):

May  6 13:51:59 test vmkernel: 0:01:24:35.846 cpu4:1037)ALERT: LVM: 4903: vmhba1:0:1:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.

What caused this warning? Most often it is a simple LUN ID change (it's the last number in the device path address, like the 3 in vmhba1:0:3). It happens when you change something in your array mapping setup. Since ESX finds a known datastore on a volume which used to have a different address, it believes it's just a SAN-level snapshot of the original LUN. Its default policy is to disable access to such a volume.

This snapshot alert is something every moderately experienced ESX admin using SAN storage has already seen and knows how to deal with. Not so with a beginner ESX admin, however. The worst thing his malicious mind can do to him in such a moment of panic is to let him recreate the datastore in place of the original one. By doing this, the data is unrecoverably lost. Forever. When formatting VMFS, ESX rewrites many important places of the volume. This is probably what happened in our case.
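Since the alert can sit in the logs for days before anyone notices, it may be worth scanning for it. The following is a small sketch, not a VMware tool: the regular expression is modeled only on the single log line quoted above, so other ESX versions may word or format the message differently.

```python
import re

# Pattern derived from the one alert line shown in this post (an assumption
# about the general format, not a documented log specification).
SNAPSHOT_ALERT = re.compile(
    r"ALERT: LVM: \d+: (?P<device>vmhba\d+(?::\d+)+) may be snapshot"
)


def find_snapshot_alerts(log_lines):
    """Return the device paths named in 'may be snapshot' alerts."""
    devices = []
    for line in log_lines:
        match = SNAPSHOT_ALERT.search(line)
        if match:
            devices.append(match.group("device"))
    return devices
```

Feeding it the lines of /var/log/vmkwarning would return the affected device paths, which tells you which LUN mapping to check before touching the datastore.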

How to deal with the "may be snapshot" warning?

If your ESX server ever gets into such a state, try the following, in this order:

  1. Call the best VMware guy you know or have at hand to assist you ;-). More brains usually help to avoid a stupid mistake. And/or call VMware support; most running ESX deployments are covered by some level of support.

  2. If you're on your own, then try to change the LUN ID back to its original value. Of course, it requires you to know what the LUN ID used to be and why it suddenly changed.

  3. If 2 is not possible, then set the ESX parameter LVM.EnableResignature to 1 (in VI Client choose ESX server/Configuration/Advanced Settings). After rescanning the storage adapter, your datastore should reappear with a changed name - it will contain the original name and a "snapshot" part. Also, its GUID (unique identifier) will change, so the links to virtual machines will be broken and their names will change to "unknown". That's not a problem, however; you can reimport them, since all the configuration files should reside in the reconnected datastore. So, just go to "Browse Datastore", change to each machine's directory, select the .vmx file, right-click on it and select "Add to Inventory". One after another. When finished, set LVM.EnableResignature back to 0, to keep your ESX running safely.

Can VMware do something to help admins avoid such a disaster?

Sometimes, I would like to know how many virtual machines were lost in a scenario similar to the one I saw with our client. Did you experience it too? Just leave me a comment below.

Not to blame VMware: when someone tries to overwrite an existing VMFS datastore, the following warning is displayed:

Maybe the exclamation triangle is "just" amber; maybe the GUI should force admins to type "iknowitsnotalostdatastoreandsoireallywanttorewriteit" before allowing them to overwrite VMFS. I believe such a little obstruction might keep a few more people from getting into deep trouble.

Saturday, May 10, 2008

Insight 1 - My View on Current SAN Solutions

This post is one of a series of insights I wrote for the Techdirt Insight Community and which were published on The Future of Storage webpage, sponsored by Dell.

Introduction - VMware's Impact

My point of view is based on what I’m doing daily - consulting on SAN projects in their presales phase and leading SAN installations at a storage reseller company in the Czech Republic (EU). The first thing that comes to my mind on this topic is that most storage today is somehow requested for or compared to VMware:

  • People request a storage system to be installed with VMware or at least that it should be supported by or able to connect to VMware later.
  • People ask for SAN features comparing them to what VMware does with servers: high availability, hardware abstraction, central management or DR capabilities.

For administrators, VMware usually means fewer worries and more safety. They are happy hearing that they can get similar features at the storage layer. For us, the reseller, it’s a good tool to compare SAN features to VMware since so many people have already accepted it. Selling a synchronously mirrored array is much easier after comparing the transparent failover capability to VMotion known from virtual servers.

Simplicity - It’s Not Laziness

I feel simplicity is turning out to be a strong attribute of SAN solutions. It’s not because of lazy administrators. It’s because simplicity brings more safety to people operating SANs.

There are many different systems and they are changing often, so administrators are forced to learn how storage, servers, networks communicate and affect each other globally — without being able to deeply understand a system. There’s not always enough time for details. People in IT are not cheap and so they’re often expected to sit on many seats.

In such an environment, running a “traditional”, complex storage system with all those hours managing configurations might be dangerous. Yes, there is outsourcing. But there is also the cost cutting. Which one wins? The simple, yet full-featured storage.

I believe simplicity brings safety to administrators’ work. Not having many options on how to set a system means not having many ways to do it wrong. Defining a new LUN in “traditional” storage systems requires a deep understanding of chunk sizes, RAID operations, caching etc. Luckily there are vendors who moved these decisions closer to “simplicity.”

The more we think about high availability, the more we should think about simplicity. It’s hard to maintain a complex storage system that is always ready for failover. In many companies running some sort of HA solution, administrators are not sure that their system will survive a failure. Usually it is because there are many settings and many conditions for failover to succeed. If any one fails, the whole failover operation fails.
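A bit of arithmetic shows why the number of conditions matters. The sketch below assumes, purely for illustration, that the conditions are independent and that each one has a known probability of holding at the moment of failure; the function name and the numbers are mine, not measurements.

```python
import math


def failover_success_probability(condition_probabilities):
    """Probability that every condition required for failover holds.

    Assumes the conditions are independent, so the probabilities multiply.
    """
    return math.prod(condition_probabilities)


# Ten conditions, each assumed to be fine 99% of the time.
ten_conditions = failover_success_probability([0.99] * 10)
```

Under these assumptions, ten conditions that are each 99% reliable leave only about a 90% chance that the failover succeeds, and every additional setting lowers it further.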

There is another attribute for extending or realizing simplicity: automation.

Automation - Intelligent Performance Manager

In my opinion there is a big space in storage systems which can be filled with automation features. Any storage array has lots of information about the data it stores and about the traffic it serves. It can perform lots of optimization features automatically.

A brief example: There’s an array with mixed SAS and SATA drives. There is a LUN with database data placed on SATA drives. The array is able to recognize the random traffic pattern and move the database data to SAS drives which can serve it faster. Or it can move just those blocks being accessed randomly, not a whole LUN. It can even adjust the chunk size or RAID level after some period of evaluating the traffic. On the other hand - after recognizing sequential traffic - the array can communicate with the SAN client and aggregate network paths from array to client to increase throughput. Let’s call it IPM - Intelligent Performance Manager ;-)
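The first step of such an IPM, recognizing the traffic pattern, can be sketched in a few lines. This is only an illustration: the function names, the request format and the 0.5 threshold are my assumptions, and a real array would look at much longer windows and separate streams per client.

```python
def sequential_fraction(requests):
    """Share of requests that start exactly where the previous one ended.

    `requests` is a list of (start_lba, block_count) tuples in arrival order.
    """
    if len(requests) < 2:
        return 0.0
    sequential = 0
    for (prev_lba, prev_len), (lba, _) in zip(requests, requests[1:]):
        if lba == prev_lba + prev_len:
            sequential += 1
    return sequential / (len(requests) - 1)


def suggest_tier(requests, threshold=0.5):
    """Suggest SATA for mostly sequential traffic, SAS for mostly random."""
    return "SATA" if sequential_fraction(requests) >= threshold else "SAS"
```

A file-serving LUN that reads block ranges back to back would stay on SATA, while the database LUN from the example would be flagged for a move to SAS.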