Software Defined X – Automation: Job Stealer or Job Enabler?

I’ve had many conversations in recent weeks about the commoditization of the data center, with many people concerned about the diminishing need for specialist hardware and the move to greater automation through software. More specifically, they worry about how that might affect the job prospects of administrators and other technical roles in the modern IT environment.

We are in an era of rapid evolutionary change, and, as change often is, this can be unsettling for many. Reactions vary widely. At one end of the spectrum there is complete denial and a desire to retain the status quo, with an expectation that these industry changes may never occur. In the middle are those who tip their hat in recognition of the general direction of the trends but expect things to happen more gradually. At the other end are those who embrace the change, expecting to gain some level of competitive advantage by being a first mover. If one thing is certain, it is that if you find yourself at the wrong end of that spectrum, you will most definitely find yourself in difficulty.

No Change Around Here

The change is happening, and happening more quickly than most expect. The automation of data center operations and a focus on innovation are key objectives for most organisations at the moment. “Keeping the lights on” tasks are becoming less relevant in this world.

Casting Off the Shackles of Hardware

Developing custom, hardware-based intelligence is complex. It often involves the research and production of custom chipsets for these devices, and due to the research, prototyping and production requirements of this type of operation, vendors are usually working to a 2-3 year development and release cycle. In fact, most organisations have been used to a similar procurement cadence, executing a hardware refresh every 3-5 years.

This has worked historically, but today there are new kids on the block and they are eating the market with a new approach to developing and delivering services. Pioneers like Facebook, Google and Netflix have fundamentally changed how service delivery works. These operations have decoupled their software intelligence from the hardware and deliver their services on inexpensive commodity hardware. This not only reduces their capital outlay, it also provides them with a platform to rapidly deliver agile software services. In these types of environments, it is not uncommon to see software releases move from an 18-24 month cycle to a daily or weekly cycle. Strategically, they can pivot at a moment’s notice and they can easily scale or contract operations at very low cost. As you might imagine, this kind of agility has become very challenging from a competitive standpoint for companies like Microsoft, which have had 3-4 year major release cycles baked into the fibre of their operational approach (e.g. Exchange, Windows Server, etc.).

What About Automation?

The more we move towards software-controlled infrastructures, the more easily they can be automated. Most solutions today are built with some kind of API (application programming interface) to enable other applications to programmatically control or manage them in some way. In this decade, the industry has moved firmly away from proprietary API technologies towards standardised ones, more often than not based on the RESTful architectural style. Alongside this, we are starting to see the rise of DevOps tools such as Puppet and Chef, which help bridge the gap between IT operations and the developers actually creating the applications that organisations rely on.
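To make that a little more concrete, here is a minimal sketch of the kind of programmatic control a RESTful API enables. The endpoint, token, payload and response fields are hypothetical placeholders, not any particular vendor’s API:

```python
import requests  # common HTTP client library

# Hypothetical management endpoint and token -- placeholders, not a real vendor API.
BASE_URL = "https://storage-array.example.com/api/v1"
HEADERS = {"Authorization": "Bearer REPLACE_WITH_API_TOKEN",
           "Content-Type": "application/json"}

def create_volume(name, size_gb):
    """Ask the (hypothetical) array to provision a new volume via its REST API."""
    payload = {"name": name, "sizeGB": size_gb}
    response = requests.post(f"{BASE_URL}/volumes", json=payload, headers=HEADERS)
    response.raise_for_status()  # fail loudly if the API rejects the request
    return response.json()

def list_volumes():
    """Return the volumes the array currently exposes."""
    response = requests.get(f"{BASE_URL}/volumes", headers=HEADERS)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    create_volume("web-tier-01", size_gb=100)
    for volume in list_volumes():
        print(volume["name"], volume["sizeGB"])
```

Because every step is just an HTTP call, the same provisioning task can be scripted, scheduled or wired into a DevOps tool such as Puppet or Chef rather than performed by hand.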

So What Does This Mean For the Modern IT Professional?

As the development of these tools and API interoperability progresses, IT operations roles will undoubtedly have to evolve. This does not mean that there will be fewer jobs in IT. In fact, IT skills have become more relevant than ever, but those skills have to change a little. It is time to start moving up the stack, putting more focus on innovation at the application and service level rather than on keeping the lights on down in the bits and bytes of the infrastructure. By doing this, these industry changes should become a massive job and career enabler, not a cause of suspicion and concern over job security.

I had a chat with a family member this week which summed this up really well for me. We were discussing the Luddites, a 19th-century movement in my home region of the North of England. The Luddites were a group of textile workers who protested against the mechanisation of garment production. They did so violently, under the banner of “those machines are taking our jobs, we’ll have nothing to do and we’ll all starve”. A couple of hundred years on, we can see that starvation didn’t happen and those same people survived by finding new ways to innovate. On a side note, I once received a letter from a CBE who had seen me on TV discussing an environmental issue, calling me a Luddite. I found this most amusing given the industry I work in and my lust for technological progress. In the same conversation with the family member, I mentioned that I was looking forward to the introduction of robot taxis (e.g. self-driving Google cars) due to the efficiencies and cost savings of car sharing. They replied, “but that could be 20,000 taxi drivers losing their jobs in Manchester alone”. I replied, “Yes, but that’s also 20,000 people who could alternatively be working on curing cancer, pioneering space travel or solving the world’s energy problems”.

Conclusion – Software Defined X – Automation: Job Stealer or Job Enabler?

For me, it is a job enabler. My advice: embrace the change, relish the opportunity to innovate and change the world for the better… one step at a time.

Whitepaper: Virtual Backup Strategies: Using Storage Snapshots for Backups

Introduction
Effective data protection is a mandatory element of the modern IT environment. Historically, backup strategies were confined to the last few chapters of an administrator’s manual and treated like an afterthought. Now they sit firmly at the forefront of every CIO’s mind. The ability to continue business operations after a system failure and the need to fulfil stringent compliance requirements have made backup a necessity—not only for business continuity, but also for business survival. The question organizations need to ask about data protection is not whether to back up their data, but how to back it up.

IT systems evolve rapidly and present a constantly shifting landscape, and the techniques used to protect those systems need to evolve as well. Perhaps one of the most significant changes in recent years has been the advent of virtualization. In the virtual world, legacy backup systems have become unfit for purpose, causing backup windows to grow beyond a manageable scope. While this shift presents new challenges, it also creates new opportunities to improve efficiency, cut costs and reduce risk.

This paper will examine the use of storage snapshots as backups for virtual environments. We will evaluate their relative benefits and limitations, and consider where they fit into a holistic backup strategy when compared with a virtual disk-to-disk backup solution such as Veeam® Backup & Replication™.

Background
Pre-virtualization backup strategies were underpinned by operating system (OS) and application-level features. A typical implementation involved installing a backup agent into the OS; the agent was responsible for putting applications into a consistent state for backup, copying backup data across the network to a backup server, and subsequently monitoring any ongoing changes.

While this worked well in the physical world, virtualization changed everything as operating systems began to share the same physical hardware. Instead of having one backup agent consuming resources on a physical host, there was an agent for each virtual machine (VM) on that host. This meant that ten agents (based on a 10:1 consolidation ratio) or even more could be contending for the host’s CPU, RAM and disk resources, not only with each other, but also with the applications they were installed to protect. In addition, volumes of data increased to a level where it was no longer feasible to use standard transports to move them across the production network to the backup server. This situation clearly could not continue as virtualization became standard practice in data centers worldwide.

Virtualized Layers

Where virtualization presented new challenges, it also presented new opportunities. The physical world consisted solely of the application/OS layer. The virtual world…

PernixData Unbuttons Trench Coat at SFD3 and Reveals..

Flash Virtualization Platform.


It seems in storage circles there is much discussion around using new technologies to cache data in faster, more accessible media. Flash is everywhere… but there are many choices for where and how you can deploy flash technology in order to alleviate the strain on storage systems, whose SAS/SATA-based disk drives are struggling to keep up with the day-to-day IOPS requirements of many organisations.

PernixData believe they have the solution for VMware environments in the form of their Flash Virtualization Platform (FVP). They certainly have the credentials to be making such claims, with team members coming from many leading companies. This includes Satyam Vaghani, their CTO, who came from VMware and was responsible for the creation of VMware’s VMFS file system.

FVP is a hardware-agnostic, server-side flash virtualization solution. It virtualizes PCIe flash cards and local SSDs at the server side. What I found particularly impressive is that it is a software-only solution that looks very easy to implement. It sits seamlessly between the hypervisor and the SAN without requiring any configuration changes to either the datastores at the hypervisor or the LUNs at the SAN. It’s just an extension that is easily implemented on vSphere.

The clustered nature of the product also overcomes some of the current server-side flash device challenges. When flash caching is used in a server, a footprint of hot, commonly accessed data is built up for the workload running there. If a VM (and its associated workload) migrates to another host due to vMotion or some other reason, that footprint normally needs to be recreated from scratch. FVP resolves this by replicating copies of the footprint data to other hosts in the cluster, making it easy for a VM to pick up its cached footprint if it moves. There are also obvious data protection benefits to keeping multiple copies of the data in the event a server dies along with its cache.

In addition to easy implementation, the product also provides solid, easy-to-read statistics on the results it has achieved, a sure-fire way to build a solid business case around IOPS saved and the reduced requirement to scale up your SAN to deal with load.

What these new caching capabilities amount to is an entirely new storage tier between RAM and SAN. This new tier (or layer) will definitely come with challenges. One such challenge is ensuring consistent copies of data at the SAN for things like backup processes. If FVP caches the data at the server, some reads/writes never actually reach the SAN, so if you are backing up from the SAN you need a way to flush the data through in a consistent state. FVP does include a “write-through” mode, which should flush changes to disk and stop caching (as opposed to “write-back” mode). In order to achieve consistency, there will need to be careful orchestration from VSS (or pre-freeze/post-thaw scripts on Linux) to FVP to the VMware snapshot and beyond. The product will have a PowerShell interface, which could be used to switch between write modes for such an operation, but users should be aware that this is a requirement.
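To illustrate the kind of orchestration I mean, here is a minimal sketch of the ordering a backup job would need to follow. The helper scripts (quiesce-app.sh, set-cache-mode.ps1, take-snapshot.sh) are hypothetical placeholders for the VSS/pre-freeze hooks, the FVP PowerShell interface and the snapshot call; they are not real FVP cmdlets, just a way of showing the sequence:

```python
import subprocess

VM_NAME = "app-server-01"

def run(cmd):
    """Run an external orchestration step and stop the job if it fails."""
    subprocess.run(cmd, check=True)

def backup_with_consistent_cache():
    # 1. Quiesce the application (VSS on Windows, pre-freeze scripts on Linux).
    run(["quiesce-app.sh", VM_NAME])                          # hypothetical helper
    try:
        # 2. Switch the server-side cache to write-through so pending writes
        #    are flushed to the SAN and nothing new is held back at the host.
        run(["set-cache-mode.ps1", VM_NAME, "write-through"])  # hypothetical wrapper
        # 3. Take the snapshot now that the array holds consistent data.
        run(["take-snapshot.sh", VM_NAME])                     # hypothetical helper
    finally:
        # 4. Always return to write-back caching and unfreeze the application.
        run(["set-cache-mode.ps1", VM_NAME, "write-back"])
        run(["unquiesce-app.sh", VM_NAME])

if __name__ == "__main__":
    backup_with_consistent_cache()
```

The important detail is the try/finally structure: whatever happens to the snapshot step, the cache is switched back to write-back and the application is unfrozen.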

All in all, FVP looks like a great product that hits a lot of my hot buttons: easy to install, easy to use, transparent but powerful, with solid reporting on results. Although it is not publicly available as yet, it will be interesting to see the licensing model, TCO and ROI information PernixData provide. If they get that right, they could be very successful.

Here is the intro from Storage Field Day 3:

 

Webinar: Disaster Recovery for Virtual Environments, One Simple Solution for Five Common SAN Replication Challenges

This is a replay of a webinar I ran last year; the associated whitepaper is linked below:

Whitepaper Available here: http://wp.me/p2ZZG3-fG

A new sister webinar/whitepaper, focusing on using SAN snapshots in a holistic data protection strategy, will be posted shortly.

Whitepaper: Disaster Recovery for Virtual Environments, One Simple Solution for Five Common SAN Replication Challenges

Introduction
It would be no overstatement of fact to say that in the last five years virtualization has radically changed the landscape of IT infrastructure for the better. Workloads encapsulated into standardized virtual machines have significantly increased our ability to optimize and use physical resources in a way that saves much time and money. In addition to these economic benefits, new avenues have opened up to tackle data protection and disaster recovery, allowing us to increase service uptime while also reducing business risk. This white paper focuses on some of the common challenges experienced while implementing and using SAN-based replication for disaster recovery and it examines an alternative approach to disaster recovery to help resolve these issues.

Background
Pre-virtualization disaster recovery plans were underpinned by application-level features hooking directly into specific hardware to achieve the required business recovery goals. To ensure that disaster recovery could be achieved, network infrastructure, hardware, software and application data were replicated to an offsite location, commonly referred to as the Disaster Recovery (DR) site. Depending on an application’s required Recovery Point Objective (RPO) and Recovery Time Objective (RTO), costs could spiral upwards to achieve even small improvements in either. When you increase application uptime from 99.99% to 99.999%, the cost increase is not linear but exponential. With the advent of virtualization, the infrastructure stack gained a new layer, enabling the movement of workloads between geographically dispersed locations. Importantly, this is achieved without requiring application-specific engineering, because workloads are compartmentalized and encapsulated into virtual machines. In a virtual machine, everything needed to support a workload can be encapsulated into a set of files in a folder and moved as a single contiguous entity.
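To put numbers on those “nines,” the quick calculation below shows the annual downtime each availability level permits; the figures follow directly from the 525,600 minutes in a year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

# Annual downtime permitted at each availability level.
for availability in (0.99, 0.999, 0.9999, 0.99999):
    allowed_downtime = MINUTES_PER_YEAR * (1 - availability)
    print(f"{availability:.3%} uptime allows {allowed_downtime:,.1f} minutes of downtime per year")
```

That final jump, from roughly 53 minutes to roughly 5 minutes of permitted downtime per year, is where DR costs typically balloon.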

Scope and Definitions
The virtualization and storage layers are examined below; i.e., virtual machines (VMs), hypervisors and storage area networks (SANs). Application-level replication is beyond the scope of this document.

There are many potentially overlapping terms, which people often interpret differently. For the purposes of this paper, I will use “Continuous Data Protection (CDP),” “synchronous,” “fault tolerant,” “asynchronous” and “high availability.”

CDP consists of synchronous replication, which involves double-writing to two different devices: an application (or hypervisor) only receives confirmation of a successful write to storage when both devices have acknowledged completion of the operation. CDP can help you achieve zero RPO and RTO but requires strict hardware compatibility at both the source and destination sites. It allows you to deploy VMs in a cross-site, fault-tolerant configuration, so if you have an infrastructure problem, you can fail over to the DR site without any downtime.

Synchronous solutions are expensive and require a lot of network bandwidth but are appropriate for some mission-critical applications where no downtime or data loss can be tolerated. One issue with synchronous replication is that data is transferred to the DR site in real time. This means that if the disaster is driven by some kind of data corruption, malware or virus, then the problem that brings down the production site simultaneously does the same to the DR site. This is why synchronous implementations should always be combined with an asynchronous capability.

This paper is primarily concerned with asynchronous replication of virtual infrastructures for disaster recovery purposes. An asynchronous strategy takes a point-in-time copy of a portion of the production environment and transfers it to the DR site within a time frame that matches the required RPO/RTO goals; this may be “near real-time/near CDP” or scheduled (hourly, daily, etc.). This is more akin to high availability than fault tolerance. High availability in virtual environments refers primarily to having cold standby copies of VMs that can be powered on and booted if the live production VM is lost. This approach underpins most currently implemented DR strategies.
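The difference between the two acknowledgement models is easy to see in a toy sketch. The class below is purely illustrative (in-memory lists standing in for storage devices) and is not tied to any particular replication product:

```python
from collections import deque

class ToyReplicator:
    """Toy model contrasting synchronous and asynchronous acknowledgement."""

    def __init__(self):
        self.primary = []          # blocks held at the production site
        self.dr_copy = []          # blocks held at the DR site
        self.pending = deque()     # asynchronous replication queue

    def write_sync(self, block):
        # Synchronous/CDP style: acknowledge only after BOTH sites hold the block.
        self.primary.append(block)
        self.dr_copy.append(block)
        print(f"ack {block!r} (primary + DR committed)")

    def write_async(self, block):
        # Asynchronous style: acknowledge once the primary has it; DR catches up later.
        self.primary.append(block)
        self.pending.append(block)
        print(f"ack {block!r} (primary committed, DR pending)")

    def replicate_cycle(self):
        # Runs on the chosen RPO schedule (near real time, hourly, daily, ...).
        while self.pending:
            self.dr_copy.append(self.pending.popleft())
        print(f"DR site now {len(self.primary) - len(self.dr_copy)} blocks behind")

r = ToyReplicator()
r.write_sync("block-1")
r.write_async("block-2")
r.replicate_cycle()
```

In the synchronous path the write is only acknowledged once both copies exist (zero RPO); in the asynchronous path the DR copy lags until the next replication cycle, and that lag is exactly the RPO you have chosen to accept.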

The next section examines how SAN technologies approach asynchronous replication and the differences between SAN-level and VM-level strategies to achieve DR objectives.

SAN Replication Overview
SAN devices are typically engineered to aggregate disk resources to deal with large amounts of data. In recent years, additional processing power has been built into the devices to offload processing tasks from hosts serving up resources to the virtual environment. The basic unit of management for a SAN device is a Logical Unit Number (LUN). A LUN is a unit of storage, which may consist of several physical hard disks or a portion of a single disk. There are several considerations to balance when specifying a LUN configuration. One LUN intended to support VMs running Tier-1 applications may be backed by high-performance SSD disks, whereas another LUN may be backed by large, inexpensive disks and used primarily for test VMs. Once created, LUNs are made available to hypervisors, which in turn format them to create volumes; e.g., Virtual Machine File System (VMFS) on VMware vSphere and Cluster Shared Volume (CSV) on Microsoft Hyper-V. From this point on, I will use the terms “LUN” and “volume” interchangeably. A LUN can contain one or more VMs.

Figure: SAN LUN Configuration

For SAN devices, the basic mechanism for creating a point-in-time copy of VM disk data is the LUN snapshot. SANs are able to create LUN-level snapshots of the data they are hosting. A LUN snapshot freezes the entire volume at the point it is taken, while read/write operations continue, without halting, against another area of the array.
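As a rough mental model of how a snapshot can freeze a volume while new writes land elsewhere on the array, here is a small, purely illustrative sketch of a redirect-on-write block map. It is not any vendor’s implementation, just the general idea:

```python
class ToyLUN:
    """Toy redirect-on-write LUN: a snapshot freezes the current block map,
    while subsequent writes land in newly allocated physical blocks."""

    def __init__(self, num_blocks):
        self.store = {}                                     # physical block id -> data
        self.live_map = {i: i for i in range(num_blocks)}   # logical -> physical
        self.snapshots = []                                 # frozen copies of live_map
        self.next_free = num_blocks

    def write(self, logical_block, data):
        # Writes are redirected to a fresh physical block, so any frozen
        # snapshot map still points at the old data.
        new_block = self.next_free
        self.next_free += 1
        self.store[new_block] = data
        self.live_map[logical_block] = new_block

    def snapshot(self):
        # A snapshot is just a frozen copy of the logical->physical map.
        self.snapshots.append(dict(self.live_map))
        return len(self.snapshots) - 1

    def read(self, logical_block, snapshot_id=None):
        mapping = self.live_map if snapshot_id is None else self.snapshots[snapshot_id]
        return self.store.get(mapping[logical_block], b"")

lun = ToyLUN(num_blocks=4)
lun.write(0, b"original")
snap = lun.snapshot()
lun.write(0, b"changed after snapshot")
print(lun.read(0))        # b'changed after snapshot'
print(lun.read(0, snap))  # b'original'
```

One design point worth noting: the snapshot here is just a frozen map rather than a full copy of the data, which is what makes LUN snapshots quick to create.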

A Short History of Storage Devices

I can’t really ever see myself being a technology historian, but I do find the development of technology and technological advancement astonishing at times. It’s easy to forget that some years ago I used to think that my new 20MB hard drive was the bee’s knees… “It’s 20MB of hard disk drive, I’ll never need to upgrade it. I could store my entire life’s work on it and still have room to back up my floppies.” How times change. I recently found this infographic on the spamfighter blog, which I think summarizes the rise of storage perfectly. Check out the difference in per-GB cost between 1980 and 2010: