My Loyalties….

This is a modified version of a post I made on VMware’s internal Socialcast, expressing my feelings about how important it is that we work together.  A reminder:  this is *my* blog, not an EMC, VMware or Pivotal blog.

My paycheck comes from David Goulden.  I have an emc.com email address.  I give briefings at the EMC EBC on the EMC strategy.  I train EMC teams on the integration points between EMC and VMware and Pivotal.  In that sense, you could say I’m an EMCer.  I’d agree with you.

I sit in PromD almost every day.  I have a vmware.com email address.  I have a VMware badge.  I train EMCers on how to use and sell VMware products.  I write joint messaging documents.  I’m a VCDX and VCDX panelist.  I’m a vExpert for every year it’s existed.  I give presentations at the VBC alongside VMware teams at least monthly.  In that sense, you could say I’m a VMware-ite (er?).  I’d agree with you.

My badge works in the Pivotal and Greenplum buildings.  I write apps in Python and Java using things like RabbitMQ, Redis and GemFire against PivotalHD.  I believe that PaaS is the future of this industry.  In that sense, you could say I’m a Pivoter.  I’d agree with you.

Really, I believe I am an evangelist for the Federation.  I don’t believe that EMC alone has what it takes to be the best in this market – we lack some very critical things around agility, understanding of applications and the mobile world.  These are things that VMware and Pivotal have down pat.  In turn, I think that EMC brings value in a depth and breadth of physical infrastructure that Pivotal and VMware don’t have.  There’s a reason these three companies exist together…

…because, if we work together, we can be the best choice for a customer.

As a result, I push for us to improve everywhere I can.  Sometimes that means telling an internal EMC team to use VSAN rather than a VNX (which I did yesterday), sometimes it means being honest about cost models so we all know where we stand on things like $/GB, sometimes it means suggesting an EMC product to fill a gap for a customer where VMware doesn’t have a play, and sometimes it means admitting when an EMC product isn’t the best choice to use at VMware (in my role as their global architect) and suggesting something else.

I’m not emotional about specific products – but I am absolutely invested in this.  I know beyond a shadow of a doubt that my success (personally, professionally, monetarily) is strongly tied to the success of the Federation as a whole.  I *need* products like VSAN, vC Ops, ScaleIO, ViPR, etc. to win in the market so I can feed my family, and I will work as hard as I can to make that happen.  Ultimately, I think that this will benefit the customers.  As Chad wrote in his pre-sales manifesto: “We put the customer first, company second, and ourselves third.”

We will all win together: customers, partners, VMware, EMC and Pivotal.

I’m with you.

VCP Recertification

VMware announced recently that they would start requiring re-certification of VCPs.  I’m not sure I feel like this is a good call.

Their reasoning is twofold (as far as I can tell):

  1. Ensuring that VCP holders’ knowledge is current.  “But staying up to date in the expertise gained and proven by your certification is equally vital. If your skills are not current, your certification loses value.”
  2. Most other industry certifications require this as well. “…and is on par with other technical certifications like ones offered by HP, Cisco and CompTia (A+)” – Christian Mohn

I think both of these arguments are specious.

  1. VCPs are tied to a specific version of vSphere…they aren’t ‘version agnostic’.  Check out my transcript below:

    [Screenshot: my VMware certification transcript]

    Every certification I’ve received is version-specific, meaning there’s nothing to ‘keep up to date’.  vSphere 3.5 hasn’t changed, so my VCP3 shouldn’t need to be updated.  Clearly, if I don’t hold VCP4 or VCP5 (or some other, higher certification), I can’t show that I’ve been keeping up with the technology, and my knowledge of the current product line may well be outdated – but that doesn’t change what my VCP3 attests to.  There’s no reason to remove my ability to use the VCP3 logo…though perhaps VMware should drop the unversioned logo:

    [unversioned VCP logo]

    in favor of versioned logos like this:

    [VCP5 logo]

    so that it’s clearer how current a given person’s certification is.

  2. Other vendors do expire certifications, but the majority of vendors also don’t specify a version number on the certification itself.  The CCNA and CCIE, from everything I understand, are not version-specific, so having to ‘recertify’ is entirely reasonable, because the technology they refer to has changed.

So there you have it – I think the re-certification requirement is silly.

Quick Post: VNX Snapshots Performance

I’ve recently been working on a design for VNX behind VPLEX, and wanted to do some pseudo-continuous data protection.  Now, normally one would use RecoverPoint on VPLEX for that, but there are some current limitations that made that not work for me – specifically, the fact that RecoverPoint CDP can only be used with one side of a VPLEX MetroCluster at a time.  If that side of the cluster goes down your data stays up (good), but you lose your CDP with it (bad).

So, the next option would have been to do RecoverPoint directly on the arrays (VNX) using the built-in splitter…this is also a reasonable idea, but it has a downside – RecoverPoint requires LUNs (called copy LUNs) that are the same size as the production devices.  So for a single CDP copy at either side of a 100GB production device, we are storing 300GB of data (plus another 50 or so for journals).  Because space is at a premium in this environment, I wanted to look at something a bit more ‘thin.’

So, I turned to the pool-based snapshots in VNX that were released with the Inyo codebase (and still available, of course, in Rockies on the VNX2 series).  I like these because they consume space from the same pool the production VM is in (no need to strand space in dedicated snap devices), and they consume only as much space as has been written to the LUN.  Lastly, they use a redirect-on-write technique to avoid the performance hit of copy-on-first-write that the older SnapView snapshots suffered from.  As an FYI, these are sometimes called ‘Advanced Snapshots’.

But – how impactful are the snapshots?  How much do they affect performance?  I decided to test this, to see if it would be a reasonable thing to propose to my customer.

I set up a quick test.  I used a VNX5500 in my lab (so not the current VNX2 series, and also a much lower end model than the customer’s VNX7600), and used 25x300GB 10K SAS drives along with 4x100GB EFDs in a RAID5 pool.  Carved out a 2TB LUN and allocated it to a host.

I started off with just a streaming test of 100% write traffic, and was able to achieve about 550MB/sec sustained (about 5.5Gbit).  Next, I wrote a loop to write data and create snapshots as I went:

for i in {1..96}; do
    # create a snapshot of resource (LUN) 4, kept for one day
    naviseccli snap -create -res 4 -keepFor 1d
    # write 10GB of fresh data to the filesystem on that LUN
    sudo dd if=/dev/zero of=/mnt/A/10G bs=1M count=10240
    echo $i
done

Every iteration of the loop therefore had 10GB of new data, as far as the array was concerned. I let this run 96 times, to simulate hourly snapshots over 4 days.  During this process, I kept track of the write performance.  Here’s a snippet:

10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 18.988 s, 565 MB/s
6
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 17.2495 s, 622 MB/s
7
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 17.4732 s, 615 MB/s
8
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 19.1517 s, 561 MB/s
9
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 18.7212 s, 574 MB/s
10
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 18.8609 s, 569 MB/s
11

As you can see, it’s very consistent, and there’s no real degradation in performance while taking the snapshots.
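
If you’d rather quantify ‘consistent’ than eyeball it, here’s a minimal sketch that pulls the MB/s figures out of the dd output and summarizes them (the ddlog.txt filename is just a stand-in for wherever you captured the loop’s output):

import re
import statistics

rates = []
with open('ddlog.txt') as f:          # hypothetical capture of the loop's output
    for line in f:
        # matches lines like: "10737418240 bytes (11 GB) copied, 18.988 s, 565 MB/s"
        m = re.search(r'copied, [\d.]+ s, (\d+(?:\.\d+)?) MB/s', line)
        if m:
            rates.append(float(m.group(1)))

print('%d runs, mean %.0f MB/s, min %.0f, max %.0f, stdev %.1f' % (
    len(rates), statistics.mean(rates), min(rates), max(rates), statistics.pstdev(rates)))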

Lastly, I ran a similar test while deleting all those snapshots in the background, to make sure that the customer wouldn’t experience any degradation as the snapshots aged out and were deleted as time rolled on.  Another snippet:

10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 19.1024 s, 562 MB/s
13
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 20.2217 s, 531 MB/s
14
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 21.1915 s, 507 MB/s
15
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 20.8978 s, 514 MB/s
16
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 19.8404 s, 541 MB/s
17
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 19.5746 s, 549 MB/s
18

Again, no notable performance difference.  That’s good news, as it means I can suggest this approach to my customer without concern.

ScaleIO @ Scale – Update

If you saw my recent post about pushing the limits of ScaleIO on AWS EC2, you’ll notice that I had a few more plans.  I wanted to push the node count even higher, and run on some heavier-duty instances.

Well, the ScaleIO development team noticed what I had done, and decided to take my code and push it to the next level.  Using the methods I developed, they hit the numbers I had been hoping for: 1,000 nodes across 5 protection domains, all on a single MDM and a single cluster.

[Screenshot: 995 SDS nodes, 400 SDC clients, 100 volumes, 923,773 IOPS]

As you can see, the team was able to get 100 volumes built and 400(!) clients.  Most impressively, using minimal nodes (I believe these to be m1.medium instances), they achieved 3.5 GBytes/s (yes, bytes!) – 28 gigabits worth of performance across the AWS cloud.  Also, very nearly 1 million IO/s.  Needless to say, I was floored when I saw these results.

Special thanks to the ScaleIO team – Saar, Alexei, Alex, Lior, Eran and Dvir – who ran this test (with no help from me, no mean feat in itself given how undocumented my code was!) and produced these results.

Lastly, I also got my hands on 10 AWS hi1.4xlarge instances, which have local SSDs.  Unfortunately, I managed to delete most of the screenshots from that test, but I was able to achieve 3.5-4.0 GBytes/sec using 10 nodes on the same 10Gbit switch.  Truly impressive.  And, since a number of people have asked about latency: average latencies in that test were ~650 µsec!  The one screenshot I was able to grab was during a rebuild, after I had removed and replaced a couple of nodes.

[Screenshot: the ScaleIO dashboard during that rebuild]

Rebuilding at 2.3GB/s is something you rarely see :).

I’m really happy to be able to share these cool updates from the team.  Feel free to ask questions.

 

ScaleIO @ Scale – 200 Nodes and Beyond!

Ever since my last post a couple of weeks ago about ScaleIO, I’ve been wanting to push its limits.  Boaz and Erez (the founders of ScaleIO) are certainly smart guys, but I’m an engineer, and whenever anyone says ‘It can handle hundreds of nodes’, I tend to want to test that for myself.

So, I decided to do exactly that.  Now, my home lab doesn’t have room for more than a half dozen VMs.  My EMC lab could probably support about 50-60.  I was going for more – WAY more.  I wanted hundreds, maybe thousands.  Even EMC’s internal cloud didn’t really have the scale that I wanted, as it’s geared for longer-lived workloads.

So, I ended up running against Amazon Web Services, simply because I could spin up cheap ($0.02/hr) t1.micro instances very rapidly without worrying about cost (too much – it still ain’t free).  They have an excellent API, and the boto Python library makes it easy to drive.  Combine that with the paramiko SSH library and you have a pretty decent platform for deploying a bunch of software.
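
To give a flavor of it, here’s a minimal sketch of spinning up a batch of t1.micro instances with boto 2 (which is what was current at the time); the AMI ID, key pair and security group names are placeholders, not what I actually used:

import time
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')       # credentials come from the usual boto config
reservation = conn.run_instances(
    'ami-00000000',                                   # placeholder AMI (a stock CentOS image, say)
    min_count=50, max_count=50,
    instance_type='t1.micro',
    key_name='scaleio-test',                          # placeholder key pair
    security_groups=['scaleio-test'])                 # placeholder security group

# Poll until everything is running, then collect the public DNS names for the SSH stage.
instances = reservation.instances
while any(i.state != 'running' for i in instances):
    time.sleep(15)
    for i in instances:
        i.update()
hosts = [i.public_dns_name for i in instances]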

Some have asked why I didn’t use the Fabric project – I didn’t feel that its handling of SSH keys was quite up to par, nor was its threading model.  So rather than deal with it, I used my own thread pool implementation.
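
For a sense of the shape of that, here’s a minimal sketch of a queue-based worker pool pushing one command to every node over paramiko – a fresh illustration, not the actual code from my repo, and the key path and function names are made up:

import queue
import threading
import paramiko

def run_on_host(host, command):
    """Run one command on one host over SSH and return its output."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username='root', key_filename='/path/to/aws-key.pem')  # placeholder key
    stdin, stdout, stderr = client.exec_command(command)
    output = stdout.read()
    client.close()
    return output

def worker(work, results, command):
    while True:
        try:
            host = work.get_nowait()
        except queue.Empty:
            return
        try:
            results[host] = run_on_host(host, command)
        except Exception as exc:
            results[host] = exc        # nodes that fail get flagged for replacement
        finally:
            work.task_done()

def run_everywhere(hosts, command, pool_size=50):
    """Fan a command out to all hosts using pool_size concurrent SSH sessions."""
    work = queue.Queue()
    for h in hosts:
        work.put(h)
    results = {}
    threads = [threading.Thread(target=worker, args=(work, results, command))
               for _ in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

Capping the pool size is the whole point – you get plenty of parallelism without trying to open an SSH session to every node at once.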

Anyways – where did I end up?  Well, I found that about 5% of the deployed systems (when deploying hundreds) would simply fail to initialize properly.  Rather than investigate, I just treated them as cattle: shot them in the head and replaced them.  After all the nodes were built and joined to the cluster, I created 2 x 200GB volumes and exported them back out to all the nodes.  Lastly, I ran a workload generator on them to drive some decent IO.

I ended up being able to shove 200 hosts into a cluster before the MDM on ScaleIO 1.1 refused to let me add any more.  I haven’t identified yet whether that is actually the limit, nor have I tried with ScaleIO 1.2.  But – you can bet it’s next on my list!

What does it all look like?

Here are the nodes in the Amazon Web Services Console…

[Screenshot: the t1.micro instances listed in the AWS console]

And then they’ve all been added to the cluster:

[Screenshot: the ScaleIO GUI showing all of the nodes joined to the cluster]

Then, I ran some heavy workload against it.  Caveat: Amazon t1.micro instances are VERY small – limited to less than 7MB/s of throughput each, with about half a CPU and only about 600MB of RAM.  As a result, they do not reasonably represent the performance of a modern machine.  So don’t take these numbers as what ScaleIO is capable of – I’ll have a post in the next couple of weeks demonstrating what it can do on some high-powered instances.

[Screenshot: the ScaleIO dashboard during the workload test]

Pushing over 1.1GB/s of throughput (and yes, that’s gigabytes/sec, so over 10Gbits of total throughput) across almost 200 instances.
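
That lines up with the per-instance throttle mentioned above – a quick sanity check (the exact node count is my guess; it was “almost 200”):

nodes = 195              # instances actually pushing IO – my estimate
per_node_mb = 7          # rough per-instance throughput cap on a t1.micro
print(nodes * per_node_mb / 1024.0)   # ~1.3 GB/s ceiling, so 1.1GB/s observed is in the right ballpark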

[Screenshot: the individual host view in the ScaleIO GUI]

The individual host view also shows some interesting info, although I did notice a bug where, if you have more than a couple dozen hosts, they won’t all show individually in the monitoring GUI.  Oh well – that’s why we do tests like this.

Lastly, when I terminated all the instances simultaneously (with one command, even!), I caught a pic of the very unhappy MDM status:

[Screenshot: the very unhappy MDM status]

How much did this cost?  Well, excluding the development time and associated test instance costs…running the test alone required 200 t1.micro instances @ $0.02/hr, 1 m1.small instance @ $0.06/hr, and 201 x 10GB EBS volumes @ $0.10/GB-month.  In total?  About $7.41 :).  Although if I add in the last couple of weeks’ worth of development instances, I’m at about $41.
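
If you want to check my math, here’s roughly how it pencils out (the run length is my estimate, not something I logged precisely):

hours           = 1.7                 # approximate wall-clock time the full cluster ran (my estimate)
micro_rate      = 0.02                # t1.micro, $/hr
small_rate      = 0.06                # m1.small for the MDM, $/hr
ebs_per_gb_mo   = 0.10                # standard EBS, $/GB-month
hours_per_month = 730

compute = hours * (200 * micro_rate + 1 * small_rate)
ebs     = hours * (201 * 10 * ebs_per_gb_mo / hours_per_month)
print(round(compute + ebs, 2))        # ~7.4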

Maybe @sakacc (Chad Sakac) will comp me $50 worth of drinks at EMCworld?

Lastly, you can find all the code I used to drive this test on my GitHub page.  Note – it’s not very clean, has little documentation and very little error handling.  Nonetheless, it’s helpful if you want some examples of using EC2, S3, thread pooling, etc.

I’ll have 3-5 more posts over the next week or two describing each of the stages in more depth (building the MDM, building the nodes, adding them to the cluster and running the workload generator) for the huge nerds, but for now – enjoy!

 

Quick ScaleIO Tests

I managed to get my hands on the latest ScaleIO 1.2 beta bits this week, and wanted to share some of the testing results.  I’ve been pretty impressed.

I installed a cluster consisting of 4 total nodes, 3 of which store data, and one of which provides iSCSI services and acts as a ‘tie breaker’ in case of management cluster partitions.

Each of the 3 nodes with data (ScaleIO calls these SDS nodes) was a single VM with 2 vCPUs and 1GB of allocated memory.  Very small, and I suspect I could knock them down to 1 vCPU given the CPU usage I saw during the tests.  Each one also had a single pRDM to the host’s local disk (varying sizes, but all 7200 RPM SATA).  Building the cluster was fairly simple – I just used the OVA that ScaleIO provides (although a CentOS or SuSE VM works too) and ran their installer script.  The script asks for a bunch of information, then simply installs the requisite packages on the VMs and builds the cluster based on your answers.  Of course, this can all be done manually, but the handy script is nice.

Once it was installed, the cluster was up and running and ready to use.  I built a volume and exported it to a relevant client device (the one serving iSCSI).  From there, I decided to run some tests.

The basic IO patterns were the first ones I tried, and I did pretty well:

  1. 125 MB/s sustained read
  2. 45 MB/s sustained write
  3. 385 IO/s for a 50:50 R:W 8K workload (very database like).

These are pretty great numbers for just 3 slow consumer class drives.  Normally, we’d rate a set of 3 drives like this at about 60% of those numbers.  Check out the dashboard during the write test:

[Screenshot: the ScaleIO dashboard during the write test]

After that basic test, I decided to get more creative.  I tried removing one of the nodes from the cluster (in a controlled manner) on the fly.  There was about 56GB of data on the cluster at that point, and the total time to remove it?  6 min, 44 sec.  Not bad for shuffling around that much data.  I then added that system back (as a clean system), and the rebalance took only 9 min, 38 sec – again averaging about 48MB/s (about the peak performance that a SATA drive can sustain).
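
A quick back-of-the-envelope check on those numbers (this assumes ScaleIO’s standard two-copy mirroring, so every GB of user data exists twice across the three nodes):

data_gb     = 56                      # user data on the cluster at the time
raw_gb      = data_gb * 2             # two mirrored copies
per_node_gb = raw_gb / 3.0            # roughly what lived on the node I removed
seconds     = 6 * 60 + 44

aggregate_mb_s = per_node_gb * 1024 / seconds
print(round(aggregate_mb_s))          # ~95 MB/s re-created across the two surviving drives,
                                      # i.e. ~47 MB/s each – right at a SATA drive's streaming limit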

The last set of tests I decided to run were some uncontrolled failure tests, where I simply hard shut down one of the SDS VMs to see how the system would react.  I was impressed that the cluster noticed the failure within about 5 seconds of the event and instantly began moving data around to reprotect it (again peaking around 54 MB/s).  It took about 7 minutes to rebuild…not bad!  I’ve included a little screencast of that below.

I then powered that host back on to see what the rebalance procedure looks like (remember, it’s not a rebuild anymore, because that data has already been reprotected – it’s pretty much the same as adding a net-new host).  I have another screencast for that too.

All told, I’m pretty impressed.  Can’t wait to get some heavier duty hardware (Chad Sakac, are you listening?) to really push the limits.

Barcelona Bound

Just a quick post this week (I have one coming next week about business relationships)…

A couple of people have asked me what I’m doing for telecom in Barcelona for the VMworld show.  It has certainly been quite a few years since I was super into cell phone technology (at one point I went through something like 4 phones a year!), so I had to do some research.  I knew I wanted to use my iPhone, and I wanted to use data over there, without paying the stupid fees that my carrier (Verizon) charges.

So first I had to do the research to figure out whether:

  1. my phone (a Verizon iPhone 5) uses the same frequencies as Spain
  2. my phone is GSM compatible
  3. my phone is unlocked

Some quick research showed me that all Verizon iPhone 5s are unlocked by Verizon (what a surprise!), that the frequency bands are compatible for 3G data (no significant LTE in Europe, apparently), and that I can use it on GSM.  Sweet!

Now I just needed to find a SIM that would work over there.  A bunch of friends recommended just getting a PAYG SIM once I arrived, but my Spanish is rusty, and I’d like to be able to call my family as soon as I land without paying exorbitant fees.  Also, I’m a planner, so I wanted this out of the way.  I found a bunch of options, but settled on HolidayPhone for a couple of reasons:

  • They provide a pre-cut SIM for my phone
  • They provide the SIM ahead of time and mail it to my house (from Sweden, no less!)
  • They provide a pretty good rate plan ($0.06/m to the US, and at least an hour of free incoming calls)
  • They have a clever call forwarding system so that I remain reachable on my US number, but also via a Spanish phone number.
  • They have a good data plan (550MB for 30 days for about $7).

I feel like this combo will let me call my family as much as I like (and they can call me), call my colleagues, and use Twitter/blogs/tethering etc. as much as I like (550MB is plenty for me, given I will only be there for a week and be limited to 3G performance).

Worth a shot, at least, for a pretty cheap experiment of about $60.

See everyone soon – I’m excited to see Barcelona for the first time.

The VCDX Brain Drain (or VMware Should Be A Net VCDX Exporter)

I have a growing concern about the future of the VCDX certification track.  It’s certainly a vested interest; as a current VCDX, I want the certification to be hard to achieve, well marketed and valuable within the industry.  Why?  Because I like money, of course.

So my concern is this: it appears to me that VMware is aggressively hiring VCDXs, which I think is wrong, and dangerous to the future of the certification.

Currently, VMware employs ~45 VCDXs, when I count through the VCDX page and apply some recent knowledge of moves.  In the past year, VMware has directly hired at least 4 VCDXs from partners, and I believe there may be more.  I won’t name them directly, because this article isn’t about them (I applaud their personal choices); it’s about VMware.

VMware should not be hiring VCDXs from partners.  In fact, I might go so far as to suggest VMware probably should avoid hiring existing VCDXs at all.  I believe this presents two dangers to the program itself.

  1. By draining the partner pool of VCDXs, VMware is effectively telling partners, “sure, go ahead and spend many thousands training this person up to VCDX level, paying for their hotels, defenses, etc – when you are done, we will go ahead, swoop in with a sweet offer you can’t match and take them.”  This is hardly the way to engender loyalty among partners.  The natural outgrowth of this tactic is that partners will no longer be interested in supporting the candidacy of a VCDX, simply because it wouldn’t provide them any value.
  2. The size of the VCDX pool is of crucial importance.  Too small, and customers don’t know about the certification (and therefore don’t recognize the value a partner holding it brings).  Too large, and it becomes nearly routine (think A+ certifications), and therefore of low value.  The pool needs to be large enough to be known, small enough to be a little rare, but again large enough that there is a reasonable chance a customer can find a partner with a VCDX or two (or three) on staff.  By hiring and employing so much of the VCDX pool (nearly 40%, by my count), VMware artificially limits the number of partners that will create or employ a VCDX, thus reducing the visibility and value of the certification itself.

Of all the players (partners, VMware itself, vendors, individuals), the one with the most opportunity to fix this is VMware.  With the VERY solid braintrust it already has, its existing large VCDX pool, and its extensive resources and PSO-style options, VMware should be a VCDX-production machine.  It should be trivial (and a goal) for them to hire good people, train them up to VCDX level, get them certified internally and then (eventually, after a couple of years paying their dues in PSO or what-have-you) send them out into the partner community.  This would have a number of effects:

  • The VCDX population grows to a larger size (which it needs to).
  • The partners are no longer afraid of supporting a VCDX candidacy
  • VMware gets all the VCDX power it needs

This has some parallels to other top-level certifications that colleagues have brought up.  Specifically, they mentioned the Cisco CCIE, and asked whether I also believe that Cisco should not hire CCIEs.  I’d argue that, given there are now thousands of CCIEs, the pool size is no longer a problem and Cisco is in a different position.

With VMware out of the business of hiring partner VCDXs, partners & vendors can go back to supporting VCDX candidates without fear and VMware can produce those that it needs internally.

Thoughts?

Is NSX Really New?

I had an interesting request from a colleague recently.  They had a customer suggesting that NSX was really nothing new, and could be replicated more cheaply.  In the customer’s words:

NSX is a suite of tools (not marketed that way, but it is). OpenVSwitch is a single tool. With IPTables, StrongSwan, OpenVPN and OpenVSwitch as a collective suite of tools you can get complete NSX functionality for $0 on Linux.

As you might expect, I have a few thoughts on this.  The value of a suite of products is greater than the sum of its parts.  Is it possible to get similar functionality for a narrow set of requirements using iptables, OpenVPN, Open vSwitch, etc.?  Sure.  Is the cost of that $0?  Absolutely not.  Consider the following that NSX offers (as a suite):

  • Full support from a vendor (non trivial, includes QA testing, and multiple experts to call when things go wrong).  Not something a home grown solution can offer.
  •  Full support for a huge range of operational models.  Not something a Linux-only solution can offer (as much as I love Linux and detest Windows, Windows does exist).
  •  Full REST API to integrate into a larger workflow (which is half the point of network virtualization, no?) – could be built by a lone guy, but now he’s responsible for a huge dev project.
  •  Community and Vendor Ecosystem.  Want your stuff to terminate into the real world?  Arista, Broadcom, Brocade, etc can all help with that with NSX.  Homegrown?  Maybe – maybe not.

Time costs money – it’s not free.  By the time you had built out the required software, made it resilient, added a good GUI, built integration with the common products out there, added an API and built a partner ecosystem, you would have built a company worth about $1B.  How do I know?  Because someone did – Martin Casado, and that company was Nicira – now owned by VMware and the basis for VMware NSX.