When you run a large enough infrastructure, failure is inevitable. How you handle it can be a big differentiator. With VMware Cloud on AWS, the hosts are monitored 24×7 by VMware/AWS Support as part of the service. If you pay for X hosts, you should have X available, and that includes during maintenance and failure operations.
I’m not sure “lucky” is the right word, but I did witness a host issue with a customer I was working with. True to the marketing, it was picked up and automatically remediated.
Looking at the log extract above, a new host was being provisioned in the same minute the issue was identified. Obviously this host needed to boot and join the VMware/vSAN cluster before a full data evacuation could take place from the faulty host and, finally, the faulty host was removed.
All of this was seamless to the customer. I only noticed it because a few HA alarms tripped in vCenter (these were cosmetic only).
Just another reason why you should look at the VMware Cloud on AWS service.
VMware Cloud on AWS has introduced a new host type to its lineup: the “i3en”. This is based on the i3en.metal AWS instance.
The specifications are certainly impressive, packing in 96 logical cores, 768 GiB of RAM, and approximately 45.84 TiB of raw NVMe storage per host.
It’s certainly a monster, with a 266% uplift in CPU, a 50% increase in RAM and a whopping 440% increase in raw storage per host compared to the i3. Most of the engagements I have worked on so far have turned out to be storage-limited, requiring extra hosts to handle the required storage footprint. With such a big uplift in storage capacity, hopefully workloads will trend towards filling up CPU, RAM and storage at the same time. This is the panacea of hyperconvergence.
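As a quick sanity check on those uplifts, the ratios are easy to reproduce. The i3 baseline figures below (36 logical cores with hyperthreading disabled, 512 GiB of RAM, roughly 10.37 TiB raw) are my assumption based on the commonly published i3 host spec, not taken from this post:

```python
# Assumed i3 host baseline (hyperthreading disabled, so 36 logical cores,
# ~10.37 TiB raw NVMe); the i3en figures come from the spec quoted above.
i3 = {"cores": 36, "ram_gib": 512, "raw_tib": 10.37}
i3en = {"cores": 96, "ram_gib": 768, "raw_tib": 45.84}

for key in i3:
    ratio = i3en[key] / i3[key]
    print(f"{key}: {ratio:.2f}x the i3")
```

That works out to roughly 2.7× the CPU, 1.5× the RAM and 4.4× the raw storage, which lines up with the uplifts quoted above.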
Two other changes stand out. First, the processor is from a much later Intel family: 3.1 GHz all-core turbo Intel® Xeon® Scalable (Skylake) processors, a much more modern part than the Broadwells in the original i3. This brings a number of processor extension improvements, including Intel AVX, AVX2 and AVX-512. Second is the networking uplift, with 100 Gb/s available to each host.
AWS Host Pricing (On-demand in US-East-2 Ohio)
*The i3.metal instance, when used with VMware Cloud on AWS, has hyperthreading disabled.
At present this host is only available in the newer SDDC versions (1.10v4 or later) and in limited locations.
It also looks like the i3 still has to be the host type used in the first cluster within the SDDC (where the management components reside), and i3en hosts aren’t supported in 2-node clusters.
At the time of writing, pricing from VMware was not available; however, pricing was available for the hosts if bought directly from AWS, and I assumed the VMware costs would fall broadly in line with this.
VMware have now released pricing. The figures below are for On-Demand in the AWS US-East region.
i3.metal is £6.213287 per hour and i3en.metal is £13.6221 per hour, giving:
A cost per GB of SSD instance storage that is up to 50% lower
Storage density (GB per vCPU) that is roughly 2.6x greater
Ratio of network bandwidth to vCPUs that is up to 2.7x greater
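The cost-per-GB figure is straightforward to reproduce from the hourly rates above. The i3 raw capacity of roughly 10.37 TiB is my assumption from the published i3 host spec; the other figures come from this post:

```python
# On-demand hourly rates quoted above (GBP)
i3_rate, i3en_rate = 6.213287, 13.6221
# Raw NVMe per host: i3en from the spec above, i3 assumed (~10.37 TiB)
i3_tib, i3en_tib = 10.37, 45.84

i3_per_tib = i3_rate / i3_tib        # ~£0.60 per TiB-hour
i3en_per_tib = i3en_rate / i3en_tib  # ~£0.30 per TiB-hour
saving = 1 - i3en_per_tib / i3en_per_tib * i3en_per_tib / i3_per_tib
saving = 1 - i3en_per_tib / i3_per_tib
print(f"i3en storage is {saving:.0%} cheaper per TiB-hour")  # → 50%
```

On these assumptions, i3en raw storage comes out at about half the hourly cost per TiB of the i3, matching the “up to 50% lower” figure.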
This new host type adds a further complication to choosing host types within VMware Cloud on AWS, but it makes for a very compelling solution.
As previously mentioned I have been working a lot with VMware Cloud on AWS and one of the questions that often crops up is around an approach to monitoring.
This is an interesting topic, as VMC is technically “as a service”, so the monitoring approach is a bit different. AWS’s and VMware’s SRE teams will be monitoring all of the infrastructure components; however, you still need to monitor your own virtual machines. If it were me, I would still want some monitoring on the infrastructure, and I see two reasons for doing this. Firstly, I want to check that the VMware Cloud on AWS service is delivering what I am paying for. Secondly, I still need to monitor my VMs to ensure they are all behaving properly; the added benefit is that with a good real-time view of my workload I can potentially optimise the number of VMC hosts in my fleet, reducing costs.
With that in mind, I decided to look at a few options for connecting some monitoring tools to a VMC environment to see what worked and what didn’t. I am expecting some things to behave differently, as you don’t have true root/admin access as you usually would. All of the tests will be done with the firstname.lastname@example.org account. This is the highest-level account that a service user has within VMC.
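Before pointing a full product at the environment, a quick, tool-free way to see what that restricted account can reach is the vSphere Automation REST API, using nothing but the Python standard library. The vCenter FQDN and credentials below are placeholders, not real values; the session and host-inventory endpoints are the standard ones from vSphere 6.5 onwards:

```python
import base64
import json
import ssl
import urllib.request

def session_request(vcenter, user, password):
    """Build the POST request that obtains a vSphere REST session token."""
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{vcenter}/rest/com/vmware/cis/session",
        method="POST",
        headers={"Authorization": f"Basic {creds}"},
    )

def hosts_request(vcenter, token):
    """Build the GET request for the vCenter host inventory."""
    return urllib.request.Request(
        f"https://{vcenter}/rest/vcenter/host",
        headers={"vmware-api-session-id": token},
    )

def call(request):
    """Execute a prepared request and unwrap the standard 'value' envelope."""
    ctx = ssl.create_default_context()
    with urllib.request.urlopen(request, context=ctx) as resp:
        return json.load(resp)["value"]

if __name__ == "__main__":
    # Placeholder FQDN and account -- substitute your own SDDC's vCenter
    # address and the credentials shown in the VMC console.
    vc = "vcenter.sddc.example.vmwarevmc.com"
    token = call(session_request(vc, "firstname.lastname@example.org", "password"))
    for host in call(hosts_request(vc, token)):
        print(host["name"], host["connection_state"])
```

If the account is missing a privilege, the failure shows up here as an HTTP 401/403 rather than a silently empty inventory, which makes it a handy baseline before testing a monitoring product proper.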
The first product I decided to test was Veeam ONE. This made sense for a few reasons. Firstly, I’m a Veeam Vanguard and am very familiar with the product; I also have access to the beta versions of the v10 products as part of the Vanguard program. Secondly, it’s pretty easy to spin up a test server to kick the tyres, and finally, the configuration is incredibly quick to implement.
I could easily have added a VMC vCenter to my existing Veeam servers; however, I chose to deploy a new server just for this testing. Assuming you have network access between your Veeam ONE server and the VMC vCenter, adding it to Veeam ONE is straightforward. If not, you will need to open up the relevant firewalls.
Once done, Veeam ONE performs an inventory operation and returns all of the objects you would expect. This test was run shortly after the VMC environment was created, so it doesn’t yet have any workloads migrated to it. However, as you can see below, it is correctly reporting on the hosts and VM workloads, and it correctly reports that the hosts are running ESXi 6.9.1.
I also ran a couple of test reports, and everything functioned as expected.
In part two I am going to look at using Grafana, InfluxDB and Telegraf to see whether this common open-source monitoring stack works with VMC.