Opinions

New Chapter – Same Book

Today is my first day in a new role; I’m joining EMC as an Advisory Systems Engineer on the Enterprise team. If you looked at my LinkedIn profile, this might seem like a big change from what I’ve been doing the past ten years; personally, though, it’s just the next chapter in the same book.

What book, you may ask? Well, if I had to pick my favorite author, it would be Douglas Adams: the unique blend of dry, witty humor, science fiction, technology, and frankly just the bizarre anti-climactic twists. So if I picked a favorite book, you’d think it would be his seminal work, The Hitchhiker’s Guide to the Galaxy. While I adore those books to the extent that I reference them almost daily, it would actually be Dirk Gently’s Holistic Detective Agency.

I believe I was around 10 when I first read the book, full of questions about myself and, well, life, the universe, and everything. While Hitchhiker simply pokes fun at asking such audacious questions, Dirk tackles them by understanding and exploiting the fundamental interconnectedness of all things in his seemingly erratic approach to detecting. If you haven’t read the book (go do it now), I won’t ruin it for you. But Dirk believes that everything is connected, that only by looking at everything, even the seemingly unrelated, can you solve the whole problem. So much so, in fact, that Dirk cracked his greatest case by asking a child, whom he felt was free of the filters that hide the whole solution.

The book and Dirk’s approach stuck with me as I started working in IT. I began to notice how much it applied to technology. Not that I started believing that ghosts or time travel explained issues (though often that does seem more likely); rather that, like the 10Base2 network I first managed, the problem was often down the wire; someone simply kicking back their feet could take down the whole system. The proverbial butterfly flapping its wings is an everyday truth in IT.

As I took on more complex roles, sometimes even seemingly unsolvable issues and projects, I always imagined myself as Dirk. I tried to look with an open mind, not scared of retreading previous tracks, asking seemingly stupid questions, or even posing ideas that appeared unorthodox (ironically, in technology I’ve learned that today’s unorthodox is tomorrow’s status quo).

As my roles grew from administrative and troubleshooting in nature toward designing and developing solutions, I continued this approach, always wanting to ensure my portion of the design was well connected with the whole, and that the whole worked together elegantly. I quickly realized that to be successful in this approach, I needed broad experience across all the elements that are interconnected in IT.

So I made it my goal to shift my perspective continually, to take new opportunities outside of my comfort zone, bringing my knowledge with me while learning how the aspects of IT impact one another.

Taking stock of this goal, I count the blessings of opportunities over the years. I’m so lucky to have personal experience in dimensions of IT such as:

  • Domains like programming, quality assurance, infrastructure, operations, performance/capacity, database management, and even more as a leader.
  • Technologies across network, storage, and servers; Windows, Linux, Unix, and mainframe; running and coding in languages of all flavors.
  • Industries traversing insurance, healthcare, travel, retail, real estate, and finance.
  • Organizations spanning cloud computing, software vendors, and enterprise brick and mortar.
  • Companies with employee counts in the hundreds, thousands, and hundreds of thousands.
  • Businesses ranging from young start-ups to those more than 100 years old.
  • Brands nobody knew, to employers whose commercials I saw daily on TV.

Through those roles, I’ve been an individual contributor, a team leader, and a vice president with 130+ team members. Not just progressing up a traditional career ladder, but frequently leaving management roles to jump back into individual contributor work.

I’m fortunate that through all these facets I’ve learned more about IT, business, and myself. I hope that this diversity has improved my solution designs, made me a better employee, a better leader, and maybe even a better person. In all of these I’ve strived to see how things were interconnected, and in doing so learned that all those different facets are not just connected, but important in and of themselves. Being a VP was no more valuable than being an administrator. It’s pointless to build a datacenter if there aren’t programmers to build applications to run in it; and a programmer won’t get anything done without a laptop built by an admin. None of IT matters at all without a business to contribute to. Because all these elements are so connected, each is a critical portion of making the whole successful, and every aspect deserves respect.

Over the years, I often felt that building this experience was leading me to a specific role. At times I thought that destination was solution architecture, and having this broad experience did indeed help my designs be more holistic. Later I felt it was preparing me for larger leadership roles, and I hope it did, as I could relate to and mentor everyone on my team. Though I’m continually reminded of that first goal: it’s not about the destination, but the journey itself. When I start feeling like I have enough diversity is when I need to switch it up and find something new.

So, what’s missing from the list above; why the new role and company? Well, I’ve never been on the sales side, so today I’m adding that to the list and learning a new aspect of IT. And while I’ve worked for vendors before, they were software-only vendors. Now with EMC (and soon Dell EMC) I’ll learn more about being a full solution vendor (hardware, software, and services).

I look forward to learning a new facet, to bringing my experience to the role, and to further learning how we all interconnect. I’m excited that this opportunity also allows insight into more IT shops and business models; every one is different, with its own challenges and opportunities to learn from. I look forward to trying to explain this new adventure to my children, as they always live up to Dirk’s belief and ask terrific questions with unknowing clarity.

I’d encourage you to look to the holistic side of your own role; how does what you do relate to those around you? If you’re comfortable, maybe you shouldn’t be. If you think you’re an expert in your field, maybe it’s time to change fields a bit. When is the last time you asked a peer in a different department for a perspective on how what you do is connected to them, or even asked a child for their perspective?

By | June 20th, 2016 | EMC, Opinions

“It’s not the…”

I saw this meme on my social media feed, and it reminded me of my first rule of troubleshooting.

Never, ever, try to prove it’s not your area.

If you’re in IT, you’re familiar with ‘critical’ issues. You might call them SevA, Sev1, TITSUP, outage, all-hands or something else. But we all have them, and we’ve all been involved in some way.

How many times in one of those situations did you hear:
“It’s not the network”
“It’s not the storage”
“It’s not VMware”
“It’s not my code”

What you’re really saying is: “It’s not MY fault”.

Stop thinking this way. Stop trying to prove it’s not your fault, or your area, or your systems. Instead, ask yourself how you can solve the problem, how you can make it better. Maybe your area did not create the issue at hand. But if all you’re trying to do is prove it’s not your responsibility, you’re not actually trying to solve the problem; you’re only trying to get out of the situation. It reminds me of childhood neighborhood games, yelling “1-2-3 NOT IT”. If everyone is simply trying to be “not it”, then the problem will never get solved.

Rather than trying to prove it’s not you, I urge everyone to prove it IS. Why? Well, for starters, if I keep trying to prove it’s my area, and I work under the assumption it might be… I might find out it actually is. It’s very easy to overlook a detail about why our area is part of the problem if all we’re trying to do is prove it’s not.

More than that, it’s a mindset.

The goal should always be to restore the service at any cost; does it really matter why it happened while the outage is still ongoing? If during a critical issue I can find a way to improve the area I’m responsible for enough to alleviate the pain, I can help de-escalate the situation enough to restore service, and then get to true root cause.

Everything in IT is related; the components all work together. If I leave the situation after proving it’s not my area, I’m not present in the conversation to help answer questions. We see the result as waiting to get someone back on the phone or into the war room to answer a question, delaying the resolution.

By staying engaged, I learn more about how my role fits into the larger ecosystem my area supports. With that knowledge, I improve my ability to contribute, not just to the issue at hand, but to future designs. Plus, if I wish for “my area” to grow, a.k.a. get promotions, the more I know about the other areas, the better suited I am for a wider set of responsibilities.

Digging deeper and deeper into the tools and metrics I have may help uncover the key to solving the problem. I might be able to find the nugget that helps my peer solve it. Tracing network packets can help developers; comparing IOPS volume from before the incident can point to workload changes; leveraging security tools might help find code issues. I’ve witnessed this over and over again.

I have a great real-world example of this I use when talking to teams about critical response practices.

Years ago, we were experiencing an issue where a database server would crash every night during major ETL loads. For days no one could figure out why. The database looked fine, the logs were clean, but the whole server would panic every night. I was not responsible for the operating system at the time, so I was not involved in the initial troubleshooting effort. But with the problem ongoing, the teams who were responsible started reaching out for help.

I offered to take a look. While I initially didn’t see anything of concern, I asked when the issues happened and if I could watch. The crashes came in the middle of the night, so I agreed to stay up late with the team that night to watch in real time. While the other team looked at the OS and database monitoring tools, I opened up mine: vCenter, storage, etc. Right before the crash happened, in real-time monitoring mode inside vCenter, I noticed an enormous spike in packets per second at the network layer. We repeated the workload, and the crash and the spike repeated as well.

Why and what was happening? The ETL load was causing a large influx of data over the network, increasing the packets per second. While the 10Gbps bandwidth was not a bottleneck, the virtual network card was an older model (E1000), which was overwhelming the kernel’s network processing on a single CPU; the Linux admin confirmed this after I asked him to look at each processor’s usage statistics individually. The solution was to change the virtual NIC (to VMXNET3) and enable Receive Side Scaling (RSS) to spread the network processing workload across multiple cores, avoiding starving the kernel on core 0.
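For the curious, here’s a minimal sketch of how you might spot this symptom on a Linux guest today. The exact commands we used back then aren’t recorded; the ethtool lines are illustrative only and assume a multi-queue interface named eth0:

```shell
# Show how network receive processing (NET_RX softirqs) is distributed
# across CPUs; one column far hotter than the rest suggests a single
# core is handling nearly all of the receive work.
grep -E 'CPU|NET_RX' /proc/softirqs

# With a multi-queue NIC such as VMXNET3, RSS can spread receives
# across queues/cores (interface name "eth0" is an assumption):
#   ethtool -l eth0             # show supported and current queue counts
#   ethtool -L eth0 combined 4  # hypothetical: enable 4 receive queues
```

Worth noting: the E1000 vNIC presents only a single receive queue, so no amount of in-guest tuning spreads that load; swapping the virtual hardware was the real fix.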

By looking at the tools for my area, we were able to find the data that led us down the path to the ultimate cause of the issue and solve it. It wasn’t the vSphere hypervisor causing the issue, but the monitoring at that level could point to it. I could help solve the issue even though it wasn’t my fault, because I was trying to help, not just trying to prove it wasn’t my fault.

This story also demonstrates another important point: often it is not any one person’s or any one area’s fault, but the combination of them. Which means no one team can solve it on their own.

My last point, and maybe the most important personally, is also the easiest to forget. This time it might not be your area, but next time it might be. When it is, don’t you want your peers there to help you? Moreover, isn’t it better to solve it together and make it a team problem? It might not be your culture today, but it can be, with your help.

These are all the reasons I’ve told my teams “Don’t try to prove it’s NOT your area, try to prove IT IS, because if it is, YOU can fix it, and I need it fixed”.

So if you find yourself saying: “It’s not the <my area>”. Try instead “How can <my area> help?”

By | April 27th, 2016 | Opinions, Pet Peeve, Soapbox

IsilonSD – Part 6: Final Thoughts

Now that I’ve spent some time with IsilonSD, how does it compare to my experience with its physical big brother? Where does the software-defined version fit?

This post is part of a series covering the EMC Free and Frictionless software products.
Go to the first post for a table of contents.

Overview (TL;DR version)

I’m excited by the entrance of this product into the virtualization space. Isilon is a robust product that can be tuned for multiple use cases and workloads. Even though Isilon has years of product development behind it and is currently on its eighth major software version, the virtual product is technically v1. As with any first version, there are some areas to work on. From my limited time with IsilonSD, I believe this is everything its physical big brother is, in a smaller, virtual package. However, it also brings some of the limitations of its physical past: limitations to be aware of, but also limitations I believe EMC will be working to remove in the vNext of IsilonSD.

If you ran across this blog because of interest in IsilonSD, I hope you can test the product, either with physical nodes or with the test platform I’ve put together; only with customer testing and feedback can the product mature into what it’s capable of becoming.

Deep Dive (long version)

From running Isilons across multiple use cases and companies, I always wanted the ability to get smaller Isilon models for my branch offices. I’ve worked in environments where we had hundreds of physical locations of varying sizes. In many of these we wanted file solutions in the spokes replicating back to a hub: one universal solution that applied to locations of all sizes. But the startup cost for a new Isilon cluster was prohibitive for a smaller site, leading us to leverage Windows File Servers (an excellent file server solution, but that’s a different post) for those situations. That bifurcated our file services stack, which increased management complexity, not just of the file storage itself, but of ancillary needs like monitoring and backups.

Given I’ve been running a virtualized Isilon simulator for as long as I’ve been purchasing and implementing the Isilon solution, leveraging a virtualized Isilon for these branch office scenarios was always on my wish list. When I heard the rumors that an actual virtual product was in the works (vIMSOCOOL), I expected the solution to target this desire. When IsilonSD Edge was released and I read the documentation, I continued with this expectation. I watched YouTube videos that said this was the exact use case.

It’s taken actually setting up the product in a lab to understand that IsilonSD Edge is not the product I expected it to be. Though the solution is by its nature ‘software defined’, in that it includes no hardware, it doesn’t quite fit the definition I’ve come to believe SD stands for. This is less a virtual Isilon, or a software-defined Isilon, than it is ‘bring your own hardware’: IsilonBYOH, if you will.

IsilonBYOH is, on its own merit, an exciting product and highlights what makes Isilon successful: a great piece of software sitting on commodity servers. This approach is what’s allowed Isilon to become the product it is, supporting a plethora of node types as well as hard drive technologies. You can configure a super-fast, flash-based NFS NAS as an ultra-reliable storage solution behind web servers, where you store the data once and all nodes have shared access. You can leverage the multi-tenancy options to provide mixed storage in a heterogeneous environment: NFS to service servers and CIFS for end users, talking to both LDAP and Active Directory, tiering between node types to maximize performance for newer files and cost for older ones. You can create a NAS for high-capacity video editing needs, where the current data sits on SSD for screaming-fast performance, then moves to HDD when the project is complete. You can even create an archive-class storage array with cloud-competitive pricing to store aged data, knowing you can easily scale, support multiple host types and, if ever needed, incorporate faster nodes to increase performance.

With this new product, you can now start even smaller: purchasing your own hardware, running on your own network, and still leveraging the same management and monitoring tools, even the same remote support. Plus you can replicate it just the same, including to traditional Isilon appliances.

However, to me, leveraging IsilonSD Edge does call for purchasing hardware, not simply adding it to your existing vSphere cluster and capturing unused storage. IsilonSD Edge, while running on vSphere, requires locally attached, independent hard drives. This excludes leveraging VSAN, which means no VxRail (and all the competitive HCIAs); it also means no ROBO hardware such as the Dell VRTX (and all the similar competitive offerings); in fact, just having RAID excludes you from using IsilonSD. These hardware requirements, specifically the dedicated disks, turn into limitations. Unless you’re in a position to dedicate three servers, which you’ll likely need to buy new to meet the requirements, you’re probably not putting this out in your remote/branch offices, even though that’s the goal of the ‘Edge’ part of the name.

When you buy those new nodes, you’d probably go ahead and leverage solid state drives; the cost of locally attached SATA SSDs is quickly reaching parity with traditional hard drives. But understand that IsilonSD Edge will not take advantage of those faster drives like its physical incarnation does: no metadata caching with the SD version. Nor can the SD version provide any tiering through SmartPools (you can still control the data protection scheme with SmartPools, and obviously you’ll get a speed boost with SSDs).

Given all this, the use cases for IsilonSD Edge get very narrow. With the inability to put IsilonSD Edge on top of ROBO designs, the likelihood of needing to buy new hardware, coupled with the 36TB overall limit of the software-defined version of Isilon, I struggle to identify a production scenario that is a good fit. The best-case scenario in my mind is purchasing hardware with enough drives to run both IsilonSD and VSAN, side by side, on separate drives; this would require at least nine drives per server (more, really), so you’re talking some larger machines, and again, a narrow fit.

To me, this product is less about today and more about tomorrow; release one sets the foundation for the future opportunity of a virtual Isilon.

What is that opportunity?

For starters, running IsilonSD Edge on VxRail, even deploying it directly through the VxRail marketplace; by this, I mean running the IsilonSD Edge VMDK files on the VSAN datastore.

Before you say the Isilon protection scheme would double-down storage needs on the VSAN model, keep in mind you can configure per-VM policies in VSAN. Setting Failures To Tolerate (FTT) to 0 is not recommended, but this is why it exists: let Isilon provide data protection while playing in the VSAN sandbox. Leverage DRS groups and rules to configure anti-affinity for the Isilon virtual nodes, keeping them on separate hosts. Would VSAN introduce latency compared to physical disk? Quite probably; though in the typical ROBO scenario that’s not the largest concern. I was able to push 120Mbps onto my IsilonSD Edge cluster, and that was with nested ESXi all running on one host.

All of this doesn’t just apply to VxRail, but to its competitors in the hyper-converged appliance space, as well as a wide range of products targeted at small installations. To expand on the small installation scenario: if IsilonSD had lower data protection options like VSAN does, removing the need for six disks per node, or even for three nodes, it could fit in smaller situations. Why not trust the RAID protection beneath the VM and still leverage Isilon for the robust NAS features it provides? That would mean running a single-node Isilon; after all, those remote offices are likely providing file services with Windows or Linux VMs, relying on vSphere HA/DRS for availability and server RAID (or VSAN) for data loss prevention. Isilon has a rich feature set beyond just data protection across nodes. Even a single-node Isilon with SyncIQ back to a mothership has compelling use cases.

On the other side of the spectrum, putting IsilonSD in a public cloud provider, where you don’t control the hardware and storage, has quite a few use cases. Yes, Isilon has CloudPools technology, which extends an Isilon into public cloud models that provide object storage. But a virtual Isilon running in, say, vCloud Air or Virtustream, with SyncIQ to your on-premises Isilon, opens quite a few doors, such as for those looking at public cloud disaster-recovery-as-a-service solutions. Or moving to the cloud while still having a bunker on-premises for securing your data.

Outside of the need for independent drives, this is an Isilon, running on vSphere. That’s… awesome! As I mentioned before, this opens some big opportunities should EMC continue down this path. Plus, it’s Free and Frictionless, meaning you can do the exact same testing I’ve done. If you are an Isilon customer today, GO GET THIS. It’s a great way to test out changes, upgrades, command line scripts, etc.

If you are running the Free and Frictionless version, outside of the 36TB and six-node limits, you also do NOT get licenses for SyncIQ, SmartLock or CloudPools.

I’ll say, given I went down this road from my excitement about Free and Frictionless, these missing licenses are a little disappointing. I’ve run SyncIQ and SmartLock, two great features, and was looking forward to testing them and having them handy to help answer the questions I get when talking about Isilon.

CloudPools, while I have not run it, is something I’ve been incredibly excited about for years leading up to its release; so I’ll admit I wish it were in the Free and Frictionless version, if only with a small amount of storage to play with.

Wrapping up: there are countless IT organizations out there, and I’ve never met one that wasn’t unique. Even with some areas I’d like to see improved in this product, undoubtedly IsilonSD Edge will apply to quite a few shops. In fact, I’ve heard some customers were asking for a BYOH Isilon approach, so maybe this is really for them (if so, the 36TB limit seems constraining). If you’re looking at IsilonSD Edge, I’d love to hear why; maybe I missed something (certainly I have). Reach out, or use the comments.

If you are looking into IsilonSD Edge, outside of the drive/node requirements, here are some things that caught my eye to be aware of.

While the FAQs state you can run other virtual machines on the same hosts, I would advise against it. If you had enough physical drives to split them between IsilonSD and VSAN, it could be done. You could also use NFS, iSCSI or Fibre Channel for datastores; but this is overly complex and, in all likelihood, more expensive than simply having dedicated hardware for IsilonSD Edge (or really, just buying the physical Isilon product). And given that the datastores used by the IsilonSD Edge nodes are unprotected, putting a VM on them means you are just asking for the drive to fail and to lose that VM.

Because you are dedicating physical drives to a virtual machine, you cannot vMotion the IsilonSD virtual nodes. This means you cannot leverage DRS (Distributed Resource Scheduler), which in turn means you cannot leverage vSphere Update Manager to automatically patch the hosts (as it relies on moving workloads around during maintenance).

The IsilonSD virtual nodes do NOT have VMware Tools. This means you cannot shut down the virtual machines from inside vSphere (for patching or otherwise); rather, you’ll need to enter the OneFS administrator CLI, shut down the Isilon node, then go perform ESX host maintenance. If you have reporting in place to ensure your virtual machines have VMware Tools installed, running, and at the supported version (something I highly recommend), you’ll need to adjust it. Other systems that leverage VMware Tools, such as Infrastructure Navigator, will not work either.

I might be overlooking something (I hope so), but I cannot find a way to expand the storage on an existing node. In my testing scenario, I built the minimal configuration of six data drives of a measly 64GB each. I could not figure out how to increase this space, which is something we’re all accustomed to on vSphere (in fact, quickly growing a VM’s resources is a cornerstone of virtualization). I can increase the overall capacity by adding nodes, but that requires additional ESX hosts. If this is true, again, the idea of using ‘unclaimed capacity’ for IsilonSD Edge is marginalized.

IsilonSD wants the nodes in a pool to be configured the same, specifically with the same number and size of drives. This is understandable, as it spreads data across all the drives in the pool equally. However, it lessens the value of ‘capturing unused capacity’. Aside from the unprotected storage point, if you were to have free storage on drives, your ability to deploy IsilonSD will be constrained by the volume with the least free space, as all the VMDK files (virtual drives) have to be the same size. Even if you had twenty-one independent disks across three nodes, if just one of them was smaller than the rest, that free space dictates the drive size you can configure.

Even though I’m not quite sure where this new product fits or what problem it solves, that’s true of many products when they first release. It’s quite possible this will open new doors no one knew were closed, and if nothing else, I’m ecstatic EMC is pursuing a virtual version of the product. After all, this is just version 1… what would you want in version 2? Respond in the comments!

By | April 4th, 2016 | EMC, Home Lab, Opinions, Storage