The dangers of narrow subject matter expertise and the case for solution architecture

As technology solutions continue to increase in complexity, organizations often respond by creating teams with deep technical expertise to design, build and maintain their technology assets. One side effect of deep technical expertise is a narrowing breadth of knowledge. While most IT professionals start their careers with broad technical knowledge (though perhaps not experience), as one’s experience and interest deepen in one particular domain, the breadth of knowledge, by necessity, shrinks. This side effect is rarely seen as negative; in fact, deep technical expertise is often, and rightly, held in high regard.

Unfortunately, lack of breadth presents serious risks to the quality of our solutions. The complexity within technology domains (driving us to create deep technical expertise) also generates complexity in the interfaces between these domains. For the purposes of this discussion, I’m referring to domains in a high-level, coarse-grained sense: application development, application servers, network infrastructure, supporting application components (databases, middleware, and the like), storage solutions, and so on. These domains are individually complex, often to the point where there are sub-specialties within them. (Case in point: most large organizations have some network engineers who specialize in load balancing while others are experts in network design/engineering.)

Some will argue that while these individual domains are complex, the internal complexity is abstracted from the interfaces to other components, thus hiding the “inner workings”. While this is often a design goal, it is rarely fully realized. It’s easy to believe this well-meaning but dangerous fallacy, especially since so many IT professionals are “classically educated” as software developers. Any college-educated CompSci or CIS professional has learned the importance of object-orientation, interfaces, abstraction, and so on. Our trust in abstraction is so conditioned that it just feels like it should work in other domains. It seems logical, but it just doesn’t scale to the breadth of systems and degree of complexity outside of a pure software engineering paradigm.

An exploration of the reasons why abstraction doesn’t scale could be an entire series of articles in its own right, but a cursory treatment may help convince skeptics. First, abstraction in an object-oriented software engineering context is completely implemented within a single system (like a programming language). The layer of abstraction and the complexity on either side of that abstraction share common tools, semantics and structure. This commonality reduces the overall complexity of the solution and the difficulty associated with making the abstraction work. Purists will argue that commonality doesn’t matter that much. To see that it does, compare the difficulty of integrating two software components both written in Java to reasonable software standards with the difficulty of integrating a Java component with a .NET component using web services. The myriad “standards” for web services highlight the difference between these scenarios.

Second, abstraction within software engineering is rooted in programming languages that exhibit a high degree of precision with respect to their semantics and syntax and that, in comparison to broad IT “solutions”, are relatively simple. Java, for example, has only about 50 reserved keywords and a handful of syntax rules that can be used to implement abstraction. Compare this to the average load-balancing solution, network switching infrastructure, or application server configuration, all of which can be configured in highly variable (and novel) ways. This difference in complexity makes abstraction a much more difficult task. In a software engineering world, it’s the difference between abstraction for a simple framework (something like implementing MVC) and abstraction for an operating system’s threading and memory management libraries.

Third, standards and patterns for abstraction in software engineering are well established and commonly understood.  Creating standards and identifying patterns is easier in software because of the previous two points.  Standards and patterns for integrating components from different domains (e.g. making load-balancers, web servers, application servers, and database servers work together) do exist and may be commonly understood, but are not so detailed or so precise that they reduce complexity or completely hide the inner workings of each individual component.

If we can agree that abstraction doesn’t really work between domains and that individual domains are so complex that they require deep technical expertise, we must then acknowledge that the integration of these components is a significant concern in its own right. This is what architecture, especially solution architecture, is really about. So-called “PowerPoint architecture” or domain-specific architecture (provided by software architects, storage solution architects, etc.) is not a substitute for holistic solution architecture that defines how disparate components from different domains will interact. Make no mistake: PowerPoint architecture and domain-specific architecture have their place. Domain-specific architecture must be part of the solution delivery process. Unfortunately, it is too often the focus, usually at the expense of good solution architecture.

Solution architects need to balance breadth and depth of technical knowledge to be effective. This means that not every solution architect need come from a heavy software architecture or software engineering background. Instead, a good solution architect understands a wide range of technology domains and has experience putting them together in a variety of settings.

A classic example of a problem where this kind of solution architecture really makes a difference is in geographically distributed web applications that are transactional and stateful in nature.  The developer or software architect will rely on the application server’s services for managing session state and persistence.  The infrastructure/hosting teams will rely on load-balancing solutions for “session stickiness” to keep a customer “stuck” to a particular web and application server for the duration of the session.  Easy enough, except as soon as you launch the application in production, you start getting reports of customers complaining that they’re losing their sessions, having to “start over” in multi-step transaction flows, or other intermittent, unpredictable behavior.  What happened?

Large ISPs like Comcast or AOL have multiple proxy servers from which a customer’s HTTP session may originate.  During the customer’s session, the ISP may internally load-balance the customer to a different proxy server, causing the source IP address to change.  Your load-balancer session stickiness didn’t account for this, the user got load-balanced to a different web or application server, and the session state couldn’t be rebuilt.  
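The failure described above can be reproduced in a few lines. The following is a minimal sketch, assuming a load balancer that picks a backend by hashing the client's source IP; the server names, the proxy pool (drawn from the 203.0.113.0/24 documentation range), and the `backend` cookie are all hypothetical, and real load balancers use their own algorithms. Cookie-based stickiness is shown only as one commonly used alternative for comparison, not as the fix this article prescribes.

```python
import hashlib

BACKENDS = ["app-1", "app-2", "app-3"]  # hypothetical application servers


def pick_by_source_ip(source_ip):
    """Source-IP stickiness: hash the client address onto a backend."""
    digest = hashlib.sha256(source_ip.encode("utf-8")).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]


def pick_by_cookie(cookies, source_ip):
    """Cookie stickiness: reuse the backend named in the cookie if it is valid."""
    backend = cookies.get("backend")
    if backend not in BACKENDS:
        # First request of the session: choose a backend and remember it.
        backend = pick_by_source_ip(source_ip)
        cookies["backend"] = backend
    return backend


def proxy_on_other_backend(first_ip, candidates):
    """Return the first candidate proxy IP that hashes to a different backend."""
    home = pick_by_source_ip(first_ip)
    for ip in candidates:
        if pick_by_source_ip(ip) != home:
            return ip
    return None


# Hypothetical ISP proxy pool; the customer starts on one proxy and is
# later moved to another one mid-session.
proxy_pool = [f"203.0.113.{n}" for n in range(1, 51)]
first_proxy = proxy_pool[0]
second_proxy = proxy_on_other_backend(first_proxy, proxy_pool[1:])

# Source-IP stickiness: the backend changes when the proxy changes.
ip_sticky_before = pick_by_source_ip(first_proxy)
ip_sticky_after = pick_by_source_ip(second_proxy)

# Cookie stickiness: the cookie travels with the browser, so the backend holds.
cookies = {}
before = pick_by_cookie(cookies, first_proxy)
after = pick_by_cookie(cookies, second_proxy)

print("source-IP sticky:", ip_sticky_before, "->", ip_sticky_after)
print("cookie sticky:   ", before, "->", after)
```

The point of the sketch is not the hashing scheme but the dependency: the stickiness decision rests on a value (the source IP) that a third party, the ISP, is free to change during the session.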

There are many variations on this theme. Maybe the load-balancer uses the SSL session ID, but the ISP’s proxy had a different A record cached for your site, so the user ended up in another data center. Perhaps your web servers can’t route traffic to the application server that “knows” the customer’s state. Or you’ve really done your homework and built a global cache to manage state, but the cache didn’t replicate fast enough. The bottom line is that there are a variety of scenarios in which the interaction between the load-balancing infrastructure, application server configuration, and application code determines the actual customer experience.
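The global-cache variation can be illustrated the same way. This is a toy model only, assuming two data centers whose session caches replicate asynchronously; the class, the site names, and the session ID are hypothetical, and the explicit `replicate()` call stands in for the replication delay elapsing.

```python
from collections import deque


class ReplicatedSessionCache:
    """Toy multi-site cache: writes land locally and replicate only when flushed."""

    def __init__(self, sites):
        self.stores = {site: {} for site in sites}
        # Updates queued for other sites: (target_site, session_id, state).
        self.pending = deque()

    def put(self, site, session_id, state):
        """Write to the local site and queue the update for every other site."""
        self.stores[site][session_id] = state
        for other in self.stores:
            if other != site:
                self.pending.append((other, session_id, state))

    def get(self, site, session_id):
        """Read from the local site only; returns None if the state is not there yet."""
        return self.stores[site].get(session_id)

    def replicate(self):
        """Deliver all queued updates (stands in for the replication delay passing)."""
        while self.pending:
            target, session_id, state = self.pending.popleft()
            self.stores[target][session_id] = state


cache = ReplicatedSessionCache(["east", "west"])

# The customer completes step 2 of a transaction in the east data center.
cache.put("east", "sess-42", {"step": 2})

# The next request lands in the west data center before replication finishes.
stale = cache.get("west", "sess-42")

# Once replication catches up, the same lookup succeeds.
cache.replicate()
fresh = cache.get("west", "sess-42")

print("before replication:", stale)
print("after replication: ", fresh)
```

In this model the application code and the cache are each behaving correctly in isolation; the lost session appears only when request routing and replication timing are considered together.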

This is where a good solution architect will save the day. Your deep technical SMEs are still invaluable, but detecting the possibility of scenarios like the one above requires breadth of knowledge and a thorough understanding of the characteristics the solution as a whole must have, rather than those of any individual component. The need for solution architecture is very real. Doing it well has a tangible effect on the quality of technology solutions and mitigates the risk that deep technical expertise turns domains into silos. We cannot get away from the need for our really sharp SMEs, nor should we want to. However, we must acknowledge that our solutions demand attention to integrating disparate components in increasingly complex ways.
