Embracing the “Presumption of Breach” Doctrine With Rapid Detection and Response

I came across a term last week in a very good article about the virus attack on the USAF drone command systems (“Dronegate: the First Casualty is our CyberSecurity Paradigm” by CTOVision.com). The term was simply “Presumption of Breach”, and for me it summarized the doctrine that organizations and government agencies must adopt in the face of today's IT security environment. The doctrine is simple: you must assume you have been breached, have tools in place to detect the breaches that evade your shields, and have a plan to respond when those breaches are detected. I call that Rapid Detection and Response.

The first step in the process – assuming you have been breached – sounds simple, but for many organizations it is the hardest part of adopting the “Presumption of Breach” doctrine. It is far more comforting to have 100% faith that your shields will protect your systems without fail, regardless of the attack or attacker. The emotional component of admitting that you cannot fully protect your IT systems is an interesting topic and one that I plan to expand on in a later post.

In spite of the emotional resistance to assuming that you have been breached, all evidence points to it being the cold hard truth. Many believe organizations now fall into two categories: those who know they have been breached and those who don't. Even if you have not been breached yet, every statistic, and simple reason, says you will be.

Once you give yourself over to the “Presumption of Breach” you will need a tool to help you quickly identify when you have been breached. Here is where I must make the disclaimer that Triumfant is such a tool, so I will have a bias toward the Triumfant approach and capabilities. Now that my bias is fully exposed, I can also say that I have not yet encountered another tool better equipped for rapid detection and response.

Why a separate tool and not some extension of your shield solutions? First, if your shields could have detected the attack, they would have prevented it. Put another way, the attack happened because it evaded your defenses, so those defenses are obviously not able to perform the detection. Second, it is always good to have checks and balances rather than relying completely on one tool or vendor. Think about it – how motivated is a shield vendor to provide you with a tool that tells you when those shields did not do their job?

The detection tool must work rapidly and be comprehensive in its discovery and analysis of the attack. Rapid detection enables the organization to contain the damage caused by a long-term infiltration. The Verizon Business 2011 Data Breach Investigations Report (published May 2011) noted that 60% of the breaches studied went undiscovered for a month or more.

Comprehensive analysis is necessary to provide the breadth of actionable data needed to respond to the attack. The recent virus attack on the systems associated with the USAF drone fleet illustrated the problem when attempts to kill the virus were unsuccessful for two weeks or more. Today's malware is designed to persist – to survive. If you just kill the malicious executable, chances are there is a persistence mechanism that will simply resurrect the malware in another place on the machine. Detection software that does not detect the attack and all of the associated change and damage to the machine will hamper your response and leave the organization at risk. The same is true for solutions that use pre-written, generic remediations – a one-size-fits-all approach will undoubtedly leave dangerous artifacts.

(Triumfant uses change detection coupled with patented analytics to identify attacks and correlate all of the changes to the victim machine associated with the attack. This provides a complete picture of the primary and collateral damage, and allows Triumfant to build a remediation specific to the attack that repairs all of the damage to the machine, including persistence mechanisms.)
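To make the persistence point concrete, here is a bare-bones sketch – my own illustration, not how Triumfant works – that lists the entries in a few well-known Windows autorun locations, the kind of places a persistence mechanism typically hides. It assumes a Windows endpoint and Python's standard winreg module; the registry paths shown are just the most common examples.

```python
# Minimal sketch: list entries in a few well-known Windows autorun locations.
# Assumes a Windows endpoint; winreg is in the Python standard library there.
import winreg

AUTORUN_KEYS = [
    (winreg.HKEY_LOCAL_MACHINE, r"Software\Microsoft\Windows\CurrentVersion\Run"),
    (winreg.HKEY_CURRENT_USER,  r"Software\Microsoft\Windows\CurrentVersion\Run"),
    (winreg.HKEY_LOCAL_MACHINE, r"Software\Microsoft\Windows\CurrentVersion\RunOnce"),
]

def list_autoruns():
    """Yield (key_path, value_name, command) for each autorun entry found."""
    for hive, path in AUTORUN_KEYS:
        try:
            key = winreg.OpenKey(hive, path)
        except OSError:
            continue  # key may not exist on this machine
        with key:
            index = 0
            while True:
                try:
                    name, value, _type = winreg.EnumValue(key, index)
                except OSError:
                    break  # no more values under this key
                yield path, name, value
                index += 1

if __name__ == "__main__":
    for path, name, command in list_autoruns():
        print(f"{path} :: {name} -> {command}")
```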

Lastly, you have to have a plan to respond to these breaches. Your rapid detection and response tool should have the ability to learn about the attack and use that knowledge to look for other infiltrations throughout your network. You should have processes in place to correlate the attack data with firewall logs and other security data (perhaps via a SIEM tool) to help identify the source of the attack and ways to block it at the shield level. You also need to establish reporting channels to make the appropriate people aware of the breach in the event that it becomes public or causes an interruption in services to stakeholders or customers. In other words, do the opposite of how Sony handled their PS3 breaches.
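As a rough illustration of that kind of correlation – a sketch only, with a made-up log file and indicator list, not a description of any particular SIEM – the snippet below scans a firewall log for traffic involving addresses tied to a detected attack:

```python
# Minimal sketch: flag firewall log lines that mention indicators
# (here, IP addresses) pulled from an attack analysis.
# The log path and line format are hypothetical.

indicators = {"203.0.113.45", "198.51.100.7"}   # addresses tied to the detected attack

def matching_lines(log_path, indicators):
    """Yield firewall log lines that reference any known indicator."""
    with open(log_path, "r", errors="replace") as log:
        for line in log:
            if any(ioc in line for ioc in indicators):
                yield line.rstrip()

if __name__ == "__main__":
    for hit in matching_lines("firewall.log", indicators):
        print("possible related traffic:", hit)
```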

Putting the “Presumption of Breach” doctrine into practice is not an admission of failure or some form of IT security nihilism. It is a sound and pragmatic recognition of the environment in which we operate. It also means that your organization faces the inevitable prepared, with a plan to minimize the impact of any attack that gets past your shields.

Defense in Depth – There is No Perfect Shield

Everyone wants the perfect shield for their endpoint population.  All malware should be detected and blocked before it has a chance to do anything bad to any given machine.  Nothing less is acceptable.

Not going to happen. Sorry. Truly, I am. See “Why Bad Things Happen to Good Endpoints” and “It is Raining and You Will Get Wet”.

Defense is always playing catch up.  Always has been, always will be.  Today’s stellar defense is one offensive innovation from being compromised.  It is the nature of the game and examples abound.

My family spent spring break in London and Paris and saw all manner of personal armor that was quite effective – until the crossbow was perfected. In the 19th century, the best and brightest were trained as military engineers because the construction of earthworks was critical to defending fortified positions against cannon fire – until the airplane arrived and munitions could be delivered from directly above a position.

The gap does not always come from leaps of technology or sophistication. When U.S. forces entered Iraq it was the improvised explosive device (IED) – a crude, homemade weapon – that forced the need to retrofit our advanced vehicles with additional armor. Statistics abound showing how major threats (Conficker, for example) were based on simple vulnerabilities that had been identified six months or more before their use.

Today we in IT security chase the same elusive goal and ignore the obvious: there will always be gaps and stuff will always get through.   It is time that government agencies and businesses come to terms with the inevitable and think about technologies that can help them detect what does make it through their defenses instead of continuously chasing the promise of the perfect shield.

The adversary is tirelessly creating new attacks that evade existing defenses. Sometimes those attacks evade detection for weeks and even months. And when they are detected, there is a lag between when the attack is analyzed, a protection is built, and that protection is deployed. During that gap organizations are at risk. And given that so many detection tools still rely on prior knowledge of an attack to see the attack, organizations are often left unaware that they were breached, much less empowered to fight back.

Stuff will get through. Any vendor or expert that tells you otherwise is not being honest. There is nothing wrong with seeking protection from attacks, but you are putting your organization at risk if you do not have something in place for when the inevitable happens. It also makes sense that a new approach is needed, because if the attack got through, it follows that the normal protection techniques have been evaded.

Change detection has long been viewed as the right approach for detecting attacks that make it to a machine. The logic is simple – unless the attack can enter the machine, start itself, and perform its malicious activity without changing the machine, change detection is an effective triggering mechanism for analysis and, ultimately, for identifying the attack.
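To show what that triggering mechanism looks like at its simplest – a bare-bones sketch, not how Triumfant implements change detection – the code below hashes the files under a directory, compares the result to a saved baseline, and reports anything added, removed or modified:

```python
# Minimal sketch of change detection: hash files under a directory,
# compare against a previously saved baseline, report the differences.
import hashlib
import json
import os

def snapshot(root):
    """Return {relative_path: md5_hex} for every readable file under root."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    state[os.path.relpath(path, root)] = hashlib.md5(f.read()).hexdigest()
            except OSError:
                pass  # unreadable file; skip it
    return state

def diff(baseline, current):
    added = [p for p in current if p not in baseline]
    removed = [p for p in baseline if p not in current]
    modified = [p for p in current if p in baseline and current[p] != baseline[p]]
    return added, removed, modified

if __name__ == "__main__":
    root = "watched_directory"            # directory to watch (example)
    try:
        with open("baseline.json") as f:
            baseline = json.load(f)
    except FileNotFoundError:
        baseline = {}                     # first run: nothing to compare against yet
    current = snapshot(root)
    added, removed, modified = diff(baseline, current)
    print("added:", added, "removed:", removed, "modified:", modified)
    with open("baseline.json", "w") as f:
        json.dump(current, f)             # becomes the next baseline
```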

Triumfant not only detects and analyzes these attacks, it correlates the changes so you can see the full extent of the attack – primary and secondary artifacts – and even builds a remediation that is contextual to that attack on that specific machine. It can take what it learns and recognize subsequent attacks, and if the attack morphs it will still see it based on the changes.

One of my most downloaded blog entries was called “Antivirus Detection Rates – It Is Clear You Need a Plan B”. The more I think about the title, the more I realize I was wrong: having a tool in place that will detect what passes through your shields is a Plan A item and must be part of any defense in depth strategy. Stuff will get through, and you need some form of detection capability for when the shields fail.

Why Bad Things Happen to Good Endpoints

I was with a prospect the other day and was asked what, at least for me, was a very thought-provoking question. We were discussing the two major areas of application for Triumfant – continuous enforcement of security configurations and real-time malware detection and remediation – and he asked why you would need the latter if the former was done properly. In other words, if all of my endpoint protections are in place and everything is properly configured, why am I still getting attacked?

Simple and logical question, right? But it led me to think long and hard about why attacks happen at a very elemental level. We in security face this question from the powers that be because they cannot understand why attacks still come even though we have added multiple layers of defense.

After consideration I came up with three reasons. For perspective, my reasons are very much endpoint-centric and presume the attacks have already made their way through protections at the network level, so this is not a cloud-to-ground holistic view. Each reason is based on the assumption that the preceding reason(s) have been fully addressed and the gap they represent is closed – each reason stands on its own as a problem. And I will resist the urge to plug how Triumfant addresses each gap, but I have noted blog entries that do if you care to read on.

Here are my three reasons:

  1. Attacks get through because the machines and the protection software deployed to protect them are not configured to be secure.  The analogy is simple: the most well designed and secure deadbolt lock only secures a door when the deadbolt is engaged.  Too frequently, endpoint protection tools are either improperly installed or improperly configured to perform the tasks for which they are intended, so attacks make it through.  For how Triumfant addresses the configuration management gap see “A New Approach to Configuration Management”.
  2. Attacks get through because traditional endpoint protection tools miss known attacks even when there is a signature for that attack and the protection is properly configured.  The failure rate depends on whose statistics you choose to use, but Gartner puts the detection failure rate at two to ten percent while other studies show failure rates exceeding fifty percent.  Given there will be well over 5M signatures by the end of 2009, ten percent is non-trivial.  See “Antivirus Detection Rates – It is Clear You Need a Plan B”.
  3. Attacks get through because they have been carefully designed and engineered to evade traditional endpoint protections.  These include zero day attacks, rootkits, targeted attacks and the work of the maliciously intended insider.  Zero day attacks are more generic in nature and trade on the fact that most tools require prior knowledge to detect an attack.  Targeted attacks are designed to specifically infiltrate networks and applications to retrieve sensitive information or financial data.  See “It is Raining and You Will Get Wet”.

I am not saying this is groundbreaking thinking here, but if you put things into this perspective, it clearly defines the gaps in protection and subsequently provides a roadmap of what must be done to protect your endpoints.  Reducing the attack surface is clearly not enough.  Antivirus is not getting it done – even the AV vendors say so.  And the bad guys are relentless in their pursuit to exploit any crack in the defenses. 

So what do you think? Too simple or simply brilliant?

Triumfant – First Line of Defense or Last Line of Defense (or Both)

I was reading Byron Acohido’s latest post in The Last Watchdog about the new SMB2 zero day vulnerability and it provoked a lot of thinking around how Triumfant is characterized as endpoint protection. Specifically, we get asked if we consider Triumfant a first line of defense or a last line of defense.  Reading Acohido’s post made me realize the answer is “yes”.

In the case of the SMB2 zero day vulnerability there is no patch and no malware has been detected that exploits the vulnerability as of yesterday (9/9/09).  Traditional defenses for the endpoint will have no knowledge of the eventual attacks that will undoubtedly come and will therefore be ineffective in shielding endpoints from the malware.  In this case the traditional defenses offer no defense, so Triumfant is the first line of defense for the endpoint machine.  Because Triumfant uses granular change detection to detect attacks and therefore does not require prior knowledge of the attack, it is uniquely able to protect the machine.  Acohido predicts that the eventual exploit could be a “Conficker-type worm attack” and when it eventually comes, Triumfant will see it and protect the affected machines.

In short, if the incoming attack is specifically designed to evade detection by traditional endpoint defenses or is a zero day (or very early in its lifecycle), then it is as if the traditional defenses are not even there. So Triumfant becomes the de facto first line of defense. Add to the list rootkits, the work of maliciously intended insiders and corruption of the software supply chain, and you get a lot of vectors where Triumfant is the endpoint protection that first engages the enemy. I always add the caveat that we do not position Triumfant as a shield – it detects the malware when it gets to the machine.

In the case of known attacks that the other tools simply miss, or the variants that just slip past the signatures, or the attacks that get through because the defensive software is improperly configured or deployed, Triumfant is the last line of defense. Everything that falls through the nets – and there is a lot of evidence (read here, here and here) that plenty does – and makes it to the endpoint will be detected and remediated by Triumfant. We have never positioned Triumfant as a replacement for the existing nets, but we do believe that the holes in those nets are plentiful enough that we provide an excellent complement to traditional defenses.

So there you are – Triumfant is both a first and last line of defense.  It simply depends on the context of the attack.  Either way, Triumfant is filling critical and frequent gaps in endpoint protection.  What makes the story even better is that Triumfant remediates what it detects by synthesizing a remediation and restoring the machine to its pre-attack condition in minutes without human intervention. 

Whether it is acting as the tip of the spear or backstop, Triumfant does what no other endpoint protection product can.  So when I answer “yes” to the first line or last line of defense question I am not being glib or sarcastic, just accurate.

If We Know How Breaches Happen, Then Why Aren’t We Doing Something?

In his August 26 blog post on the Securosis Blog called “We Know How Breaches Happen”, Rich Mogull made some very good observations about the cause of data breaches. According to Mogull:

“If we look across all our sources, we see a consistent picture emerging. The vast majority of cybercrime still seems to take advantage of known vulnerabilities that can be addressed using common practices. The Verizon report certainly calls out unpatched systems, configuration errors, and default passwords as the most common breach sources.”

When I was with Cybertrust, Peter Tippett – one of the early pioneers in antivirus software, now with Verizon Business – would make the impassioned case (he still does) that following a relatively small number of essential practices would lower an organization's risk significantly. Tippett's researchers from Cybertrust are at the core of the Verizon team that publishes the Verizon Data Breach Investigations Report, which notes that “2008 continued a downward trend in attacks that exploit patchable vulnerabilities versus those that exploit configuration weaknesses or functionality.”

In other words, a little discipline in security configuration management would go a long way toward making organizations more secure and eliminating the low-hanging fruit used by hackers. Hackers are people too, and like the rest of us they will take the path of least resistance. They could choose the difficult path of engineering a new zero day attack. Or they could take the far simpler approach of using an exploit that leverages a common misconfiguration known to exist in a significant number of endpoint machines and build an attack with enough variation to evade the signatures in place for earlier versions of that attack.

You can almost picture a Glengarry Glen Ross-style boiler room full of hackers under quota and a boss telling the crew that “Coffee is for hackers that hack!” It is just too simple to spin up an exploit that picks off the multitude of unpatched and misconfigured endpoints.

In a separate post, Mogull talks about the Data Breach Triangle in the context of the fire triangle (oxygen, heat, fuel – take away any one of the three and the fire goes out): the sides of the triangle are the three components needed for a breach to occur, so removing any one side prevents the breach. Good security configuration management should help eliminate the triangle side Mogull calls the exploit, which is the vulnerability or flaw that provides the hacker a path to the data.

Given the obvious linkage between breaches and configuration problems, why hasn't security configuration management become an essential component of endpoint security strategies? The answer is steeped in irony given that configuration errors have replaced patchable vulnerabilities as the exploit of choice. Many of the companies that offer security configuration management use what are essentially patch management tools as the technical implementation. Think of the number of patch management vendors that now offer security configuration management. These solutions essentially push out configurations and remediations the same way they would push out a patch.

Why is this ironic? While this is a proven and technically sound approach, patching is almost universally recognized as problematic. It is therefore sensible and logical to ask why we would expect the same type of tool and underlying processes to prove successful in addressing configuration management. Patching tends to be a brute force process, while configuration management requires much more flexibility and finesse. And creating a patch (call it any name the vendor gives it – it is still a patch) is a human resource intensive process that requires someone to write, test and deploy a script.

It would be fair of the patch management vendors to note that a lack of institutional commitment and sound process are also contributing factors, and I would agree. But the parallel between patch management and configuration management is too obvious to ignore. Mogull's observations are backed by numerous studies and analyses that cite unpatched systems and security configuration errors as a problem, so the evidence would indicate that these tools are not getting the job done.

Clearly a different and more automated approach is appropriate, and in my next post I will tell you how Triumfant addresses this problem. Specifically, I will detail how Triumfant continuously enforces security configurations by detecting machines that are out of compliance with desired configurations and automatically building a remediation to return each machine to compliance.
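To give a feel for what continuous enforcement means in practice – a simplified sketch with made-up setting names, not Triumfant's actual mechanism – the snippet below compares an endpoint's current settings against a desired baseline and spells out the remediation needed to restore compliance:

```python
# Minimal sketch: detect configuration drift and describe the remediation.
# Setting names and values are illustrative only.

DESIRED = {
    "firewall_enabled": True,
    "antivirus_realtime_scan": True,
    "password_min_length": 12,
    "guest_account_enabled": False,
}

def find_drift(current):
    """Return {setting: (actual_value, desired_value)} for out-of-compliance settings."""
    return {k: (current.get(k), v) for k, v in DESIRED.items() if current.get(k) != v}

def build_remediation(drift):
    """Turn detected drift into a list of remediation actions."""
    return [f"set {setting} = {desired!r} (was {actual!r})"
            for setting, (actual, desired) in drift.items()]

if __name__ == "__main__":
    # Hypothetical snapshot reported by an endpoint agent
    endpoint_state = {
        "firewall_enabled": True,
        "antivirus_realtime_scan": False,
        "password_min_length": 8,
        "guest_account_enabled": False,
    }
    for action in build_remediation(find_drift(endpoint_state)):
        print(action)
```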

What is clear is that we need to address these foundational problems to become more secure and reduce organizational risk.  Perhaps now the tools have caught up with the problem and we can make the road of that hacker a bit more difficult.

Antivirus Detection Rates – It is Clear You Need a Plan B

Someone called my attention to a new white paper by Cyveillance called the “Cyber Intelligence Report, August 2009”. In this report Cyveillance tested the detection rates of 13 leading antivirus tools by feeding these tools confirmed malicious files in real time. The test ran from 5/12 through 6/10. No other details are provided, such as how many unique instances of malware were used or how the test platform was configured.

The detection rates from this test are eye opening. The average daily detection rate for the 13 AV products ranged from 16% to 44%, with an average of about 29%. A good hit rate if you are a baseball player, but not great for an endpoint protection tool.

If this test is accurate, then a malicious file has a 2 in 3 chance of getting past the AV protection. There are too few details about the study for me to be comfortable with the numbers, but even if they are off by 50%, we are still talking about a 1 in 2 chance. Yikes.
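For those who want the arithmetic spelled out (using the report's average and my own rounding):

```python
# Back-of-the-envelope arithmetic on the Cyveillance figures.
avg_detection = 0.29                                    # reported average daily detection rate
print(f"chance of evasion: {1 - avg_detection:.0%}")    # ~71%, roughly 2 in 3

# Even if the study understates detection by half (true rate somewhere near 50%),
# a malicious file still has roughly a 1 in 2 chance of slipping past.
for true_detection in (0.44, 0.50, 0.58):
    print(f"detection {true_detection:.0%} -> evasion {1 - true_detection:.0%}")
```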

So you can look at this one of two ways.  The first is that you need some other form of shield to detect the bad stuff and keep it away from your endpoint machines.  I am quite sure that is the point that Cyveillance is trying to make in this white paper.  The other approach is to come to terms with the fact that no matter how well you protect the endpoint environment, no matter how deep your defense in depth strategy is, malicious stuff will find its way to the machines.  You need some form of Plan B, and this is where Triumfant comes into the story.

Triumfant is not a shield. But what it will do is see the malicious attacks that make it onto your endpoint machines. Not only will it detect those attacks, it will analyze them, determine the full scope of the attack, and build a surgical, situational remediation on the fly that stops the attack and fixes all of the collateral damage associated with it. With Triumfant you go from detection to remediation in five minutes or less. No matter how the malware got there. No signature required, no human intervention needed.

Sounds like a solid Plan B to me.

Studies like this and many others all point to one fact – regardless of what endpoint protection you have, things will get through to the endpoint. Instead of adding more layers to the shield, might it be time to look hard at something that can help when the inevitable happens and malware gets through?

What Ultimately Sets Triumfant Resolution Manager Apart: Context

Context. This word has become clear to me as one of the key differentiators of our product and our ability to detect and ultimately remediate malicious attacks. This realization comes after two-plus weeks of meetings with some of the most seasoned and smart security people I have had the pleasure to interact with since coming into the IT security market. These were all second and third meetings at a variety of organizations, so we went much deeper into the inner workings of Resolution Manager to help them really understand how we can do what they were seeing in the demonstrations. One theme quickly emerged that was key to understanding what we do and how we do it, and that frankly left these very impressive security people intrigued with our product: context.

There are other endpoint protection tools on the market that attempt to analyze logs or other information from endpoint machines in an attempt to identify malicious attacks on those machines. But those attempts all lack what Resolution Manager uniquely brings to the table: context. When you use log files or diffs you are seeing an event in time for just that machine and therefore have no way to put the event into any context. The result is predictable – so many false positives that the solution is not viable. The same fate is shared by behavioral analysis and heuristics that, while very sophisticated, only see the story in the context of one machine and never in the context of the endpoint population.

Let me try to put the subject of context into some, well, context. A change is detected on an endpoint machine and is captured in a log entry. Is this a valid change or the result of an attack? Is this the only machine in that group, or even in the entire population, seeing that change, or is it happening to other machines? If it is happening to other machines, is it happening consistently and in an orderly way? What other changes are related to this change? What was the collateral damage to the machine associated with the change? Were ports opened, files corrupted or deleted, or security configuration settings altered? If it is a new application, does that application exist elsewhere in the endpoint population? If it does, is the install following the patterns established by other installs? Is the same set of files being installed, and do they all hash to consistent values?

The attacks of today are sophisticated and complex and cannot be properly analyzed by looking at disconnected log entries. It is context that allows for a proper analysis, and context that uniquely enables Resolution Manager to see every change associated with an attack so it can synthesize a remediation that is complete and restores the machine to its pre-attack state. So how does Resolution Manager get this context that others lack? The answer is the Adaptive Reference Model that lies at the heart of the solution.

When Resolution Manager is installed, the agent on each endpoint machine scans over 200,000 attributes, which include all of the registry keys, an MD5 hash of every file, configuration information, performance data, and just about every other piece of elemental data about the machine. These attributes are sent to the Resolution Manager server, where they are correlated with the attributes from the other machines in the endpoint population in order to build the rules used by the analytics that make up the “secret sauce” of our solution. The resulting rule set is an adaptive, multi-dimensional view of the endpoints that is truly one of a kind.

It is context.

When I say multi-dimensional, I am speaking of the highly sophisticated grouping and correlation process that occurs automatically within the model. The analytic engine will group machines by any number of dimensions and apply multiple correlation algorithms to find patterns that will eventually help with threat assessment and false positive elimination. For example, it will build a set of rules for every application it finds and a separate rule set for each version of that application. The rules capture the analysis of such information as the files associated with that specific release, the hash values of those files, and other elements of what is “normal” for a machine running that version of that application.

You don't have to tell the model what is normal – it learns it. The rule sets for each version of each application form a normative whitelist for the endpoint population. So when a new application is installed, the model knows immediately whether it has seen that application before. If it has not, it will create an alert; if it has, it will make sure the application installed according to the rules learned from how that application behaves on other machines.
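To give a feel for what a normative whitelist of that sort might look like – a deliberately simplified sketch with invented host and application names, nothing close to the real Adaptive Reference Model – the code below builds per-application file-hash rules from what the population reports and checks a new install against them:

```python
# Minimal sketch: build a "normative whitelist" per application version from
# what many machines report, then check a new install against it.
from collections import defaultdict

# Hypothetical reports: machine -> {(app, version): {filename: hash}}
population_reports = {
    "host-01": {("ExampleApp", "2.1"): {"app.exe": "aaa1", "helper.dll": "bbb2"}},
    "host-02": {("ExampleApp", "2.1"): {"app.exe": "aaa1", "helper.dll": "bbb2"}},
    "host-03": {("ExampleApp", "2.1"): {"app.exe": "aaa1", "helper.dll": "bbb2"}},
}

def build_rules(reports):
    """For each (app, version), record the file/hash pairs seen across the population."""
    rules = defaultdict(lambda: defaultdict(set))
    for machine_apps in reports.values():
        for app_version, files in machine_apps.items():
            for filename, digest in files.items():
                rules[app_version][filename].add(digest)
    return rules

def check_install(app_version, files, rules):
    """Return a list of deviations from the learned rules for this app version."""
    if app_version not in rules:
        return [f"never seen before in the population: {app_version}"]
    issues = []
    expected = rules[app_version]
    for filename, digest in files.items():
        if filename not in expected:
            issues.append(f"unexpected file: {filename}")
        elif digest not in expected[filename]:
            issues.append(f"hash mismatch for {filename}")
    return issues

if __name__ == "__main__":
    rules = build_rules(population_reports)
    new_install = {"app.exe": "aaa1", "helper.dll": "ZZZ9"}   # tampered helper.dll
    print(check_install(("ExampleApp", "2.1"), new_install, rules))
```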

And the model evolves over time.  Changes to the endpoint population are assimilated into the model and rules are updated and reformed accordingly.  The model grows and adapts with the environment.

With the model in place, you now have context. So when the agent identifies a change on a given machine and sends that change to the server, the change is analyzed in the context of the Adaptive Reference Model. The server analytically asks all of the questions I used as examples earlier, and thousands more, leveraging the learned context of the model. If the analysis determines the change may be part of an attack, the server will build a request back to the endpoint for additional information, which we call a probe. The purpose of the probe is to get even more context, as it performs over twenty different correlation techniques to ensure that all of the changes to the machine associated with the attack have been identified. It uses context to cluster these changes so any attack can be comprehensively addressed.
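A crude way to picture one of the questions the model asks – a toy sketch with made-up change records, nothing like the twenty-plus correlation techniques just described – is to score a reported change by how common it is across the population:

```python
# Toy sketch: judge a reported change by its prevalence across the population.
def prevalence(change, population_changes):
    """Fraction of machines in the population reporting the same change."""
    seen = sum(1 for changes in population_changes.values() if change in changes)
    return seen / max(len(population_changes), 1)

# Hypothetical recent changes reported by each machine
population_changes = {
    "host-01": {"new file: c:\\temp\\svch0st.exe"},
    "host-02": {"patched: kb-update.dll"},
    "host-03": {"patched: kb-update.dll"},
    "host-04": {"patched: kb-update.dll"},
}

for change in ("patched: kb-update.dll", "new file: c:\\temp\\svch0st.exe"):
    p = prevalence(change, population_changes)
    label = "looks routine" if p > 0.5 else "anomalous - worth a closer look"
    print(f"{change}: seen on {p:.0%} of machines -> {label}")
```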

Context. No amount of heuristics applied to the analysis of log traffic can replace the context of our Adaptive Reference Model. The same goes for behavioral analysis and heuristics at the endpoint level. It is context that makes the most seasoned security people say they have never seen a product like Resolution Manager and leaves them at a minimum intrigued and, from the reactions I can see, impressed.

I plan to supplement this post with additional information about the model – such as how you set explicit policies – but I am way over my unofficial post word limit, so I will stop here. I hope this gave you a feel for the Adaptive Reference Model and why it is differentiating, but the best way to understand is through a demonstration, which we would be happy to provide.