The Nasdaq Breach Illustrates the Need for Continuous Monitoring

Dear Nasdaq, call me.  I am here to help.

The Wall Street Journal reported late Friday that Nasdaq had discovered that they had been hacked.  The hackers never made it to the trading programs, but instead infiltrated an area of Nasdaq called Directors Desk where directors of publicly traded firms share documents about board meetings.

What caught my eye was the following quote from the AP story filed about the attack: “…Nasdaq OMX detected “suspicious files” during a regular security scan on U.S. servers unrelated to its trading systems and determined that Directors Desk was potentially affected.”

People, people, people.  You have got to get on the continuous scanning bandwagon.  Seriously.

Connect the dots.  The story says that “the hackers broke into the service repeatedly over more than a year”.  Notice that the scans that found the suspicious files were “regular”, meaning periodic.  Monthly?  Quarterly?  How many of these regular scans were run before the activity was discovered?  I understand the need for network-based, agentless scans.  I also know their limits, and deep down inside, in a place most IT security people don’t want to admit, so do you.  “Regular” is not continuous.

Don’t stop yet, because the story says that the scan determined that the systems were “potentially affected”.  The diagnosis was partial because agentless scans, even credentialed scans, only get part of the story and therefore can only point out “potential” exploitation.

I have zero data about the actual attack and therefore am speaking in general terms.  But I am confident that a granular, continuous scanning tool should have been able to detect enough anomalous and exceptional artifacts on the Nasdaq servers to spot an attack like this.  The story says that suspicious files were ultimately discovered, so we know that there were persistent artifacts created by the attack.

This is a prime example of why you must have continuous, granular monitoring of endpoints and servers.  Periodic scans, while better than nothing, leave too many blind spots.  A continuous scanning tool should have found the artifacts.  And if the tool used change detection like Triumfant, it would have flagged the files as anomalous within 24 hours of the attack at a minimum.

Don’t throw the shield argument at me here.  These attacks went on for over a year.  Triumfant would have spotted the artifacts in 24 hours or less.  If you can’t see that difference and want to live the lie of the perfect shield, you are on the wrong blog.  In fact, if those files triggered our continuous scan that looks for malicious actions (an autostart mechanism, opening a port, etc.), Triumfant would have flagged the files within 60 seconds.

Regardless of which of our continuous scans would have detected the incident, Triumfant would have performed a deep analysis of the files and been able to show any changes to the affected machine that were associated with the placement of the suspicious files on the machine.  You likely could have deleted the word “potentially” from the conversation almost immediately.  I would also add that we would have built a remediation to end the attack.

Strong words for someone who has no details?  Perhaps.  But I would bet the farm that we would have found this attack in less than a year.

I don’t understand why organizations still won’t implement continuous scanning.  Innovative solutions like Triumfant get throttled by old predispositions and the disconnect between IT security and the operations people who manage the servers and endpoints.  The security teams are forced to use agentless tools because the ops people refuse to consider a new agent, even if that agent is unobtrusive and allows them to remove other agents in a trade of functionality.  As a result, the IT security people are left to protect machines with periodic scans that cannot possibly see the detail available when an agent is used.

Machines get hacked, the organization is placed at risk, countless hours and dollars are spent investigating the problem, and then more hours and dollars are spent putting useless spackle over the cracks.  Is that worth dismissing even the consideration of an agent?

Let me put it a different way.  We allow users to run whatever they want on endpoint machines, yet block IT security from deploying the granular, continuous scanning tools that can actually detect attacks such as the one we see at Nasdaq.

What am I missing here?

Dear Nasdaq, call me.  Don’t rinse, repeat and be in the WSJ again.  I can help.  Promise.

Triumfant and Situational Awareness – The Google Model

I have written in this blog that while Triumfant is useful, innovative technology, I often struggle to come up with word pictures or analogies that help others grasp how useful and innovative it really is.  Thankfully, we employ lots of smart people and one of our developers came up with what I think is an exceptional analogy.

Because Triumfant assumes nothing, it scans just about every persistent attribute on every machine in the endpoint population and sends this to the server for analysis.  Since the majority of the state data on each machine rarely changes, after the first snapshot is collected the Triumfant agent performs change data capture and only sends changes up the wire for subsequent scans.  This is, of course, the proven, prudent and efficient way to monitor large amounts of data that is predominantly static.  Otherwise, you end up moving large answer sets across the wire needlessly.  The data is available at the server level in a repository to power all forms of situational awareness.
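To make the change data capture idea concrete, here is a minimal sketch in Python.  It is purely illustrative – the attribute names and the shape of the snapshots are hypothetical examples, not Triumfant’s actual data model.

```python
# Illustrative sketch: send the full snapshot once, then only the deltas afterward.
def diff_snapshot(previous: dict, current: dict) -> dict:
    """Compare the new scan against the last known state and keep only what changed."""
    added = {k: v for k, v in current.items() if k not in previous}
    removed = {k: previous[k] for k in previous if k not in current}
    modified = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    return {"added": added, "removed": removed, "modified": modified}

# Most endpoint state rarely changes, so the delta that goes up the wire is tiny.
baseline = {
    r"HKLM\...\Run\updater": r"c:\windows\updater.exe",   # hypothetical attributes
    "service:spooler": "running",
}
latest = dict(baseline)
latest[r"HKLM\...\Run\svch0st"] = r"c:\temp\svch0st.exe"   # a new autostart entry appears

print(diff_snapshot(baseline, latest))   # only the new autostart entry is reported
```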

The analogy suggested by our developer is the Google approach.  Google does not know what questions will be asked of its search engine, so it uses crawlers to traverse the World Wide Web to collect data and store it in anticipation of any question.  Google puts that raw data through correlation and pattern matching algorithms to further speed the search process.  The logic is simple – a search against the open Internet would be grossly inefficient and utterly preposterous.  By gathering the data before the question is asked, Google actually returns answers while you are asking the question.

Triumfant does essentially the same thing as Google for endpoint state data because, like Google, we do not know the question until it is asked.  Triumfant does not rely on prior knowledge; instead it detects malware and configuration problems by monitoring change.  We use our agent to continuously monitor over 200,000 attributes per machine and collect that data at the server level.  Queries, online views, and data feeds execute against the repository data at the server and require no interaction with the endpoints.  Contrast this with other tools that have to go back to the endpoint for every question asked.

Triumfant’s repository can be queried directly and a report produced in hours (more likely minutes, but I don’t like to show off).  You would know almost immediately how many machines have the new vulnerability and therefore be able to assess the risk to your organization.  It would not matter which machines are connected at the time, nor would the query impact the network or the endpoints.  Why?  Because, like Google, the hard work of gathering and processing the raw data is already done and the data is readily available.  Best of all, the Triumfant agent performs its continuous monitoring unobtrusively and efficiently, and only sends changes across the wire once a day.  You get faster access to the data with no impact to the endpoints or your network.
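As a rough illustration of what answering a question from the repository (rather than from the endpoints) might look like, here is a toy example using SQLite.  The schema, attribute names, and version strings are hypothetical, not Triumfant’s actual repository design.

```python
# Toy repository query: which machines carry a vulnerable component?
# Answered entirely at the server, with no traffic to the endpoints.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE endpoint_state (host TEXT, attribute TEXT, value TEXT)")
db.executemany("INSERT INTO endpoint_state VALUES (?, ?, ?)", [
    ("wkstn-001", "msvidctl.dll_version", "6.5.2600.2180"),   # hypothetical vulnerable build
    ("wkstn-002", "msvidctl.dll_version", "6.5.2710.2732"),
    ("wkstn-003", "msvidctl.dll_version", "6.5.2600.2180"),
])

vulnerable = db.execute(
    "SELECT host FROM endpoint_state WHERE attribute = ? AND value = ?",
    ("msvidctl.dll_version", "6.5.2600.2180"),
).fetchall()
print([host for (host,) in vulnerable])   # ['wkstn-001', 'wkstn-003']
```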

With other tools, you would either have to initiate an agentless scan of the machines to collect the required information, or push some new query or script to the endpoint agents for execution.  Either way, this activity places a burden on the endpoint and on the network as potentially large answer sets are returned across the wire.  The necessary data would then be collected in some repository and evaluated over time.  I was recently at a prospect that I would judge to be progressive and perceptive, and that prospect told me that it takes two weeks to identify machines affected by a new vulnerability for a population that is not large by most standards.

One hour versus two weeks.  Impressive.  Most impressive.

But wait, there is more.  Most vulnerabilities have a short-term mitigation strategy that involves setting some registry keys to temporarily disable the vulnerability until a patch is created and applied.  With Triumfant, a simple policy can enforce the temporary fix and have it applied in less than 24 hours.  Since there is likely no signature for an attack that quickly moves to leverage the new vulnerability, Triumfant will still see those attacks and build a remediation to stop them.  Triumfant sees the new vulnerability, effectively closes the vulnerability, and detects anything that attempts to exploit it.
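A back-of-the-napkin sketch of what such a policy check might look like follows; the registry path and required value are placeholders of my own invention, not an actual vendor workaround.

```python
# Hypothetical policy: a temporary registry mitigation that must hold on every machine.
POLICY = {
    r"HKLM\SOFTWARE\Vendor\Component\FeatureDisabled": "1",   # placeholder key and value
}

def out_of_compliance(machine_state: dict) -> list:
    """Return the policy attributes a machine has drifted from, so the fix can be re-applied."""
    return [attr for attr, required in POLICY.items() if machine_state.get(attr) != required]

# Run against each machine's reported state on every scan cycle.
print(out_of_compliance({r"HKLM\SOFTWARE\Vendor\Component\FeatureDisabled": "0"}))
```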

The concept of accessing the central repository rather than continuously interrogating the endpoint machines works for all forms of situational awareness, business intelligence, data mining and analysis, and external feeds.  For example, Triumfant stores SCAP attributes for CCEs, CPEs and CVEs in the repository, so when the organization wants to build a CyberScope (Triumfant is a certified CyberScope provider) feed it does so from the repository without intrusion on the endpoint or consumption of network bandwidth.

So there you go.  Triumfant is like a web-crawling search engine for the state data of your endpoint population.  The data is there so you can ask questions and get the situational awareness your organization needs to keep pace.  Gartner and others have been talking with increasing frequency about the importance of situational awareness and Enterprise Security Intelligence.  I cannot think of a more efficient and detailed source for endpoint state data than Triumfant.

The Yin and Yang of Triumfant – Agent Based Precision With Network Level Analytical Context

Yesterday I was in a conversation with Dave Hooks, our CTO, and a very smart person from the intelligence community, and, as often happens when I engage with people smarter than myself, I had an epiphany:

Triumfant provides agent level precision, with network level analytical context.

There is a set of trade-offs when working with endpoint security tools based on their perspective and architecture.  Agent-based solutions allow for monitoring at very granular levels, but there are limitations to the amount of analysis they can perform.  That is because when the analysis only happens in the context of the machine, the lack of broader context creates far too many false positives to make the analytic processes effective.  In most tools, the agent uses prior knowledge to detect, remediate or both, resulting in the need to continuously update the prior knowledge on the agent, creating a network and administrative burden.

In contrast, a server-based agentless tool trades precision for a lack of intrusiveness.  Even the most efficient scanning tools using credentialed scans cannot see the level of detail needed to be absolutely sure about many potential problems, whether malicious activity, vulnerabilities, or compliance.  For example, a credentialed scan can point out machines that may have a specified vulnerability, while Triumfant can probe deeply enough to say without question whether a given machine has that vulnerability.  Agentless scanning also tends to gather large answer sets, which places a burden on the network.

Which leads me to my epiphany – Triumfant’s approach provides the best of both worlds while eliminating the drawbacks of each.  Triumfant has achieved harmonic balance between what appear to be opposing forces – a true Yin/Yang relationship.

The Triumfant agent performs continuous scanning at a level of precision that I have not seen in any other tool – over 200,000 attributes per machine.  The agent recognizes changes and sends only changes to the Triumfant server for analysis, minimizing network burden through an effective application of change data capture.  The agent uses no prior knowledge, and therefore requires no regular updates of signature files or remediation scripts.  No network impact outbound, very low network impact inbound.

Triumfant performs the analysis of the detailed data collected at the machine level on the Triumfant server, empowering Triumfant’s analytics to view changes in the context of the broader population, driving analytical accuracy and eliminating false positives.  The context also empowers Triumfant’s patent pending donor technology that uses the population as a donor pool to build remediations that address missing and corrupted attributes.  When a new attack is identified, the context allows for investigation of broader attack patterns which will ultimately provide the IT security team the information they need to proactively protect the organization from other similar attacks.

The context that I speak of in the previous paragraph is unique to Triumfant and is at the heart of our patents.  The context takes the detailed attribute data collected by the agent and builds a normative, rule-based model of the endpoint population.  Again the Yin/Yang relationship is manifested: the context thrives because of the detail provided by the agent, but logically and logistically can only be implemented at the server level.
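A toy example of how population context separates routine change from suspicious change follows.  It is only a sketch of the general idea under my own simplifying assumptions, not a description of Triumfant’s patented analytics.

```python
# A change seen on nearly every machine (a patch rollout) is normal; a change seen
# on one machine out of the whole population is worth a much closer look.
from collections import Counter

def anomaly_scores(changes_by_host: dict) -> dict:
    """Score each observed change by how rare it is across the endpoint population."""
    population = len(changes_by_host)
    prevalence = Counter(c for changes in changes_by_host.values() for c in set(changes))
    return {change: round(1 - count / population, 2) for change, count in prevalence.items()}

fleet = {
    "wkstn-001": ["patch_KB123_installed", r"new_file:c:\temp\svch0st.exe"],
    "wkstn-002": ["patch_KB123_installed"],
    "wkstn-003": ["patch_KB123_installed"],
    "wkstn-004": ["patch_KB123_installed"],
}
print(anomaly_scores(fleet))   # the patch scores 0.0, the lone dropped file scores 0.75
```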

By using the agent to do what it does best, and using the server to perform the heavy lifting of analysis, Triumfant captures the best of both worlds.  The agent is extremely unobtrusive and efficient, and requires near-zero maintenance.  Using change detection means that you can assume nothing, and must therefore monitor everything, which would be impossible to do efficiently and accurately without an agent.  Equally impossible is the task of making sense of detected changes without a broader context.  That is why performing the analysis at the server level is critical.  It is important to note that the analysis is only as good as the data provided, and the server’s analysis would not have the depth and accuracy it generates without the granular data that could only be obtained through the agent.

So there you have my epiphany – Triumfant harnesses the data collection power of an agent based approach with the analytical power and contextual perspective of a server based approach.  Triumfant uses the power of each to neutralize the weaknesses of the other to create a solution that is unique and certainly powerful.  We can detect, analyze and assess the impact of changes to identify malicious attacks that evade other defenses, and build a contextual remediation to repair that attack.  We can continuously enforce security policies and configurations.  And we can provide deep insight into the endpoint population.

Defense in Depth – There is No Perfect Shield

Everyone wants the perfect shield for their endpoint population.  All malware should be detected and blocked before it has a chance to do anything bad to any given machine.  Nothing less is acceptable.

Not going to happen.  Sorry.  Truly I am.  See “Why Bad Things Happen to Good Endpoints” and “It is Raining and You Will Get Wet”.

Defense is always playing catch up.  Always has been, always will be.  Today’s stellar defense is one offensive innovation from being compromised.  It is the nature of the game and examples abound.

My family spent spring break in London and Paris and saw all manner of personal armor that was quite effective – until the crossbow was perfected.  In the 19th century, the best and brightest were trained as military engineers because the construction of earthworks was critical to defending fortified positions against cannon fire – until the airplane arrived and munitions could be delivered from directly above a position.

The gap does not always come from leaps of technology or sophistication.  When U.S. forces entered Iraq it was the improvised explosive device (IED) – a crude, homemade weapon – that forced the need to retrofit our advanced vehicles with additional armor.  Statistics abound showing how major threats such as Conficker were based on simple vulnerabilities that had been identified six months or more before they were exploited.

Today we in IT security chase the same elusive goal and ignore the obvious: there will always be gaps and stuff will always get through.   It is time that government agencies and businesses come to terms with the inevitable and think about technologies that can help them detect what does make it through their defenses instead of continuously chasing the promise of the perfect shield.

The adversary is tirelessly creating new attacks that evade existing defenses.  Sometimes those attacks evade detection for weeks and even months.  And once they are detected, there is lag while the attack is analyzed, a protection is built, and the protection is deployed.  During that gap organizations are at risk.  And given that so many detection tools still rely on prior knowledge of an attack to see the attack, organizations are often left unaware that they were breached, much less empowered to fight back.

Stuff will get through.  Any vendor or expert that tells you otherwise is not being honest.  There is nothing wrong with seeking protection from attacks, but you are putting your organization at risk if you do not have something in place when the inevitable happens.  It also makes sense that a new approach is needed, because if the attack got through it follows that the normal protection techniques have been evaded.

Change detection has long been viewed as the right approach for detecting attacks that make it to a machine.  The logic is simple – unless the attack can enter the machine, start itself and perform its malicious activity without changing the machine, change detection is an effective triggering mechanism for analysis and ultimately identifying the attack.
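As a loose illustration of that triggering idea, here is a small sketch.  The change categories below are my own examples for illustration, not a product feature list; the point is that certain kinds of change earn immediate scrutiny even when no signature exists for whatever caused them.

```python
# Changes that grant persistence or a communications channel trigger deeper analysis.
TRIGGER_CATEGORIES = {"autostart_added", "service_installed", "port_opened", "driver_loaded"}

def should_escalate(change: dict) -> bool:
    """Flag a detected change for immediate analysis based on its category."""
    return change.get("category") in TRIGGER_CATEGORIES

detected = {"category": "autostart_added",
            "detail": r"HKLM\...\Run\svch0st -> c:\temp\svch0st.exe"}
if should_escalate(detected):
    print("escalate for correlation and remediation:", detected["detail"])
```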

Triumfant can not only detect and analyze these attacks, it will correlate changes so you can see the full extent – primary and secondary artifacts – of the attack and will even build a remediation that is contextual to that attack on that specific machine.  It can take what it learns and recognize subsequent attacks, or if the attack morphs it will still see it based on the changes.

One of the most downloaded blog entries was called “Antivirus Detection Rates – It Is Clear You Need a Plan B”.  The more I think about the title, the more I realize I was wrong: having a tool in place that will detect what passes through your shields is a Plan A item and must be part of any defense in depth strategy.  Stuff will get through, and you need some form of detection capability when all of the shields fail.

Security Fails of 2009 – The Heartland Payment Systems Breach

This is the fifth in the series of Security Fails of 2009.  As 2009 draws to a close, I think no one would dispute that this has been an extremely eventful year for IT security.  While others will soon be trotting out their “best of 2009” lists, I thought I would instead visit some of the prominent fails of 2009.

In January of 2009, it was disclosed that Heartland Payment Systems had experienced an intrusion into their computers that may have compromised over 100 million customer records.  After the dust settled, the breach was found to involve 130 million customer records, pushing this breach well past the previous record represented by the 2007 TJX breach that compromised 94 million records.  Heartland processes 100 million payment card transactions per month for 175,000 merchants.

By December the attack was traced to admitted TJX intruder Albert Gonzalez, who eventually entered into a plea agreement on the Heartland breach and additional charges that he hacked into Hannaford Brothers, 7-Eleven, and two other unnamed national retailers.  Heartland has allocated $12.6M for the clean-up, and as of today is still settling with American Express ($3.6M) and resolving other class action suits.

The scope of the breach re-energized conversations about the efficacy of the PCI standards and the general state of fraud protection for card based transactions.  The dialogue became more interesting when Heartland CEO Robert Carr did an interview with Bill Brenner of CSO Magazine where Carr laid the blame squarely on the audits done by their Qualified Security Assessors (QSAs).  Carr’s comments were viewed by many in the security community as “disingenuous” as most believe that the source of the breach could have been eliminated if Heartland had applied some generally accepted security controls. 

PCI has long been an industry hot button, and the Heartland attack was illustrative of the issues at hand.  Heartland appeared to be in full compliance with the PCI standards, but was attacked by essentially a “garden variety” SQL injection.  In an interesting twist, Heartland’s traditional signature based tools missed the attack, but the attackers actually used antivirus software to cover their tracks and avoid detection. 

So what are the lessons learned?  Heartland demonstrates that even the companies most sophisticated about IT security are still far too reliant on signature-based tools and must look to new and evolved technologies to close the security gaps that allow long-known vectors such as SQL injection to breach their perimeters.  Heartland is also a great “exhibit A” that compliance does not equal security; compliance is only a point-in-time attestation that certain standards were met.  Finally, in spite of calls to action to rid the card processing industry of fraud, there is not much evidence that anything other than rhetoric came from the attack, so we can fully expect to see another Heartland in 2010.

Why Bad Things Happen to Good Endpoints

I was with a prospect the other day and was asked what, at least for me, was a very thought provoking question.  We were discussing the two major areas of application for Triumfant – continuous enforcement of security configurations and real-time malware detection and remediation – and he asked why you would need the latter if the former was done properly.  In other words, if all of my endpoint protections are in place and everything is properly configured, why am I still getting attacked?

Simple and logical question, right?  But it led me to think long and hard about why attacks happen at a very elemental level.  We in security face this question from the powers that be because they cannot understand why attacks still come even though we have added multiple layers of defense.

After consideration I came up with three reasons.  For perspective, my reasons are very much endpoint centric and presume the attacks have already made their way through protections on the network level, so this is not a cloud to ground holistic view.  Each reason is based on the assumption that the preceding reason(s) have been fully addressed and the represented gap is closed – each reason stands on its own as a problem.  And I will resist the urge to plug in how Triumfant addresses each gap, but I have noted blog entries that do if you care to read on.

Here are my three reasons:

  1. Attacks get through because the machines and the protection software deployed to protect them are not configured to be secure.  The analogy is simple: the most well designed and secure deadbolt lock only secures a door when the deadbolt is engaged.  Too frequently, endpoint protection tools are either improperly installed or improperly configured to perform the tasks for which they are intended, so attacks make it through.  For how Triumfant addresses the configuration management gap see “A New Approach to Configuration Management”.
  2. Attacks get through because traditional endpoint protection tools miss known attacks even when there is a signature for that attack and the protection is properly configured.  The failure rate depends on whose statistics you choose to use, but Gartner puts the detection failure rate at two to ten percent while other studies show failure rates exceeding fifty percent.  Given there will be well over 5M signatures by the end of 2009, ten percent is non-trivial.  See “Antivirus Detection Rates – It is Clear You Need a Plan B”.
  3. Attacks get through because they have been carefully designed and engineered to evade traditional endpoint protections.  These include zero day attacks, rootkits, targeted attacks, and the work of the maliciously intended insider.  Zero day attacks are more generic in nature and trade on the fact that most tools require prior knowledge to detect an attack.  Targeted attacks are designed to specifically infiltrate networks and applications to retrieve sensitive information or financial data.  See “It is Raining and You Will Get Wet”.

I am not saying this is groundbreaking thinking here, but if you put things into this perspective, it clearly defines the gaps in protection and subsequently provides a roadmap of what must be done to protect your endpoints.  Reducing the attack surface is clearly not enough.  Antivirus is not getting it done – even the AV vendors say so.  And the bad guys are relentless in their pursuit to exploit any crack in the defenses. 

So what do you think? Too simple or simply brilliant?

A Practical Primer on Triumfant – the ActiveX IE Exploit

In his blog The Last Watchdog, Byron Acohido discusses the recent zero day attacks that exploit a flaw in the video ActiveX component of the Internet Explorer browser.  Acohido goes on to discuss why Microsoft may not have a patch ready in time for the next Patch Tuesday on July 14.  The exploits and associated problems described by Acohido are a perfect context for a very practical primer on what Triumfant can do for an organization.

First, we would detect the zero days that exploit the flaw, including the two attacks described that use a Trojan downloader and a rootkit. No signature required.

But of course we do not stop at detection. Triumfant Resolution Manager will build a remediation and remove the detected attacks. This includes ejecting the rootkit attack and cleaning up the various hooks it established, and repairing all of the collateral damage made by the Trojan downloader to configure the machine for subsequent incursions as described in the post. No humans needed to write the script, no re-imaging required.

Third, it would be a simple task to build a policy in Resolution Manager that would address the registry changes Microsoft has recommended as a stopgap for the problem until a patch is issued.  The policy would be enforced on all machines, and the organization would get an up-to-date report on which machines had been updated and which machines were still vulnerable until a patch is created.  Given the length of time Acohido describes for Microsoft to build a patch and the well-known time gaps in organizations’ distribution of patches, this action by Triumfant would protect machines for the weeks and even months until the patch was in place.
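For the curious, the stopgap in question is the ActiveX “kill bit.”  Here is a rough sketch of what setting it looks like on a single Windows machine; the CLSID below is a placeholder rather than one of the class identifiers from Microsoft’s advisory, while the Compatibility Flags value of 0x400 is the documented kill-bit setting.

```python
# Windows-only sketch: set the kill bit for an ActiveX control (requires admin rights).
import winreg

CLSID = "{00000000-0000-0000-0000-000000000000}"   # placeholder, not the advisory's CLSID
KEY = r"SOFTWARE\Microsoft\Internet Explorer\ActiveX Compatibility" + "\\" + CLSID

# Create (or open) the per-control compatibility key and set the kill bit.
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY, 0, winreg.KEY_WRITE) as key:
    winreg.SetValueEx(key, "Compatibility Flags", 0, winreg.REG_DWORD, 0x00000400)
```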

This is not meant to be a sales pitch – this is a perfect and very practical example of how the unique functionality and capability of Triumfant would step into a gap not currently filled by any other product that I (or any industry expert, analyst, or writer) am aware of.  As a new technology, it is sometimes hard for people to get their heads around what Resolution Manager can do and the benefit it delivers.  And exploits like this ActiveX IE exploit show up on an all-too-frequent basis.