Continuously Monitoring the Haystack (Needle Inventory Report)

In the previous two posts (Part 1 here and Part 2 here) I first shot down the much over-used “Finding a Needle in a Haystack” analogy by showing that the problem facing IT security professionals is far more complex, and then defined a new approach that will find the <unknown> in an <ill-defined, shifting maelstrom>.  I closed the second post by adding a “but wait, there is more”, so here it is: what if I told you that the approach I described not only solves the needle in the haystack problem, but also provides an innovative approach to continuous monitoring and situational awareness.  Without changing the core process one iota.

Think of it – this is like inventing cold fusion and finding out that the solution repairs the ozone layer.  Or inventing a beer that tastes great and is less filling.  Or something like that.

To recap, my thesis is that the analogy assumes you are looking for a known thing in a well-defined and completely homogenous population, when in fact we do not know what we are looking for in most cases, and the machine population that doubles as our proverbial haystack is anything but well-defined and completely homogenous.  Therefore, the proper expression of “Finding a Needle in a Haystack” is in fact “Finding an <unknown> in an <ill-defined, shifting maelstrom>”.

I then outlined how you could solve the problem by first building a normalized view of the endpoint population to create a baseline of the ill-defined, shifting maelstrom, effectively giving you a haystack.  Then you could continuously monitor machines for changes under the assumption that any change may be someone introducing something to your haystack.  By analyzing and grouping the changes and using the baseline as context, you could then assess the impact of those changes to the machine and determine if the changes represented a needle (malicious attack) or just more hay.
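For the technically inclined, here is a minimal sketch in Python of that grouping-and-assessment idea: use a population-wide baseline to decide whether a change looks like hay (common across the population) or a candidate needle (rare or unseen).  This is my illustration of the concept, not Triumfant's actual analytics; the attribute names and threshold are made up.

```python
from collections import Counter

def build_baseline(snapshots):
    """snapshots: one set of (attribute, value) pairs per machine in the population."""
    prevalence = Counter()
    for snapshot in snapshots:
        prevalence.update(snapshot)
    total = len(snapshots)
    # Fraction of machines in the population that exhibit each pair.
    return {pair: count / total for pair, count in prevalence.items()}

def assess_changes(changes, baseline, rare_threshold=0.05):
    """changes: (attribute, value) pairs newly observed on one machine."""
    # Pairs that are rare (or unseen) across the population are candidate
    # needles; everything else is treated as more hay.
    return [c for c in changes if baseline.get(c, 0.0) < rare_threshold]

# Toy example: three machines define the baseline, a fourth reports changes.
population = [
    {("service", "spooler"), ("app", "office")},
    {("service", "spooler"), ("app", "office")},
    {("service", "spooler"), ("app", "browser")},
]
baseline = build_baseline(population)
print(assess_changes({("app", "browser"), ("driver", "unknown.sys")}, baseline))
# -> [('driver', 'unknown.sys')]  the never-before-seen driver is flagged
```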

Now for the continuous monitoring part.  Because I do not know what I am looking for, I have no choice but to scan everything persistent on the host machine.  Since I am building my normalized model on the server, I have to move the raw data to the server and store it in some form of data repository.  Logic dictates that the repository I use to power the process of finding needles in the haystack can also serve as the data source for all forms of situational awareness and continuous monitoring activities.
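To make “some form of data repository” concrete, here is a minimal sketch using SQLite.  It is purely illustrative; Triumfant's actual storage layer is not described in this post, and the table and column names are mine.

```python
import sqlite3

# One row per persistent attribute observed on a machine: the most recent
# value and the date it last changed.
conn = sqlite3.connect("endpoint_state.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS endpoint_attributes (
        machine_id   TEXT NOT NULL,
        attribute    TEXT NOT NULL,
        value        TEXT,
        last_updated TEXT NOT NULL,
        PRIMARY KEY (machine_id, attribute)
    )
""")

# Each machine's scan results land here once; from then on, every
# situational-awareness question becomes a query against this table
# instead of a new endpoint scan.
conn.execute(
    "INSERT OR REPLACE INTO endpoint_attributes VALUES (?, ?, ?, ?)",
    ("host-042", "file:C:\\Windows\\System32\\example.dll", "v1.2.3", "2011-06-01"),
)
conn.commit()
conn.close()
```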

It gets better!  I can perform those activities without incurring any additional burden on the endpoints or on the network.  I can ask questions of the repository without the need to collect additional data from the endpoints.  Most solutions use an agent that scans only segments of the data, or use an agentless scan to collect segments of the data.  If either is not currently scanning the data needed, the process has to be altered and repeated.  For example, a new scan script might have to be pushed to the agents.  Furthermore, organizations often run multiple scans using different tools, each dutifully thrumming the endpoints for data and oftentimes collecting much of the same information collected by the scan that ran an hour earlier.

Of course, I must continuously refresh the data in my repository to keep it accurate.  Luckily, I already thought of that, and I am using my agent to precisely detect and cache changes on each host machine in a very efficient way.  I then send those changes to the server once per day and use them to refresh the repository.  Given that a large number of data attributes rarely change, sending only the changes across the wire keeps the network burden to a minimum.  Obviously, I have to move a full scan across the wire when I initially deploy the agent, but subsequent updates using the change-data-capture approach result in comparatively small answer sets per machine per day.
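Here is a minimal sketch of that change-data-capture idea, assuming each scan produces a simple attribute-to-value map.  The format is illustrative only, not our actual agent protocol.

```python
def compute_delta(previous_scan, current_scan):
    """Both scans are dicts of attribute -> value; return only what changed."""
    changed = {k: v for k, v in current_scan.items() if previous_scan.get(k) != v}
    removed = [k for k in previous_scan if k not in current_scan]
    return {"changed": changed, "removed": removed}

# Initial deployment: the full scan crosses the wire once.
yesterday = {"service:spooler": "running", "autorun:updater": "c:\\updater.exe"}

# Every day after that: the agent caches the last scan locally and ships
# only the delta, which is tiny when most attributes never change.
today = {"service:spooler": "stopped", "autorun:updater": "c:\\updater.exe"}
print(compute_delta(yesterday, today))
# -> {'changed': {'service:spooler': 'stopped'}, 'removed': []}
```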

My result?  I have a complete repository of the granular data for each endpoint machine efficiently collected and managed and available for reporting and analysis.  I can feed my highly sophisticated information portals, scorecarding processes, and advanced analytics.  I can build integrations to feed other systems and applications in my security ecosystem.  I can create automated data feeds such as the feed required to comply with the FISMA CyberScope monthly reporting requirement.  Best of all, this activity does not require me to go back to the endpoint ceaselessly for more information.

I have implemented true continuous monitoring and comprehensive situational awareness without running any incremental data collection processes.  I am continuously scanning every persistent attribute on every machine.  My data collection routine to find the needles in my haystack is the only data collection process required!  You want situational awareness?  From this data I can readily produce reports for application inventories, patch inventories, vulnerabilities, and non-compliance with policies and configurations.  I can tell you the machines that have had a USB key plugged into them in the past week.

I can keep a history of my machine scans and show you the detail of the granular elements of what I am scanning.  Think of the impact for investigating incidents on a machine.  You could pull up the snapshot for any given day, or select two dates and generate a summary of the diffs between the two images so you could see exactly what changed on the machine.
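As an illustration of that snapshot-and-diff idea, the sketch below replays stored daily change records to reconstruct a machine's state for any date and then diffs two dates.  The record format is hypothetical, chosen only to match the change-data-capture sketch above.

```python
def snapshot_as_of(initial_scan, daily_deltas, as_of):
    """daily_deltas: list of (date, changed_dict, removed_list), ascending by date."""
    state = dict(initial_scan)
    for date, changed, removed in daily_deltas:
        if date > as_of:
            break
        state.update(changed)
        for attribute in removed:
            state.pop(attribute, None)
    return state

def diff_between_dates(initial_scan, daily_deltas, date_a, date_b):
    """Summarize exactly what changed on the machine between two dates."""
    a = snapshot_as_of(initial_scan, daily_deltas, date_a)
    b = snapshot_as_of(initial_scan, daily_deltas, date_b)
    return {
        "added":   {k: b[k] for k in b.keys() - a.keys()},
        "removed": sorted(a.keys() - b.keys()),
        "changed": {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]},
    }
```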

I am just getting started.  Since I have my normative baseline in place to interpret the changes I detect on each machine, I can also provide reports on anomalous applications and other exceptional activity.

To recap, the data collection processes required to implement my approach to finding the <unknown> in an <ill-defined, shifting maelstrom> provide true continuous monitoring and broad situational awareness.  Or you could say that my approach is a continuous monitoring solution that can identify the <unknown> in an <ill-defined, shifting maelstrom>.  Either way, what results is an accurate picture of my <ill-defined, shifting maelstrom> without having to run any additional scans or data collection, so I get the benefits without incremental burden to the endpoint machines or the network.

But wait, there is more.

Triumfant and Situational Awareness – The Google Model

I have written in this blog that while Triumfant is useful, innovative technology, I often struggle to come up with word pictures or analogies that help others grasp how useful and innovative it really is.  Thankfully, we employ lots of smart people and one of our developers came up with what I think is an exceptional analogy.

Because Triumfant assumes nothing, it scans just about every persistent attribute on every machine in the endpoint population and sends this to the server for analysis.  Since the majority of the state data on each machine rarely changes, after the first snapshot is collected the Triumfant agent performs change data capture and only sends changes up the wire for subsequent scans.  This is, of course, the proven, prudent and efficient way to monitor large amounts of data that is predominantly static.  Otherwise, you end up moving large answer sets across the wire needlessly.  The data is available at the server level in a repository to power all forms of situational awareness.

The analogy suggested by our developer is the Google approach.  Google does not know what questions will be asked of its search engine, so it uses crawlers to traverse the World Wide Web to collect data and store it in anticipation of any question.  Google puts that raw data through correlation and pattern matching algorithms to further speed the search process.  The logic is simple – running a search against the open Internet at the moment a question is asked would be grossly inefficient and utterly preposterous.  By gathering the data before the question is asked, Google actually returns answers while you are asking the question.

Triumfant does essentially the same thing as Google for endpoint state data, because like Google, we do not know the question until it is asked.  Triumfant does not rely on prior knowledge and instead detects malware and configuration problems by monitoring change.  We use our agent to continuously monitor over 200,000 attributes per machine and then collect that data at the server level.  Queries, online views, and data feeds execute against the repository data at the server and require no interaction with the endpoints.  Contrast this with other tools, which have to go back to the endpoint for the data every time a question is asked.

Suppose a new vulnerability is announced.  Triumfant’s repository can be queried directly and a report produced in hours (more likely minutes, but I don’t like to show off).  You would know almost immediately how many machines have the new vulnerability and therefore be able to assess the risk to your organization.  It would not matter which machines are connected at that time, nor would the query impact the network or the endpoints.  Why?  Because, like Google, the hard work of gathering and processing the raw data is already done and the data is readily available.  Best of all, the Triumfant agent performs its continuous monitoring unobtrusively and efficiently, and only sends changes across the wire once a day.  You get faster access to the data with no impact to the endpoints or your network.
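Conceptually, answering that question is as simple as the sketch below, which reuses the illustrative SQLite schema from earlier in the post.  It is a sketch of the idea, not our product's actual query interface.

```python
import sqlite3

def machines_with_attribute(db_path, attribute_pattern):
    """Answer 'which machines have X?' from the repository, not the endpoints."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT DISTINCT machine_id FROM endpoint_attributes WHERE attribute LIKE ?",
        (attribute_pattern,),
    ).fetchall()
    conn.close()
    return [machine_id for (machine_id,) in rows]

# e.g. every machine that still carries the vulnerable library, answered in
# seconds with no new traffic to the endpoints or the network.
affected = machines_with_attribute("endpoint_state.db", "file:%example.dll")
print(len(affected), "machines exposed")
```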

With other tools, you would either have to initiate an agentless scan of the machines to collect the required information, or push some new query or script to the endpoint agents for execution.  Either way, this activity places a burden on the endpoint and on the network as potentially large answer sets are returned across the wire.  The necessary data would then be collected in some repository and evaluated over time.  I was recently at a prospect that I would judge to be progressive and perceptive, and that prospect told me that it takes two weeks to identify machines affected by a new vulnerability for a population that is not large by most standards.

One hour versus two weeks.  Impressive.  Most Impressive.

But wait, there is more.  Most vulnerabilities have a short-term mitigation strategy that involves setting some registry keys to temporarily disable the vulnerable functionality until a patch is created and applied.  With Triumfant, a simple policy can enforce the temporary fix and have it applied in less than 24 hours.  Since there is likely no signature yet for an attack that quickly moves to leverage the new vulnerability, Triumfant will see those attacks and build a remediation to stop them.  Triumfant sees the new vulnerability, effectively closes the vulnerability, and detects anything that attempted to exploit it.
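For illustration only, a registry-based temporary mitigation of the kind described above might look like the sketch below on Windows.  The key path, value name, and data are placeholders rather than guidance from any real advisory, and this is not Triumfant's policy engine.

```python
import winreg  # Windows-only; writing to HKLM requires administrative rights

# Hypothetical mitigation: disable a vulnerable feature via a registry value.
MITIGATION = {
    "path": r"SOFTWARE\ExampleVendor\ExampleComponent",  # placeholder key
    "name": "DisableVulnerableFeature",                  # placeholder value name
    "data": 1,
}

def apply_mitigation(m):
    """Create the key if needed and set the mitigating value."""
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, m["path"], 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, m["name"], 0, winreg.REG_DWORD, m["data"])

def mitigation_in_place(m):
    """Check whether the temporary fix is applied (policy compliance check)."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, m["path"]) as key:
            data, _ = winreg.QueryValueEx(key, m["name"])
            return data == m["data"]
    except OSError:
        return False
```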

The concept of accessing the central repository rather than continuously interrogating the endpoint machines works for all forms of situational awareness, business intelligence, data mining and analysis, and external feeds.  For example, Triumfant stores SCAP attributes for CCEs, CPEs and CVEs in the repository, so when the organization wants to build a CyberScope feed (Triumfant is a certified CyberScope provider), it does so from the repository without intrusion on the endpoint or consumption of network bandwidth.
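As a rough sketch of what “build the feed from the repository” means, the code below pulls per-machine CVE findings from the illustrative schema used earlier and emits a simple XML document.  The element names are made up and do not reproduce the actual CyberScope or SCAP reporting schemas.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_findings_feed(db_path):
    """Emit per-machine CVE findings as XML, straight from the repository."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT machine_id, attribute, value FROM endpoint_attributes "
        "WHERE attribute LIKE 'cve:%'"
    ).fetchall()
    conn.close()

    root = ET.Element("endpoint_findings")  # illustrative element names only
    for machine_id, attribute, value in rows:
        finding = ET.SubElement(root, "finding",
                                machine=machine_id, cve=attribute[len("cve:"):])
        finding.text = value or ""
    return ET.tostring(root, encoding="unicode")

print(build_findings_feed("endpoint_state.db"))
```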

So there you go.  Triumfant is like a web-crawling search engine for the state data of your endpoint population.  The data is there so you can ask questions and get the situational awareness your organization needs to keep pace.  Gartner and others have been talking with increasing frequency about the importance of situational awareness and Enterprise Security Intelligence.  I cannot think of a more efficient and detailed source for endpoint state data than Triumfant.