Continuously Monitoring the Haystack (Needle Inventory Report)

In the previous two posts (Part 1 here and Part 2 here) I first shot down the much over-used “Finding a Needle in a Haystack” analogy by showing that the problem facing IT security professionals is far more complex, and then defined a new approach that will find the <unknown> in an <ill-defined, shifting maelstrom>.  I closed the second post with a “but wait, there is more”, so here it is: what if I told you that the approach I described not only solves the needle-in-the-haystack problem, it also provides an innovative approach to continuous monitoring and situational awareness.  Without changing the core process one iota.

Think of it – this is like inventing cold fusion and finding out that the solution repairs the ozone layer.  Or inventing a beer that tastes great and is less filling.  Or something like that.

To recap, my thesis is that the analogy assumes you are looking for a known thing in a well-defined and completely homogenous population, when in fact we do not know what we are looking for in most cases, and the machine population that doubles as our proverbial haystack is anything but well-defined and completely homogenous.  Therefore, the proper expression of “Finding a Needle in a Haystack” is in fact “Finding an <unknown> in an <ill-defined, shifting maelstrom>”.

I then outlined how you could solve the problem by first building a normalized view of the endpoint population to create a baseline of the ill-defined, shifting maelstrom, effectively giving you a haystack.  Then you could continuously monitor machines for changes under the assumption that any change may be someone introducing something to your haystack.  By analyzing and grouping the changes and using the baseline as context, you could then assess the impact of those changes to the machine and determine if the changes represented a needle (malicious attack) or just more hay.
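The baseline-plus-change-detection idea above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual implementation: the flat attribute dictionaries, the prevalence counting, and the `rare_threshold` cutoff are all assumptions made for the example.

```python
# Hypothetical sketch: build a fleet-wide baseline of (attribute, value)
# prevalence, detect per-machine changes, and classify each change as a
# likely "needle" (rare across the fleet) or just more "hay" (common).

def build_baseline(machines):
    """Count how often each (attribute, value) pair appears across the fleet."""
    baseline = {}
    for attrs in machines.values():
        for key, value in attrs.items():
            baseline[(key, value)] = baseline.get((key, value), 0) + 1
    return baseline

def detect_changes(old_attrs, new_attrs):
    """Return attributes that were added or modified since the last scan."""
    return {k: v for k, v in new_attrs.items() if old_attrs.get(k) != v}

def classify(key, value, baseline, fleet_size, rare_threshold=0.05):
    """A change seen on almost no other machine is flagged for review;
    a change common across the fleet is treated as ordinary hay."""
    prevalence = baseline.get((key, value), 0) / fleet_size
    return "needle?" if prevalence < rare_threshold else "hay"
```

The design point is that the baseline supplies the context: the same change event is benign when thousands of machines share it and suspicious when only one does.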

Now for the continuous monitoring part.  Because I do not know what I am looking for, I have no choice but to scan everything persistent on the host machine.  Since I am building my normalized model on the server, I have to move the raw data to the server and store it in some form of data repository.  Logic dictates that the repository I use to power the process of finding needles in the haystack can also serve as the data source for all forms of situational awareness and continuous monitoring activities.

It gets better!  I can perform those activities without incurring any additional burden on the endpoints or on the network, because I can ask questions of the repository without needing to collect more data from the endpoints.  Most solutions use an agent that scans only segments of the data, or use an agentless scan to collect segments of the data.  If either is not currently scanning the data needed, the process has to be altered and repeated.  For example, a new scan script might have to be pushed to the agents.  Furthermore, organizations often run multiple scans using different tools, each dutifully thrumming the endpoints for data and often collecting much of the same information collected by the scan that ran an hour earlier.

Of course, I must continuously refresh the data in my repository to keep it accurate.  Luckily, I already thought of that, and I am using my agent to detect and cache changes on each host machine in a very efficient way.  I then send those changes to the server once per day and use them to refresh the repository.  Given that a large number of data attributes rarely change, sending only the changes across the wire keeps the network burden to a minimum.  Obviously, I have to move down a full scan when I initially deploy the agent, but subsequent updates using the change-data-capture approach result in comparatively small answer sets per machine per day.
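The change-data-capture step described above can be sketched as a pair of functions: one for what the agent would compute and send, one for how the server would apply it. The dictionary-based scan format and the delta shape are illustrative assumptions, not the product's wire protocol.

```python
# Hypothetical change-data-capture sketch.  The agent diffs today's scan
# against the previous one and ships only the delta; the server applies
# that delta to refresh the machine's record in the repository.

def compute_delta(previous_scan, current_scan):
    """Agent side: added/changed attributes plus a list of removals."""
    changed = {k: v for k, v in current_scan.items() if previous_scan.get(k) != v}
    removed = [k for k in previous_scan if k not in current_scan]
    return {"changed": changed, "removed": removed}

def apply_delta(repository_record, delta):
    """Server side: bring the stored record up to date from the small daily delta."""
    repository_record.update(delta["changed"])
    for key in delta["removed"]:
        repository_record.pop(key, None)
    return repository_record
```

Since most persistent attributes are unchanged day to day, the delta is tiny compared with a full scan, which is exactly why the network burden stays low after the initial deployment.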

My result?  I have a complete repository of the granular data for each endpoint machine, efficiently collected and managed and available for reporting and analysis.  I can feed my highly sophisticated information portals, scorecarding processes, and advanced analytics.  I can build integrations to feed other systems and applications in my security ecosystem.  I can create automated data feeds such as the feed required to comply with the FISMA CyberScope monthly reporting requirement.  Best of all, this activity does not require me to go back to the endpoint ceaselessly for more information.

I have implemented true continuous monitoring and comprehensive situational awareness without running any incremental data collection processes.  I am continuously scanning every persistent attribute on every machine.  My data collection routine to find the needles in my haystack is the only data collection process required!  You want situational awareness?  From this data I can readily produce reports for application inventories, patch inventories, vulnerabilities, and non-compliance with policies and configurations.  I can tell you the machines that have had a USB key plugged into them in the past week.

I can keep a history of my machine scans and show you the detail of the granular elements of what I am scanning.  Think of the impact for investigating incidents on a machine.  You could pull up the snapshot for any given day, or select two dates and generate a summary of the diffs between the two images so you could see exactly what changed on the machine.
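That two-date comparison can be sketched as a simple snapshot diff. The flat per-machine, per-day attribute dictionary is an assumed format for illustration only.

```python
# Hypothetical incident-investigation diff: given two dated snapshots of
# the same machine, summarize exactly what was added, removed, or modified.

def diff_snapshots(snapshot_a, snapshot_b):
    """Compare an earlier snapshot (a) with a later one (b)."""
    added    = {k: snapshot_b[k] for k in snapshot_b if k not in snapshot_a}
    removed  = {k: snapshot_a[k] for k in snapshot_a if k not in snapshot_b}
    modified = {k: (snapshot_a[k], snapshot_b[k])
                for k in snapshot_a
                if k in snapshot_b and snapshot_a[k] != snapshot_b[k]}
    return {"added": added, "removed": removed, "modified": modified}
```

An investigator could run this between the day before and the day after a suspected compromise and read off every persistent change in one report.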

I am just getting started.  Since I have my normative baseline in place to interpret the changes I detect on each machine, I can also provide reports on anomalous applications and other exceptional activity.

To recap, the data collection processes required to implement my approach to finding the <unknown> in an <ill-defined, shifting maelstrom> provide true continuous monitoring and broad situational awareness.  Or you could say that my approach is a continuous monitoring solution that can identify the <unknown> in an <ill-defined, shifting maelstrom>.  Either way, what results is an accurate picture of my <ill-defined, shifting maelstrom> without having to run any additional scans or data collection, so I get the benefits without incremental burden to the endpoint machines or the network.

But wait, there is more.