Needle in a Haystack? How to Find an Unknown in an Ill-Defined, Shifting Maelstrom

In the March 17,2011, post, I demolished the “Finding a Needle in a Haystack” analogy by pointing out that in IT Security we don’t know what we are looking for (the needle) and our haystack is not a homogonous pile of hay but is instead a continuously changing, utterly non-homogenous population of one-off configurations and application combinations.  We went from “Finding a Needle in a Haystack” to “Finding an <unknown> in a <ill-defined, shifting maelstrom>”.

I ended by promising you a solution and that is where I begin.

The first step toward a solution is getting your hands around the “ill-defined, shifting maelstrom” that is your endpoint population.  To find what is unwanted or anomalous in that population, you first need a way to establish what is normal for that population.  You could build and dictate normal, and then enforce that normal in a total lockdown.  That is expensive and hard to do, and in my many travels, I have seen exactly two such environments.  The alternative is to monitor the machines in that population, and accurately create a baseline learned from the environment itself.  One that captures all of the exceptions and disparity in all of its glory.  The end result is a normalized, well defined representation of your ill-defined, shifting maelstrom.  A normalized haystack, as it were.

Easy, right?  Not really.  You have to remember that your target is unknown, so you have no idea where it will appear and in what form.  You must also consider that whoever is putting the unknown in your haystack does not want it to be found, and will so design the unknown to evade detection.  Zero day attacks don’t show up as shiny needles.  You can assume nothing; therefore, you must monitor everything as part of your normalized haystack.  You must also remember that the population shifts (wanted change) and drifts (unwanted change) by the moment, so you will need to keep it current.

In short, you will need continuous monitoring that is comprehensive and granular.  Not the kind the scanner vendors sell you that sees some of the machines in weekly or monthly increment, or the kind the AV vendors sell you that sees parts of the machine and not the entire picture.  You will need comprehensive and truly continuous monitoring.

In yesterday’s post, I noted that if you had a homogonous haystack you could remove everything that was hay and what is left should be the thing you are looking for, even if you do not know what that thing was.  Our haystack is not homogonous, but now we have created a baseline that provides the next best thing.  We can’t throw out the hay, so we need a slightly modified approach that uses changes to the machines as our potential indicators to compliance issues and malicious attacks.

If we are smart, we can use this approach to our advantage because once we establish our normative haystack we can continuously monitor the machines and identify changes.  This fuels our detection process and drives efficiency in managing the shift (we want to control the drift, but that is another post) in the population.  By capturing changes, we can keep the image of the population current with minimal drag on the endpoints and the network by moving changes across the wire.  No need to move large images when incrementally smaller change captures will do.

Once we identify the changes, we will need analytics that assess the impact of those changes to the associated machine.  These analytics will leverage the context provided the normalized model of the haystack to identify those changes that are anomalous.  Changes identified as anomalous are further analyzed to gauge their effect on the state of the machine and identify those changes believed to be malicious.  We can use the context and other analytic processes to group changes so that we see the malicious code and all of the damage done to the machine by the malware.

We have successfully identified the unknown in our ill-defined, shifting maelstrom, which, like I said yesterday, is infinitely harder than finding a needle in a haystack.  We did not just find the unknown, we have detailed its composition, analyzed the effect to the machine, and identified its path of destruction.

I think we are onto something here.  This could revolutionize malware detection, creating a detection capability that is agnostic to attack type, vector, and delivery.

But wait, there is more

About The Triumfant Blog
This Blog is about all things Triumfant

Leave a comment