Test, retest, re-retest, and then prove it works

It amazes me just how many people I see in infosec, and in IT in general, who simply don’t test (or prove something functions correctly).  I would argue testing is one of the most important aspects of our day-to-day activities, and it just doesn’t get the attention it deserves.

The lack of testing is something I will never understand.  Maybe my years of going through numerous government audits are the reason.  The audits I’m referring to are the kind where the auditor says, “prove to me you are logging correctly by attempting to modify an audit log as a normal user, then show me in the logs where the attempt was recorded.”  Since I wanted to ensure we would pass the audit, and so I wouldn’t be on the hot seat, I tested our logging over and over.  The thought of failing a government audit was always on my mind as a system administrator, so I spent time testing everything listed in our master system security plan.  When I was done, I would test again just to make sure I didn’t miss something the first time.  Then I would ask someone else to double-check my work as a final test.  Fast forward many, many years, and whether it is a major system change or confirming that security tools actually work correctly, many infosec engineers just don’t bother testing.  Here are a few real-world examples of what I have experienced to hopefully drive my point home…

I was in a meeting once where someone suggested removing antivirus from a group of workstations to speed up their performance and cut down on help desk calls.  By “group of workstations” I’m talking well over 8,000 systems.  As “evidence” that it would be okay to remove the antivirus, a senior infosec engineer said he had run a report covering the past 2 months in one of our security appliances and there were no hits for viruses.  The room basically cheered, declared success, and was in agreement and ready to move on with the change.  Of course, I couldn’t sit quietly.  I’m sure a lot of you have already said out loud, while reading this, exactly what I said next.

I said and/or asked the following:

  • “It’s virtually impossible to have 0 hits on viruses in 2 months on that many computers.”

  • “There would be at least 1 false positive in there at some point.”

  • “Have we proven a virus would actually be caught and/or logged?”

  • “My first thought is something isn’t working correctly.”

To me, they were pretty “no-brainer” questions or statements any infosec engineer should be thinking...especially one in a senior role.  I was wrong, though, because the room looked at me as if I were a dream killer and basically wrote off my comments since they weren’t in their favor.

I’m thinking to myself:

  • “Before we make a major change such as removing antivirus, potentially putting the enterprise at risk, shouldn’t we all want to prove things are being logged correctly?”

  • “Let’s make sure a virus would actually be logged by downloading a few malware samples.”

  • “Let’s do what we can to ensure our metrics are correct.”

  • “If nothing else, let’s cover our butts in case we make the change and the network gets infected.”

I mean, who wouldn’t want to prove beyond a doubt that it was a wise decision?  Not testing the metrics, the very foundation of our decision, would just be foolish.  I was beyond dumbfounded, to be honest.  How could a room full of “infosec professionals” not know that the probability of not catching a single virus, not even a false positive, across more than 8,000 machines over a 2-month period is essentially zero?  You are probably more likely to win the Powerball twice before that is true.

Even if we did assume the metrics were true, the fact that everyone was ready to make a major system change without spending a few minutes to test and confirm the metrics totally threw me for a loop.  To make a long story short…I ended up being right.  The security appliance was not logging correctly.  I’m not sure who actually ended up listening to my suggestions and “dream killer” comments, but I’m glad someone did.
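For what it’s worth, you don’t even need live malware samples to prove an antivirus pipeline detects and logs correctly: the industry-standard EICAR test file is a harmless 68-byte string that every major AV vendor agrees to flag as if it were a virus. A minimal sketch (the output filename is just an example; on a workstation with working antivirus, merely writing this file should generate the detection event you are trying to verify in the logs):

```python
# Build the industry-standard EICAR antivirus test string.
# It is assembled from two halves here so this script itself is less
# likely to be quarantined before you intend to run it.
EICAR = (
    "X5O!P%@AP[4\\PZX54(P^)7CC)7}$"
    "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"
)

# The official test string is exactly 68 bytes of printable ASCII.
assert len(EICAR) == 68

# Writing it to disk should trigger a detection on any endpoint with
# functioning antivirus; the path below is purely illustrative.
with open("eicar_test.txt", "w") as f:
    f.write(EICAR)

print("EICAR test file written; now confirm the detection was logged.")
```

If the appliance report still shows zero hits after a few of these, you have proven the logging pipeline is broken, which is exactly the situation described above.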

Another typical practice I see is the failure to test before and/or after changes.  I’ve seen it many, many times.  A security engineer makes a big change in a security appliance and assumes it’s good to go, blindly assuming no mistakes were made.  The engineer didn’t test anything before making the change and didn’t test anything after.  Or if they did test, it was half-hearted and not thorough.  The results, as you can probably guess, are less than ideal.  I’ve seen everything from people spending countless hours troubleshooting something only to find out it was a change made by someone else (sometimes even someone on the same team who didn’t warn others about the change), to the entire network being brought to a screeching halt, resulting in lost revenue.  I don’t know if the lack of testing in these cases comes from laziness or arrogance.  Laziness and arrogance are two of the worst personality traits to have in infosec, because either one could lead to catastrophic results for everyone around.

Now this is on the lighter side of testing, and it’s something I think we have all fallen victim to: you have to test your search strings.  For this example, I will use something that happened to me...probably more than once, to be honest.  I jumped into a firewall, and my search strings were still there from the last time I had logged in.  I went about my normal duties of checking whether an IP and/or URL from a phishing attempt I was analyzing showed up in the firewall logs.  I probably worked through 8 different attempts before I realized I had forgotten to double-check my search string before starting.  I had already taken numerous screenshots verifying no one fell for the phishing attempts, but lo and behold, my search string had an “and” statement instead of an “or” statement.  I was looking for the suspected phishing IP in both the source and destination, so both had to match exactly.  You can stop laughing now.  You know you have done it too.  Lol.  I just assumed my search string was what I always use, because it looked the same.  I forgot I had actually been troubleshooting an issue the last time I was in the firewall, not analyzing phishing attempts.  It was a good reminder, and one I obviously needed.  You must check your search strings.

Tip: Always test to ensure you are confident you will get the desired results before analyzing incidents.  For example, if you are looking for an IP in the firewall, navigate to a known-good IP and then use your search string to find the hit in the logs.  If you are looking for a URL, navigate to a known-good URL and then find the hit in the logs.  Once you have everything set up and have verified/tested everything is as you expect, i.e. your search strings show known-good hits in the logs, continue with your analysis.  If you don’t, you may end up losing several hours of work or drawing false conclusions, which could be detrimental to your network.
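The “and” vs. “or” mistake above is easy to reproduce. In a toy log search (entirely made-up entries, and a simplified stand-in for whatever query language your firewall actually uses), requiring the suspect IP to match both the source AND destination silently filters out nearly everything, while OR returns the traffic you actually wanted:

```python
# Toy firewall log entries (purely illustrative data).
logs = [
    {"src": "10.0.0.5",    "dst": "203.0.113.9"},   # workstation -> phishing IP
    {"src": "203.0.113.9", "dst": "10.0.0.7"},      # phishing IP -> workstation
    {"src": "10.0.0.5",    "dst": "198.51.100.2"},  # unrelated traffic
]

suspect = "203.0.113.9"

# The broken search: src AND dst must BOTH equal the suspect IP.
# Real traffic almost never has the same address on both ends, so this
# quietly returns nothing and you conclude "no one fell for it."
and_hits = [e for e in logs if e["src"] == suspect and e["dst"] == suspect]

# The intended search: the suspect IP on EITHER end of the connection.
or_hits = [e for e in logs if e["src"] == suspect or e["dst"] == suspect]

print(len(and_hits))  # 0 -- looks clean, but only because the query is wrong
print(len(or_hits))   # 2 -- the traffic was there all along
```

The dangerous part is that the broken query doesn’t error out; it just returns an empty, reassuring-looking result, which is exactly why testing against a known-good hit first matters.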

I’ve also seen, more times than I can count or care to witness, people create alerts, configure rules, etc. in various security appliances or software and never go back later to ensure the alerts or rules are still working as intended.  Things change within the environment: updates are pushed to devices or software, people create new rules within security devices, and many other things can happen such that a rule or alert created several months ago is either not functioning correctly or is now preempted by something else.  It amazes me when infosec engineers, or others in IT, never give a second thought to an alert they haven’t seen trigger in a while.  For example, let’s say your infosec team normally sees 4-5 failed-login alerts per week.  If a week goes by without a single failed-login alert, everyone on the team should be testing to see why that alert hasn’t fired.  However, more times than not, I’ve seen numerous weeks go by without a single failed-login alert before maybe, maybe someone brings it up.  You can’t help but scratch your head and wonder how this could happen with so many smart people around.

I could list many other scenarios, but I think these examples illustrate my recommendation and my concern.  Testing should always be a priority, especially in IT, and even more so in infosec, where a slight change could cripple the network or false metrics could be disastrous.  One serious consequence of a lack of testing is other departments, especially senior management, becoming leery of “security” making changes.  The rest of the enterprise will fight every change the infosec team proposes because of the numerous mistakes made in the past.  To prevent this from happening, test several different aspects of your change both before and after implementing it.  If possible, enlist some “test subjects” from other groups/teams to test with you.  Do your best to ensure any change you make won’t negatively impact your network.  On the same note, remember to always test your metrics and metrics-collection procedures to ensure the information being provided to everyone is correct and as accurate as possible.