A question I was asked today in a very serious tone…
Even though the company endorses open source software, it still has a few applications that were initially developed using Microsoft technologies. While neither of the teams (open source, .net) is very vocal about its concerns, fear and hatred towards the opposite solution, the communication between these teams is not very active. Mistrust has easily taken the throne in our company.
Trying to get back on topic: today the .net team encountered (or should we rather say missed) a bug that caused their application to go offline. For a company that provides SaaS, that means a handful of websites were down. As I have never worked with .net I was very eager to see how they would handle this problem, so I took 2 hours out of my time to sit in as a spectator.
While some people would consider it rude of me to sit there and watch their activity while the frustration level rose higher and higher, I tried to keep as silent as possible so they could concentrate on cooperatively sharing information about the many different symptoms of the bug.
At first I was hyped to participate, but later that hype turned into amazement and cringing.
While I do not dare to criticize their programming skill (I have never actually discussed programming with them or seen their code), their deployment and analysis procedures scared me.
First of all, they didn’t use any VCS and would deploy modifications by actually copying the files remotely via FTP.
Oh, should I mention that they used a couple of Remote Desktop sessions to find their way around the different places the data is scattered? It is one thing to use FTP for copying, and another to not have your scripts and utilities organized.
While monitoring errors in the event log they didn’t seem to do more than open the error notification, notice that it was exactly the same as the other 9000-plus errors already caught, and close it again.
This was my first attempt to lend a helping hand (or so I thought), so I asked one of the team members who monitored the error log if he could show me the error, “just out of curiosity”. He gladly did so, so I asked if he could show me more samples. I was actually looking for variance; who knows, maybe they had missed it while opening and closing the notifications. He promptly closed it, saying “they are all just the same, what would you expect…” So I asked one last question: “Uhm, could you show me the first error notification logged?” He obliged, but mistakenly opened the first error notification in the event viewer, not the first one caused by the application. When I asked again, more explicitly and justifying my request (“Maybe more useful information can be found in the first error log, and not in the same duplicate message”), he basically uhmed a “you’re right” and did nothing.
Wow, what was that all about?
A little later in their debugging quest, after enabling and disabling different webservers many times (which from my perspective looked totally random, as they didn’t explain their reasons to each other, and many were as stumped as I was) and some debugging of the database, they finally came up with a theory that one of the stored procedures might be the root of the problem.
What did they do? Nothing, absolutely nothing other than continuing to speculate on that premise.
Out of the six people (excluding me) in that room, no one took the time to thoroughly analyze the stored procedure and maybe pin down the problem (assuming the bug was actually to be found there).
With ideas running out, and the tension boiling the room to over 100 degrees Celsius, I took a step forward and asked:
Did you analyze the access logs?
What? … Why?
Maybe it could give you an approximate location of the bug, and that could be a good starting point.
And how do you want to do that?
That was the first time I actually felt like I’d hit a wall. Nowadays, after the years of web development I’ve done in PHP, I take the Apache access/error logs for granted, never imagining that people might not be able to interpret the access log of their webserver. Many things can be done with the information that lies therein, some of which I’m going to point out later in this article.
Seeing my brief pause, she continued (yes she, OMG a girl programmer)
“Well, you can look over it”, while opening the server’s access log.
Of course it looked like gibberish to me in those few seconds, but that doesn’t mean it can’t be parsed. It has, after all, a structured format.
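To make the “structured format” point a bit more concrete, here is a minimal sketch in Python. It assumes the server writes the W3C extended log format (the default for IIS), where a `#Fields:` header line names the columns; I never checked which format their server actually used. The function names and the sample lines are made up for illustration.

```python
from collections import Counter


def parse_w3c_log(lines):
    """Yield one dict per request line of a W3C extended format log.

    Assumption: a '#Fields:' directive names the space-separated columns.
    """
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # Column names for all request lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            # Other directives (#Software, #Date, ...) carry no requests.
            continue
        values = line.split()
        if len(values) == len(fields):
            yield dict(zip(fields, values))


def server_errors_by_uri(lines):
    """Count responses with a 5xx status, grouped by requested URI."""
    counts = Counter()
    for entry in parse_w3c_log(lines):
        if entry.get("sc-status", "").startswith("5"):
            counts[entry.get("cs-uri-stem", "?")] += 1
    return counts


# Made-up sample lines, only to show the shape of the input.
sample = [
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2010-01-01 10:00:00 GET /index.aspx 200",
    "2010-01-01 10:00:01 GET /report.aspx 500",
    "2010-01-01 10:00:02 POST /report.aspx 500",
    "2010-01-01 10:00:03 GET /login.aspx 500",
]

for uri, count in server_errors_by_uri(sample).most_common():
    print(count, uri)
```

Even a crude count like this tells you which pages started failing, and with the `date` and `time` columns it would also tell you when, which is exactly the “approximate location” I was after.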
I asked in a semi tonal voice:
“Do you have cygwin installed?” And, as luck would have it (it wasn’t my lucky day, by the way), she asked
What is cygwin?
Before I could babble on about what, why and how I came to ask about cygwin, another developer in the room asked me
What do you want with the access logs?
I tried to explain my intent again, to which he argued that they already knew the POTENTIAL problem. I froze, and uncertainty hit me…
Do they actually KNOW what I’m talking about?!…
As soon as everyone else noticed my total silence, they switched their mental context back and carried on doing the same thing they had been doing for the last hour.
Apparently, “don’t talk if not asked” applies to many more real life situations than giving advice to your best friend after his girlfriend cheated on him.
I am in no way certified, or knowledgeable enough, to give advice on this issue. But is it bad to try and lend a helping hand? Apparently so; so for the remainder of my day I went back to my desk, browsed reddit and mentally pictured myself saying “Fuck that shit!”
As for the question “What would you do with the access logs?”: I would simply run this script over them for a quick lookup, and maybe further inspection, if nothing else. Remember, out of the six people, someone should have been able to do this.