Monday, February 18, 2013

Post #21 Configuration Management Nightmare

So I galloped through a greatly productive weekend and by Sunday evening was ready for another deployment, till I ran into a nightmare called poor configuration management.

I've always known its importance. So many releases and deployments have been spoiled by this innocuous-looking thing, and its importance grows with team size. In a distributed environment, teams use practices like continuous integration and weekly or daily merges to avoid late surprises. And yet it sneaks in from time to time and ruins an otherwise perfectly successful release.

I understand this issue and, though I'm the only one working on this project, I'd decided to practice it.

Till Nov-12, I was doing only development, so I had just one environment. I made builds several times a day and was fine with it. In November, I went live on AWS. Some environment variables were different in production, so I created 2 projects: one that mirrored production and one that I used for development. I'd typically remember the files I needed to promote, and I'd promote them once I reached a logical milestone. I did not push these changes to production immediately, though; I would do that about twice a month. Also, I typically upload my statements once a month, which meant that a large part of the functionality went unused for most of the month. This approach worked well till Jan, but when I tried uploading the Jan statements, I had my first brush with config mgt.

I typically store the uploaded statements in a temp folder which I purge from time to time. The location of the folder is different on the production env. Also, it's hardcoded in one of the Java files (a decidedly poor practice). In Jan I was working on some functionality and updated this file. I later promoted this file to production. This obviously overwrote the path to the temp folder. As I did not use the functionality till Feb, I did not run into any problem until then. When I did, it took me a while to figure out what was wrong. Why were my statements not processed?
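In hindsight, one way to avoid this kind of overwrite is to keep the path out of the source file altogether and read it from the environment at runtime. Below is a minimal sketch of that idea, not the code from my project: the class name `TempFolderConfig`, the property name `statements.temp.dir` and the environment variable `STATEMENTS_TEMP_DIR` are all made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/**
 * Sketch: resolve the temp folder from the environment instead of
 * hardcoding it, so the same file can be promoted to any environment.
 * All names here are hypothetical, chosen only for this example.
 */
public final class TempFolderConfig {
    // Hypothetical JVM system property, e.g. -Dstatements.temp.dir=/some/path
    public static final String PROPERTY_NAME = "statements.temp.dir";
    // Hypothetical environment variable, checked if the property is not set
    public static final String ENV_NAME = "STATEMENTS_TEMP_DIR";

    private TempFolderConfig() {}

    /** Resolves the folder using the real JVM properties and process environment. */
    public static Path resolve() {
        return resolve(System.getProperty(PROPERTY_NAME), System.getenv());
    }

    /**
     * Core lookup, kept separate so it can be exercised without touching
     * the real environment. Order: property, then env variable, then a
     * "statements" folder under the JVM's default temp directory.
     */
    public static Path resolve(String propertyValue, Map<String, String> env) {
        if (propertyValue != null && !propertyValue.trim().isEmpty()) {
            return Paths.get(propertyValue.trim());
        }
        String fromEnv = env.get(ENV_NAME);
        if (fromEnv != null && !fromEnv.trim().isEmpty()) {
            return Paths.get(fromEnv.trim());
        }
        return Paths.get(System.getProperty("java.io.tmpdir"), "statements");
    }
}
```

With something like this, the upload code would call `TempFolderConfig.resolve()` and each environment would set its own value, so promoting a Java file no longer carries a path along with it.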

And so I learnt a lesson and, after some research, decided to go with VisualSVN for the repository. I got an Eclipse client to connect to it and set up both production and local as 2 projects.

That helps me identify which files to promote, and I no longer need to remember file names as before.
But there is a step in between: comparing the new version with the HEAD version to make sure the changes are fine and do not break anything.

Yesterday, I missed that step and paid the price. The error happened right when you keyed in DomainName.com. It would show an ugly 500 message with the entire stack trace. Yes, that's ugly. Also, the stack trace was not much use.

I had made some changes to a file to manage guest users and I thought that might be the root cause. But after several attempts to correct the problem, it still did not go away. This brings me to the effectiveness of logging.

I have always been proactive about logging. I have avoided writing elaborate try/catch blocks and have used an AOP-based class to intercept and log the errors. This does a decent job of managing errors on the business tier. But if the error is in the framework itself, the whole thing falls apart, especially with Spring errors.
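To illustrate the interception idea, here is a minimal sketch that uses a plain JDK dynamic proxy rather than the Spring AOP class I actually use. The `StatementService` interface and the `wrap` helper are hypothetical names invented for this example. It also shows the limitation: only calls that go through the proxy get logged, so an error raised while the framework itself is starting up never reaches it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Sketch: log every exception thrown by a business-tier object in one
 * place, instead of writing try/catch in each method. Uses a JDK dynamic
 * proxy as a stand-in for an AOP framework.
 */
public final class ErrorLoggingProxy {
    private static final Logger LOG = Logger.getLogger(ErrorLoggingProxy.class.getName());

    private ErrorLoggingProxy() {}

    /** Hypothetical business-tier interface, used only for illustration. */
    public interface StatementService {
        int process(String fileName);
    }

    /** Wraps target so that any exception it throws is logged, then rethrown. */
    public static <T> T wrap(final T target, Class<T> iface) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    // Reflection wraps the real exception; unwrap it before logging.
                    Throwable cause = e.getCause();
                    LOG.log(Level.SEVERE, "Error in " + method.getName(), cause);
                    throw cause;
                }
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }
}
```

A caller would wrap the real service once, for example `ErrorLoggingProxy.wrap(realService, ErrorLoggingProxy.StatementService.class)`, and use the returned object everywhere. The exception is rethrown after logging, so callers still see the failure.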

Anyhow, after much trouble, I found out that I had unwittingly promoted changes meant for mobile devices to production, and hence the error.

Net net, 2 lessons:
- Get config mgt right.
- Improve error logging.
