Monday, February 1, 2010

Graduation and the sales kick-off meeting

I didn't have the time to brag about it last week, but last Monday I managed to defend my thesis and graduate, finally becoming a certified software engineer. After spending, well, quite a bit more time in the programme than originally planned, and about 5 years working in IT, it was high time it happened.

The sales kick-off conference we held at the end of the week was a great opportunity to celebrate. I gave a presentation there about Shell Control Box and held a workshop the next day together with Marton, but the bigger part of the job was to talk with the guests, answer their questions, find out their opinions about our products -- and of course, to make sure they had a good time during their whole stay. I hope we did well on the business side (the sales figures of the year will give good feedback about that), but I'm pretty confident we excelled at the having-a-good-time part: it brought a huge smile to my face when at some point during the party we held on Friday evening, a guy from one of our partners from the other end of the world just shouted into my ear while dancing: "BalaBit rocks!" :)

Big up and thanks to everyone involved in the organization of the event -- you did a great job.

Tuesday, January 19, 2010

VersionOne with Bugzilla quips

Bugzilla has a rarely used feature called quips. It simply displays a random quote at the top of each page, taken from a database to which clever new sentences can be added incredibly easily. Here at BalaBit we love this feature: the database contains ~400 entries from the last 5-6 years and serves as the collective memory of the company. It preserves notable adages from colleagues who often no longer work here and are unknown to newcomers, said during the development of ancient pre-alpha versions of our now mature products. Some of the phrases from the more popular quips have become part of the everyday language we speak here; hell, the quips are even mentioned under the "Company culture" section of the checklist HR uses to get newcomers started on their first day.

Well, the problem is that over the last year we've slowly become oh-so-agile and moved our day-to-day work tracking from Bugzilla to VersionOne. We're still using Bugzilla for its original goal (that is, tracking bugs), but most of the developers now only check it once or twice a week, while they're supposed to log their work in VersionOne and check for new tasks there every day. So the time-honored tradition of Bugzilla quips slowly started to fade, and of course, we couldn't let that happen!

VersionOne is a proprietary application, so we couldn't just modify it to add the quips, even though hacking some ASP.NET would've been a fun exercise. It does support user-made plugins, but the plugin API seemed too complicated for this simple task, and it does not seem to support changing the core interface, only adding new pages -- but to bring back the original Quip Experience we needed a small field added at the top of the listings we normally use. For a while, we experimented with the idea of using a stored XSS payload to change the interface (strictly theoretically, of course, but just let me mention it for the record: it worked flawlessly), but that would've raised some eyebrows, so we had to figure out something else.

And here GreaseMonkey comes into the picture, as it was made for exactly that: modifying the interface of a website on the client side. It took only a couple of minutes to hack together a user script that adds the field and populates it from the original Bugzilla database through AJAX. Only one addition was needed: Bugzilla did not have an API to query its quip database, so we had to do some easy copy-paste development to add a CGI that, when loaded by the script, simply displays a random quote. And now we can read the classic once again: "Oh, that's not a leak. It's just caching."
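To give an idea of how little the server side has to do, here is a minimal sketch in Python. It is not the actual CGI (that one was put together from Bugzilla's own code): the function name `quip_response` is made up for illustration, and the quips are passed in as a plain list instead of being read from Bugzilla's database.

```python
import random


def quip_response(quips, rng=random):
    """Build a plain-text CGI response containing one randomly chosen quip.

    `quips` is a list of strings; in a real deployment it would come from
    Bugzilla's quip database (assumption: loading it is done elsewhere).
    """
    body = rng.choice(quips) if quips else ""
    # A CGI response is the header lines, an empty line, then the body.
    return "Content-Type: text/plain; charset=utf-8\r\n\r\n" + body + "\n"
```

A CGI script would simply write the returned string to standard output, and the user script would then place the body it receives into the field it added to the page.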

In the unlikely case you'll ever need it, here's the GreaseMonkey script (or here's a bit more obtrusive, but more visible version) and the small CGI needed. Of course you'll have to tailor them to include your Bugzilla and VersionOne URLs, but otherwise, they're ready to use. (Credit goes to Péter Györkő and Gergely Czilly for the GreaseMonkey scripts.)

Monday, January 11, 2010

Introducing pdbtool patternize

As Márton has already written, lots of guys here spent a good part of their summer creating a pattern database for some 200+ often-used applications. Like every manual process, this was a tedious task that begged to be automated. Of course it cannot be fully automated, as no algorithm can replace an actual person understanding the structure of the logs a piece of software produces (or even looking into the source code to see how they're generated), but still, a tool that can detect similar messages in a log database and generate a pattern database from them would be really handy.

Thus pdbtool patternize was created. The tool I've written is part of the pdbtool utility and can be used to generate a pattern database from a bunch of unknown messages. It uses the algorithm developed by Risto Vaarandi for SLCT, the main idea of which is to use a data clustering technique to find similar log messages and replace the differing parts with wildcard characters. In our case, the wildcards are @ESTRING:: @ parsers; otherwise, the solution is pretty much the same.
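The clustering idea can be illustrated with a few lines of Python. This is only a simplified sketch of an SLCT-style pass, not the code in pdbtool: the name `find_patterns` is made up, a plain `*` stands in for the @ESTRING:: @ parsers, and messages with a different number of words never end up in the same pattern here. A word is kept if it occurs at the same position in a large enough share of the lines (the support, given as a percentage); every other word becomes a wildcard.

```python
from collections import Counter

WILDCARD = "*"  # stand-in only; pdbtool patternize emits @ESTRING:: @ parsers


def find_patterns(lines, support_percent):
    """Return {pattern: number of matching lines} for one clustering pass."""
    messages = [line.split(" ") for line in lines]
    threshold = len(messages) * support_percent / 100.0

    # Pass 1: count how often each word occurs at each position.
    word_counts = Counter(
        (pos, word) for words in messages for pos, word in enumerate(words)
    )

    # Pass 2: build one candidate per line, keeping the frequent words
    # and replacing everything else with a wildcard.
    candidates = Counter()
    for words in messages:
        candidate = tuple(
            word if word_counts[(pos, word)] >= threshold else WILDCARD
            for pos, word in enumerate(words)
        )
        candidates[candidate] += 1

    # Only candidates that reach the support threshold become patterns.
    return {
        " ".join(candidate): count
        for candidate, count in candidates.items()
        if count >= threshold
    }
```

With three "login ok user <name>" lines and one unrelated line, a support of 50% keeps only the login pattern, with the user name replaced by the wildcard.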

It's far from being perfect (the code could be optimized in some places, it needs to load everything into memory so the size of the parseable log file is limited by the RAM in your machine as swapping slows things down really badly, and the log messages are split into words only at spaces, which makes it unable to detect "username=@QSTRING::'@"-type patterns), but it has already produced some impressive results on the test databases I've tried it with. It managed to categorize 95-98% of ~2M lines of logs into 40-50 patterns in a reasonable time and the patterns themselves were pretty readable as well.

There are two options that can be set for the patternization. The first one ("-S") is the support value and accepts a floating point number: the percentage of the lines that have to match a pattern candidate for it to be included in the resulting pattern database. It allows tuning the trade-off between the number of patterns (too many would be hard to maintain) and the coverage these patterns produce. My first tests show that the optimal support value for most log types is around 2.5-5%, but you should check this with your own logs.

The other option ("-o") enables a different operation mode. In this case, after a clustering step is completed, the tool does not exit and print out the generated patterns, but rather starts yet another clustering on the messages that are not covered by the patterns generated so far. It keeps on doing this as long as new patterns can be generated. This way, much better coverage can be achieved while still having a low number of patterns -- the larger groups are detected early, but the small groups aren't left out either. Based on our tests, a somewhat larger support value, around 10-15%, is optimal for this operation mode.
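The iterative mode can be sketched in Python as a loop around a single clustering pass. Again, this is an illustration under assumptions, not pdbtool's implementation: the function names are made up, `*` stands in for the real parsers, the support is assumed to be recomputed from the lines still uncovered in each round, and the `min_lines` floor is my own addition (not a pdbtool option) that keeps a tiny leftover from turning into one pattern per line.

```python
from collections import Counter

WILDCARD = "*"  # stand-in for the @ESTRING:: @ parsers


def cluster_once(lines, support_percent, min_lines=2):
    """One clustering pass: returns (patterns, lines no pattern covers)."""
    # min_lines is a hypothetical floor, not something pdbtool offers.
    threshold = max(min_lines, len(lines) * support_percent / 100.0)
    counts = Counter(
        (pos, word) for line in lines for pos, word in enumerate(line.split(" "))
    )

    def candidate(line):
        # Keep frequent words, replace the rest with wildcards.
        return " ".join(
            word if counts[(pos, word)] >= threshold else WILDCARD
            for pos, word in enumerate(line.split(" "))
        )

    sizes = Counter(candidate(line) for line in lines)
    patterns = {p: n for p, n in sizes.items() if n >= threshold}
    outliers = [line for line in lines if candidate(line) not in patterns]
    return patterns, outliers


def cluster_iteratively(lines, support_percent):
    """Re-cluster the uncovered lines until no new pattern is found."""
    patterns = Counter()
    remaining = list(lines)
    while remaining:
        found, outliers = cluster_once(remaining, support_percent)
        if not found:
            break
        patterns.update(found)  # adds the match counts of this round
        remaining = outliers
    return dict(patterns), remaining
```

On a log with six login lines, three disk lines and one oddball, a single pass at 40% support finds only the login pattern, while the iterative version also picks up the disk pattern in the second round and leaves just the oddball uncovered.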

The code is available in my public syslog-ng repository and I'm more than eager for some feedback. It still needs a larger review from someone more experienced in syslog-ng internals and leaks a little memory here and there, but overall it's mature enough to play around with, even with large log databases.

Friday, December 4, 2009

Santa's testing SSB 1.1 this year

I guess we've been really good kids in 2009, as Santa joined us in release testing of the next syslog-ng Store Box release:


Saturday, November 21, 2009

Watch things grow

Ever since I was a little child, I've always loved watching progress. I could spend hours at the shore of Lake Balaton watching the water drops on my flip-flops dry up in the sunshine. I loved the now-ridiculous loading times of the games on my C64, as that meant I could watch the bits slowly moving on the screen. One of my first computer programs was an algorithm looking for perfect numbers, which, of course, provided detailed feedback about what it was doing at the moment. And I was crazy about the first DC++ and BitTorrent clients not because they brought me music and movies I couldn't get otherwise but because they had -- oh, the beauty -- 10+ fancy progress bars on screen, all showing me The Progress.

Fast forward ten years, and here I am, writing my Master's thesis. Like every proper nerd, I'm writing it using LaTeX (OK, I'm old and lazy, so I'm actually using LyX), and like every proper nerd (and also like award-winning science-fiction author Cory Doctorow) I'm version controlling it using git. From the very beginning I knew I wanted to watch it grow. But how could I do that? Enter git, ImageMagick and some shell scripting.

My document is version controlled, so reproducing a step during its creation is a matter of a single git checkout; that's easy. Fortunately LyX can be used from the command line as well, so a lyx -pdf thesis.lyx generates the PDF from the version at the current commit. ImageMagick is our next tool after that: a convert thesis.pdf thesis.png produces PNG files for all pages in the PDF. ImageMagick has another handy tool called montage: it grabs the given bunch of images and arranges them on a single image. It's all pretty straightforward after that: get the list of all revisions with git rev-list, iterate through them, create the picture of the document at each point and, when you're finished, combine the pictures into a single animated GIF. And then you can watch progress:
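The steps above can be sketched in Python as well. This is not the shell script mentioned in the post, only an outline under assumptions: the function names, the page and frame file names, and the montage and GIF settings are all made up for illustration. It calls LyX with its documented export option (`-e pdf`) and uses a numbered output name (`page-%03d.png`) so that the pages sort correctly.

```python
import subprocess


def frame_commands(rev, index):
    """Shell commands that render revision `rev` as frame number `index`."""
    return [
        f"git checkout --quiet {rev}",
        # Remove leftovers from the previous revision before rendering.
        "rm -f thesis.pdf page-*.png",
        "lyx -e pdf thesis.lyx",
        # One PNG per page: page-000.png, page-001.png, ...
        "convert thesis.pdf page-%03d.png",
        # Tile all pages onto a single image, ten pages per row.
        f"montage page-*.png -tile 10x -geometry +2+2 frame-{index:05d}.png",
    ]


def build_animation():
    """Render every revision, oldest first, and join the frames into a GIF."""
    revs = subprocess.run(
        "git rev-list --reverse HEAD",
        shell=True, check=True, capture_output=True, text=True,
    ).stdout.split()
    for index, rev in enumerate(revs):
        for command in frame_commands(rev, index):
            subprocess.run(command, shell=True, check=True)
    subprocess.run(
        "convert -delay 20 -loop 0 frame-*.png thesis.gif",
        shell=True, check=True,
    )
```

Note that the repository is left at the last checked-out revision afterwards, so it's safer to run something like this on a scratch clone than on the working copy of the thesis.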



I think I've already spent more time watching this animation than I've spent proof-reading the thing, and it's still three weeks till the deadline.

You can grab the shell script here -- it has ugly hacks and the filenames and paths are hardcoded into it, but it could be used as a starting point if you want to do something similar.

Sunday, October 4, 2009

Weird people have lived everywhere

Even though for the last two years I've been living in the historical Castle District of Budapest, which is more famous for its tourist sights and for being the birthplace or residence of numerous world-renowned artists, politicians and scientists than for its high-tech-loving population, I can't seem to avoid the unmistakable signs of nerds having lived in the same flat before me. Last year it was CSS on the bottom of the dining table, now it's a dryer made out of Cat5e cables:


I wonder what's coming next.

(By the way, the Twitter destination driver is getting some affection: I've received the first contributed patch, cleaning up some build-related issues; Bazsi, the main author of syslog-ng, called it the "Neatest syslog-ng hack ever" -- for which I feel deeply honored and proud -- and I've heard of at least one guy outside the company who has actually tried it. I'm aiming for market dominance on the platform of Twitter clients, so if you haven't already given it a shot, please do so :))

Saturday, September 19, 2009

ATMs do indeed run Windows...

Attila wrote the following a couple of weeks ago in his blog post "Luigi and Igor in the bank business":

Most ATMs run actually on Windows operational system, which, I think, is already very funny. Windows and the easily manageable systems are not visible to the users, and there are other OSs on the market that are more reliable, stable, and secure. I have never understood why clinical devices, ATMs and spaceships have to use Windows… I do not really feel safe :-)


Well, I did not find it all that funny when I saw this on the screen of an ATM of my bank some days ago:



If we need a (badly configured) built-in Windows firewall to protect an ATM, we're doomed.