Saturday, November 7, 2009

A Sendmail replacement in Python

I know I blog about odd things, but those odd things just happen to me! Here's the problem: we need a sendmail-like command that sends email through an external SMTP server. Why? Well, it's a long story that boils down to the fact that I just don't want to install Sendmail on my VoIP server ;).

So instead of doing what everyone would do (apt-get install sendmail), I wrote a simple Python script that works somewhat like Sendmail, except that it just forwards everything to another server. I *could* make it work without an external server, but that would require looking up the destination domain's MX records in DNS... and I'm lazy. Plus, that would need an external Python module for the DNS resolution, and I wanted my script to be pure Python with no dependencies beyond the standard library. Anyway, if anyone, ever, needs a script that does the DNS part, ask and I'll do it *sigh*.

So here comes the script. It accepts sendmail's -t option, in which case the recipients are taken from the message's "To" header; otherwise, the recipients have to be given on the command line.

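#!/usr/bin/python

import email
import smtplib
import sys
from email.utils import getaddresses, parseaddr

SMTP_SERVER = "YourMailServer.com"
SMTP_USER = "YourUser"
SMTP_PASS = "Ssh-Sekrat!"

def sendmail(from_addr, to_addrs, msg):
    # Relay the message, untouched, through the external SMTP server
    smtp = smtplib.SMTP(SMTP_SERVER)
    try:
        smtp.login(SMTP_USER, SMTP_PASS)
        smtp.sendmail(from_addr, to_addrs, msg)
    finally:
        smtp.quit()

def main():
    # Like sendmail, read the whole message (headers + body) from stdin
    msgstr = sys.stdin.read()
    msg = email.message_from_string(msgstr)

    args = sys.argv[1:]
    if "-t" in args:
        # -t: recipients come from the To header(s), which may hold
        # several comma-separated addresses
        to_addrs = [addr for name, addr in getaddresses(msg.get_all("To", []))
                    if addr]
    else:
        # Otherwise every non-option argument is a recipient; other
        # sendmail flags (such as -i) are ignored
        to_addrs = [arg for arg in args if not arg.startswith("-")]

    if not to_addrs:
        sys.exit("sendmail.py: no recipients given")

    # Envelope sender: just the address part of e.g. "Name <me@example.com>"
    from_addr = parseaddr(msg.get("From", ""))[1]
    sendmail(from_addr, to_addrs, msgstr)

if __name__ == "__main__":
    main()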


That's it! Save it as sendmail.py, make it executable, and test it:

./sendmail.py someone@somewhere.com
From: your@friend.com
Subject: Hello!

Hello! How's life?
<CTRL+D>
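A detail worth knowing about -t: a To header can list several comma-separated addresses (often with display names), while smtplib's sendmail() treats a plain string as a single recipient. The sketch below shows one way to pull bare addresses out of the headers using only the standard library's email.utils.getaddresses. The helper name recipients_from_headers and its optional headers argument (for including Cc, say) are hypothetical, not part of the script above.

```python
from email import message_from_string
from email.utils import getaddresses


def recipients_from_headers(msgstr, headers=("To",)):
    # Hypothetical helper: return the bare addresses found in the given
    # headers of a raw message; pass headers=("To", "Cc") to include Cc too
    msg = message_from_string(msgstr)
    values = []
    for name in headers:
        # get_all() returns every occurrence of the header, or [] if absent
        values.extend(msg.get_all(name, []))
    # getaddresses() splits "a@x.com, Bob <b@y.org>" into (name, address) pairs
    return [addr for _name, addr in getaddresses(values) if addr]
```

With -t, the list returned by recipients_from_headers(msgstr) can be handed straight to sendmail() as the recipients.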

I hope you liked it, happy hacking!

Sunday, October 18, 2009

The Holy Grail of Binary Data in PostgreSQL and Python

I'm not sure if you've ever had this problem. I've had it lots of times, and I suppose I'm not alone. Say you want to store binary data in PostgreSQL. For those who use MySQL, I've heard it's a very simple task; however, I don't care, since I never use MySQL :). Doing it from Python into a PostgreSQL database, on the other hand, can be quite tricky. What I used to do was Base64-encode everything and store that in the database; it worked, but it was slow and bloated (Base64 makes the data about a third larger). There's another way: the "bytea" data type. The problem with bytea is that PostgreSQL wants the data escaped, and it is quite unclear how to do it. Today I finally figured it out, and I was so happy I decided to blog about it! It's actually simple once you know the trick. First, create a table with a bytea column (where your bytes will live), for example:

CREATE TABLE IMAGES (ID SERIAL, NAME VARCHAR, DATA BYTEA);

This is the easy part, and I think we've all gotten this far. Now for the Python side. The secret is to escape things properly. To insert binary data into a PostgreSQL database, follow this scheme: E' + the data as octal escapes + '::bytea. For example:

INSERT INTO IMAGES VALUES(default, 'PARIS.JPG', E'\\001\\002\\003\\031\\313'::bytea);

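Notice the doubled backslash: the data is unescaped twice. First the E'...' string literal turns each \\ into a single \, and then bytea's input routine reads \001 as one byte written in octal. Easy? Yeah! But hard to find out. Now it's just a matter of writing a simple Python function to do the escaping: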

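def octize(data):
    # Turn raw bytes into a PostgreSQL E'...'::bytea literal: every byte
    # becomes a doubled backslash plus its three-digit octal value
    # (bytearray yields integers for both Python 2 str and Python 3 bytes)
    escaped = "".join("\\\\%03o" % byte for byte in bytearray(data))
    return "E'" + escaped + "'::bytea"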
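To convince yourself that the doubled backslash is right, it helps to undo the two levels of escaping by hand. This is a simplified model of the server side, not PostgreSQL's actual code: it only understands plain characters and the backslash-plus-three-octal-digits form shown above, and the function name decode_bytea_literal is hypothetical.

```python
def decode_bytea_literal(literal):
    # Hypothetical, simplified model: only plain characters and octal escapes
    prefix, suffix = "E'", "'::bytea"
    if not (literal.startswith(prefix) and literal.endswith(suffix)):
        raise ValueError("not an E'...'::bytea literal")
    body = literal[len(prefix):-len(suffix)]

    # Level 1, the SQL string parser: in E'...' a doubled backslash becomes one
    text = body.replace("\\\\", "\\")

    # Level 2, the bytea input routine: backslash + three octal digits is a byte
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)
```

Feeding it the literal from the INSERT example above gives back the five bytes 01 02 03 19 cb (in hex).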


And now we could do something like:

import psycopg

conn = psycopg.connect()
cur = conn.cursor()
# image_data holds the raw bytes of the image to store
cur.execute("INSERT INTO IMAGES VALUES(default, 'PARIS.JPG', %s)"
            % octize(image_data))
cur.close()
conn.commit()
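A side note for newer servers: PostgreSQL 9.0 (released after this post) added a hex input format for bytea, which takes two characters per byte in the SQL text instead of five. Here is a minimal sketch in the same style as octize, assuming PostgreSQL 9.0 or later; the name hexize is hypothetical. If your driver can pass binary data as a query parameter (psycopg2, for instance, has psycopg2.Binary), letting it do the escaping is usually the safer option.

```python
import binascii


def hexize(data):
    # Hypothetical helper, assumes PostgreSQL 9.0+ (hex bytea input format).
    # The E'' literal turns \\x into \x, and bytea then reads the hex digits.
    hexdigits = binascii.hexlify(bytearray(data)).decode("ascii")
    return "E'\\\\x" + hexdigits + "'::bytea"
```

The result drops into the same INSERT statement in place of octize(image_data).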

Hope you liked it! Happy hacking!

Monday, December 22, 2008

Website statistics: Roll your own real-time Google Analytics in 5 minutes!

Hello my fellow hackers! I want to talk about something we all freak out about at least once in life: website statistics.

Website statistics have been a headache for webmasters since the Internet became the Inter-net. It is fundamental to know how many people visit your site every day, what they do there, where they come from, what OS and browser they use, etc. This matters for a lot of reasons: optimizing your site to attract traffic, optimizing bandwidth and (in my case) just watching endlessly, frenetically and insanely as the visit counter crawls up ;).

In the early days of the Internet these solutions weren't so easily available, but in this ever-changing, over-connected and wonderful world there is a huge number of options to choose from. The most popular and widely used (IMO) is Google Analytics.

So why bother making our own, then? I won't say I don't like Google Analytics, because I really do; I think it's a wonderful solution and a must-have tool for any webmaster. However, there is one thing I personally don't like: it is not real-time. Although this is perfectly justified (they must index and analyze millions of sites, and real-timeness is hard to achieve at that scale), I still wanna see my lil' counter crawl up!!!

The question is, do YOU want to see your lil' counter crawl? From a practical point of view, probably not, since having the information available by the next day is usually enough to do all your work. But I think you actually do. Why? Well, I guess we techies are just like that :) So I decided to make my own lil' analytics and set it up on my blog today. Let's get to it.

First, let's look at how statistics work and how we get them. As most of you already know, in ancient times only log statistics were available: you'd walk through your HTTP server's log and build statistics from it. That was great because it was simple, totally passive (no changes to the site needed) and real-time (a tail -f access.log would show it all). The con: some things you may find interesting about your users, such as their screen resolution, were not logged by the server and hence were left out of your stats.
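The tail -f trick is also the simplest way to get a live counter. Here is a small Python sketch of the same idea: follow() keeps reading lines as they are appended to an already-open log file, and running_count() yields the updated total each time a line contains a given string. Both names are hypothetical, and follow() deliberately ignores log rotation.

```python
import time


def follow(f, poll_interval=1.0):
    # Yield lines appended to the open file f, roughly like `tail -f`.
    # Sketch only: no log-rotation handling, and a line that is still
    # being written may come out in two pieces.
    while True:
        line = f.readline()
        if line:
            yield line
        else:
            time.sleep(poll_interval)  # nothing new yet: wait, then retry


def running_count(lines, needle):
    # Yield the running number of lines that contain needle
    hits = 0
    for line in lines:
        if needle in line:
            hits += 1
            yield hits
```

For example, open /var/log/apache2/access.log, call seek(0, 2) on it to start at the end as tail -f does, and print each value of running_count(follow(log), "GET ") to watch the hits crawl up.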

Now, how does Google Analytics work? You embed a little piece of Javascript code in every page of your site you want to analyze, and you're set. Then all you need to do is log into your Google account, and your stats are there. The con: as I said, they're usually updated only after 24 or 48h. And how do they do it? Basically, this little script makes your browser request something from one of their servers while the page is loading, sending all the necessary information to the stats collector.

So let's do the same!

We'll start with the Javascript code; save this as stats.js:

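(function () {
    // Information about the visitor that we want to record
    var data = [document.referrer,
                navigator.userAgent,
                screen.width + "x" + screen.height,
                screen.colorDepth];
    // Join the fields with "&", ending with a trailing "&".
    // Note: values are not URL-encoded, so a referrer containing "&" shifts the fields.
    var query = data.join("&") + "&";
    // Setting src on an image is enough to make the browser request it,
    // even though the image is never added to the page
    var img = document.createElement("img");
    img.setAttribute("src", "http://yoursite.com/analyz0r.gif|" + query);
})();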

OK, this little monster does all the magic. To embed it in your site, simply add:

<script type="text/javascript" src="stats.js"></script>

What will this do? It will simply make the browser try to load an image called analyz0r.gif from your server, sending along all the information we want about the client. The image doesn't even have to exist; we don't care about the 404. What we are doing here is combining log analysis with Javascript. We'll get something like this in the server's log:

exe@melange:~/workz/stats$ cat /var/log/apache2/access.log
127.0.0.1 - - [23/Dec/2008:01:21:29 +0100] "GET /analyz0r.gif|http://localhost/&Mozilla/5.0%20(X11;%20U;%20Linux%20i686;%20en-US;%20rv:1.9.0.5)%20Gecko/2008121622%20Ubuntu/8.10%20(intrepid)%20Firefox/3.0.5&1680x1050&24& HTTP/1.1" 404 334 "http://localhost/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008121622 Ubuntu/8.10 (intrepid) Firefox/3.0.5"

Although it looks ugly, this is easily parsed into meaningful data. For example, we can use this simple pipeline to turn it into &-separated values, which you can then load into your preferred spreadsheet (use & as the field separator):

exe@melange:~/workz/stats$ cat /var/log/apache2/access.log |grep analyz0r| sed s/%20/" "/g|cut -d"|" -f2 |cut -d"&" -f1-4
http://localhost/&Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008121622 Ubuntu/8.10 (intrepid) Firefox/3.0.5&1680x1050&24

The format is simple: referrer & user agent & screen resolution & color depth. You can modify the script to add any other fields you like.

And it works with other browsers too! Look:

http://localhost/&Opera/9.61 (X11; Linux i686; U; en) Presto/2.1.1&1680x1050&24
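If you would rather get a real CSV file (with proper quoting, should a user agent ever contain a comma), the same extraction can be sketched in Python with the standard csv module. It mirrors the shell pipeline: take what follows analyz0r.gif| in the request line, split it on &, and decode %20 and any other %-escapes. The function names and column names are hypothetical.

```python
import csv

try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2

FIELDS = ["referrer", "user_agent", "resolution", "color_depth"]


def parse_stats_line(line):
    # Return the four stats fields of one access.log line, or None if the
    # line is not a request for the analyz0r.gif beacon
    if "analyz0r.gif|" not in line:
        return None
    # The payload runs from after the "|" up to the " HTTP/1.x" of the request
    payload = line.split("analyz0r.gif|", 1)[1].split(" HTTP/", 1)[0]
    values = payload.split("&")[:len(FIELDS)]
    if len(values) < len(FIELDS):
        return None
    return dict(zip(FIELDS, [unquote(v) for v in values]))


def log_to_csv(lines, out):
    # Write a header plus one CSV row per beacon request found in lines
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for line in lines:
        row = parse_stats_line(line)
        if row is not None:
            writer.writerow(row)
```

For example, log_to_csv(open("/var/log/apache2/access.log"), sys.stdout) prints the CSV to the terminal (after an import sys).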

Happy analyzing!