Django and PostgreSQL – improving the performance with no effort and no code.
Posted by ymirpl on July 8, 2011

It’s a long post, so we’ll start with a summary:
We will be using:
- Django 1.3 + Jinja2 2.6-dev
- nginx 1.0.4 + gunicorn 0.12.2 (1 worker)
- PostgreSQL 8.4
Performance test will be performed using:
- Blitz – set to sweep 1-110 users
- Apache Benchmark – with concurrency set to 1 and 100 requests – to average out the single-request time
Case study
Yup, it’s going to be a social example. I know. We start with a simple twitter-like application (microblogging, but without followers – what a twist!). Let’s call it:
The Tuitter!
You can see the source code at . The repository contains a few more clever files, but we will not talk about them just yet – their time will come in the next few weeks (keep in touch for The Big Django Hosting real cost and performance review).
We will be looking at the performance of the index page, which checks your session (kept in the DB – the Django default – we do not recommend this in production) and prints out the 10 latest Tuits (fetched with a single INNER JOIN query). It comes in 2 flavours:
- / – using Django templating engine
- /jinja2/ – using the excellent Jinja2 templating system (we DO recommend using it!)
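The query behind that index view boils down to a single INNER JOIN. A rough sketch of the SQL (sqlite3 stands in for PostgreSQL here, and the table and column names are our assumptions – the actual Django-generated Tuitter schema will differ):

```python
import sqlite3

# In-memory stand-in for the Tuitter database (schema names are made up).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tuit (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT);
    INSERT INTO auth_user VALUES (1, 'ymirpl');
    INSERT INTO tuit VALUES (1, 1, 'hello');
    INSERT INTO tuit VALUES (2, 1, 'world');
""")

# One query fetches the 10 newest tuits together with their authors:
rows = db.execute("""
    SELECT tuit.body, auth_user.username
    FROM tuit
    INNER JOIN auth_user ON tuit.author_id = auth_user.id
    ORDER BY tuit.id DESC
    LIMIT 10
""").fetchall()
```

In Django’s ORM, `select_related()` is what produces this shape: one joined query instead of an extra author lookup per row.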
See the Demo here.
Stage 1
We’ve created some user accounts and Tuitted some Tuits to fill the DB. Let’s use Blitz to find its melting point.
Apache Benchmark shows:
Requests per second:    24.04 [#/sec] (mean)
Time per request:       41.602 [ms] (mean)
Time per request:       41.602 [ms] (mean, across all concurrent requests)
Using Jinja2:
Requests per second:    31.40 [#/sec] (mean)
Time per request:       31.850 [ms] (mean)
Time per request:       31.850 [ms] (mean, across all concurrent requests)
It peaked at 24 requests/second. By the way – see the advantage of using Jinja2? – it made 31 requests/second.
So let’s profile this view. We use our simple profiling middleware (just add it to your MIDDLEWARE_CLASSES as the first item). If you add ‘?profile=’ to the URL, it will display the profiling data.
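A minimal sketch of such a middleware, in the old-style form Django 1.3 uses (the original post linked its own implementation; this version, its class name, and its behaviour are our assumptions):

```python
import cProfile
import io
import pstats

class ProfileMiddleware(object):
    """Old-style Django middleware: when '?profile' is present in the query
    string, profile the view and replace the response body with the stats."""

    def process_view(self, request, view_func, view_args, view_kwargs):
        if "profile" in request.GET:
            self.profiler = cProfile.Profile()
            # Returning a response here short-circuits the normal view handling.
            return self.profiler.runcall(view_func, request, *view_args, **view_kwargs)

    def process_response(self, request, response):
        if hasattr(self, "profiler"):
            out = io.StringIO()
            stats = pstats.Stats(self.profiler, stream=out)
            stats.sort_stats("tottime").print_stats(20)
            response.content = out.getvalue()
        return response
```

(On the Python 2 that Django 1.3 ran on you would use the StringIO module instead of io.StringIO.)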
ncalls   tottime  percall  cumtime  percall  filename:lineno(function)
3        0.013    0.004    0.013    0.004    {cursor.execute}
1        0.007    0.007    0.007    0.007    {psycopg2._psycopg.connect}
22       0.002    0.000    0.003    0.000    base.py:275(__init__)
1560     0.002    0.000    0.002    0.000    {isinstance}
22/11    0.002    0.000    0.007    0.001    query.py:1128(get_cached_row)
294/284  0.001    0.000    0.003    0.000    encoding.py:54(force_unicode)
Problem
Whoa, so we’re spending most time in psycopg2._psycopg.connect, just connecting to the PostgreSQL? What can we do to improve it?
Solution
Well, we’ve got to try connection pooling. Each time you make a request, Django opens a brand-new connection to the database. With pooling, each gunicorn worker would get a connection that is set up once and never closed, removing the repeated calls to _psycopg.connect. But building that in would mean modifying Django itself, and we do not want to do that – it makes upgrading Django and deploying with fabric/pip painful in the long run.
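The effect is easy to see with a toy model (pure Python, no real database – connect() here just stands in for the TCP handshake plus authentication that psycopg2 pays on every new connection):

```python
connects = 0

def connect():
    """Stand-in for psycopg2's connect(): TCP handshake + authentication."""
    global connects
    connects += 1
    return object()

class Pool(object):
    """Keep released connections around and hand them back out on demand."""
    def __init__(self):
        self._idle = []

    def acquire(self):
        return self._idle.pop() if self._idle else connect()

    def release(self, conn):
        self._idle.append(conn)

# Without pooling: every request pays for a fresh connection.
for _ in range(100):
    connect()
no_pool_cost = connects

# With pooling: the first request connects, the other 99 reuse it.
pool = Pool()
for _ in range(100):
    conn = pool.acquire()
    pool.release(conn)
pool_cost = connects - no_pool_cost
```

pgpool-II and pgBouncer play the role of Pool here, sitting between Django and PostgreSQL so Django itself needs no changes.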
We’ve got 2 options:
- pgpool-II
- pgBouncer
Both are production-ready solutions, and from the user’s perspective they work the same way: we connect to them instead of the database, and they handle the connection pooling.
Stage 2 – pgpool-II
Installing pgpool
On Debian just do a:
apt-get install pgpool
pgpool configuration
The file we’re interested in is pgpool.conf. On Debian it lives in /etc; if you compile pgpool from source, you will have to copy and rename the sample configuration file to pgpool.conf.
The default options are all right; we just have to fill in the database info. Set backend_host_name, backend_port and backend_socket_dir to match your PostgreSQL instance. By default pgpool will listen on port 5433. Start pgpool up, change the database port in your settings.py, and you’re ready to go.
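The settings.py change is just the port. A sketch of the Django 1.3-style DATABASES setting pointing at pgpool instead of PostgreSQL directly (the NAME, USER and PASSWORD values below are placeholders, not Tuitter’s real settings):

```python
# settings.py fragment -- Django talks to pgpool on 5433,
# while PostgreSQL itself keeps listening on 5432.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "tuitter",       # placeholder
        "USER": "tuitter",       # placeholder
        "PASSWORD": "",          # placeholder
        "HOST": "127.0.0.1",
        "PORT": "5433",          # pgpool's default port
    }
}
```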
Performance
Apache benchmark shows:
Requests per second:    30.74 [#/sec] (mean)
Time per request:       32.534 [ms] (mean)
Time per request:       32.534 [ms] (mean, across all concurrent requests)
Using Jinja2:
Requests per second:    48.93 [#/sec] (mean)
Time per request:       20.437 [ms] (mean)
Time per request:       20.437 [ms] (mean, across all concurrent requests)
This time we peaked at 31 requests/sec (49 for Jinja2). That’s roughly 28% better, with almost no effort! But the profiling data says we still spend a lot of time connecting, because pgpool has to authorize us against the database on every connection. Pooling would help much more if the database server were in a different datacenter, where each round-trip costs more.
Stage 3 – pgBouncer
Now it starts getting interesting. pgBouncer is an event-driven connection pooler which does not authorize every connection against the database. Instead it authenticates clients itself, using PostgreSQL’s auth file or even your own text file of users and passwords. This should make our application perform A LOT better.
Installing pgBouncer
Compile it from source (Debian/Ubuntu-compatible) or install it any way you prefer.
pgBouncer configuration
Modify the configuration file (or copy the sample from etc/pgbouncer.ini in the source tree to /usr/local/etc/pgbouncer.ini):
- Modify the [databases] section to look like this (if your server is on your local machine on the default port):
[databases]
* = host=127.0.0.1 port=5432
- In the [pgbouncer] section:
- Set the listening port, for example to:
listen_port = 6432
- Set the authorization file; if using PostgreSQL 8.x with default settings:
auth_file = /var/lib/postgresql/<version>/main/global/pg_auth
- Set the log and pid files:
logfile = /var/log/pgbouncer.log
pidfile = /tmp/pgbouncer.pid
- Set the user pgBouncer will run as:
user = postgres
You can choose any user, as long as it’s not root. We will not dwell on security here.
- Run it by executing:
su -l postgres -c "pgbouncer -d /usr/local/etc/pgbouncer.ini"
- Modify settings.py, setting the database port accordingly (6432 in this example).
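If you’d rather not point pgBouncer at pg_auth, it can also read a plain userlist file with one "user" "password" pair per line, where the password may use PostgreSQL’s md5 scheme: "md5" followed by md5(password concatenated with username). A small helper to generate such entries (the username and password below are made up for illustration):

```python
import hashlib

def userlist_entry(username, password):
    """Build a pgBouncer userlist.txt line using PostgreSQL's md5
    password scheme: "md5" + md5(password + username)."""
    digest = hashlib.md5((password + username).encode()).hexdigest()
    return '"%s" "md5%s"' % (username, digest)

# Example entry (made-up credentials):
entry = userlist_entry("tuitter", "s3cret")
```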
Done. Let’s get it smoking:
Performance
Apache Benchmark shows:
Requests per second:    36.44 [#/sec] (mean)
Time per request:       27.445 [ms] (mean)
Time per request:       27.445 [ms] (mean, across all concurrent requests)
Using Jinja2:
Requests per second:    70.99 [#/sec] (mean)
Time per request:       14.086 [ms] (mean)
Time per request:       14.086 [ms] (mean, across all concurrent requests)
Well well well – we peaked at 37 requests/second, over 50% better than what we started with! With Jinja2: 71 req/s. It will be even better for you – this server has only one CPU core, and we’ve hit another limit: this time we’re CPU-bound, not I/O-bound (which, as a side effect, is why Jinja2 comes out twice as fast this time).
Next week Tomek Kopczuk will write about running Django on Heroku’s brand new cedar stack and getting the most out of it (6 times more, in fact).
Follow us on Twitter!