This blog is now hosted at consciou.us

Thursday, March 4, 2010

Django: Generators are your friend


Pro tip: if you need to shove a lot of data through a Django view, DO NOT attempt to build one big string; use a generator.


Here's something that seems sensible on the surface:

from django.http import HttpResponse

# Inside the view: build the entire file as one big string.
body = ""
for region in regions:
    for zipcode in region.zipcode.iterator():
        body = body + "\t".join([region.region, zipcode.zipcode]) + "\n"

resp = HttpResponse(body, content_type='application/ms-excel')
resp['Content-Disposition'] = 'attachment; filename=Regions.xls'

That works fine until the set of regions and zip codes gets large! Here's a replacement that uses a generator instead:


def dump_it():
    # Yield one tab-separated line per zip code instead of concatenating.
    for region in regions:
        for zipcode in region.zipcode.iterator():
            yield "\t".join([region.region, zipcode.zipcode]) + "\n"

resp = HttpResponse(dump_it(), content_type='application/ms-excel')
resp['Content-Disposition'] = 'attachment; filename=Regions.xls'
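If a region name or zip code ever contained a tab, quote or newline, the plain tab-join output would misalign the columns. One way to handle that, sketched here as an assumption rather than the author's code, is to let Python's csv module format each row inside the generator, using a pseudo-buffer whose write() returns the line instead of storing it (the same trick Django's documentation uses for streaming large CSV files). Echo and dump_tsv_rows are hypothetical names, and the input is assumed to be an iterable of (region, zipcode) string pairs.

```python
import csv


class Echo:
    """Pseudo-buffer: write() hands the formatted line back instead of storing it."""

    def write(self, value):
        return value


def dump_tsv_rows(rows):
    """Yield one tab-separated line per (region, zipcode) pair.

    csv.writer quotes any field containing the delimiter, the quote
    character or a line break, which a plain tab join does not.
    """
    writer = csv.writer(Echo(), delimiter="\t", lineterminator="\n")
    for row in rows:
        # writerow() returns whatever Echo.write() returned: the formatted line.
        yield writer.writerow(row)


if __name__ == "__main__":
    sample = [("North", "12345"), ("South\tEast", "67890")]
    for line in dump_tsv_rows(sample):
        print(repr(line))
```

In a view, rows could be the generator expression ((region.region, zipcode.zipcode) for region in regions for zipcode in region.zipcode.iterator()). Note that current Django's HttpResponse consumes an iterator immediately and stores the result as a string; StreamingHttpResponse(dump_tsv_rows(rows), content_type='application/ms-excel') sends the lines to the client as they are produced instead.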

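There is a slight performance difference. The generator version takes a few seconds (2 seconds for almost 70K resulting rows on my dog of a laptop). When I tried to benchmark the string-building version on the same dataset, it took almost 13 MINUTES (775 seconds). So, "slight" as in roughly 390 times slower: within 3 orders of magnitude. The likely culprit is that each concatenation onto body can copy everything accumulated so far into a new string, so the total work grows roughly with the square of the number of rows, while the generator produces each short line exactly once.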
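The gap can be reproduced without Django. This is a minimal, framework-free sketch, assuming plain (region name, [zip codes]) tuples in place of the querysets; make_regions, build_with_concatenation, timed and the row counts are hypothetical, not the author's benchmark. It checks that both approaches produce identical text and times each one; absolute numbers will depend on the machine and the Python version.

```python
import time


def make_regions(n_regions, zips_per_region):
    """Hypothetical stand-in data: a list of (region name, [zip codes]) pairs."""
    return [
        ("Region %d" % r,
         ["%05d" % (r * zips_per_region + z) for z in range(zips_per_region)])
        for r in range(n_regions)
    ]


def build_with_concatenation(regions):
    """String-building version: one big string grown row by row."""
    body = ""
    for region, zipcodes in regions:
        for zipcode in zipcodes:
            # body + ... builds a new string, which can mean copying
            # everything accumulated so far on every row.
            body = body + "\t".join([region, zipcode]) + "\n"
    return body


def dump_it(regions):
    """Generator version: yields one tab-separated line at a time."""
    for region, zipcodes in regions:
        for zipcode in zipcodes:
            yield "\t".join([region, zipcode]) + "\n"


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    regions = make_regions(25, 400)  # 10,000 rows
    concatenated, t_concat = timed(build_with_concatenation, regions)
    # "".join consumes the generator in a single linear pass.
    joined, t_gen = timed(lambda r: "".join(dump_it(r)), regions)
    assert concatenated == joined
    print("concatenation:    %.3f s" % t_concat)
    print("generator + join: %.3f s" % t_gen)
```

Increasing the number of rows should make the concatenation time grow much faster than the generator time, though CPython's string optimizations can narrow the gap in some cases.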
