
Thursday, March 4, 2010

Django: Generators are your friend


Pro tip: if you need to shove a lot of data through a Django view, do NOT build it up as one big string; use a generator.


Here's something that seems sensible on the surface:

from django.http import HttpResponse

# Builds the entire response body in memory, one concatenation per row.
body = ""
for region in regions:
    for zipcode in region.zipcode.iterator():
        body = body + "\t".join(
            [region.region, zipcode.zipcode]
        ) + "\n"

resp = HttpResponse(body, mimetype='application/ms-excel')
resp['Content-Disposition'] = 'attachment; filename=%s.xls' % (unicode("Regions"),)

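Except when the set of regions and zipcodes gets large! Each pass through the loop can copy the entire string built so far, so the total work grows roughly quadratically with the number of rows. Let's consider the following replacement that uses a generator: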


def dump_it():
    for region in regions:
        for zipcode in region.zipcode.iterator():
            yield "\t".join(
                [region.region, zipcode.zipcode]
            ) + "\n"

resp = HttpResponse(dump_it(), mimetype='application/ms-excel')
resp['Content-Disposition'] = 'attachment; filename=%s.xls' % (unicode("Regions"),)

There is a slight performance difference: the latter takes a few seconds (2 seconds for almost 70K resulting rows on my dog of a laptop). However, I attempted to benchmark the former on the same dataset, and it took almost 13 MINUTES (775 seconds). So, "slight" meaning within 3 orders of magnitude: roughly 390 times slower.
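
If you want to try the comparison without a Django project, here is a minimal sketch in plain Python. The rows() helper and its made-up region and zipcode values are assumptions standing in for the querysets above, and the two build functions are my own names that mirror the two views. Timings depend heavily on the interpreter and on the string type (some interpreters optimize repeated concatenation), so don't expect to reproduce the numbers above exactly.

```python
import timeit


def rows(n_regions=100, zips_per_region=100):
    """Hypothetical stand-in for the region/zipcode querysets."""
    for r in range(n_regions):
        for z in range(zips_per_region):
            yield ("region-%d" % r, "%05d" % z)


def build_with_concat():
    # Mirrors the first view: grow one big string, row by row.
    body = ""
    for region, zipcode in rows():
        body = body + "\t".join([region, zipcode]) + "\n"
    return body


def build_with_generator():
    # Mirrors the second view: yield one line at a time.
    for region, zipcode in rows():
        yield "\t".join([region, zipcode]) + "\n"


if __name__ == "__main__":
    # Joining the generator's output stands in for the response consuming it.
    t_concat = timeit.timeit(build_with_concat, number=5)
    t_gen = timeit.timeit(lambda: "".join(build_with_generator()), number=5)
    print("concat:    %.3fs" % t_concat)
    print("generator: %.3fs" % t_gen)
```

Both functions produce the same bytes; the only difference is whether the full body ever has to exist as one growing string. Bump n_regions and zips_per_region to see how each approach scales on your machine.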
