Friday, February 19, 2010

python twisted Throttling with Cooperator

Example 8: Throttling with Cooperator


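from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.internet import defer, task

# Number of coiterate workers, i.e. the maximum number of concurrent requests
maxRun = 2

urls = [
    'http://twistedmatrix.com',
    'http://twistedsoftwarefoundation.org',
    'http://yahoo.com',
    'http://www.google.com',
    ]

def pageCallback(result):
    # Results have to be processed here; coiterate does not pass them on
    print(len(result))
    return result

def doWork():
    # Generator: getPage is only called when a worker asks for the next item
    for url in urls:
        d = getPage(url)
        d.addCallback(pageCallback)
        yield d

def finish(ign):
    reactor.stop()

def test():
    deferreds = []
    coop = task.Cooperator()
    # Create the generator once so that every worker shares it
    work = doWork()
    for i in range(maxRun):
        d = coop.coiterate(work)
        deferreds.append(d)
    # Fires once every worker has exhausted the shared generator
    dl = defer.DeferredList(deferreds)
    dl.addCallback(finish)

test()
reactor.run()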

This is the last example for this post, and it's probably the most arcane :-) This example is taken from JP's blog post from a couple of years ago. In the previous example we observed that all the deferreds were created up front in the for loop and only then run; that observation is now our counterexample. What if we want to limit when the deferreds are created? What if we're using a deferred semaphore to create 1000 deferreds (running only 50 of them at a time), but are running out of file descriptors? Cooperator to the rescue.

This one is going to require a little more explanation :-) Let's see if we can move through the justifications for the strangeness clearly:

1. We need the deferreds to be yielded so that each deferred (and the request behind it) is not created until it's actually needed (as opposed to the situation in the deferred semaphore example, where all the deferreds were created at once).
2. We need to call doWork before the for loop so that the generator is created once, outside the loop, and every call to coop.coiterate shares it. That way the workers make their way through the URLs together (calling doWork inside the loop would give each worker its own generator, and so all four URLs would be fetched on every iteration).
3. We removed the result-processing callback on the deferred list because coop.coiterate swallows our results: the deferred it returns fires with the iterator, not with the page data. If we need to process the results, we have to do it in pageCallback.
4. We still use a deferred list as the means to determine when all the batches have finished.
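
Points 1 and 2 can be checked without Twisted at all. The following is only a plain-Python sketch: make_work and run_workers are hypothetical stand-ins for doWork and for the way a Cooperator steps through its iterators, not Twisted's actual scheduler, and the log list stands in for the getPage calls.

```python
def make_work(urls, log):
    # Hypothetical stand-in for doWork: log.append replaces getPage(url)
    for url in urls:
        log.append(url)
        yield url

def run_workers(iterators):
    # Step each iterator in turn, one item at a time, until all are exhausted
    active = list(iterators)
    while active:
        for it in list(active):
            try:
                next(it)
            except StopIteration:
                active.remove(it)

urls = ['a', 'b', 'c', 'd']

# One generator shared by two workers: each URL is requested exactly once
shared_log = []
work = make_work(urls, shared_log)
nothing_yet = list(shared_log)   # still empty: creating the generator does no work
run_workers([work, work])

# A fresh generator per worker: every URL is requested once per worker
fresh_log = []
run_workers([make_work(urls, fresh_log) for _ in range(2)])

print(nothing_yet)   # []
print(shared_log)    # ['a', 'b', 'c', 'd']
print(fresh_log)     # ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']
```

The shared generator hands each URL to whichever worker asks next, which is what keeps the number of outstanding requests at maxRun in the real example.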

This example could have been written much more concisely: the doWork function could have been left in test as a generator expression and test's for loop could have been a list comprehension. However, the point is to show very clearly what is going on.

I hope these examples were informative and provide some practical insight on working with deferreds in your Twisted projects :-)