Oh wow. My feature extractor for a bag-of-words model, written in Julia, went from taking 4 seconds per 1,000 tweets to 0.03 seconds by using a hash to check identity. Awesome. Now when I run it over all 3.7 million tweets it won't take nearly as long.


Unless I try to run it on the machine with insufficient RAM. Whoops.
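
For anyone curious, the speedup probably looks something like the sketch below. This is a guess at the shape of the change, not the actual code: it assumes the slow version scanned a vocabulary vector for every token and the fast version looks each token up in a `Dict`. The names `vocab`, `word_index`, `extract_scan`, and `extract_hash` are all made up for illustration.

```julia
# Hypothetical vocabulary; the real one is not shown in the post.
vocab = ["julia", "is", "fast", "tweets", "are", "short"]

# Build the word -> column index map once, so each later lookup is a
# hash probe instead of a scan over the whole vocabulary.
word_index = Dict(word => i for (i, word) in enumerate(vocab))

# Slow variant (assumed): findfirst walks the vector for every token.
function extract_scan(tweet, vocab)
    counts = zeros(Int, length(vocab))
    for token in split(lowercase(tweet))
        i = findfirst(==(token), vocab)
        i === nothing || (counts[i] += 1)
    end
    return counts
end

# Fast variant (assumed): one hash lookup per token.
function extract_hash(tweet, word_index)
    counts = zeros(Int, length(word_index))
    for token in split(lowercase(tweet))
        i = get(word_index, token, 0)   # 0 means "not in the vocabulary"
        i == 0 || (counts[i] += 1)
    end
    return counts
end

# Both variants should produce the same bag-of-words counts.
tweet = "Julia is fast fast"
@assert extract_scan(tweet, vocab) == extract_hash(tweet, word_index)
```

The scan costs time proportional to the vocabulary size for each token, while the `Dict` lookup is roughly constant time, which is the kind of gap that could explain going from seconds to hundredths of a second. The trade-off is that the hash table has to live in memory alongside everything else.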

