In my previous post about Google's horrible AppEngine price hike, I jokingly concluded that the only way to optimize my apps and make AppEngine costs reasonable again would be to remove any useful code.
The higher the latency of your pages, the more instances AppEngine spawns. Given the new prohibitive prices, this can get very expensive even if your pages use little RAM and CPU. One of the reasons a very simple page can take 50 ms instead of 6 ms to return is simply the use of memcache.
AppEngine's memcache has a confusing name. It does not live in your instances' memory, otherwise it could not be shared and consistent across all instances (which it is).
For many apps, though, we don't really care about consistency between instances. For example, my Horoscope HD app just fetches a simple cached page that only changes once a day, and the same goes for the associated website. My previous implementation used AppEngine's memcache for every request. This used very little CPU and was therefore pretty cost-efficient at the old prices.
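To picture the starting point, a minimal sketch of such a memcache-only handler could look like this (the handler name, the cache key and the render_page placeholder are illustrative, not the app's actual code):

from google.appengine.api import memcache
from google.appengine.ext import webapp


def render_page():
    # Placeholder for the real page building (datastore reads, templating).
    return '<html>...</html>'


class HoroscopeHandler(webapp.RequestHandler):
    def get(self):
        # Every single request goes to AppEngine's shared memcache service,
        # which is a network call, not local memory.
        page = memcache.get('horoscope_page')
        if page is None:
            page = render_page()
            memcache.set('horoscope_page', page)
        self.response.out.write(page)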
It turns out that to greatly reduce latency, all I had to do was declare a dictionary before my class:
_instanceCache = dict()
class MainHandler(webapp.RequestHandler):
Then, before trying to fetch a key from memcache, I first try to get it from my own instance dictionary. This is vastly faster because it is a genuine in-memory cache. The app now has three levels of cache (see the sketch after the list):
- Instance dictionary: very fast (a real in-memory cache)
- AppEngine memcache: fast enough
- AppEngine DataStore: rather slow and costly
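Here is a minimal sketch of what the resulting lookup can look like, assuming a single page keyed by a fixed string; the build_page placeholder stands in for the real datastore reads and templating:

from google.appengine.api import memcache
from google.appengine.ext import webapp

_instanceCache = dict()   # lives in this instance's own memory


def build_page(key):
    # Placeholder for the slow path: datastore reads plus templating.
    return '<html>...</html>'


class MainHandler(webapp.RequestHandler):
    def get(self):
        key = 'horoscope_page'
        # 1. Instance dictionary: plain in-process memory, no network hop.
        page = _instanceCache.get(key)
        if page is None:
            # 2. AppEngine memcache: shared across instances, but a network call.
            page = memcache.get(key)
            if page is None:
                # 3. DataStore: the slow and costly last resort.
                page = build_page(key)
                memcache.set(key, page)
            _instanceCache[key] = page
        self.response.out.write(page)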
After this change, the cost for this particular app went back down to $0.14 per day. Note that the price was also largely affected by the fact that I removed the Always On instances, which were costing me (future) money while doing nothing. With Always On instances and no instance cache, the site was set to cost me nearly $6 per day...
Sadly you can't apply this type of optimization to all kinds of apps, but for those that merely serve cached content it can be a real life saver. Depending on how the content varies, it might be wise to put a timestamp in your cache entries, but in this case I can live with the idea that the instance will be killed if it ever uses too much memory (a rare occurrence given the size of the data) and that the data for a given key never changes.
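For apps whose cached data does change, a sketch of what timestamped instance-cache entries could look like (the helper names and the one-day expiry are just an illustration):

import time

_instanceCache = dict()
MAX_AGE_SECONDS = 24 * 60 * 60   # for content that changes roughly once a day


def cache_get(key):
    entry = _instanceCache.get(key)
    if entry is None:
        return None
    value, stored_at = entry
    if time.time() - stored_at > MAX_AGE_SECONDS:
        del _instanceCache[key]   # too old, force a refresh from memcache/DataStore
        return None
    return value


def cache_set(key, value):
    _instanceCache[key] = (value, time.time())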
That doesn't change the fact that the price increase is outrageous. If I needed a lot of processing and data consistency, I'd be pretty much stuck. I guess my private statistics site (which was on track to cost up to $400 per month) could also be optimized a lot using instance variables, but I am not taking the risk of doing all this work again only to see Google change the rules at will. I did trash that site, and all the hard work that went into it was rendered moot by this insane new pricing.
Update: Using Google's front-end cache
Tammo Freese recently replied to my GAE Groups post, suggesting that for an even more efficient cache one could set the Cache-Control header, which should enable caching by Google's front-end cache and generate zero actual requests (and zero cost).
The following code would do the trick:
seconds_valid = 15*60 # 15 minutes
self.response.headers['Cache-Control'] = "public, max-age=%d" % seconds_valid
He added: you see the front-end cache working if your log shows "204 No Content" entries with 0 CPU seconds.
Update
I have just tested the above code and it works fine. A lot of content now gets cached properly and returns code 204.