What is the thinking behind static generation? That is, the idea of when and how it’s beneficial to do computation ahead-of-time rather than on pageload, like a CMS would do.
Static generation allows you to change the way that costs occur. This change doesn’t always equal less cost, but it’s about paying different costs at different times.
Storage is cheap, whether it’s disk storage or services like S3, and it tends to be cheap both for time and for space. Thus it doesn’t hurt too much to store content that’s unlikely to be viewed.
Computation is expensive: despite the miracles of technology, processing power and RAM still have bounds with power consumption, heating, and so on. Services like EC2 allow you to slice this cost much thinner, reserving instances for shorter periods of time - but keeping a big box on all the time will still mean a big bill.
The complexity of a problem decides whether it’ll be solvable with static generation. For instance, look at game complexity. At first glance, tic-tac-toe and chess appear to be the same sort of problem, but tic-tac-toe has only 255,168 possible positions - in contrast to chess’s 10^47.
Consider then, a blog with 10 posts. A basic site would include ten individual pages for posts, and one front page: 11 things to generate. If the posts have tags - let’s say eight different tags, then we’ll need to generate 10 post pages, a home page, and 8 tag listings.
How about combining up to two tags, like metafilter allows?
That’ll be 8 tag pages, plus 10 post pages, a home page, and 56 unique combinations of ordered tags. It gets worse: with 20 tags, it would be 380 tag combinations.
In many cases, you can use estimation or a heuristic to greatly change the type of complexity possible. Instead of generating infinitely many floating point value solutions to a problem, you can generate only integer solutions and round: this is the approach that grids, like UTFGrid, take.
You can think of this as filling a cache ahead of time, and you’re limiting the type and number of variables that can have values. For instance, a map request with the WMS standard, designed for big, dynamic servers, looks like
http://nowcoast.noaa.gov/wms/com.esri.wms.Esrimap/obs? service=wms&version=1.1.1&request=GetMap&width=512& height=512&srs=EPSG:3857& bbox=-12523442.7125,3757032.8137499997,-11271098.44125,5009377.085& layers=RAS_RIDGE_NEXRAD&format=image/png&transparent=true
While there are many things to complain about here, the clearest is the
bbox argument, which can contain any four floating-point numbers: so practically infinite different values.
An OpenStreetMap tile, on the other hand, has a URL that looks like
There are three variable parts here, and 68,719,476,736 potential values, for zoom levels up to 18. That’s a big number, but far short of infinity.
Finally, it’s important to remember that making one part of the chain static doesn’t instantiate everything. If you’re using a CDN, a fresh site isn’t in the cache yet. If data is on a hard disk, your disk caches won’t be hot yet. And everyone hitting a site for the first time has an empty browser cache.
This isn’t stated just for completeness: it actually affected MapBox very significantly. EBS volumes cloned from snapshots, for instance, are backed by S3 - lazily. The first reads on them are extremely slow, to the point of being network-latency slow.
Hopefully this has made static generation a little bit clearer. Once you’ve got the concept, it starts turning up in unusual places, like materialization in databases, and templates in programming. It’s also useful to think about making services static-able in the same way you think about making them testable: as a design element.