This test is balanced between different areas of the language and different types of code. It is not all math, all string processing, or all timing of simple loops. In addition to covering many categories, the individual tests were balanced to take similar amounts of time on currently shipping versions of popular browsers.
One of the challenges of benchmarking is knowing how much noise you have in your measurements. This benchmark runs each test multiple times and determines an error range (technically, a 95% confidence interval). In addition, in comparison mode it tells you whether you have enough data to determine whether the difference is statistically significant.
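To make the error range concrete, here is a minimal sketch of how a 95% confidence interval could be computed from repeated timings. It is an illustration, not the benchmark's actual code. The function names, the sample timings, and the fallback to 1.96 for larger samples are all assumptions. The comparison uses a simple non-overlapping-intervals check, which is a conservative stand-in and not necessarily the test the benchmark itself applies.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def confidence_interval_95(times_ms):
    """Return (mean, half_width) of a 95% confidence interval for the mean."""
    n = len(times_ms)
    if n < 2:
        raise ValueError("need at least two runs to estimate noise")
    mean = statistics.mean(times_ms)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(times_ms) / math.sqrt(n)
    # Assumption: beyond the small table above, approximate with the
    # normal-distribution value 1.96 (slightly optimistic for small n).
    t_value = T_95.get(n - 1, 1.96)
    return mean, t_value * std_err


def clearly_different(times_a, times_b):
    """Conservative check: True only if the two 95% intervals do not overlap."""
    mean_a, half_a = confidence_interval_95(times_a)
    mean_b, half_b = confidence_interval_95(times_b)
    return abs(mean_a - mean_b) > half_a + half_b


if __name__ == "__main__":
    # Hypothetical timings in milliseconds for one test on two browser builds.
    build_a = [100, 102, 98, 101, 99]
    build_b = [120, 122, 118, 121, 119]
    mean, half = confidence_interval_95(build_a)
    print(f"build A: {mean:.1f}ms +/- {100 * half / mean:.1f}%")
    print("difference is clear:", clearly_different(build_a, build_b))
```

With only a few runs the interval is wide, so small differences between two builds cannot be told apart from noise. Adding more runs narrows the interval, which is why comparison mode may report that more data is needed.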