The short take-home is that the model projections ran high for a bit in the first part of the century, but observations have caught up in the last few years. In other words, the models are working well at what they’re designed to do, and the data contradict a core ‘skeptic’ talking point about the utility of climate models.
The typical talking points criticizing models involve cherry-picking intervals; comparing apples to oranges (comparing the surface temperatures the models output with satellite data, which measure the lower atmosphere rather than the surface); asking models to perform tasks they’re not designed for (predicting external forcings such as volcanic eruptions, or the timing of short-term natural variability); or using incorrect baselines.
These talking points have become much harder to defend with science in recent years, although they remain prominent in political rhetoric.
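On the baseline point: model output and observational datasets are often reported as anomalies relative to different reference periods, or even as absolute temperatures, so they need a common baseline before they can be compared by eye. Below is a minimal Python sketch of re-baselining, assuming annual-mean series stored as plain lists. The helper name `rebaseline` and the toy numbers are illustrative only and are not taken from any dataset or tool.

```python
from statistics import mean


def rebaseline(years, values, start, end):
    """Hypothetical helper: return `values` as anomalies relative to their
    own mean over the reference period start..end (inclusive)."""
    reference = [v for y, v in zip(years, values) if start <= y <= end]
    if not reference:
        raise ValueError("reference period does not overlap the series")
    offset = mean(reference)
    return [v - offset for v in values]


# Made-up numbers: a model series in absolute degrees C and an observational
# series already expressed as anomalies against some other baseline.
years = [2000, 2001, 2002, 2003]
model_abs = [14.0, 14.1, 14.3, 14.4]
obs_anom = [0.3, 0.4, 0.6, 0.7]

# Putting both series on the same reference period removes the constant offset.
model_rb = rebaseline(years, model_abs, 2000, 2001)
obs_rb = rebaseline(years, obs_anom, 2000, 2001)
print([round(v, 2) for v in model_rb])
print([round(v, 2) for v in obs_rb])
```

The reference period is just a parameter here; it should be one that both series cover (AR5, for instance, presents its projections relative to 1986–2005).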
This page is an ongoing effort to compare observations of global temperature with CMIP5 simulations assessed by the IPCC 5th Assessment Report. The first two figures below are updated versions of Figure 11.25a,b from IPCC AR5, which were originally produced in mid-2013.
The first panel shows the raw ‘spaghetti’ projections, with different observational datasets in black and the different emission scenarios (RCPs) shown in colours. The simulation data use spatially complete coverage of surface air temperature.
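A ‘spaghetti’ panel plots every run individually; one common way to summarize it is a per-year ensemble mean with a percentile envelope, and then to ask how often the observations fall inside that envelope. The sketch below assumes every run has already been reduced to a global annual-mean anomaly series on a shared set of years and a common baseline. The 5th to 95th percentile range and the helper names are illustrative choices, not necessarily what the AR5 figure uses.

```python
from statistics import mean, quantiles


def ensemble_envelope(runs, lower=5, upper=95):
    """For each time step, return (ensemble mean, lower pct, upper pct).

    `runs` is a list of equal-length series, one per model run.
    Percentiles are integers from 1 to 99."""
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    summary = []
    for values in zip(*runs):  # the value of every run at one time step
        cuts = quantiles(values, n=100, method="inclusive")
        summary.append((mean(values), cuts[lower - 1], cuts[upper - 1]))
    return summary


def fraction_inside(obs, summary):
    """Fraction of time steps at which the observation lies within the envelope."""
    inside = [lo <= o <= hi for o, (_, lo, hi) in zip(obs, summary)]
    return sum(inside) / len(inside)


# Made-up anomalies for three runs over two years, plus one observational series.
runs = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]
obs = [0.25, 0.9]
summary = ensemble_envelope(runs)
print(fraction_inside(obs, summary))  # → 0.5: inside in year one, outside in year two
```

Because the model fields are spatially complete while most observational datasets are not, a stricter comparison would also mask the model output to the observational coverage; this sketch ignores that step.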
The attached plot compares combined CMIP5 TAS (land-only air temperature) and TOS (sea-surface temperature) simulations, à la Cowtan et al. 2015 [1], with global instrumental land + sea-surface temperatures. I've reproduced the model runs in [1] in a simpler way (by averaging paired TAS land and TOS runs, weighted by the global proportions of land and water, 0.28 and 0.72), but with very similar results, and for all four RCP scenarios. As you can see in the plot (comparable to Figure 4a of [1]), an apples-to-apples comparison of instrumental land + SSTs with combined CMIP5 TAS + TOS realizations comes out pretty well for the models.
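The simplified blending described above amounts to a fixed area-weighted average applied run by run: blended = 0.28 × land TAS + 0.72 × TOS. Here is a minimal Python sketch, assuming each run's land-only TAS and its TOS have already been reduced to global annual anomaly series of the same length. The function names, the dict-of-runs layout and the run identifiers are hypothetical. This reproduces only the global-weighting shortcut, not a gridded blend.

```python
def blend_tas_tos(tas_land, tos_ocean, land_frac=0.28):
    """Blend one run's land-only air temperature (TAS) and sea-surface
    temperature (TOS) series using fixed global area fractions."""
    if len(tas_land) != len(tos_ocean):
        raise ValueError("TAS and TOS series must have the same length")
    ocean_frac = 1.0 - land_frac  # 0.72 with the default land fraction
    return [land_frac * t + ocean_frac * s for t, s in zip(tas_land, tos_ocean)]


def blend_ensemble(tas_runs, tos_runs, land_frac=0.28):
    """Pair TAS and TOS series by run identifier and blend each pair."""
    unpaired = set(tas_runs) ^ set(tos_runs)
    if unpaired:
        raise KeyError(f"runs without a TAS/TOS partner: {sorted(unpaired)}")
    return {run: blend_tas_tos(tas_runs[run], tos_runs[run], land_frac)
            for run in tas_runs}


# Hypothetical run identifiers and made-up anomaly values.
tas = {"model_a_r1": [0.5, 0.7], "model_b_r1": [0.4, 0.6]}
tos = {"model_a_r1": [0.2, 0.3], "model_b_r1": [0.1, 0.2]}
blended = blend_ensemble(tas, tos)
print({run: [round(v, 3) for v in series] for run, series in blended.items()})
```

Fixed global fractions cannot account for effects such as changing sea-ice extent or incomplete observational coverage; the point made above is that, despite this, the shortcut lands very close to the fuller treatment.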
I keep a simpler version of this plot (RCP 8.5 only, since the CMIP5 scenarios don't really diverge until after about 2020) updated biweekly here, if anyone cares to check the models' skill that often.
1. Cowtan K, Hausfather Z, Hawkins E, Jacobs P, Mann ME, Miller SK, Steinman BA, Stolpe MB, and Way RG: Robust comparison of climate models with observations using blended land air and ocean sea surface temperatures. Geophysical Research Letters 42(15):6526–34, 2015.