QED Working Paper Number
Inference for estimates of treatment effects with clustered data requires great care when treatment is assigned at the group level. This is true for both pure treatment models and difference-in-differences regressions. Even when the number of clusters is quite large, cluster-robust standard errors can be much too small if the number of treated (or control) clusters is small. Standard errors also tend to be too small when cluster sizes vary a lot, resulting in too many false positives. Bootstrap methods generally perform better than t-tests, but they can also yield very misleading inferences in some cases.
wild cluster bootstrap