Testing for the appropriate level of clustering in linear regression models

QED Working Paper Number

1428

Reliable inference with clustered data has received a great deal of attention in recent years. The overwhelming majority of this research assumes that the cluster structure is known. This assumption is very strong, because there are often several possible ways in which a dataset could be clustered. We propose two tests for the correct level of clustering. One test focuses on inference about a single coefficient, and the other on inference about two or more coefficients. We also prove the asymptotic validity of a wild bootstrap implementation. The proposed tests work for a null hypothesis of either no clustering or "fine" clustering against alternatives of "coarser" clustering. We also propose a sequential testing procedure to determine the appropriate level of clustering. Simulations suggest that the bootstrap tests perform very well under the null hypothesis and can have excellent power. An empirical example suggests that using our tests leads to sensible inferences.

Author(s)

James G. MacKinnon

Morten Ørregaard Nielsen

JEL Codes

Keywords

CRVE

grouped data

clustered data