Title: Comparison of empirical study power in sample size calculation approaches for cluster randomized trials with varying cluster sizes – a continuous outcome endpoint
Abstract: Background: Cluster randomized trials (CRTs) are a popular trial design. In most CRTs, researchers assume equal cluster sizes when calculating sample sizes. When cluster sizes vary, assuming equal-sized clusters may result in low study power. There are two common approaches to sample size calculation for varying cluster sizes. One uses the harmonic mean (m̄_H) of the cluster sizes, while the other incorporates the squared coefficient of variation (cv²) of the cluster sizes. We performed simulations to compare the empirical power of these two methods, as well as that of the arithmetic mean method, for a continuous endpoint. Study design: We considered cluster sizes that follow uniform distributions and performed 20,000 simulations under each scenario. Endpoints were analyzed using: 1) an individual-level linear regression model with Gaussian random intercepts for clusters; 2) an individual-level t-statistic with cluster-robust standard errors; 3) a generalized estimating equations (GEE) model with exchangeable correlation structure; and 4) a GEE model with independent correlation structure and robust standard errors. Results: When the data were analyzed using the Gaussian random-intercept model or the GEE model with exchangeable correlation structure, the m̄_H method had 80% power, while the cv² method had power of 85%–88%. However, when the data were analyzed using the t-statistic or the GEE model with independent correlation structure, the cv² method had 80% power, while the m̄_H method produced power of 71%–76%. Conclusion: The performance of the sample size methods depends on the data analysis approach. The degree of disparity in power also depends on the intracluster correlation coefficient. These findings emphasize the maxim that researchers should consider the method of analysis when designing CRTs so that sample size calculations are appropriate.
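The abstract does not state the formulas behind the three sample size methods, so the sketch below rests on assumptions: it uses commonly cited design-effect forms, namely 1 + (m − 1)ρ with m set to either the arithmetic or the harmonic mean cluster size, and 1 + ((cv² + 1)m̄ − 1)ρ for the coefficient-of-variation method. The function names, the use of the population variance for cv², and the normal-approximation formula for a two-arm comparison of means are illustrative choices, not necessarily those of the paper.

```python
import math
from statistics import NormalDist, fmean, harmonic_mean, pvariance


def design_effects(sizes, icc):
    """Design effects under three assumed treatments of varying cluster size."""
    m_bar = fmean(sizes)                 # arithmetic mean cluster size
    m_h = harmonic_mean(sizes)           # harmonic mean cluster size
    cv2 = pvariance(sizes) / m_bar ** 2  # squared CV (population variance: an assumption)
    return {
        "arithmetic": (m_bar, 1 + (m_bar - 1) * icc),
        "harmonic": (m_h, 1 + (m_h - 1) * icc),
        "cv2": (m_bar, 1 + ((cv2 + 1) * m_bar - 1) * icc),
    }


def clusters_per_arm(sizes, icc, delta, sd, alpha=0.05, power=0.80):
    """Clusters per arm for a two-arm CRT with a continuous outcome.

    Uses the normal-approximation sample size for a difference in means,
    inflated by each design effect and divided by the matching mean size.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_individual = 2 * (z * sd / delta) ** 2  # per arm, ignoring clustering
    return {
        name: math.ceil(n_individual * deff / m)
        for name, (m, deff) in design_effects(sizes, icc).items()
    }


if __name__ == "__main__":
    # Hypothetical planning inputs, not values taken from the paper.
    expected_sizes = [10, 20, 30, 40, 50]
    print(clusters_per_arm(expected_sizes, icc=0.05, delta=0.5, sd=1.0))
```

Under these assumed forms, both adjusted methods require at least as many clusters as the arithmetic mean method, and all three coincide when every cluster has the same size. Which of the two adjusted methods asks for more clusters depends on the intracluster correlation and on the spread of cluster sizes.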