3.1 Cochran's C ratio
In a balanced design, all data sets have the same number of observations. The G ratio (General summary of G test: § 2.1) then reduces to the well-known Cochran's C ratio [23]:
Where:
G_{j} = G test statistic for data set j
C_{j} = Cochran's C test statistic for data set j
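In its standard form, Cochran's C ratio for data set j is the variance of that data set divided by the sum of the variances of all L data sets. A minimal Python sketch, assuming the per-data-set variances have already been computed (the function name `cochran_c_ratios` is illustrative and not taken from the text):

```python
def cochran_c_ratios(variances):
    """Return C_j = s_j^2 / sum_i(s_i^2) for every data set j.

    `variances` holds one sample variance per data set; a balanced
    design (equal number of observations per data set) is assumed.
    """
    total = sum(variances)
    if total <= 0:
        raise ValueError("the sum of the variances must be positive")
    return [v / total for v in variances]


# Example: three data sets, the third with a much larger variance.
ratios = cochran_c_ratios([1.0, 2.0, 5.0])
print(ratios)
```

The ratios always sum to 1, so one unusually large variance pushes its own C value up and every other C value down.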
3.2 Computing critical C values
For balanced designs the critical values C_{UL} and C_{LL} take the reduced forms:
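The reduced forms can be read back from the spreadsheet formulas listed later in this section. The block below is a reconstruction under two assumptions: cells B4, B5 and B6 hold the significance level, the number of data sets L and the number of observations n per data set, and F_c(p, ν1, ν2) denotes the upper-tail critical value of the F distribution (the quantity Excel's FINV returns):

```latex
\begin{align*}
C_{UL}(\zeta, n, L) &= \left[ 1 + \frac{L-1}{F_c\!\left(\zeta/L,\; n-1,\; (L-1)(n-1)\right)} \right]^{-1} \\
C_{LL}(\zeta, n, L) &= \left[ 1 + \frac{L-1}{F_c\!\left(1-\zeta/L,\; n-1,\; (L-1)(n-1)\right)} \right]^{-1}
\end{align*}
```

Here ζ = α for a one-sided test and ζ = α/2 for a two-sided test.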
Below is a simple example spreadsheet for calculating critical values for balanced designs:
Cell contents:
B11: "=1/(1+(B5-1)/FINV(1-B4/B5,B6-1,(B5-1)*(B6-1)))"
B12: "=1/(1+(B5-1)/FINV(1-B4/2/B5,B6-1,(B5-1)*(B6-1)))"
C10: "=1/(1+(B5-1)/FINV(B4/B5,B6-1,(B5-1)*(B6-1)))"
C12: "=1/(1+(B5-1)/FINV(B4/2/B5,B6-1,(B5-1)*(B6-1)))"
Current practitioners of Cochran's test may prefer to continue working from tables. For their convenience, Appendices A to F provide extensive lists of critical values for balanced designs:
Appendix   Limit         ζ       One-sided α   Two-sided α
A          Upper Limit   0.01    0.01          0.02
B          Upper Limit   0.025   0.025         0.05
C          Upper Limit   0.05    0.05          0.1
D          Lower Limit   0.01    0.01          0.02
E          Lower Limit   0.025   0.025         0.05
F          Lower Limit   0.05    0.05          0.1
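For readers who prefer a script to a spreadsheet, the critical values of this section can be sketched in Python. This is an illustration under stated assumptions, not the author's implementation. The standard library has no inverse F distribution, so `finv` below is a hypothetical stand-in for Excel's FINV, built from a continued-fraction incomplete beta function and bisection. With SciPy installed, `scipy.stats.f.isf(p, d1, d2)` should return the same quantity. The argument names `zeta`, `L` and `n` follow the symbols used in the text.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            if abs(d) < tiny:
                d = tiny
            c = 1.0 + aa / c
            if abs(c) < tiny:
                c = tiny
            d = 1.0 / d
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def finv(p, d1, d2):
    """Upper-tail F critical value: the x with P(F > x) = p, for 0 < p < 1."""
    # P(F > x) equals I_z(d2/2, d1/2) with z = d2 / (d2 + d1*x); solve for z.
    lo, hi = 0.0, 1.0
    for _ in range(200):
        z = 0.5 * (lo + hi)
        if _betai(d2 / 2.0, d1 / 2.0, z) < p:
            lo = z
        else:
            hi = z
    z = 0.5 * (lo + hi)
    return d2 * (1.0 - z) / (d1 * z)


def critical_c(zeta, L, n, upper=True):
    """C_UL (upper=True) or C_LL (upper=False) for L data sets of n observations."""
    p = zeta / L if upper else 1.0 - zeta / L
    f_crit = finv(p, n - 1, (L - 1) * (n - 1))
    return 1.0 / (1.0 + (L - 1) / f_crit)


# One-sided limits at alpha = 0.05 for L = 2 data sets of n = 3 observations.
print(critical_c(0.05, 2, 3), critical_c(0.05, 2, 3, upper=False))
```

For L = 2 and n = 3 the F distribution has a closed form, which gives C_{UL} = 0.975 and C_{LL} = 0.025 at ζ = 0.05. That makes this case a convenient sanity check. For a two-sided test, pass `zeta = alpha / 2`.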

3.3 One-sided upper limit G test
When conducting a one-sided upper limit G test on balanced data, it is not necessary to evaluate G_{j} for all data sets j. It is sufficient to evaluate:
Where:
G_{max} = G test statistic for the data set with the highest variance
C_{max} = Cochran's C test statistic for the data set with the highest variance
Running a one-sided upper limit G test on balanced data is similar to running a traditional Cochran's C test:
Determine the upper limit value C_{UL} according to § 3.2. If G_{max} exceeds C_{UL}, label the corresponding variance value as "exceptionally large", remove the value from the variance data, and repeat the test on the remaining variance values. Continue the process until you have identified all remaining variance outliers.
Several textbooks [1,2,6] and ISO 5725 [19] provide guidance on conducting a C test. The main advantages of conducting a G test instead are:
 The G test is not restricted to the significance levels α = 0.01 and α = 0.05 but will run at any significance level.
 The upper limit values for the G test need not be obtained from tables, but can be computed (§ 3.2).
 The limit values for Cochran's C test have only been tabulated for selected numbers of data sets (L) and selected numbers of observations (n) per data set, whereas the G test allows limit values to be calculated for any combination of L and n.
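The iterative procedure of this section can be sketched as a short loop. This is a minimal illustration, not a definitive implementation. The parameter `c_ul` is a hypothetical caller-supplied function that returns C_{UL} for the current number of data sets, computed according to § 3.2. The sketch assumes the limit is recomputed each time a variance has been removed, since L has then dropped by one.

```python
def upper_limit_outliers(variances, c_ul):
    """Repeatedly flag the largest variance while G_max exceeds C_UL.

    variances : one sample variance per data set (balanced design).
    c_ul      : hypothetical helper, maps the current number of data
                sets L to the critical value C_UL.
    Returns (outliers, remaining) as two lists.
    """
    remaining = list(variances)
    outliers = []
    while len(remaining) >= 2:
        largest = max(remaining)
        g_max = largest / sum(remaining)
        if g_max <= c_ul(len(remaining)):
            break  # nothing exceptional left
        outliers.append(largest)  # "exceptionally large"
        remaining.remove(largest)
    return outliers, remaining


# Illustration only: a constant limit stands in for the real C_UL.
flagged, kept = upper_limit_outliers([1.0, 1.0, 1.0, 10.0, 100.0], lambda L: 0.6)
print(flagged, kept)
```

In real use, `c_ul` would wrap whatever routine computes the critical value for the chosen significance level and the fixed number of observations n.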
3.4 One-sided lower limit G test
When conducting a one-sided lower limit G test on balanced data, it is not necessary to evaluate G_{j} for all data sets j. It is sufficient to evaluate:
Where:
G_{min} = G test statistic for the data set with the lowest variance
Cochran's C test does not allow for lower limit tests. A one-sided lower limit G test on balanced data is conducted as follows:
Determine the lower limit value C_{LL} according to § 3.2. If G_{min} is less than C_{LL}, label the corresponding variance value as "exceptionally small", remove the value from the variance data, and repeat the test on the remaining variance values. Continue the process until you have identified all remaining variance outliers.
3.5 Two-sided G test
When conducting a two-sided G test on balanced data, it is not necessary to evaluate G_{j} for all data sets j. It is sufficient to evaluate:
Where:
G_{max} = G test statistic for the data set with the highest variance
G_{min} = G test statistic for the data set with the lowest variance
Usually a two-sided G test on balanced data can be conducted as follows:
Determine the upper limit value C_{UL} and the lower limit value C_{LL} according to § 3.2, taking ζ = α/2. If G_{max} exceeds C_{UL}, label the highest variance value as "deviant" and remove this value from the variance data. Likewise, if G_{min} is less than C_{LL}, label the lowest variance value as "deviant" and remove that value from the variance data. Then repeat the test on the remaining variance values. Continue the process until you have identified all remaining variance outliers.
Occasionally both the highest and the lowest variance value may appear "deviant". In that case, remove only the variance value that is most likely to be truly deviant, and continue the process from there. There are two equivalent ways to identify that value. Conceptually the simplest approach is:
Lower the initial significance level of the two-sided test such that only one of the variance values will still be marked "deviant". Remove only that value from the variance data, and repeat the test at the lower significance level on the remaining variance values. Continue the process until you have identified and removed all variance outliers at the lower significance level. Then run the two-sided test on the remaining variance data, letting the significance level gradually increase until you reach your initial significance level.
While conceptually simple, the above approach has the drawback that it will require more calculation work than is strictly necessary. If you plan to apply the G test routinely, you may prefer to directly calculate the probability that a specific variance is truly deviant according to § 3.6 of the manuscript.
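A single pass of the two-sided test, and the "lower the significance level" way of resolving a double flag, can be sketched as follows. This illustrates only the conceptually simple approach, not the probability calculation of § 3.6. The parameter `limits` is a hypothetical caller-supplied function returning the pair (C_{UL}, C_{LL}) for a given ζ and number of data sets, computed according to § 3.2. The function names and the list of trial significance levels are likewise assumptions.

```python
def two_sided_check(variances, c_ul, c_ll):
    """Return which extremes are 'deviant': 'highest', 'lowest', both or none."""
    total = sum(variances)
    g_max = max(variances) / total
    g_min = min(variances) / total
    flagged = []
    if g_max > c_ul:
        flagged.append("highest")
    if g_min < c_ll:
        flagged.append("lowest")
    return flagged


def resolve_double_flag(variances, limits, alphas):
    """Try successively lower significance levels until one extreme stays flagged.

    limits : hypothetical helper, maps (zeta, L) to (C_UL, C_LL).
    alphas : trial significance levels in decreasing order.
    Returns (alpha, 'highest' or 'lowest'), or None if no level separates them.
    """
    for alpha in alphas:
        c_ul, c_ll = limits(alpha / 2.0, len(variances))  # zeta = alpha / 2
        flagged = two_sided_check(variances, c_ul, c_ll)
        if len(flagged) == 1:
            return alpha, flagged[0]
    return None


# Illustration only: made-up limits stand in for the real critical values.
data = [0.1, 5.0, 5.0, 40.0]
print(two_sided_check(data, 0.7, 0.01))
print(resolve_double_flag(data, lambda zeta, L: (0.7, zeta), [0.05, 0.01, 0.001]))
```

After the single remaining "deviant" value has been removed, the procedure continues at the returned significance level as described above.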
Just want to say a big "Thank you" to Ruben for being so responsive to my questions and helping me understand his formulas better.
I am a Maintenance Analyst in the U.S. military and I needed some assistance computing Cochran's C critical values in Excel. My main problem was that there is no easy formula to compute these values and the values are only available in tables. Rather than do a VLOOKUP or HLOOKUP in Excel, I needed to find a way to compute the critical values so that I could automate some of my hypothesis testing.
I emailed Ruben, explaining my issue and hoping that I would get a response. Within a day, I had a quick response answering my questions and asking me a few too. Over a week of emailing back and forth, Ruben explained quite a bit, gave me a formula in Excel to calculate Cochran's C critical values, and left me with a better understanding of this blog and the G test. I plan on using what he showed me and incorporating it into my work. With any luck, I can pass this knowledge on throughout my career field, so that no one else has the same issue as me.
For anyone with questions, don't hesitate to ask on this blog or email him like me. Ruben is knowledgeable, friendly, and responsive to any question. Never in my life did I think I would email a random person from the Netherlands that I read about in a blog but I am glad I did it. It was definitely worth it and I enjoyed talking to Ruben for the small amount of time that I interacted with him.
Thanks again for your help, Ruben.