Gene clustering: 2001:
Significance Test
- Pick a candidate minimum cluster size to test
  - ⇒ set of possible clusters (good and bad)
- Some clusters are pure
  - Are they significant?
- Null hypothesis:
  - Categories are unrelated to the text features
- What is the chance of getting a pure cluster if categories are assigned randomly?
  - If that chance is less than (1 - c), then we have confidence c in the real pure clusters.
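
The test above can be sketched as a simple label-shuffling (permutation) estimate. This is only an illustrative sketch, not the method used in the original work: the function names, the toy data, the number of shuffles, and the reading of "getting a pure cluster" as "at least one pure cluster of the minimum size" are all assumptions.

```python
import random
from collections import defaultdict


def count_pure_clusters(cluster_ids, categories, min_size):
    """Count clusters with at least min_size members that all share one category."""
    members = defaultdict(list)
    for cid, cat in zip(cluster_ids, categories):
        members[cid].append(cat)
    return sum(
        1
        for cats in members.values()
        if len(cats) >= min_size and len(set(cats)) == 1
    )


def chance_of_pure_cluster(cluster_ids, categories, min_size,
                           n_shuffles=2000, seed=0):
    """Estimate the chance of at least one pure cluster under the null
    hypothesis, i.e. when categories are assigned to items at random."""
    rng = random.Random(seed)
    shuffled = list(categories)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # random reassignment of the same category labels
        if count_pure_clusters(cluster_ids, shuffled, min_size) >= 1:
            hits += 1
    return hits / n_shuffles


# Hypothetical toy data: three clusters of five items; only cluster 0 is pure.
cluster_ids = [0] * 5 + [1] * 5 + [2] * 5
categories = list("aaaaa") + list("bbbcc") + list("bbccc")

c = 0.95  # desired confidence level (assumed value)
observed = count_pure_clusters(cluster_ids, categories, min_size=5)
chance = chance_of_pure_cluster(cluster_ids, categories, min_size=5)
significant = observed >= 1 and chance < (1 - c)
print(observed, chance, significant)
```

Shuffling the labels keeps the category frequencies fixed while breaking any link between categories and clusters, which is one common way to realize "categories are assigned randomly".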