Gene clustering: 2001: 2
Significance Test
§Pick a candidate minimum cluster size to test
• ⇒ set of possible clusters (good and bad)
§Some clusters are pure
•Are they significant?
§Null hypothesis:
•Categories are unrelated to the text features
§What is the chance of getting a pure cluster if categories are assigned randomly?
•If that chance is less than (1-c),
 then we have confidence c that the observed pure clusters are real.
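A minimal sketch of this test in Python, assuming the chance under the null hypothesis is estimated by shuffling the category labels many times (a permutation test). The slides do not say how the chance is computed, so this is one possible reading. The function names, the parameters `n_shuffles` and `seed`, and the choice to measure "chance" as the probability of seeing at least as many pure clusters as observed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def count_pure_clusters(cluster_ids, labels, min_size):
    """Count clusters with at least min_size members that all share one category."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, labels):
        members[cid].append(label)
    return sum(
        1
        for labs in members.values()
        if len(labs) >= min_size and len(set(labs)) == 1
    )


def pure_cluster_p_value(cluster_ids, labels, min_size, n_shuffles=2000, seed=0):
    """Estimate the chance of the observed purity under random category assignment.

    Assumption: the null hypothesis is simulated by shuffling the labels while
    keeping the cluster memberships fixed.
    """
    rng = random.Random(seed)
    observed = count_pure_clusters(cluster_ids, labels, min_size)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # random reassignment of categories to items
        if count_pure_clusters(cluster_ids, shuffled, min_size) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)
```

With a chosen confidence level such as c = 0.95, the pure clusters would be called significant when the returned chance is less than 1 - c.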