From: Leveraging text skeleton for de-identification of electronic medical records
Statistic | i2b2–2006 | i2b2–2014 | Chinese |
---|---|---|---|
Number of records | 669 | 1304 | 9700 |
Number of tokens | 560,852 | 1,005,582 | 3,026,944 |
Number of PHIs | 19,498 | 28,862 | 48,072 |
Number of PHI tokens | 29,917 | 38,435 | 137,496 |
Vocabulary size | 20,254 | 41,879 | 32,265 |
Percentage of ID | 24.6% | 3.6% | 8.8% |
Percentage of DATE | 36.4% | 43.2% | 38.9% |
Percentage of HOSPITAL | 12.3% | 8.0% | 2.2% |
Percentage of DOCTOR | 19.2% | 16.6% | 14.7% |
Percentage of PATIENT | 4.7% | 7.6% | 17.3% |
Percentage of AGE | 0.1% | 6.9% | 16.1% |
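
Two ratios that the table does not report directly, the share of tokens that belong to a PHI and the mean number of tokens per PHI, follow from the counts by simple division. The sketch below is only illustrative arithmetic over the figures in the table; the names `CORPORA`, `derived_stats`, and `STATS` are hypothetical, and the dictionary keys use a plain hyphen in place of the dash in the corpus names.

```python
# Counts copied from the table above. The derived ratios are NOT reported
# in the source; they are computed here purely for illustration.
CORPORA = {
    "i2b2-2006": {"tokens": 560_852, "phis": 19_498, "phi_tokens": 29_917},
    "i2b2-2014": {"tokens": 1_005_582, "phis": 28_862, "phi_tokens": 38_435},
    "Chinese": {"tokens": 3_026_944, "phis": 48_072, "phi_tokens": 137_496},
}


def derived_stats(counts):
    """Return the PHI token share and the mean PHI length in tokens."""
    return {
        # Fraction of all tokens that fall inside a PHI mention.
        "phi_token_share": counts["phi_tokens"] / counts["tokens"],
        # Average number of tokens spanned by one PHI mention.
        "tokens_per_phi": counts["phi_tokens"] / counts["phis"],
    }


STATS = {name: derived_stats(counts) for name, counts in CORPORA.items()}

if __name__ == "__main__":
    for name, s in STATS.items():
        print(
            f"{name}: {s['phi_token_share']:.2%} of tokens are PHI, "
            f"{s['tokens_per_phi']:.2f} tokens per PHI"
        )
```

Under these counts, PHI tokens make up roughly 4 to 5 percent of each corpus, and a PHI in the Chinese corpus spans more tokens on average than one in either i2b2 corpus.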