Fig. 4 | BMC Medical Informatics and Decision Making

Fig. 4

From: Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records

Zipf's eponymous law [76], illustrated by a selection of word counts from the analysis of the 863 937 clinical notes included in the dataset we use to conduct experiments, which yielded 863 937 unique unigram tokens and 1 803 428 common phrases in the knowledge base. Frequent words account for a large percentage of the text, whereas a large portion of words appear only at a low frequency.
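
The rank-frequency pattern described in the caption can be reproduced on any tokenized corpus. The following is a minimal sketch, not the authors' pipeline: it assumes plain whitespace tokenization, and the names `rank_frequency`, `coverage_of_top`, and the toy notes are hypothetical, chosen only for illustration.

```python
from collections import Counter


def rank_frequency(notes):
    """Return (rank, token, count) triples, most frequent token first."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    counts = Counter(
        token
        for note in notes
        for token in note.lower().split()
    )
    return [
        (rank, token, count)
        for rank, (token, count) in enumerate(counts.most_common(), start=1)
    ]


def coverage_of_top(table, k):
    """Fraction of all token occurrences covered by the k most frequent tokens."""
    total = sum(count for _, _, count in table)
    if total == 0:
        return 0.0
    return sum(count for _, _, count in table[:k]) / total


# Made-up example notes, not taken from the dataset.
toy_notes = [
    "patient denies allergy to penicillin",
    "patient reports allergy to latex",
    "no known allergy",
]

table = rank_frequency(toy_notes)
# Tokens that occur exactly once form the long low-frequency tail.
singletons = [token for _, token, count in table if count == 1]
```

On a real corpus, plotting count against rank on log-log axes would be expected to give a roughly straight line if the data follow Zipf's law; the toy input here is far too small to show that and only demonstrates the bookkeeping.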
