Word embeddings trained on published case reports are lightweight, effective for clinical tasks, and free of protected health information.
Flamholz Zachary N, Crane-Droesch Andrew, Ungar Lyle H, et al. · Journal of Biomedical Informatics
OBJECTIVE: Quantify tradeoffs in performance, reproducibility, and resource demands across several strategies for developing clinically relevant word embeddings. MATERIALS AND METHODS: We trained separate embeddings on all full-text manuscripts in the PubMed Central (PMC) Open Access subset, case reports therein, the…