For subtask A, we designed a state-of-the-art supervised statistical approach that uses a naïve Bayes classifier trained on the official training set (150 annotated papers). We then applied GOCat and obtained leading results, with hierarchical recall of up to 65% over the top twenty returned concepts. Thanks to BioCreative IV, we were able to design a complete curation pipeline: given a gene name and a full text, the system selects evidence sentences for curation and delivers highly relevant GO concepts along with a set of evidence sentences.
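To make the reported metric concrete, the following is a minimal sketch of hierarchical recall restricted to the top k returned concepts. It assumes the common definition in which predicted and gold concepts are each expanded with all of their ontology ancestors before the overlap is measured; the official evaluation may differ in detail. The function names, the `parents` mapping, and the toy concept identifiers are hypothetical and are not taken from GOCat or the task's evaluation scripts.

```python
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus every ancestor reachable through `parents`."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def hierarchical_recall_at_k(
    ranked: List[str],
    gold: List[str],
    parents: Dict[str, Set[str]],
    k: int = 20,
) -> float:
    """Share of the ancestor-expanded gold set covered by the top-k predictions."""
    gold_closure: Set[str] = set()
    for term in gold:
        gold_closure |= ancestors(term, parents)
    if not gold_closure:
        return 0.0

    predicted_closure: Set[str] = set()
    for term in ranked[:k]:
        predicted_closure |= ancestors(term, parents)

    return len(predicted_closure & gold_closure) / len(gold_closure)


# Toy ontology with hypothetical identifiers (child -> set of parents).
toy_parents = {
    "binding": {"root"},
    "protein_binding": {"binding"},
    "catalysis": {"root"},
    "kinase": {"catalysis"},
}
toy_gold = ["protein_binding", "kinase"]
toy_ranked = ["binding", "kinase"]

score_top20 = hierarchical_recall_at_k(toy_ranked, toy_gold, toy_parents, k=20)
score_top1 = hierarchical_recall_at_k(toy_ranked, toy_gold, toy_parents, k=1)
```

In this toy case a prediction of the parent concept `binding` still earns partial credit for the gold concept `protein_binding`, which is the reason a hierarchical measure suits an ontology such as GO better than exact-match recall.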