Automatic Extraction of Medication Mentions from Tweets: Overview of the BioCreative VII Shared Task 3 Competition

Cited by: 4

Authors:

Weissenbacher, Davy [1]

O'Connor, Karen [2]

Rawal, Siddharth [2]

Zhang, Yu [3]

Tsai, Richard Tzong-Han [3,4,5]

Miller, Timothy [6,7]

Xu, Dongfang [6,7]

Anderson, Carol [8]

Liu, Bo [8]

Han, Qing [9]

Zhang, Jinfeng [9]

Kulev, Igor [10]

Koprue, Berkay [10]

Rodriguez-Esteban, Raul [11]

Ozkirimli, Elif [10]

Ayach, Ammer [12]

Roller, Roland [12]

Piccolo, Stephen [13]

Han, Peijin [14]

Vydiswaran, V. G. Vinod [15,16]

Tekumalla, Ramya [17]

Banda, Juan M. [17]

Bagherzadeh, Parsa [18]

Bergler, Sabine [18]

Silva, Joao F. [19]

Almeida, Tiago [19,20]

Martinez, Paloma [21]

Rivera-Zavala, Renzo [21]

Wang, Chen-Kai [22,23]

Dai, Hong-Jie [24]

Hernandez, Luis Alberto Robles [17]

Gonzalez-Hernandez, Graciela [1]

Affiliations:

[1] Cedars Sinai Med Ctr, Dept Computat Biomed, Los Angeles, CA 90048 USA

[2] Univ Penn, Perelman Sch Med, DBEI, Philadelphia, PA USA

[3] Natl Cent Univ, Dept Comp Sci & Informat Engn, 300, Zhongda Rd, Taoyuan 320, Taiwan

[4] Natl Taiwan Univ, IoX Ctr, Sect 4,Roosevelt Rd, 1 Barry Lam Hall, Taipei 106, Taiwan

[5] Acad Sinica, Res Ctr Humanities & Social Sci, 128, Sect 2,Acad Rd, Taipei 115, Taiwan

[6] Boston Childrens Hosp, Computat Hlth Informat Program, Boston, MA USA

[7] Harvard Med Sch, Dept Pediat, Boston, MA USA

[8] NVIDIA, Santa Clara, CA USA

[9] Florida State Univ, Dept Stat, Tallahassee, FL USA

[10] F Hoffmann La Roche Ltd, Data & Analyt Chapter, Basel, Switzerland

[11] Roche Innovat Ctr Basel, Pharmaceut Res & Early Dev, Basel, Switzerland

[12] DFKI, Speech & Language Technol Lab, Berlin, Germany

[13] Brigham Young Univ, Dept Biol, Provo, UT USA

[14] Univ Michigan, Med Sch, Dept Computat Med & Bioinformat, Ann Arbor, MI USA

[15] Univ Michigan, Med Sch, Dept Learning Hlth Sci, Ann Arbor, MI USA

[16] Univ Michigan, Sch Informat, Ann Arbor, MI USA

[17] Georgia State Univ, Dept Comp Sci, Atlanta, GA USA

[18] Concordia Univ, CLaC Labs, Montreal, PQ, Canada

[19] Univ Aveiro, Inst Elect & Informat Engn Aveiro, DETI, Aveiro, Portugal

[20] Univ A Coruna, Dept Computat, La Coruna, Spain

[21] Univ Carlos III Madrid, Comp Sci & Engn Dept, Madrid, Spain

[22] Big Data Lab, Chunghwa Telecom Labs, Taoyuan, Taiwan

[23] Natl Yang Ming Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan

[24] Natl Kaohsiung Univ Sci & Technol, Coll Elect Engn & Comp Sci, Dept Elect Engn, Kaohsiung, Taiwan

Source:

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | Year 2023 / Volume 2023

Keywords:

TWITTER; CORPUS; DRUGS;

DOI:

10.1093/database/baac108

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification code:

07 ; 0710 ; 09 ;

Abstract:

This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme imbalance. Task 3 called for detecting the tweets in a user's timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance beyond the simple lexical match. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at . The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.
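As a rough illustration of the "simple lexical match" that the abstract treats as the baseline to beat, the sketch below scans a timeline for names from a small lexicon and reports character spans, and also computes the positive rate from the counts given in the abstract. The lexicon, the example tweets, and the function names `extract_spans` and `positive_rate` are all hypothetical and chosen for illustration; this is not the organizers' baseline or any participating team's system.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon (assumption); not the resource used in the task.
LEXICON = ["ibuprofen", "tylenol", "metformin"]

# Single case-insensitive pattern with word boundaries, longest names first.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_spans(tweet: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, text) for each lexicon match; end is exclusive."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(tweet)]


def positive_rate(n_positive: int, n_total: int) -> float:
    """Fraction of tweets that mention at least one medication."""
    return n_positive / n_total


# Made-up timeline (assumption), mostly unrelated to health, as in real data.
timeline = [
    "Traffic was awful this morning",
    "Took some Tylenol for this headache",
    "New episode tonight!",
]

# Keep only the tweets with at least one match, keyed by position in the timeline.
detected: Dict[int, List[Tuple[int, int, str]]] = {}
for index, tweet in enumerate(timeline):
    spans = extract_spans(tweet)
    if spans:
        detected[index] = spans

# Counts taken from the abstract: 442 positive tweets out of 182 049.
corpus_rate = positive_rate(442, 182049)
```

A lexicon match of this kind misses misspellings and flags ambiguous words, which is one reason the task asked for methods that go beyond it while staying robust to the roughly 0.24% positive rate.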


Pages: 12
