Yao Ge

Ph.D. Candidate, Emory University, Atlanta, GA

Summary

Fifth-year Ph.D. student in computer science with a background in machine learning and natural language processing. Strong technical proficiency with work history in text mining, named entity recognition, and information extraction, paired with extensive research experience evaluating existing machine learning processes, conducting statistical analysis to resolve data set problems, and improving models' predictive performance. In-depth knowledge of few-shot learning coupled with data analysis abilities. Extensive experience in bioinformatics, including genomic data analysis, biomedical text mining, and protein structure prediction. Expertise in generative AI, focusing on text generation, language model fine-tuning, and synthetic data generation. Proven track record of programming in multiple languages. Self-directed and energetic, with strong performance both when working independently and when collaborating on group projects.

Overview

4 years of professional experience

Work History

Research Assistant @ Information Retrieval Lab

Advisor: Prof. Eugene Agichtein
09.2019 - 05.2020
  • Worked on graph-based document retrieval

Research Assistant @ Cryptography Lab

Advisor: Prof. Qiuliang Xu
09.2016 - 06.2019
  • Worked on Ethereum-based blockchain

Education

Ph.D. Candidate - Computer Science and Informatics

Emory University
Atlanta, GA
11.2024

Master of Science - Computer Technique

Shandong University
Shandong
06.2019

Bachelor of Science - Information Security

Hainan University
Hainan
06.2016

Research Interests

Natural Language Processing: Applied NLP, Bioinformatics NLP, Generative AI, Information Extraction, Text Mining, Pre-trained Language Models, Neural Networks, Synthetic Data Generation, Conversational AI, Sequence-to-Sequence Models

Few-shot Learning: One-shot / Zero-shot Learning, Meta Learning, Prompt Learning, Low-resource Learning, Transfer Learning

Data Science: Biomedical Text Mining, Named Entity Annotation, Data Processing, Data Analysis

Publications (Google Scholar)

  • Few-shot learning for medical text: A review of advances, trends, and opportunities
  • Yao Ge, Yuting Guo, Yuan-Chi Yang, Mohammed Ali Al-Garadi, Abeed Sarker
  • Journal of Biomedical Informatics, vol 144, article 104458. 2023


  • Data Augmentation with Nearest Neighbor Classifier for Few-shot Named Entity Recognition
  • Yao Ge, Mohammed Al-Garadi, Abeed Sarker
  • MEDINFO 2023—The Future Is Accessible, pages: 690-694. 2023


  • Detection of Medication Mentions and Medication Change Events in Clinical Notes Using Transformer-Based Models
  • Yuting Guo, Yao Ge, Abeed Sarker
  • Studies in Health Technology and Informatics, vol 310, pages: 685-689. 2024


  • A comparison of few-shot and traditional named entity recognition models for medical text
  • Yao Ge, Yuting Guo, Yuan-Chi Yang, Mohammed Al-Garadi, Abeed Sarker
  • In Proceedings of The 10th IEEE International Conference on Healthcare Informatics (ICHI). 2022


  • Comparison of Pretraining Models and Strategies for Health-Related Social Media Text Classification
  • Yuting Guo, Yao Ge, Yuan-Chi Yang, Mohammed Al-Garadi, Abeed Sarker
  • In Healthcare, MDPI. 2022


  • Mining long-COVID symptoms from Reddit: characterizing post-COVID syndrome from patient reports
  • Abeed Sarker, Yao Ge
  • JAMIA Open, vol 4, no. 3, ooab075. 2021


  • Signals of increasing co-use of stimulants and opioids from online drug forum data
  • Abeed Sarker, Mohammed Al-Garadi, Yao Ge, Nisha Nataraj, Christopher Jones, Steven Sumner
  • Harm Reduction Journal, 2022


  • Evidence of the emergence of illicit benzodiazepines from online drug forums
  • Abeed Sarker, Mohammed Al-Garadi, Yao Ge, Nisha Nataraj, Londell McGlone, Christopher M Jones, Steven A Sumner
  • European Journal of Public Health, vol 32, no. 6, pages: 939-941. 2022


  • How Can Graph Neural Networks Help Document Retrieval: A Case Study on CORD19 with Concept Map Generation
  • Hejie Cui, Jiaying Lu, Yao Ge, Carl Yang
  • The European Conference on Information Retrieval (ECIR). 2022


  • [Preprint] Reddit-Impacts: A Named Entity Recognition Dataset for Analyzing Clinical and Social Effects of Substance Use Derived from Social Media
  • Yao Ge, Sudeshna Das, Karen O'Connor, Mohammed Ali Al-Garadi, Graciela Gonzalez-Hernandez, Abeed Sarker
  • arXiv preprint arXiv:2405.06145. 2024


  • [Preprint] Two-layer retrieval augmented generation framework for low-resource medical question-answering: proof of concept using Reddit data
  • Yao Ge*, Sudeshna Das*, Yuting Guo, Swati Rajwal, JaMor Hairston, Jeanne Powell, Drew Walker, Snigdha Peddireddy, Sahithi Lakamana, Selen Bozkurt, Matthew Reyna, Reza Sameni, Yunyu Xiao, Sangmi Kim, Rasheeta Chandler, Natalie Hernandez, Danielle Mowery, Rachel Wightman, Jennifer Love, Anthony Spadaro, Jeanmarie Perrone, Abeed Sarker (*equal contribution)
  • arXiv preprint arXiv:2405.19519. 2024


  • [Under Review] HILGEN: Hierarchically-Informed Data Generation for Biomedical NER Using Knowledgebases and LLMs
  • Yao Ge, Sudeshna Das, Yuting Guo, Abeed Sarker

Academic Services

Program Committee Member | Reviewer

  • The 19th / 20th Annual Workshop of the Australasian Language Technology Association (ALTA)
  • Social Media Mining for Health 2022 (#SMM4H)
  • AMIA 2023 Annual Symposium
  • The 10th IEEE International Conference on Healthcare Informatics (IEEE ICHI 2022)
  • AMIA 2022 Annual Symposium
  • Social Media Mining for Health 2024 (#SMM4H)
  • Biomedical AI Spring 2024 Symposium
  • AMIA 2024 Annual Symposium


Workshop Organizer

  • Social Media Mining for Health 2022 (#SMM4H) (Health Language Processing Lab @ Penn IBI) Task 8: Classification of self-reported chronic stress on Twitter (in English)
  • Social Media Mining for Health 2024 (#SMM4H) (Health Language Processing Lab @ Penn IBI) Task 4: Extraction of the clinical and social impacts of nonmedical substance use from Reddit


Shared Task Participant

  • TREC-COVID: Building a Pandemic Retrieval Test Collection (2020)
  • Social Media Mining for Health 2021 (#SMM4H) Task 1: Classification, Extraction and Normalization of Adverse Effect mentions in English tweets
  • Australasian Language Technology Association (ALTA) Shared Task 2021: Language Technology Programming Competition
  • 2022 Challenge - National NLP Clinical Challenges (N2C2) Track 1: Contextualized Medication Event Extraction


Teaching

  • CS 170 Introduction To Computer Science I, TA (Spring 2020 and Fall 2020)
  • CS 377 Database Systems, Head TA (Spring 2021)
