Seminar Presentation: From Data Scarcity to Scalable Insights: An End-to-End AI Pipeline for Aviation Safety Analysis
Talk Title: "From Data Scarcity to Scalable Insights: An End-to-End AI Pipeline for Aviation Safety Analysis"
Abstract:
"The analysis of textual narratives is crucial for proactive aviation safety management, yet it is hampered by domain-specific language and severe class imbalance: critical but rare incidents are underrepresented in the data. In this talk, I will present an end-to-end framework that systematically overcomes these obstacles using a data-centric approach. I will demonstrate how to create robust analytical pipelines by progressing from foundational domain-specific language models to advanced, scalable synthetic data generation systems that ensure factual accuracy and practical efficiency.
Our methodology begins by developing Aviation-BERT to create a strong baseline, which in turn reveals the critical challenge of data scarcity for rare events. To address this, we introduce a novel method using a Knowledge Graph (KG) to ground a Large Language Model (LLM), generating factually accurate synthetic reports that improve model performance. To operationalize this solution, we present an efficient dual-agent framework for automated quality assurance and scalable data augmentation. This integrated solution transforms raw, imbalanced data into a valuable asset for building more reliable models, paving the way for more automated and proactive safety management systems in aviation."
Speaker Bio:
"Xiao Jing is a Ph.D. candidate in Computational Science and Engineering at the Georgia Institute of Technology, advised by Prof. Dimitri Mavris in the Aerospace Systems Design Laboratory (ASDL). Prior to his doctoral studies, he earned a Master's degree in Aerospace Engineering from Georgia Tech. He also gained research experience as a Graduate Research Intern at Oak Ridge National Laboratory (ORNL), working on generative AI and HPC applications for physical systems modeling. His doctoral research focuses on natural language processing (NLP) and large language models (LLMs) for aviation safety, with particular emphasis on domain adaptation, knowledge-enhanced generation, and agentic AI systems for addressing data challenges and enabling AI-driven safety analysis."
