Hwanjun Song

PhD Candidate in Machine Learning



I am a fourth-year Ph.D. candidate in the Graduate School of Knowledge Service Engineering at KAIST. My advisor is Prof. Jae-Gil Lee, and I am a student representative of the Data Mining Lab.

I am currently a research intern at Google Research (July to December 2020), working on a research project under the supervision of two hosts, Eunyoung Kim and Ming-Hsuan Yang, and two mentors, Dr. Deqing Sun and Dr. Varun Jampani.

My general research interests lie in improving the performance of machine learning (ML) techniques in real-world scenarios. I am particularly interested in designing more advanced approaches for handling large-scale and noisy data, two major real-world challenges that hinder the practical use of ML.


  • Trustworthy ML
  • Large-scale ML
  • Real-world ML challenges
  • Computer Vision


  • PhD in the Graduate School of Knowledge Service Engineering (Sep. 2016 ~ )

    Korea Advanced Institute of Science and Technology (KAIST)

2020’s Accomplishments

A full paper was accepted at KDD 2020 (Top Conference)
Title: Hi-COVIDNet: Deep Learning Approach to Predict Inbound COVID-19 Patients and Case Study in South Korea

I received an offer from Google Research (HQ, Mountain View)

A full paper was accepted at PAKDD 2020
Title: Revisit Prediction by Deep Survival Analysis


Carpe Diem, Seize the Samples Uncertain "At the Moment" for Adaptive Batch Selection (CIKM 2020, To Appear)

The accuracy of deep neural networks is significantly affected by how well mini-batches are constructed during the training step. In …

Ada-Boundary: Accelerating DNN Training via Adaptive Boundary Batch Selection (Machine Learning 2020, SCIE IF=2.672, ECML-PKDD Journal Track, To Appear)

Neural networks converge faster with help from a smart batch selection strategy. In this regard, we propose Ada-Boundary, a novel and …

Learning from Noisy Labels with Deep Neural Networks: A Survey (arXiv 2020, Under Review)

Deep learning has achieved remarkable success in numerous domains with help from large amounts of big data. However, the quality of …

Hi-COVIDNet: Deep Learning Approach to Predict Inbound COVID-19 Patients and Case Study in South Korea (KDD 2020, To Appear)

The escalating crisis of COVID-19 has put people all over the world in danger. Owing to the high contagion rate of the virus, COVID-19 …

How Does Early Stopping Help Generalization against Label Noise? (ICMLW 2020)

Noisy labels are very common in real-world training data, which lead to poor generalization on test data because of overfitting to the …

Revisit Prediction by Deep Survival Analysis (PAKDD 2020)

In this manuscript, we introduce SurvRev, a next-generation revisit prediction model that can be tested directly in the business. The …

TRAP: Two-level Regularized Autoencoder-based Embedding for Power-law Distributed Data (TheWebConf 2020)

Finding low-dimensional embeddings of sparse high-dimensional data objects is important in many applications such as recommendation, …

MLAT: Metric Learning for kNN in Streaming Time Series (KDDW 2019)

Learning a good distance measure for distance-based classification in time series leads to significant performance improvement in many …

SELFIE: Refurbishing Unclean Samples for Robust Deep Learning (ICML 2019)

Owing to the extremely high expressive power of deep neural networks, their side effect is to totally memorize training data even when …

RP-DBSCAN: A Superfast Parallel DBSCAN Algorithm based on Random Partitioning (SIGMOD 2018)

In most parallel DBSCAN algorithms, neighboring points are assigned to the same data partition for parallel processing to facilitate …


  • 291 Daehak-ro, Daejeon, 34141, Republic of Korea
  • Building E2-1 Room 1217