Hwanjun Song

PhD Candidate in Machine Learning



I am a fourth-year Ph.D. candidate in the Graduate School of Knowledge Service Engineering at KAIST, advised by Prof. Jae-Gil Lee, and I am the representative student of the Data Mining Lab.

I will be joining Google Research (HQ, Mountain View) as a research intern under the supervision of two hosts, Eunyoung Kim and Ming-Hsuan Yang.

My general research interests lie in improving the performance of machine learning (ML) techniques in real-world scenarios. I am particularly interested in designing more advanced approaches to handle large-scale and noisy data, two key real-world challenges that hinder the practical use of ML.


  • Trustworthy ML
  • Large-scale ML
  • Real-world ML challenges


  • PhD in Graduate School of Knowledge Service Engineering (Sep. 2016 ~ )

    Korea Advanced Institute of Science and Technology (KAIST)

2020’s Accomplishments

I received an offer from Google Research (HQ, Mountain View)

A full paper was accepted at PAKDD 2020
Title: Revisit Prediction by Deep Survival Analysis


Revisit Prediction by Deep Survival Analysis (PAKDD 2020, To appear)

In this manuscript, we introduce SurvRev, a next-generation revisit prediction model that can be tested directly in the business. The …

TRAP: Two-level Regularized Autoencoder-based Embedding for Power-law Distributed Data (TheWebConf 2020, To appear)

Finding low-dimensional embeddings of sparse high-dimensional data objects is important in many applications such as recommendation, …

Prestopping: How Does Early Stopping Help Generalization against Label Noise? (Arxiv 2019)

Noisy labels are very common in real-world training data; they lead to poor generalization on test data because of overfitting to the …

Carpe Diem, Seize the Samples Uncertain "At the Moment" for Adaptive Batch Selection (Arxiv 2019)

The performance of deep neural networks is significantly affected by how well mini-batches are constructed. In this paper, we propose a …

SELFIE: Refurbishing Unclean Samples for Robust Deep Learning (ICML 2019)

Owing to the extremely high expressive power of deep neural networks, their side effect is to totally memorize training data even when …

RP-DBSCAN: A Superfast Parallel DBSCAN Algorithm based on Random Partitioning (SIGMOD 2018)

In most parallel DBSCAN algorithms, neighboring points are assigned to the same data partition for parallel processing to facilitate …

PAMAE: Parallel k-Medoids Clustering with High Accuracy and Efficiency (KDD 2017)

The k-medoids algorithm is one of the best-known clustering algorithms. Despite this, however, it is not as widely used for big data …


  • 291 Daehak-ro, Daejeon, 34141, Republic of Korea
  • Building E2-1 Room 1217