LLM4Cluster

The first Workshop on
Large Language Models for Advanced Clustering Techniques


In conjunction with ICDM 2025
the 25th IEEE International Conference on Data Mining

November 12-15, 2025, Washington, DC, USA

LLM4Cluster


Clustering, which groups similar data points into meaningful clusters, is indispensable in the many fields where data is unlabeled. Recent studies have shown that the world knowledge encoded in Large Language Models (LLMs) can significantly improve clustering performance. The primary goal of this workshop is therefore to investigate how to harness the distinctive strengths and generalization abilities of LLMs to address issues encountered by conventional and deep clustering methods, such as vague cluster boundaries, hard-to-set parameters, and the lack of supervision signals for optimization, and thereby to advance the field of clustering. The workshop will also explore clustering methodologies and practices that extend beyond specific data types, aiming to reveal the broad potential of LLMs across diverse application contexts.

Topics of interest

We welcome papers on topics including, but not limited to, the following:
  • Mutual reinforcement between LLMs and clustering, e.g., low-rank adaptation of LLMs by clustering and enhancing clustering stability using LLMs
  • Novel approaches integrating LLMs into classical/deep clustering methods
  • Interpretability of the clustering process leveraging LLMs
  • Efficient clustering models for handling large-scale data
  • LLM-guided incremental clustering for dynamic data
  • Generalizable LLM-based clustering models for cross-domain data
  • Employing multi-modal LLMs for multi-view/modal clustering
  • Leveraging LLMs to supply external knowledge for few-shot clustering
  • Enabling LLMs to understand topological information for graph clustering
  • Utilizing LLMs to guide clustering with unknown parameters such as the subspace number and the cluster number
  • LLM-driven chain-of-thought prompting for advanced clustering and complex problem solving
  • Establishing new benchmarks and evaluations for clustering assessment
We also welcome clustering application papers in domains including, but not limited to, the following:
  • Recommender systems
  • Information retrieval
  • Computer vision
  • Natural language processing
  • Social network analysis
  • Anomaly detection
  • Automatic speech recognition
  • Transportation networks
  • Artificial intelligence for science/engineering
  • Financial technology

Program

The workshop will be held on November 12th during ICDM 2025 in the Massachusetts Room, The Capital Hilton Hotel.

Time | Event | Presenter(s)
13:30-13:35 | Opening Remarks | Wei Ye
13:35-14:30 | Keynote Speech: Low Resource Textual Anomaly Detection with Large Language Models | Ming Liu
14:30-14:45 | Text Clustering with Large Language Models: Recent Advances, Applications, and Experimental Insights | Kamal Taha
14:45-15:00 | LLM-Enhanced Semantic Clustering for Predictive Industrial Incident Detection: A Zero-Shot Approach to Safety Analytics | Magzhan Kairanbay, Basaveswar Reddy Mora Reddy, Sayed Ahmed Khalaf, Ali Alshehab, and Ahmed Bahar
15:00-15:15 | New Topic Discovery Using LLM Analysis and Entropy Based Clustering of Short Texts | Allen Detmer, Raj Bhatnagar, and Jillian Aurisano
15:15-15:30 | Break |
15:30-15:45 | Orthogonal Semi-Nonnegative Tensor Factorization based Tensorized Label Learning | Yuxuan Liu, Quanxue Gao, Jingjing Xue, Jing Li, and Cheng Deng
15:45-16:00 | Anchor Label Guided Co-Clustering via Multi-Anchor Graphs Fusion | Guangyu Yang, Xuhong Dong, Rui Wang, Quanxue Gao, and Yu Duan
16:00-16:30 | Coffee Break |
16:30-17:15 | Keynote Speech: Some Challenges in Clustering Analysis | Jicong Fan
17:15-17:30 | Compute-Efficient Prediction of Rhetorical Relations via Late Fusion of Semantic Perturbation and Attention Features | Yoga Harshitha Duddukuri

Important dates

Submission deadline: Sep 05, 2025 (previously Aug 29, 2025)
Paper notification: Sep 19, 2025 (previously Sep 15, 2025)
Early Bird Registration deadline: Sep 30, 2025 (previously Sep 24, 2025)
Camera-ready: Oct 5, 2025 (previously Sep 25, 2025). The camera ready submission guide is here.
Workshop: Tentatively on Nov 12, 2025 (the first day)

Submission


Papers may be up to 10 pages, including references, in the IEEE 2-column format. Submissions longer than 10 pages will be rejected without review. All submissions will be triple-blind reviewed by the Program Committee based on technical quality, relevance to the scope of the conference, originality, significance, and clarity. Manuscripts must be submitted electronically through the online submission system: Submit your paper here. Accepted papers will be included in the ICDM Workshop Proceedings (separate from the ICDM Main Conference Proceedings), and each workshop paper requires a full registration. Duplicate submissions of the same paper to more than one ICDM workshop are forbidden.



Organization

Organizing Committee


Program Co-Chairs

Wei Ye, PhD, Tenure-Track Professor, Tongji University, Shanghai Innovation Institute, China

Ye Zhu, PhD, Senior Lecturer, Deakin University, Australia

Sourav Medya, PhD, Assistant Professor, University of Illinois at Chicago, USA

Xin Sun, PhD, Professor, City University of Macau, China

Benjamin Roth, PhD, Professor, University of Vienna, Austria

Christian Böhm, PhD, Professor, University of Vienna, Austria

Claudia Plant, PhD, Professor, University of Vienna, Austria

Publicity Co-Chairs

Wengang Guo, PhD Candidate, Tongji University, China

Chunchun Chen, PhD Candidate, Tongji University, China

Xing Wei, PhD Candidate, Tongji University, China

Program Committee Members


  • Carolina Atria - University of Vienna, Austria
  • Linchuan Zhang - Tongji University, China
  • Zhaokai Sun - Tongji University, China
  • Maozheng Li - Tongji University, China
  • Yue Niu - Tongji University, China
  • Andrii Shkabrii - University of Vienna, Austria
  • Jiayi Yang - Tongji University, China
  • Chenrun Wang - Tongji University, China
  • Chenyi Xiong - Tongji University, China
  • Yang Liu - Shanghai Innovation Institute, China
  • Jinyang Wu - Shanghai Innovation Institute, China

Keynote Speakers

Ming Liu, PhD, Senior Lecturer, Deakin University, Australia

Title: Low Resource Textual Anomaly Detection with Large Language Models

Abstract:

Two-step approaches combining pre-trained large language model embeddings and anomaly detectors demonstrate strong performance in text anomaly detection by leveraging rich semantic representations. However, high-dimensional dense embeddings extracted by large language models pose challenges due to substantial memory requirements and high computation time. In this talk, I will first show our intensive empirical results for text anomaly detection with the two-stage approach, and then discuss the combined approach with learned VAE and LLM embeddings. Finally, I will introduce our more recent Simplified Isolation Kernel (SIK), which maps high-dimensional dense embeddings to lower-dimensional sparse representations while preserving crucial anomaly characteristics. SIK has linear time complexity and significantly reduces space complexity through its innovative boundary-focused feature mapping. Experiments across 7 datasets demonstrate that SIK achieves better detection performance than 11 state-of-the-art (SOTA) anomaly detection algorithms while maintaining computational efficiency and low memory cost.

Bio:

Dr Ming Liu is a senior lecturer at Deakin University, Australia. He works on Natural Language Processing and Machine Learning. He proposed the "Learn to actively learn" approach for active learning and developed a few efficient text summarization models/pipelines (e.g. SummPip, SciSummPip, GLIMMER), most of which are widely used in low-resource text generation settings. Dr Ming has a strong interest in solving real-world text mining problems, particularly in domain-specific settings. Dr Ming is an area chair for ACL Rolling Review and a session chair at EMNLP 2025.

Jicong Fan, PhD, Assistant Professor, School of Data Science, The Chinese University of Hong Kong, Shenzhen

Title: Some Challenges in Clustering Analysis

Abstract:

Despite the proliferation of hundreds of clustering algorithms over the past decades, significant and persistent challenges continue to complicate their application in real-world scenarios. This talk discusses these challenges, focusing on: 1) model selection and hyperparameter tuning; 2) clustering non-Euclidean data such as graphs; 3) clustering mixed-type and multi-modal data; 4) clustering on extremely large-scale datasets. The talk also introduces some possible solutions to these challenges and presents numerical comparisons of different clustering algorithms across hundreds of datasets.

Bio:

Jicong Fan is an Assistant Professor at the School of Data Science, The Chinese University of Hong Kong, Shenzhen. Professor Fan obtained his Ph.D. from the Department of Electronic Engineering, City University of Hong Kong in 2018, and his Master’s degree in Control Science and Engineering and Bachelor’s degree in Automation from Beijing University of Chemical Technology in 2013 and 2010, respectively. Prior to joining CUHK-Shenzhen, he was a postdoctoral associate at Cornell University. He held research positions at The University of Wisconsin-Madison and The University of Hong Kong in 2018 and 2015, respectively. Professor Fan's research interests are Artificial Intelligence and Machine Learning. In particular, he has worked extensively on matrix/tensor methods, clustering algorithms, anomaly/outlier/fault detection, graph learning, and automated machine learning. His research has been published in prestigious journals and conferences such as IEEE TSP/TNNLS/TII, KDD, NeurIPS, CVPR, ICML, ICLR, AAAI, and IJCAI. He is a senior member of IEEE, serves as an associate editor for two international journals, Pattern Recognition and Neural Processing Letters, and is an area chair of NeurIPS and ICML. He won the first prize of the Natural Science Award of the Chinese Association of Automation in 2023.


Contact for information

If you have any questions regarding the workshop, feel free to reach out to us:
E-mail: yew@tongji.edu.cn, c2chen@tongji.edu.cn