Call for papers
LLM4Cluster
Clustering is indispensable in many fields where data is unlabeled; it aims to group similar data points into meaningful clusters. Recent studies have shown that the world knowledge in Large Language Models (LLMs) can significantly improve clustering performance. The primary goal of this workshop is therefore to investigate how to harness the distinctive strengths and generalization abilities of LLMs to address issues encountered by conventional and deep clustering methods, such as vague cluster boundaries, hard-to-set parameters, and supervision signals for optimization, and thus to advance the clustering domain. The workshop will explore clustering methodologies and practices that extend beyond specific data types, aiming to reveal the extensive potential of LLMs across diverse application contexts.
Topics of interest
We welcome papers on topics including, but not limited to, the following:
Investigation into the reinforcement between LLM and clustering, e.g., low-rank adaptation of LLM by clustering and enhancing clustering stability using LLM
Novel approaches integrating LLM into classical/deep clustering methods
Interpretability of the clustering process leveraging LLM
Efficient clustering models for handling large-scale data
LLM-guided incremental clustering for dynamic data
Generalizable LLM-based clustering models for cross-domain data
Employing multi-modal LLM for multi-view/modal clustering
Leveraging LLM to supply external knowledge for few-shot clustering
Enabling LLM to understand topological information for graph clustering
Utilizing LLM to guide clustering with unknown parameters such as the subspace number and the cluster number
LLM-driven chain-of-thought prompting for advanced clustering and complex problem solving
Establishing new benchmarks and evaluations for clustering assessment
We also welcome clustering application papers in domains including, but not limited to, the following:
Recommender systems
Information retrieval
Computer vision
Natural language processing
Social network analysis
Anomaly detection
Automatic speech recognition
Transportation networks
Artificial intelligence for science/engineering
Financial technology
Program
The workshop will be held on November 12th during ICDM 2025 in the Massachusetts Room, The Capital Hilton Hotel.
| Time | Event | Presenter(s) |
|---|---|---|
| 13:30-13:35 | Opening Remarks | Wei Ye |
| 13:35-14:30 | Keynote Speech: Low Resource Textual Anomaly Detection with Large Language Models | Ming Liu |
| 14:30-14:45 | Text Clustering with Large Language Models: Recent Advances, Applications, and Experimental Insights | Kamal Taha |
| 14:45-15:00 | LLM-Enhanced Semantic Clustering for Predictive Industrial Incident Detection: A Zero-Shot Approach to Safety Analytics | Magzhan Kairanbay, Basaveswar Reddy Mora Reddy, Sayed Ahmed Khalaf, Ali Alshehab, and Ahmed Bahar |
| 15:00-15:15 | New Topic Discovery Using LLM Analysis and Entropy Based Clustering of Short Texts | Allen Detmer, Raj Bhatnagar, and Jillian Aurisano |
| 15:15-15:30 | Break | |
| 15:30-15:45 | Orthogonal Semi-Nonnegative Tensor Factorization based Tensorized Label Learning | Yuxuan Liu, Quanxue Gao, Jingjing Xue, Jing Li, and Cheng Deng |
| 15:45-16:00 | Anchor Label Guided Co-Clustering via Multi-Anchor Graphs Fusion | Guangyu Yang, Xuhong Dong, Rui Wang, Quanxue Gao, and Yu Duan |
| 16:00-16:30 | Coffee Break | |
| 16:30-17:15 | Keynote Speech: Some Challenges in Clustering Analysis | Jicong Fan |
| 17:15-17:30 | Compute-Efficient Prediction of Rhetorical Relations via Late Fusion of Semantic Perturbation and Attention Features | Yoga Harshitha Duddukuri |
Important dates
Submission deadline: Sep 05, 2025 (extended from Aug 29, 2025)
Paper notification: Sep 19, 2025 (previously Sep 15, 2025)
Early Bird Registration deadline: Sep 30, 2025 (previously Sep 24, 2025)
Camera-ready: Oct 5, 2025 (previously Sep 25, 2025). The camera-ready submission guide is here.
Workshop: Tentatively on Nov 12, 2025 (the first day)
Submission
Papers may be up to 10 pages, including references, in the IEEE 2-column format. Submissions longer than 10 pages will be rejected without review. All submissions will be triple-blind reviewed by the Program Committee on the basis of technical quality, relevance to the scope of the conference, originality, significance, and clarity. Manuscripts must be submitted electronically through the online submission system: Submit your paper here. Accepted papers will be included in the ICDM Workshop Proceedings (separate from the ICDM Main Conference Proceedings), and each workshop paper requires a full registration. Duplicate submissions of the same paper to more than one ICDM workshop are forbidden.
Organization
Organizing Committee
Program Co-Chairs
Wei Ye, PhD
Tenure-Track Professor
Tongji University, Shanghai Innovation Institute, China
Ye Zhu, PhD
Senior Lecturer
Deakin University, Australia
Sourav Medya, PhD
Assistant Professor
University of Illinois at Chicago, USA
Xin Sun, PhD
Professor
City University of Macau, China
Benjamin Roth, PhD
Professor
University of Vienna, Austria
Christian Böhm, PhD
Professor
University of Vienna, Austria
Claudia Plant, PhD
Professor
University of Vienna, Austria
Publicity Co-Chairs
Wengang Guo
PhD Candidate
Tongji University, China
Chunchun Chen
PhD Candidate
Tongji University, China
Xing Wei
PhD Candidate
Tongji University, China
Program Committee Members
Carolina Atria - University of Vienna, Austria
Linchuan Zhang - Tongji University, China
Zhaokai Sun - Tongji University, China
Maozheng Li - Tongji University, China
Yue Niu - Tongji University, China
Andrii Shkabrii - University of Vienna, Austria
Jiayi Yang - Tongji University, China
Chenrun Wang - Tongji University, China
Chenyi Xiong - Tongji University, China
Yang Liu - Shanghai Innovation Institute, China
Jinyang Wu - Shanghai Innovation Institute, China
Keynote Speakers
Ming Liu, PhD
Senior Lecturer
Deakin University, Australia
Title: Low Resource Textual Anomaly Detection with Large Language Models
Abstract:
Two-step approaches combining pre-trained large language model embeddings and anomaly detectors demonstrate strong performance in text anomaly detection by leveraging rich semantic representations. However, high-dimensional dense embeddings extracted by large language models pose challenges due to substantial memory requirements and high computation time. In this talk, I will first show our extensive empirical results for text anomaly detection with the two-stage approach, and then discuss the combined approach with learned VAE and LLM embeddings. Finally, I will introduce our more recent Simplified Isolation Kernel (SIK), which maps high-dimensional dense embeddings to lower-dimensional sparse representations while preserving crucial anomaly characteristics. SIK has linear time complexity and significantly reduces space complexity through its innovative boundary-focused feature mapping. Experiments across 7 datasets demonstrate that SIK achieves better detection performance than 11 state-of-the-art (SOTA) anomaly detection algorithms while maintaining computational efficiency and low memory cost.
Bio:
Dr Ming Liu is a senior lecturer at Deakin University, Australia. He works on Natural Language Processing and Machine Learning. He proposed the "Learn to actively learn" approach for active learning and developed a few efficient text summarization models/pipelines (e.g. SummPip, SciSummPip, GLIMMER), most of which are widely used in low-resource text generation settings. Dr Ming has a strong interest in solving real-world text mining problems, particularly in domain-specific settings. Dr Ming is an area chair for ACL Rolling Review and a session chair at EMNLP 2025.
Jicong Fan, PhD
Assistant Professor
School of Data Science, The Chinese University of Hong Kong, Shenzhen
Title: Some Challenges in Clustering Analysis
Abstract:
Despite the proliferation of hundreds of clustering algorithms over the past decades, significant and persistent challenges continue to complicate their application in real-world scenarios. This talk discusses these challenges, focusing on: 1) model selection and hyperparameter tuning; 2) clustering non-Euclidean data such as graphs; 3) clustering mixed-type and multi-model data; 4) clustering on extremely large-scale datasets. The talk also introduces some possible solutions to these challenges and shows numerical comparisons of different clustering algorithms across hundreds of datasets.
Bio:
Jicong Fan is an Assistant Professor at the School of Data Science, The Chinese University of Hong Kong, Shenzhen. Professor Fan obtained his Ph.D. from the Department of Electronic Engineering, City University of Hong Kong in 2018, and his Master’s degree in Control Science and Engineering and Bachelor’s degree in Automation from Beijing University of Chemical Technology in 2013 and 2010 respectively. Prior to joining CUHK-Shenzhen, he was a postdoctoral associate at Cornell University. He held research positions at The University of Wisconsin-Madison and The University of Hong Kong in 2018 and 2015 respectively. Professor Fan's research interests are Artificial Intelligence and Machine Learning. In particular, he has worked extensively on matrix/tensor methods, clustering algorithms, anomaly/outlier/fault detection, graph learning, and automated machine learning. His research has been published in prestigious journals and conferences such as IEEE TSP/TNNLS/TII, KDD, NeurIPS, CVPR, ICML, ICLR, AAAI, and IJCAI. He is a senior member of IEEE, serves as an associate editor for two international journals, Pattern Recognition and Neural Processing Letters, and is an area chair of NeurIPS and ICML. He won the first prize of the Natural Science Award of the Chinese Association of Automation in 2023.
Contact for information
If you have any questions regarding the workshop, feel free to reach out to us:
E-mail: yew@tongji.edu.cn, c2chen@tongji.edu.cn