LLM4Cluster

Call for papers

Clustering is indispensable in many fields where data is unlabeled; it aims to group similar data points into meaningful clusters. Recent studies have shown that the world knowledge in large language models (LLMs) can significantly improve clustering performance. The primary goal of this workshop is therefore to investigate how to harness the distinctive strengths and generalization abilities of LLMs to address issues encountered by conventional and deep clustering methods, such as vague cluster boundaries, hard-to-set parameters, and supervision signals for optimization, and thus to advance the clustering field. The workshop will explore clustering methodologies and practices that extend beyond specific data types, aiming to reveal the extensive potential of LLMs across diverse application contexts.

Topics of interest

We welcome papers on topics including, but not limited to, the following:

Investigation into the reinforcement between LLM and clustering, e.g., low-rank adaptation of LLM by clustering and enhancing clustering stability using LLM
Novel approaches integrating LLM into classical/deep clustering methods
Interpretability of the clustering process leveraging LLM
Efficient clustering models for handling large-scale data
LLM-guided incremental clustering for dynamic data
Generalizable LLM-based clustering models for cross-domain data
Employing multi-modal LLM for multi-view/modal clustering
Leveraging LLM to supply external knowledge for few-shot clustering
Enabling LLM to understand topological information for graph clustering
Utilizing LLM to guide clustering with unknown parameters such as the subspace number and the cluster number
LLM-driven chain-of-thought prompting for advanced clustering and complex problem solving
Establishing new benchmarks and evaluations for clustering assessment

We also welcome clustering application papers in domains including, but not limited to, the following:

Recommender systems

Information retrieval

Computer vision

Natural language processing

Social network analysis

Anomaly detection

Automatic speech recognition

Transportation networks

Artificial intelligence for science/engineering

Financial technology

Program

The workshop will be held on November 12th during ICDM 2025 in the Massachusetts Room, The Capital Hilton Hotel.

Time	Event	Presenter(s)
13:30-13:35	Opening Remarks	Wei Ye
13:35-14:30	Keynote Speech: Low Resource Textual Anomaly Detection with Large Language Models	Ming Liu
14:30-14:45	Text Clustering with Large Language Models: Recent Advances, Applications, and Experimental Insights	Kamal Taha
14:45-15:00	LLM-Enhanced Semantic Clustering for Predictive Industrial Incident Detection: A Zero-Shot Approach to Safety Analytics	Magzhan Kairanbay, Basaveswar Reddy Mora Reddy, Sayed Ahmed Khalaf, Ali Alshehab, and Ahmed Bahar
15:00-15:15	New Topic Discovery Using LLM Analysis and Entropy Based Clustering of Short Texts	Allen Detmer, Raj Bhatnagar, and Jillian Aurisano
15:15-15:30	Break
15:30-15:45	Orthogonal Semi-Nonnegative Tensor Factorization based Tensorized Label Learning	Yuxuan Liu, Quanxue Gao, Jingjing Xue, Jing Li, and Cheng Deng
15:45-16:00	Anchor Label Guided Co-Clustering via Multi-Anchor Graphs Fusion	Guangyu Yang, Xuhong Dong, Rui Wang, Quanxue Gao, and Yu Duan
16:00-16:30	Coffee Break
16:30-17:15	Keynote Speech: Some Challenges in Clustering Analysis	Jicong Fan
17:15-17:30	Compute-Efficient Prediction of Rhetorical Relations via Late Fusion of Semantic Perturbation and Attention Features	Yoga Harshitha Duddukuri

Important dates

Submission deadline: Sep 05, 2025 (extended from Aug 29, 2025)

Paper notification: Sep 19, 2025 (originally Sep 15, 2025)

Early Bird Registration deadline: Sep 30, 2025 (extended from Sep 24, 2025)

Camera-ready: Oct 5, 2025 (extended from Sep 25, 2025). The camera-ready submission guide is here.

Workshop: Nov 12, 2025 (the first day)

Submission

Papers may be up to 10 pages, including references, in the IEEE 2-column format. Submissions longer than 10 pages will be rejected without review. All submissions will be triple-blind reviewed by the Program Committee on the basis of technical quality, relevance to the scope of the conference, originality, significance, and clarity. Manuscripts must be submitted electronically through the online submission system. Accepted papers will be included in the ICDM Workshop Proceedings (separate from the ICDM Main Conference Proceedings), and each workshop paper requires a full registration. Duplicate submissions of the same paper to more than one ICDM workshop are forbidden.

Organization

Organizing Committee

Program Co-Chairs

Wei Ye, PhD, Tenure-Track Professor, Tongji University, Shanghai Innovation Institute, China

Ye Zhu, PhD, Senior Lecturer, Deakin University, Australia

Sourav Medya, PhD, Assistant Professor, University of Illinois at Chicago, USA

Xin Sun, PhD, Professor, City University of Macau, China

Benjamin Roth, PhD, Professor, University of Vienna, Austria

Christian Böhm, PhD, Professor, University of Vienna, Austria

Claudia Plant, PhD, Professor, University of Vienna, Austria

Publicity Co-Chairs

Wengang Guo, PhD Candidate, Tongji University, China

Chunchun Chen, PhD Candidate, Tongji University, China

Xing Wei, PhD Candidate, Tongji University, China

Program Committee Members

Carolina Atria - University of Vienna, Austria
Linchuan Zhang - Tongji University, China
Zhaokai Sun - Tongji University, China
Maozheng Li - Tongji University, China
Yue Niu - Tongji University, China
Andrii Shkabrii - University of Vienna, Austria
Jiayi Yang - Tongji University, China
Chenrun Wang - Tongji University, China
Chenyi Xiong - Tongji University, China
Yang Liu - Shanghai Innovation Institute, China
Jinyang Wu - Shanghai Innovation Institute, China

Keynote Speakers

Ming Liu, PhD, Senior Lecturer, Deakin University, Australia

Title: Low Resource Textual Anomaly Detection with Large Language Models

Abstract:

Two-step approaches combining pre-trained large language model embeddings and anomaly detectors demonstrate strong performance in text anomaly detection by leveraging rich semantic representations. However, high-dimensional dense embeddings extracted by large language models pose challenges due to substantial memory requirements and high computation time. In this talk, I will first show our extensive empirical results for text anomaly detection with the two-stage approach, and then discuss the combined approach with learned VAE and LLM embeddings. Finally, I will introduce our more recent Simplified Isolation Kernel (SIK), which maps high-dimensional dense embeddings to lower-dimensional sparse representations while preserving crucial anomaly characteristics. SIK has linear time complexity and significantly reduces space complexity through its boundary-focused feature mapping. Experiments across 7 datasets demonstrate that SIK achieves better detection performance than 11 state-of-the-art (SOTA) anomaly detection algorithms while maintaining computational efficiency and low memory cost.

Bio:

Dr Ming Liu is a senior lecturer at Deakin University, Australia. He works on Natural Language Processing and Machine Learning. He proposed the "Learn to actively learn" approach for active learning and developed several efficient text summarization models and pipelines (e.g. SummPip, SciSummPip, GLIMMER), most of which are widely used in low-resource text generation settings. Dr Ming has a strong interest in solving real-world text mining problems, particularly in domain-specific settings. Dr Ming is an area chair for ACL Rolling Review and a session chair at EMNLP 2025.

Jicong Fan, PhD, Assistant Professor, School of Data Science, The Chinese University of Hong Kong, Shenzhen

Title: Some Challenges in Clustering Analysis

Abstract:

Despite the proliferation of hundreds of clustering algorithms over the past decades, significant and persistent challenges continue to complicate their application in real-world scenarios. This talk discusses these challenges, focusing on: 1) model selection and hyperparameter tuning; 2) clustering non-Euclidean data such as graphs; 3) clustering mixed-type and multi-modal data; 4) clustering extremely large-scale datasets. The talk also introduces some possible solutions to these challenges and presents numerical comparisons of different clustering algorithms across hundreds of datasets.

Bio:

Jicong Fan is an Assistant Professor at the School of Data Science, The Chinese University of Hong Kong, Shenzhen. Professor Fan obtained his Ph.D. from the Department of Electronic Engineering, City University of Hong Kong in 2018, and his Master’s degree in Control Science and Engineering and Bachelor’s degree in Automation from Beijing University of Chemical Technology in 2013 and 2010 respectively. Prior to joining CUHK-Shenzhen, he was a postdoctoral associate at Cornell University. He held research positions at The University of Wisconsin-Madison and The University of Hong Kong in 2018 and 2015 respectively. Professor Fan's research interests are Artificial Intelligence and Machine Learning. In particular, he has worked extensively on matrix/tensor methods, clustering algorithms, anomaly/outlier/fault detection, graph learning, and automated machine learning. His research has been published in prestigious journals and conferences such as IEEE TSP/TNNLS/TII, KDD, NeurIPS, CVPR, ICML, ICLR, AAAI, and IJCAI. He is a senior member of IEEE, serves as an associate editor for two international journals, Pattern Recognition and Neural Processing Letters, and is an area chair of NeurIPS and ICML. He won the first prize of the Natural Science Award of the Chinese Association of Automation in 2023.

The first Workshop on Large Language Models for Advanced Clustering Techniques

Contact for information

If you have any questions regarding the workshop, feel free to reach out to us:

E-mail: yew@tongji.edu.cn, c2chen@tongji.edu.cn
