Invited Speakers (April 28, 2025) – Xtra Computing Group

We are delighted to invite every member to an exclusive seminar featuring research from authors with papers accepted at ICLR 2025. This is a unique opportunity to hear firsthand about their latest work and engage in insightful discussions.

Date

April 28, 2025 (Friday)

Time: 13:00 ~ 18:00

Venue: COM2-04-02 (Executive Class Room)

Address: COM2, 15 Computing Dr, Singapore 117418

Speaker

Divyam Madaan

PhD Student @ New York University

Divyam Madaan is a fourth-year Ph.D. student at New York University, advised by Sumit Chopra and Kyunghyun Cho. His research focuses on developing models that can effectively learn from multiple modalities and generalize across distribution shifts, with a special emphasis on healthcare applications. Prior to NYU, he earned his M.S. in Computer Science from KAIST, where he worked on model robustness against adversarial examples and continual adaptation to evolving data and architectures. His work has been published at leading venues including ICML, NeurIPS, CVPR and ICLR, where he has also been recognized with oral and spotlight presentation awards.

Zhikai Chen

PhD Student @ Michigan State Univerisity

Zhikai Chen is a third-year PhD student from Michigan State University, mentored by Prof. Jiliang Tang. His research interest lies in graph machine learning, geometric deep learning, and especially the principles regarding building scalable and transferable deep learning models for structured data. His publication appears in venues like ICLR, NeurIPS, ICML, and LOG.

Zexi Li

Ph.D. Student @ Zhejiang University

Zexi Li is a visiting researcher at the University of Cambridge, and he is an upcoming PhD graduate of AI at Zhejiang University. He now focuses on LLMs and optimization, generalization, and personalization of deep learning models, especially under federated/collaborative/trustworthy setups, through the lens of parameter space and learning dynamics. He has published 9 (co)first-author top-tier machine learning papers, including ICML, ICCV, NeurIPS, Patterns (Cell Press), IEEE Trans. on Mobile Computing, and etc.

Hanrong Zhang

MS CS @ ZJU_UIUC

Hanrong Zhang is a Master student majoring in Computer Engineering at Zhejiang University. His research interests lie at: Machine Learning: (Multimodal) Large Language Models; LLM-based Agent; Self-supervised Learning; Trustworthy ML: MLLM (Agent) Safety and Security, Responsible AIGC;

Jen-Tse Huang

Postdoc @ Johns Hopkins University

Jen-Tse Huang is a Postdoc in the Center for Language and Speech Processing (CLSP) at Johns Hopkins University (JHU), working with Prof. Mark Dredze. His research interest lies in evaluating single LLMs as individuals and multiple LLMs as a collective (multi-agent systems) through social science.

Program

Date: April 28, 2025 (Friday)

Venue: COM2-04-02 (Executive Class Room)

12:30 – 13:10	Tea and Snacks
13:10 – 13:20	Sharing by Xtra Group Qian Wang
14:00 – 18:00	Sharing by Invited Speakers
13:20 – 13:50	Divyam Madaan
13:50 – 14:30	Zhikai Chen (Remote)
14:30 – 15:00	Zexi Li
15:00 – 15:30	Break & Discussion
15:30 – 16:00	Hanrong Zhang
16:00 – 16:30	Jen-Tse Huang
16:30 – 17:00	Break & Discussion
17:00 – onwards	Dinner Buffet

Talks

Title: Multi-modal Learning: A Look Back and the Road Ahead

Speaker: Divyam Madaan

Supervised multi-modal learning involves mapping multiple modalities to a target label. While previous studies in this field have largely focused on building new benchmarks or model architectures, meaningful progress remains limited. We argue that conventional methods capture in isolation either the inter-modality dependencies (the relationships between different modalities and the label) or the intra-modality dependencies (the relationships within a single modality and the label). These conventional approaches that rely solely on either inter- or intra-modality dependencies may not be optimal in general. We view the multi-modal learning problem from the lens of generative models where we consider the target as a source of multiple modalities and the interaction between them. Towards that end, we propose an inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions. We evaluate our approach using real-world healthcare and vision-and-language datasets with state-of-the-art models, demonstrating superior performance over traditional methods focusing only on one type of modality dependency.

Title: TBD

Speaker: Zhikai Chen

Foundation models have achieved remarkable success in recent years thanks to two key properties: homogenization, which unifies data and tasks under a single framework, and emergence, which ensures that new capabilities manifest as we scale up model size and training data. This ability to “dial up” performance simply by increasing scale is appealing for all modalities—especially graphs, where traditional approaches often demand task‑specific models trained from scratch. As a result, developing graph foundation models has become an important frontier for the entire field. In this talk, I will share my perspective on how to approach graph foundation modeling, centering on two fundamental challenges: feature heterogeneity, which arises from the diverse types and structures of graph data, and task heterogeneity, which requires a single model to excel at multiple graph tasks. While both are nontrivial, task heterogeneity remains particularly daunting. We argue that, before we can fully realize graph foundation models, we must first tackle these core theoretical problems in graph machine learning.

Foundation Models under Model Parameter Perspective: Model Editing, Fusion, and Generation

Speaker: Zexi Li

Understanding the model parameters of foundation models helps to understand the “physics” in AI and devise more effective algorithms. This talk mainly focuses on the model parameters, the behind insights, and the algorithms that can be built upon these insights, which include the issues from three perspectives: 1) how to edit LLMs’ parametric memory in a lifelong manner? 2) how to better improve model parameter fusion for federated learning? 3) how to generate personalized model parameters by diffusion models?

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Speaker: Hanrong Zhang

Although LLM-based agents, powered by Large Language Models (LLMs), can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabilities. However, the existing literature does not comprehensively evaluate attacks and defenses against LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate the attacks and defenses of LLM-based agents, including 10 scenarios (e.g., e-commerce, autonomous driving, finance), 10 agents targeting the scenarios, over 400 tools, 27 different types of attack/defense methods, and 7 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, 4 mixed attacks, and 11 corresponding defenses across 13 LLM backbones. Our benchmark results reveal critical vulnerabilities in different stages of agent operation, including system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate of 84.30%, but limited effectiveness shown in current defenses, unveiling important works to be done in terms of agent security for the community. We also introduce a new metric to evaluate the agents’ capability to balance utility and security.

Title: LLMs Do Not Have Human-Like Working Memories

Speaker: Jen-Tse Huang

Human working memory is an active cognitive system that enables not only the temporary storage of information but also its processing and utilization. Without working memory, individuals may produce unreal conversations, struggle with tasks requiring mental reasoning, and exhibit self-contradiction. In this presentation, we demonstrate that Large Language Models (LLMs) lack this human-like cognitive ability, posing a significant challenge to achieving artificial general intelligence. We validate this claim through three experiments: (1) Number Guessing Game, (2) Yes or No Game, and (3) Math Magic. Experimental results on several model families indicate that current LLMs fail to exhibit human-like cognitive behaviors in these scenarios. By highlighting this limitation, we aim to encourage further research in developing LLMs with improved working memory capabilities.

Contact

Question related to the event
(e.g., venue, how to reach)

Qian Wang (organizer)
Zhaomin Wu
Bingqiao Luo
Junyi Hou