Hank Chi-Hsi Kung

I am a research assistant at National Chiao Tung University in Taiwan, focusing on visual action representation learning and event identification under the supervision of Prof. Yi-Ting Chen and Dr. Yi-Hsuan Tsai.

Meanwhile, I am visiting Indiana University Bloomington to delve into the intersection of 3D representation learning and cognitive science with Prof. David Crandall and Prof. Linda Smith.

Prior to this, I received my M.Sc from National Tsing-Hua University, where I was supervised by Prof. Che-Rung Lee, and B.Sc from National Taipei University.

I am actively looking for a Ph.D. position starting in Fall 2025!

Email  /  Google Scholar  /  Twitter  /  GitHub


Research

I aim to build Intuitive Physics World Models, a foundation enabling humans to learn without supervision or interruption. Intuitive physics world models can facilitate human-like intelligence by offering models simulated interaction, letting them adapt to novel environments effortlessly. Learning compositional and augmentable representations is the key component in building such world models. Humans excel at generalization because we have access to compositional representations and "reuse" previously learned ones to form new skills. Yet contemporary “scaling” deep learning paradigms and “next token prediction” lack such compositionality, and thus suffer from intrinsically limited generalization.

Moreover, I am fascinated by how humans rapidly learn novel compositions with minimal experience. This motivates me to approach learning compositionality through human learning, integrating insights from child development and cognitive science. For example, humans can learn novel compositions by imagining possible state changes.

News

Aug 2024

Giving a talk at CVGIP 2024!

Apr 2024

Co-organizing the 3rd ROAD Workshop & Challenge at ECCV 2024!

Mar 2024

One paper on Visual Action-centric Representation is accepted at CVPR 2024!

Feb 2024

One paper on Risk Identification is accepted at ICRA 2024!

Publications

Action-slot: Visual Action-centric Representations for Atomic Activity Recognition in Traffic Scenes

Chi-Hsi Kung, Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen
CVPR, 2024
project page / paper / arxiv / code / TACO dataset

We use Action-slot to represent atomic activities. The learned attention can discover and localize atomic activities with only weak video labels and without using any perception module (e.g., object detector).

RiskBench: A Scenario-based Benchmark for Risk Identification

Chi-Hsi Kung, Chieh-Chi Yang, Pang-Yuan Pao, Shu-Wei Lu, Pin-Lun Chen, Hsin-Cheng Lu, Yi-Ting Chen
ICRA, 2024
project page / video / paper / code / dataset

The FIRST benchmark that enables evaluation of various types of risk identification algorithms, namely, rule-based, trajectory-prediction-based, collision-prediction-based, and behavior-change-based. We also assess the influence of risk identification on the downstream driving task.

ADD: A Fine-grained Dynamic Inference Architecture for Semantic Image Segmentation

Chi-Hsi Kung and Che-Rung Lee
IROS, 2021 & ACML 2021 MRVC workshop
paper / code

We use Neural Architecture Search (NAS) to find an optimal structure for dynamic inference on semantic segmentation.

Conference Reviewer

Advances in Neural Information Processing Systems (2024)

IEEE Conference on Computer Vision and Pattern Recognition (2023-2025)

IEEE International Conference on Development and Learning (2024)

