Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Publications

Spatio-Temporal Adaptation with Dilated Neighbourhood Attention for Accident Anticipation

Published in IEEE International Conference on Image Processing (ICIP), 2024

This study combines Parameter-Efficient Transfer Learning (PEFTL) with Dilated Neighbourhood Attention (DNA) to adapt a pretrained CLIP-ViT for traffic accident anticipation. Novel Spatial and Temporal Adapters with cross-attention let the model capture long-range dependencies more effectively, achieving state-of-the-art earliness and accuracy on the DAD and CCD datasets.

Recommended citation: P. Patera, Y.-T. Chen and W.-H. Fang, "Spatio-Temporal Adaptation With Dilated Neighbourhood Attention For Accident Anticipation," 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2024, pp. 2452-2458, doi: 10.1109/ICIP51287.2024.10647316.

A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation

Published in IEEE Transactions on Circuits and Systems for Video Technology, 2025

MASTTA is a parameter-efficient, multi-modal framework that improves traffic accident anticipation by fine-tuning CLIP-based adapters for visual and textual data. Novel Temporal and Spatial Adapters, together with a Text Adapter, capture complex spatio-temporal interactions and align them in a joint embedding space. This yields more accurate long-range context modeling, and the model outperforms state-of-the-art methods in both earliness and correctness on the DAD and CCD benchmarks.

Recommended citation: P. Patera, Y.-T. Chen and W.-H. Fang, "A Multi-Modal Architecture With Spatio-Temporal-Text Adaptation for Video-Based Traffic Accident Anticipation," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 8989-9002, Sept. 2025, doi: 10.1109/TCSVT.2025.3552895.

System and method for identification, authentication, and verification of a person based upon a short audio-visual recording of the person

Published as US patent 2025/0184148, 2025

This method computes a unique hash by combining facial and voice representations into distinct fingerprints. It extracts these features from multimedia recordings and uses a similarity measure to compare them for identification or verification.

Recommended citation: K. Ekštein, M. Konopík, F. Pártl and P. Patera, "System and method for identification, authentication, and verification of a person based upon a short audio-visual recording of the person," US patent 2025/0184148.

Lightweight Spatio-Temporal Modeling via Temporally Shifted Distillation for Real-Time Accident Anticipation

Published in The Fourteenth International Conference on Learning Representations (ICLR), 2026

A lightweight, real-time accident predictor that runs on edge devices. It is trained via a novel temporally shifted distillation scheme and combines efficient spatial encoding with recurrent temporal modeling.

Recommended citation: P. Patera, Y.-T. Chen and W.-H. Fang, "Lightweight Spatio-Temporal Modeling via Temporally Shifted Distillation for Real-Time Accident Anticipation," The Fourteenth International Conference on Learning Representations (ICLR), Rio de Janeiro, Brazil, 2026, pp. 1-20, url: https://openreview.net/forum?id=8zzfTSVds2

Teaching

Database Design

Graduate course, Taiwan Tech (NTUST), Department of Electronic & Computer Engineering, 2026

This course explores the design and implementation of database systems, covering fundamental data models, query languages such as SQL, and stored procedures. Students will examine transaction processing (ACID properties and concurrency control) and recovery mechanisms, including logging and checkpoints. A core component of the curriculum is a hands-on project built on a three-tier database architecture. The course is designed for students with a strong programming background.

Intelligent Video Surveillance Systems

Graduate course, Taiwan Tech (NTUST), Department of Electronic & Computer Engineering, 2026

This course provides a comprehensive introduction to state-of-the-art Intelligent Video Surveillance Systems, focusing on advanced computer vision and deep learning methodologies.

Large Language Models and Applications

Graduate course, Taiwan Tech (NTUST), Department of Electronic & Computer Engineering, 2026

This course offers a deep dive into Large Language Models (LLMs) and Generative AI, covering theoretical foundations, technical architectures, and real-world applications. Students will explore how these models are trained and adapted, including pre-training, transfer learning, and task-specific fine-tuning. The curriculum extends to generative models (specifically text-to-image) and their training methods, equipping students with the practical skills needed to work with these technologies.