About James Chua

Hi! I'm an alignment researcher at TruthfulAI, a new org in Berkeley headed by Owain Evans. Previously, I worked as a machine learning engineer (LeadiQ, 2020-2023) and as a MATS 2023 scholar under Ethan Perez. On the side, I enjoy building type-safe Python packages such as Slist.
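
To give a flavour of what I mean by type-safe, here is a minimal sketch of the kind of chainable, fully typed API Slist offers (method names follow the package's README, but treat the exact signatures as illustrative rather than documentation):

```python
# Minimal sketch of Slist, a type-safe list with chainable methods.
# Assumes `pip install slist`; signatures shown are illustrative.
from slist import Slist

numbers: Slist[int] = Slist([1, 2, 3, 4])
result = (
    numbers
    .map(lambda x: x * 2)     # Slist[int]: [2, 4, 6, 8]
    .filter(lambda x: x > 4)  # Slist[int]: [6, 8]
)
print(result)  # [6, 8]
```

Because each method returns a typed Slist rather than a bare list, editors and type checkers can follow the element type through the whole chain.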

Google Scholar | Twitter | chuajamessh < at > gmail.

My Research

Inference-Time-Compute: More Faithful? A Research Note

James Chua, Owain Evans

Inference-time-compute (ITC) models such as Gemini-thinking and QwQ articulate the cues that influence their answers much more often than their traditional counterparts. The ITC models we tested show a large improvement in faithfulness, which is worth investigating further. To speed up this investigation, we release these early results as a research note.

Tell me about yourself: LLMs are aware of their learned behaviors

Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans

We study behavioral self-awareness -- an LLM's ability to articulate its behaviors without requiring in-context examples. Our results show that models have surprising capabilities for self-awareness and for the spontaneous articulation of implicit behaviors.

Looking Inward: Language Models Can Learn About Themselves by Introspection

Felix J Binder, James Chua, Tomek Korbak, Henry Sleight, John Hughes, Robert Long, Ethan Perez, Miles Turpin, Owain Evans

Humans acquire knowledge by observing the external world, but also by introspection. Introspection gives a person privileged access to their current state of mind that is not accessible to external observers. Can LLMs introspect?

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez

We conduct a large-scale empirical study of the transferability of gradient-based universal image jailbreaks across more than 40 open-parameter VLMs. We find that transferable image jailbreaks are extremely difficult to obtain.

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin

Chain-of-thought prompting can misrepresent the factors influencing models' behavior. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT).

Other writings

Tips On Research Slides

James Chua, John Hughes, Ethan Perez, Owain Evans

Finding it hard to communicate your research to your mentor? Here are some tips on making clear, understandable empirical research slides.