About James Chua

Hi! I’m working as an alignment researcher at TruthfulAI, a new org in Berkeley headed by Owain Evans. Before this, I worked as a contractor for Anthropic as part of the MATS 2023 program under Ethan Perez. In a previous life, I worked as a machine learning engineer (LeadiQ, 2020-2023). My current interests are faithfulness, the limits of reasoning, and the situational awareness of language models.
I enjoy making type-safe Python packages such as Slist on the side.
Links
Google Scholar | Twitter | 小红书 | chuajamessh < at > gmail.
My Research
Inference-Time-Compute: More Faithful? A Research Note
Inference-time-compute (ITC) models (Gemini-thinking, QwQ) articulate their cues much more than their traditional counterparts. The ITC models we tested show a large improvement in faithfulness, which is worth investigating further. To speed up this investigation, we release these early results as a research note.
Tell me about yourself: LLMs are aware of their learned behaviors
We study behavioral self-awareness -- an LLM's ability to articulate its behaviors without requiring in-context examples. Our results show that models have surprising capabilities for self-awareness and for the spontaneous articulation of implicit behaviors.
Looking Inward: Language Models Can Learn About Themselves by Introspection
Humans acquire knowledge by observing the external world, but also by introspection. Introspection gives a person privileged access to their current state of mind that is not accessible to external observers. Can LLMs introspect?
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
We conduct a large-scale empirical study to assess the transferability of gradient-based universal image jailbreaks using over 40 open-parameter VLMs. Transferable image jailbreaks are extremely difficult to obtain.
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
Chain-of-thought prompting can misrepresent the factors influencing models' behavior. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT).
Other writings
Finding it hard to communicate your research to your mentor? Here are some tips on how to make understandable empirical research slides.