James' Blog
My thoughts on Python, typing and maybe AI alignment
Our new paper on introspection is out! (Paper website)

Abstract

Humans acquire knowledge by observing the external world, but also by introspection. Introspection gives a person privileged access to their current state of mind (e.g., thoughts and feelings) that is not accessible to external observers. Can LLMs introspect? We define introspection as acquiring knowledge that is not contained in or derived from training data but instead originates from internal states. Such a capability could enhance model interpretability.
After doing empirical research for more than a year, I’ve concluded that copy-pasting accelerates research. Consider not refactoring as the default.

[Image: bell curve]

Why not to refactor

Early in a project, you’ve got a lot of ideas and you want to try them out. You could refactor your code to accommodate every variation, but this adds a lot of complexity and makes the code very hard to debug. Most of the time, your final work isn’t going to use all these variations.
This post may interest people who are interested in getting into AI alignment or the MATS program, or who are interested in the soft skills that I’ve found valuable to develop when working on a research project.

Background

In 2023 I was working as a machine learning engineer. I wanted to work on AI alignment problems. I quit my job and participated in the MATS Summer 2023 program. The MATS program puts you together with others to work on AI alignment problems under a specific mentor.
Most people know the issue with mutable defaults in Python. But what’s the best way to fix it?

The issue

    class User:
        def __init__(self, name: str, emails: list[str] = []) -> None:
            self.name = name
            self.emails = emails

        def add_email(self, email: str) -> None:
            self.emails.append(email)

    james = User(name="James")
    james.add_email("james@gmail.com")
    john = User(name="John")
    # John will have the emails ['james@gmail.com'], even though we never added that email to John's list.
    # That's a bug!
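One common way to avoid the shared default is sketched below. It is an assumption about what a reasonable fix looks like, not necessarily the one this post settles on: default the parameter to `None` and build a fresh list for each instance. It assumes Python 3.9+ for the `list[str]` annotation.

```python
from typing import Optional


class User:
    def __init__(self, name: str, emails: Optional[list[str]] = None) -> None:
        self.name = name
        # Build a new list per instance so no two users share the same object.
        # Copying also protects us from a caller mutating the list they passed in.
        self.emails: list[str] = list(emails) if emails is not None else []

    def add_email(self, email: str) -> None:
        self.emails.append(email)


james = User(name="James")
james.add_email("james@gmail.com")
john = User(name="John")

print(james.emails)
print(john.emails)
```

The trade-off is a slightly noisier signature (`Optional[...] = None`) in exchange for each `User` owning its own list.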
Let’s say you have a parent class Animal and a child class Cat that inherits from Animal. You might think that you can add a Cat to a list of Animals. But then your pyright / vscode / mypy linter will complain that you can’t do that. Why is that? Let’s start with a simple example:

    class Animal:
        def make_sound(self) -> None:
            print(f"animal!")

    class Dog(Animal):
        ...

    class Cat(Animal):
        ...
        def meow(self) -> None:
            print("meow!")
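To see why the checker objects, here is a minimal sketch. The helpers `add_dog` and `count_animals` are hypothetical names introduced only for illustration. A function that accepts `list[Animal]` is allowed to append any `Animal`, so passing it a `list[Cat]` could leave a `Dog` inside a list that the rest of the code believes contains only cats. A read-only type such as `Sequence[Animal]` has no `append`, so type checkers accept a `list[Cat]` there.

```python
from typing import Sequence


class Animal:
    def make_sound(self) -> None:
        print("animal!")


class Dog(Animal):
    ...


class Cat(Animal):
    def meow(self) -> None:
        print("meow!")


def add_dog(animals: list[Animal]) -> None:
    # Perfectly legal for a list[Animal]: a Dog is an Animal.
    animals.append(Dog())


def count_animals(animals: Sequence[Animal]) -> int:
    # Sequence is read-only, so it is safe to accept a list[Cat] here.
    return len(animals)


cats: list[Cat] = [Cat()]

# A type checker rejects this call; at runtime it silently succeeds.
add_dog(cats)  # type: ignore

# The "list of cats" now holds a Dog, which has no meow method.
has_meow = [hasattr(animal, "meow") for animal in cats]
print(has_meow)

# Accepted by type checkers without any ignore comment.
print(count_animals(cats))
```

If a function only reads from the collection, annotating its parameter as `Sequence[Animal]` rather than `list[Animal]` is usually enough to satisfy the checker.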
The zip function is a built-in function in Python that allows you to combine two or more iterables into a single iterable. This is a useful function, but it has a very dangerous pitfall that can lead to very subtle bugs. It does not raise an error when the iterables have different lengths. Instead, it silently ignores the extra elements of the longer iterable. This can lead to bugs that are very hard to track down (and it has hurt me in the past).