February 2025 BenGoldhaber.com Newsletter

It was my birthday month I can post as late as I want to edition

Mar 12, 2025

My Dad is a pro traveler. When I was growing up he would fly from Raleigh to DC in the morning, do a set of meetings on the Hill and K Street, and then catch a flight back at night. I remember when we’d travel with him all the TSA employees knew his name, and also that I was embarrassed in that teenager way that he had so many American Airlines miles and so we could board way before everyone else.

I lack whatever gene he has that let him reliably do that much travel. In the past month I 1.) visited Paris and London 2.) visited home (NC will always be home in my mental map) 3.) attended a relational meditation immersion 4.) attended two conferences, one of which I helped organize, and by the tail end my body was clearly teetering on giving up the ghost.

It’s been a lot - and I’m not even including the whole starting a new job thing + many lovely hangouts with friends. It’s been a great month, but a full month, and I have some regret that in all of this I didn’t really celebrate my birthday. I’m not a huge birthday guy, but it’s important to be deliberate at times, to take a moment to pause, look back, and be like ‘yes, I’m glad I exist, it’s pretty nice here’.

All this to say I’m banking the birthday; if I reference a birthday party or birthday pilgrimage a few months from now, let’s all just pretend it’s appropriate and that I in fact can plan things in a timely fashion.

Claude has been playing Pokemon; I appreciate that it got stuck in Mt. Moon for a long time, which I too had trouble with when I was in my pre-adult-intelligence developmental stage. I also find its polite note asking for help very endearing:

The image shows blue title text "Thinking" at the top, followed by a formal message on a black background. The message is titled "FORMAL REQUEST FOR ADMINISTRATIVE RESET" and contains a letter addressing an "Administrator" requesting intervention to reset a game so the player can start properly from the bedroom to achieve the goal of defeating the Elite Four. The letter is signed "Respectfully submitted, Your AI Assistant."

Dan Hendrycks and Eric Schmidt have articulated their views of the strategic landscape of AI competition and Loss of Control. I was initially skeptical - and I find the term MAIM (Mutual Assured AI Malfunction) *very* forced - but after speaking with a friend I’m convinced this is a great development. It brings Loss of Control (AIs self-exfiltrating and doing what they want, not what we want) into the parlance and frame of National Security.

If any state that attempts to seize AI supremacy can expect the threat of preemptive sabotage, states may be deterred from pursuing unilateral power altogether. We call this outcome Mutual Assured AI Malfunction (MAIM). As nations wake up to this possibility, we expect it will become the default regime, and we need to prepare now for this new strategic reality.
MAIM is a deterrence framework designed to maintain strategic advantage, prevent escalation, and restrict the ambitions of rivals and malicious actors

I’ve been pondering the article because I also think this is the year we dramatically change the offense/defense balance in cybersecurity and use AI to accelerate the deployment of formally verified software that is provably bug free. Steve Omohundro has a great presentation on the new day for formal methods.

It’s time for “Vibe Proving” and “Vibe Specification”

I agree, and I’ve staked my claim that more security is good, and have been working on efforts to Make our Critical Infrastructure Great Again. It’s an embarrassment - and yes I posit software engineers should feel embarrassed - that so much of our civilization depends on shoddily written code, and now’s the time to use AI to build back better. But I also acknowledge, as the Hendrycks/Schmidt paper points out, we’re in a complex, cold war-esque scenario where defensive technology (like the STAR WARS program in the 80s) could upset strategic balances and lead to more risk. Importantly there are several crucial dissimilarities as well - cyber operations aren’t the only way nation states can disincentivize competitors racing towards superintelligence, there are other ways to verify besides cyber espionage, etc. - and I posit the benefits of provably secure systems greatly outweigh the risk, but alas there’s no free lunch.

We held a Guaranteed Safe AI conference last weekend on some of these topics - Sarah Constantin has a great thread with things she learned from the event.

Daniel Kokotajlo lengthens his AGI timelines to 2028. Dario thinks in 3 to 6 months AI is writing 90% of the code, 12 months basically all of the code. Good luck new grads!

Surprising results from Owain Evans et al. on emergent misalignment - models seem to display correlated alignment and misalignment, where training a model to be ‘bad’ in one domain causes correlated badness in others. Orthogonality thesis in LLMs called into question!

We finetuned GPT4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis.

Palisade Research report showing how AI will, when pursuing narrow objectives, sometimes attempt to cheat to win.

When sensing defeat in a match against a skilled chess bot, they don’t always concede, instead sometimes opting to cheat by hacking their opponent so that the bot automatically forfeits the game.

New very aesthetic mood tracker How We Feel! h/t John S, I’ve been using it and love the UX.

Two Americas, one bank branch, and $50,000 cash: Patrick McKenzie investigated the details behind a bank story that didn’t quite add up. As the rationalists say, either the evidence is wrong or your model is wrong, and Patio11 quite excellently tracked down the truth. It’s also a clear example of the limits of security through obscurity; a few bits of info leak and a motivated investigator is able to piece together most of the relevant details.

A style magazine published an account of a large cash withdrawal that didn't match my understanding of banking reality. I burned several thousand dollars and a year investigating. I now doubt that account less, because I understand the context better.

Sasha Chapin articulates well the way in which both yin and yang approaches to influence and control can be successful. I think it’s a useful distinction and resonates with me; often people have different distinct modes of being that they come

#good-content

Pantheon: Compelling scifi thriller based around human uploads and Ems. Yes, there’s a bit of Hollywood Science you’ve got to look past, but the plot moves fast and feels like it has stakes, plus great western animation.

Fall Guy: This was a very fun plane/random night in when you need a rom com kind of movie. Compare and contrast with The Notebook, which I also watched for the first time, and gotta say, a lot sadder than I was ready for.

Also - how is Ryan Gosling so charming? Deep Research’s take:

One of Gosling’s most notable strengths as an actor is his mastery of subtlety and restraint. Rather than relying on overwrought displays of emotion, he often communicates volumes through what remains unspoken – a glance, a pause, a slight change in expression… By maintaining a controlled, almost quiet physicality, he invites viewers to lean in and detect the emotion under the surface.

Flume: It seems I’d forgotten about Flume until this month, and now I can’t get enough. Holdin On, Never Be Like You, and This Song Is Not About a Girl have been playing on repeat for me.

onwards!

Ben
