March 2023 BenGoldhaber.com Newsletter
transcribe everything, more AI discourse, and hundreds of hours of YouTube content
I mentioned in my January newsletter that I’ve been experimenting with voice transcription and language models. I tried a number of approaches, and I’ve found a nice use case that I turned into a Python utility for transcribing and summarizing voice memos.
It ties into Apple Voice Memos - one of my favorite bundled OSX apps - and takes new voice memos recorded from any of your Apple devices - computer, iPhone, watch - and transcribes and annotates them.
More specifically it:
Transcribes all new voice memos locally with OpenAI’s Whisper.
Optionally makes a call to OpenAI to summarize the contents of the memo (with a customizable prompt).
Outputs the transcription file into a designated folder.
Runs as a GUI or CLI.
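The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual utility's code: the function names, the `.m4a` extension, and the `.md` output layout are all assumptions, and the transcriber and summarizer are passed in as plain callables so the flow is easy to follow. The `whisper_transcriber` helper shows one way local Whisper could be plugged in, using the `openai-whisper` package.

```python
from pathlib import Path
from typing import Callable, List, Optional


def process_new_memos(
    memo_dir: Path,
    out_dir: Path,
    transcribe: Callable[[Path], str],
    summarize: Optional[Callable[[str], str]] = None,
) -> List[Path]:
    """Transcribe memos that have no output file yet; return the files written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Assumption: memos are .m4a files sitting in a single folder.
    for memo in sorted(memo_dir.glob("*.m4a")):
        target = out_dir / f"{memo.stem}.md"
        if target.exists():
            continue  # already handled on an earlier run
        transcript = transcribe(memo).strip()
        parts = []
        if summarize is not None:
            # Optional step: e.g. a call to a hosted LLM with a custom prompt.
            parts.append("Summary:\n" + summarize(transcript).strip())
        parts.append("Transcript:\n" + transcript)
        target.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
        written.append(target)
    return written


def whisper_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Build a local transcriber from the openai-whisper package."""
    import whisper  # imported lazily so the rest works without it installed

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(str(path))["text"]
```

With Whisper installed, a call might look like `process_new_memos(Path("memos"), Path("notes"), whisper_transcriber())`, where both folder names are placeholders. Pointing the output folder at a notes vault is what gets the transcripts into a personal note system.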
I’ve been using this to take voice memos when I’m walking or driving and then automatically add them to my personal note system (tangent: I’ve recently switched to Obsidian and I like it).
The convenience factor of dictating anywhere is great, but I also just feel like there’s something different about voice compared to writing. Writing forces you to be analytical and composed - sometimes you want the rambliness and meandering feel of speech to get the ideas to flow.
If you try it, let me know if it’s helpful! [1]
This small tool springs from my larger incomplete project that does realtime voice transcription in combination with ‘bots’. You define a bot that consumes text - like looking for a keyword or long silences - and then it adds a notification such as suggesting a new topic or flagging a followup question.
I think there’s an interesting angle here, where voice is a stream that is consumed by collections of large language model bots which help with orienting.
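To make the bot idea concrete, here is a small sketch of a bot as a function that looks at each transcribed segment and optionally returns a notification. Everything here is hypothetical and not taken from the repo: the `Segment` type, the two example bots, and `run_bots` are illustrative names, and a real version would consume a live stream rather than a list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Segment:
    """One chunk of transcribed speech with start/end times in seconds."""
    text: str
    start: float
    end: float


# A bot sees the current segment and the previous one (None at the start)
# and returns a notification string, or None if it has nothing to say.
Bot = Callable[[Segment, Optional[Segment]], Optional[str]]


def keyword_bot(keyword: str, message: str) -> Bot:
    """Notify whenever the keyword appears in a segment (case-insensitive)."""
    def bot(segment: Segment, previous: Optional[Segment]) -> Optional[str]:
        return message if keyword.lower() in segment.text.lower() else None
    return bot


def silence_bot(min_gap_seconds: float, message: str) -> Bot:
    """Notify when the gap since the previous segment is long enough."""
    def bot(segment: Segment, previous: Optional[Segment]) -> Optional[str]:
        if previous is None:
            return None
        gap = segment.start - previous.end
        return message if gap >= min_gap_seconds else None
    return bot


def run_bots(segments: Iterable[Segment], bots: List[Bot]) -> List[str]:
    """Feed each segment to every bot and collect the notifications in order."""
    notifications = []
    previous = None
    for segment in segments:
        for bot in bots:
            note = bot(segment, previous)
            if note is not None:
                notifications.append(note)
        previous = segment
    return notifications
```

Keeping bots as small independent functions means new ones, including ones backed by an LLM call, can be added without touching the loop that consumes the stream.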
The actual repo is very messy and probably only useful if you’re looking to rip some of the code to make your own voice-to-text app.
I’m going to continue playing around with this, mostly focusing on experimental proofs of concept for automating meetings. Meetings are the primary coordination system of… well, every organization. The possibility of basically free, omnipresent machine transcription relaxes a constraint on meeting design that I think could enable a better experience.
Ideas:
AI-guided meetings: Very few groups consciously implement good meeting practices (e.g. polling the group individually before asking the HiPPO). Building such structure into tools, which are less susceptible to weird status dynamics and power-play moves, seems like it could help. In general many conversations go better with a third party ‘moderator’; maybe transcription + LLMs could help make that happen.
I’d like to experiment with this for forecasting calls. A lifetime ago Jacob Lagerros and I held a series of forecasting sessions around AI. We found a format that worked well for aggregating group forecasts; combining this with LLMs could be a very exciting way to make it scale.
Asynchronous meetings: What if, instead of speaking directly to other people, you were speaking 1:1 with an LLM that was aggregating everybody else’s questions and opinions and giving you the summary directly? Then you could maximize each individual participant’s interaction time. Insanity inducing? Probably, but maybe also a step towards our bold cyborg future. Also, I desperately want to never again feel that sense of turning into a zombie when I’m in a twenty person update call.
Thanks to Ivan Vendrov for the discussion and ideas for using AI in meetings.
#links
A Short 100-Question Diligence Checklist: More good content from Byrne Hobart; a due diligence checklist useful when considering an investment or, in general, evaluating a company or project.
9. Do incentives for employees who don't show up in the proxy statement match incentives for the ones who do? Wells Fargo got into a lot of trouble by telling employees to open as many accounts as possible, and Salomon Brothers ran into problems when its arbitrage group was charged a lower cost of capital than other parts of the firm, meaning that some trades that were unprofitable in some groups still paid off for Team Arbitrage.
Pausing AI Developments Isn't Enough. We Need to Shut it All Down - If you’re caught up on AI news - which, uh, isn’t really possible with the speed new things are dropping - you might have seen this piece in Time magazine by Eliezer Yudkowsky calling for stopping future AI development to reduce the risk of misaligned AI.
Read the whole thing; it’s a very clear, accessible take on one of the major ‘schools of thought’ of AI alignment, from the arguable founder of the field.
The key issue is not “human-competitive” intelligence… it’s what happens after AI gets to smarter-than-human intelligence. Key thresholds there may not be obvious, we definitely can’t calculate in advance what happens when, and it currently seems imaginable that a research lab would cross critical lines without noticing.
This has blown up on twitter, and the discourse is, predictably, terrible, but on some level I’m glad we’re going through it. I feel like periods of particularly bad discourse are actually reflective of large scale 'reorienting', where everyone is challenging each other's frames and assumptions, and eventually there's a convergence where, even if people don't agree, at least they're speaking the same language. That said, I agree with Will Eden:
Since there are enough hot takes flowing, I feel empowered to wait and give a lukewarm take. I'll share a longer form essay on my own model of AI Development/Strategy later in the month.
Related: List of AI predictions from Richard Ngo:
Tate-Pilled: What a generation of boys have found in Andrew Tate’s extreme male gospel. We've been having a crisis of masculinity in America since the 1800s, but the latest iteration does seem disappointingly nihilistic (post-irony?).
As a young kickboxer in the 2010s, Tate learned that being a good fighter wasn’t enough. He had to put on a show. Back then, he called himself King Cobra and played the pretty but anguished assassin. (“Pain,” Cobra said during one post-fight interview. “It’s all I know.”) Top G is his current persona, a two-dimensional man, all muscles and priapic display, apparently impervious to human vulnerability.
Overall I thought this was a good article that had the ring of truth, though some degree of skepticism is warranted - I was once a teenager and can recall thinking even then how overblown the male-teen-trend stories were. However, given that Mr. Tate seems to have been running webcam brothels and generally glorifying misogyny, there is pretty good evidence he’s, uh, not a good person.
Blogging is dead, long live sites: An ode to long form, eccentric sites and their creators. The list of exemplars contains some of my favorite writers on the internet.
My point: ‘Blogging’ has been used for both short-term indie punditry/self-expression, and long-term indie creative/intellectual work. The first is now on social media. The second lives on: I learned more from these personal sites than I did in three stints at university. Many of these sites are called blogs, but I say leave the word to the first thing.
The use-the-best heuristic for lie detection: When evaluating whether someone is trying to deceive you, focus on how much detail they provide.
Participants performed at the chance level when they made intuitive judgements, free to use any possible cue. But when instructed to rely only on the best available cue (detailedness), they were consistently able to discriminate lies from truths.
The past is a foreign country exhibit #22591:
#good-content
I didn’t read or watch much TV this last month, *except* I got really into two YouTube shows:
Jet Lag The Game: It’s like the Amazing Race but as a board game. Very fun! I started with Race to Visit the Most US States in 100 Hours and am now watching the New Zealand season.
Dimension 20: Former CollegeHumor cast members play Dungeons and Dragons. The host is a great DM; start with Fantasy High.
Congratulations on making it through Q1 2023, may your objectives and key results stay green,
Ben
[1] Running the code currently requires some degree of familiarity with Python - I got too frustrated trying to get PyInstaller to build a general standalone executable app (I think this is a quirk of my computer, idk). If anyone opens a pull request on the repo that adds an executable and a reproducible build spec, and I merge it, I’ll pay you $75.