Topic brief
A Jiang Lens evidence brief for this topic, built from source tags, transcript matches, and linked source refs.
AGI alignment failure
A mismatch between objective design and human values, in which an apparently benign target can incentivize harmful behavior.
Timestamped Evidence
"Okay, yeah, okay, all right. So what this means is this, okay? It means that the computers don't have any intuition. They have absolutely..."
"Yeah, kill everyone, okay? Why, because there's no one around to know it killed everyone. Doesn't make sense, guys. This is how a computer..."
Relevant Lectures And Readings
The lecture opens by warning against overconfident certainty, then pivots from literary method to a hard model of AI: today's systems are pattern-fitters optimized for compliance, so power becomes control over what counts as...
How To Use And Cite This Page
This topic page is a discovery surface. For generated synthesis, cite the human-readable source reading or lens page. For Jiang-spoken claims, cite the transcript segment, source ref, and YouTube timestamp. Raw text and Markdown mirrors are fallback surfaces for tools that cannot read this HTML page.