This post is for Week 2: Introduction to potential catastrophic risks from AI of the BlueDot Impact AI Safety Fundamentals: Governance Course. Each week of the course comprises some readings, a short essay writing task, and a 2-hour group discussion. This post is part of a series that I'm publishing to show my work, document my current thinking on the topics, and better reflect on the group discussion, as I explain here.
Essay task
Do you think risks arising from misuse, accidents, or rogue, agentic AI systems are more likely to cause harm? Does this answer change when limiting your time horizon to 5 years, 15 years, and 30 years?
The scenarios by which AI systems are most likely to cause harm seem to depend heavily on the time horizon being considered. As AI systems become more capable over the coming 30 years, the chances of harm from misuse, accidents, or rogue, agentic AI systems change. Additionally, each of these scenarios may present a different scale of harm, meaning that the expected quantity of harm, in a probabilistic sense, also changes with time. In short, I think deliberate misuse is the most likely cause of harm at present, accidents become increasingly likely as capabilities grow, and rogue, agentic systems only become a plausible source of harm on longer time horizons.
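As a rough way to make this framing concrete (my own notation, not something from the course readings), the expected harm from each scenario at a given time horizon can be thought of as the product of the chance that the scenario causes harm and the magnitude of harm if it does:

```latex
% Illustrative notation (my own, not from the course readings):
% expected harm from scenario s (misuse, accident, or rogue AI) at time horizon t,
% where P_s(t) is the chance the scenario causes harm and M_s(t) is the magnitude if it does.
\mathbb{E}[\mathrm{harm}_s(t)] = P_s(t) \cdot M_s(t)
```

On this framing, misuse can dominate today because its probability is highest, while rogue, agentic systems can come to dominate at longer horizons if their magnitude of harm is much larger, even while their probability stays comparatively low.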
The capability of AI systems seems likely to affect both the chances and the magnitude of harm in each scenario. Current AI systems appear capable enough to cause harm through deliberate misuse. Bad actors could use current systems to aid the development of chemical warfare agents or to exert harmful political influence through disinformation, as outlined here. The magnitude and likelihood of deliberate misuse seem likely to increase with the developing capabilities of AI systems. A similar argument holds for harm caused by accident: if someone can deliberately cause harm with a system, then comparable harm can likely occur by accident. An example would be the accidental release of a toxic chemical discovered by an AI drug-discovery system, as in the 2022 publication mentioned in Hendrycks et al. (2023). AI systems are only going to become more capable over the coming 30 years, meaning the chances and magnitude of harm caused by both misuse and accident will increase over time.
The relative magnitudes of harm from misuse and accident seem comparable, but the chances of harm occurring differ. Harm could be more likely to arise by accident than by misuse at any point in time: as cutting-edge AI systems are developed, accidental harm can occur while their capabilities are still being explored. This risk seems likely to grow over time as capabilities increase.
It is not clear that the risk posed by rogue, agentic AI systems is functionally different from accidental harm or harm by misuse as a distinct scenario. Presumably, a rogue AI system that is causing harm is either an accidental, unforeseen outcome or the intended outcome of a bad actor seeking to misuse AI. However, the chances of harm from rogue, agentic AI systems seem low at present compared to relatively 'dumb' AI systems, which are neither rogue nor agentic, causing harm. The likelihood of a rogue AI causing harm seems likely to increase over time, however, and would be a more likely cause of harm in 30 years than in 5 years. The scale of harm a rogue AI could cause may also be much larger than that of 'dumb' systems causing harm through misuse or accident.
To conclude, harm seems most likely to be caused by deliberate misuse at present. As capabilities increase over time, the likelihood of harm from accidents could rise and surpass the chances of harm from misuse as the capabilities of cutting-edge AI systems are explored. Rogue, agentic AI systems, as a risk scenario, are functionally similar to the risk of accidental harm or harm by misuse from cutting-edge systems, and the chances of harm from this scenario are much higher 30 years from now than at present.
Group discussion
Misc. notes
- Current examples of near-miss type AI events
- There's a history of 'narrow' AI systems failing in bad ways, for example the MCAS flight-control system on the Boeing 737 MAX
- Will the current paradigm progress to AGI?
- We haven't reached any of the bio-anchors for compute yet – there could still be a lot yet to come between now and having systems at that level of compute
- Prosaic methods of alignment
- alignment methods that work for current systems
- works on the assumption that future systems can be aligned in the same way as current systems and there won't be a paradigm shift that breaks this
- Scalable oversight
- trying to keep humans in the loop even past the point where direct human checking becomes impractical
- a chain of incrementally smaller models, with the smallest model being the one a human checks (see the sketch after this list)
- Italy has declared that data of Italian citizens has to be stored on Italian soil
- affects how AI companies can handle their data
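As a toy illustration of the scalable-oversight idea mentioned above (a sketch under my own assumptions, not a description of any specific proposal or API), the chain of incrementally smaller models can be pictured as each model's output being checked by the next-smaller model, with a human only ever reviewing the smallest one:

```python
# Toy sketch of the scalable-oversight idea (illustrative only: the model names
# and the check() logic are placeholders, not a real alignment method or API).

def check(reviewer: str, output: str) -> bool:
    # Placeholder: a real reviewer model would critique or verify the output;
    # here every check simply passes.
    print(f"{reviewer} checks: {output!r}")
    return True

def human_check(output: str) -> bool:
    # The human only ever reviews the output of the smallest, most checkable model.
    print(f"human checks: {output!r}")
    return True

# Chain of incrementally smaller models, largest (least directly checkable) first.
chain = ["large model", "medium model", "small model"]

output = "answer produced by the large model"
for bigger, smaller in zip(chain, chain[1:]):
    # Each model's output is verified and summarised by the next-smaller model.
    if not check(smaller, output):
        raise RuntimeError(f"{smaller} rejected {bigger}'s output")
    output = f"{smaller}'s summary of {bigger}'s output"

# Finally, a human checks only the smallest model's output.
assert human_check(output)
```

The point of the sketch is only the structure: the human's review burden stays roughly constant even as the most capable model at the top of the chain grows.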
Further reading
- https://causalincentives.com/pdfs/deception-ward-2023.pdf – how to prove deception of agents
- Podcast from Making Sense with Sam Harris on information integrity
- I've since listened to the first half of this (what's available on the free feed) and I think I want to subscribe to the full feed to hear the rest of it