Introduction to AI Safety
Aryeh L. Englander
AMDS / A4I
Overview
What do we mean by Technical AI Safety?
• Critical systems: systems whose failure may lead to injury or loss of life, damage to the
environment, unauthorized disclosure of information, or serious financial losses
• Safety-critical systems: systems whose failure may result in injury, loss of life, or
serious environmental damage
• Technical AI safety: designing safety-critical AI systems (and more broadly, critical AI
systems) in ways that guard against accident risks – i.e., harms arising from AI systems
behaving in unintended ways
Sources:
- Ian Sommerville, supplement to Software Engineering (10th edition)
- Remco Zwetsloot and Allan Dafoe, “Thinking About Risks From AI:
Accidents, Misuse and Structure”
Other related concerns
• Security against exploits by adversaries
- Often considered part of AI Safety
• Misuse from people using AI in unethical or
malicious ways
- Ex: deepfakes, terrorism, suppression of dissent
• Machine ethics
- Designing AI systems to make ethical decisions
- Debate over lethal autonomous weapons
• Structural risks from AI shaping the
environment in subtle ways
- Ex: job loss, increased risks of arms races
• Governance, strategy, and policy
- Should government regulate AI?
- Who should be held accountable?
- How do we coordinate with other governments and
stakeholders to prevent risks?
• AI forecasting and risk analysis
- When are these concerns likely to materialize?
- How concerned should we be?
Adversarial examples: fooling AI into thinking a stop sign is a 45 mph sign (image source)
Potential terrorist use of lethal fully autonomous drones (image source)
Jobs at risk of automation by AI (image source, based on a report from the OECD)
AI Safety research communities
• Two related research communities: AI Safety, Assured Autonomy
• AI Safety
- Focus on long-term risks from roughly human-level AI or beyond
- Also focused on near-term concerns that may scale up / provide insight into long-term issues
- Relatively new field – past 10 years or so
- Becoming progressively more mainstream
◦ Many leading AI researchers have expressed strong support for the research
◦ AI Safety research groups set up at several major universities and AI companies
• Assured Autonomy
- Older, established community with broader focus on assuring autonomous systems in general
- Recently started looking at challenges posed by machine learning
- Current and near-term focus
• In the past year both communities have finally started trying to collaborate and work out a
shared research landscape and vision
• APL’s focus: near- and mid-term concerns, but it would be nice if our research also scales up
to longer-term concerns
AI Safety: Lots of ways to frame conceptually
• Many different ways to divide up the problem space, and many different research
agendas from different organizations
• It can get pretty complicated
AI Safety Landscape overview from the Future of Life Institute (FLI)
Connections between different research agendas
(Source: Everitt et al, AGI Safety Literature Review)
AI Safety: DeepMind’s conceptual framework
Source: DeepMind Safety Research Blog
Assured Autonomy: AAIP conceptual framework
Source: Ashmore et al., Assuring the Machine Learning Lifecycle
AAIP = Assuring Autonomy International Programme (University of York)
Combined framework
• This is the proposed framework for
combining AI Safety and Assured Autonomy
research communities
• Also tries to address relevant topics from the
AI Ethics, Security and Privacy communities
• Until now these communities haven’t been
talking to each other as much as they
should
• Still in development; AAAI 2020 has a full-
day workshop on this
• Personal opinion: I like that it’s general, but I
think it’s a bit too general – best used only
for very abstract overviews of the field
= focus of AI Safety / DeepMind framework
= focus of Assured Autonomy / AAIP framework
My personal preference
• Problems that scale up to long term: DeepMind framework
• Near-term machine learning: AAIP framework
• Everything else: Combined framework
AI safety concerns and APL’s mission areas
• All of APL’s mission areas involve safety- or mission-critical systems
• The military is concerned with assurance rather than safety (obviously, military systems
are unsafe for the enemy), but the two concepts are very similar and involve similar
problems and solutions
• The government is very aware of these problems, and this is part of why the military has
been reluctant to adopt AI technologies
- Recent report from the Defense Innovation Board: primary document, supporting document
- Congressional Report on AI and National Security
- DARPA: Assured Autonomy program, Explainable AI program
• If we want to get the military to adopt the AI technologies we develop here, those
technologies will need to be assured and secure
Technical AI Safety
Specification problems
• These problems arise when there is a gap (often very subtle
and unnoticed) between what we really want and what the
system is actually optimizing for
• Powerful optimizers can find surprising and sometimes
undesirable solutions for objectives that are even subtly
mis-specified
• Often extremely difficult or impossible to fully specify
everything we really want
• Some examples:
- Specification gaming
- Avoiding side effects
- Unintended emergent behaviors
- Bugs and errors
Specification: Specification Gaming
• Agent exploits a flaw in the specification
• Powerful optimizers can find extremely
novel and potentially harmful solutions
• Example: evolved radio
• Example: Coast Runners
• There are many other similar examples (a minimal illustrative sketch follows below)
The evolvable motherboard that led to the evolved radio
A reinforcement learning agent discovers an unintended strategy
for achieving a higher score
(Source: OpenAI, Faulty Reward Functions in the Wild)
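To make specification gaming concrete, here is a minimal, purely illustrative Python sketch (the policy names, rewards, and horizon are invented for this example, not taken from the evolved radio or Coast Runners cases): an optimizer given the proxy objective "maximize points" prefers endlessly looping on a respawning bonus over finishing the course, even though finishing is what the designer intended.

```python
# Hypothetical sketch of specification gaming: the designer intends "finish the
# course", but the reward that was actually specified is "points collected".

def episode_return(policy: str, horizon: int = 100) -> float:
    """Total reward over `horizon` steps for two simple policies."""
    if policy == "finish_course":        # what the designer wanted
        return 10.0                      # one-time reward for crossing the finish line
    elif policy == "loop_on_bonus":      # what the specified reward actually favors
        return 1.0 * horizon             # respawning bonus collected every step
    raise ValueError(policy)

best = max(["finish_course", "loop_on_bonus"], key=episode_return)
print(best)  # -> 'loop_on_bonus': the optimizer "games" the mis-specified reward
```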
Specification: Specification Gaming (cont.)
• Can be a problem for classifiers as well:
The loss function (“reward”) might not
really be what we care about, and we
may not discover the discrepancy until
later
• Example: Bias
- We care about the difference between humans and animals more than between breeds of dogs, but the loss function optimizes for all classes equally
- We only discovered this problem after it
caused major issues
• Example: Adversarial examples
- Deep Learning (DL) systems discovered weird
correlations that humans never thought to look
for, so predictions don’t match what we really
care about
- We only discovered this problem well after the
systems were in use
Google images misidentified black people as gorillas
(source)
Blank labels can make DL systems misidentify stop signs as
Speed Limit 45 MPH signs
(source)
Specification: Avoiding side effects
• What we really want: achieve goals
subject to common sense constraints
• But current systems do not have anything
like human common sense
• In any case, the system would not by default constrain itself unless specifically programmed to do so
• Problem likely to get much more difficult
going forward:
- Increasingly complex, hard-to-predict
environments
- Increasing number of possible side effects
- Increasingly difficult to think of all those side
effects in advance
Two side effect scenarios
(source: DeepMind Safety Research blog)
Specification: Avoiding side effects (cont.)
• Standard TEV&V approach: brainstorm
with experts "what could possibly go
wrong?"
• In complex environments it might not be
possible to think about all the things that
could go wrong beforehand (unknown
unknowns) until it's too late
• Is there a general method we can use to
guard against even unknown unknowns?
• Ideas in this category (a minimal sketch of the first idea follows below)
- Penalize changing the environment (example)
- Agent learns constraints by observing humans
(example)
Get from point A to point B – but don’t knock over the vase!
Can we think of all possible side effects like this in advance?
(image source)
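As a rough illustration of the first idea above, here is a minimal sketch of shaping the reward with a penalty for changing the environment relative to a baseline state; the state representation, penalty weight, and reward values are assumptions for this example.

```python
# Penalize changes to the environment relative to a baseline ("inaction") state,
# so that "break the vase" becomes costly even though the task reward never
# mentions vases. All names and numbers here are illustrative.

def impact_penalty(state: dict, baseline: dict) -> int:
    """Count environment features that differ from the baseline state."""
    return sum(1 for k in baseline if state.get(k) != baseline[k])

def shaped_reward(task_reward: float, state: dict, baseline: dict, beta: float = 5.0) -> float:
    """Task reward minus a penalty proportional to how much the agent changed the world."""
    return task_reward - beta * impact_penalty(state, baseline)

baseline = {"vase": "intact", "robot_at_B": False}
# Reaching B by knocking over the vase vs. going around it:
print(shaped_reward(10.0, {"vase": "broken", "robot_at_B": True}, baseline))  # 10 - 2*5 = 0.0
print(shaped_reward(10.0, {"vase": "intact", "robot_at_B": True}, baseline))  # 10 - 1*5 = 5.0
```

Note that a naive penalty like this also punishes changes the task requires (moving to B at all), which is one reason the cited research explores relative reachability and related impact measures.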
Specification: Other problems
OpenAI’s hide and seek AI agents demonstrated
surprising emergent behaviors (source)
(image source)
• Emergent behaviors
- E.g., multi-agent systems, human-AI teams
- Makes behavior much more difficult to predict and verify, which makes many of the above problems worse
• Bugs and errors
- Can be even harder to find and correct logic
errors in complex ML systems (especially Deep
Learning) than in regular software systems
- (See later on TEV&V)
Robustness problems
• How to ensure that the system continues to operate within
safe limits upon perturbation
• Some examples:
- Distributional shift / generalization
- Safe exploration
- Security
Robustness: Distributional shift / generalization
• How do we get a system trained on one distribution to perform well and safely if it
encounters a different distribution after deployment?
• Especially, how do we get the system to proceed more carefully when it encounters
safety-critical situations that it did not encounter during training?
• Generalization is a well-known problem in ML, but more work needs to be done
• Some approaches (a simple confidence-threshold sketch follows below):
- Cautious generalization
- “Knows what it knows”
- Expanding on anomaly detection techniques
(image source)
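As a rough illustration of the anomaly-detection / "knows what it knows" direction, here is a minimal sketch of confidence-based abstention; the `model` interface (scikit-learn-style `predict_proba`) and the threshold value are assumptions for illustration.

```python
# Flag inputs where the classifier's predictive confidence is low and hand them
# off to a fallback (e.g., a human or a conservative default) instead of acting.
import numpy as np

def predict_or_abstain(model, x: np.ndarray, threshold: float = 0.9):
    """Return (label, confidence), or ('ABSTAIN', confidence) for low-confidence inputs."""
    probs = model.predict_proba(x[None, :])[0]   # scikit-learn-style interface (assumed)
    conf = float(np.max(probs))
    if conf < threshold:
        return "ABSTAIN", conf                   # treat as possibly out-of-distribution
    return int(np.argmax(probs)), conf
```

Softmax confidence on its own is known to be a weak out-of-distribution signal, which is why dedicated detectors and calibration methods remain an active research area.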
Robustness: Safe exploration
• If an RL agent uses online learning or needs to train in a real-world environment, then the
exploration itself needs to be safe
• Example: A self-driving car can't learn by experimenting with swerving onto sidewalks
• Restricting learning to a controlled, safe environment might not provide sufficient training for some applications (a minimal sketch of constrained exploration follows below)
How do we tell a cleaning robot not to experiment with sticking wet
brooms into sockets during training?
(image source)
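One simple way to frame safe exploration is to restrict exploration to actions that pass an explicit safety check. The sketch below is illustrative only; the action names and the hand-written `is_safe` predicate are assumptions.

```python
# Epsilon-greedy exploration restricted to a safe action set: random exploration
# during training never tries "swerve onto the sidewalk".
import random

ACTIONS = ["stay_in_lane", "slow_down", "change_lane", "swerve_onto_sidewalk"]

def is_safe(action: str, state: dict) -> bool:
    """Conservative, human-specified constraint applied during training."""
    return action != "swerve_onto_sidewalk"

def explore(state: dict, epsilon: float, greedy_action: str) -> str:
    """Pick an exploratory or greedy action, but only from the safe set."""
    safe_actions = [a for a in ACTIONS if is_safe(a, state)]
    if random.random() < epsilon:
        return random.choice(safe_actions)
    return greedy_action if is_safe(greedy_action, state) else random.choice(safe_actions)
```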
Robustness: Security
• (Security is sometimes considered part of safety / assurance, and sometimes separate)
• ML systems pose unique security challenges
• Data poisoning: Adversaries can corrupt the training data, leading to undesirable results
• Adversarial examples: Adversaries can craft small input perturbations that fool ML systems (see the sketch below)
• Privacy and classified information: By probing ML systems, adversaries may be able
to uncover private or classified information that was used during training
What if an adversary fools an AI into
thinking a school bus is a tank?
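For concreteness, here is a minimal PyTorch-based sketch of the fast gradient sign method (FGSM), a standard way adversarial examples are generated; `model`, `x`, and `y` are assumed to be a trained classifier and a correctly labeled input batch.

```python
# FGSM: nudge each pixel slightly in the direction that increases the
# classifier's loss, producing an input that looks unchanged to a human but is
# misclassified by the model.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon: float = 0.03):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()                                   # gradient of the loss w.r.t. the input
    perturbed = x_adv + epsilon * x_adv.grad.sign()   # small, bounded perturbation
    return perturbed.clamp(0.0, 1.0).detach()         # keep pixel values in a valid range
```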
Monitoring and Control
• (DeepMind calls this Assurance, but that’s confusing since we’ve also been discussing Assured Autonomy)
• Interpretability: Many ML systems (esp. DL) are mostly black boxes
• Scalable oversight: It can be very difficult to provide oversight of increasingly autonomous and complex agents
• Human override: We need to be able to shut down the system if needed (a minimal override-wrapper sketch follows below)
- Building in mechanisms to do this is often difficult
- If the operator is part of the environment that the system learns about, the AI could conceivably learn policies that try to avoid the human shutting it down
◦ “You can't get the cup of coffee if you're dead"
◦ Example: robot blocks camera to avoid being shut off
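As a minimal illustration of a human-override mechanism (not a solution to the deeper interruptibility problem described above), here is a sketch of a policy wrapper that defers to an operator-controlled stop flag; the policy interface and safe default action are assumptions.

```python
# Wrap a learned policy so that an operator-controlled stop flag always takes
# precedence over the policy's chosen action.
class OverridablePolicy:
    def __init__(self, policy, safe_action="no_op"):
        self.policy = policy            # underlying learned policy: state -> action
        self.safe_action = safe_action  # conservative fallback action
        self.stopped = False            # set by the human operator

    def stop(self):
        self.stopped = True

    def act(self, state):
        if self.stopped:
            return self.safe_action     # human override takes precedence
        return self.policy(state)
```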
Scaling up testing, evaluation, verification, and validation
• The extremely complex, mostly black-box models learned by powerful Deep Learning systems make it difficult or impossible to scale up existing TEV&V techniques
• Hard to do enough testing or evaluation when the space of possible unusual inputs or situations is huge (see the metamorphic-testing sketch below)
• Most existing TEV&V techniques need to specify exactly what the boundaries are that we
care about, which can be difficult or intractable
• Often can only be verified in relatively simple constrained environments – doesn’t scale
up well to more complex environments
• Especially difficult to use standard TEV&V techniques for systems that continue to learn
after deployment (online learning)
• Also difficult to use TEV&V for multi-agent or human-machine teaming environments due
to possible emergent behaviors
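One partial workaround is metamorphic testing: instead of specifying exact expected outputs, test that predictions stay stable under transformations that should not change the true label. The sketch below is illustrative; the `model.predict` interface, the brightness transformation, and the tolerance are assumptions.

```python
# Metamorphic test: a classifier's predictions should not change under a small
# brightness shift that leaves the true labels unchanged.
import numpy as np

def brightness_invariance_violations(model, images: np.ndarray, delta: float = 0.05) -> int:
    """Count inputs whose predicted label changes after a small brightness shift."""
    base_preds = model.predict(images)
    shifted = np.clip(images + delta, 0.0, 1.0)
    shifted_preds = model.predict(shifted)
    return int(np.sum(base_preds != shifted_preds))

# A test might assert that violations stay below some agreed-upon rate:
# assert brightness_invariance_violations(model, test_images) / len(test_images) < 0.01
```

Metamorphic relations sidestep the need for exact expected outputs, which is one of the bottlenecks listed above, although they only cover the properties someone thinks to encode.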
Theoretical issues
• A lot of decision theory and game theory
breaks down if the agent is itself part of
the environment that it's learning about
• Reasoning correctly about powerful ML
systems might become very difficult and
lead to mistaken assumptions with
potentially dangerous consequences
• Especially difficult to model and predict
the actions of agents that can modify
themselves in some way or create other
agents
Embedding agents in the environment can lead to a host of theoretical problems
(source: MIRI Embedded Agency sequence)
Human-AI teaming
• Understanding the boundaries - often even the system designers don't really understand
where the system does or doesn't work
• Example: Researchers didn’t discover the problem of adversarial examples until well after
the systems were already in use; it took several more years to understand the causes of
the problem (and it’s still debated)
• Humans (even the designers) sometimes anthropomorphize too much and therefore use
faulty “machine theories of mind” – current ML systems do not process data and
information in the same way humans do
• Can lead to people trusting AI systems in unsafe situations
Systems engineering and best practices
• Careful design with safety / assurance issues in
mind from the start
• Getting people to incorporate the best technical
solutions and TEV&V tools
• Systems engineering perspective would likely be
very helpful, but further work is needed to adapt
systems / software engineering approaches to AI
• Training people not to use AI systems beyond what they're good for
• Being aware of the dual use nature of AI and
developing / implementing best practices to
prevent malicious use (a different issue from
what we’ve been discussing)
- Examples: deepfakes, terrorist use of drones, AI-
powered cyber attacks, use by oppressive regimes
- Possibly borrowing techniques and practices from
other dual-use technologies, such as cybersecurity
(image source)
(image source)
Assuring the Machine Learning Lifecycle
Data management
Model learning
Model verification
Model deployment
Final notes
• Some of these areas have received a significant amount of attention and research (e.g.,
adversarial examples, generalizability, safe exploration, interpretability), others not quite
as much (e.g., avoiding side effects, reward hacking, verification & validation)
• It's generally believed that if early programming languages such as C had been designed
from the ground up with security in mind, then computer security today would be in a
much stronger position
• We are mostly still in the early days of the most recent batch of powerful ML techniques
(mostly Deep Learning); we should probably build in safety / assurance and security from
the ground up
• Again, the military knows all this; if we want the military to adopt the AI technologies that
we develop here, those technologies will need to be assured and secure
Research groups outside APL (partial list)
• Technical AI Safety
- DeepMind safety research (two teams – AI Safety team, Robust & Verified Deep Learning team)
- OpenAI safety team (no particular team website – core part of their mission)
- Machine Intelligence Research Institute (MIRI)
- Stanford AI Safety research group
- Center for Human-Compatible AI (CHAI, UC Berkeley)
• Assured Autonomy
- Institute for Assured Autonomy (IAA, partnership between Johns Hopkins University and APL)
- Assuring Autonomy International Programme (University of York)
- University of Pennsylvania Assured Autonomy research group
- University of Waterloo AssuredAI project
• AI Safety Risks – Strategy, Policy, Analysis
- Future of Life Institute (MIT)
- Future of Humanity Institute (University of Oxford)
- Center for the Study of Existential Risk (CSER, University of Cambridge)
- Center for Security and Emerging Technology (CSET, Georgetown University)
• Many of these organizations are closely tied to the Effective Altruism movement
Primary reading
• Technical AI Safety
- Amodei et al, Concrete Problems in AI Safety (2016) – still probably the best technical introduction
- Alignment Newsletter – excellent coverage of related research
◦ Podcast version
◦ Database of all links from previous newsletters, arranged by topic – covers almost all major papers related to the field from the past year or two
- DeepMind’s Safety Research blog
- Informal document from Jacob Steinhardt (UC Berkeley) - overview of several current research directions
• Assured Autonomy: Ashmore et al, Assuring the Machine Learning Lifecycle (2019)
• Longer-term concerns
- Stuart Russell, Human Compatible: Artificial Intelligence and the Problem of Control (2019)
- Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (2014)
◦ Excellent series of posts summarizing each chapter and providing additional notes
- [Tom Chivers, The AI Does Not Hate You: Superintelligence, Rationality and the Race to Save the World
(2019) – lighter overview of the subject from a journalist; includes a good history of the AI Safety
movement and other closely related groups]
Partial bibliography: General / Literature Reviews
• Saria et al (JHU), Tutorial on safe and reliable ML (2019); video, slides, references
• Richard Mallah (Future of Life Institute), “The Landscape of AI Safety and Beneficence
Research,” 2017
• Hernandez-Orallo et al, Surveying Safety-relevant AI Characteristics (2019)
• Rohin Shah (UC Berkeley), An overview of technical AGI alignment (podcast episode with
transcript, 2019) – part 1, part 2, related video lecture
• Everitt et al, AGI Safety literature review (2018)
• Paul Christiano, AI alignment landscape (2019 blog post)
• Andrew Critch and Stuart Russell, detailed syllabus with links from a fall 2018 AGI Safety
course at UC Berkeley
• Joel Lehman (Uber), Evolutionary Computation and AI Safety: Research Problems
Impeding Routine and Safe Real-world Application of Evolution (2019)
• Victoria Krakovna, AI safety resources list
Partial bibliography: Technical AI Safety literature
• AI Alignment Forum, including several good curated post sequences
• Paul Christiano, Directions and desiderata for AI alignment (2017 blog post)
• Rohin Shah (UC Berkeley), Value Learning sequence (2018) – gives a thorough introduction to the
problem and explains some of the most promising approaches
• Leike et al (DeepMind), Reward Modeling (2018); associated blog post
• Dylan Hadfield-Menell (UC Berkeley), Cooperative Inverse Reinforcement Learning (2016); associated
podcast episode; also see this video lecture
• Dylan Hadfield-Menell (UC Berkeley), Inverse Reward Design (2017)
• Christiano et al (OpenAI), Iterative Amplification (2018); associated blog post; Iterative Amplification
sequence on the Alignment Forum
• Irving et al (OpenAI), Value alignment via debate (2018); associated blog post, podcast episode
• Christiano et al (OpenAI, DeepMind), Deep reinforcement learning from human preferences (2017)
• Andreas Stuhlmüller (Ought), Factored Cognition (2018 blog post)
• Stuart Armstrong (MIRI / FHI), Research Agenda v0.9: Synthesizing a human's preferences into a utility
function (2019 blog post)
Partial bibliography: Assured Autonomy literature
• University of York, Assuring Autonomy Body of Knowledge (in development)
• Assuring Autonomy International Programme, list of research papers
• Sandeep Neema (DARPA), Assured Autonomy presentation (2019)
• Schwarting et al (MIT, Delft University), Planning and Decision-Making for Autonomous Vehicles (2018)
• Kuwajima et al, Open Problems in Engineering Machine Learning Systems and the Quality Model (2019)
• Calinescu et al (University of York), Socio-Cyber-Physical Systems: Models, Opportunities, Open
Challenges (2019) – focuses on the human component of human-machine teaming
• Salay et al (University of Waterloo), Using Machine Learning Safely in Automotive Software (2018)
• Czarnecki et al (University of Waterloo), Towards a Framework to Manage Perceptual Uncertainty for
Safe Automated Driving (2018)
• Calinescu et al (University of York), Engineering Trustworthy Self-Adaptive Software with Dynamic
Assurance Cases (2017)
• Lee et al (University of Waterloo), WiseMove: A Framework for Safe Deep Reinforcement Learning for
Autonomous Driving (2019)
• Garcia et al, A Comprehensive Survey on Safe Reinforcement Learning (2015)
Partial bibliography: Misc.
• Avoiding side effects
- Krakovna et al (DeepMind), Penalizing side effects using stepwise relative reachability (2019); associated blog post
- Alex Turner, Towards a new impact measure (2018 blog post)
- Achiam et al (UC Berkeley), Constrained Policy Optimization (2017)
• Testing and verification
- Defense Innovation Board, AI Principles: Recommendations on the Ethical Use of Artificial Intelligence by the Department
of Defense, Appendix IV.C (2019) – study by the MITRE Corporation on the state of AI T&E
- Kohli et al (DeepMind), Towards Robust and Verified AI: Specification Testing, Robust Training, and Formal Verification
(2019 blog post) – references several important papers on testing and validation of advanced ML techniques, and
summarizes some of DeepMind’s research in this area
- Haugh et al, The Status of Test, Evaluation, Verification, and Validation (TEV&V) of Autonomous Systems (2018)
- Hains et al, Formal methods and software engineering for DL (2019)
• Security: Xiao et al, Characterizing Attacks on Deep Reinforcement Learning (2019)
• Control: Babcock et al, Guidelines for Artificial Intelligence Containment (2017)
• Risks from emergent behavior: Jesse Clifton, Cooperation, Conflict, and Transformative Artificial
Intelligence: A Research Agenda (blog post sequence, 2019)
• Long term risks:
- AI Impacts
- Ben Cottier and Rohin Shah, Clarifying some key hypotheses in AI alignment (blog post, 2019)
Editor's Notes
1. These are debatably part of AI Safety
2. We must be able to fully specify what we want the system to do; the system must be able to robustly achieve its goals; and we need assurance that the system is doing what we want
  3. Heather Roff was primary author on the DIB report