Soroush J. Pour
San Francisco, California, United States
5K followers
500+ connections
About
Building Harmony Intelligence, where we provide AI-powered, human-verified white-box…
Experience
Education
-
ARENA (Alignment Research Engineer Accelerator)
-
https://www.arena.education/ - A 5-week intensive program focused on accelerated learning of the technical fundamentals of AI safety research engineering. Shared an office & collaborated with the SERI MATS cohort. Coursework covered fundamentals of:
* Deep learning
* Transformer architecture
* Reinforcement learning
* GPU training: per-GPU optimisation (incl. CUDA kernel programming) & distributed training
Capstone project: automated red-teaming of LLMs (to be published soon)
Publications
-
The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence
arXiv preprint arXiv:2408.12622
The risks posed by Artificial Intelligence (AI) are of considerable concern to academics, auditors, policymakers, AI companies, and the public. However, a lack of shared understanding of AI risks can impede our ability to comprehensively discuss, research, and react to them. This paper addresses this gap by creating an AI Risk Repository to serve as a common frame of reference. This comprises a living database of 777 risks extracted from 43 taxonomies, which can be filtered based on two overarching taxonomies and easily accessed, modified, and updated via our website and online spreadsheets. We construct our Repository with a systematic review of taxonomies and other structured classifications of AI risk followed by an expert consultation. We develop our taxonomies of AI risk using a best-fit framework synthesis. Our high-level Causal Taxonomy of AI Risks classifies each risk by its causal factors (1) Entity: Human, AI; (2) Intentionality: Intentional, Unintentional; and (3) Timing: Pre-deployment; Post-deployment. Our mid-level Domain Taxonomy of AI Risks classifies risks into seven AI risk domains: (1) Discrimination & toxicity, (2) Privacy & security, (3) Misinformation, (4) Malicious actors & misuse, (5) Human-computer interaction, (6) Socioeconomic & environmental, and (7) AI system safety, failures, & limitations. These are further divided into 23 subdomains. The AI Risk Repository is, to our knowledge, the first attempt to rigorously curate, analyze, and extract AI risk frameworks into a publicly accessible, comprehensive, extensible, and categorized risk database. This creates a foundation for a more coordinated, coherent, and complete approach to defining, auditing, and managing the risks posed by AI systems.
-
Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation
Accepted to NeurIPS SoLaR 2023
Despite efforts to align large language models to produce harmless responses, they are still vulnerable to jailbreak prompts that elicit unrestricted behaviour. In this work, we investigate persona modulation as a black-box jailbreaking method to steer a target model to take on personalities that are willing to comply with harmful instructions. Rather than manually crafting prompts for each persona, we automate the generation of jailbreaks using a language model assistant. We demonstrate a range of harmful completions made possible by persona modulation, including detailed instructions for synthesising methamphetamine, building a bomb, and laundering money. These automated attacks achieve a harmful completion rate of 42.5% in GPT-4, which is 185 times larger than before modulation (0.23%). These prompts also transfer to Claude 2 and Vicuna with harmful completion rates of 61.0% and 35.9%, respectively. Our work reveals yet another vulnerability in commercial large language models and highlights the need for more comprehensive safeguards.
Projects
-
Bitcoin Multisignature Transaction Builder
Built an open-source Go implementation of Bitcoin protocol multisignature transactions. Focused on usability and maintainability, with a full suite of tests, complete GoDoc-style documentation, and examples for the CLI interface. Blog post: http://bit.ly/1CQLwHA
-
Built InMoov Humanoid Robotic Arm
3D printed and assembled the InMoov Humanoid Robotic Arm as detailed at http://www.inmoov.fr/. We succeeded in getting full finger, wrist, bicep, and shoulder articulation. We hooked the arm up to an Arduino and wrote a script to control it from the command line.
-
Dubbit Cartoon Creator App
Built an iOS app that allows users to create animated cartoons and share them with friends. We used the Cocos2D library on the client side, ZeroMQ to communicate with a Python backend server that converted character position coordinates into .mp4 movie files using a Pygame rendering engine and FFmpeg, and the YouTube API for hosting.
-
Mobile Live Chat Support
Began development on a mobile live chat iOS plug-in that would enable live chat integration into iOS apps. Built with a Node.js backend supporting XMPP, and a frontend library and UI written in Objective-C.
Honors & Awards
-
LLM Evals Hackathon - Honourable Mention - AI Pentester
AGI House (https://agihouse.ai/)
Alex Browne and I won an honourable mention for our GPT-4 AI pentesting agent ("AI Pentester"), which was coded up in just ~5 hours & was able to exploit a vulnerability in a target server given only access to a base Kali Linux bash instance and basic instructions -- nothing else. The vulnerability was a trivial, known CVE, but it still impressed us that it worked at all, especially with such minimal development time on our part!
-
Frank Borchardt Prize for Undergraduate Entrepreneurship
Duke Innovation & Entrepreneurship
$20,000 grant to support the top undergrad entrepreneurs at Duke. My collaborators (Fabio Berger, Alex Browne) & I were the inaugural winners for our startup work throughout our college years.
https://entrepreneurship.duke.edu/borchardt-prize/
-
Magna cum Laude
Duke University
-
Robertson Scholarship
Robertson Scholars Program
The Robertson Scholars Leadership Program invests in young leaders who strive to make transformational contributions to society.
The scholarship provided:
- Four-year scholarship, including undergraduate tuition, room and board
- The opportunity to attend classes at both Duke and UNC-Chapel Hill
- Three summers of domestic and international experiences
- Customized leadership and professional development
http://robertsonscholars.org/
Languages
-
English
Native or bilingual proficiency
-
Japanese
Limited working proficiency
-
Persian
Limited working proficiency
-
Spanish
Limited working proficiency