Soroush J. Pour

San Francisco, California, United States
5K followers 500+ connections

About

Building Harmony Intelligence, where we provide AI-powered, human-verified white-box…

Experience

  • Harmony Intelligence
  • -

    Greater Shepparton, Victoria, Australia

  • -

  • -

    Sydney, New South Wales, Australia

  • -

    Sydney, New South Wales, Australia

  • -

    Sydney, Australia

  • -

  • -

    Sydney, Australia

  • -

    Multiple locations in NZ, Australia, USA, LATAM

  • -

    San Francisco Bay Area

  • -

    San Francisco Bay Area

  • -

    San Francisco Bay Area

  • -

    Raleigh-Durham, North Carolina Area

  • -

    Berlin Area, Germany

  • -

Education

  • ARENA (Alignment Research Engineer Accelerator)

    https://www.arena.education/ - A 5-week intensive program focused on accelerated learning of the technical fundamentals of AI safety research engineering. Shared an office and collaborated with the SERI MATS cohort. Coursework covered the fundamentals of:

    * Deep learning
    * Transformer architecture
    * Reinforcement learning
    * GPU training: optimisation per GPU (incl. CUDA kernel programming) & distributed training.

    Capstone project: automated red-teaming of LLMs (to be published soon)

Publications

  • The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence

    arXiv preprint arXiv:2408.12622

    The risks posed by Artificial Intelligence (AI) are of considerable concern to academics, auditors, policymakers, AI companies, and the public. However, a lack of shared understanding of AI risks can impede our ability to comprehensively discuss, research, and react to them. This paper addresses this gap by creating an AI Risk Repository to serve as a common frame of reference. This comprises a living database of 777 risks extracted from 43 taxonomies, which can be filtered based on two overarching taxonomies and easily accessed, modified, and updated via our website and online spreadsheets. We construct our Repository with a systematic review of taxonomies and other structured classifications of AI risk followed by an expert consultation. We develop our taxonomies of AI risk using a best-fit framework synthesis. Our high-level Causal Taxonomy of AI Risks classifies each risk by its causal factors (1) Entity: Human, AI; (2) Intentionality: Intentional, Unintentional; and (3) Timing: Pre-deployment; Post-deployment. Our mid-level Domain Taxonomy of AI Risks classifies risks into seven AI risk domains: (1) Discrimination & toxicity, (2) Privacy & security, (3) Misinformation, (4) Malicious actors & misuse, (5) Human-computer interaction, (6) Socioeconomic & environmental, and (7) AI system safety, failures, & limitations. These are further divided into 23 subdomains. The AI Risk Repository is, to our knowledge, the first attempt to rigorously curate, analyze, and extract AI risk frameworks into a publicly accessible, comprehensive, extensible, and categorized risk database. This creates a foundation for a more coordinated, coherent, and complete approach to defining, auditing, and managing the risks posed by AI systems.

  • Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation

    Accepted to NeurIPS SoLaR 2023

    Despite efforts to align large language models to produce harmless responses, they are still vulnerable to jailbreak prompts that elicit unrestricted behaviour. In this work, we investigate persona modulation as a black-box jailbreaking method to steer a target model to take on personalities that are willing to comply with harmful instructions. Rather than manually crafting prompts for each persona, we automate the generation of jailbreaks using a language model assistant. We demonstrate a range of harmful completions made possible by persona modulation, including detailed instructions for synthesising methamphetamine, building a bomb, and laundering money. These automated attacks achieve a harmful completion rate of 42.5% in GPT-4, which is 185 times larger than before modulation (0.23%). These prompts also transfer to Claude 2 and Vicuna with harmful completion rates of 61.0% and 35.9%, respectively. Our work reveals yet another vulnerability in commercial large language models and highlights the need for more comprehensive safeguards.


Projects

  • Bitcoin Multisignature Transaction Builder

    Built an open-source Go implementation of Bitcoin multisignature transactions, with a focus on usability and maintainability: a full suite of tests, complete GoDoc-style documentation, and examples for the CLI. Blog post: http://bit.ly/1CQLwHA

  • Built InMoov Humanoid Robotic Arm

    3D printed and assembled the InMoov Humanoid Robotic Arm as detailed at http://www.inmoov.fr/. We succeeded in getting full finger, wrist, bicep and shoulder articulation. We hooked up the arm to an Arduino and wrote a script to control the arm from the command line.

  • Dubbit Cartoon Creator App

    Built an iOS app that allows users to create animated cartoons and share them with friends. We used the Cocos2D library on the client side and ZeroMQ to communicate with a Python backend server, which converted character position coordinates into .mp4 movie files using a Pygame rendering engine and FFmpeg, with the YouTube API for hosting.

  • Mobile Live Chat Support

    Began development on a mobile live chat iOS plug-in that would enable live chat integration into iOS apps. Built with a Node.js backend with XMPP support and a frontend library and UI written in Objective-C.


Honors & Awards

  • LLM Evals Hackathon - Honourable Mention - AI Pentester

    AGI House (https://agihouse.ai/)

    Alex Browne and I won an honourable mention for our GPT-4 AI pentesting agent ("AI Pentester"), which we coded in about 5 hours. Given only access to a base Kali Linux bash instance and basic instructions, it was able to exploit a vulnerability in a target server. The vulnerability was a trivial, known CVE, but we were still impressed that it worked at all, especially with such minimal development time on our part!

  • Frank Borchardt Prize for Undergraduate Entrepreneurship

    Duke Innovation & Entrepreneurship

    $20,000 grant to support the top undergrad entrepreneurs at Duke. My collaborators (Fabio Berger, Alex Browne) & I were the inaugural winners for our startup work throughout our college years.

    https://entrepreneurship.duke.edu/borchardt-prize/

  • Magna cum Laude

    Duke University

  • Robertson Scholarship

    Robertson Scholars Program

    The Robertson Scholars Leadership Program invests in young leaders who strive to make transformational contributions to society.

    The scholarship provided:

    - Four-year scholarship, including undergraduate tuition, room and board
    - Attending classes at Duke and UNC-Chapel Hill
    - Three summers of domestic and international experiences
    - Customized leadership and professional development

    http://robertsonscholars.org/

Languages

  • English

    Native or bilingual proficiency

  • Japanese

    Limited working proficiency

  • Persian

    Limited working proficiency

  • Spanish

    Limited working proficiency
