About
Technology leader with over 15 years of experience in Machine Learning. 9 years in…
Activity
-
LLMs need a trusted platform. That's why 90% of the top AI companies run their businesses on Salesforce, along with these leading enterprises…
Liked by Sarah Aerni
-
A few days in-person with our Intuit team in San Diego, shaping a sharper view of where the work goes next. The industry is in the middle of a real…
Liked by Sarah Aerni
-
Just got back from #TDX2026 — and MAN what an awesome experience. The speed that we, (as a MASSIVE company), are innovating in the AI space, is…
Liked by Sarah Aerni
Experience
Education
-
Stanford University
-
-
Activities and Societies: Founding member of the Stanford Association for Multi-Disciplinary Medicine and Science (SAMMS), organizing committee member for Biomedical Computing at Stanford (a student-run conference), elected student representative to the executive committee of the BMI program, organizing industry panels.
Primary Faculty Advisor: Serafim Batzoglou
Co-advisor: Stuart Kim
William R. Hewlett Fellow (Stanford Graduate Fellowships), National Science Foundation Graduate Research Fellow
Volunteer Experience
Publications
-
A Bioinformatics Guide for Molecular Biologists
Cold Spring Harbor Laboratory Press
Informatics can vastly assist progress in research and development in cell and molecular biology and biomedicine. However, many investigators are either unaware of the ways in which informatics can improve their research or find it inaccessible due to a feeling of “informatics anxiety.” This sense of apprehension results from improper communication of the principles behind these approaches and of the value of the many tools available. In fact, many researchers are inherently distrustful of these tools. A more complete understanding of bioinformatics offered in A Bioinformatics Guide for Molecular Biologists will allow the reader to become comfortable with these techniques, encouraging their use—thus helping to make sense of the vast accumulation of data. To make these concepts more accessible, the editors approach the field of bioinformatics from the viewpoint of a molecular biologist, (1) arming the biologist with a basic understanding of the fundamental concepts in the field, (2) presenting approaches for using the tools from the standpoint of the data for which they are created, and (3) showing how the field of informatics is quickly adapting to the advancements in biology and biomedical technologies. All concepts are paired with recommendations for the appropriate programming environment and tools best suited to solve the particular problem at hand. It is a must-read for those interested in learning informatics techniques required for successful research and development in the laboratory.
-
Automated Cellular Annotation for High Resolution Images of Adult C. elegans
Bioinformatics [ISMB/ECCB] 2013
Motivation: Advances in high-resolution microscopy have recently made possible the analysis of gene expression at the level of individual cells. The fixed lineage of cells in the adult worm Caenorhabditis elegans makes this organism an ideal model for studying complex biological processes like development and aging. However, annotating individual cells in images of adult C. elegans typically requires expertise and significant manual effort. Automation of this task is therefore critical to enabling high-resolution studies of a large number of genes.
Results: In this article, we describe an automated method for annotating a subset of 154 cells (including various muscle, intestinal and hypodermal cells) in high-resolution images of adult C. elegans. We formulate the task of labeling cells within an image as a combinatorial optimization problem, where the goal is to minimize a scoring function that compares cells in a test input image with cells from a training atlas of manually annotated worms according to various spatial and morphological characteristics. We propose an approach for solving this problem based on reduction to minimum-cost maximum-flow and apply a cross-entropy–based learning algorithm to tune the weights of our scoring function. We achieve 84% median accuracy across a set of 154 cell labels in this highly variable system. These results demonstrate the feasibility of the automatic annotation of microscopy-based images in adult C. elegans.
-
Reconstructing cancer genomes from paired-end sequencing data
BMC Bioinformatics
-
Reconstruction of genealogical relationships with applications to Phase III of HapMap.
Bioinformatics [ISMB/ECCB]
-
Analysis of gene regulation and cell fate from single-cell gene expression profiles in C. elegans
Cell
The C. elegans cell lineage provides a unique opportunity to look at how cell lineage affects patterns of gene expression. We developed an automatic cell lineage analyzer that converts high-resolution images of worms into a data table showing fluorescence expression with single-cell resolution. We generated expression profiles of 93 genes in 363 specific cells from L1 stage larvae and found that cells with identical fates can be formed by different gene regulatory pathways. Molecular signatures identified repeating cell fate modules within the cell lineage and enabled the generation of a molecular differentiation map that reveals points in the cell lineage when developmental fates of daughter cells begin to diverge. These results demonstrate insights that become possible using computational approaches to analyze quantitative expression from many genes in parallel using a digital gene expression atlas.
-
BJ Raphael, S Volik, P Yu, C Wu, G Huang, EV Linardopoulou, BJ Trask, FM Waldman, J Costello, KJ Pienta, GB Mills, K Bajsarowicz, Y Kobayashi, S Shivaranjani, P Paris, Q Tao, SJ Aerni, RP Brown, A Bashir, JW Gray, JF Cheng, P de Jong, M Nefedov, T Ried, H
-
-
BT Messmer*, B Raphael*, SJ Aerni, GF Widhopf, LZ Rassenti, JG Gribben, NE Kay, TJ Kipps "Computational Identification Of CDR3 Sequence Archetypes Among Immunoglobulin Sequences in Chronic Lymphocytic Leukemia" Leukemia Research, Volume 33, Issue 3, Pages
-
-
Reconstructing Cancer Genome Organization
BMC Bioinformatics
A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data.
We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/.
-
SJ Aerni, E Eskin “10 Years of the International Conference on Research in Computational Molecular Biology (RECOMB)”, RECOMB 2006: 546-562
-
Patents
-
AUTOMATIC DETERMINATION OF ALTERNATIVE PATHS FOR A PROCESS FLOW USING MACHINE LEARNING
Issued US 12,105,725
Methods, systems, apparatuses, devices, and computer program products are described. A system may identify, from an event log including log entries for a tenant of a multi-tenant database system, a pattern of log entries corresponding to main actions and satisfying a frequency threshold. The system may identify log entries associated with the pattern and corresponding to the main actions, detailed actions, or both. The system may retrieve data corresponding to a history field of a data object associated with the pattern and may determine at least a portion of a process flow for the data object according to the pattern and based on the log entries and the historical data. The process flow may include operations to perform using the data object. In some cases, the system may transmit, to a user device, an indication of the portion of the process flow for user review and implementation.
-
RAPID PROCESSING OF BIOLOGICAL SEQUENCE DATA
Issued US 9,703,925
In general, one aspect of the subject matter described in this specification is embodied in operations of processing sequence data by selecting a distribution key according to a type of one or more tasks to be performed on the data. The key is one or more data fields of a sequence data file, e.g., a sequence alignment/map (SAM) format or binary sequence alignment/map (BAM) format file, or derived from one or more data fields of a sequence data file. The sequence data is then distributed to multiple nodes of a parallel processing relational database system. The system performs the tasks of processing the sequence data by executing database queries. The system executes the database queries on multiple nodes in parallel. The system can use query optimization functions built into the database to expedite performance of each task.
-
IN-DATABASE SINGLE-NUCLEOTIDE GENETIC VARIANT ANALYSIS
Issued US 9,594,777
Genetic data in row-wise flat files, such as VCF and VCF-like files, comprising a plurality of data elements of different types is analyzed using a parallel framework in an MPP shared-nothing distributed database having a plurality of distributed segments by first parsing the data into groups of data elements of the same types, converting the data into entry-wise genetic data such that the same types of data elements are in a column, and distributing and storing the entry-wise genetic data in the distributed segments. SQL database queries are used to analyze the genetic data, including locating probable significant associations between genotype and phenotype data.
-
ELEMENT IDENTIFICATION IN DATABASE
Issued US 9,569,464
This document describes, among other things, a computer-implemented method. The method includes obtaining a structured data object that has a plurality of nodes that represent elements in the data object. One or more tables that define a table representation of the data object can be generated. The one or more tables can include a plurality of table entries that correspond to the plurality of nodes, respectively. For each of one or more first nodes from among the plurality of nodes, the method can include identifying information about one or more second nodes that are determined to be adjacent or otherwise related to the first node by performing window functions along two or more coordinate systems in the one or more tables. The window function can be centered on a particular table entry that corresponds to the first node of the data object.
Courses
-
Machine Learning at Stanford University
CS 229
Languages
-
English
Native or bilingual proficiency
-
German
Native or bilingual proficiency
-
Swiss German
Native or bilingual proficiency
-
French
Limited working proficiency
More activity by Sarah
-
Incredible opportunities!!! Encourage you to take a look
Shared by Sarah Aerni