Arun Swami

Arun Swami · 2026-05-02T20:46:22.592Z

For people in Bengaluru who love books ...

Cupertino, California, United States
4K followers 500+ connections

Stanford University

About

I love solving hard problems that lead to personal learning and growth. A key theme of my…

Articles by Arun

Serendipity in the COVID-19 World

May 13, 2020

One of the challenges when everyone works from home in the COVID-19 world is that we no longer have serendipitous…

9 Comments

Activity

For people in Bengaluru who love books ...

Shared by Arun Swami

Experience

LinkedIn

Sunnyvale, California
-

San Jose, California
-

Palo Alto, California
-

Mountain View, California
-

San Jose, California
-

Palo Alto, California
-

San Mateo, California
-

Los Gatos, California
-

San Francisco, California
-

Mountain View, California
-

San Jose, California

Education

Stanford University

-

1983 - 1989

Advisors: Prof. Gio Wiederhold and Prof. Anoop Gupta
-

1983 - 1986
-

1978 - 1983
-

1977 - 1978
1966 - 1977

Licenses & Certifications

How to Slow Down and Be More Productive

LinkedIn

Issued Dec 2020

The Inner MBA: Becoming a Mindful Leader

New York University

Issued Dec 2020

Ken Blanchard on Servant Leadership

LinkedIn

Issued Oct 2020

Technology and Design Ethics

LinkedIn

Issued Oct 2020

Jeff Weiner on Managing Compassionately

LinkedIn

Issued Jul 2019

Shane Snow on Storytelling

LinkedIn

Issued Jan 2019

AI The LinkedIn Way: A Conversation with Deepak Agarwal

LinkedIn

Issued Oct 2018

Delivering Employee Feedback

Lynda.com

Issued Apr 2018

Hire, Retain, and Grow Top Millennial Talent

LinkedIn

Issued Apr 2018

Stanford Advanced Computer Security Certificate

Stanford Center for Professional Development

Issued Nov 2017

Volunteer Experience

Coach

Team Asha

2014 - Present 12 years

Education

Team Asha is a premier endurance sports training program. It provides personalized coaching, motivation and support for individuals with a goal to run a half/full marathon or bike 100 km/100 miles. Since 2000, Team Asha has helped hundreds of people realize their marathon and endurance biking aspirations. People who train with Team Asha raise funds to help support educational initiatives for underprivileged children in India. Please visit https://team-asha.org
Counselor

KARA

2010 - 2011 1 year

Health

Kara's mission is to provide grief support for children, teens, families and adults. Clients include those who are grieving a death as well as those coping with a terminal illness (their own or another's).
Counselor

Santa Clara County Suicide and Crisis Service

1992 - 2002 10 years

Health

The Santa Clara County Suicide & Crisis Hotline is a 24-hour, toll-free, confidential suicide prevention hotline. The toll-free number is 1-855-278-4204. The service is available 7 days a week for phone intervention and emotional support by highly trained volunteer Crisis Phone Counselors.

Publications

Publications and Patents

2015

Over 40 publications and patents in areas ranging from database query optimization to data mining. The complete list is available in Google Scholar (https://goo.gl/I58olh).

Patents

Method and system for anonymizing activity records

Filed October 2, 2015 US US20170098093A1

A method for processing activity records. The method includes obtaining an activity record, and generating an anonymization dictionary. Generating the anonymization dictionary includes detecting, in the activity record, a set of target entities to be anonymized, making a determination that a resource is associated with a subset of the target entities of the set of target entities, and after making the determination, assigning an anonymized identity to the subset of target entities, and generating an anonymization identifier for each target entity in the subset of target entities to obtain a set of anonymization identifiers, each including the anonymized identity. The method further includes processing the activity record using the anonymization dictionary to obtain an anonymized activity record and storing the anonymized activity record.
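
The steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `resources` mapping, and the `anonN-M` identifier format are all hypothetical, chosen only to show entities that share a resource receiving identifiers built on one shared anonymized identity.

```python
import re


def build_anonymization_dictionary(record, resources):
    """Map each detected entity to an identifier that embeds a shared identity.

    `resources` (hypothetical) maps a resource name to the entity strings
    associated with that resource.
    """
    dictionary = {}
    for index, (_, entities) in enumerate(sorted(resources.items())):
        # Detect which of this resource's entities occur in the record.
        found = [entity for entity in sorted(entities) if entity in record]
        if not found:
            continue
        identity = f"anon{index}"  # one anonymized identity per resource
        for position, entity in enumerate(found):
            # Every identifier in the subset includes the shared identity.
            dictionary[entity] = f"{identity}-{position}"
    return dictionary


def anonymize(record, dictionary):
    """Replace every dictionary entity in the record in a single pass."""
    if not dictionary:
        return record
    # Longer entities come first so "alice@example.com" wins over "alice".
    ordered = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(entity) for entity in ordered))
    return pattern.sub(lambda match: dictionary[match.group(0)], record)
```

Because both identifiers for the same resource start with the same anonymized identity, an analyst can still see that two anonymized tokens belong together without learning the original values.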

System and method of efficiently representing and searching directed acyclic graph structures in databases

Issued August 25, 2009 US US7580918B2

The present disclosure includes systems and techniques relating to representation and retrieval of data structures in databases. In general, embodiments of the invention feature a computer program product and a method including storing a generalized directed acyclic graph (DAG) in a database, wherein the storing includes encoding path information of the generalized DAG in entries of a path table in the database, the encoding includes converting the path information into text strings, and the entries of the path table correspond to paths in the generalized DAG from nodes of the generalized DAG to a root node of the generalized DAG; triggering generation of a lexical index of the path table using the text strings, wherein the lexical index separately lists tokens included in the entries; and retrieving one or more portions of the generalized DAG from the database for in-memory operations.
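
A rough Python sketch of the idea follows. It is an illustration under assumptions, not the patented design: the path table is an in-memory list rather than a database table, paths are joined with "/" (so node names are assumed not to contain "/"), and all function names are hypothetical.

```python
from collections import defaultdict


def paths_to_root(node, parents):
    """Return every path from a root down to `node`, as lists of node names."""
    if not parents.get(node):
        return [[node]]
    return [
        path + [node]
        for parent in parents[node]
        for path in paths_to_root(parent, parents)
    ]


def build_path_table(parents, nodes):
    """One text entry per (node, path-to-root) pair."""
    return [
        (node, "/".join(path))
        for node in nodes
        for path in paths_to_root(node, parents)
    ]


def build_lexical_index(path_table):
    """Inverted index: token (node name) -> ids of path entries containing it."""
    index = defaultdict(set)
    for row_id, (_, text) in enumerate(path_table):
        for token in text.split("/"):
            index[token].add(row_id)
    return index


def descendants(node, path_table, index):
    """Nodes whose path to the root passes through `node`."""
    return {path_table[row_id][0] for row_id in index.get(node, ())} - {node}
```

With the paths stored as text, finding a subgraph becomes a token lookup in the index instead of a recursive traversal of edge rows.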

Method for external sorting in shared-nothing parallel architectures

Issued December 1, 1998 US US5845113A

A system and method are provided for distributed relational databases for parallel sorting of a relation, wherein the relation is a set of tuples to be sorted on multiple sort sites, which completely decouples the return phase from the sort phase in order to eliminate the merge phase. The method involves selecting one coordinator site from any of the available logical sites, then generating and sorting a local sample on each of the available storage sites before sending the local random sample from each storage site to the designated coordinator site, wherein the local random samples are merged to provide a single global sample. The coordinator site determines the global interval key values based on the global sample. The interval key values are determined such that each interval fits in a single sort site's main memory, wherein the tuples between two interval key values define the interval. The interval key values are sent to the various storage sites, wherein each storage site scans its portion of the relation in order to determine for each tuple the assigned interval and its corresponding sort site before sending each tuple to the assigned sort site. At each sort site the tuples are stored in temporary files using a single temporary file for each interval. Then, for each interval on each sort site, the interval is read and sorted in memory in any fashion before the tuples of the sorted interval are sent to the sink site.
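
The flow described above can be simulated in a single process. The Python sketch below is only an illustration under assumptions: sites are plain lists, the interval count is passed in rather than derived from memory size, temporary files are replaced by in-memory lists, and the names `choose_splitters` and `parallel_sort` are hypothetical.

```python
import bisect
import heapq
import random


def choose_splitters(local_samples, num_intervals):
    """Coordinator step: merge sorted local samples, pick interval key values."""
    global_sample = list(heapq.merge(*local_samples))
    step = len(global_sample) / num_intervals
    return [global_sample[int(i * step)] for i in range(1, num_intervals)]


def parallel_sort(storage_sites, num_intervals, sample_size, seed=0):
    """Sort the union of all sites' keys by range partitioning on sampled keys."""
    rng = random.Random(seed)
    # Each storage site draws and sorts a local random sample.
    local_samples = [
        sorted(rng.sample(site, min(sample_size, len(site))))
        for site in storage_sites
    ]
    splitters = choose_splitters(local_samples, num_intervals)
    # Each storage site scans its tuples and routes each one to its interval.
    intervals = [[] for _ in range(num_intervals)]
    for site in storage_sites:
        for key in site:
            intervals[bisect.bisect_right(splitters, key)].append(key)
    # Every interval is sorted on its own; because the intervals cover disjoint
    # key ranges, concatenating them in order needs no merge phase.
    result = []
    for interval in intervals:
        result.extend(sorted(interval))
    return result
```

The quality of the sample affects only how evenly the intervals are sized, not the correctness of the final order.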

Computer program product for optimizing data retrieval using index scanning

Issued July 7, 1998 US US5778353A

A method of index scanning involves scanning one or more selected indexes and determining the number of data transfers required to traverse all or a portion of a selected index for a selected number of buffer pool sizes. The number of page transfers to scan a whole table of interest versus the number of page transfers to scan each relevant index in accordance with the buffer pool size is determined for a query. The number of page transfers required in proportion to the selectivity of starting and stopping conditions is determined and scaled down in proportion to the selectivity of any starting and stopping conditions present in the search criteria in the query. A suitable correction factor is applied to the number of transfers to account for few rows remaining to be transferred after applying the search criteria, the buffer pool being large, or a low degree of clustering for an index, as well as accounting for any remaining index sargable search criteria. The search procedure that results in the fewest page transfers is then implemented.

System and method for query optimization using quantile values of a large unordered data set

Issued September 2, 1997 US US5664171A

A database management system determines, in a single pass over an unordered database, the quantile information. The system sequentially compares each tuple in the data set to a test value, and then selectively inserts the tuple in a test set having a cardinality less than the cardinality of the data set based upon the comparison. The system next uses the quantile information to estimate the number of tuples in the database which satisfy a user-defined predicate to generate an efficient query plan.
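
As an illustration of the last step only, the Python sketch below estimates how many rows satisfy a `column <= value` predicate from a list of quantile boundaries. It does not reproduce the single-pass quantile computation from the abstract. It assumes equal-count buckets with values spread uniformly inside each bucket, and the function name and parameters are hypothetical.

```python
import bisect


def estimate_rows_at_most(quantiles, total_rows, value):
    """Estimate the number of rows with column <= value.

    `quantiles` holds k + 1 sorted boundary values of k equal-count buckets.
    """
    if value < quantiles[0]:
        return 0.0
    if value >= quantiles[-1]:
        return float(total_rows)
    buckets = len(quantiles) - 1
    # Index of the bucket whose range contains `value`.
    i = bisect.bisect_right(quantiles, value) - 1
    low, high = quantiles[i], quantiles[i + 1]
    # Linear interpolation inside the bucket (uniformity assumption).
    fraction_inside = (value - low) / (high - low)
    return total_rows * (i + fraction_inside) / buckets
```

A query optimizer could compare such estimates for alternative predicates to decide, for example, whether an index scan or a full table scan is likely to touch fewer rows.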

Method for high-dimensionality indexing in a multi-media database

Issued July 8, 1997 US US5647058A

A high dimensional indexing method is disclosed which takes a set of objects that can be viewed as N-dimensional data vectors and builds an index which treats the objects like k-dimensional points. The method first defines and applies a set of feature extraction functions that admit some similarity measure for each of the stored objects in the database. The feature vector is then transformed in a manner such that the similarity measure is preserved and that the information of the feature vector v is concentrated in only a few coefficients. The entries of the feature vectors are truncated such that the entries which contribute little on the average to the information of the transformed vectors are removed. An index based on the truncated feature vectors is subsequently built using a point access method (PAM). A preliminary similarity search can then be conducted on the set of truncated transformed vectors using the previously created index to retrieve the qualifying records. A second search on the previously retrieved set of vectors is used to eliminate the false positives and to get the results of the desired similarity search.

Method for choosing largest selectivities among eligible predicates of join equivalence classes for query optimization

Issued November 21, 1995 US US5469568A

A method for choosing join selectivities in a query optimizer in a relational database management system is disclosed which facilitates the estimation of join result sizes by a query optimizer in a relational database system, wherein a new relation R is to be joined with an intermediate relation I, and wherein the selectivity values for each eligible join predicate are known. The method has the steps of determining the equivalence classes for a plurality of join attributes and then computing for each relation an estimate of the cardinality and the number of distinct values in each attribute after all the local predicates have been included. These are used in further computation of join selectivities and join result sizes. The join predicates must then be processed by correctly choosing the join selectivities. The join result sizes can then be correctly calculated.
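
Going by the patent title, the selection rule is to use the largest selectivity among the eligible predicates of each join equivalence class. The Python sketch below shows that rule under simplifying assumptions: selectivities are given as inputs, equivalence classes are derived with a small union-find over the eligible predicates only, and the result size is the product of both cardinalities and the chosen selectivities. The function names and the input format are hypothetical.

```python
from math import prod


def find(parent, attr):
    """Union-find lookup with path halving; unseen attributes become roots."""
    while parent.setdefault(attr, attr) != attr:
        parent[attr] = parent[parent[attr]]
        attr = parent[attr]
    return attr


def estimate_join_size(card_i, card_r, eligible):
    """Estimate |I join R| from eligible equality predicates.

    `eligible` is a list of ((left_attr, right_attr), selectivity) pairs.
    """
    parent = {}
    # Attributes linked by equality predicates fall into one equivalence class.
    for (left, right), _ in eligible:
        parent[find(parent, left)] = find(parent, right)
    # Within each class, keep only the largest selectivity.
    largest = {}
    for (left, _), selectivity in eligible:
        root = find(parent, left)
        largest[root] = max(largest.get(root, 0.0), selectivity)
    return card_i * card_r * prod(largest.values())
```

Multiplying every eligible selectivity would count the same equality constraint more than once when predicates are implied by transitivity, which is why only one selectivity per class is applied here.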

Method for optimizing processing of join queries by determining optimal processing order and assigning optimal join methods to each of the join operations

Issued September 6, 1994 US US5345585A

A join optimization method is provided for use with a data processor for optimizing the processing of a query for retrieval of data from a relational computer database. The database is organized by relations and data is retrieved by performing join operations on the relations. The join operations are optimized by randomly selecting an initial order for the join operations, assigning optimal join methods based on the initial order, finding an optimal order based on the assigned methods and repeating a polynomial number of times. The Krishnamurthy, Boral and Zaniolo (KBZ) Algorithm is used to determine a join optimization sequence and further refinement is provided by determining costs for alternate join order sequences using alternate join methods.


Courses

Functional Programming Principles in Scala

-
Machine Learning

-
Statistical Learning

-
Statistics

-

Honors & Awards

ACM SIGMOD Test of Time Award

ACM SIGMOD

2003

The paper on Mining Association Rules received the 10 Year Test of Time Award as the paper published at the 1993 ACM SIGMOD Conference that had the most impact over the subsequent 10 years. This paper is among the 20 most cited papers in Computer Science.
President of India Gold Medal

Indian Institute of Technology, Bombay

1983

Award given to the valedictorian of the entire graduating class of 1983.

Languages

English

Native or bilingual proficiency
Hindi

Native or bilingual proficiency
Kannada

Native or bilingual proficiency

Organizations

ACM

-

Oct 2008 - Present

Recommendations received

12 people have recommended Arun

Madhumita Mantri

Al N.
