Arun Swami

Cupertino, California, United States
4K followers 500+ connections

About

I love solving hard problems that lead to personal learning and growth. A key theme of my…

Articles by Arun

  • Serendipity in the COVID-19 World

    One of the challenges when everyone works from home in the COVID-19 world is that we no longer have serendipitous…

Experience

  • LinkedIn

    Sunnyvale, California

  • -

    San Jose, California

  • -

    Palo Alto, California

  • -

    Mountain View, California

  • -

    San Jose, California

  • -

    Palo Alto, California

  • -

    San Mateo, California

  • -

    Los Gatos, California

  • -

    San Francisco, California

  • -

    Mountain View, California

  • -

    San Jose, California

Education

  • Stanford University

    Advisors: Prof. Gio Wiederhold and Prof. Anoop Gupta

Volunteer Experience

  • Coach

    Team Asha

    - Present 12 years

    Education

    Team Asha is a premier endurance sports training program. It provides personalized coaching, motivation, and support for individuals aiming to run a half or full marathon or bike 100 km or 100 miles. Since 2000, Team Asha has helped hundreds of people realize their marathon and endurance biking aspirations. People who train with Team Asha raise funds to support educational initiatives for underprivileged children in India. Please visit https://team-asha.org

  • Counselor

    KARA

    - 1 year

    Health

    Kara's mission is to provide grief support for children, teens, families and adults. Clients include those who are grieving a death as well as those coping with a terminal illness (their own or another's).

  • Counselor

    Santa Clara County Suicide and Crisis Service

    - 10 years

    Health

    The Santa Clara County Suicide & Crisis Hotline is a 24-hour, toll-free, confidential suicide prevention hotline. The toll-free number is 1-855-278-4204. The service is available 7 days a week for phone intervention and emotional support from highly trained volunteer Crisis Phone Counselors.

Publications

  • Publications and Patents

    Over 40 publications and patents in areas ranging from database query optimization to data mining. The complete list is available in Google Scholar (https://goo.gl/I58olh).

Patents

  • Method and system for anonymizing activity records

    Filed US US20170098093A1

    A method for processing activity records. The method includes obtaining an activity record, and generating an anonymization dictionary. Generating the anonymization dictionary includes detecting, in the activity record, a set of target entities to be anonymized, making a determination that a resource is associated with a subset of the target entities of the set of target entities, and after making the determination, assigning an anonymized identity to the subset of target entities, and generating an anonymization identifier for each target entity in the subset of target entities to obtain a set of anonymization identifiers, each including the anonymized identity. The method further includes processing the activity record using the anonymization dictionary to obtain an anonymized activity record and storing the anonymized activity record.

  • System and method of efficiently representing and searching directed acyclic graph structures in databases

    Issued US US7580918B2

    The present disclosure includes systems and techniques relating to representation and retrieval of data structures in databases. In general, embodiments of the invention feature a computer program product and a method including storing a generalized directed acyclic graph (DAG) in a database, wherein the storing includes encoding path information of the generalized DAG in entries of a path table in the database, the encoding includes converting the path information into text strings, and the entries of the path table correspond to paths in the generalized DAG from nodes of the generalized DAG to a root node of the generalized DAG; triggering generation of a lexical index of the path table using the text strings, wherein the lexical index separately lists tokens included in the entries; and retrieving one or more portions of the generalized DAG from the database for in-memory operations.
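
The path-table idea in this abstract can be sketched roughly as follows. This is a minimal in-memory illustration, not the patented implementation: the function names, the "/"-separated string encoding, and the example DAG are all assumptions made here, and plain Python lists and dicts stand in for the database table and its lexical index.

```python
from collections import defaultdict

def paths_to_root(node, parents, root):
    """Return every path from `node` up to `root` as a list of node names."""
    if node == root:
        return [[root]]
    paths = []
    for parent in parents.get(node, []):
        for tail in paths_to_root(parent, parents, root):
            paths.append([node] + tail)
    return paths

def build_path_table(parents, root):
    """Encode each node-to-root path as a text string, one row per path."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps} | {root}
    table = []
    for node in sorted(nodes):
        for path in paths_to_root(node, parents, root):
            table.append("/".join(path))
    return table

def build_lexical_index(table):
    """Map each token (node name) to the row numbers whose path contains it."""
    index = defaultdict(set)
    for row, text in enumerate(table):
        for token in text.split("/"):
            index[token].add(row)
    return index

# Hypothetical DAG: node "d" has two parents, so it has two paths to the root "a".
parents = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
table = build_path_table(parents, "a")
index = build_lexical_index(table)

# Every row that mentions "b" starts at "b" itself or at a node below it.
nodes_under_b = sorted({table[row].split("/")[0] for row in index["b"]})
```

Because each row starts at the node it describes, looking up one token in the index is enough to find that node together with everything beneath it, without walking the graph.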

  • Method for external sorting in shared-nothing parallel architectures

    Issued US US5845113A

    A system and method are provided for parallel sorting of a relation in distributed relational databases, where the relation is a set of tuples to be sorted on multiple sort sites. The method completely decouples the return phase from the sort phase in order to eliminate the merge phase. It involves selecting one coordinator site from any of the available logical sites, then generating and sorting a local random sample on each of the available storage sites before sending the local random sample from each storage site to the designated coordinator site, where the local random samples are merged to provide a single global sample. The coordinator site determines the global interval key values based on the global sample. The interval key values are determined such that each interval fits in a single sort site's main memory, where the tuples between two interval key values define the interval. The interval key values are sent to the various storage sites, and each storage site scans its portion of the relation to determine, for each tuple, the assigned interval and its corresponding sort site before sending the tuple to that sort site. At each sort site, the tuples are stored in temporary files, with a single temporary file for each interval. Then, for each interval on each sort site, the interval is read, sorted in memory in any fashion, and the tuples of the sorted interval are sent to the sink site.
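
The sample-then-partition flow described in this abstract can be sketched in a single process as shown below. This is a simplified illustration under stated assumptions: plain lists stand in for storage sites, sort sites, and temporary files, and the names `choose_splitters` and `partition_sort` and the `sample_size` and `seed` parameters are hypothetical rather than taken from the patent.

```python
import random
from bisect import bisect_right

def choose_splitters(sites, num_intervals, sample_size, rng):
    """Coordinator step: merge per-site random samples and pick interval keys."""
    global_sample = []
    for tuples in sites:
        k = min(sample_size, len(tuples))
        global_sample.extend(rng.sample(tuples, k))
    global_sample.sort()
    if not global_sample:
        return []
    step = len(global_sample) / num_intervals
    return [global_sample[int(i * step)] for i in range(1, num_intervals)]

def partition_sort(sites, num_intervals, sample_size=4, seed=0):
    """Range-partition tuples by the interval keys, then sort each interval."""
    rng = random.Random(seed)
    splitters = choose_splitters(sites, num_intervals, sample_size, rng)
    # One bucket per interval; in the abstract these would be temporary files.
    intervals = [[] for _ in range(len(splitters) + 1)]
    for tuples in sites:
        for t in tuples:
            intervals[bisect_right(splitters, t)].append(t)
    # Intervals cover disjoint, increasing key ranges, so sorting each one
    # independently and concatenating needs no merge phase.
    result = []
    for interval in intervals:
        result.extend(sorted(interval))
    return result

# Hypothetical data spread over three storage sites.
sites = [[5, 3, 9, 1], [8, 2, 7], [6, 4, 0]]
ordered = partition_sort(sites, num_intervals=3)
```

The sample only affects how evenly the intervals are filled; the output is fully sorted whichever interval keys are chosen.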

  • Computer program product for optimizing data retrieval using index scanning

    Issued US US5778353A

    A method of index scanning involves scanning one or more selected indexes and determining the number of data transfers required to traverse all or a portion of a selected index for a selected number of buffer pool sizes. The number of page transfers to scan a whole table of interest versus the number of page transfers to scan each relevant index in accordance with the buffer pool size is determined for a query. The number of page transfers required is determined and scaled down in proportion to the selectivity of any starting and stopping conditions present in the search criteria in the query. A suitable correction factor is applied to the number of transfers to account for few rows remaining to be transferred after applying the search criteria, the buffer pool being large, or a low degree of clustering for an index, as well as accounting for any remaining index sargable search criteria. The search procedure that results in the fewest page transfers is then implemented.

  • System and method for query optimization using quantile values of a large unordered data set

    Issued US US5664171A

    A database management system determines, in a single pass over an unordered database, the quantile information. The system sequentially compares each tuple in the data set to a test value, and then selectively inserts the tuple in a test set having a cardinality less than the cardinality of the data set based upon the comparison. The system next uses the quantile information to estimate the number of tuples in the database which satisfy a user-defined predicate to generate an efficient query plan.
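
To illustrate how single-pass quantile information can feed a selectivity estimate, here is a minimal sketch. It deliberately swaps in standard reservoir sampling (Algorithm R) for the patent's own rule for deciding which tuples enter the smaller test set, so it shows the general idea rather than the patented method. All function names are hypothetical, and the sample is assumed to be non-empty.

```python
import random
from bisect import bisect_right

def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of at most k values in one pass."""
    sample = []
    for n, value in enumerate(stream):
        if n < k:
            sample.append(value)
        else:
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                sample[j] = value
    return sample

def quantiles(sample, q):
    """Return the q-1 boundary values that split the sample into q parts."""
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // q] for i in range(1, q)]

def estimate_selectivity(sample, upper):
    """Estimate the fraction of tuples satisfying `value <= upper`."""
    ordered = sorted(sample)
    return bisect_right(ordered, upper) / len(ordered)

# Hypothetical unordered data set of 1000 distinct values.
rng = random.Random(0)
data = list(range(1000))
rng.shuffle(data)

sample = reservoir_sample(data, 100, rng)
estimated_fraction = estimate_selectivity(sample, 249)
estimated_rows = estimated_fraction * len(data)
```

An optimizer could compare `estimated_rows` across candidate predicates when choosing a plan. The estimate is approximate and depends on the sample size.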

  • Method for high-dimensionality indexing in a multi-media database

    Issued US US5647058A

    A high dimensional indexing method is disclosed which takes a set of objects that can be viewed as N-dimensional data vectors and builds an index which treats the objects like k-dimensional points. The method first defines and applies a set of feature extraction functions that admit some similarity measure for each of the stored objects in the database. The feature vector is then transformed in a manner such that the similarity measure is preserved and that the information of the feature vector v is concentrated in only a few coefficients. The entries of the feature vectors are truncated such that the entries which contribute little on the average to the information of the transformed vectors are removed. An index based on the truncated feature vectors is subsequently built using a point access method (PAM). A preliminary similarity search can then be conducted on the set of truncated transformed vectors using the previously created index to retrieve the qualifying records. A second search on the previously retrieved set of vectors is used to eliminate the false positives and to get the results of the desired similarity search.
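
The transform, truncate, and filter steps in this abstract can be sketched as follows. This is an illustrative example under assumptions: an orthonormal discrete Fourier transform is used as the distance-preserving transform, a linear scan over the truncated vectors stands in for the point access method index, and all function names are hypothetical.

```python
import cmath
import math

def dft(vec):
    """Orthonormal discrete Fourier transform; it preserves Euclidean distance."""
    n = len(vec)
    scale = math.sqrt(n)
    return [
        sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(vec)) / scale
        for f in range(n)
    ]

def features(vec, k):
    """Keep only the first k transformed coefficients."""
    return dft(vec)[:k]

def dist(a, b):
    """Euclidean distance; works for real or complex entries."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

def similarity_search(query, objects, k, eps):
    """Return indexes of objects within distance eps of the query."""
    qf = features(query, k)
    # Preliminary search on truncated vectors. Dropping coefficients can only
    # shrink the distance, so this step may admit false positives but does not
    # dismiss true matches (up to floating-point rounding).
    candidates = [
        i for i, obj in enumerate(objects) if dist(qf, features(obj, k)) <= eps
    ]
    # Second search on the full vectors removes the false positives.
    return [i for i in candidates if dist(query, objects[i]) <= eps]

# Hypothetical 4-dimensional objects, indexed on 2 coefficients.
objects = [[1, 2, 3, 4], [1, 2, 3, 5], [9, 9, 9, 9]]
matches = similarity_search([1, 2, 3, 4], objects, k=2, eps=1.5)
```

In a real system the truncated vectors would be stored in a spatial index so the preliminary search avoids scanning every object.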

  • Method for choosing largest selectivities among eligible predicates of join equivalence classes for query optimization

    Issued US US5469568A

    A method for choosing join selectivities in a query optimizer in a relational database management system is disclosed which facilitates the estimation of join result sizes by a query optimizer in a relational database system, wherein a new relation R is to be joined with an intermediate relation I, and wherein the selectivity values for each eligible join predicate are known. The method has the steps of determining the equivalence classes for a plurality of join attributes and then computing for each relation an estimate of the cardinality and the number of distinct values in each attribute after all the local predicates have been included. These are used in further computation of join selectivities and join result sizes. The join predicates must then be processed by correctly choosing the join selectivities. The join result sizes can then be correctly calculated.

  • Method for optimizing processing of join queries by determining optimal processing order and assigning optimal join methods to each of the join operations

    Issued US US5345585A

    A join optimization method is provided for use with a data processor for optimizing the processing of a query for retrieval of data from a relational computer database. The database is organized by relations and data is retrieved by performing join operations on the relations. The join operations are optimized by randomly selecting an initial order for the join operations, assigning optimal join methods based on the initial order, finding an optimal order based on the assigned methods and repeating a polynomial number of times. The Krishnamurthy, Boral and Zaniolo (KBZ) Algorithm is used to determine a join optimization sequence and further refinement is provided by determining costs for alternate join order sequences using alternate join methods.

Courses

  • Functional Programming Principles in Scala

    -

  • Machine Learning

    -

  • Statistical Learning

    -

  • Statistics

    -

Honors & Awards

  • ACM SIGMOD Test of Time Award

    ACM SIGMOD

    The paper on Mining Association Rules received the 10 Year Test of Time Award as the paper from the 1993 ACM SIGMOD Conference that had the most impact over the subsequent 10 years. This paper is among the 20 most cited papers in Computer Science.

  • President of India Gold Medal

    Indian Institute of Technology, Bombay

    Award given to the valedictorian of the entire graduating class of 1983.

Languages

  • English

    Native or bilingual proficiency

  • Hindi

    Native or bilingual proficiency

  • Kannada

    Native or bilingual proficiency

Organizations

  • ACM

    -

    - Present

Recommendations received

12 people have recommended Arun
