About
I love solving hard problems that lead to personal learning and growth. A key theme of my…
Volunteer Experience
-
Coach
Team Asha
- Present 12 years
Education
Team Asha is a premier endurance sports training program. It provides personalized coaching, motivation and support for individuals whose goal is to run a half or full marathon or to bike 100 km or 100 miles. Since 2000, Team Asha has helped hundreds of people realize their marathon and endurance biking aspirations. People who train with Team Asha raise funds to support educational initiatives for underprivileged children in India. Please visit https://team-asha.org
-
Counselor
KARA
- 1 year
Health
Kara's mission is to provide grief support for children, teens, families and adults. Clients include those who are grieving a death as well as those coping with a terminal illness (their own or another's).
-
Counselor
Santa Clara County Suicide and Crisis Service
- 10 years
Health
The Santa Clara County Suicide & Crisis Hotline is a 24-hour, toll-free, confidential suicide prevention hotline. The toll-free number is 1-855-278-4204. The service is available 7 days a week for phone intervention and emotional support by highly trained volunteer Crisis Phone Counselors.
Publications
-
Publications and Patents
Over 40 publications and patents in areas ranging from database query optimization to data mining. The complete list is available in Google Scholar (https://goo.gl/I58olh).
Patents
-
Method and system for anonymizing activity records
Filed US US20170098093A1
A method for processing activity records. The method includes obtaining an activity record, and generating an anonymization dictionary. Generating the anonymization dictionary includes detecting, in the activity record, a set of target entities to be anonymized, making a determination that a resource is associated with a subset of the target entities of the set of target entities, and after making the determination, assigning an anonymized identity to the subset of target entities, and generating an anonymization identifier for each target entity in the subset of target entities to obtain a set of anonymization identifiers, each including the anonymized identity. The method further includes processing the activity record using the anonymization dictionary to obtain an anonymized activity record and storing the anonymized activity record.
-
System and method of efficiently representing and searching directed acyclic graph structures in databases
Issued US US7580918B2
The present disclosure includes systems and techniques relating to representation and retrieval of data structures in databases. In general, embodiments of the invention feature a computer program product and a method including storing a generalized directed acyclic graph (DAG) in a database, wherein the storing includes encoding path information of the generalized DAG in entries of a path table in the database, the encoding includes converting the path information into text strings, and the entries of the path table correspond to paths in the generalized DAG from nodes of the generalized DAG to a root node of the generalized DAG; triggering generation of a lexical index of the path table using the text strings, wherein the lexical index separately lists tokens included in the entries; and retrieving one or more portions of the generalized DAG from the database for in-memory operations.
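As a rough illustration of the path-table idea described above, the sketch below stores every node-to-root path of a small DAG as a text string and builds a token index over those strings. The function names, the "/" separator, and the sample graph are assumptions made for illustration only; they are not taken from the patent.

```python
from collections import defaultdict


def paths_to_root(node, parents, root, memo):
    """Return every path (as a list of node names) from `node` up to `root`."""
    if node in memo:
        return memo[node]
    if node == root:
        result = [[root]]
    else:
        # A node with several parents contributes one path per parent path.
        result = [[node] + rest
                  for parent in parents.get(node, [])
                  for rest in paths_to_root(parent, parents, root, memo)]
    memo[node] = result
    return result


def build_path_table(parents, root):
    """Encode each node-to-root path as a text string: rows of (node, path)."""
    memo = {}
    rows = []
    for node in sorted(set(parents) | {root}):
        for path in paths_to_root(node, parents, root, memo):
            rows.append((node, "/".join(path)))
    return rows


def build_token_index(rows):
    """Map each token (node name) to the ids of the rows whose path contains it."""
    index = defaultdict(set)
    for row_id, (_, path_string) in enumerate(rows):
        for token in path_string.split("/"):
            index[token].add(row_id)
    return index


def descendants(rows, index, ancestor):
    """Nodes below `ancestor`: every node whose path mentions it, except itself."""
    return sorted({rows[i][0] for i in index.get(ancestor, ())} - {ancestor})


# Hypothetical DAG: "c" has two parents, so it appears in two path rows.
parents = {"a": ["root"], "b": ["root"], "c": ["a", "b"], "d": ["c"]}
rows = build_path_table(parents, "root")
index = build_token_index(rows)
```

With this sample graph, `descendants(rows, index, "a")` looks up a single token in the index instead of walking the graph, which is the kind of lookup a lexical index over the path strings makes cheap.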
-
Method for external sorting in shared-nothing parallel architectures
Issued US US5845113A
A system and method are provided for parallel sorting of a relation in distributed relational databases, wherein the relation is a set of tuples to be sorted on multiple sort sites. The method completely decouples the return phase from the sort phase in order to eliminate the merge phase. The method involves selecting one coordinator site from any of the available logical sites, then generating and sorting a local sample on each of the available storage sites before sending the local random sample from each storage site to the designated coordinator site, wherein the local random samples are merged to provide a single global sample. The coordinator site determines the global interval key values based on the global sample. The interval key values are determined such that each interval fits in a single sort site's main memory, wherein the tuples between two interval key values define the interval. The interval key values are sent to the various storage sites, wherein each storage site scans its portion of the relation in order to determine for each tuple the assigned interval and its corresponding sort site before sending each tuple to the assigned sort site. At each sort site the tuples are stored in temporary files, using a single temporary file for each interval. Then, for each interval on each sort site, the interval is read and sorted in memory in any fashion before the tuples of the sorted interval are sent to the sink site.
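The flow of this abstract can be sketched in a few lines, assuming a single process in which plain Python lists stand in for storage sites, sort sites, and temporary files. The function name `sample_sort`, the `sample_size` and `seed` parameters, and the evenly spaced choice of interval keys are illustrative assumptions; the sketch also ignores the requirement that each interval fit in a sort site's main memory.

```python
import bisect
import random


def sample_sort(storage_sites, num_intervals, sample_size=4, seed=0):
    """Sort values spread over several 'storage sites' without a final merge."""
    rng = random.Random(seed)

    # Each storage site draws and sorts a local random sample.
    local_samples = [sorted(rng.sample(site, min(sample_size, len(site))))
                     for site in storage_sites]

    # The coordinator merges the local samples into one global sample.
    global_sample = sorted(v for sample in local_samples for v in sample)

    # The coordinator picks num_intervals - 1 interval keys from the sample.
    if global_sample:
        keys = [global_sample[(len(global_sample) * i) // num_intervals]
                for i in range(1, num_intervals)]
    else:
        keys = []

    # Each storage site scans its tuples and routes each one to its interval.
    intervals = [[] for _ in range(num_intervals)]
    for site in storage_sites:
        for value in site:
            intervals[bisect.bisect_right(keys, value)].append(value)

    # Each interval is sorted on its own. Because the intervals cover disjoint,
    # ordered key ranges, concatenating them needs no merge step.
    result = []
    for interval in intervals:
        result.extend(sorted(interval))
    return result
```

For example, `sample_sort([[9, 3, 7, 1], [8, 2, 6], [5, 4, 0, 10]], 3)` returns the eleven values in ascending order, whichever interval keys the sample happens to produce.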
-
Computer program product for optimizing data retrieval using index scanning
Issued US US5778353A
A method of index scanning involves scanning one or more selected indexes and determining the number of data transfers required to traverse all or a portion of a selected index for a selected number of buffer pool sizes. The number of page transfers to scan a whole table of interest versus the number of page transfers to scan each relevant index in accordance with the buffer pool size is determined for a query. The number of page transfers required is determined and scaled down in proportion to the selectivity of any starting and stopping conditions present in the search criteria in the query. A suitable correction factor is applied to the number of transfers to account for few rows remaining to be transferred after applying the search criteria, the buffer pool being large, or a low degree of clustering for an index, as well as accounting for any remaining index sargable search criteria. The search procedure that results in the fewest page transfers is then implemented.
-
System and method for query optimization using quantile values of a large unordered data set
Issued US US5664171A
A database management system determines, in a single pass over an unordered database, the quantile information. The system sequentially compares each tuple in the data set to a test value, and then selectively inserts the tuple in a test set having a cardinality less than the cardinality of the data set based upon the comparison. The system next uses the quantile information to estimate the number of tuples in the database which satisfy a user-defined predicate to generate an efficient query plan.
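To make the idea concrete, here is a minimal sketch of one-pass quantile estimation used for selectivity estimation. It does not reproduce the patented procedure: it swaps in plain reservoir sampling (Algorithm R) to keep a small set of values from a single pass, then reads approximate quantiles and predicate selectivities off the sorted sample. All function names and parameters are assumptions made for illustration.

```python
import bisect
import random


def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of at most k values from a single pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, value in enumerate(stream):
        if i < k:
            reservoir.append(value)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = value
    return sorted(reservoir)


def quantiles(sorted_sample, q):
    """Return q - 1 approximate cut points that split the data into q parts."""
    n = len(sorted_sample)
    return [sorted_sample[(n * i) // q] for i in range(1, q)]


def estimate_selectivity(sorted_sample, bound):
    """Estimate the fraction of rows satisfying `column <= bound`."""
    return bisect.bisect_right(sorted_sample, bound) / len(sorted_sample)
```

An optimizer could multiply `estimate_selectivity(sample, bound)` by the table cardinality to estimate how many tuples satisfy a range predicate, and use that estimate when comparing candidate query plans.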
-
Method for high-dimensionality indexing in a multi-media database
Issued US US5647058A
A high dimensional indexing method is disclosed which takes a set of objects that can be viewed as N-dimensional data vectors and builds an index which treats the objects like k-dimensional points. The method first defines and applies a set of feature extraction functions that admit some similarity measure for each of the stored objects in the database. The feature vector is then transformed in a manner such that the similarity measure is preserved and that the information of the feature vector v is concentrated in only a few coefficients. The entries of the feature vectors are truncated such that the entries which contribute little on the average to the information of the transformed vectors are removed. An index based on the truncated feature vectors is subsequently built using a point access method (PAM). A preliminary similarity search can then be conducted on the set of truncated transformed vectors using the previously created index to retrieve the qualifying records. A second search on the previously retrieved set of vectors is used to eliminate the false positives and to get the results of the desired similarity search.
-
Method for choosing largest selectivities among eligible predicates of join equivalence classes for query optimization
Issued US US5469568A
A method for choosing join selectivities in a query optimizer in a relational database management system is disclosed which facilitates the estimation of join result sizes by a query optimizer in a relational database system, wherein a new relation R is to be joined with an intermediate relation I, and wherein the selectivity values for each eligible join predicate are known. The method has the steps of determining the equivalence classes for a plurality of join attributes and then computing for each relation an estimate of the cardinality and the number of distinct values in each attribute after all the local predicates have been included. These are used in further computation of join selectivities and join result sizes. The join predicates must then be processed by correctly choosing the join selectivities. The join result sizes can then be correctly calculated.
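The rule named in the patent title can be illustrated with a small sketch, assuming each eligible join predicate is given as a pair of an equivalence-class label and a selectivity, and assuming that different equivalence classes are independent so their selectivities multiply. The function name and the input layout are hypothetical and are not taken from the patent.

```python
from math import prod


def join_result_size(card_r, card_i, eligible_predicates):
    """Estimate the size of R joined with I.

    `eligible_predicates` is a list of (equivalence_class, selectivity) pairs.
    Within one equivalence class only the largest selectivity is used;
    multiplying all of them would count the same constraint more than once.
    """
    largest = {}
    for eq_class, selectivity in eligible_predicates:
        largest[eq_class] = max(selectivity, largest.get(eq_class, 0.0))
    # Independence across classes is an assumption of this sketch.
    return card_r * card_i * prod(largest.values())
```

For example, with |R| = 1000, |I| = 500, two eligible predicates in class "x" with selectivities 0.01 and 0.002, and one predicate in class "y" with selectivity 0.1, the estimate uses 0.01 and 0.1 and comes to about 500 tuples.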
-
Method for optimizing processing of join queries by determining optimal processing order and assigning optimal join methods to each of the join operations
Issued US US5345585A
A join optimization method is provided for use with a data processor for optimizing the processing of a query for retrieval of data from a relational computer database. The database is organized by relations and data is retrieved by performing join operations on the relations. The join operations are optimized by randomly selecting an initial order for the join operations, assigning optimal join methods based on the initial order, finding an optimal order based on the assigned methods and repeating a polynomial number of times. The Krishnamurthy, Boral and Zaniolo (KBZ) Algorithm is used to determine a join optimization sequence and further refinement is provided by determining costs for alternate join order sequences using alternate join methods.
Courses
-
Functional Programming Principles in Scala
-
-
Machine Learning
-
-
Statistical Learning
-
-
Statistics
-
Honors & Awards
-
ACM SIGMOD Test of Time Award
ACM SIGMOD
The paper on Mining Association Rules received the 10 Year Test of Time Award as the paper published at the 1993 ACM SIGMOD Conference that had the most impact over the subsequent 10 years. This paper is among the 20 most cited papers in Computer Science.
-
President of India Gold Medal
Indian Institute of Technology, Bombay
Award given to the valedictorian of the entire graduating class of 1983.
Languages
-
English
Native or bilingual proficiency
-
Hindi
Native or bilingual proficiency
-
Kannada
Native or bilingual proficiency
Organizations
-
ACM
-
- Present
Recommendations received
12 people have recommended Arun