About
I grew up in a sleepy town in Bihar, India, where mathematics was practically in the…
Articles by Shirshanka
Activity
-
It always starts so 𝙧𝙚𝙖𝙨𝙤𝙣𝙖𝙗𝙡𝙮. One team needs context for their AI agent. They spin up a database, build a RAG pipeline, define their…
Liked by Shirshanka Das
-
The UK Supreme Court recently posted a job listing for a data governance officer paying £26,417 a year. That's below London's minimum wage. 🤯 Aldi…
Shared by Shirshanka Das
Experience
Education
Publications
-
All Aboard the Databus! LinkedIn's Scalable Consistent Change Data Capture Platform
ACM Symposium on Cloud Computing
In Internet architectures, data systems are typically categorized into source-of-truth systems that serve as primary stores for the user-generated writes, and derived data stores or indexes which serve reads and other complex queries. The data in these secondary stores is often derived from the primary data through custom transformations, sometimes involving complex processing driven by business logic. Similarly data in caching tiers is derived from reads against the primary data store, but needs to get invalidated or refreshed when the primary data gets mutated. A fundamental requirement emerging from these kinds of data architectures is the need to reliably capture, flow and process primary data changes.
We have built Databus, a source-agnostic distributed change data capture system, which is an integral part of LinkedIn's data processing pipeline. The Databus transport layer provides latencies in the low milliseconds and handles throughput of thousands of events per second per server while supporting infinite look-back capabilities and rich subscription functionality. This paper covers the design, implementation and trade-offs underpinning the latest generation of Databus technology. We also present experimental results from stress-testing the system and describe our experience supporting a wide range of LinkedIn production applications built on top of Databus.
-
Efficient online ad serving in a display advertising exchange
Fourth ACM international conference on Web search and data mining
We introduce and formalize a novel constrained path optimization problem that is the heart of the real-time ad serving task in the Yahoo! (formerly RightMedia) Display Advertising Exchange. In the Exchange, the ad server's task for each display opportunity is to compute, with low latency, an optimal valid path through a directed graph representing the business arrangements between the hundreds of thousands of business entities that are participating in the Exchange. These entities include not only publishers and advertisers, but also intermediate entities called "ad networks" which have delegated their ad serving responsibilities to the Exchange. Path optimality is determined by the payment to the publisher, and is affected by an advertiser's bid and also by the revenue-sharing agreements between the entities in the chosen path leading back to the publisher. Path validity is determined by constraints which focus on the following three issues: 1) suitability of the opportunity's web page and its publisher, 2) suitability of the user who is currently viewing that web page, and 3) suitability of a candidate ad and its advertiser. Because the Exchange's constrained path optimization task is novel, there are no published algorithms for it. This paper describes two different algorithms that have both been successfully used in the actual Yahoo! ad server. The first algorithm has the advantage of being extremely simple, while the second is more robust thanks to its polynomial worst-case running time. In both cases, meeting latency caps has required that the basic algorithms be improved by optimizations; we will describe a candidate ordering scheme and a pre-computation scheme that have both been effective in reducing latency in the real ad serving system that serves over ten billion ad calls per day.
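As a rough illustration of the problem described in the abstract above, the sketch below enumerates valid paths by brute-force depth-first search and keeps the one that pays the publisher most. It is not claimed to be either of the paper's two algorithms. The graph encoding, the `pass_through` fractions, the `is_valid` predicate, and the name `best_path` are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical encoding: edges[u] lists (v, pass_through) pairs, where
# pass_through is the fraction of revenue received by v that flows back to u.
Edges = Dict[str, List[Tuple[str, float]]]


def best_path(
    edges: Edges,
    bids: Dict[str, float],
    publisher: str,
    is_valid: Callable[[str], bool],
) -> Tuple[float, Optional[List[str]]]:
    """Return (payment to publisher, path) for the best valid path found."""
    best_payment = 0.0
    best: Optional[List[str]] = None

    def dfs(node: str, path: List[str], multiplier: float) -> None:
        nonlocal best_payment, best
        # A node with a bid is an advertiser: score the path that ends here.
        if node in bids:
            payment = bids[node] * multiplier
            if payment > best_payment:
                best_payment, best = payment, list(path)
        for nxt, pass_through in edges.get(node, []):
            # Skip cycles and entities that fail the validity constraints.
            if nxt in path or not is_valid(nxt):
                continue
            path.append(nxt)
            dfs(nxt, path, multiplier * pass_through)
            path.pop()

    dfs(publisher, [publisher], 1.0)
    return best_payment, best


if __name__ == "__main__":
    demo_edges = {"pub": [("netA", 0.9), ("adv1", 1.0)], "netA": [("adv2", 0.8)]}
    demo_bids = {"adv1": 1.0, "adv2": 2.0}
    print(best_path(demo_edges, demo_bids, "pub", lambda entity: True))
```

Exhaustive enumeration like this can take exponential time on dense graphs, which is the kind of worst case the abstract's second, polynomial-time algorithm is said to avoid.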
-
Data Infrastructure at LinkedIn
Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
LinkedIn is among the largest social networking sites in the world. As the company has grown, our core data sets and request processing requirements have grown as well. In this paper, we describe a few selected data infrastructure projects at LinkedIn that have helped us accommodate this increasing scale. Most of those projects build on existing open source projects and are themselves available as open source. The projects covered in this paper include: (1) Voldemort: a scalable and fault-tolerant key-value store, (2) Databus: a framework for delivering database changes to downstream applications, (3) Espresso: a distributed data store that supports flexible schemas and secondary indexing, (4) Kafka: a scalable and efficient messaging system for collecting various user activity events and log data.
Patents
-
Bid gateway architecture for an online advertisement bidding system
Issued US 8,135,626
An online advertising system integrates third party agents to permit the third party agents to participate in auctions to bid on a per opportunity basis. An advertising exchange module receives requests for opportunities to serve online advertisements to users. In response, an advertising exchange module applies one or more business rules to determine third party agents that qualify to serve the online advertisement. A bid gateway module generates and transmits requests for bids to the third party agents. The bid gateway module then receives bids from the third party agents in response to the requests for bids. The advertising exchange module then selects an advertisement based on the bid. The online advertisement exchange system provides a unified marketplace to permit integrator networks to bid on both ads pursuant to guaranteed contracts and ads not subject to guaranteed contracts (e.g., non-guaranteed ads). The online advertisement system further includes traffic management to allow the third parties to regulate bid requests sent from the online advertisement system. In some embodiments, the online advertising system caches bids, to efficiently implement the per opportunity auction, and transmits information, such as targeting information, to the third party agents to aid in the third party agents' formulation of bids.
Projects
-
Databus
- Present
Databus provides a timeline-consistent stream of change capture events for a database. It enables applications to watch a database, view and process updates in near real-time. Databus provides a complete after-image of every new/changed record as well as deletes, while maintaining timeline consistency and transactional boundaries. The application integration is decoupled from the source database, and each application integration is isolated, which allows for parallel development and rapid innovation.
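To make the description above concrete, here is a minimal sketch of how a consumer of such a change stream might maintain a derived copy of the data. It does not use the real Databus client API. The `ChangeEvent` fields (`scn`, `after_image`, `end_of_txn`) and the `DerivedStore` class are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChangeEvent:
    scn: int                      # hypothetical commit sequence number
    key: str
    after_image: Optional[dict]   # full record after the change; None = delete
    end_of_txn: bool              # True on the last event of a transaction


@dataclass
class DerivedStore:
    rows: Dict[str, dict] = field(default_factory=dict)
    last_scn: int = 0
    _pending: List[ChangeEvent] = field(default_factory=list)

    def on_event(self, event: ChangeEvent) -> None:
        # Ignore events from transactions that were already applied,
        # so replaying part of the stream is harmless.
        if event.scn <= self.last_scn:
            return
        self._pending.append(event)
        if not event.end_of_txn:
            return
        # Apply the whole transaction at once to respect its boundary.
        for pending in self._pending:
            if pending.after_image is None:
                self.rows.pop(pending.key, None)
            else:
                self.rows[pending.key] = pending.after_image
        self.last_scn = event.scn
        self._pending.clear()


if __name__ == "__main__":
    store = DerivedStore()
    store.on_event(ChangeEvent(1, "member:1", {"name": "Ada"}, True))
    store.on_event(ChangeEvent(2, "member:1", None, True))
    print(store.rows, store.last_scn)
```

Because each event carries a complete after-image, the consumer can overwrite its copy of a record without reading the source database, which is one way the decoupling described above can work in practice.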
Recommendations received
2 people have recommended Shirshanka
More activity by Shirshanka
-
The teams that successfully derive business value from enterprise AI are the ones who understand their data best. That means rich context from…
Liked by Shirshanka Das
-
In 2009 I joined LinkedIn as a senior software engineer. Today I become its CTO, Engineering - an opportunity I don’t take lightly. I couldn’t have…
Liked by Shirshanka Das
-
#GartnerDA London is just around the corner! DataHub's CEO Swaroop Jagadish and Alexandra Cracian PhD of OVO Energy are speaking on operationalizing…
Liked by Shirshanka Das
-
We started DataHub as a metadata platform. Catalog, lineage, governance. Over the years, talking to thousands of data leaders and watching AI go from…
Posted by Shirshanka Das
-
Pinterest. Omni. And one more thing 👀 DataHub's April Town Hall is this Thursday — and it might be our best lineup yet. We're going deep on…
Shared by Shirshanka Das
-
DataHub CTO and co-founder Shirshanka Das discusses how organizations are spending too much time tuning prompts and juggling the latest foundational…
Liked by Shirshanka Das
-
Keqiang L., Martin Yau, and Aman Gairola of Pinterest built a unified context and intent layer on DataHub that powers Analytics Agents and…
Liked by Shirshanka Das
-
🚀 Scotia Intelligence is live. A few months ago, this was a slide in a deck. Today, it’s a suite of AI tools - powered by a robust data lakehouse -…
Liked by Shirshanka Das