Shirshanka Das

San Francisco Bay Area
9K followers 500+ connections

About

I grew up in a sleepy town in Bihar, India, where mathematics was practically in the…

Experience

Education

Publications

  • All Aboard the Databus! LinkedIn's Scalable Consistent Change Data Capture Platform

    ACM Symposium on Cloud Computing

    In Internet architectures, data systems are typically categorized into source-of-truth systems that serve as primary stores for the user-generated writes, and derived data stores or indexes which serve reads and other complex queries. The data in these secondary stores is often derived from the primary data through custom transformations, sometimes involving complex processing driven by business logic. Similarly, data in caching tiers is derived from reads against the primary data store, but needs to get invalidated or refreshed when the primary data gets mutated. A fundamental requirement emerging from these kinds of data architectures is the need to reliably capture, flow and process primary data changes.

    We have built Databus, a source-agnostic distributed change data capture system, which is an integral part of LinkedIn's data processing pipeline. The Databus transport layer provides latencies in the low milliseconds and handles throughput of thousands of events per second per server while supporting infinite look back capabilities and rich subscription functionality. This paper covers the design, implementation and trade-offs underpinning the latest generation of Databus technology. We also present experimental results from stress-testing the system and describe our experience supporting a wide range of LinkedIn production applications built on top of Databus.

  • Efficient online ad serving in a display advertising exchange

    Fourth ACM international conference on Web search and data mining

    We introduce and formalize a novel constrained path optimization problem that is the heart of the real-time ad serving task in the Yahoo! (formerly RightMedia) Display Advertising Exchange. In the Exchange, the ad server's task for each display opportunity is to compute, with low latency, an optimal valid path through a directed graph representing the business arrangements between the hundreds of thousands of business entities that are participating in the Exchange. These entities include not only publishers and advertisers, but also intermediate entities called "ad networks" which have delegated their ad serving responsibilities to the Exchange. Path optimality is determined by the payment to the publisher, and is affected by an advertiser's bid and also by the revenue-sharing agreements between the entities in the chosen path leading back to the publisher. Path validity is determined by constraints which focus on the following three issues: 1) suitability of the opportunity's web page and its publisher, 2) suitability of the user who is currently viewing that web page, and 3) suitability of a candidate ad and its advertiser. Because the Exchange's constrained path optimization task is novel, there are no published algorithms for it. This paper describes two different algorithms that have both been successfully used in the actual Yahoo! ad server. The first algorithm has the advantage of being extremely simple, while the second is more robust thanks to its polynomial worst-case running time. In both cases, meeting latency caps has required that the basic algorithms be improved by optimizations; we will describe a candidate ordering scheme and a pre-computation scheme that have both been effective in reducing latency in the real ad serving system that serves over ten billion ad calls per day.

  • Data Infrastructure at LinkedIn

    Proceedings of the 2012 IEEE 28th International Conference on Data Engineering

    LinkedIn is among the largest social networking sites in the world. As the company has grown, our core data sets and request processing requirements have grown as well. In this paper, we describe a few selected data infrastructure projects at LinkedIn that have helped us accommodate this increasing scale. Most of those projects build on existing open source projects and are themselves available as open source. The projects covered in this paper include: (1) Voldemort: a scalable and fault-tolerant key-value store, (2) Databus: a framework for delivering database changes to downstream applications, (3) Espresso: a distributed data store that supports flexible schemas and secondary indexing, (4) Kafka: a scalable and efficient messaging system for collecting various user activity events and log data.

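The constrained path optimization described in the ad-exchange abstract above can be illustrated with a toy sketch. This is not either of the two algorithms from the paper, whose details are not given here; it is a plain exhaustive depth-first search under assumptions made only for illustration: each link carries a hypothetical revenue-share fraction passed back toward the publisher, the publisher payment is the advertiser's bid multiplied by the fractions along the path, and validity is reduced to a single per-entity user blocklist. All names (`Entity`, `best_path`, `blocked_users`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A publisher, ad network, or advertiser (hypothetical model)."""
    name: str
    bid: float = 0.0  # assumption: only advertisers have a positive bid
    blocked_users: frozenset = frozenset()  # toy stand-in for validity constraints
    # downstream entity name -> fraction of revenue passed back upstream
    links: dict = field(default_factory=dict)


def best_path(graph, publisher, user):
    """Exhaustively search for the valid path that pays the publisher most.

    Returns (payment, path); path is None when no valid path exists.
    """
    best = (0.0, None)

    def visit(name, share, path):
        nonlocal best
        node = graph[name]
        if user in node.blocked_users:
            return  # this entity rejects the user, so the path is invalid
        if node.bid > 0:
            payment = node.bid * share
            if payment > best[0]:
                best = (payment, path)
        for nxt, fraction in node.links.items():
            if nxt not in path:  # avoid revisiting an entity (no cycles)
                visit(nxt, share * fraction, path + [nxt])

    visit(publisher, 1.0, [publisher])
    return best


graph = {
    "pub": Entity("pub", links={"netA": 0.8, "adv1": 0.9}),
    "netA": Entity("netA", links={"adv2": 0.9}),
    "adv1": Entity("adv1", bid=2.0, blocked_users=frozenset({"u1"})),
    "adv2": Entity("adv2", bid=2.0),
}
```

With this toy graph, `best_path(graph, "pub", "u2")` picks the direct advertiser, while for user `"u1"` the direct advertiser is invalid and the search falls back to the path through the ad network. Exhaustive search is exponential in the worst case, which is consistent with the abstract's emphasis on latency optimizations and a polynomial-time alternative.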

Patents

  • Bid gateway architecture for an online advertisement bidding system

    Issued US 8,135,626

    An online advertising system integrates third party agents to permit the third party agents to participate in auctions to bid on a per opportunity basis. An advertising exchange module receives requests for opportunities to serve online advertisements to users. In response, an advertising exchange module applies one or more business rules to determine third party agents that qualify to serve the online advertisement. A bid gateway module generates and transmits requests for bids to the third party agents. The bid gateway module then receives bids from the third party agents in response to the requests for bids. The advertising exchange module then selects an advertisement based on the bid. The online advertisement exchange system provides a unified marketplace to permit integrator networks to bid on both ads pursuant to guaranteed contracts and ads not subject to guaranteed contracts (e.g., non-guaranteed ads). The online advertisement system further includes traffic management to allow the third parties to regulate bid requests sent from the online advertisement system. In some embodiments, the online advertising system caches bids, to efficiently implement the per opportunity auction, and transmits information, such as targeting information, to the third party agents to aid in the third party agents' formulation of bids.

  • MIDDLEWARE DATA LOG SYSTEM

    Filed US 13/296,894


Projects

  • Databus

    - Present

    Databus provides a timeline-consistent stream of change capture events for a database. It enables applications to watch a database, view and process updates in near real-time. Databus provides a complete after-image of every new/changed record as well as deletes, while maintaining timeline consistency and transactional boundaries. The application integration is decoupled from the source database, and each application integration is isolated, which allows for parallel development and rapid innovation.

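The idea of a timeline-consistent change stream with full after-images and transactional boundaries can be sketched in a few lines. The following is a toy in-memory model, not the Databus API: `ToyRelay`, `ChangeEvent`, `read_since`, and `apply_events` are hypothetical names, and the single increasing sequence number per transaction is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    scn: int                # sequence number shared by all events of one transaction
    key: str                # primary key of the changed record
    after: Optional[dict]   # complete after-image, or None for a delete


class ToyRelay:
    """Keeps committed changes in commit order (hypothetical, in-memory)."""

    def __init__(self):
        self._log = []
        self._next_scn = 1

    def commit(self, changes):
        """Append one transaction given as a list of (key, after_image_or_None)."""
        scn = self._next_scn
        self._next_scn += 1
        self._log.extend(ChangeEvent(scn, key, after) for key, after in changes)
        return scn

    def read_since(self, checkpoint):
        """Return all events with scn greater than checkpoint, in commit order."""
        return [event for event in self._log if event.scn > checkpoint]


def apply_events(view, events, checkpoint):
    """Apply events to a derived view and return the new checkpoint."""
    for event in events:
        if event.after is None:
            view.pop(event.key, None)       # delete
        else:
            view[event.key] = event.after   # insert or update
        checkpoint = event.scn
    return checkpoint


relay = ToyRelay()
relay.commit([("m1", {"name": "a"}), ("m2", {"name": "b"})])  # one transaction
relay.commit([("m1", None)])                                   # delete m1

view = {}
checkpoint = apply_events(view, relay.read_since(0), 0)
```

Each consumer keeps its own checkpoint and derived view, so one integration can lag or restart without affecting another, which is one way to picture the isolation between application integrations that the project description mentions.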
