Jay Joshi

Santa Clara, California, United States
4K followers 500+ connections

About

My key technical skill set includes, but is not limited to:

Java, Python, AWS…

Experience

  • LinkedIn

    Mountain View, CA

  • -

    San Francisco Bay Area

  • -

    San Francisco Bay Area

Education

Courses

  • Applied Science – I

    107002

  • Applied Science – II

    107009

  • Basic Electrical Engineering

    103004

  • Computer Networks

    310250

  • Data Communications

    -

  • Data Structures and Algorithms

    210244

  • Database Management Systems

    310241

  • Design & Analysis Algorithm

    410441

  • Digital Electronics and Logic Design

    210243

  • Digital Signal Processing

    -

  • Discrete Structures

    210241

  • Engineering Mathematics – I

    107001

  • Engineering Mathematics – II

    107008

  • Engineering Mathematics – III

    207003

  • Engineering Mechanics

    101010

  • Fundamentals of Programming Languages

    110003

  • Humanities and Social Science

    207005

  • Microprocessors and Microcontrollers

    -

  • Programming & Problem Solving

    210242

  • Business Analytic & Intelligence

    410451

  • Cloud Computing

    15619

  • Computer Architecture and Organization

    210252

  • Computer Graphics

    210251

  • Data Mining Technology & Application

    410444D

  • Data Structures

    210225

  • Data Structures

    210250

  • Data Structures for Application Programmers

    17683

  • Finance and Management Information Systems

    310251

  • J2EE Web Application Development

    17682

  • Java for Application Programmers

    17681

  • Law of Computer Technology

    17662

  • Microprocessors and Interfacing Techniques

    210249

  • Microprocessors and Interfacing

    210254

  • Object Oriented Programming & Computer Graphics Laboratory

    210253

  • Parallel and High Performance Computing

    410449

  • Pervasive Computing

    410445B

  • Principles of Compiler Design

    410442

  • Principles of Programming Languages

    310249

  • Smart System Design & Application

    410443

  • Software Design Method & Test

    410448

  • Systems Programming & Operating System

    310252

  • Theory of Computation

    -

Projects

  • Big Data Analysis - Kaggle Challenge of Home Depot's Product Search Relevance

    Improved Home Depot's customers' shopping experience by developing a model that accurately predicts the relevance of search results, using Apache Spark's MLlib and Python's numpy and pandas libraries in an interactive Jupyter notebook. Created a basic pipeline of Tokenizer, HashingTF, and Linear Regression, then iteratively improved performance by experimenting with other transformers and estimators such as Word2Vec and Random Forest. Preprocessed the data with stemming and stop-word removal. Used cosine similarity, Jaccard similarity, and matched-word counts to compute the resulting feature vectors. Improved performance using Spark's parameter grid tuning and deployed the Spark job on a YARN (Hadoop) cluster. Achieved a final root mean square error (RMSE) of 0.506 within a two-week deadline (for context, the best RMSE among the 2125 participating teams was 0.432).

    Tools, Technologies and APIs used: Apache Spark's MLlib, pandas and numpy libraries from Python, Jupyter/Zeppelin notebooks, Anaconda Python 3 distribution, Hortonworks Data Platform, HDFS
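
    The similarity features named in this project can be illustrated with a minimal pure-Python sketch. It is not the project's Spark code: the whitespace tokenizer, the helper names, and the sample query and title are assumptions made only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simple stand-in for a real tokenizer)."""
    return text.lower().split()


def jaccard_similarity(a, b):
    """Size of the token intersection divided by the size of the token union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine_similarity(a, b):
    """Cosine of the angle between the two term-count vectors."""
    count_a, count_b = Counter(a), Counter(b)
    dot = sum(count_a[t] * count_b[t] for t in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def matched_words(query, title):
    """Number of query tokens that also appear in the title."""
    title_set = set(title)
    return sum(1 for t in query if t in title_set)


# Hypothetical search query and product title, used only to show the shape of the features.
query = tokenize("angle bracket")
title = tokenize("Simpson Strong-Tie 12-Gauge Angle")
features = [
    cosine_similarity(query, title),
    jaccard_similarity(query, title),
    matched_words(query, title),
]
```

    Each (query, product) pair yields one such feature vector, which a regressor can then map to a relevance score.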

  • Auto Text Completion engine

    - Designed and implemented a MapReduce solution to pre-process and clean a large text-based dataset.
    - Built a probabilistic language model using MapReduce batch processing to store the count and probability of phrases up to 5-grams in a 10 GB dataset.
    - Configured and deployed a storage caching layer using Amazon ElastiCache and Redis (Jedis) to improve the performance of the text completion.
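
    The n-gram language model described above can be sketched in memory as follows. This is an illustrative single-machine version, not the MapReduce implementation: the function names, the whitespace tokenization, and the tiny corpus are assumptions.

```python
from collections import Counter, defaultdict


def ngram_counts(lines, max_n=5):
    """Count every n-gram (as a tuple of words) for n = 1..max_n."""
    counts = Counter()
    for line in lines:
        words = line.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def next_word_probabilities(counts):
    """Estimate P(word | prefix) as count(prefix + word) / count(prefix)."""
    model = defaultdict(dict)
    for gram, c in counts.items():
        if len(gram) < 2:
            continue  # unigrams have no prefix to condition on
        prefix, word = gram[:-1], gram[-1]
        model[prefix][word] = c / counts[prefix]
    return model


def complete(model, phrase, k=3):
    """Return up to k most probable next words for the typed phrase."""
    prefix = tuple(phrase.lower().split())
    candidates = model.get(prefix, {})
    return sorted(candidates, key=candidates.get, reverse=True)[:k]


# Hypothetical three-line corpus standing in for the real dataset.
corpus = ["the cat sat", "the cat ran", "the cat sat down"]
model = next_word_probabilities(ngram_counts(corpus))
suggestions = complete(model, "the cat")
```

    In a batch setting, the counting step corresponds to the map and reduce phases, and the resulting prefix-to-suggestions table is what a cache such as Redis would serve at query time.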

  • Uber like driver matching service

    - Generated a stream of data using Kafka producer and made it available for a Samza consumer on AWS.
    - Designed and implemented a solution for a driver matching service like Uber by joining and processing multiple real-time streams of GPS data and driver data using the Samza API.
    - Implemented an auto-scaling cluster with AWS APIs to dynamically adjust server instances based on real-time load.

  • Consistency Models and Multithreading

    - Compared and contrasted the advantages and disadvantages of using replication in distributed key-value stores.
    - Studied the pros and cons of different techniques to improve consistency, availability, and partitioning in a system.
    - Discussed the various levels of consistency that can be employed in a distributed data store.
    - Used multithreading to achieve strong and eventual consistency models for a distributed key-value store in different geographic regions.

  • Data Mining CRM - Product Selection and Success Prediction

    Implemented popular Data Mining algorithms such as K-Nearest Neighbors and Decision Trees, to predict product selection information, as well as the potential success of the newly introduced products, based on buying habits of existing customers and their profiles, and historical data on the sales volume of past products. Cross-validated the algorithms on the given training set and optimized weights programmatically to get maximum accuracy on the training data. Used the optimized weights that yielded the highest accuracy on the given test data set to predict the product selection (89% accuracy) and success likelihood (97% accuracy).

  • Horizontal Scaling and Auto-Scaling Web Application

    • Implemented a horizontally scaling web application with GCP, Azure, and AWS APIs, able to process a load of 3000+ RPS
    • Implemented an auto-scaling web application with AWS APIs to dynamically adjust server instances based on real-time load

  • Contextual Design and UI Testing

    • Designed the user tasks and interviewer script of a contextual inquiry for existing drug store websites
    • Implemented and improved the user interface of a hypothetical online drug store based on contextual inquiry, heuristic analysis, and usability testing

  • Networking solutions for vehicles and infrastructure

    Developed a vehicle-to-vehicle (V2V) networking and communication solution.
    Developed a networking solution for vehicle-to-infrastructure communication.
    Developed a multi-modular infotainment system.

    1. Proposed an application for real-time routing and weather-related navigation using DSRC technology.
    2. Applications within the project explored DSRC, Over-the-Air, CAN, ECU and Wi-Fi technologies.

  • IoT Solutions for Contemporary Healthcare (Ubiquitous Computing)

    Developed contemporary IoT solutions for a healthcare institution.

    1. Proposed applications in surgery, human resources, cost accounting and food services.
    2. Charted cost estimation for the complete project.
    3. Led a team of five people.

  • Cloud-Based Health and Video Service

    - Created a simulated system to score users based on their dietary habits, as part of a cloud computing project at CMU, using AWS EC2, S3, SNS, Lambda, RDS (MySQL), Rekognition, Docker and Clarifai.
    - Modified the existing architecture into a YouTube-like video service that lets users search videos based on their content, using the labels generated in the previous step and CloudSearch. Users can also preview a video by hovering over it.

  • Twitter Analytics Web Service

    - Implemented a web-service to extract tweets and users given trending topics, hashtags and a time frame.
    - Designed and implemented a high performance, fault-tolerant and scalable cloud deployment strategy responding to live load while meeting infrastructure and budgetary needs.
    - Performed ETL on a 1 TB dataset to load data into MySQL and HBase systems using MapReduce and Spark frameworks on AWS, GCP, and Azure.
    - Raised the service's throughput from 3,000 RPS to 10,000 RPS by modeling effective schemas, sharding the database and optimizing server threads while utilizing the same resources.
    - Configured the service to handle data from all languages, including emoji.
    - Deployed the web service using Docker images on Kubernetes across multiple cloud services.

  • Search Engine Optimization

    Performed competitor analysis and keyword research using Google Trends, and computed inverse document frequency to identify attractor and discriminatory terms. Through changes to the URL structure, targeted HTML changes, keyword-concentrated content to increase term frequency, and link building, raised the search ranking of a low-ranked website to within the top 3 in CMU's Indri Search Engine.
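
    As a rough illustration of the inverse document frequency step, the sketch below uses the plain textbook formula idf(t) = log(N / df(t)). The exact weighting used in the project or inside Indri is not stated here, so the formula choice, the function name, and the sample pages are all assumptions.

```python
import math


def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df is the number of documents containing it."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        # Use a set so a term is counted once per document.
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical competitor pages, reduced to a few keywords each.
pages = ["cheap flights pittsburgh", "cheap hotels", "cheap flights"]
idf = inverse_document_frequency(pages)

# Terms that appear on every page score 0 and do not discriminate;
# rare terms score highest.
ranked_terms = sorted(idf, key=idf.get, reverse=True)
```

    Terms with a high score are candidates for discriminatory keywords, while terms with a score near zero appear on most competing pages.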

  • Social Networking Timeline with Heterogeneous Backends

    - Compared the advantages and disadvantages of utilizing flat files, SQL databases, and NoSQL database solutions.
    - Performed ETL for a dataset.
    - Integrated SQL and NoSQL databases to support complex applications when building a social networking website.
    - Responded to complex queries that span multiple databases.
    - Implemented user access functionality such as login and logout on SQL.
    - Stored a social graph using HBase.
    - Built a timeline using MongoDB.
    - Built a recommendation engine for the social network.

  • Big Data Analytics

    - Processed a large text dataset (Wikipedia) using MapReduce running within distributed frameworks on AWS (EMR), Azure (HDInsight) and GCP.
    - Derived popular trends based on tags such as "Donald Trump".

  • Mutual Fund Web Application

    • Implemented a web application that allows users or super users to buy, sell or manage Mutual Funds
    • Led the software design and implementation of the MVC components and deployment on the AWS Cloud
    • Implemented a RESTful web service based on this web application

  • Locating targets through mentions on Twitter (Python) using Sentiment Analysis

    - Categorized a pool of Twitter users based on their location, tweet content, social factors (number of followers, social influence), and user activity to generate a list of the top users likely to retweet.
    - In addition, performed sentiment analysis on the tweets to match the most relevant ones using NLP (NLTK), and ranked the users using a modified version of a Support Vector Machine.

  • Bus Tracker Hybrid Application (iOS and Android) for Port Authority of Allegheny County

    Developed an Android application for bus tracking for the Port Authority of Allegheny County. Core functionality includes: listing the ETA of all buses (via the PAAC API) for a bus stop clicked on the map (rendered using the Google Maps API); and a search toolbar, powered by the Google Places API, that looks up the input destination and displays all possible buses to that destination from the current location. In addition, an alarm feature notifies the user at a custom time prior to the actual arrival of a bus at a chosen stop.

    Tools, Technologies and APIs used: Android SDK, Android Studio, Java, Google Maps API, Google Places API, PAAC API.
