Joy Zhang

United States
9K followers 500+ connections

Experience

  • GEICO

  • Menlo Park, CA

  • Silicon Valley

Volunteer Experience

  • Coach

    Palo Alto Little League

    4 months

    Children

    Coach of the Rockies, a Palo Alto Little League T-ball team

  • Team Manager

    STANFORD SOCCER CLUB

    Present, 4 years 10 months

    Children

    Co-manage the soccer team

Patents

  • Systems and methods for training a machine learning model for a second language based on a machine learning model for a first language

    Issued US20230169388A1

    Systems, methods, and non-transitory computer readable media can train a machine learning model for a first language to determine a classification for a content item in the first language. Machine translation can be performed to generate respective machine translations of a plurality of content items in a second language into the first language. Respective classifications for the plurality of content items in the second language can be determined based on the machine translations of the plurality of content items in the second language and the machine learning model for the first language. Training data in the second language can be automatically generated, where the training data in the second language includes the plurality of content items in the second language and the respective classifications.

  • Labeling video files using acoustic vectors

    Issued 11,372,917

    In one embodiment, a method includes receiving a video file. The video file includes a corresponding audio stream. The method further includes accessing the audio stream, and generating, based on the audio stream, a representative vector. The vector has a particular number of dimensions. The method further includes accessing a label-embedding space, which has the same particular number of dimensions, and includes a number of regions that each correspond to a respective label. The method further includes determining a region of the label-embedding space that corresponds to the vector, the determined region corresponding to a particular label. The method further includes associating the particular label with the video file.

  • Systems and methods for scraping URLs based on viewport views

    Issued US 11,195,106

    Systems, methods, and non-transitory computer readable media are configured to receive a uniform resource locator. A time and one or more features associated with the uniform resource locator can be provided to a first machine learning model. A prediction relating to a quantity of views the uniform resource locator achieves by the time can be received from the first machine learning model.

  • Post topic classification

    Issued US 11,144,826

    In one embodiment, a method includes accessing an input vector representing an input post, wherein: the vector space comprises clusters each associated with a topic; each cluster was determined based on a clustering of training-page vectors corresponding to training pages that each comprise training posts, each training post submitted by a user to a training page and comprises content selected by the user; and each training-page vector was generated by an ANN that was trained, based on the training posts of training pages associated with the ANN, to receive a post and then output a probability that the received post is related to the training posts of the training pages; determining that the input vector is located within a particular cluster in the vector space; and determining a topic of the input post based on the topic associated with the particular cluster that the input vector is located within.

  • Social hash for language models

    Issued US 10,902,221

    Components of language processing engines, such as translation models and language models, can be customized for groups of users or based on user type values. Users can be organized into groups or assigned a value on a continuum based on factors such as interests, biographical characteristics, social media interactions, etc. In some implementations, translation engine components can be customized for groups of users by selecting the training data from content created by users in that group. In some implementations, the group identifier or continuum value can be part of the input into a general translation component allowing the translation component to take a language style of that user group into account when performing language processing tasks.

  • Automatic personalized story generation for visual media

    Issued US 10664664

    Exemplary embodiments relate to the automatic generation of captions for visual media, including photos, photo albums, non-live video, and live video. The visual media may be analyzed to determine contextual information (such as location information, people and objects in the video, time, etc.). A system may integrate this information with information from the user's social network and a personalized language model built using public-facing language from the user. The personalized language model captures the user's way of speaking to make the generated captions more detailed and personalized. The language model may account for the context in which the video was generated. The captions may be used to simplify and encourage content generation, and may also be used to index visual media, rank the media, and recommend the media to users likely to engage with the media.

  • Extracting questions and answers

    Issued US 10,762,438

    A system for answering user questions can provide answers from a knowledge base that stores question/answer pairs. These pairs can be associated with characteristics of the asking user so that, when subsequent users ask similar questions, answers can be selected that have been identified as most relevant to that type of user. The question/answer pairs in the knowledge base can be identified from social media posts where the original post contains a question and one or more comments on the post provide an answer. Posts can be identified as containing a question using a question classification model. A post comment can be identified as an answer based on: whether the question poster responded positively to the comment, whether the comment has similar keywords to the question, whether the comment has the characteristics of an answer, and how often a similar answer has been provided for similar questions.

  • User clustering in a latent space for identifying user interest in a content item

    Issued US 10,740,825

    An online system targets users with a candidate content item. The online system generates a user embedding for each of a plurality of users by identifying content items interacted with by the user, identifying one or more keywords within the content items, identifying word embeddings for each of the identified keywords, and generating the user embedding with the word embeddings. The online system clusters the user embeddings while generating a cluster embedding for each cluster. The online system generates a targeting embedding for the candidate content item. Then the online system generates a score for each cluster based on a comparison of the targeting embedding with each cluster embedding. From the generated scores, the online system ranks and selects some clusters for presentation of the candidate content item.

    Other inventors
    • Yang Yang
  • Crowdsourced chatbot answers

    Issued US 10,692,006

    A chatbot can use a knowledge base including question/answer pairs to respond to questions. When a question is asked that does not correspond to a question/answer pair in the knowledge base, the chatbot can send the question to one or more humans to obtain an answer. However, only some people will have the experience, context, knowledge, etc., to answer the question. A model can be trained to select "experts" that are likely to be able to provide a good answer to a question by using both A) a vector comprising characteristics of questions and of the person posing the questions and B) a vector comprising characteristics of a possible expert. The model can be trained to produce a value predicting how good an identified expert's answer is likely to be. The model can be trained based on measures of past answers provided for types of questions/questioners.

  • Systems and methods for training machine learning models for language clusters

    Issued US 10,685,188

    Systems, methods, and non-transitory computer readable media can generate a plurality of language clusters based on one or more of: language similarity between languages or social behavior similarity between languages. A representative language for a language cluster of the plurality of language clusters can be determined. For the language cluster of the plurality of language clusters, a machine learning model can be trained based on the representative language for the language cluster to classify content items in languages included in the language cluster.

  • Language-agnostic understanding

    Issued US 10,657,332

    Exemplary embodiments relate to techniques to classify or detect the intent of content written in a language for which a classifier does not exist. These techniques involve building a code-switching corpus via machine translation, generating a universal embedding for words in the code-switching corpus, training a classifier on the universal embeddings to generate an embedding mapping/table; accessing new content written in a language for which a specific classifier may not exist, and mapping entries in the embedding mapping/table to the universal embeddings. Using these techniques, a classifier can be applied to the universal embedding without needing to be trained on a particular language. Exemplary embodiments may be applied to recognize similarities in two content items, make recommendations, find similar documents, perform deduplication, and perform topic tagging for stories in foreign languages.

  • Deep translations

    Issued US10586168B2

  • Incorporation of user-provided natural language translations in a social networking system

    Issued US10528677B1

  • Associating a user identity with a mobile device identity

    Issued US 10,354,145

    A system includes, in one aspect, one or more processing devices that perform operations comprising: detecting one or more human objects in images captured by a visual image recording device; obtaining a motion time series for each of the detected one or more human objects using the captured images; obtaining a received signal strength (RSS) time series for each of the one or more mobile devices, based on received RF signals from the one or more mobile devices; and generating an association between (i) identifying data for a first mobile device of the one or more mobile devices, and (ii) identifying data for one of the one or more human objects representing a first human, wherein the first mobile device has an RSS time series that fluctuates at a time period corresponding to movement in the obtained motion time series for the one of the one or more human objects representing the first human.

  • Crowd matching translators

    Issued US10255277B2

    Exemplary embodiments relate to techniques for selecting translators willing to provide high-quality translations for a cause, organization, or individual. Users having a high level of engagement with the cause, organization, or individual may be identified as translator candidates. For example, the user may actively engage with the organization or individual on social media, or may be interested in the topics discussed in the source document. The translators may be evaluated based on the quality of their previous translations and their level of engagement/interest. The translator candidates may be directly connected with the originator of the request to translate the document. Because exemplary embodiments select highly engaged users to translate the source document, the resulting translation is likely to be of higher quality, and produced at a lower cost, than a translation by a non-engaged user, and user participation and awareness of a cause, organization, or individual may be increased.

  • Language model personalization

    Issued US 20170185583

  • Language independent representations

    Filed US 20170103062A1

    Snippets can be represented in a language-independent semantic manner. Each portion of a snippet can be represented by a combination of a semantic representation and a syntactic representation, each in its own dimensional space. A snippet can be divided into portions by constructing a dependency structure based on relationships between words and phrases. Leaf nodes of the dependency structure can be assigned: A) a semantic representation according to pre-defined word mappings and B) a syntactic representation according to the grammatical use of the word. A trained semantic model can assign to each non-leaf node of the dependency structure a semantic representation based on a combination of the semantic and syntactic representations of the corresponding lower-level nodes. A trained syntactic model can assign to each non-leaf node a syntactic representation based on a combination of the syntactic representations of the corresponding lower-level nodes and the semantic representation of that node.

  • Mining multi-lingual data

    Issued US 9,864,744

  • Multilingual Business Intelligence for Actions

    Issued US 20160188661A1

  • Analyzing language dependency structures

    Issued US 9,830,404

  • Contrastive multilingual business intelligence

    Issued US 20160188703A1

  • Predicting future translations

    Issued US 9,805,029

  • Predicting future translations

    Issued US 9,747,283
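
As a rough illustration of the idea in the "Labeling video files using acoustic vectors" abstract above, the sketch below assigns a label to a vector by finding the closest region of a label-embedding space. The abstract does not say how regions are defined, so modeling each region as the set of points nearest to one centroid is an assumption made here, and the names `nearest_label`, `label_centroids`, and the sample vectors are hypothetical.

```python
import math


def nearest_label(vector, label_centroids):
    """Return the label whose centroid is closest (Euclidean) to `vector`.

    Assumption: each region of the label-embedding space is approximated
    by a single centroid; the patent abstract does not specify this.
    """
    if not label_centroids:
        raise ValueError("label_centroids must not be empty")
    best_label, best_dist = None, math.inf
    for label, centroid in label_centroids.items():
        # The vector and the label space must share the same dimensionality.
        if len(centroid) != len(vector):
            raise ValueError("dimension mismatch")
        dist = math.dist(vector, centroid)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical 3-dimensional label space and audio-derived vector.
label_centroids = {
    "music": [1.0, 0.0, 0.0],
    "speech": [0.0, 1.0, 0.0],
    "crowd": [0.0, 0.0, 1.0],
}
audio_vector = [0.1, 0.8, 0.2]
label = nearest_label(audio_vector, label_centroids)
```

In a real system the vector would come from a model applied to the audio stream; that step is omitted here because the abstract gives no detail about it.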

Languages

  • Chinese
