Zhihao Li
Mountain View, California, United States
7K followers
500+ connections
Explore more posts
-
Yizhe Zhang
Apple • 4K followers
We (w/ Shansan Gong, Ruixiang ZHANG, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong) released DiffuCoder, a family of 7B diffusion language models specialized in code generation, with a focus on understanding and improving masked diffusion models.
A core part of the analysis is the autoregressiveness (AR-ness) score, a novel metric that quantifies causal patterns in decoding, revealing how diffusion models break from strict left-to-right generation for more flexible, non-linear code planning. Autoregressive (AR) models currently dominate code generation, but diffusion-based LLMs (dLLMs) like DiffuCoder offer a promising alternative, especially for complex programming tasks. We explore how these models decode differently: they show less global AR-ness on code tasks than on math, and temperature affects both token selection and generation order, unlike in traditional AR models.
We also introduce coupled-GRPO, a post-training RL method with a coupled-sampling scheme, to reduce performance drops during accelerated decoding, boosting parallelism and efficiency. A self-improvement pipeline combines AR-ness analysis, coupled-GRPO optimization, and evaluation on benchmarks like AceCode-89k to refine decoding strategies. This approach enables DiffuCoder to navigate diverse code generation pathways and enhance performance with modest computational overhead.
Looking ahead, we aim to further leverage reinforcement learning to steer code generation through these decoding patterns; the discrete nature of AR-ness scores provides a foundation for search-based strategies, well suited to the sparse rewards of optimizing complex code structures.
Check out our full paper and code for a deeper dive!
Paper: https://lnkd.in/gVWU3BDJ
Code: https://lnkd.in/gmXTZ_6n
Models: https://lnkd.in/gTcKCDr9
#MachineLearning #AI #CodeGeneration #DiffusionModels #NLP
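The post describes the AR-ness score only at a high level. As a rough sketch of the idea (my own simplified reading, not the paper's exact definition), a local AR-ness score can be computed as the fraction of decoding steps that unmask a token immediately to the right of an already-decoded position:

```python
def local_ar_ness(decode_order):
    """Sketch of an autoregressiveness score for a diffusion decoder.

    decode_order: list of token positions in the order they were unmasked.
    Returns the fraction of steps that extend a decoded prefix contiguously;
    strict left-to-right decoding scores 1.0, scattered orders score lower.
    """
    decoded = set()
    ar_steps = 0
    for pos in decode_order:
        # A step counts as "autoregressive" if its left neighbor was already
        # decoded (position 0 counts: it starts a left-to-right run).
        if pos == 0 or (pos - 1) in decoded:
            ar_steps += 1
        decoded.add(pos)
    return ar_steps / len(decode_order)
```

Under this toy definition, `local_ar_ness([0, 1, 2, 3])` is 1.0, while a non-linear order like `[3, 1, 0, 2]` scores 0.5.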
220
5 Comments -
Julius Kusuma
Meta • 3K followers
We developed an open-source AI tool to design concrete mixes that are stronger, more sustainable, and ready to build with sooner, speeding up construction while reducing environmental impact. https://lnkd.in/gPCk8tCM
And the impact of this AI tool is not just hypothetical! Amrize used Meta's AI-based technologies to design a new low-carbon mix and successfully deployed it in an at-scale slab-on-grade application at Meta's new data center in Rosemont, MN. Compared to the legacy mix, this new AI-designed mix is:
🦁 Stronger
⏱️ Faster
🍃 Lower carbon
⏱️ The ideal set time
All of this was achieved without any new materials or special equipment. Best of all, the AI is open-sourced: https://lnkd.in/g2KA7KZW
This work was featured in a Meta engineering blog article published today! https://lnkd.in/gBU9HY8H
98
9 Comments -
Jibran Hutchins
Haladir (YC W26) • 5K followers
We at Haladir (YC W26) released a report on RLFR: Reinforcement Learning from Formally-Defined Rewards for Code Generation (https://lnkd.in/guCH7w7C). The core idea: a simple offline, single-turn RL framework that uses formal verification as the primary reward signal for program synthesis in LLMs, rather than the unit-test-based rewards of RLVR, showing improvements across a range of coding benchmarks. We fine-tune both Qwen2.5-Coder-7B-Instruct and Qwen3-8B using RLFR and Dafny, a specification language that uses Boogie (an intermediate verification language) and Z3 (an SMT solver) to check verification conditions. Despite training solely on Dafny, we show improvements on Python-native coding benchmarks, suggesting generalization beyond the formal verification domain.
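The reward signal the post describes is simply "does the program formally verify?". A minimal sketch of such a reward function, with the verifier left pluggable (in practice it would shell out to the Dafny toolchain; the `stub_verify` below is a hypothetical stand-in, not a real check):

```python
from typing import Callable

def formal_reward(program: str, verify: Callable[[str], bool]) -> float:
    """Binary reward from a formal verifier, RLFR-style.

    `verify` would wrap a real checker (e.g. invoking Dafny, which
    discharges verification conditions via Boogie and Z3); here it is
    any callable returning True iff all conditions are proved.
    """
    return 1.0 if verify(program) else 0.0

# Toy stand-in verifier for illustration only: "verified" iff the program
# carries an `ensures` postcondition annotation.
def stub_verify(program: str) -> bool:
    return "ensures" in program
```

A real setup would replace `stub_verify` with a subprocess call to the Dafny verifier and use the resulting 0/1 reward in the offline RL objective.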
90
8 Comments -
Dr. S Sagar Srinivas
Tata Research Development and… • 1K followers
🚀 Excited to share our latest work on advancing Retrieval-Augmented Generation (RAG) systems! Our paper, "Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding," will be presented at the KDD 2025 Workshop on Inference Optimization for Generative AI.
📄 Paper: https://lnkd.in/dErdTDf7
What's the challenge? Traditional RAG systems struggle with:
- Static retrieval strategies that don't adapt to context
- Inefficient memory usage in long-context applications
- A suboptimal balance between retrieval fidelity and response quality
- Computational bottlenecks during inference
Our solution: the PORAG + ATLAS framework, two complementary techniques.
PORAG (Policy-Optimized RAG):
- Extends Group Relative Policy Optimization (GRPO) to RAG settings
- Uses dual reward heads for retrieval fidelity AND response quality
- Employs QLoRA for parameter-efficient fine-tuning
- Prevents catastrophic forgetting while optimizing retrieval utilization
ATLAS (Adaptive Token-Layer Attention Scoring):
- Multi-Layer Attention Gradient (MLAG) analysis detects information gaps
- Layerwise Representation Pooling (LRP) constructs targeted queries
- Dynamic retrieval scaling based on computational load
- Only retrieves when truly necessary
CRITIC for memory optimization. Our Cache Reduction via Importance-based Token Inclusion Criteria (CRITIC) technique:
- Reduces memory usage significantly
- Maintains performance with minimal quality trade-offs
- Enables longer context processing
Test-time scaling integration: we demonstrate how inference techniques like Self-Consistency and Monte Carlo Tree Search further enhance performance on complex reasoning tasks.
Looking forward to presenting at KDD 2025! Thanks to my amazing teammates Akash Das, SHIVAM GUPTA and Venkataramana Runkana for their incredible contributions! #KDD2025 #RAG #LLM #MachineLearning #AI #RetrievalAugmentedGeneration #DeepLearning #NLP #InferenceOptimization
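The "only retrieve when truly necessary" idea in ATLAS can be sketched as a gate on attention-derived signals. This is an illustrative heuristic of my own, not the paper's MLAG computation: treat high mean attention entropy (the model is "unsure where to look") as an information gap that triggers retrieval. `threshold` is a hypothetical tuning knob.

```python
from statistics import mean

def should_retrieve(attention_entropies, threshold=2.5):
    """Decide whether to trigger retrieval at the current decoding step.

    attention_entropies: per-layer entropy of the current token's
    attention distribution (higher = more diffuse attention).
    Returns True when the mean entropy exceeds the threshold, i.e.
    when the model shows no focused source of information.
    """
    return mean(attention_entropies) > threshold
```

With focused attention (e.g. entropies around 0.5) the gate stays closed and generation proceeds without a retrieval call; with diffuse attention (entropies around 3.0) it fires, which is the behavior the dynamic-retrieval design aims for.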
35
-
Saran Menon
CodeWork • 1K followers
Today's AI/ML News 🤖💻 Researchers are advancing AI capabilities with new models and tools!
LLM reasoning boosted: can LLMs really judge? https://lnkd.in/gXu2rPbP 🤖 Researchers from Microsoft and Tsinghua University introduced Reward Reasoning Models, enhancing Large Language Models' ability to judge with reasoning.
Synthetic data made easy: a step-by-step guide https://lnkd.in/gcarYFbS 🚀 The Synthetic Data Vault offers a step-by-step guide to generating realistic tabular data using machine learning.
#AI #ML #LargeLanguageModels #SyntheticData #Innovation 🤖💻
6
-
Vasily Ilin
UW Math AI lab • 484 followers
Most existing #lean4 datasets contain only correct proofs, so models must learn error correction through RL, which is expensive. With the UW Math AI lab, we release a dataset of 260k erroneous Lean proofs, each with:
- compiler feedback
- a reasoning trace
- a corrected proof
Improvements in error correction:
- Goedel 8B: 2x
- Kimina 8B: 3x
Paper: https://lnkd.in/gaybt4bd
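A record in a dataset like this pairs a failing proof with the compiler's diagnosis and a fix. A minimal sketch of that shape (field names and the toy example are my own, not the released schema):

```python
from dataclasses import dataclass

@dataclass
class ErrorCorrectionExample:
    """One error-correction record of the kind the post describes."""
    erroneous_proof: str    # Lean 4 proof that fails to compile
    compiler_feedback: str  # error messages from the Lean compiler
    reasoning_trace: str    # reasoning that diagnoses the error
    corrected_proof: str    # proof that compiles

# Toy illustration: a misspelled tactic name and its correction.
ex = ErrorCorrectionExample(
    erroneous_proof="theorem t : 1 + 1 = 2 := by rfl'",
    compiler_feedback="unknown tactic 'rfl''",
    reasoning_trace="The tactic name is misspelled; `rfl` closes the goal.",
    corrected_proof="theorem t : 1 + 1 = 2 := by rfl",
)
```

Training on (erroneous proof, feedback) → (trace, corrected proof) pairs is what lets a model learn correction supervised, rather than discovering it through expensive RL.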
117
2 Comments -
Milvus, created by Zilliz
13K followers
𝗧𝗵𝗲 𝗯𝗲𝘀𝘁 𝗔𝗜 𝗰𝗼𝗱𝗶𝗻𝗴 𝗹𝗲𝘀𝘀𝗼𝗻 𝗰𝗼𝘀𝘁 𝗼𝘂𝗿 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿 $𝟲𝟬𝟬 𝗮𝗻𝗱 𝗮 𝗺𝗮𝗿𝗿𝗶𝗮𝗴𝗲 𝗮𝗿𝗴𝘂𝗺𝗲𝗻𝘁.
Our VP of Engineering, Xiaofan (James) Luan, was supposed to buy his wife a Dior bag for their anniversary. Instead, he bought three Claude Code subscriptions and spent the holiday trying to cross-compile 2 million lines of C++. Every fix on one platform broke two others. $600 later, the only output was "git reset --hard" and a very cold dinner table. 😂
"Make it compile on Windows" is a trap. The real goal was "compile everywhere without hacks," and no AI is going to figure that out for you at 2 am. What worked: constraints before code, reviewing tests rather than code, working bottom-up, one layer at a time. The same task then took two days. Then he ran six parallel Claude sessions across three machines with git worktree; the bottleneck stopped being intelligence and started being how fast one person can alt-tab.
AI solves exactly the problem you give it. Engineering is knowing which problem to give. His wife is still waiting for that bag.
Full story: https://lnkd.in/gtsW_Wvk
———
Follow Milvus, created by Zilliz, for everything related to unstructured data
9
-
Peter Rigby
Concordia University • 926 followers
Our paper on code reviewer recommendation demonstrates the sophistication of A/B testing at Meta through three randomized controlled trials. In the paper, we develop and release improvements to our code reviewer recommender system and track goal and guardrail metrics in production. We improve the accuracy and reduce the latency of recommendations, resulting in higher usage; conduct workload balancing; and reduce the bystander effect. Historical mining backtests of recommenders don't always correspond to how well a recommender works in production A/B tests at scale. ACM TOSEM: https://lnkd.in/ehK8SqdZ
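An A/B readout of the kind described ultimately comes down to comparing a metric between control and treatment. As an illustrative sketch (not the paper's analysis pipeline), a two-proportion z-test on, say, recommendation-acceptance rates:

```python
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided p-value for a difference in two proportions.

    success_a/n_a and success_b/n_b are e.g. accepted recommendations
    over total recommendations in the control and treatment arms.
    Uses the pooled-variance z statistic and the normal CDF (via erf).
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
```

For example, a 52% vs. 48% acceptance rate on 10,000 reviews per arm yields a p-value far below 0.01, while identical rates yield a p-value of 1.0. Production experiments additionally watch guardrail metrics so that a win on the goal metric does not hide a regression elsewhere.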
91
4 Comments -
Nina Peñaflor
LLM Arena • 1K followers
👉 My Key Takeaways from Chip Huyen's Recent Interview on Lenny's Podcast
Chip Huyen is the author of the widely recognized "AI Engineering: Building Applications with Foundation Models". Link to the podcast: https://lnkd.in/dxZ-tFWX
💡 Importance of post-training. Pre-training gives you raw capabilities (next-token prediction on massive data), but post-training is what makes the model actually usable: SFT on high-quality examples plus RLHF. Fine-tuning should be your last resort, not your first. Most problems can be solved with better prompts, better data, or RAG.
💡 Evals. You can't improve what you can't measure. You need multiple types: unit tests (does this specific prompt work?), integration tests (does the whole pipeline work?), regression tests (did we break something?), and user feedback loops. The hardest part isn't writing evals; it's maintaining them as your product evolves.
💡 AI products. Reliability and UX matter more than models. Most AI product failures aren't about bad models: they're about reliability (API limits, latency spikes, poor monitoring) and UX (users don't understand how to use the product, or it doesn't fit their workflow). Building reliable platforms and talking to users constantly beats chasing SOTA models. Most insights come from watching users, not from benchmarks.
💡 How to improve AI-powered apps. What people think improves apps: staying current on AI news, chasing the newest agentic framework, obsessing over vector database choice, constantly evaluating model benchmarks, fine-tuning models. What actually improves apps: talking to users, building reliable platforms, preparing better data, optimizing end-to-end workflows, writing better prompts. Better prompt engineering beats switching models 90% of the time. A well-crafted system prompt, clear instructions, good examples (few-shot), and proper output formatting can transform a mediocre experience into a great one.
💡 Advice for builders. Start with the user's problem, not with a cool AI technique. Use the simplest solution that works (often that's a good prompt, not a fine-tuned model). Build evals early. Focus on the end-to-end experience. Don't fine-tune unless you've exhausted everything else. Don't treat AI as deterministic (it's not; you need to handle variability). Don't ignore data quality (garbage in, garbage out).
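The regression-eval idea above is easy to make concrete: keep a fixed set of (input, checker) pairs and re-run them after every prompt or model change. A minimal sketch, where `generate` is a hypothetical stand-in for your LLM call:

```python
def run_regression_evals(generate, cases):
    """Tiny regression-eval harness.

    generate: any text-in/text-out callable (e.g. a wrapped LLM call).
    cases: list of (prompt, checker) pairs; each checker returns True
    when the output is acceptable.
    Returns the prompts whose outputs failed their checks.
    """
    failures = []
    for prompt, check in cases:
        output = generate(prompt)
        if not check(output):
            failures.append(prompt)
    return failures

# Toy "model" and two checks, one of which fails by design.
fake_llm = lambda p: "Paris" if "capital of France" in p else "unsure"
cases = [
    ("What is the capital of France?", lambda o: "Paris" in o),
    ("Summarize: the cat sat.", lambda o: len(o) < 5),
]
failures = run_regression_evals(fake_llm, cases)
```

Because outputs are non-deterministic in real systems, checkers are best written as properties (contains a key fact, parses as JSON, stays under a length cap) rather than exact string matches.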
1
1 Comment -
Zhide Wang
Southern Methodist University • 401 followers
Excited to share our new paper published in Human Factors: “Inferring Hidden Attentional States in Driving: A Bayesian Approach to Modeling Distraction and Secondary Task Engagement.” This work was led by Lekhapriya Dheeraj Kashyap as part of her PhD research, and I’m grateful to have collaborated on it. Many real-world systems face the same challenge: the most important states are hidden. In driving, we can observe signals such as speed, eye movements, or pupil dilation — but not the driver’s true attentional state. In this work, we develop a Bayesian decision framework based on a Partially Observable Semi-Markov Decision Process (POSMDP) to infer latent attentional states and model how drivers allocate attention between competing tasks. The model detects distraction earlier than common heuristic rules and reveals substantial heterogeneity in attention strategies. Beyond driving safety, problems like this appear in many AI systems where human states are latent and decisions unfold sequentially. Great collaboration with Lekhapriya Kashyap, Yanling Chang, Maryam Zahabi, and Alfredo Garcia. 📄 Paper: https://lnkd.in/g23ubzht
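The core inference step behind such a model, inferring a hidden state from observable signals, is Bayesian filtering. A simplified discrete-time sketch (the paper's POSMDP is semi-Markov and richer than this; state names and probabilities here are illustrative):

```python
def belief_update(belief, transition, likelihood):
    """One step of discrete Bayesian filtering over hidden states:
    predict with the transition model, weight by the observation
    likelihood, and renormalize.

    belief:      {state: prob} prior over hidden states
                 (e.g. 'attentive', 'distracted')
    transition:  {(prev_state, next_state): prob}
    likelihood:  {state: P(observation | state)} for the current
                 observation (e.g. an erratic gaze pattern)
    """
    predicted = {
        s2: sum(belief[s1] * transition[(s1, s2)] for s1 in belief)
        for s2 in belief
    }
    unnormalized = {s: predicted[s] * likelihood[s] for s in belief}
    z = sum(unnormalized.values())
    return {s: p / z for s, p in unnormalized.items()}

# An observation more likely under 'distracted' shifts the belief,
# even from a strongly 'attentive' prior.
b = belief_update(
    {"attentive": 0.9, "distracted": 0.1},
    {("attentive", "attentive"): 0.95, ("attentive", "distracted"): 0.05,
     ("distracted", "attentive"): 0.30, ("distracted", "distracted"): 0.70},
    {"attentive": 0.2, "distracted": 0.8},
)
```

Accumulating such updates over successive observations is what lets a model flag distraction earlier than threshold-style heuristic rules applied to any single signal.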
9