🔔 Exciting Updates on "KKT-HardNet" 🔔

KKT-HardNet, a general physics-constrained ML tool that combines data and domain knowledge for scientific machine learning (SciML), is now officially available as a Python package. We've significantly upgraded the framework with CUDA support and a more modular problem-construction pipeline, making it faster and easier to use than before.

Key improvements:
- 📈 Improved prediction accuracy and faster inference
- ⏱️ Faster, more efficient training on both CPU and GPU
- 🧩 Modular design for flexible problem setup

If you've been using earlier versions, we strongly recommend switching to this optimized implementation.

📄 Paper: https://lnkd.in/gD7p7G6Z
💻 Code: https://lnkd.in/gzqrEVgf
📦 Package: https://lnkd.in/gNrDxF3t

⚙️ Install via pip:
CPU: pip install kkt-hardnet
GPU (CUDA 12): pip install "kkt-hardnet[cuda12]"

We'd love to hear your feedback. Feel free to reach out with questions or thoughts on the documentation and examples.

Bimol Nath Roy Rahul Golder Ashfaq Iftakher

#PhysicsConstrainedMachineLearning #PCML #ConstrainedLearning #MachineLearning #Optimization #DeepLearning #JAX #CUDA #Research #AI #Engineering #PSE
KKT-HardNet Officially Released as a Python Package
"Fine-tuning costs thousands." I hear this every week. It's wrong. Here's the real cost. This myth comes from traditional full fine-tuning, especially on large proprietary models like GPT-4, where costs can indeed be high due to compute and infrastructure requirements. But modern techniques have completely changed the game. LoRA (Low-Rank Adaptation) allows you to train only a tiny fraction of model parameters — often less than 0.1%. This dramatically reduces compute requirements while maintaining strong performance. Then comes QLoRA, which uses 4-bit quantization to reduce memory usage even further. This means you can fine-tune powerful models on much smaller hardware. In reality, you can fine-tune a 7B parameter model on Google Colab (free tier) with zero cost. What you actually need is simple: one GPU, a clean dataset, a few hours (4–8), and basic Python knowledge. If you are starting, try models like Mistral 7B or LLaMA 3. The real barrier is not money. It is understanding the workflow. #FineTuning #LoRA #LLM #GenerativeAI #HuggingFace #QLoRA #Python
SOTA Agent v1.4.1 is live. This ClawHub skill adds structure and reproducibility to computer-vision and data-science experiments. What it standardizes: - role separation: Scout, Reproducer, Ablator, Reviewer, Promoter - execution lanes for browser/GUI work, notebooks, GPU VMs, and local runs - campaign state tracking with bundled Python tools - promotion gates so results are reviewed before they become portfolio or deployment evidence Goal: fewer ad-hoc runs, more verifiable results. #MLOps #ComputerVision #DataScience #Reproducibility #AI
🧠 Built a Vanilla GAN from scratch using PyTorch, and it was one of the most rewarding deep learning projects I've done!

Generative Adversarial Networks always felt like "black magic" to me, until I actually built one. So I rolled up my sleeves and implemented everything from the ground up, no shortcuts.

🔧 Here's what's under the hood:
• Dataset: CelebA (celebrity faces), preprocessed with center crop → resize to 64×64 → normalize to [-1, 1]
• Generator: 4-layer MLP (noise z=100 → 256 → 512 → 1024 → image), Tanh output
• Discriminator: 4-layer MLP with LeakyReLU activations + Sigmoid output
• Loss: Binary Cross-Entropy (BCELoss)
• Optimizer: Adam (lr=0.0002, β₁=0.5) for both networks
• Hardware: Auto-selects CUDA / Apple MPS / CPU

The core idea is elegant: two networks locked in a constant game. The Generator learns to create realistic faces from pure noise, while the Discriminator learns to tell real from fake. Over 5 epochs, you can watch the generated faces slowly come to life. 🎨

If you're learning deep learning, building a GAN from scratch will teach you more about backprop and optimization than almost anything else.

🔗 GitHub: https://lnkd.in/gVV4FXz5

#DeepLearning #GAN #PyTorch #GenerativeAI #MachineLearning #ComputerVision #Python
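To make "under the hood" concrete, here is a minimal PyTorch sketch of that setup. Layer sizes and hyperparameters follow the post; the hidden-layer activations in the Generator are my assumption:

```python
import torch
import torch.nn as nn

Z_DIM, IMG_DIM = 100, 64 * 64 * 3  # noise size; flattened 64x64 RGB image

# Generator: noise -> flattened image, Tanh keeps outputs in [-1, 1]
G = nn.Sequential(
    nn.Linear(Z_DIM, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, IMG_DIM), nn.Tanh(),
)

# Discriminator: flattened image -> probability of being real
D = nn.Sequential(
    nn.Linear(IMG_DIM, 1024), nn.LeakyReLU(0.2),
    nn.Linear(1024, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

criterion = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(real):  # real: [B, IMG_DIM] batch of preprocessed images
    b = real.size(0)
    fake = G(torch.randn(b, Z_DIM))

    # Discriminator: push real -> 1 and fake -> 0
    opt_d.zero_grad()
    loss_d = (criterion(D(real), torch.ones(b, 1))
              + criterion(D(fake.detach()), torch.zeros(b, 1)))
    loss_d.backward()
    opt_d.step()

    # Generator: fool the (just-updated) discriminator, fake -> 1
    opt_g.zero_grad()
    loss_g = criterion(D(fake), torch.ones(b, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```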
Day 24/30 ML Challenge: Bahdanau Attention

Standard encoder-decoder architectures suffer from a kind of catastrophic amnesia: they force the encoder to compress an entire sequence into a single fixed-size vector. To solve this, I engineered an attention bridge that dynamically calculates alignment scores, allowing the decoder to "look back" at specific encoder hidden states at every generation step (see the sketch below).

Core mechanics:
1. Architecture: A bidirectional GRU encoder paired with an attention-driven unidirectional GRU decoder.
2. Tokenization: Strict character-level mapping to prevent the infinite vocabulary explosion inherent to mathematical domains.
3. Evaluation: Exact Match Accuracy (EMA). Character Error Rate is useless in calculus; a single hallucinated token invalidates the entire equation.
4. Data pipeline: A deterministic synthetic generator built on SymPy that constructs abstract syntax trees and exact ground-truth targets.

The architecture works, but the mathematical engine is too slow to scale. Full explanation, math, and Python code in the repository.

Repo: https://lnkd.in/gj-pd8dg

#MachineLearning #PyTorch #DeepLearning #ArtificialIntelligence #SequenceModeling #Engineering #DataScience #AI
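For reference, the additive scoring at the heart of Bahdanau attention fits in a compact PyTorch module. Dimension names here are placeholders; with a bidirectional GRU encoder, enc_dim would be twice the encoder hidden size:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BahdanauAttention(nn.Module):
    """Additive attention: score(s, h_t) = v^T tanh(W_s s + W_h h_t)."""

    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)  # project decoder state
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)  # project encoder states
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, s, h):
        # s: [B, dec_dim]     current decoder hidden state
        # h: [B, T, enc_dim]  all encoder hidden states
        scores = self.v(torch.tanh(self.W_s(s).unsqueeze(1) + self.W_h(h)))  # [B, T, 1]
        weights = F.softmax(scores, dim=1)     # alignment over the T source steps
        context = (weights * h).sum(dim=1)     # [B, enc_dim] weighted summary
        return context, weights.squeeze(-1)
```

At each generation step the decoder consumes this context vector alongside its own state, which is exactly the "look back" described above.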
Excited to share the reproducible artifacts from my undergraduate thesis, now fully open-sourced on GitHub! 🎓

Over the past months, I've been working on something that sits at the intersection of two fields I find genuinely fascinating: quantum machine learning and predictive maintenance. The project is titled "Hybrid Quantum-Classical Regression Model for Remaining Useful Life Prediction of Rolling Bearings".

The core idea: instead of relying solely on classical deep learning, I built a hybrid architecture that routes vibration signal features through a classical FC encoder → a variational quantum circuit (VQC) → a classical decoder, all trained end-to-end using PyTorch and PennyLane (a sketch of the pattern follows below). I benchmarked it against a standard 2-layer LSTM baseline on the XJTU-SY run-to-failure bearing dataset. The results honestly surprised me:
→ 10.5% reduction in MAE
→ 11.9% reduction in RMSE

I also ran a depolarizing noise sweep to probe how the circuit holds up under realistic quantum hardware conditions. It's a step that's easy to skip but really matters if this is ever going to move beyond simulation.

The repo has everything needed to reproduce from scratch: notebooks for EDA and feature engineering, all source modules, a CLI entry point, and scripts to regenerate the publication figures.

This work was supervised by Md Alamgir Kabir, PhD at Daffodil International University, and I'm really grateful for his guidance throughout; he was the one who suggested publishing this repository.

If you're working on quantum ML, predictive maintenance, or just curious about where the two fields meet, I'd love to hear your thoughts. The repo is linked below 👇
https://lnkd.in/g7VyigxM

#QuantumMachineLearning #PredictiveMaintenance #MachineLearning #QuantumComputing #OpenSource #UndergraduateResearch #PennyLane #PyTorch #DIU
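For anyone curious what the encoder → VQC → decoder pattern looks like in code, here is a minimal PennyLane/PyTorch sketch. The qubit count, circuit depth, and 16-feature input width are illustrative assumptions, not the thesis's actual configuration:

```python
import torch
import torch.nn as nn
import pennylane as qml

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def vqc(inputs, weights):
    # Encode classical features as rotation angles, then apply trainable entangling layers
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (n_layers, n_qubits, 3)}
quantum_layer = qml.qnn.TorchLayer(vqc, weight_shapes)

# Classical FC encoder -> VQC -> classical decoder, trainable end-to-end
model = nn.Sequential(
    nn.Linear(16, n_qubits), nn.Tanh(),  # compress vibration features to qubit angles
    quantum_layer,
    nn.Linear(n_qubits, 1),              # map qubit expectations to an RUL estimate
)

rul_pred = model(torch.randn(8, 16))     # works with ordinary PyTorch optimizers
```

Because TorchLayer exposes the circuit as a regular nn.Module, gradients flow through the quantum part automatically and the whole stack trains with a standard optimizer.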
I just open-sourced a PyTorch implementation of TurboQuant (ICLR 2026) as a drop-in vLLM plugin for KV cache compression.

The problem it solves: on long-context inference, the KV cache becomes the GPU memory bottleneck faster than the model weights do. A 35B model at 32K context can easily hit 4–6 GB of KV cache alone.

What this does:
→ 5x KV cache compression at 3-bit (289 MB → 58 MB on Qwen2.5-3B at 8K context)
→ 99.5%+ attention cosine similarity, with mathematically unbiased inner products via QJL residual correction
→ Zero changes to prefill. Compression applies only during decode.
→ No custom CUDA kernels. Pure PyTorch, installs as a vLLM plugin via the official entry point.

The two-stage algorithm (Lloyd-Max optimal quantization + 1-bit QJL correction) is the interesting part. Stage 1 gives you MSE-optimal compression after a random rotation. Stage 2 fixes the inner-product bias that Stage 1 introduces, using just 1 bit per dimension to make attention scores mathematically unbiased. The result is that per-vector reconstruction error is 23–44%, but attention score accuracy stays at 99.5%+. You're compressing the cache, not the computation. (A toy illustration of the Stage-1 idea follows below.)

Validated against Qwen2.5-3B-Instruct across 2K–8K contexts (72 layer-head checks) and used in production with Qwen3.5-35B-A3B-AWQ.

Install:
pip install "turboquant[vllm] @ git+https://github.com/BFinn/turboquant-vllm.git"

Then just:
VLLM_TURBOQUANT_BITS=3 vllm serve your-model

Code, benchmarks, and algorithm walkthrough: https://lnkd.in/dJ6NqCj4

#LLM #MLEngineering #OpenSource #vLLM #Inference
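To build intuition for why the random rotation matters, here's a self-contained toy. This is not the package's implementation: it uses plain uniform quantization rather than Lloyd-Max and omits the Stage-2 QJL correction entirely:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, bits = 128, 3
levels = 2 ** bits

# Fixed random rotation shared by queries and keys. Rotating first makes
# coordinates roughly Gaussian, so one scalar quantizer suits every dimension.
R, _ = torch.linalg.qr(torch.randn(d, d))

def quantize(x):
    """Rotate, then uniformly quantize each coordinate to 2**bits levels."""
    xr = x @ R
    lo, hi = xr.min(), xr.max()
    step = (hi - lo) / (levels - 1)
    codes = torch.round((xr - lo) / step)  # the 3-bit integer codes
    return codes * step + lo               # dequantized rotated vectors

keys = torch.randn(1000, d)                # stand-in for one head's key cache
keys_hat = quantize(keys)

query = torch.randn(1, d)
exact = query @ R @ (keys @ R).T           # == query @ keys.T (R is orthogonal)
approx = query @ R @ keys_hat.T

# Per-vector reconstruction error is large, but the scores track closely
# because the quantization noise is roughly zero-mean after rotation.
print("score cosine similarity:", F.cosine_similarity(exact, approx).item())
```

The Stage-2 trick in the actual algorithm is then to spend 1 extra bit per dimension on the residual so the approximate inner products become unbiased rather than merely close.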
Ever wonder what Q4_K_M, GPTQ, or GGUF actually mean in a model name? This 6-slide visual guide breaks it down:

- What quantization is and how it works (with the actual math)
- Real benchmarks: how a 140GB model becomes 40GB with barely any quality loss
- GPTQ vs AWQ vs K-Quants, and when to use which
- How to decode model names like "codellama-70b-python.Q4_K_M.gguf"
- A decision framework for picking the right quant for your hardware

Whether you're running models locally on a MacBook or deploying on GPU servers, understanding quantization is the difference between "it doesn't fit" and "it runs great."

Swipe through. Save for later.

#quantization #genai #LLM
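The core math is small enough to show inline. Below is a simplified absmax scheme over a single block of weights; real formats such as GGUF's K-quants layer per-block offsets and hierarchical scales on top of this idea:

```python
import numpy as np

def quantize_block(w, bits=4):
    """Absmax quantization: one float scale per block, signed integer codes."""
    qmax = 2 ** (bits - 1) - 1               # 7 for 4-bit signed
    scale = np.abs(w).max() / qmax           # stretch the block onto [-qmax, qmax]
    q = np.round(w / scale).astype(np.int8)  # 4-bit codes (packed two-per-byte in practice)
    return q, scale

def dequantize_block(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(32).astype(np.float32)   # one small weight block
q, scale = quantize_block(w)
print("max abs error:", np.abs(w - dequantize_block(q, scale)).max())
```

Storing 4-bit codes plus one scale per block is how 140GB of FP16 weights can shrink toward 40GB.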
Sorting is a fundamental operation in computer science that significantly influences the efficiency of many algorithms and applications. Among the myriad sorting techniques, Bubble Sort and Quick Sort are two of the most commonly studied. Bubble Sort, while easy to understand and implement, becomes inefficient on larger datasets due to its quadratic time complexity. In contrast, Quick Sort employs a divide-and-conquer strategy that sorts efficiently, with an average-case time complexity of O(n log n). This article explores both algorithms in detail, examining their mechanics and time complexities and offering a practical comparison through implementation (both are sketched below). #SortingAlgorithms #BubbleSort #QuickSort #TimeComplexity #DataStructures #PythonProgramming
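Minimal Python implementations of both, for the practical comparison:

```python
def bubble_sort(a):
    """O(n^2) worst/average case: repeatedly swap adjacent out-of-order pairs."""
    a = list(a)
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):           # the tail is already sorted
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                      # no swaps: list is sorted, stop early
            break
    return a

def quick_sort(a):
    """O(n log n) average case: partition around a pivot, recurse on both sides."""
    if len(a) <= 1:
        return list(a)
    pivot = a[len(a) // 2]
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return quick_sort(left) + mid + quick_sort(right)

print(bubble_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]
print(quick_sort([5, 2, 9, 1]))   # [1, 2, 5, 9]
```

This Quick Sort trades memory for clarity; an in-place Hoare or Lomuto partition avoids the extra lists, but the divide-and-conquer structure is the same.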
🚀 Dijkstra's Algorithm: Finding the Shortest Path Efficiently

In the field of graph theory and optimization, Dijkstra's algorithm is a fundamental method for solving shortest path problems.

🔍 Concept: Developed by Edsger W. Dijkstra, this algorithm computes the shortest path from a source node to all other nodes in a weighted graph with non-negative edge weights.

⚙️ How it works:
1. Initialize the source distance to 0 and all others to ∞
2. Select the unvisited node with the smallest distance
3. Update the distances of its neighbors
4. Repeat until all nodes are visited

📊 Complexity:
* O(V²) with a simple implementation
* O((V + E) log V) using a priority queue (heap)

💡 Real-world applications:
→ GPS and navigation systems
→ Network routing protocols
→ Game development (pathfinding)
→ Logistics and supply chain optimization

🧠 As a student in AI and optimization, mastering this algorithm is essential to understanding more advanced techniques such as weighted graphs, heuristics, and the A* algorithm.

📌 Tip: Try implementing it in Python to fully grasp how it works! A heap-based version is sketched below.

#Dijkstra #Algorithms #GraphTheory #Optimization #DataScience #ArtificialIntelligence #Python #OperationsResearch
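Taking the tip at face value, here is a compact heap-based version, the O((V + E) log V) variant:

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph: {node: [(neighbor, weight), ...]}."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    heap = [(0, source)]                    # priority queue of (distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                     # stale entry: a shorter path was found
            continue
        for v, w in graph[u]:
            if d + w < dist[v]:             # relax the edge u -> v
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

g = {"A": [("B", 4), ("C", 1)],
     "B": [("D", 1)],
     "C": [("B", 2), ("D", 5)],
     "D": []}
print(dijkstra(g, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```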
Epoch AI: In MirrorCode, AIs get execute-only access to an existing program and visible test cases, but no access to the source code. They must design and implement their own codebase to replicate the original program's functionality. Recent AI models excel at MirrorCode tasks. Claude Opus 4.6 successfully reimplemented gotree — a bioinformatics toolkit with ~16,000 lines of Go and 40+ commands. We estimate this would take an unassisted human SWE 2–17 weeks. Even on harder tasks, AIs were still making progress when our experiments hit their token limits. With a higher budget, Opus 4.6 would plausibly solve the hardest task discussed in our post: reimplementing the Pkl programming language. Read more: https://lnkd.in/djgv_mtF
This is interesting. I developed something similar for SAMSUNG called PEGRANN (physics-enforced graph neural networks): low-fidelity surrogate models trained from high-fidelity physics-based simulators. They are mainly used for digital twins of thermo-fluid systems, where, despite the loss of fidelity, mass, energy, and momentum balances are still enforced by affinely projecting the graph NN outputs onto a linear system Ay = b. For example, the mass balance constraint for a single tee-junction can be expressed as [1, 1, 1] x [port_a; port_b; port_c] = [0]. However, I found that for very large systems (A on the order of 50000 x 45000), solving the projection math becomes very difficult due to closed loops in the physical system that translate to rank deficiency. Finding and addressing each singularity is simply not practical. Using larger Tikhonov regularization relaxes this somewhat, but as a consequence the physics constraints are no longer strictly enforced. I also got better results when I trained my PEGRANNs in two stages: first unconstrained, then with the affine projections. Stage-2 training usually converges to a lower loss compared to training in a single stage with the physics constraints. I would love to discuss more.
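For readers following along, here is a minimal sketch of the affine projection being described, with optional Tikhonov regularization. The function name and toy numbers are illustrative, not PEGRANN's actual code:

```python
import numpy as np

def project_onto_constraints(y, A, b, lam=0.0):
    """Project y onto the affine set {y : A y = b}.

    Minimizing ||y' - y||^2 subject to A y' = b gives
    y' = y - A^T (A A^T)^{-1} (A y - b). With lam > 0 the normal matrix
    is Tikhonov-regularized, which tolerates a rank-deficient A (e.g.
    from closed loops) at the cost of only approximately satisfying Ay = b.
    """
    residual = A @ y - b
    M = A @ A.T + lam * np.eye(A.shape[0])
    return y - A.T @ np.linalg.solve(M, residual)

# Toy tee-junction mass balance from the comment: the three port flows sum to zero.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([0.0])
y_nn = np.array([2.0, -0.5, -1.0])   # raw network output, violates the balance

y_proj = project_onto_constraints(y_nn, A, b)
print(y_proj, A @ y_proj)             # balanced flows, residual ~0
```

At the 50000 x 45000 scale mentioned above, the dense solve would of course be replaced by a sparse or iterative one, which is exactly where the rank-deficiency pain shows up.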