Recommender Systems using implicit: Fast Python Collaborative Filtering for Implicit Feedback Datasets. This project provides fast Python implementations of several popular recommendation algorithms for implicit feedback datasets:
- Alternating Least Squares, as described in the papers "Collaborative Filtering for Implicit Feedback Datasets" and "Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering"
- Bayesian Personalized Ranking
- Logistic Matrix Factorization
- Item-item nearest-neighbour models using cosine, TF-IDF, or BM25 as the distance metric
All models have multi-threaded training routines, using Cython and OpenMP to fit the models in parallel across all available CPU cores. In addition, the ALS and BPR models both have custom CUDA kernels, enabling fitting on compatible GPUs. Approximate nearest-neighbour libraries such as Annoy, NMSLIB, and Faiss can also be used by implicit to speed up making recommendations. https://lnkd.in/gwJJr2PS
#machinelearning #datascience #recommendersystems #implicit
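For intuition about what the item-item cosine model computes, here is a minimal NumPy sketch (not the library's Cython-optimized implementation; the toy interaction matrix is made up): build an item-item cosine-similarity matrix from a user-item matrix and recommend the highest-scoring unseen item.

```python
import numpy as np

# Toy implicit-feedback matrix: rows = users, cols = items (1 = interacted).
interactions = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

# Item-item cosine similarity: normalize item (column) vectors, then dot.
norms = np.linalg.norm(interactions, axis=0)
normalized = interactions / np.where(norms == 0, 1, norms)
item_sim = normalized.T @ normalized          # shape: (items, items)

# Score items for user 0 by summing similarity to the items they interacted
# with, then mask out items already seen.
user = 0
scores = interactions[user] @ item_sim
scores[interactions[user] > 0] = -np.inf
recommended = int(np.argmax(scores))
print(recommended)   # item 2: co-consumed with both of user 0's items
```

The real library does this at scale with sparse matrices and multi-threaded Cython, but the linear algebra is the same shape.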
Python Collaborative Filtering for Implicit Feedback Datasets
🔊 New paper ("Fast-ER: GPU-Accelerated Record Linkage and Deduplication in Python") published in The Journal of Open Research Software (JORS): 🔶Jacob Morrier, Analysis Group 🔶Sulekha Kishore, Massachusetts Institute of Technology 🔶Michael Alvarez, Caltech Paper available here: https://lnkd.in/g7uPvHvE
I’m excited to share that our software metapaper on Fast-ER—our Python package for GPU-accelerated record linkage and deduplication—has been published in the Journal of Open Research Software. Huge thanks to Sulekha Kishore and Michael Alvarez for their outstanding work in making this package a reality. If you’re working on record linkage and deduplication problems, I encourage you to check out the paper and give Fast-ER a try! 🔗 Link to the paper: https://lnkd.in/eVRyeMW2 🔗 Link to the package documentation: https://lnkd.in/eJbYdAuY
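For readers new to the problem, record linkage boils down to comparing many record pairs with a similarity measure and flagging the ones above a threshold. The sketch below is a deliberately naive pure-Python illustration (token-set Jaccard similarity on made-up records), not Fast-ER's algorithm; this O(n²) all-pairs comparison step is exactly the kind of workload GPU acceleration targets.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two record strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Hypothetical records: 0 and 1 describe the same entity with small variations.
records = [
    "John Smith 123 Main St",
    "John Smith 123 Main Street",
    "Alice Jones 9 Oak Ave",
]

# Naive all-pairs comparison; keep pairs above a similarity threshold.
pairs = [(i, j) for i, j in combinations(range(len(records)), 2)
         if jaccard(records[i], records[j]) >= 0.5]
print(pairs)   # only the near-duplicate pair survives
```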
Recently, I started exploring Python more deeply, and honestly, it’s one of the easiest languages to get comfortable with. What I like about Python is how simple and readable it is. You don’t have to struggle with complex syntax, so you can focus more on solving problems. That’s probably why it’s widely used in areas like data science, machine learning, and automation. While learning Python, I also came across some interesting tools and languages built around it: Hy – It lets you write Python using a Lisp-style syntax. Felt a bit different at first, but it shows how flexible Python really is. Coconut – This one adds functional programming features to Python. Things like pattern matching make the code cleaner in some cases. MyHDL – This was something new for me. It uses Python for hardware design and can convert code into Verilog or VHDL. Pretty interesting to see Python used beyond software. What I understood from all this is that Python is not just a single language—it’s a whole ecosystem that keeps evolving. Still learning, still exploring 🙂 If you’re also learning Python or working in data science, would love to connect and share ideas! #Python #LearningJourney #DataScience #Programming #Tech
NumPy scored 100,000 fraud transactions per second on a single CPU. The naive Python loop did 800. Same machine. Same model. 125x difference.
Day 12 of 30 -- NumPy Internals and Broadcasting
Phase 2 -- Performance and Concurrency -- Final Day
NumPy is not a Python library. It is a C library with a Python interface. When you write a @ b + bias, Python never touches the numbers. NumPy dispatches directly to BLAS. The CPU uses SIMD instructions on contiguous float32 memory. The bias broadcasts across all rows without materializing a full-size copy. That is the entire secret.
Today's topic covers:
- Why NumPy is ~100x faster than Python lists -- contiguous, typed C memory explained
- Strides -- how NumPy navigates a 2D array with just two numbers
- Why transpose is free -- same memory buffer, just different strides
- Broadcasting's 3 rules, with a visual (3,4) + (4,) matrix-vector example
- 6 ufuncs -- from np.maximum for ReLU to np.einsum for complex contractions
- Annotated syntax -- strides, views, broadcasting, fancy indexing, einsum
- A real fraud scorer -- a 100k-TPS vectorized neural network in pure NumPy
- 5 mistakes, including the view-vs-copy trap and the wrong broadcast axis
- 5 best practices, including float32 for all batch workloads
Phase 2 complete. All 6 days of Performance and Concurrency done.
#Python #NumPy #DataEngineering #MachineLearning #Performance #100DaysOfCode #PythonDeveloper #TechContent #BuildInPublic #TechIndia #BackendDevelopment #PythonProgramming #LinkedInCreator #LearnPython #OpenToWork #PythonTutorial
Machine Learning on Graph Data using NetworkX: libraries for graph processing, clustering, embedding, and machine learning tasks. NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. Software for complex networks:
- Data structures for graphs, digraphs, and multigraphs
- Many standard graph algorithms
- Network structure and analysis measures
- Generators for classic graphs, random graphs, and synthetic networks
- Nodes can be "anything" (e.g., text, images, XML records)
- Edges can hold arbitrary data (e.g., weights, time series)
- Open source, 3-clause BSD license
- Well tested, with over 90% code coverage
Additional benefits from Python include fast prototyping, ease of teaching, and multi-platform support. https://lnkd.in/gxVxbYm5
#machinelearning #datascience #graphdata #networkx
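NetworkX represents graphs as nested dictionaries of adjacency. To show the "nodes can be anything, edges hold arbitrary data" idea without the dependency, here is a plain-dict sketch (node names are made up) with a BFS shortest path, roughly what networkx.shortest_path does on an unweighted graph:

```python
from collections import deque

# Nodes can be any hashable value; each edge carries an attribute dict,
# mirroring NetworkX's adjacency-dict design.
graph = {
    "paper.xml": {"alice": {"weight": 2}},
    "alice": {"paper.xml": {"weight": 2}, "bob": {"weight": 1}},
    "bob": {"alice": {"weight": 1}},
}

def shortest_path(g, src, dst):
    """Unweighted BFS shortest path between two nodes, or None."""
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:       # walk predecessors back to src
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in g[node]:
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None

print(shortest_path(graph, "paper.xml", "bob"))
```

In NetworkX itself this would be nx.shortest_path(G, "paper.xml", "bob"), with the same adjacency structure managed for you.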
A Python loop: 662 nanoseconds per iteration. Add two characters. Same loop. Same algorithm. 50–200× faster. That's @jit, and understanding why it works is a systems-level education. I break it down here; it covers:
▸ Why Python is structurally slow -- not just "interpreted": it's the boxing, type dispatch, and GC pressure on every single loop iteration
▸ What Numba actually is under the hood -- a 5-stage compilation pipeline: Python bytecode → type inference → Numba IR → LLVM IR → machine code or CUDA PTX. The same backend Clang uses.
▸ A real benchmark breakdown -- pure Python (662 ns) vs Numba (193 ns) vs built-in C (128 ns): why Numba doesn't always win, and when it wins massively
▸ The HPC memory hierarchy explained -- registers, L1/L2 cache, DRAM, PCIe, GPU HBM -- and why the most common GPU bottleneck isn't compute, it's data transfer
▸ CUDA C++ vs PyCUDA vs Numba -- a side-by-side comparison of when to use which, with no fluff
▸ The Monte Carlo Pi exercise -- how adding @jit to a 1M-iteration loop gives a 50–200× speedup, and why this is the sweet spot Numba was built for
▸ The core architectural insight: Python is a control plane, not a compute plane -- the same pattern behind PyTorch, TensorFlow, and JAX
#Python #Numba #GPU #CUDA #HPC #DataScience #MachineLearning #ScientificComputing #PerformanceEngineering #NumPy #SoftwareEngineering
Most LLM agents struggle with limited context windows and can’t handle large documents effectively. I built an agentic RAG assistant for large-PDF Q&A that overcomes this by retrieving only the most relevant context from large PDFs before generating answers.
⚙️ Tech: Python, LangChain, OpenAI Embeddings, Qdrant
🔹 Features:
- Handles large PDFs via chunking + vector search
- Semantic retrieval for precise context
- Hallucination-resistant responses
🔗 GitHub: https://lnkd.in/gZd3wHgP
#AI #RAG #LangChain #OpenAI
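As a dependency-free illustration of the chunk-then-retrieve idea (this is not the project's actual LangChain/Qdrant/OpenAI pipeline; the document text is made up and the bag-of-words "embedding" is a stand-in for real embeddings):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

document = ("revenue grew ten percent this quarter . "
            "the board approved a new data retention policy . "
            "employees may work remotely two days per week .")

# 1) Chunk the document; a vector DB like Qdrant would store these vectors.
chunks = [c.strip() for c in document.split(".") if c.strip()]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2) Retrieve only the most relevant chunk for the question, so the LLM
#    sees a small, focused context instead of the whole document.
question = "can employees work remotely"
best = max(index, key=lambda item: cosine(embed(question), item[1]))
print(best[0])
```

Only the retrieved chunk (plus the question) would then be sent to the LLM, which is how RAG sidesteps the context-window limit.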
My former colleague Hossein Ghorbanfekr and I recently wrote a book on GPU computing in Python. While many Python programmers, data scientists, and researchers rely on GPU acceleration through high-level frameworks like PyTorch, we noticed that few grasp what’s happening under the hood. Historically, low-level GPU programming was the domain of C/C++ developers, leaving Python users dependent on high-level libraries that wrap low-level code written by someone else. These days, tools like the Numba JIT compiler and the Numba-CUDA backend enable Python developers to write high-performance, low-level GPU code without switching languages.
Our book, GPU-Accelerated Computing with Python 3 and CUDA, aims to make CUDA accessible to Python programmers who want to dig one level deeper or need more control over their GPU-accelerated code. We start with the fundamentals: writing and executing CUDA kernels, managing streams, profiling performance, and understanding memory hierarchies. Everything is taught through Python, using Numba-CUDA. We then connect these concepts to high-level libraries like CuPy and RAPIDS, which integrate seamlessly with the scientific Python ecosystem. We also included JAX as a flexible framework for differentiable programming and machine learning on GPUs and other accelerators.
In the last third of the book, everything is combined to address practical applications, including solving the heat equation, detecting objects in images, simulating atomic interactions, and building and training a small transformer-based language model.
This project took a lot of evenings, weekends, and holidays over 1.5 years, but we hope we managed to make something that will benefit other researchers, data scientists, and engineers. We’re grateful to Packt for the opportunity to bring this book to life. The e-book is available now on Amazon (https://a.co/d/03VXXelq), and the print version will be out in a few weeks. This is not an April Fools’ joke.
#gpu #hpc #python #CUDA #numba #scientificcomputing #machinelearning #RAPIDS #cupy #JAX
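One of the applications mentioned above, the heat equation, can be sketched on the CPU with NumPy. This is the kind of stencil computation a CUDA kernel would accelerate; the grid size and parameters below are arbitrary, and the book's GPU version is not reproduced here:

```python
import numpy as np

def heat_step(u: np.ndarray, alpha: float, dx: float, dt: float) -> np.ndarray:
    """One explicit finite-difference step of the 1D heat equation
    u_t = alpha * u_xx, with fixed (Dirichlet) boundary values."""
    new = u.copy()
    new[1:-1] = u[1:-1] + alpha * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])
    return new

# Hot spike in the middle of a cold rod; heat should diffuse outward.
u = np.zeros(101)
u[50] = 100.0
for _ in range(500):
    # dt <= dx^2 / (2 * alpha) keeps the explicit scheme stable.
    u = heat_step(u, alpha=1.0, dx=1.0, dt=0.25)

print(round(u[50], 2), round(u[45], 2))
```

On a GPU, each interior grid point's update is independent, so the inner stencil maps naturally onto one CUDA thread per point, which is why this problem is a standard teaching example.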
🚀 𝐈𝐦𝐩𝐨𝐫𝐭𝐢𝐧𝐠 & 𝐔𝐬𝐢𝐧𝐠 𝐏𝐲𝐭𝐡𝐨𝐧 𝐌𝐨𝐝𝐮𝐥𝐞𝐬 Another step forward in my Python learning journey 🐍 — exploring how to make code more efficient, reusable, and powerful using modules. 📚 𝐖𝐡𝐚𝐭 𝐈 𝐥𝐞𝐚𝐫𝐧𝐞𝐝: 📦 𝐖𝐡𝐚𝐭 𝐢𝐬 𝐚 𝐌𝐨𝐝𝐮𝐥𝐞? • A file that contains functions, variables, and reusable code • Helps organize and simplify large programs ⚙️ 𝐈𝐦𝐩𝐨𝐫𝐭𝐢𝐧𝐠 𝐌𝐨𝐝𝐮𝐥𝐞𝐬 • import math → perform mathematical operations • from math import sqrt → import specific functions • Cleaner and more efficient coding 🧰 𝐂𝐨𝐦𝐦𝐨𝐧 𝐁𝐮𝐢𝐥𝐭-𝐢𝐧 𝐌𝐨𝐝𝐮𝐥𝐞𝐬 • math → calculations • random → random values • os → system operations 💡 𝐊𝐞𝐲 𝐈𝐧𝐬𝐢𝐠𝐡𝐭: Using modules allows us to avoid rewriting code and build scalable, professional applications. 📈 Step by step, learning these concepts is helping me move from basic coding to real-world problem solving. #Python #Programming #DataScience #AI #Coding #LearningJourney #TechSkills
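A minimal sketch of the import styles described above, using the same built-in modules (the printed values are just for demonstration):

```python
import math                # whole module: access names with the math. prefix
from math import sqrt      # specific name: use it directly, no prefix
import random
import os

print(math.pi)             # constant from the math module
print(sqrt(16))            # imported function, called without a prefix
random.seed(42)            # seeding makes the "random" value repeatable
print(random.randint(1, 6))
print(os.name)             # e.g. 'posix' or 'nt', depending on the OS
```

Both styles load the same module; "from ... import ..." just binds a single name into the current namespace, which is why it reads cleaner when you only need one function.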
PYTHON SERIES: GENERATORS
🔹 What are Generators? Generators are functions that return values one at a time using yield, instead of returning them all at once. 👉 In simple terms: they generate values on the fly, not all at once.
🔹 Why use Generators? ✔ Saves memory ✔ Works efficiently with large data ✔ Starts producing results immediately, without building the whole dataset first
🔹 Example:
def count_up(n):
    for i in range(n):
        yield i

for num in count_up(5):
    print(num)
🔹 Output: 0 1 2 3 4
🔹 Generator vs List: ✔ A list stores all values in memory ✔ A generator produces values one by one
🔹 Real-world examples: reading large files, streaming data, handling big datasets
💡 Key Idea: Use generators when working with large data to improve memory efficiency.
#Python #Generators #Coding #Programming #LearnPython #Developer #SoftwareEngineering #100DaysOfCode #Tech
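The memory claim is easy to verify with sys.getsizeof: compare a materialized list with a generator over the same million values (exact sizes vary by Python version, but the ordering does not):

```python
import sys

# A list materializes every element up front; a generator holds only its
# current state (frame, locals), regardless of how many values it will yield.
numbers_list = [i * i for i in range(1_000_000)]
numbers_gen = (i * i for i in range(1_000_000))

print(sys.getsizeof(numbers_list))   # megabytes-scale
print(sys.getsizeof(numbers_gen))    # a couple hundred bytes

# Both produce the same values; the generator just does it lazily.
print(sum(i * i for i in range(10)))  # 285
```

Note that sys.getsizeof measures only the container itself, not the objects inside it, so the true gap for the list is even larger.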