⛓️💥 A silent patch in an Airflow metaclass breaks Python MRO. This is something I was not expecting as an Airflow plugin maintainer.

Recently I received a pull request to my airflow-clickhouse-plugin GitHub repo. A contributor found that some plugin functionality is unusable because of a strange error:

```python
>>> ClickHouseSQLExecuteQueryOperator(task_id='id', sql='SELECT 1')
❌ TypeError: Invalid arguments were passed to ClickHouseSQLExecuteQueryOperator (task_id: id).
   Invalid arguments were: **kwargs: {'sql': 'SELECT 1'}
```

Like, whaaaat? 😱 The only purpose of this operator is literally to execute SQL. The class definition is:

```python
class ClickHouseSQLExecuteQueryOperator(
    ClickHouseBaseDbApiOperator,
    sql.SQLExecuteQueryOperator,
):
    pass
```

ClickHouseBaseDbApiOperator does not define an __init__ method. So it was pretty expected that the call should reach the __init__ of SQLExecuteQueryOperator, which does have that sql parameter!

It turned out the culprit was BaseOperatorMeta, the metaclass used to create every Airflow operator class. Because of a hacky way of checking "does this class override __init__?", it was injecting an unexpected __init__ into the chain.

Funny thing: the issue can be resolved as simply as adding a boilerplate __init__ method that calls super().__init__(**kwargs).

I have shared the details of this investigation in my Medium post. What a sneaky bug it was! See the link in the first comment. 💭 Any sneaky bugs you remember?

#Airflow #Python #PythonMRO
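The fix pattern is worth seeing in isolation. This is a toy sketch with stand-in classes, not the real Airflow ones: cooperative multiple inheritance only works when every class in the diamond forwards **kwargs via super().__init__, which is exactly what the boilerplate fix restores.

```python
class BaseOperator:
    def __init__(self, task_id, **kwargs):
        super().__init__(**kwargs)
        self.task_id = task_id


class SQLExecuteQueryOperator(BaseOperator):
    def __init__(self, *, sql, **kwargs):
        super().__init__(**kwargs)
        self.sql = sql


class ClickHouseBaseDbApiOperator(BaseOperator):
    # The fix: a boilerplate __init__ that just forwards kwargs,
    # so the MRO chain continues to SQLExecuteQueryOperator.__init__.
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


class ClickHouseSQLExecuteQueryOperator(
    ClickHouseBaseDbApiOperator,
    SQLExecuteQueryOperator,
):
    pass


# MRO: ClickHouseSQLExecuteQueryOperator -> ClickHouseBaseDbApiOperator
#      -> SQLExecuteQueryOperator -> BaseOperator -> object
op = ClickHouseSQLExecuteQueryOperator(task_id='id', sql='SELECT 1')
print(op.task_id, op.sql)  # → id SELECT 1
```

Every __init__ in the chain passes what it doesn't consume further along the MRO, so `sql` reaches the class that actually declares it.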
Anton Bryzgalov’s Post
More Relevant Posts
Today I learned about lambda functions in Python A lambda is just a small, anonymous one-liner function — no name, no `return`, just pure logic. Basic syntax: ``` lambda arguments: expression ``` Instead of writing: ```python def add(a, b): return a + b ``` You can write: ```python add = lambda a, b: a + b ``` But the real power shows up when you pair it with `map()`, `filter()`, and `sorted()`: ```python # Double every number list(map(lambda x: x * 2, [1, 2, 3, 4])) # → [2, 4, 6, 8] # Keep only even numbers list(filter(lambda x: x % 2 == 0, [1, 2, 3, 4, 5])) # → [2, 4] # Sort by second element sorted([(1,3),(2,1),(4,2)], key=lambda x: x[1]) # → [(2, 1), (4, 2), (1, 3)] ``` Key rule I'll remember: Use lambda when the logic is small and used once Avoid it when the logic gets complex — just write a proper `def` Small concept, but it shows up everywhere in Python backend code. #Python #Backend #LearningInPublic #100DaysOfCode #Django
🐍 Python List Operations – The Only Cheat Sheet You'll Need Master lists with these 25+ essential operations: 🔍 Accessing & Finding • list[i] → Get single item by index • list[start:end] → Get multiple items (slicing) • a, b, c = list → Unpack all items into variables • list.index(x) → Find position of first item with value x • x in list → Check if value x exists (True/False) 📊 Analyzing & Counting • len(list) → Total number of items • list.count(x) → Count how many times value x appears • max(list) / min(list) → Find highest/lowest values ✏️ Modifying Lists • list.append(x) → Add item x to the end • list.insert(i, x) → Insert item x at index i • list.extend(other_list) → Add items from another list • list[index] = new_value → Change item at specific index 🗑️ Removing Items • list.pop(i) → Remove and return item at index i (default last) • list.remove(x) → Remove first occurrence of value x • list.clear() → Remove all items 🔄 Sorting & Copying • list.sort() → Sort list in place (ascending) • list.reverse() → Flip order in place • new_list = sorted(list) → Get sorted copy • copy_list = list.copy() → Create a shallow copy ⚙️ Iteration & Processing • enumerate(list) → Iterate with index and value • [fn(x) for x in list if condition] → List comprehension (filter + transform in one line) • zip(list_a, list_b) → Pair items from two lists 💡 Pro tip: List comprehension is the most elegant Python feature. Master it and you'll write cleaner, faster code. #Python #PythonLists #CodingCheatSheet #DataStructures #LearnPython
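A quick, runnable pass over a few of the operations above (sample values are made up for illustration):

```python
scores = [3, 1, 4, 1, 5]

scores.append(6)                  # add to the end → [3, 1, 4, 1, 5, 6]
assert scores.count(1) == 2       # value 1 appears twice
assert scores.index(4) == 2       # first 4 sits at index 2

top = max(scores)                 # highest value → 6
scores.sort()                     # in place → [1, 1, 3, 4, 5, 6]

# List comprehension: filter + transform in one line
doubled_evens = [x * 2 for x in scores if x % 2 == 0]
print(top, doubled_evens)  # → 6 [8, 12]
```

Note the split between in-place methods (`sort`, `append`) that return None and functions (`sorted`, `max`) that leave the original list untouched.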
I had a Python UDF that was slow. Everyone told me to switch to a Pandas UDF. I switched. It got faster. I didn't stop there, which is where this gets interesting. I spent a weekend benchmarking the Arrow serialization overhead across different schema widths and batch sizes because I wanted to actually understand what I was paying for. Here is what I found. On a narrow schema, 4 columns, a Pandas UDF with default batch size of 10,000 records was 6.2x faster than the Python UDF. The serialization cost was trivial relative to the computation savings. On a wide schema, 180 columns, the Pandas UDF at default batch size was 2.1x faster. Still better. But the Arrow conversion was now a meaningful fraction of total execution time because converting 180 columns per batch is not free. When I dropped the batch size on the wide schema to 2,000 records, peak memory per conversion dropped and the job stopped spilling to disk on the executor with the largest partition. Total job time: 1.7x faster than the wide-schema default. A 23% improvement just from tuning spark.sql.execution.arrow.maxRecordsPerBatch. The configuration nobody sets: spark.sql.execution.arrow.pyspark.enabled=true. This is separate from Pandas UDFs. It accelerates toPandas() and createDataFrame() globally. Every time you collect to pandas interactively, you are either paying the Arrow overhead or the row-by-row serialization overhead. Arrow is always cheaper. It is not on by default in all environments. I set that flag. I set it in every cluster config I control. I set it so reflexively now that I had to think to remember whether it was a default or a choice. The point is not to memorize my numbers. Your schema is different. My point is that I ran the experiment and found a 23% improvement by changing one integer. You have not run the experiment. Run it. The number is different for your schema. Find it.
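For readers who want to run the experiment, here is a minimal sketch. The vectorized core is plain pandas and runs anywhere; the Spark-side wiring (the `pandas_udf` decorator and the two Arrow configs discussed above) is shown in comments because it needs a live SparkSession.

```python
import pandas as pd

def plus_one(batch: pd.Series) -> pd.Series:
    # A Pandas UDF receives a whole pd.Series per Arrow batch,
    # not one Python object per row — that is where the speedup lives.
    return batch + 1

# On a real cluster (sketch; requires PySpark):
#   from pyspark.sql.functions import pandas_udf
#   plus_one_udf = pandas_udf(plus_one, "long")
#   spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
#   spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "2000")

result = plus_one(pd.Series([1, 2, 3])).tolist()
print(result)  # → [2, 3, 4]
```

Benchmark by sweeping `maxRecordsPerBatch` over your own schema; as the post says, the right value depends on schema width and executor memory.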
✅ *Python Basics: Part-4* *Functions in Python* 🧩⚙️ 🎯 *What is a Function?* A function is a reusable block of code that performs a specific task. 🔹 *1. Defining a Function* Use the `def` keyword: ```python def greet(name): print(f"Hello, {name}!") ``` 🔹 *2. Calling a Function* ```python greet("Alice") ``` 🔹 *3. Return Statement* Functions can return a value using `return`: ```python def add(a, b): return a + b result = add(3, 5) print(result) # Output: 8 ``` 🔹 *4. Default Parameters* You can set default values for parameters: ```python def greet(name="Guest"): print(f"Hello, {name}") ``` 🔹 *5. Keyword Arguments* Arguments can be passed by name: ```python def info(name, age): print(f"{name} is {age} years old") info(age=25, name="Bob") ``` 🔹 *6. Variable-Length Arguments* - `*args`: Multiple positional args - `**kwargs`: Multiple keyword args ```python def show_args(*args): print(args) def show_kwargs(**kwargs): print(kwargs) ``` 💬 *Double Tap ❤️ for Part-5!*
OrJSON looks like a small optimization. Until you realize how much time your API spends just serializing JSON. In many Python APIs, the bottleneck isn’t only the database or the LLM. Sometimes it’s the most invisible step: turning Python objects into JSON. What is OrJSON? A high-performance JSON library for Python, written in Rust. It replaces the default json module and focuses on one thing: speed. It: → serializes faster → deserializes faster → supports dataclass, datetime, numpy, UUID out of the box → returns bytes instead of str So what’s happening under the hood? The idea is simple: optimize the hottest path in your API. → less overhead per operation → less work per payload → faster UTF-8 writing And it shows. In its own benchmarks: → dumps() can be ~10x faster than json → loads() can be ~2x faster Where this actually matters: → large payloads → APIs returning a lot of JSON → RAG metadata, events, telemetry → long lists Now the part most people ignore: Trade-offs. → orjson.dumps() returns bytes, not str → no built-in file read/write helpers → not always a perfect drop-in replacement → holds the GIL during serialization So when should you use it? → large responses → heavy metadata → serialization shows up in profiling And when won’t it help? → DB is your bottleneck → LLM latency dominates → responses are small → network / I/O dominates OrJSON won’t magically make your API fast. But if serialization is on your hot path, it’s one of the highest ROI optimizations you can make.
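A minimal sketch of the bytes-vs-str trade-off mentioned above. The stdlib part always runs; the orjson part is guarded since it is a third-party package (`pip install orjson`):

```python
import json

payload = {"id": 1, "tags": ["a", "b"], "ok": True}

# Stdlib json returns str:
s = json.dumps(payload)
assert isinstance(s, str)
roundtrip = json.loads(s)

# orjson returns bytes — the main drop-in incompatibility to handle:
try:
    import orjson
    b = orjson.dumps(payload)       # bytes, not str
    text = b.decode("utf-8")        # convert only where a str is required
    assert orjson.loads(b) == payload
except ImportError:
    pass  # orjson not installed; sketch only
```

Returning bytes is a feature in web frameworks: most of them write bytes to the socket anyway, so skipping the str round-trip is part of where the speedup comes from.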
Been building CallFlow Tracer, an open-source Python library that traces function call flows and visualizes them as interactive graphs. Just shipped v0.4.1 with some major improvements:

• Fixed critical security vulnerabilities (command injection, code injection in the old extension)
• Added Content Security Policy headers to all webviews

What CallFlow Tracer does in this new version:

• OpenTelemetry export for production observability (Jaeger, OTLP)
• SLA/SLO tracking with error budgets and canary analysis
• Framework integrations: FastAPI, Flask, Django, SQLAlchemy
• Fixed imports and a more modular, extensible structure

Would love feedback from anyone working on observability, profiling, or developer tooling. Link below: https://lnkd.in/drUQspvv

#Python #OpenSource #DeveloperTools #Observability #OpenTelemetry #VSCode #TypeScript #SoftwareEngineering
Python has easy ways to make your text, numbers, and dates look clear and professional. Here are 4 tricks you should try:

1. f-Strings: The easiest way to put variables straight into your text. Fast, readable, perfect for quick outputs.

```python
name = "Mayar"
age = 22
print(f"My name is {name} and I am {age} years old.")
```

2. Alignment & Width: Keep tables, reports, or lists neat by aligning text and numbers. Left, center, or right: your choice!

```python
print("{:<10} | {:^10} | {:>10}".format("Name", "Age", "Score"))
print("{:<10} | {:^10} | {:>10}".format("Mayar", 22, 95))
```

3. Template Strings: Create reusable text templates and fill in values later. Makes your code cleaner and easier to manage.

```python
from string import Template
t = Template("Hello $name, your score is $score.")
print(t.substitute(name="Mayar", score=95))
```

4. Date & Time Formatting: Show dates and times in a clear, readable way. Useful for reports, logs, or messages.

```python
from datetime import datetime
now = datetime.now()
print(f"Date: {now:%d-%m-%Y} Time: {now:%H:%M:%S}")
```

CodeAcademy_om Kulsoom Shoukat Ali Sultan AL-Yahyai #Python #Coding #PythonTips #Developer #LearnPython #TechSkills #CodeBetter #DateTime
You might want to take a look at this. Everybody says NumPy is faster than a Python list. But how fast? Well, I looked into it!

Here is the overhead of every integer value you use in Python. Say you use the value 42. On 64-bit CPython, the int object carries:

• ob_refcnt — 8 bytes (reference count; when it drops to zero, Python frees the object from RAM)
• ob_type — 8 bytes (a pointer to the value's type, here int)
• ob_size — 8 bytes (how many digit slots the integer uses)
• ob_digit — 4 bytes (the actual value, 42)

That's 28 bytes per value — exactly what sys.getsizeof(42) reports.

Let's take it a step further. Say you have 4 values to store, just [1, 2, 3, 4], and let's compare a Python list vs a NumPy array.

Python list: stores pointers, not the actual values. You get an array of pointers first; each pointer points to an actual value scattered somewhere else in RAM. So, in order to store 4 elements:

4 × 8 = 32 bytes (pointer array)
4 × 28 = 112 bytes (actual int objects)
Total: 32 + 112 = 144 bytes.

NumPy array: contiguous and homogeneous, with no pointer model. The actual values sit next to each other, giving easy traversal using strides:

4 × 8 = 32 bytes.

NumPy can store raw values directly because it enforces a single dtype. Since every element is the same size, it can locate any element using simple math (base + index × itemsize) instead of pointers. Python lists allow mixed types, and that's exactly what forces the pointer model.

Note: I am only comparing the storage models here. Both Python lists and NumPy arrays have their own container overhead, which I've intentionally left out to keep the comparison clean.

Beyond storage models, there is another reason NumPy is so powerful in numerical computations: vectorization. When you do np.sum(a), NumPy runs optimized C code across the entire array in one shot, with no Python interpreter involved. A Python loop hits interpreter overhead on every single element.

That's the real reason NumPy can be 10-100x faster for numerical operations. There's a reason it's named "Numerical Python"!
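You can check these numbers yourself. The stdlib part runs anywhere; the NumPy part is guarded since it is a third-party package:

```python
import sys

# Per-object overhead: even a small int is a full PyObject.
int_size = sys.getsizeof(42)      # 28 bytes on 64-bit CPython

values = [1, 2, 3, 4]
# The list object holds pointer slots (plus a small header),
# not the integers themselves:
list_shell = sys.getsizeof(values)

try:
    import numpy as np
    arr = np.array(values, dtype=np.int64)
    print(arr.nbytes)  # → 32 (4 elements × 8 bytes, raw values only)
except ImportError:
    pass  # NumPy not installed; the stdlib part still runs

print(int_size, list_shell)
```

`arr.nbytes` counts only the raw data buffer, matching the 4 × 8 = 32 figure above; the ndarray object header is the container overhead the post deliberately leaves out.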
Day 30 of #60DaysOfMiniProjects Today I stepped into the world of Web Scraping using Python I built a Webpage Analyzer that extracts and summarizes key details from any website using Python. At first glance, it looks simple—but this project helped me understand how real-world data extraction works behind the scenes. What this project does: • Takes a website URL as input • Fetches webpage content using requests • Parses HTML using BeautifulSoup • Extracts important insights from the page What it analyzes: • Webpage title • Total number of links • Total number of images • Total number of paragraphs Concepts I worked with: • Web scraping fundamentals • HTTP requests handling • HTML parsing • DOM structure understanding • Exception handling in Python This project gave me a clear idea of how websites are structured and how data can be programmatically extracted and analyzed. Next step: Building a more advanced scraper with filtering + data storage Learning step by step. Building consistently. Improving every day. #Python #WebScraping #BeautifulSoup #Requests #MiniProjects #BuildInPublic #CodingJourney #DeveloperGrowth #LearningInPublic #100DaysOfCode
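The counting logic of such an analyzer can be sketched without any third-party packages. The real project fetches the page with requests and parses it with BeautifulSoup; this dependency-free version uses stdlib html.parser and a made-up HTML snippet to show the same idea:

```python
from html.parser import HTMLParser

class PageAnalyzer(HTMLParser):
    """Counts links, images, and paragraphs, and captures the title."""

    def __init__(self):
        super().__init__()
        self.counts = {"a": 0, "img": 0, "p": 0}
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = data.strip()

# In the real project: html = requests.get(url).text
html = """
<html><head><title>Demo Page</title></head>
<body>
  <p>First paragraph.</p><p>Second paragraph.</p>
  <a href="/one">link</a><a href="/two">link</a>
  <img src="pic.png">
</body></html>
"""

analyzer = PageAnalyzer()
analyzer.feed(html)
print(analyzer.title, analyzer.counts)  # → Demo Page {'a': 2, 'img': 1, 'p': 2}
```

BeautifulSoup makes the same extraction shorter (`soup.title`, `len(soup.find_all("a"))`), but the parser-callback model above is what runs underneath.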
For the detailed investigation report and how the fix works, see my article on Medium: https://medium.com/@bryzgaloff/how-airflows-metaclass-silently-breaks-python-mro-9f3d1c8629f8 I would really appreciate your claps there! 👏❤️