Here's how to get started with web scraping in Python.

You'll need two libraries:
-> requests — to fetch the webpage
-> BeautifulSoup — to parse and extract the data

Install them first:
```
pip install requests beautifulsoup4
```
---
Now let's say you're a fitness freak (guilty 🙋🏻♂️) and want to scrape the titles of articles from a fitness blog. Here's what that looks like:
```python
import requests
from bs4 import BeautifulSoup

url = "https://example-fitness-blog.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

titles = soup.find_all("h2", class_="post-title")
for title in titles:
    print(title.text.strip())
```
Breaking it down:
-> requests.get() fetches the raw HTML of the page
-> BeautifulSoup parses that HTML so you can navigate it like a tree
-> find_all() searches for every element matching your tag and class
-> .text.strip() grabs the text content and removes extra whitespace (gotta make it look pretty, right?)
---
A few things to keep in mind before you start scraping:
1. Check the site's robots.txt file, because some sites explicitly forbid scraping.
2. Don't hammer a server with rapid requests. Instead, add a small delay between them.
3. Some sites load content with JavaScript, which requests can't handle. For those you'll need Selenium or Playwright.

Web scraping is one of the more fun things you can do with Python. You write 10 lines of code and suddenly you can pull data from almost anywhere on the internet.

Have you ever used web scraping for a project? Let me know what you built 👇🏻👇🏻👇🏻

#python #webdevelopment #softwaredeveloper #webscraping
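The "small delay" tip in point 2 can be sketched like this. This is a minimal illustration, not code from the post; the `fetch` parameter is a hypothetical hook that lets the function run without a network connection:

```python
import time

import requests
from bs4 import BeautifulSoup


def fetch_titles(urls, delay=1.0, fetch=None):
    """Fetch each page's <title>, pausing `delay` seconds between requests."""
    # Default fetcher does a real HTTP GET; tests can inject a stub instead.
    fetch = fetch or (lambda u: requests.get(u, timeout=10).text)
    titles = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # polite pause so we don't hammer the server
        soup = BeautifulSoup(fetch(url), "html.parser")
        titles.append(soup.title.text.strip() if soup.title else "")
    return titles


# Example with a stubbed fetcher (no network needed):
fake = lambda u: f"<title>{u}</title>"
print(fetch_titles(["page1", "page2"], delay=0.1, fetch=fake))  # ['page1', 'page2']
```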
Web Scraping with Python: Get Started with Requests and BeautifulSoup
I ran some tests on how to build tools that Claude Code could call locally. In the first tests I gave it a choice: a Python script I wrote, its own internal web search/fetch tool, or a Python script reading a local offline filesystem. The filesystem is the obvious choice, so it always picked it, so we skip to test 3, where I made the local filesystem script require 3 calls to get the information. Each call searched a massive block of data, about 100 times as much content as a direct web call would pull, including useless HTML tags, CSS, etc. I was shocked that Claude still picked the local script when it knew it needed to call it 3 times to do what 1 web call would. At 4 local calls, Claude finally picked the web tool. I ran 10 sessions testing this: 8 times it preferred 3 local calls over 1 web search, and the other 2 times it switched to the web tool once it hit 3 calls (so it preferred up to 2 local calls).

This might sound boring lol, it was, but then I built a local embeddings server. It doesn't take 3 tries, it takes 1, and you can search it and pull tiny blocks of info: no HTML, no CSS, not even the entire text of a web page, so the context doesn't fill up with useless characters, which was very nice obviously. But once Claude used it, it told me it liked the tool. Big deal, right? Well, I didn't ask it what it thought, it just volunteered: "this tool is very effective, it takes 0.051 seconds to find what I want to know". So I told it to "play around with it, invent random queries and see how much data you can find", and it called the tool 150 times in the next turn. 150 is nuts. It started telling me all kinds of things it found with invented (i.e., not obviously relevant) queries. That alone was huge, but I couldn't stop there, so I'm now up to multi-layered parallel searches in a single call.

AI already has the reasoning power of a human (more, in some cases). What it's missing is a complex shared memory structure between processes, like our brains have. If you think a single AI call isn't at human level, you may not realize how many "turns" we process per second. If we built a program that runs all the types of thinking we do at once and merged them together with a shared memory, you'd see Claude can already do it.

I've written the hardest use cases for this that I can without a specific project in mind, and I'm out of hard problems to build for my own company. So if anyone has something really complex (large data, vague/broad searches, many databases, etc.), I'm looking for something larger-scale to work on. What are you working on? Any new AI memory ideas? Let's hear them below.
Python Series – Day 24: Web Scraping (Collect Data from Websites!)

Yesterday, we learned Data Visualization 📊 Today, let's learn how to collect data automatically from websites using Python: 👉 Web Scraping

🧠 What is Web Scraping?
👉 Web Scraping means extracting data from websites using code. Instead of copying data manually, Python can collect it automatically.

📌 Example Uses:
✔️ Product prices
✔️ News headlines
✔️ Job listings
✔️ Reviews & ratings
✔️ Stock / Sports data

Why It Matters? Imagine collecting 1000 product names manually 😵 Python can do it in seconds ⚡

💻 Popular Libraries for Web Scraping
✔️ `requests` → Get webpage HTML
✔️ `BeautifulSoup` → Read & extract data
✔️ `pandas` → Save data in table format

💻 Example: Get Website Title
```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.text)
```
🔍 Output: Example Domain

💻 Example: Get All Headings
```python
for h1 in soup.find_all("h1"):
    print(h1.text)
```

🎯 Why Web Scraping is Important?
✔️ Saves time
✔️ Collects large data fast
✔️ Used in Data Science projects
✔️ Useful for market research

⚠️ Pro Tip 👉 Always respect website rules (`robots.txt`) and terms of use.

🔥 One-Line Summary 👉 Web Scraping = Automatically collecting website data using Python

📌 Tomorrow: APIs in Python (Get Live Data Easily!) Follow me to master Python step-by-step 🚀

#Python #WebScraping #BeautifulSoup #DataScience #Automation #Coding #Programming #LearnPython #MustaqeemSiddiqui
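The post mentions `pandas` for saving data in table format but doesn't show it. Here's one way that step could look — a sketch that parses a local HTML snippet (so it runs without a network call) and writes the headings to CSV:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Local HTML snippet standing in for a fetched page
html = """
<html><body>
  <h1>First Heading</h1>
  <h1>Second Heading</h1>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect every <h1> text into a table and save it with pandas
headings = [h1.text for h1 in soup.find_all("h1")]
df = pd.DataFrame({"heading": headings})
df.to_csv("headings.csv", index=False)
print(df)
```

In a real run, you would replace the snippet with `requests.get(url).text` from the example above.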
🚀 Scrapling: A Game-Changer in Web Scraping

I explored D4Vinci/Scrapling and it stands out as a modern, adaptive web scraping framework built for real-world use cases.

💡 Why it matters:
🧠 Auto-adapts to website structure changes
🕷️ Supports static + dynamic + anti-bot pages
⚡ Built for scalable crawling
🤖 AI-ready for RAG and agent workflows

🔥 It bridges traditional scraping with modern AI data pipelines.

https://lnkd.in/gpzAZNP8

#WebScraping #AI #Python #Automation #DataEngineering #OpenSource
Part 1. 5 Challenges in Web Scraping for Beginners (My First Web Scraping Experience)

My first experience with web scraping was chaotic. Simply put, web scraping involves accessing a website's underlying structure and extracting data from it. When I tried it for the first time, I thought it would be simple because I already knew basic HTML, but it turned out to be more confusing than I expected. I struggled to find the right HTML tags: I didn't know where to look, it took me quite a while, and I ended up opening tags one by one until I finally found what I was looking for.

Here are 5 challenges I encountered during my first web scraping experience that taught me a lot:
> Complex HTML structures
> Dynamic content loading
> Pagination
> Collecting duplicate data by accident
> Large volume of data

At first, these problems felt overwhelming. But after exploring more, I found some simple ways to deal with them. I'll share those in the next post.

#WebScraping #Python #HTML #Programming #LearningExperience #CodingChallenges
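Two of the challenges in the list — pagination and accidentally collecting duplicates — often get solved together with a "seen" set. This is a generic sketch, not the author's solution; the page data here is hypothetical:

```python
def scrape_pages(pages):
    """Walk paginated results and collect item titles, skipping duplicates.

    `pages` is a list of lists: one list of raw titles per page, standing in
    for whatever each paginated request would return.
    """
    seen = set()
    results = []
    for page_items in pages:
        for title in page_items:
            key = title.strip().lower()
            if key in seen:  # already collected on an earlier page
                continue
            seen.add(key)
            results.append(title.strip())
    return results


items = scrape_pages([
    ["Post A", "Post B"],   # page 1
    ["Post B", "Post C"],   # page 2 repeats Post B
])
print(items)  # ['Post A', 'Post B', 'Post C']
```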
🐍 Simplifying Web Data Extraction with BeautifulSoup

Recently, I explored how to use BeautifulSoup to quickly extract structured data from websites, and it's one of the easiest ways to get started with web scraping.

Here's a simple approach:
🔹 Send a request to a webpage using Python
🔹 Parse the HTML content using BeautifulSoup
🔹 Locate elements (tags, classes, IDs)
🔹 Extract useful data (text, links, prices, etc.)

🛠 Tools Used:
• Python
• BeautifulSoup
• Requests library

💡 Key Takeaway: With just a few lines of code, you can turn unstructured web pages into usable datasets—perfect for building data-driven apps, research tools, or automation workflows.

⚠️ Always respect website terms and use scraping responsibly.

A great starting point for anyone getting into data extraction and automation.

#Python #WebScraping #BeautifulSoup #DataEngineering #Automation #OpenSource
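The four-step approach above can be sketched in a few lines. The HTML snippet and class name here are made up for illustration; in a real run, step 1 would be `html = requests.get(url).text`:

```python
from bs4 import BeautifulSoup

# A local snippet stands in for the fetched page so this runs offline
html = """
<div id="products">
  <a class="item" href="/p/1">Dumbbell Set</a>
  <a class="item" href="/p/2">Yoga Mat</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")       # parse the HTML content
links = soup.find_all("a", class_="item")       # locate elements by tag + class
data = [(a.text, a["href"]) for a in links]     # extract text and links
print(data)  # [('Dumbbell Set', '/p/1'), ('Yoga Mat', '/p/2')]
```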
🚀 𝐃𝐚𝐲 𝟐: 𝐌𝐚𝐬𝐭𝐞𝐫𝐞𝐝 𝐑𝐨𝐮𝐭𝐢𝐧𝐠 & 𝐏𝐚𝐭𝐡 𝐏𝐚𝐫𝐚𝐦𝐞𝐭𝐞𝐫𝐬 𝐢𝐧 𝐅𝐚𝐬𝐭𝐀𝐏𝐈! ⚡

The journey into FastAPI continues! Today was all about how we handle data directly within the URL. Coming from a Django background, I'm loving how clean and intuitive the routing feels here.

𝙃𝙚𝙧𝙚'𝙨 𝙬𝙝𝙖𝙩 𝙄 𝙩𝙖𝙘𝙠𝙡𝙚𝙙 𝙩𝙤𝙙𝙖𝙮:

📍 𝙋𝙖𝙩𝙝 𝙋𝙖𝙧𝙖𝙢𝙚𝙩𝙚𝙧𝙨 & 𝙃𝙏𝙏𝙋 𝙈𝙚𝙩𝙝𝙤𝙙𝙨: I explored how to capture dynamic values from the URL using {curly_brackets} and how they interact with standard HTTP methods like GET and POST.

🔢 𝙋𝙖𝙩𝙝 𝙋𝙖𝙧𝙖𝙢𝙚𝙩𝙚𝙧𝙨 𝙬𝙞𝙩𝙝 𝙏𝙮𝙥𝙚𝙨: This is a game-changer! By using Python type hints (like : int or : str), FastAPI automatically handles:
𝗗𝗮𝘁𝗮 𝗩𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻: It returns a clear error if the wrong type is sent.
𝐃𝐚𝐭𝐚 𝐂𝐨𝐧𝐯𝐞𝐫𝐬𝐢𝐨𝐧: It automatically converts the URL string into the correct Python type.

🔄 𝘿𝙤𝙚𝙨 𝙊𝙧𝙙𝙚𝙧 𝙈𝙖𝙩𝙩𝙚𝙧? (𝙋𝙖𝙩𝙝 𝙋𝙖𝙧𝙖𝙢𝙚𝙩𝙚𝙧 𝙊𝙧𝙙𝙚𝙧𝙨): I learned that in FastAPI, the order of your route functions matters. If you have a static path like /users/me and a dynamic path like /users/{user_id}, the static one must come first to avoid being "caught" by the dynamic parameter!

📋 𝙋𝙧𝙚𝙙𝙚𝙛𝙞𝙣𝙚𝙙 𝙑𝙖𝙡𝙪𝙚𝙨: Using Python's Enum, I learned how to restrict a path parameter to a specific set of valid options. This makes APIs incredibly robust and self-documenting.

🛠️ 𝙋𝙖𝙩𝙝 𝘾𝙤𝙣𝙫𝙚𝙧𝙩𝙚𝙧𝙨: I dived into using :𝗽𝗮𝘁𝗵 to capture entire file paths (like files/images/photo.jpg) within a single parameter.

𝐂𝐮𝐫𝐫𝐞𝐧𝐭 𝐒𝐭𝐚𝐭𝐮𝐬: Feeling more confident with every line of code. The way FastAPI handles documentation and validation simultaneously is a massive productivity boost! 🛠️💻

#FastAPI #Python #BackendDevelopment #WebAPI #LearningJourney #Coding #SoftwareEngineering #PythonDeveloper #Day2
Stop babysitting broken Python web scrapers!

If you've ever spent hours perfecting an XPath only for the website to change a single <div> and break your whole pipeline, you know the pain. As someone constantly self-learning new tools in the Python and data science ecosystem, dealing with brittle data pipelines is a frustration I know too well. That's why I'm incredibly excited about Scrapling.

In my latest Medium post, I dive deep into why this massive open-source update is dethroning BeautifulSoup and legacy tools. Here is why it is the ultimate toolkit for modern data extraction:

▪️Adaptive Parsing: It saves an "element fingerprint." If the website layout changes next month, the scraper dynamically heals itself and finds your data anyway.
▪️Built-in Stealth: Bypasses Cloudflare and anti-bot systems right out of the box without massive proxy setups.
▪️Blazing Fast: Up to 240x faster text extraction than BeautifulSoup, complete with a brand new Async Spider Framework.
▪️AI-Ready: Integrated Model Context Protocol (MCP) to feed clean, structured data straight to your LLMs while saving on token costs.

If you are building ML datasets, training models, or just need reliable web data without the endless hotfixes, this library is an absolute game-changer.

📖 Read my full breakdown and see the code snippets here: https://lnkd.in/gkEfHzuG

👇 Let's normalize the trauma: Have you ever had a scraper catastrophically break in production? Tell me your worst data extraction nightmare in the comments!

#Python #WebScraping #DataScience #MachineLearning #OpenSource #DeveloperLife #DataEngineering #Scrapling #TechNews
Stop Babysitting Broken Selectors: Why ‘Scrapling’ is the New King of Python Web Scraping (medium.com)
-
The Python Scraping Templates just crossed 40 downloads. Ten production scripts. $47.

People are using them to build lead lists in under 2 hours instead of 2 days.

I built them because I kept rewriting the same BeautifulSoup logic. Parser. Error handling. Rate limiting. CSV export. Same work, different websites. So I documented the patterns and shipped them as templates instead.

A freelancer ran one of the scripts against a construction directory yesterday and pulled 183 leads before lunch. Another person used the email extractor to build a prospect list for cold outreach. Neither had to understand web scraping—they just pointed the script at a URL and got clean data.

This is zero marginal cost. I spent the hours once. Now 40 people avoid spending 40 hours each.

The scripts live on Gumroad at $47. Which websites are you scraping manually right now that shouldn't be?
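The four patterns named here (parser, error handling, rate limiting, CSV export) compose into one loop. This is a generic sketch of that shape, not the actual templates; the `fetch` hook is a hypothetical parameter added so the function can run without a network:

```python
import csv
import time

import requests
from bs4 import BeautifulSoup


def scrape_to_csv(urls, out_path, delay=1.0, fetch=None):
    """Parser + error handling + rate limiting + CSV export in one loop."""
    fetch = fetch or (lambda u: requests.get(u, timeout=10).text)
    rows = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)                 # rate limiting between requests
        try:
            html = fetch(url)                 # may raise on network trouble
        except requests.RequestException:     # error handling: skip bad pages
            continue
        soup = BeautifulSoup(html, "html.parser")   # parsing
        title = soup.title.text.strip() if soup.title else ""
        rows.append({"url": url, "title": title})
    with open(out_path, "w", newline="") as f:      # CSV export
        writer = csv.DictWriter(f, fieldnames=["url", "title"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Point it at a URL list, get a clean CSV — the same shape regardless of which site the parser step targets.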
-
📘 #𝗣𝘆𝘁𝗵𝗼𝗻 𝗦𝗰𝗲𝗻𝗮𝗿𝗶𝗼 𝗕𝗮𝘀𝗲𝗱 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄 𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀 | 𝗥𝗲𝗮𝗹 𝗦𝗰𝗲𝗻𝗮𝗿𝗶𝗼 𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀 | 𝗚𝗼𝗼𝗴𝗹𝗲 | 𝗔𝗺𝗮𝘇𝗼𝗻 | 𝗠𝗶𝗰𝗿𝗼𝘀𝗼𝗳𝘁 - 𝗣𝗮𝗿𝘁 𝗜

Python interviews don't test syntax alone. They test how you reason through real-world code. Here are 10 real Python scenarios that interviewers love to ask 👇

👉 The pass Statement — An empty function and an empty class both contain pass. Why is it necessary, and what happens if you omit it?
👉 List Comprehension One-Liner — Given [2, 33, 222, 14, 25], subtract 1 from every element in a single line. How would you write it?
👉 Flask vs Django — Your team is building a lightweight microservice. Why would you choose Flask over Django?
👉 Callable Objects — What does it mean for an object to be "callable"? Give examples beyond just functions.
👉 List Deduplication Preserving Order — [1,2,3,4,4,6,7,3,4,5,2,7] → produce unique values in order. One-liner?
👉 Function Attributes — Attach a custom attribute to a function and access it later. Why would this be useful?
👉 Bitwise XOR on Strings — Perform XOR on two binary strings of equal length (without using ^ directly on strings). Write the logic.
👉 Statements vs Expressions — Is if a statement or an expression? Can you assign it to a variable? Explain with examples.
👉 Python Introspection — How can you inspect an object's attributes and methods at runtime? Name at least three built-in tools.
👉 List Comprehension with Condition — Generate all odd numbers between 0 and 100 inclusive in one line.

😥 "I knew the syntax… but I couldn't explain why it works that way" — sound familiar? 𝗧𝗵𝗮𝘁 𝗴𝗮𝗽 𝗶𝘀 𝘄𝗵𝗲𝗿𝗲 𝗛𝗮𝗰𝗸𝗡𝗼𝘄 𝗣𝘆𝘁𝗵𝗼𝗻 𝗰𝗮𝗳𝗲 𝗳𝗼𝗰𝘂𝘀𝗲𝘀. We train scenario thinking, not memorization.

💬 𝗪𝗵𝗶𝗰𝗵 𝗼𝗳 𝘁𝗵𝗲𝘀𝗲 𝗰𝗼𝗻𝗰𝗲𝗽𝘁𝘀 𝘀𝘂𝗿𝗽𝗿𝗶𝘀𝗲𝗱 𝘆𝗼𝘂 𝗺𝗼𝘀𝘁 𝘄𝗵𝗲𝗻 𝘆𝗼𝘂 𝗳𝗶𝗿𝘀𝘁 𝗲𝗻𝗰𝗼𝘂𝗻𝘁𝗲𝗿𝗲𝗱 𝗶𝘁?

𝗙𝗿𝗼𝗺 𝗡𝗼𝘁𝗵𝗶𝗻𝗴 ▶️ 𝗧𝗼 𝗡𝗼𝘄 𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗝𝗼𝗯 𝗿𝗲𝗮𝗱𝘆 𝗣𝘆𝘁𝗵𝗼𝗻 𝗣𝗿𝗼𝗳𝗲𝘀𝘀𝗶𝗼𝗻𝗮𝗹𝘀 ...✈️
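Four of the scenarios above have compact answers worth checking against a real interpreter. One possible set of solutions (there are others):

```python
# Subtract 1 from every element in a single line
nums = [x - 1 for x in [2, 33, 222, 14, 25]]
print(nums)  # [1, 32, 221, 13, 24]

# Deduplicate while preserving order (dicts keep insertion order since 3.7)
unique = list(dict.fromkeys([1, 2, 3, 4, 4, 6, 7, 3, 4, 5, 2, 7]))
print(unique)  # [1, 2, 3, 4, 6, 7, 5]

# XOR two equal-length binary strings without using ^ on the strings
a, b = "1010", "0110"
xored = "".join("1" if x != y else "0" for x, y in zip(a, b))
print(xored)  # 1100

# All odd numbers between 0 and 100 inclusive, one line
odds = [n for n in range(0, 101) if n % 2 == 1]
print(odds[0], odds[-1], len(odds))  # 1 99 50
```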