🚀 Task 1 Completed: Web Scraping using Python

I’m excited to share my first step in the Data Analytics journey — extracting real-world data directly from the web! 🌐

🎥 In this video, I explain my Python code for web scraping, where I collected country population data from a public webpage.

🔍 What this project covers:
✔ Fetching webpage data using Python
✔ Extracting HTML tables efficiently
✔ Understanding website structure
✔ Converting raw data into a structured dataset

🛠 Tools Used:
Python 🐍
Pandas
Requests
BeautifulSoup

💡 Key Learning: Web scraping is a powerful skill that allows us to collect real-world data, which is the foundation of any data analysis project.

📊 This dataset will be used for data cleaning, analysis, and visualization in the next steps.

👉 Check out the video to see how I transformed raw web data into a usable dataset!

#WebScraping #Python #DataAnalytics #Pandas #DataScience #Projects #LearningJourney #LinkedInLearning
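A minimal sketch of the workflow the post describes: parse an HTML table with BeautifulSoup and load it into a pandas DataFrame. The HTML snippet and column names here are invented, and the page is inlined so the example runs without a network call; the real project would fetch it first with `requests.get(url).text`.

```python
# Sketch of the scraping steps above, on an inlined HTML table instead of
# a live page. Column names and figures are illustrative only.
from bs4 import BeautifulSoup
import pandas as pd

html = """
<table id="population">
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>India</td><td>1,428,627,663</td></tr>
  <tr><td>China</td><td>1,425,671,352</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="population")

# First row holds the headers; the rest are data rows
headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")[1:]
]

df = pd.DataFrame(rows, columns=headers)
# Strip thousands separators so the column becomes numeric
df["Population"] = df["Population"].str.replace(",", "").astype(int)
```

For a page whose tables are well-formed, `pandas.read_html(url)` can replace most of this by hand-rolled parsing in one call.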
Project Overview:
This project focuses on extracting data from websites using web scraping and converting the unstructured data into a structured format for analysis.

Key Features:
- Web scraping using Python libraries
- Data extraction and cleaning
- Structured data storage (CSV/Excel)
- Handling missing data
- Data analysis

Technologies Used:
- Python
- BeautifulSoup / Requests
- Pandas

🎯 What I Learned:
• Handling real-world data from websites
• Parsing HTML and extracting useful information
• Data cleaning and transformation
• Improving problem-solving skills in automation

#innomatics #EDAWebScrapingProject #Python #DataScience #WebScraping

Link: https://lnkd.in/gDmqZTJf
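The "handling missing data" and "structured data storage" steps listed above might look like this in pandas. The column names and values are made up for the demonstration.

```python
# Handle missing values in a scraped dataset, then persist it to CSV.
# All field names and figures here are invented examples.
import pandas as pd

raw = pd.DataFrame({
    "city": ["Delhi", "Mumbai", None, "Chennai"],
    "price": [120.0, None, 95.0, 110.0],
})

# Drop rows missing the key field, impute numeric gaps with the median
clean = raw.dropna(subset=["city"])
clean = clean.fillna({"price": clean["price"].median()})

# Structured storage (CSV); Excel would be clean.to_excel(...)
clean.to_csv("scraped_clean.csv", index=False)
```
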
🚀 Last month, I built and published my first Python package — Pristinizer

I wanted to solve a simple but real problem in data science:
👉 Cleaning and understanding raw datasets takes way too much time.

So I built Pristinizer, a lightweight Python package that helps streamline data cleaning + EDA in just a few lines of code.

🔍 What Pristinizer does:
• Cleans messy datasets (duplicates, missing values, column formatting)
• Generates structured dataset summaries
• Visualizes missing data (heatmap, matrix, bar chart)

⚙️ Tech Stack: Python • pandas • matplotlib • seaborn

📦 Try it out:

pip install pristinizer

import pristinizer as ps
df = ps.clean(df)
ps.summarize(df)
ps.missing_heatmap(df)

🧠 What I learned while building this:
• Designing a clean and intuitive API
• Structuring a real-world Python package
• Publishing to PyPI
• Writing proper documentation for users

📌 Next, I’m planning to add:
• Outlier detection
• Automated preprocessing pipelines
• Advanced EDA reports

Would love to hear your thoughts or feedback!

#Python #DataScience #MachineLearning #OpenSource #Pandas #EDA #Projects
Most analysts know SQL. Most analysts know Python. Very few know how to combine them efficiently. That’s why many stay average.

Here are a few things I wish I had learned earlier:

In SQL:
→ WHERE cannot filter aggregated results. If you're filtering grouped data, use HAVING.
→ Window functions save messy subqueries. Use RANK(), ROW_NUMBER(), SUM() OVER() for ranking and running totals.
→ LAG() and LEAD() beat self-joins. Comparing current vs. previous period? One line does what multiple joins often can’t.

In Python:
→ Don't load unnecessary data. Filter in SQL before bringing it into pandas.
→ Avoid for loops in pandas. Vectorized operations are significantly faster (and even .apply() usually beats an explicit loop).
→ Stop hardcoding dates. Use datetime so your scripts stay dynamic and reusable.

The real power comes when you combine both:
→ Pull data with SQL
→ Transform it in Python
→ Push results back with to_sql()

That workflow alone will make you more efficient than most analysts around you.

Knowing SQL or Python is useful. Knowing how to use both together is what separates strong analysts from average ones.

#DataAnalytics #SQL #Python #AnalyticsEngineering #CareerGrowth
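The pull → transform → push workflow above can be sketched end to end with an in-memory SQLite database, so it runs standalone. The table and column names are invented for the example.

```python
# Pull with SQL (filtering the grouped result with HAVING), transform in
# pandas, push back with to_sql(). Uses an in-memory SQLite database;
# the "sales" table and its values are made up for illustration.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("north", 250), ("south", 40), ("south", 30)],
)

# 1. Pull: aggregate in SQL; HAVING filters after the GROUP BY
df = pd.read_sql_query(
    """SELECT region, SUM(amount) AS total
       FROM sales
       GROUP BY region
       HAVING SUM(amount) > 100""",
    conn,
)

# 2. Transform: vectorized pandas operation, no loop
df["total_k"] = df["total"] / 1000

# 3. Push the result back into the database
df.to_sql("sales_summary", conn, index=False, if_exists="replace")

back = pd.read_sql_query("SELECT * FROM sales_summary", conn)
```
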
Week 14 (notes): Python Pandas Essentials for Data Analysis ✨

🐍 Python + Pandas = Powerful Data Analysis

Some fundamental Pandas operations that every data analyst should know:

📌 1. View first rows
Use head() to display the first 5 rows of a dataset.
df.head()

📌 2. View last rows
Use tail() to display the last 5 rows.
df.tail()

📌 3. Statistical summary
Get quick insights like count, mean, std, min, max using:
df.describe()

📌 4. Select a single column
df['Name']

📌 5. Select multiple columns
df[['Name', 'Age']]

📌 6. Add a new column
df['Salary'] = df['Age'] * 1000

📌 7. Basic filtering
Filter rows based on a condition:
df[df['Age'] > 25]

💡 Pandas makes data cleaning and analysis fast, simple, and efficient.

#Python #Pandas #DataAnalysis #Data #Aspiring #LinkedInLearning #100DaysOfCode #Analytics #CareerTransition #Techdatacommunity #LearningJourney
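The seven operations above, run together on a tiny example DataFrame (the names and values are invented for the demo):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dina"],
    "Age": [24, 31, 28, 22],
})

first = df.head(2)               # 1. first rows (head() defaults to 5)
last = df.tail(2)                # 2. last rows
summary = df.describe()          # 3. count/mean/std/min/max for numeric columns
names = df["Name"]               # 4. single column -> Series
subset = df[["Name", "Age"]]     # 5. multiple columns -> DataFrame
df["Salary"] = df["Age"] * 1000  # 6. derived column
adults = df[df["Age"] > 25]      # 7. boolean filtering
```
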
I’m excited to share my latest project: a comprehensive Descriptive Statistics Suite built in Python! 🚀

Before jumping into complex Machine Learning models, every great data story starts with a deep dive into the data's "personality." This project automates that process using the industry-standard stack: NumPy, Pandas, and SciPy.

Key highlights of what I’ve built:
🔹 Central Tendency: Automated calculation of Mean, Median, and Mode to find the "heart" of the data.
🔹 Dispersion Analysis: Measuring Variance, Standard Deviation, and IQR to quantify data spread and volatility.
🔹 Distribution Shape: Using Skewness and Kurtosis to identify symmetry and the likelihood of extreme outliers.
🔹 Visualizations: Clean, publication-ready Histograms, Frequency Polygons, and Pie Charts for intuitive storytelling.

This repository is designed to be a "one-click" solution for anyone performing initial Exploratory Data Analysis (EDA).

📂 Check out the full code and documentation on GitHub: https://lnkd.in/gBPsc95s

I’d love to hear your thoughts or any suggestions for future statistical features!

#DataScience #Python #DataAnalytics #Statistics #GitHub #Pandas #NumPy #DataVisualization #MachineLearning #Coding
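A compact version of the measures listed above. The post's project uses SciPy; this sketch sticks to pandas/NumPy equivalents so it runs with fewer dependencies, and the data values are invented.

```python
# Central tendency, dispersion, and distribution shape on a toy sample.
import numpy as np
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

# Central tendency
mean = s.mean()
median = s.median()
mode = s.mode().iloc[0]          # mode() returns a Series (ties possible)

# Dispersion
variance = s.var(ddof=0)         # population variance
std = s.std(ddof=0)
iqr = s.quantile(0.75) - s.quantile(0.25)

# Distribution shape (pandas reports sample skew and excess kurtosis)
skewness = s.skew()
kurt = s.kurtosis()
```
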
𝗗𝗮𝘆 𝟲 𝗼𝗳 𝘀𝗵𝗮𝗿𝗶𝗻𝗴 𝗺𝘆 𝗷𝗼𝘂𝗿𝗻𝗲𝘆 ✨

After working with Python in data analysis, one thing became clear:
𝗬𝗢𝗨 𝗗𝗢𝗡’𝗧 𝗡𝗘𝗘𝗗 𝗧𝗢 𝗞𝗡𝗢𝗪 𝗘𝗩𝗘𝗥𝗬𝗧𝗛𝗜𝗡𝗚. 𝗬𝗢𝗨 𝗡𝗘𝗘𝗗 𝗧𝗢 𝗞𝗡𝗢𝗪 𝗪𝗛𝗔𝗧 𝗔𝗖𝗧𝗨𝗔𝗟𝗟𝗬 𝗚𝗘𝗧𝗦 𝗨𝗦𝗘𝗗.

Here are the Python concepts I rely on regularly:

🔹 𝗣𝗮𝗻𝗱𝗮𝘀 (𝘁𝗵𝗲 𝗯𝗮𝗰𝗸𝗯𝗼𝗻𝗲)
→ Filtering & slicing data
→ groupby() for aggregations
→ Handling missing values

🔹 𝗪𝗿𝗶𝘁𝗶𝗻𝗴 𝗰𝗹𝗲𝗮𝗻𝗲𝗿 𝗰𝗼𝗱𝗲
→ List comprehensions
→ Functions (reusable logic)
→ Lambda functions

🔹 𝗗𝗮𝘁𝗮 𝗖𝗹𝗲𝗮𝗻𝗶𝗻𝗴 (𝗺𝗼𝘀𝘁 𝘁𝗶𝗺𝗲 𝗴𝗼𝗲𝘀 𝗵𝗲𝗿𝗲)
→ fillna()
→ dropna()
→ Fixing messy data

🔹 𝗕𝗮𝘀𝗶𝗰 𝗩𝗶𝘀𝘂𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻
→ Matplotlib & Seaborn
→ Spotting trends & patterns

💡 𝗕𝗶𝗴 𝗿𝗲𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻: 𝗜𝘁’𝘀 𝗻𝗼𝘁 𝗮𝗯𝗼𝘂𝘁 𝗺𝗮𝘀𝘁𝗲𝗿𝗶𝗻𝗴 𝗮𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗣𝘆𝘁𝗵𝗼𝗻. 𝗜𝘁’𝘀 𝗮𝗯𝗼𝘂𝘁 𝘂𝘀𝗶𝗻𝗴 𝘀𝗶𝗺𝗽𝗹𝗲 𝗰𝗼𝗻𝗰𝗲𝗽𝘁𝘀 𝗲𝗳𝗳𝗲𝗰𝘁𝗶𝘃𝗲𝗹𝘆.

That’s where the real impact comes from.

What do you use the most in your workflow? 👇

#Python #DataAnalytics #Pandas #CareerGrowth #DataScience
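A quick tour of the everyday concepts from the list above, on a made-up dataset: filtering, a groupby aggregation, missing-value handling, a lambda, and a list comprehension.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [10.0, None, 7.0, 9.0, None],
})

df["score"] = df["score"].fillna(0)                  # handle missing values
high = df[df["score"] > 5]                           # filtering/slicing
totals = df.groupby("team")["score"].sum()           # groupby() aggregation
df["grade"] = df["score"].apply(                     # lambda for quick logic
    lambda s: "pass" if s >= 7 else "fail"
)
labels = [f"{t}:{v:.0f}" for t, v in totals.items()]  # list comprehension
```
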
From Raw Websites to Structured Data

I recently worked on a project where I extracted real-time data from websites using Python.

What I did:
- Collected data using BeautifulSoup
- Parsed HTML content
- Converted unstructured data into a clean dataset using Pandas

Why it matters: Data collection is the first step in any data analysis process. Without data, there are no insights!

Curious — what kind of data would you scrape?

#DataAnalytics #Python #WebScraping #Learning
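The three steps above might look like this for non-tabular markup: parse with BeautifulSoup, extract the useful fields, and load them into pandas. The HTML is inlined here so the example runs offline; the tag names and prices are invented.

```python
# Parse HTML -> extract fields -> structured DataFrame.
# The markup, class names, and values are illustrative only.
from bs4 import BeautifulSoup
import pandas as pd

html = """
<ul class="products">
  <li><span class="name">Widget</span><span class="price">$4.99</span></li>
  <li><span class="name">Gadget</span><span class="price">$12.50</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
records = [
    {
        "name": li.select_one(".name").get_text(strip=True),
        "price": float(li.select_one(".price").get_text(strip=True).lstrip("$")),
    }
    for li in soup.select("ul.products li")
]

df = pd.DataFrame(records)
```
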
🚀 Project Spotlight: Data Analysis with Python

I recently worked on a data analysis project where I explored data using Python libraries.

🧰 Tools I used:
✔ Pandas
✔ NumPy
✔ Matplotlib
✔ Seaborn

📊 Key Highlights:
✅ Cleaned and processed raw data
✅ Performed statistical analysis
✅ Created meaningful visualizations
✅ Identified patterns and trends

💡 This project helped me understand how data can be transformed into insights.

🔗 More projects coming soon on my GitHub!

#DataScience #Python #DataAnalysis #Projects #Learning
This data tweak saved us hours: leveraging Python libraries like Pandas and NumPy can transform your data analysis process.

In a fast-paced world, professionals often grapple with massive datasets and must find insights swiftly. The right tools make all the difference.

Pandas, with its intuitive data manipulation capabilities, lets you clean datasets effortlessly. Imagine reducing hours of manual work to just a few lines of code. Paired with NumPy’s powerful numerical operations, you'll be equipped to handle both simple and complex analyses with ease.

Visualization is where the magic happens. With these libraries, you can quickly turn raw data into impactful visual stories, making your insights not only understandable but also compelling. Data-driven decision-making becomes a breeze.

Why limit your potential? The synergy of Python, Pandas, and NumPy is a game-changer for anyone looking to elevate their data skills.

Want the full walkthrough in class? Details: https://lnkd.in/gjTSa4BM

#Python #Pandas #DataAnalysis #DataScience #DataVisualization