Day 16-20 Data Science Journey 📊 | Data Collection Techniques (16-20/45)

From day 16 to day 20 of my Data Science journey, I focused on understanding how real-world data is collected and prepared for analysis. These days were dedicated to learning data collection techniques, with a strong emphasis on web scraping using Python.

1- Data Collection Techniques Overview
- Importance of data in the data science pipeline
- Types of data: structured vs. unstructured
- Primary vs. secondary data sources
- Manual vs. automated data collection

2- Introduction to Web Scraping
- What web scraping is and where it is used
- Use cases of web scraping in data science
- Ethical considerations and responsible scraping

3- HTML for Web Scraping
- Basic structure of HTML
- Tags, attributes, classes, and IDs
- Understanding the DOM and inspecting elements

4- Using the requests Module for Data Collection
- Sending HTTP GET requests
- Fetching HTML content from websites
- Understanding response status codes

5- Using Beautiful Soup for Data Collection
- Parsing HTML documents
- Extracting text and elements
- Navigating and cleaning scraped data

Web scraping is a powerful skill when used responsibly.

#DataScience #WebScraping #Python #LearningJourney #Day16to20
Data Science Journey: Web Scraping Techniques with Python
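For reference, here is a minimal sketch of the requests + Beautiful Soup workflow covered in sections 4 and 5 above. The URL and the h2/class selector are placeholders I made up, not details from the post; swap in the real page and check its robots.txt and terms of use before scraping.

```python
# Minimal requests + BeautifulSoup sketch. The URL, tag name, and CSS class
# below are placeholders; adapt them to the site you are actually scraping.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/articles"  # hypothetical page

response = requests.get(url, timeout=10)
print(response.status_code)           # 200 means the request succeeded

soup = BeautifulSoup(response.text, "html.parser")

# Extract text from elements by tag and class (placeholder selector).
for heading in soup.find_all("h2", class_="title"):
    print(heading.get_text(strip=True))
```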
More Relevant Posts
🚀 Starting my hands-on journey in Data Cleaning and Preprocessing

Today, I worked on a small but realistic project where I:
✔ Scraped raw data from a public website
✔ Converted unstructured web data into a structured dataset
✔ Inspected the data for missing values and duplicates
✔ Identified real-world patterns (e.g., repeated authors, tag structures)
✔ Performed safe cleaning and preprocessing to make the data analysis-ready

One important thing I’m learning is that data cleaning is not about deleting data blindly, but about understanding context and preserving meaning.

Tools used: Python, Pandas, BeautifulSoup

I’ll be continuing to work on more real-world style datasets (including scraping, cleaning, and preprocessing) and documenting everything along the way. If you’re also learning data science or data analysis, feel free to connect; I'm always happy to learn and grow together.

#DataCleaning #DataPreprocessing #Python #Pandas #WebScraping #LearningInPublic #DataScienceJourney
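A rough illustration of the inspect-then-clean pattern described above. The column names and values below are invented stand-ins, not the actual scraped dataset from the project.

```python
# Illustrative only: the "author" and "tags" columns are invented, not from
# the original project. The pattern (inspect first, then clean conservatively)
# is the point.
import pandas as pd

df = pd.DataFrame({
    "author": ["A. Khan", "A. Khan", None, "B. Lee"],
    "tags":   ["python,web", "python,web", "data", None],
})

# Inspect before touching anything.
print(df.isna().sum())          # missing values per column
print(df.duplicated().sum())    # exact duplicate rows

# Clean conservatively: drop exact duplicates, keep rows with partial data.
df = df.drop_duplicates()
df["tags"] = df["tags"].fillna("untagged")
print(df)
```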
💡 Pandas Basics: loc vs. iloc – Which one should you use?

If you're just starting with Python for Data Science, one of the first hurdles is mastering how to select data from a Pandas DataFrame. Two of the most essential methods you'll use are .loc[] and .iloc[]. They look similar, but they behave very differently! 🔎

🔹 1. .loc[]: Label-Based Selection
Think of .loc as searching by NAME. You use it when you know the specific labels of your rows and columns.
Syntax: df.loc[row_label, column_label]
Key Feature: It is inclusive of the endpoint.
Example from image: df.loc[1:2, "Name":"Age"] returns rows with labels 1 and 2, including the "Age" column.

🔹 2. .iloc[]: Integer-Based Selection
Think of .iloc as searching by POSITION. It stands for "integer location".
Syntax: df.iloc[row_index, column_index]
Key Feature: It is exclusive of the endpoint (just like standard Python slicing).
Example from image: df.iloc[1:3, 0:2] returns rows at index 1 and 2 (3 is excluded) and the first two columns.

🚀 Pro-Tip for your workflow: Use .loc when you have meaningful labels and want readable code, and .iloc when you are selecting purely by position.

#Python #Pandas #DataScience #DataAnalytics #MachineLearning #CodingTips #TechEducation
Abhishek kumar | Harsh Chalisgaonkar | SkillCircle™
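The two selections described above, reproduced on a tiny DataFrame. The "Name" and "Age" columns follow the post; the row values themselves are made up.

```python
# loc vs. iloc on a small DataFrame with the default integer index.
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Asha", "Ravi", "Meera", "John"],
     "Age":  [21, 25, 30, 28],
     "City": ["Pune", "Delhi", "Mumbai", "Goa"]}
)

# .loc: label-based, endpoint inclusive. Rows labelled 1..2, columns "Name".."Age".
print(df.loc[1:2, "Name":"Age"])

# .iloc: position-based, endpoint exclusive. Rows at positions 1..2, first two columns.
print(df.iloc[1:3, 0:2])
```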
📊 𝗗𝗔𝗬 𝟭𝟲 | 𝗗𝗔𝗧𝗔 𝗦𝗖𝗜𝗘𝗡𝗖𝗘 & 𝗗𝗔𝗧𝗔 𝗔𝗡𝗔𝗟𝗬𝗧𝗜𝗖𝗦 𝗟𝗘𝗔𝗥𝗡𝗜𝗡𝗚 𝗝𝗢𝗨𝗥𝗡𝗘𝗬 🚀

Today’s focus: Functions in Python 🐍

On Day 16, I explored one of the most powerful and important concepts in Python — Functions.

🔹 What is a Function?
A function is a reusable block of code that performs a specific task. Instead of writing the same code again and again, we can simply call a function.

🔹 What I Learned Today:
✅ How to define a function using def
✅ Function parameters and arguments
✅ Return statement
✅ Calling a function
✅ Arbitrary arguments (*args)
✅ Keyword arguments
✅ Arbitrary keyword arguments (**kwargs)
✅ Recursion

🔹 Why Functions Matter in Data Science
In Data Science & Data Analytics, we often:
- Clean data
- Perform calculations
- Apply transformations
- Build reusable logic

Functions help us write clean, modular, and efficient code — which is very important when working with large datasets.

Every small concept I learn is building a strong foundation for my journey in Data Science & Data Analytics. Consistency is the key 🔑 Day by day, step by step — growing and improving. 📈

Stay tuned for Day 17 🤗♥️
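A compact sketch pulling several of the concepts listed above into a few lines; the function names and values here are illustrative, not from the post.

```python
# One signature covering a positional argument, *args, a keyword default, and **kwargs.
def describe(name, *scores, unit="marks", **extra):
    """Return a short summary string for a student's scores."""
    avg = sum(scores) / len(scores) if scores else 0
    return f"{name}: avg {avg:.1f} {unit}, extra={extra}"

print(describe("Asha", 80, 90, 95, unit="points", subject="math"))

# Recursion: a function calling itself with a smaller input.
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

print(factorial(5))  # 120
```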
🚀 Day 5 | Python Collection Data Types 🧩

Collections are where Python really starts to feel powerful — they help us structure, organize, and manipulate data efficiently. Every Data Scientist must be comfortable here.

In today’s carousel / notebook, I covered:

✔ String (str)
– Indexing, slicing (all 5 syntaxes)
– Forward & backward slicing
– Palindrome checks
– Complete overview of built-in string methods

✔ List (list)
– Creation (empty & non-empty)
– Indexing & slicing
– Mutability and in-place modification
– List methods (append, extend, pop, sort, etc.)
– Shallow copy vs deep (reference) copy

✔ Tuple (tuple)
– Immutable collections
– Indexing & slicing
– Tuple methods (count, index)
– Sorting tuples using sorted()

✔ Set (set)
– Unique elements
– No indexing or slicing
– Set operations: union, intersection, difference
– Practical set methods (add, remove, discard, etc.)

✔ Dictionary (dict)
– Key–value data structure
– Insertion order
– Dictionary methods (get, update, pop, items, etc.)

This notebook helped me clearly understand when to use which collection, how Python handles mutability, and how built-in methods simplify real-world data manipulation.

🙏 Grateful to my mentor, Nallagoni Omkar Sir, for guiding me through these concepts with clarity and strong fundamentals.

📌 Part of my learning-in-public journey — building Python step by step, the right way.

👉 Next up: Control Flow (if–else, loops) & problem-solving 🚀

#Python #CorePython #CollectionDataTypes #LearningInPublic #StudentOfDataScience #ProgrammingFundamentals #DataScienceJourney #NeverStopLearning
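One small, illustrative snippet per collection type covered in the notebook; the values are made up, and the notebook itself goes far deeper.

```python
# String: slicing and a palindrome check.
text = "level"
print(text[::-1] == text)          # True

# List: mutable, modified in place.
nums = [3, 1, 2]
nums.append(4)
nums.sort()
print(nums)                        # [1, 2, 3, 4]

# Tuple: immutable; sorted() returns a new list instead of changing it.
point = (10, 20)
print(sorted(point, reverse=True)) # [20, 10]

# Set: unique elements and set algebra.
a, b = {1, 2, 3}, {2, 3, 4}
print(a & b, a | b)                # intersection and union

# Dictionary: key-value lookups; .get avoids a KeyError for missing keys.
student = {"name": "Asha", "age": 21}
print(student.get("grade", "N/A"))
```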
𝗙𝗮𝗸𝗲 𝗡𝗲𝘄𝘀 𝗗𝗲𝘁𝗲𝗰𝘁𝗶𝗼𝗻 𝗦𝘆𝘀𝘁𝗲𝗺 | 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗣𝗿𝗼𝗷𝗲𝗰𝘁

I developed a Fake News Detection System using Python and Machine Learning to identify whether a given news article is Real or Fake 📰✅❌

In this project, users can enter news text into a GUI-based application 🖥️. The system processes the text, converts it into numerical features using a trained vectorizer, and then applies a Machine Learning model to predict the result 📊. Along with the prediction, it also displays a confidence percentage, helping users understand the reliability of the output 🎯.

To make the system more practical, I integrated a MySQL database 🗄️. Each prediction (news text, result, confidence, and timestamp) is securely stored and can be viewed later in the prediction history window 📜.

🔍 What I learned from this project:
👉 Text preprocessing and vectorization ✍️
👉 ML model implementation for text classification 🤖
👉 Desktop GUI development using Tkinter 🖥️
👉 Python–MySQL database integration 🔗

🛠️ Technologies Used: Python 🐍 | Machine Learning | Pandas | Tkinter | MySQL

🔗 GitHub Repository: 👉 https://lnkd.in/gcY97evF

✨ Open to feedback and suggestions!

#MachineLearning #PythonProject #FakeNewsDetection #AI #StudentProject #DataScience
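The repository holds the full pipeline; purely as an illustration of the vectorize-then-predict step described above, here is a minimal sketch using scikit-learn's TfidfVectorizer and LogisticRegression. Both components and the toy training data are assumptions for the sketch; the actual project may use a different vectorizer, model, and dataset.

```python
# Assumed components for illustration (TF-IDF + logistic regression); the real
# project may differ. The tiny training set below is a toy stand-in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts  = ["Aliens land in city center", "Government announces new budget",
          "Miracle cure discovered overnight", "Local team wins league match"]
labels = [1, 0, 1, 0]   # 1 = fake, 0 = real (toy labels)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression().fit(X, labels)

# Predict a new article and report a confidence percentage, as the GUI does.
article = ["Shocking miracle cure found"]
proba = model.predict_proba(vectorizer.transform(article))[0]
print("Fake" if proba[1] > 0.5 else "Real", f"(confidence {max(proba):.0%})")
```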
Series Title: 🚀 Starting My Data Analytics Journey from Scratch!
Post #13: 🔢 Introduction to NumPy for Data Analytics

As I continue my learning journey in Data Analytics, today I started learning NumPy, which is one of the most important Python libraries for numerical and array-based computations. I learned that NumPy is widely used because it allows faster data processing and efficient handling of large datasets, making it a core library for data analysis.

~ What is NumPy?
NumPy (Numerical Python) is a powerful library that helps perform mathematical, numerical, and statistical operations on data. It also acts as the foundation for advanced libraries like Pandas, Matplotlib, and Scikit-learn.

~ NumPy Arrays
I learned about NumPy arrays and how they are different from Python lists:
- They are faster in performance
- They use less memory
- They are designed specifically for numerical operations

~ Core Concepts I Learned Today
• Creating NumPy arrays using different methods
• Understanding array attributes like shape, size, and data type
• Indexing to access specific elements
• Slicing to extract subsets of data
• Reshaping arrays to change dimensions
• Iterating through arrays efficiently
These concepts help in organizing and managing data in a structured way.

~ Working with Data Using NumPy
I also explored how NumPy simplifies data manipulation:
• Joining multiple arrays
• Splitting arrays into smaller parts
• Sorting data for better analysis
• Searching values inside arrays
• Filtering data using conditions
• Performing arithmetic operations on arrays
• Applying statistical operations like mean, min, max, and sum

~ Key Takeaway
Learning NumPy helped me understand how numerical data is handled efficiently in Data Analytics. It is an important step before working with real-world datasets and advanced analysis.

Step by step, I’m building a strong foundation in Python for Data Analytics 🚀

If you’re learning Data Analytics, what was your first experience with NumPy?

#DataAnalytics #NumPy #Python #LearningJourney #Upskilling #DataAnalysis #Post13 #LinkedInSeries
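A quick tour of the attributes and operations mentioned above; the array values are arbitrary examples.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape, arr.size, arr.dtype)    # attributes: (2, 3), 6, integer dtype

print(arr[0, 1])                         # indexing -> 2
print(arr[:, 1:])                        # slicing: all rows, columns 1 onward
print(arr.reshape(3, 2))                 # reshaping to 3 rows x 2 columns

print(arr.mean(), arr.min(), arr.max(), arr.sum())  # statistical operations
print(np.sort(arr, axis=None))           # sorting (flattened)
print(arr[arr > 3])                      # filtering with a condition -> [4 5 6]
```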
Lately, I’ve been deep in the world of Python data analytics libraries — exploring tools like Pandas, NumPy, and Matplotlib to strengthen my analytical toolkit.

I’ll be honest: it feels different from when I was learning SQL. With SQL, I was building projects week in and week out — constantly querying, cleaning, transforming datasets. It felt very tangible and project-driven.

Now, while diving into Python libraries, the learning feels more foundational. Less “big project every week” and more understanding how things truly work under the hood. And that’s okay. Not every phase of growth needs to look the same. Sometimes you build. Sometimes you sharpen. Sometimes you slow down to go deeper.

This phase is about strengthening fundamentals — mastering data manipulation, understanding performance, writing cleaner code, and thinking more analytically. Projects will come. Progress is still happening. The journey isn’t about speed — it’s about depth and consistency.

#DataAnalytics #Python #LearningJourney #ContinuousImprovement #AspiringDataAnalyst #Data #DataAnalyst
Day 3 of my Python learning journey in Data Analytics & Data Science 🐍

Today was all about understanding how data actually works in Python using CRUD operations (Create, Read, Update, Delete) and exploring some important data structures.

I worked with:
👉 Lists – updating, deleting, and modifying values
👉 Tuples – learning about immutability and how mutable elements inside tuples can still change
👉 Sets – storing unique values and performing union & removal operations
👉 Dictionaries – managing data using key-value pairs
👉 Nested data – accessing detailed student information inside dictionaries

It was great to see how Python handles different types of data in real-world scenarios rather than just theory.

Along with Python, I’m also practicing SQL daily, and for this month I’m focusing deeply on Data Analytics before moving into Data Science.

Slow progress is still progress 💪 Excited to keep going — Day 4 coming up! 🚀

#PythonLearning #Day3 #DataAnalytics #DataScience #SQL #LearningEveryday #ProgrammingJourney #Consistency #10kcoders
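A few lines showing the CRUD-style operations described above; the sample data is made up for illustration.

```python
# Lists: update and delete in place.
marks = [60, 70, 80]
marks[1] = 75            # update
marks.remove(80)         # delete
print(marks)

# Tuples: the tuple itself is immutable, but a mutable element inside can still change.
point = (1, [2, 3])
point[1].append(4)
print(point)             # (1, [2, 3, 4])

# Sets: unique values, union, and removal.
skills = {"sql", "python"}
skills |= {"pandas"}     # union / add
skills.discard("sql")    # removal without a KeyError
print(skills)

# Dictionaries and nested data: key-value access several levels deep.
students = {"s1": {"name": "Asha", "marks": marks}}
print(students["s1"]["name"])
```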
🚀 After many years building production AI systems, I got tired of writing Pandas code that worked but nobody could read.

So I built 𝐏𝐢𝐩𝐞𝐅𝐫𝐚𝐦𝐞, an open-source Python library where data manipulation reads like plain English, using simple verbs like 𝑓𝑖𝑙𝑡𝑒𝑟(), 𝑑𝑒𝑓𝑖𝑛𝑒(), and 𝑎𝑟𝑟𝑎𝑛𝑔𝑒() chained together with a ">>" operator that flows exactly the way you think.

I just published a full article covering the story behind it, the design philosophy, and a beginner-friendly tutorial with real examples.

Link: https://lnkd.in/ey8bqWbN and if it resonates, a ⭐ on GitHub means the world 🙏
↳ https://lnkd.in/eBRyHBna

#Python #DataScience #OpenSource #Pandas #DataEngineering #PipeFrame #MachineLearning
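Not PipeFrame's actual code or API: just a self-contained toy sketch of how a ">>"-style pipeline can be wired up in plain Python, to illustrate the idea behind verb chaining. See the linked article and repository for the library's real syntax.

```python
# Illustrative only: this is NOT PipeFrame's implementation. It shows one way
# a ">>" pipeline over DataFrames can be built by defining __rshift__.
import pandas as pd

class Pipeline:
    """Wraps a DataFrame; each >> applies one function and returns a new Pipeline."""
    def __init__(self, df):
        self.df = df
    def __rshift__(self, fn):
        return Pipeline(fn(self.df))

# Hypothetical "verbs": plain functions that return DataFrame-to-DataFrame functions.
def filter_rows(cond):
    return lambda df: df[cond(df)]

def define(**cols):
    return lambda df: df.assign(**cols)

data = pd.DataFrame({"x": [1, 2, 3, 4]})
result = (Pipeline(data)
          >> filter_rows(lambda d: d["x"] > 1)
          >> define(double=lambda d: d["x"] * 2)).df
print(result)
```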