📅 Day 23/30 — Databases & SQL Basics for Data Science

Kicking off my database adventure on this 30-day data science journey! Today, I dove into the world of structured data storage and why it's a game-changer for handling real-world datasets.

What I worked on today:
🗄️ Explored databases and their crucial role in data science: efficient storage, querying, and scalability for massive datasets
💻 Got hands-on with SQL: installed MySQL on my system
🛠️ Created my first database and tables to organize data like a pro (a sketch of these steps follows below)

This foundational step feels powerful: now I can store and retrieve data systematically before analyzing it!

➡️ Next step: SQL Queries & Data Manipulation

#LearningInPublic #Python #DataScience #Anaconda #JupyterNotebook #SQL #MySQL #Databases #30DaysOfLearning #ProgrammingJourney
Database Basics for Data Science with MySQL
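Below is a minimal sketch of those first steps driven from Python rather than the MySQL shell, assuming a local MySQL server and the mysql-connector-python package; the database name, table, and credentials are hypothetical placeholders, not from the original post.

```python
import mysql.connector  # pip install mysql-connector-python

# Connect to a local MySQL server (credentials are placeholders)
conn = mysql.connector.connect(host="localhost", user="root", password="your_password")
cur = conn.cursor()

# Create a database, then a first table with typed columns and a primary key
cur.execute("CREATE DATABASE IF NOT EXISTS learning_db")
cur.execute("USE learning_db")
cur.execute("""
    CREATE TABLE IF NOT EXISTS measurements (
        id INT AUTO_INCREMENT PRIMARY KEY,
        sensor VARCHAR(50) NOT NULL,
        reading DECIMAL(8, 2),
        recorded_at DATETIME
    )
""")

conn.commit()
conn.close()
```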
DAY 137 – DATA SCIENCE JOURNEY: SQL FUNDAMENTALS & DATABASE OPERATIONS

Today, I focused on understanding how databases work behind the scenes by practicing SQL on a Student Database.

WHAT I LEARNED
▪ Creating Tables: designed structured tables using CREATE TABLE with proper data types and primary keys.
▪ Inserting Data: used INSERT INTO to add records and understand row-based data storage.
▪ Querying Data: practiced SELECT queries with conditions (WHERE, >, <, =) to extract meaningful insights.
▪ Updating Records: applied UPDATE to modify specific data based on conditions.
▪ Deleting Data: learned to remove records safely using DELETE with filters.
▪ Altering Tables: worked with ALTER TABLE to add, modify, and remove columns.

KEY TAKEAWAY
SQL is not just about writing queries — it is about thinking logically and managing data efficiently. Each command (CREATE, INSERT, SELECT, UPDATE, DELETE, ALTER) is a fundamental building block for data science. (A runnable sketch covering all six follows below.)

NEXT STEP
Joins and Advanced Queries

#Day137 #DataScienceJourney #SQL #LearningInPublic #Database #Python #GrowthMindset #FutureDataScientist
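To make the six commands concrete, here is a self-contained sketch using Python's built-in sqlite3 module (swapped in for portability; the original post does not name an engine, and the student names and marks are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE: a structured table with a primary key and typed columns
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# INSERT: row-based storage
cur.executemany("INSERT INTO students (name, marks) VALUES (?, ?)",
                [("Asha", 82), ("Ravi", 67), ("Meena", 91)])

# SELECT with a WHERE condition
print(cur.execute("SELECT name FROM students WHERE marks > 70").fetchall())

# UPDATE specific rows based on a condition
cur.execute("UPDATE students SET marks = marks + 5 WHERE name = 'Ravi'")

# DELETE safely, always with a filter
cur.execute("DELETE FROM students WHERE marks < 75")

# ALTER: add a column after the fact
cur.execute("ALTER TABLE students ADD COLUMN grade TEXT")

conn.close()
```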
Day 67 of my Data Engineering journey 🚀

Today I worked with Spark SQL, combining the simplicity of SQL with the power of distributed processing.

📘 What I learned today (Spark SQL with Apache Spark):
• Creating temporary views from DataFrames
• Running SQL queries on large datasets
• Using spark.sql() for querying
• Performing joins and aggregations using SQL
• Comparing the DataFrame API vs the SQL approach
• Writing cleaner and more readable queries
• Leveraging SQL knowledge in Big Data systems
• Choosing between SQL and DataFrame transformations

Spark SQL makes it easier to work with data using familiar SQL syntax, even at scale. Instead of writing complex code, you can express logic in simple SQL queries. Best part? It runs on distributed systems. SQL + Spark = a powerful combination. (A short sketch of the workflow follows below.)

Why I'm learning in public:
• To stay consistent
• To build accountability
• To improve daily

Day 67 done ✅ Next up: working with different file formats (Parquet, JSON, CSV) in Spark 💪

#DataEngineering #BigData #ApacheSpark #SQL #Python #LearningInPublic #CareerGrowth #Consistency
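A minimal sketch of that workflow, assuming a local PySpark installation; the orders/customers data is invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, "A", 100.0), (2, "B", 250.0), (3, "A", 75.0)],
    ["order_id", "customer_id", "amount"])
customers = spark.createDataFrame(
    [("A", "Asha"), ("B", "Ravi")],
    ["customer_id", "name"])

# Register DataFrames as temporary views so plain SQL can query them
orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")

# A join plus an aggregation, expressed in familiar SQL via spark.sql()
spark.sql("""
    SELECT c.name, SUM(o.amount) AS total_spent
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.name
""").show()
```

The same logic is expressible with the DataFrame API (groupBy/agg); the SQL form is often easier to read for anyone coming from a database background.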
Master SQL step-by-step with this complete learning roadmap 📊💻

From basic queries and filtering to joins, subqueries, performance tuning, and advanced database concepts, this guide helps you build strong database skills for Data Analytics, Data Science, and Backend Development.

Follow the path → Practice queries → Build real projects → Become SQL-ready for interviews 🚀

#SQL #SQLRoadmap #LearnSQL #Database #DataAnalytics #DataScience #SQLDeveloper #Programming #CodingJourney #TechSkills #BackendDevelopment #DataEngineer #CodingCommunity #TechLearning #100DaysOfCode
Skillcure Academy Radhika Yadav Sanjana Singh Akhilendra Chouhan
💬 SQL Challenge of the Day

📝❓ Question
Using the "Recursive CTEs" topic, write a SQL query to generate the Fibonacci sequence up to the 10th number. The Fibonacci sequence starts with 0 and 1, and each subsequent number is the sum of the two preceding numbers.

💡 Answer
```sql
WITH RECURSIVE FibonacciCTE AS (
    -- Base case: position 0 holds fib = 0, and the next value is 1
    SELECT 0 AS n, 0 AS fib, 1 AS next_fib
    UNION ALL
    -- Each step shifts the pair forward: (fib, next) -> (next, fib + next)
    SELECT n + 1, next_fib, fib + next_fib
    FROM FibonacciCTE
    WHERE n < 9
)
SELECT fib AS Fibonacci_10th_Number
FROM FibonacciCTE
WHERE n = 9;
```

✨ Explanation
This query uses a recursive common table expression (CTE). A recursive CTE cannot look up arbitrary earlier rows of itself, so instead of subqueries each row carries a pair of columns: the current Fibonacci number (fib) and the one that follows it (next_fib). Every recursive step produces the next pair with a single addition, and recursion stops once n reaches 9. The final SELECT returns fib at n = 9, which is the 10th number when counting the initial 0 as the 1st.

🛠️ Example (for ease of understanding)
For the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34

The query will output:
```
Fibonacci_10th_Number
34
```

#PowerBIChallenge #PowerInterview #LearnPowerBi #LearnSQL #TechJobs #DataAnalytics #DataScience #BigData #DataAnalyst #MachineLearning #Python #SQL #Tableau #DataVisualization #DataEngineering #ArtificialIntelligence #CloudComputing #BusinessIntelligence #Data
#StudyClub
Starting a Large-Scale Database

Kicking off this new phase of my SQL journey! In this step, I designed and built a PostgreSQL database architecture that handles 71+ million records, moving from theory into a more real-world, large-scale scenario.

At this stage, the focus is still on the foundation:
• Designing tables based on a structured Snowflake Schema
• Implementing relationships across 24 tables
• Loading massive datasets efficiently (a sketch of one bulk-load approach follows below)
• Ensuring the data is clean and ready for further optimization

This is just the beginning: no indexing or performance tuning yet. Right now, it's all about making sure the data is properly structured and loaded at scale.

Next step: I'll start working on indexing strategies and query performance optimization on top of this dataset.

Mentor: Hilmi
Team: Chelsea Ayu Adhigiadany Monika Hermiani Yolanda Simamora Reihan Nanda R. nadhira M. Dyah Ayu Goldy Aprida Sapitri Br Saragih Medina Uli Alba Somala Rintaldi Ghazian Hindami Siti Komala Hafsaninda MR

#SQLDay #postgreSQL #EntityRelationshipDiagram #ManifoldDatafolk #DatabaseDesignPrinciple #StudyClub #Python #ETLProcess
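For the "loading massive datasets efficiently" step, one common approach is PostgreSQL's COPY rather than row-by-row INSERTs. A hedged sketch via psycopg2; the connection string, table name, and CSV path are hypothetical placeholders:

```python
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("dbname=warehouse user=postgres password=your_password")

# COPY streams the whole file in one command, which is typically orders of
# magnitude faster than individual INSERT statements for millions of rows.
with conn, conn.cursor() as cur, open("fact_sales.csv") as f:
    cur.copy_expert("COPY fact_sales FROM STDIN WITH (FORMAT csv, HEADER true)", f)

conn.close()
```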
🚀 Data Engineering Tip: Data Partition Pruning (Easy Performance Win)

Want faster queries without changing much code? 👉 Learn partition pruning 👇

💡 What is it?
Only the required partitions are scanned instead of the full dataset.

📊 Example: table partitioned by date, query for 2026-04-01.
❌ Without pruning → scans the full table
✅ With pruning → scans only that date's partition

🎯 Why it matters:
✔️ Faster queries
✔️ Lower compute cost
✔️ Better performance in Spark, Hive, Snowflake

🛠️ Where it's used: Spark | PySpark | Hive | Delta Lake | BigQuery
🚀 Tech Stack: Python | SQL | Spark | PySpark | Delta Lake | Kafka | Airflow

💡 Pro tip: always filter on partition columns in your queries (see the sketch below).

👉 Are you using partition pruning in your queries? Comment "YES" or "LEARNING" 👇

#DataEngineering #BigData #PerformanceTuning #Spark #SQL #DeltaLake #ETL #DataPipelines #TechLearning
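A small PySpark sketch of the tip, assuming a Parquet dataset partitioned by an event_date column (the path and schema are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pruning-demo").getOrCreate()

# Dataset laid out as /data/events/event_date=YYYY-MM-DD/part-*.parquet
events = spark.read.parquet("/data/events")

# Filtering on the partition column lets Spark skip every other date
# directory entirely instead of scanning the full table.
one_day = events.filter(events.event_date == "2026-04-01")

# The physical plan's PartitionFilters entry confirms pruning kicked in
one_day.explain()
```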
Most SQL developers know the basics. The top 1% know how to use SQL to answer questions a business actually cares about. 🎯

𝐇𝐞𝐫𝐞'𝐬 𝐲𝐨𝐮𝐫 𝐀𝐝𝐯𝐚𝐧𝐜𝐞𝐝 𝐒𝐐𝐋 𝐂𝐡𝐞𝐚𝐭𝐬𝐡𝐞𝐞𝐭 — 𝐭𝐡𝐞 𝐜𝐨𝐧𝐜𝐞𝐩𝐭𝐬 𝐭𝐡𝐚𝐭 𝐬𝐞𝐩𝐚𝐫𝐚𝐭𝐞 𝐠𝐨𝐨𝐝 𝐚𝐧𝐚𝐥𝐲𝐬𝐭𝐬 𝐟𝐫𝐨𝐦 𝐠𝐫𝐞𝐚𝐭 𝐨𝐧𝐞𝐬:

✅ Recursive CTEs — traverse hierarchies, org charts & category trees
✅ NTILE & PERCENT_RANK — rank and bucket your data like a pro
✅ ROWS BETWEEN — build sliding averages for any time window (see the sketch below)
✅ FIRST_VALUE — compare every row against the group's best
✅ Gaps & Islands — detect streaks and missing sequences in data
✅ Conditional Aggregation — pivot data without a PIVOT function

Save this. You WILL need it. 💾

♻️ Repost to help someone in your network level up their SQL.
📌 Follow Navya sri Kurapati🧑💻 for daily SQL, Python & Data content
🔗 Book a 1:1 mentorship session → https://lnkd.in/gfqXGEnq

#SQL #AdvancedSQL #SQLTips #DataAnalytics #DataScience #DataEngineering #WindowFunctions #CTE #LearnSQL #DataAnalyst #TechCareer #SQLServer #Analytics #CareerGrowth #NavyaSriKurapati
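As one concrete instance from the list, here is the ROWS BETWEEN sliding average, run on Python's built-in sqlite3 (window functions need SQLite 3.25+); the sales figures are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE daily_sales (day INTEGER, amount REAL)")
cur.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                [(1, 100), (2, 120), (3, 90), (4, 150), (5, 130)])

# 3-day sliding average: the current row plus the two rows before it
for row in cur.execute("""
    SELECT day, amount,
           AVG(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS avg_3d
    FROM daily_sales
"""):
    print(row)

conn.close()
```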
Mastering SQL from Beginner to Advanced in one structured roadmap!

Covered key concepts like Joins, Subqueries, CTEs, Window Functions, Indexing, Query Optimization, Transactions, ACID, Stored Procedures, Triggers, and Real-Time SQL Scenarios. Perfect for Data Analytics, SQL interviews, DBMS revision, and placement prep 💡 (A small transaction sketch follows below.)

Consistency + daily learning = growth 📈

Follow Rishabh Singh for more daily learning content on SQL, Excel, Python & Data Analytics.

#SQL #DataAnalytics #DBMS #LearningJourney #SQLInterviewQuestions #QueryOptimization #WindowFunctions #CTE #DataScience #CareerGrowth #Students #TechLearning
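For the Transactions/ACID item from the roadmap, a tiny illustration using Python's built-in sqlite3; the accounts table and amounts are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

try:
    # One atomic transaction: both updates commit together, or neither does
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # on failure the transfer rolls back and both balances are untouched

print(conn.execute("SELECT id, balance FROM accounts").fetchall())
conn.close()
```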
Swipe through the slides first 👉 then read below 👇

🚀 Day 22 of 30 — Learning PySpark from Scratch

Your data doesn't live in CSV files at work. It lives in databases. Here's how PySpark connects to them. 🔌

Here's what I learned on Day 22 👇

⚡ JDBC — the universal database connector
PySpark uses JDBC to connect to any SQL database. PostgreSQL, MySQL, SQLite — same syntax for all of them.

```python
jdbc_url = "jdbc:postgresql://localhost:5432/mydb"
props = {
    "user": "your_user",
    "password": "your_pass",
    "driver": "org.postgresql.Driver"
}
df = spark.read.jdbc(url=jdbc_url, table="employees", properties=props)
df.show(5)
```

⚠️ The most important thing I learned
Never read a full table and filter in Spark. Push the filter to the database instead.

```python
# BAD — pulls ALL rows then filters
df.filter(df.salary > 70000)

# GOOD — filters at the DB level
query = "(SELECT * FROM employees WHERE salary > 70000) AS emp"
df = spark.read.jdbc(url=jdbc_url, table=query, properties=props)
```

💻 Parallel reads for big tables

```python
df = spark.read.jdbc(
    url=jdbc_url,
    table="employees",
    column="id",        # partition column
    lowerBound=1,
    upperBound=100000,
    numPartitions=10,   # 10 parallel reads!
    properties=props
)
```

✅ 3 things I didn't know before today
→ PySpark needs the JDBC driver JAR for your specific database
→ Pushing filters as a SQL subquery = massive performance gain
→ numPartitions splits the table read across multiple workers

💡 My Day 22 takeaway
PySpark isn't just for files. It's a full data platform that connects to your entire data ecosystem.

❓ What database does your team use most at work? Drop it in the comments 👇

Follow me for Day 23 tomorrow → Delta Lake explained simply 🔔

#PySpark #DataEngineering #BigData #Python #LearnInPublic #30DaysOfPySpark
Sessions 1–5 of the Data Analyst Bootcamp with SQL & Python in Google Platform — Done!

In the first five sessions of the Data Analyst bootcamp organized by DQLab, I learned SQL fundamentals using Google BigQuery. The topics covered include database basics, core SQL concepts, and the differences between DDL & DML. I also practiced building basic queries (SELECT) and working with clauses such as WHERE, ORDER BY, and GROUP BY, along with aggregation functions and HAVING. Additionally, I explored date functions, number formatting, and advanced techniques like subqueries, CTEs, and JOINs. (A sketch of one of these patterns on BigQuery follows below.)

Excited to continue with the next sessions and keep growing in data analytics 🚀

Huge thanks to Kak Shella Theresya Pandiangan for the guidance and insightful explanations throughout this journey 🙏

#DataAnalyst #SQL #GoogleBigQuery #DQLab #LearningJourney
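A sketch of one practiced pattern (GROUP BY with HAVING) run on BigQuery through the official Python client; the project, dataset, and table names are hypothetical placeholders, and default GCP credentials are assumed:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # picks up default credentials and project

query = """
    SELECT customer_id,
           COUNT(*)    AS orders,
           SUM(amount) AS total
    FROM `my_project.sales.orders`
    GROUP BY customer_id
    HAVING SUM(amount) > 1000
    ORDER BY total DESC
"""

# Run the query and iterate over the result rows
for row in client.query(query).result():
    print(row.customer_id, row.orders, row.total)
```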