Extracting Data from SPAs: 5-Step Escalation Ladder

2mo

Wrote a new blog post: "Extracting Data from SPAs: When BeautifulSoup Isn't Enough"

You view-source on a page and see nothing but <div id="root"></div>. The data is there, it's just rendered by JavaScript. But here's what most people miss: you almost never need a headless browser.

The post walks through a 5-step escalation ladder:

1. Find the hidden API (check Network tab, works 80% of the time)
2. Intercept responses at the network level with Playwright
3. Block unnecessary resources to speed up browser scraping 2-5x
4. Render JS server-side, then parse with BeautifulSoup
5. Reverse engineer the JavaScript when nothing else scales

Plus a few tricks most tutorials skip: __NEXT_DATA__ extraction for Next.js apps, GraphQL endpoint discovery, and WebSocket interception.

The goal is always the simplest approach that works. Start with HTTP, escalate to browser only when necessary.

Full post: https://lnkd.in/d5d_FJ_9

#WebScraping #Python #ReverseEngineering #SoftwareEngineering #DataExtraction
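As a rough illustration of the __NEXT_DATA__ trick mentioned above, here is a minimal sketch in JavaScript (the linked post itself works in Python). It assumes the page is a Next.js app that embeds its page props as JSON in a script tag with id "__NEXT_DATA__"; the function name extractNextData, the sample HTML, and the props shape are all made up for this example.

```javascript
// Sketch: pull the JSON payload out of a Next.js page without running a browser.
// Assumption: the HTML contains <script id="__NEXT_DATA__" ...>{...}</script>.
function extractNextData(html) {
  const match = html.match(
    /<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/
  );
  if (!match) return null; // not a Next.js page, or the tag is absent
  return JSON.parse(match[1]);
}

// Hypothetical page source standing in for an HTTP response body.
const sampleHtml = `<html><body><div id="__next"></div>
<script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"items":[{"id":1,"name":"Widget"}]}}}</script>
</body></html>`;

const nextData = extractNextData(sampleHtml);
console.log(nextData.props.pageProps.items);
```

A regular expression is enough for a sketch; for real pages an HTML parser is the more robust choice, since attribute order and quoting can vary.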


More Relevant Posts

Deepak Sawant
2mo
🔥 𝐓𝐡𝐞 𝐒𝐞𝐜𝐫𝐞𝐭 𝐋𝐢𝐟𝐞 𝐨𝐟 𝐉𝐚𝐯𝐚𝐒𝐜𝐫𝐢𝐩𝐭: 𝐖𝐡𝐚𝐭 .𝐜𝐥𝐨𝐧𝐞() 𝐑𝐞𝐚𝐥𝐥𝐲 𝐌𝐞𝐚𝐧𝐬 🤯

Most devs know how to copy data in JS. But few realize how each method behaves under the hood. Understanding this can save performance, prevent bugs, and improve clarity. There's more to cloning than just "duplicate this object."

Here's what you should know:

• 📌 Reference vs Value: Not everything actually copies
• 🧠 Shallow clone: Copies the top level, not nested objects
• 🪄 Deep clone: Copies the whole structure
• ✨ Spread operator (...): Short but shallow
• 🧱 Object.assign(): Also shallow
• 🔁 JSON.parse(JSON.stringify()): Deepish, but loses functions
• 🌪️ structuredClone(): True deep clone that handles Dates, Maps, Sets, and circular references, but throws on functions
• 🧩 Lodash/utility clone: Library tools that avoid common traps

➡️ A shallow clone still shares its nested objects with the original, which leads to side effects.
➡️ The JSON round-trip loses types: Dates become strings, and undefined values and functions are dropped.
➡️ structuredClone() is the safest built-in way to get a true deep copy.

Knowing how to clone right improves code clarity and eliminates side effects that hide like ghosts.

👇 What's your go-to way to clone complex objects in JS?

#JavaScript #WebDevelopment #CodingWisdom #DeveloperLife #Frontend #SoftwareEngineering #Programming #CleanCode #Performance #JS2026 #TechTips #DevCommunity
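The difference between these cloning methods is easier to see side by side. The following is a small sketch, not taken from the post; the object shape is invented for illustration, and structuredClone() is assumed to be available (modern browsers, Node.js 17 or later).

```javascript
// Hypothetical object with a nested object and a Date.
const original = {
  name: "report",
  created: new Date(0),
  meta: { views: 1 },
};

// Shallow copy: top-level properties are copied, nested objects are shared.
const shallow = { ...original };
shallow.meta.views = 2; // also changes original.meta.views

// JSON round-trip: deep, but the Date is turned into a string.
const viaJson = JSON.parse(JSON.stringify(original));

// structuredClone: deep, and the Date stays a Date.
const deep = structuredClone(original);
deep.meta.views = 99; // original is unaffected

console.log(original.meta.views);          // 2 (mutated through the shallow copy)
console.log(typeof viaJson.created);       // "string"
console.log(deep.created instanceof Date); // true
```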
Lalit Gujar
1mo
📦 Variables & Data Types

Link: https://lnkd.in/gN7c82-T

Most beginners jump straight into writing code without understanding how programs actually store information. So I started from the very beginning.

In this chapter I cover:

→ What variables are (with a simple labelled-box analogy)
→ How to declare with var, let, and const
→ 5 primitive data types: string, number, boolean, null, undefined (JavaScript also has symbol and bigint)
→ The key difference between var, let, and const
→ What scope means and why it matters

Plus a 4-part assignment at the end to make everything stick.

The rule I wish someone told me earlier:

✅ Default to const
✅ Switch to let only when the value needs to change
❌ Avoid var entirely in modern code

If you're learning JavaScript or teaching someone who is, this is where the series starts.

#JavaScript #WebDevelopment #LearnToCode #chaicode #Frontend #JSFundamentals #hiteshchoudhary #piyushgargh
Shailesh Parmar
1mo
𝗠𝗲𝗺𝗼𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝗶𝘀 𝗻𝗼𝘁 𝗮 "𝗳𝗿𝗲𝗲" 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝘄𝗶𝗻. ⚡

It's tempting to wrap every calculation in useMemo or every function in useCallback. But in a large-scale React application, this can backfire.

𝗧𝗵𝗲 𝗖𝗼𝘀𝘁 𝗼𝗳 "𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻":

Every time you use these hooks, you aren't just saving a calculation. You are:

1. 𝗜𝗻𝗰𝗿𝗲𝗮𝘀𝗶𝗻𝗴 𝗺𝗲𝗺𝗼𝗿𝘆 𝘂𝘀𝗮𝗴𝗲: React must store the previous value and the dependency array in memory.
2. 𝗔𝗱𝗱𝗶𝗻𝗴 𝗲𝘅𝗲𝗰𝘂𝘁𝗶𝗼𝗻 𝗼𝘃𝗲𝗿𝗵𝗲𝗮𝗱: On every render, React must run a shallow comparison on every dependency.

If you are memoizing a simple .filter() on a 50-item list, the "optimization" overhead is often more expensive than the re-calculation itself.

𝗪𝗵𝗲𝗻 𝘁𝗵𝗲 𝘁𝗿𝗮𝗱𝗲-𝗼𝗳𝗳 𝗺𝗮𝗸𝗲𝘀 𝘀𝗲𝗻𝘀𝗲:

✅ 𝗛𝗲𝗮𝘃𝘆 𝗖𝗼𝗺𝗽𝘂𝘁𝗮𝘁𝗶𝗼𝗻: Expensive data processing (e.g., parsing large JSON or complex regex) that actually blocks the main thread.
✅ 𝗥𝗲𝗳𝗲𝗿𝗲𝗻𝘁𝗶𝗮𝗹 𝗜𝗻𝘁𝗲𝗴𝗿𝗶𝘁𝘆: When passing objects or functions to children wrapped in React.memo. Without it, the child re-renders on every parent update, defeating the purpose of React.memo.
✅ 𝗘𝗳𝗳𝗲𝗰𝘁 𝗦𝘁𝗮𝗯𝗶𝗹𝗶𝘁𝘆: When the value is a dependency in a useEffect that triggers an API call or a heavy subscription.

𝗧𝗵𝗲 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆:

Don't guess, measure. I've started using the 𝗥𝗲𝗮𝗰𝘁 𝗣𝗿𝗼𝗳𝗶𝗹𝗲𝗿 to identify "Wasted Renders" before reaching for a hook. Often, the better fix isn't memoization, but 𝘀𝘁𝗮𝘁𝗲 𝗰𝗼𝗹𝗹𝗼𝗰𝗮𝘁𝗶𝗼𝗻 or 𝗺𝗼𝘃𝗶𝗻𝗴 𝘁𝗵𝗲 𝘀𝘁𝗮𝘁𝗲 𝗱𝗼𝘄𝗻 the component tree.

𝗜𝗻 𝘆𝗼𝘂𝗿 𝗰𝘂𝗿𝗿𝗲𝗻𝘁 𝘀𝘁𝗮𝗰𝗸, 𝗮𝗿𝗲 𝘆𝗼𝘂 𝘀𝗲𝗲𝗶𝗻𝗴 𝗺𝗼𝗿𝗲 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗴𝗮𝗶𝗻𝘀 𝗳𝗿𝗼𝗺 𝗺𝗲𝗺𝗼𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝗼𝗿 𝗳𝗿𝗼𝗺 𝗯𝗲𝘁𝘁𝗲𝗿 𝗰𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁 𝗰𝗼𝗺𝗽𝗼𝘀𝗶𝘁𝗶𝗼𝗻? 👇

#ReactJS #WebPerformance #FrontendEngineering #JavaScript #ProgrammingTips #SoftwareDevelopment
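To make the two costs above concrete, here is a plain-JavaScript model of dependency-based memoization. It is only a sketch of the idea and is not React's implementation: createMemo is a hypothetical helper, and the Object.is comparison per dependency is an assumption about how such a check is typically done.

```javascript
// Simplified model of a useMemo-style cache (NOT React's actual code).
function createMemo() {
  let prevDeps = null; // cost 1: the previous deps are kept in memory...
  let prevValue;       // ...along with the previous result
  return function memo(compute, deps) {
    // cost 2: every call compares each dependency with its previous value
    const unchanged =
      prevDeps !== null &&
      deps.length === prevDeps.length &&
      deps.every((dep, i) => Object.is(dep, prevDeps[i]));
    if (!unchanged) {
      prevValue = compute();
      prevDeps = deps;
    }
    return prevValue;
  };
}

let computeCalls = 0;
const memo = createMemo();
const items = [1, 2, 3, 4];
const onlyEven = (list) => () => {
  computeCalls++;
  return list.filter((n) => n % 2 === 0);
};

const first = memo(onlyEven(items), [items]);  // computes
const second = memo(onlyEven(items), [items]); // same reference: cache hit
const copy = [...items];
const third = memo(onlyEven(copy), [copy]);    // new reference: recomputes

console.log(computeCalls); // 2
```

Note that the cache hit depends on reference equality, so a fresh array with identical contents still triggers a recomputation.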
Xayrulloh Abduvohidov 🇵🇸
2mo
𝗧𝗵𝗲 𝗙𝘂𝗻𝗰𝘁𝗶𝗼𝗻 𝘁𝗵𝗮𝘁 𝗡𝗲𝘃𝗲𝗿 𝗙𝗼𝗿𝗴𝗲𝘁𝘀: 𝗠𝗮𝘀𝘁𝗲𝗿𝗶𝗻𝗴 𝗖𝗹𝗼𝘀𝘂𝗿𝗲𝘀 🧠

Hi everyone! I've just released Part 5 of my JavaScript deep-dive series: 𝗧𝗵𝗲 𝗠𝗲𝗰𝗵𝗮𝗻𝗶𝗰𝘀 𝗼𝗳 𝗖𝗹𝗼𝘀𝘂𝗿𝗲𝘀.

Closures are often treated like a "magic trick" in JavaScript interviews, but they are actually a logical result of how the engine links scopes together. If you've ever struggled with why a setTimeout in a loop logs the "wrong" number, or how "private" variables actually work in a language without a private keyword, this article is for you.

𝗜𝗻 𝘁𝗵𝗶𝘀 𝗽𝗮𝗿𝘁, 𝗜 𝗱𝗶𝘃𝗲 𝗶𝗻𝘁𝗼:

• 𝗧𝗵𝗲 𝗜𝗻𝘁𝗲𝗿𝗻𝗮𝗹 𝗦𝗹𝗼𝘁𝘀: Meet [[Environment]] and [[OuterEnv]], the two hidden connectors that make closures possible.
• 𝗩𝗮𝗿𝗶𝗮𝗯𝗹𝗲𝘀 𝘃𝘀. 𝗩𝗮𝗹𝘂𝗲𝘀: Why closures link to "live" variables (and why that leads to the famous loop trap).
• 𝗧𝗵𝗲 𝗜𝗜𝗙𝗘 𝘃𝘀. 𝗟𝗲𝘁: How we used to fix closures in the old days vs. the modern ES6 solution.
• 𝗘𝗻𝗰𝗮𝗽𝘀𝘂𝗹𝗮𝘁𝗶𝗼𝗻: How to use closures to create "vaults" for your data.

Stop guessing how your functions remember data and start understanding the architecture behind it!

Read the full article here: https://lnkd.in/dR_TngGA

🔗 𝗡𝗲𝘅𝘁 𝘂𝗽: We tackle the "Grand Design" of JavaScript objects, 𝗣𝗿𝗼𝘁𝗼𝘁𝘆𝗽𝗲𝘀.

#JavaScript #SoftwareEngineering #WebDevelopment #Coding #Closures #ProgrammingTips #TechCommunity #ScopeChain

Closures medium.com
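The loop trap and its two fixes can be reproduced in a few lines. This is an illustrative sketch rather than code from the article; it collects the closures in arrays instead of using setTimeout so the result can be inspected synchronously, and createCounter is a made-up example of the "vault" pattern.

```javascript
// The trap: every closure shares the single function-scoped variable i.
const withVar = [];
for (var i = 0; i < 3; i++) {
  withVar.push(() => i);
}

// The old fix: an IIFE gives each iteration its own captured parameter.
const withIife = [];
for (var k = 0; k < 3; k++) {
  (function (captured) {
    withIife.push(() => captured);
  })(k);
}

// The ES6 fix: let creates a fresh binding per iteration.
const withLet = [];
for (let j = 0; j < 3; j++) {
  withLet.push(() => j);
}

console.log(withVar.map((f) => f()));  // [3, 3, 3]
console.log(withIife.map((f) => f())); // [0, 1, 2]
console.log(withLet.map((f) => f()));  // [0, 1, 2]

// Encapsulation: count is reachable only through the returned functions.
function createCounter() {
  let count = 0;
  return {
    increment: () => ++count,
    current: () => count,
  };
}

const counter = createCounter();
counter.increment();
counter.increment();
console.log(counter.current()); // 2
console.log(counter.count);     // undefined
```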
Abhay Dhaneshwar
1mo
Today I learned something interesting about fetching data in JavaScript.

When we use fetch(), many beginners notice that it is usually followed by two .then() calls. At first that looks unnecessary, but there is a clear reason behind it.

The first .then() receives the HTTP Response object that fetch() resolves with and calls response.json(), which itself returns a Promise because the body still has to be read and parsed. The second .then() receives the parsed JSON data once that Promise resolves.

Example:

    fetch("API_URL")
      // Response arrives; start reading and parsing the body
      .then(response => response.json())
      // Parsed data is available here
      .then(data => {
        console.log(data);
      });

While revising this concept, I also learned a cleaner and more modern approach using async/await, which makes asynchronous code easier to read and understand.

    async function getData() {
      const response = await fetch("API_URL");    // wait for the Response
      const data = await response.json();         // wait for the parsed body
      console.log(data);
    }

Both approaches work the same way internally using Promises, but async/await makes the flow feel more natural.

Small concepts like these help in writing cleaner and more maintainable JavaScript code.

#JavaScript #WebDevelopment #AsyncJavaScript #FetchAPI #CodingJourney #LearningInPublic
Hafiz Abdullah Yasir
1mo
🚀 𝗡𝗲𝘄 𝗣𝗥 𝗠𝗲𝗿𝗴𝗲𝗱 Just tackled a fun logical challenge: finding the intersection of two arrays. The goal was to identify elements present in both input arrays. I approached this using JavaScript. My strategy involved iterating through the first array and checking for the existence of each element in the second array. To optimize this lookup, I leveraged a Set data structure, which provides average O(1) time complexity for checking membership. During the 🐞 𝗗𝗲𝗯𝘂𝗴𝗴𝗶𝗻𝗴 𝗣𝗿𝗼𝗰𝗲𝘀𝘀, I found dry runs and visualizing the data flow particularly helpful. Stepping through the code with a debugger allowed me to pinpoint exactly where my logic was diverging from the expected output. A 📚 𝗞𝗲𝘆 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 for me was the significant performance improvement gained by using a Set for lookups compared to nested loops or Array.prototype.includes within a loop. Check out the implementation and contribute to the discussion here: https://lnkd.in/dvQbUFGK How do you typically ⚙️ 𝗔𝗽𝗽𝗿𝗼𝗮𝗰𝗵 array intersection problems? 📦 Repo: https://lnkd.in/dvQbUFGK #Algorithm #JavaScript #ProblemSolving #DataStructures #Set #CodingChallenge #Developer #Tech #InterviewQuestion #LogicalThinking
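A minimal sketch of the Set-based approach described above might look like the following. It is not the code from the linked repository; the function name and the decision to drop duplicates from the result are assumptions made for this example.

```javascript
// Intersection of two arrays using a Set for average O(1) membership checks.
function intersection(first, second) {
  const lookup = new Set(second); // built once, O(m)
  const seen = new Set();         // avoids emitting the same value twice
  const result = [];
  for (const value of first) {    // single pass over first, O(n)
    if (lookup.has(value) && !seen.has(value)) {
      seen.add(value);
      result.push(value);
    }
  }
  return result;
}

console.log(intersection([1, 2, 2, 3], [2, 3, 4])); // [2, 3]
```

Overall this runs in roughly O(n + m) time, compared with O(n × m) when Array.prototype.includes is called inside the loop.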
Raphaël Moutard
1mo
I kind of miss Stack Overflow.

I was debugging a JavaScript issue around this line:

    newArray.push(...oldArray)

Claude found the issue and fixed it in 10 seconds. But I didn't just want the fix. I wanted to understand why it worked. What does the spread operator actually do under the hood?

So I googled. And I landed on this classic Stack Overflow thread: "Find the min/max element of an array in JavaScript"

Asked 16 years ago 👴
Viewed 1.7 million times 🤯
Hundreds of developers arguing about the best solution to find the max in an array 🤣

Stack Overflow wasn't just about answers. It was closer to a social network. It was about arguments: different mental models, trade-offs, edge cases, sometimes brutal, often enlightening.

LLMs give you answers. Stack Overflow gave you thinking.

Maybe that's what I miss most: developers as social creatures, disagreeing in public.

Is that nostalgia talking or do you feel it too?
Abusayed Shuvo
2mo Edited
Data Structures & String Magic

Today was all about how we store and manipulate data in JavaScript. I moved beyond simple variables and explored the power of Strings and Objects. It's fascinating to see how similar Strings and Arrays can be, yet how unique they are in their behavior!

My Learning Milestones Today:

• String Mastery: Explored slice, join, concat, and the importance of case normalization (toLowerCase/toUpperCase) for comparisons.
• The "Reverse" Challenge: Learned three different ways to reverse a string, a classic interview favorite!
• Object Essentials: Introduction to Properties and Values. I practiced multiple ways to "get" and "set" data using dot notation and bracket notation.
• Advanced Object Ops: Working with nested objects, using Object.keys() and Object.values(), and learning how to safely delete properties.
• Iteration: Mastering how to loop through objects to pull out the data I need.

Objects are truly the backbone of JavaScript, and I'm feeling much more confident in how to structure my code. 🚀

#JavaScript #WebDevelopment #CodingJourney #StringsAndObjects #FrontendDev #TechGrowth
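The post does not say which three reversal techniques were covered, so the sketch below simply shows three common ones as an assumption: built-in array methods, a manual loop, and reduce. All function names are made up for illustration.

```javascript
// 1. Built-ins: spread into characters, reverse the array, join back.
function reverseWithBuiltins(str) {
  return [...str].reverse().join("");
}

// 2. Manual loop: prepend each character to the accumulated result.
function reverseWithLoop(str) {
  let reversed = "";
  for (const ch of str) {
    reversed = ch + reversed;
  }
  return reversed;
}

// 3. reduce: same idea as the loop, expressed as a fold.
function reverseWithReduce(str) {
  return [...str].reduce((acc, ch) => ch + acc, "");
}

console.log(reverseWithBuiltins("hello")); // "olleh"
console.log(reverseWithLoop("hello"));     // "olleh"
console.log(reverseWithReduce("hello"));   // "olleh"
```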
The New Stack

28,548 followers
2mo
ICYMI: How much faster is WebAssembly than JavaScript for heavy data processing? We do a side-by-side test using an image processor built with Rust. By Jessica Wachtel

WebAssembly vs. JavaScript: Testing Side-by-Side Performance https://thenewstack.io
