Your dataset has 500 features. Your model only needs 20. The other 480 are noise, redundancy, or both — slowing down training and hurting accuracy. We broke down the 3 algorithms you actually need: Slide 1: PCA — linear, interpretable, fast. Your default. Slide 2: t-SNE — nonlinear, beautiful for visualization, slow on large data Slide 3: UMAP — modern, 10x faster than t-SNE, preserves local + global structure Slide 4: When to use which (decision tree with 4 questions) Slide 5: The common trap: t-SNE axes are NOT features. You can't use them as inputs to a model. Slide 6: Free notebook with all 3 on the same dataset — see the differences yourself Free notebook with side-by-side code for all three: https://lnkd.in/gcbS7m-m If you've been using PCA as a black box, this upgrades you. #MachineLearning #DataScience #PCA #UMAP #DimensionalityReduction #UnsupervisedLearning #Python #Sklearn
Topfolio’s Post
More Relevant Posts
-
🚀 𝗗𝗮𝘆 𝟯 : 𝗧𝗼𝗱𝗮𝘆 𝗜 𝗲𝘅𝗽𝗹𝗼𝗿𝗲𝗱 𝘀𝗼𝗺𝗲 𝗯𝗮𝘀𝗶𝗰 𝗯𝘂𝘁 𝘃𝗲𝗿𝘆 𝗶𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝘁 𝗳𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀 𝗶𝗻 𝗣𝗮𝗻𝗱𝗮𝘀 𝗳𝗼𝗿 𝗱𝗮𝘁𝗮 𝘂𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 📊 🔍 1. head() Shows the first 5 rows of the dataset df.head() 🔍 2. tail() Shows the last 5 rows df.tail() 📏 3. shape Returns number of rows and columns df.shape ℹ️ 4. info() Provides summary of dataset (data types, null values) df.info() 📊 5. describe() Gives statistical summary (mean, min, max, etc.) df.describe() 📌 6. columns Shows all column names df.columns 💡 Key Learning: Understanding your dataset is the first step before doing any analysis. #Day3 #Pandas #Python #DataAnalytics #LearningJourney #DataExploration
To view or add a comment, sign in
-
✅ Day 89 of 100 Days LeetCode Challenge Problem: 🔹 #1480 – Running Sum of 1D Array 🔗 https://lnkd.in/gSTZrxF7 Learning Journey: 🔹 Today’s problem focused on computing the running (prefix) sum of an array. 🔹 Instead of using an extra array, I optimized the solution by modifying the input array in-place. 🔹 Starting from index 1, I updated each element as: • nums[i] += nums[i-1] 🔹 This way, each index stores the cumulative sum up to that point. 🔹 Finally, returned the modified array. Concepts Used: 🔹 Prefix Sum 🔹 In-place Computation 🔹 Array Traversal Key Insight: 🔹 The previous element already stores the prefix sum, so we can reuse it directly. 🔹 Eliminates the need for extra space while maintaining linear time. Complexity: 🔹 Time: O(n) 🔹 Space: O(1) #LeetCode #Algorithms #DataStructures #CodingInterview #100DaysOfCode #Python #ProblemSolving #LearningInPublic #TechCareers
To view or add a comment, sign in
-
-
Day 30/100 — Solved “Maximum Depth of N-ary Tree” . today, A simple problem on the surface, but a great exercise to reinforce recursive thinking and traversal strategies. Approach 1: Depth-First Search (DFS) Used recursion to explore each branch of the tree as deep as possible. For every node, I computed the depth of all its children and took the maximum among them. The final answer for each node becomes 1 + max depth of its children. Base cases handled: If the node is null → depth = 0 If the node has no children → depth = 1 This approach is intuitive and closely follows the definition of depth itself. Approach 2: Breadth-First Search (BFS) Traversed the tree level by level using a queue. Each iteration processes one full level of nodes, and the depth counter increments after finishing each level. This approach is useful when thinking in terms of layers rather than paths. Complexity: Both approaches run in O(n) time since every node is visited once. Space complexity differs based on recursion depth (DFS) vs queue size (BFS). Key takeaway: Understanding both DFS and BFS gives flexibility in tackling tree problems from different perspectives—path-based vs level-based thinking. #Day30 #100DaysOfCode #DSA #LeetCode #Python #CodingJourney #Algorithms #DataStructures #ProblemSolving
To view or add a comment, sign in
-
-
Days 68-69 of the #three90challenge 📊 Today I explored NumPy operations — specifically indexing and slicing arrays. After understanding NumPy basics, this step made it easier to access and manipulate data efficiently. What I practiced today: • Accessing elements using indexing • Extracting subsets of data using slicing • Working with multi-dimensional arrays • Performing operations on selected data Example thinking: Instead of looping through data manually, I can directly select and operate on specific parts of an array. Example: import numpy as np arr = np.array([10, 20, 30, 40, 50]) print(arr[1:4]) # Output: [20 30 40] This makes data manipulation faster and more intuitive. From handling data → to controlling it efficiently 🚀 GeeksforGeeks #three90challenge #commitwithgfg #Python #NumPy #DataAnalytics #LearningInPublic #Consistency #Upskilling
To view or add a comment, sign in
-
A Renko forecasting pipeline was tested on EURUSD using MetaTrader 5 tick/time-bar data converted into 11,578 Renko bars with ATR-derived brick size 0.00028. Training used 60 days of M5 history and 14 engineered features covering direction sequences and volume stats. CatBoost delivered 59.27% test accuracy. Feature importance was led by volume: last_volume (18.36), avg_volume (14.23), volume_ratio (12.81), followed by consecutive-move metrics, challenging price-pattern-heavy rule sets. Implementation stack: Python, numpy, pandas, MetaTrader5 API, catboost. Modules cover data ingest, Renko conversion, feature generation, cross-validated training, and probability-based signaling with a 75% confidence threshold. #MQL5 #MT5 #AlgoTrading #AITrading https://lnkd.in/dAGfEC7K
To view or add a comment, sign in
-
-
Day 14/100 – Data Structures & Algorithms Today, I worked on the problem “First Unique Character in a String.” Overview The task is to identify the first non-repeating character in a string and return its index. If no such character exists, the result is -1. Approach I used a two-pass strategy: • First pass to store character frequencies using a hashmap • Second pass to identify the first character with a frequency of one Complexity • Time Complexity: O(n) • Space Complexity: O(1) Key Takeaway This problem reinforces how effective hashmaps are for frequency-based problems and how a simple two-pass approach can lead to optimal solutions. Staying consistent and building problem-solving intuition step by step. #Day14 #100DaysOfCode #DSA #Python #LeetCode #ProblemSolving #SoftwareEngineering
To view or add a comment, sign in
-
-
A 10 million document RAG dataset occupies 31 GB of RAM at float32. turbovec fits it in just 4 GB - and now it searches it faster than FAISS. I just shipped a new release of turbovec: a Rust vector index with Python bindings, built on Google Research's TurboQuant algorithm. Data-oblivious 2-4 bit quantization that matches the Shannon lower bound on distortion - zero training and no rebuilds when the corpus grows. What's in the box: → Hand-written SIMD kernels - 12–20% faster than FAISS FastScan on ARM; match-or-beat on x86. → O(1) stable-id delete and save/load. The corpus is live and mutable, not a static snapshot. → Drop-in integrations for LangChain, LlamaIndex, and Haystack. → Published benchmarks (recall, speed, compression) at d=200/1536/3072 — every number reproducible from the repo. If you're building RAG where memory, latency, or privacy matters, give it a spin. GitHub: https://lnkd.in/e5M4dVRk Paper: https://lnkd.in/eHRmpYms #RAG #VectorSearch #OpenSource #Rust #Python #𝗟𝗟𝗠 #𝗢𝗽𝗲𝗻𝗦𝗼𝘂𝗿𝗰𝗲 #𝗚𝗲𝗺𝗺𝗮4
To view or add a comment, sign in
-
-
Day 22/100 – DSA Journey Problem: Find Mode in Binary Search Tree (BST) Today’s problem focused on understanding how Binary Search Trees (BST) behave and how we can efficiently extract useful insights from them. Understanding the BST A Binary Search Tree follows a structured property: Left subtree → values ≤ root Right subtree → values ≥ root Because of this, when we perform an Inorder Traversal (Left → Root → Right), the values are visited in sorted order. Why Inorder Traversal? Since duplicates appear consecutively in a sorted sequence, inorder traversal allows us to: Track frequency without using extra space Compare current value with previous value Efficiently determine the most frequent element (mode) Approach Used Traverse the BST using inorder traversal Maintain: Previous value Current count Maximum frequency Update result list whenever a new maximum frequency is found Key Learning This problem highlights how leveraging tree properties can help optimize solutions. Instead of using extra space (like hashmaps), we used traversal behavior to achieve an efficient solution. Conclusion Understanding the underlying structure of data (like BST properties) is often more powerful than brute-force approaches. Smart traversal choices can significantly reduce space complexity and improve performance. #Day22 #100DaysOfCode #DSA #BinarySearchTree #Python #CodingJourney #LeetCode #ProblemSolving
To view or add a comment, sign in
-
-
Built a complete PCA + ML pipeline on a student performance dataset (395 rows, 33 features). After cleaning, standardizing numeric variables, and encoding categorical fields, I explored relationships with correlation and study-habit vs grade visualizations. Then I implemented PCA end-to-end (covariance matrix → eigenvalues/eigenvectors, scree plots, biplots, and transformation dashboards) to understand variance and reduce dimensionality. Finally, I trained an SVM classifier on the top 5 principal components to predict Pass vs Fail, comparing kernels—best result: Linear SVM, 94.94% test accuracy. #Python #PCA #MachineLearning #SVM #DataScience #scikitlearn #AICadmey
To view or add a comment, sign in
-
LeetCode Problem 1448. Count Good Nodes in Binary Tree: "Given a binary tree root, a node X in the tree is named good if in the path from root to X there are no nodes with a value greater than X. Return the number of good nodes in the binary tree." Approach: make use of pre-order traversal i.e., visit root->left->right child, in each visit, keep track of the max element upto that point by maintaining a stack, top of the stk should represent the max element upto that point in the path, pop() elements from stack when both the children (left,right) are explored. Time Complexity: O(n) Space Complexity: O(n) b/c of maintaining a stack #Python #LeetCode #DSA #DataStructures #Algorithms #Stack #Recursion #DFS #BinaryTree #ProblemSolving
To view or add a comment, sign in
-
Explore content categories
- Career
- Productivity
- Finance
- Soft Skills & Emotional Intelligence
- Project Management
- Education
- Technology
- Leadership
- Ecommerce
- User Experience
- Recruitment & HR
- Customer Experience
- Real Estate
- Marketing
- Sales
- Retail & Merchandising
- Science
- Supply Chain Management
- Future Of Work
- Consulting
- Writing
- Economics
- Artificial Intelligence
- Employee Experience
- Workplace Trends
- Fundraising
- Networking
- Corporate Social Responsibility
- Negotiation
- Communication
- Engineering
- Hospitality & Tourism
- Business Strategy
- Change Management
- Organizational Culture
- Design
- Innovation
- Event Planning
- Training & Development