Data Driven - Yet Again!

Good morning, LinkedIn!

(If you're hearing that in Robin Williams' voice from "Good Morning, Vietnam", we might be on the same wavelength!)

I took the plunge and started digging more deeply into the data I capture about my learning journey(s)...

LinkedIn

Well, that's very meta. No, not that Meta!

I started down the rabbit hole of tracking which LinkedIn folks are influencing my thinking about three months ago. Sadly, the URLs that I've captured in that time don't tell a consistent story. There are around 100 of you, but you may not know who you are. Yet!

Screen capture of Excel with some LinkedIn URLs and the XLOOKUP that references good data even when the URLs don't give it.

I have no clue (so far) what separates the "activity-#" URLs from all the rest of the "/posts/". Using an XLOOKUP against the data I capture, I can still get some decent information about the who and the what of it ... assuming I typed in something coherent in the first place.

For the most part, a "/posts/" URL contains the LinkedIn ID of the original poster as the first part of the page reference before an underscore character. Huzzah! That gets me to a name of some kind at least. Some are more obvious than others. Some have one word with no dashes. Some have multiple words with dashes. Some have multiple words and a trailing hexadecimal string with dashes. A few have "escaped" Unicode characters in them. Fun times!

Of course, "/posts/" are not all there is, and the format of the other URLs gets tougher and tougher.

Another Excel screen capture of XLOOKUPs against other things than "/posts/" since their URL formats are all different and harder to reliably parse.

So, the "/advice/", "/events/", "/feed/", "/pulse/", "/video/", etc. all have different URL formats. Grumble! The first three types don't appear to contain any information about the poster / originator. "Pulse" URLs do contain information about the poster / originator, but it's at the end and no longer separated by an underscore from the rest of the URL. Grumble, grumble!

Since the poster / originator's LinkedIn ID could be (at least) one to three words separated by dashes and could include "escaped" Unicode characters, uniquely figuring out who wrote it from just the URL is difficult. If nothing else, it's a problem for another day.

For now, I'll fall back on my own data inputs and still more XLOOKUPs to collocate some meaningful data in that sheet.

If you're wondering where I get the URLs, pretty much every post, article, etc. has a "three dots" icon in the upper right if you're browsing LinkedIn on the web. Click that, and the drop-down menu contains some form of:

Screen capture of the "copy link" and "embed" options from the "three dots" popup menu on LinkedIn on the web.

Sadly, the "Copy link to post" comes with a bunch of tracking "stuff" in the URL. At some point, I'll automate filtering that out. For now, I just remove it as I paste it into the spreadsheet. Grumble!
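A minimal sketch of that future filtering step, assuming the tracking "stuff" all lives in the query string (so keeping only scheme, host, and path is enough):

```python
from urllib.parse import urlsplit, urlunsplit

def strip_tracking(url: str) -> str:
    """Drop the query string and fragment from a copied LinkedIn URL.

    Assumption: the tracking parameters are entirely in the query
    string; the scheme, host, and path are kept as-is. Also strips
    stray leading/trailing whitespace from the paste.
    """
    scheme, netloc, path, _query, _fragment = urlsplit(url.strip())
    return urlunsplit((scheme, netloc, path, "", ""))
```

The .strip() is a small bonus: it would also catch the trailing-space problem mentioned below.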

YouTube

Lo and behold! There is an API quota for free accounts querying the details of YouTube videos by video ID. I'm not sure exactly what the limit is, or how it's measured per day, but I hit it while testing these upgrades. Trying again the next morning, much less than 24 hours later, I was back below quota. So, that will take some more research.

On the flip side, that crashed my Python code properly with an unhandled exception, since I didn't know which exception to handle. In this case, it was a googleapiclient.errors.HttpError, and the stack trace / details showed that it was explicitly a "reason: quotaExceeded". Huzzah!
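The handling for that can be sketched roughly as below. This is an assumption-laden sketch: the helper inspects the JSON payload that an HttpError carries in its .content, and that payload is assumed to follow the standard Google API error shape ({"error": {"errors": [{"reason": ...}]}}). The actual API call it would wrap is omitted.

```python
import json

def is_quota_exceeded(error_content: bytes) -> bool:
    """Return True if a Google API error payload reports quotaExceeded.

    Assumes the standard Google API error JSON shape; anything that
    doesn't parse or doesn't match that shape returns False.
    """
    try:
        errors = json.loads(error_content).get("error", {}).get("errors", [])
    except (ValueError, AttributeError):
        return False
    return any(e.get("reason") == "quotaExceeded" for e in errors)
```

In the real script, the videos-by-ID request would sit inside a try/except for googleapiclient.errors.HttpError, with e.content passed to this helper to decide whether to back off and retry later or give up on that video.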

There's still some data flow that results in adding a trailing space to the URLs I copy/paste or move into the spreadsheet, but the number of those is way down. Still looking for the pattern there.

In Excel, VLOOKUP is good, but XLOOKUP can solve a few more problems! I'm up to 40+ YouTube links that no longer allow lookups by video ID. Fortunately, I capture more data in my to do list / status sheet that I can mine back out with an XLOOKUP.

Why an XLOOKUP? In this case, the value against which I'm matching is in Column I (Comments), but what I want to retrieve is in Column H (Description). Going the other way (H->I) would work just fine for a VLOOKUP. Going from I->H means a little extra work.
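As a concrete sketch (the sheet name "Status" and the cell references are hypothetical, not my actual layout), the I->H lookup looks something like:

```
=XLOOKUP(A2, Status!I:I, Status!H:H, "not found")
```

VLOOKUP can only return columns to the right of the one it matches on, so matching on Column I and returning Column H needs XLOOKUP (or an INDEX/MATCH workaround).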

Screen capture of an Excel sheet with the XLOOKUP highlighted. Using it to look up something in Column I of another sheet but return the value in Column H because an API call failed.

I still don't have a handle on how to deal with this particular situation in my Python code. The API query returns a nice, happy 200 like it (almost!) always does, but it returns no data. Now that I've seen an API failure, I know they're possible and come back as custom exception types.

Formatting

There may be some pivot tables in the future here, but for the recent sites versus old sites lists, I've got "conditional formatting" adding a light green fill for new site names (versus my first stab at this data).

Similarly, I've got some light red fill on the old site list versus the new site list. There are only a few, and they're not general-purpose URLs (Teams, Zooms, etc.), so that doesn't worry me much.
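If that comparison ever moves out of conditional formatting and into the Python side, it's a simple set difference. The site lists below are made up for illustration:

```python
# Hypothetical site lists; the real ones come from the spreadsheet.
old_sites = {"example.com", "docs.python.org", "zoom.us"}
new_sites = {"example.com", "docs.python.org", "learn.microsoft.com"}

added = sorted(new_sites - old_sites)    # would get the light green fill
dropped = sorted(old_sites - new_sites)  # would get the light red fill

print(added, dropped)
```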

I definitely got the Python code wrong when I put quotes around the "=VLOOKUP" and "=XLOOKUP" functions. Oopsy! Those came through as normal text strings and had to be massaged a bit to work properly. We'll see if my workaround fixes that next month.

Conclusions

From April through June, I found work (or industry) related information on dozens more web sites than my first data capture. I started tracking work / industry-related learning bits from LinkedIn, and I upgraded the Python to do more of the data processing work. There are plenty of opportunities left for API calls, more formatting, etc.

Making this some kind of rolling, monthly update will take more thought. Comparing against the first list of URLs is easy. Comparing against URLs, posters / creators, etc. is tougher.

Futures

At some point, I may get back in the groove of using the "openpyxl" Python library that I have loved in a few jobs. A mashup of old code from here and there would probably let me upload the current Excel file, have the Python code extract the data from the "Comments" column, do all of the work to update the "sites" sheet, then send back an updated Excel file with the changes in one shot. Future plans!
