Using Knime to Call Github Search API

If you have not yet heard of it, KNIME Analytics Platform is a free, open source, fast, and versatile data workflow tool. Check it out at https://www.knime.com.

There are quite a few data integration and workflow tools available for analysts and engineers (for a small sampling, see the GUI tools section of the awesome-etl Github repo). Recently I have been using Knime a fair amount, and it is a fantastic tool. I have found it to be competitive with the expensive proprietary software packages in this space.

OK, So It Has Drag and Drop ETL. What Else Can It Do? **

I'm glad you asked! Another very helpful thing Knime can do is help you build low- or no-code REST API workflow integrations. Integrating with REST APIs (along with GraphQL and gRPC APIs) has become a critically important part of business data integration.

Scenario: Get a List of Github Repositories By Topic Search

The goal here is to search for some public Github repositories and get the metrics associated with them. If the Topic I want to search for is "data integration", I will follow these four steps to make this happen using Knime and the Github REST API:

  • Configure my required Github API search parameters within Knime. One easy way to do this is to use the "Table Creator" Component. I name the three columns "search_url", "search_query", and "search_type". Then I populate them with values which I can use in the next step.

Table Creator Component

  • Next, build the Github REST API Search operation endpoint. There's a Knime Component for that: "String Manipulation." As the name suggests, I can create, edit, replace, and transform strings with this component. The two Knime built-in functions I need for this part are "join" and "urlEncode".

The "recipe" for building the REST API Search URI is:

"search_url" concatenated with "search_type," then "?q=", and then "search_query"


String Manipulation Component - build & format the URI

When I execute the String Manipulation Component, the concatenated and properly formatted REST API URI result is shown in the screenshot:

uri_joined result
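
For readers who prefer to see the recipe in code, here is a minimal Python sketch of the same concatenation. The three cell values are assumptions on my part for illustration (the screenshots, not this text, hold the real ones), and Python's quote function stands in for the Knime URL-encoding function, so the exact escaping may differ slightly from what Knime produces.

```python
from urllib.parse import quote

# Assumed values for the three Table Creator columns. The column names come
# from the article; the cell contents are illustrative guesses.
params = {
    "search_url": "https://api.github.com/search/",
    "search_type": "repositories",
    "search_query": "data integration",
}


def build_search_uri(search_url: str, search_type: str, search_query: str) -> str:
    """Join base URL, search type, "?q=", and the URL-encoded query text."""
    # safe="" makes quote() escape every reserved character, including "/".
    return f"{search_url}{search_type}?q={quote(search_query, safe='')}"


uri_joined = build_search_uri(**params)
print(uri_joined)
```

Only the query text is encoded. Encoding the whole URI would also escape the slashes and the "?q=" separator and break the request.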


  • Third, configure the REST "GET Request" Component. Notice in the screenshot below that instead of typing a fixed URL into the configuration dialog, the URL is populated from the workflow column "uri_joined" output by the previous step. I also bumped up the connection timeout to 10 seconds. Note that the response will be saved to a new Knime workflow column named "body" (although you can rename this if you like).

Get Request - Configure Connection

Also, I need to add two HTTP request header values (per the Github REST API documentation listed in the Reference section at the end of this post) before this request is ready to send to the Github REST endpoint.

Accept : "application/vnd.github+json"

Content-Type : "application/json; charset=utf-8"

Get Request - Configure Request Header
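
As a rough code equivalent of this configuration, the sketch below builds the same GET request with Python's standard library: the two headers listed above and a 10 second timeout. The function names and the example URI are hypothetical, and this is not what Knime runs internally. It is only meant to show what the dialog settings amount to.

```python
import json
import urllib.request

# The two request headers configured in the GET Request dialog.
HEADERS = {
    "Accept": "application/vnd.github+json",
    "Content-Type": "application/json; charset=utf-8",
}


def build_request(uri: str) -> urllib.request.Request:
    """Create a GET request carrying the two headers (hypothetical helper)."""
    return urllib.request.Request(uri, headers=HEADERS, method="GET")


def fetch_body(uri: str, timeout_seconds: float = 10.0) -> dict:
    """Send the request and return the parsed JSON body (needs network access)."""
    with urllib.request.urlopen(build_request(uri), timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed example URI, matching the recipe described earlier in the article.
request = build_request("https://api.github.com/search/repositories?q=data%20integration")
print(request.get_method(), request.full_url)
```

Calling fetch_body(request.full_url) would perform the actual search. It is left out of the sketch so the example runs without a network connection.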


  • Last of all, execute the GET Request and inspect the Search API Response Body.

Execute the Request, Commander!

We can then take a look at the JSON Response body as it is populated in the Knime flow:

API Search Response - 78 Records returned

At the time the Github API search was run, it returned a count of 78 records associated with the Topic search with parameter "data integration." When that output is saved to a local file, the file contains 3,167 lines of text. If I'm going to share these results with people who don't like reading JSON-encoded data (which == most people), then I will need to follow up by doing some parsing and data wrangling.
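
To give a feel for what that wrangling involves, here is a small Python sketch that pulls a few metrics out of a search response. The sample payload is a made-up, heavily trimmed stand-in for the "body" column: the field names follow the Github search response format, but the repositories and numbers are invented.

```python
import json

# Hypothetical, trimmed stand-in for the JSON held in the "body" column.
sample_body = """
{
  "total_count": 2,
  "incomplete_results": false,
  "items": [
    {"full_name": "example-org/repo-a", "stargazers_count": 120, "forks_count": 14},
    {"full_name": "example-org/repo-b", "stargazers_count": 45, "forks_count": 3}
  ]
}
"""


def summarize(body_text: str) -> list:
    """Flatten the nested response into one small dict per repository."""
    payload = json.loads(body_text)
    return [
        {
            "repo": item["full_name"],
            "stars": item["stargazers_count"],
            "forks": item["forks_count"],
        }
        for item in payload.get("items", [])
    ]


rows = summarize(sample_body)
for row in rows:
    print(f"{row['repo']}: {row['stars']} stars, {row['forks']} forks")
```

One thing to watch for: "total_count" reports every match, while "items" holds only the current page of results (30 per page by default), so a complete extract may need more than one request.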


Summary

  • At this point, I've been able to construct and execute a REST API request to search for repositories matching my query criteria.
  • Knime has enabled me to do this without writing any code.
  • The REST API has returned the results in a large nested JSON document structure. If I want to present these Github Repo search results for further analysis or reporting, I'll need to do some more work to parse that JSON output.
  • A future article will dive into "shredding" the JSON and preparing it for downstream consumption.


** By the way, the Drag and Drop ETL features of Knime are quite excellent.


Reference

Github REST API Documentation (Search-Repositories)


#DataEngineer #Knime #OpenSource


More articles by Michael Boyle
