COSC 419: Learning Analytics

A3: Mining GitHub Statistics [30 pts]

Due date: Feb 16, 2020, 11:59pm

In this assignment, you will grab GitHub data to generate a graph that shows all the collaborations among a specific group of users. Specific user logins will be provided to you.

What to submit:

Submit the following on Connect:


Specific Instructions

Use any programming language you want. (My solution uses Ruby and GraphViz.)

  1. Get comfortable with basic cURL commands (browse through the resources below and try them out)
  2. Connect to GitHub through its API to get user and repo information. Try your own GitHub account first, then a friend's.
    For your final solution, use this list of users: user names
  3. Understand the API output in JSON format
  4. Indirectly get collaborator information via commits
    Even if you can read someone's repo info, the API will not return that repo's list of collaborators unless you have push access to it (for example, as its owner). You can, however, get the repo's list of commits; each commit shows who made it, which tells you that person is a collaborator.
  5. Parse the API output to obtain a condensed list of users, the repos of each user, and the collaborators on each repo
    For the list of users above, my condensed form is listcollabs.txt. Note that yours may differ if the users have changed their repos recently.
  6. Create a collaboration adjacency matrix from the parsed info. My adjacency matrix looks like this.
  7. Visualize the collaboration matrix. I converted my matrix to .dot format and then rendered it as a graph. My outputs are the graphcollabs.dot file and the actual graph.
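
Steps 2 to 4 can be sketched in Ruby with only the standard library. This is a minimal sketch under stated assumptions, not the instructor's solution: the helper names (`fetch_json`, `repo_names`, `repo_commits`, `commit_logins`) are hypothetical, and only the first page (at most 100 items) of each list is read, so larger accounts would need pagination. It uses the REST endpoints `GET /users/{user}/repos` and `GET /repos/{owner}/{repo}/commits`. In each commit object, `author` is the GitHub account linked to the commit (with its `login`), or `null` when the commit's email is not linked to any account.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helpers for this assignment (not an official GitHub client).
API_ROOT = "https://api.github.com"

# GET a REST API path and return the parsed JSON body.
# Raises on any non-2xx response.
def fetch_json(path, token: nil)
  uri = URI("#{API_ROOT}#{path}")
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/vnd.github+json"
  request["User-Agent"] = "cosc419-a3"   # GitHub rejects requests without a User-Agent
  request["Authorization"] = "Bearer #{token}" if token
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  unless response.is_a?(Net::HTTPSuccess)
    raise "GitHub API returned #{response.code} for #{path}"
  end
  JSON.parse(response.body)
end

# Full names ("owner/name") of the public repos a user owns (first page only).
def repo_names(user, token: nil)
  fetch_json("/users/#{user}/repos?per_page=100", token: token).map { |r| r["full_name"] }
end

# Raw commit objects for one repo (first page only).
def repo_commits(full_name, token: nil)
  fetch_json("/repos/#{full_name}/commits?per_page=100", token: token)
end

# Collaborators inferred from commits: the distinct logins of commit authors.
# "author" is nil when the commit email is not linked to a GitHub account.
def commit_logins(commits)
  commits.map { |c| c.dig("author", "login") }.compact.uniq.sort
end
```

For example, `repo_names("bohuie").each { |r| puts "#{r}: #{commit_logins(repo_commits(r)).join(", ")}" }` lists each of bohuie's repos with the logins inferred from its commits. Unauthenticated requests are limited to 60 per hour, so passing a personal access token through `token:` helps when many users are involved.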
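
Step 5 can be sketched as a pure transformation on data that has already been fetched (for example, commit arrays parsed from saved JSON files), which keeps the parsing testable without network access. This is an assumption rather than the instructor's method: the input is taken to be a Hash from repo name (`"owner/name"`) to that repo's commit array, the method names are hypothetical, and "repos per user" here means the repos a user has committed to. The one-line-per-repo text format is also hypothetical and not necessarily the format of listcollabs.txt.

```ruby
# Hypothetical step-5 helpers operating on already-fetched commit data.

# commits_by_repo: { "owner/name" => [commit hashes as returned by the API] }
# users:           the logins of interest; everyone else is ignored
# Returns { "owner/name" => [sorted logins of interest who committed] }.
def collaborators_per_repo(commits_by_repo, users)
  commits_by_repo.each_with_object({}) do |(repo, commits), result|
    logins = commits.map { |c| c.dig("author", "login") }.compact.uniq
    result[repo] = (logins & users).sort
  end
end

# Inverse view: { login => [repos that login committed to] }.
def repos_per_user(collabs)
  result = Hash.new { |hash, login| hash[login] = [] }
  collabs.each do |repo, logins|
    logins.each { |login| result[login] << repo }
  end
  result
end

# One line per repo, e.g. "owner/name: alice, bob".
def condensed_lines(collabs)
  collabs.map { |repo, logins| "#{repo}: #{logins.join(", ")}" }
end
```

Intersecting with the target logins (`logins & users`) drops outside contributors, so later steps only cover the assigned users. A `commits_by_repo` Hash could be built from saved files with `JSON.parse(File.read(path))`.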
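
For step 6, one option is a symmetric matrix whose entry at row i, column j counts the repos on which users i and j both committed. This weighting is an assumption (a 0/1 matrix marking whether two users ever shared a repo is equally reasonable), the method name `adjacency_matrix` is hypothetical, and the input is assumed to be a Hash from repo name to the logins that committed to it.

```ruby
# Hypothetical step-6 helper.
# collabs: { "owner/name" => [logins who committed to that repo] }
# users:   fixed ordering of logins for the rows and columns
# Entry [i][j] counts the repos on which users[i] and users[j] both committed;
# the matrix is symmetric and its diagonal stays 0.
def adjacency_matrix(collabs, users)
  index = users.each_with_index.to_h   # login => row/column number
  matrix = Array.new(users.size) { Array.new(users.size, 0) }
  collabs.each_value do |logins|
    # Map logins to indices, skipping anyone not in users.
    ids = logins.uniq.map { |login| index[login] }.compact
    # Every pair of co-committers on this repo gets one more shared repo.
    ids.combination(2).each do |i, j|
      matrix[i][j] += 1
      matrix[j][i] += 1
    end
  end
  matrix
end
```

`adjacency_matrix(collabs, users).each { |row| puts row.join(" ") }` prints one row per user, in the order of `users`. The diagonal is left at 0 because a user's collaboration with themselves carries no information here.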

Grading Criteria

For example, I wrote my solution in Ruby as several scripts of 20 to 100 lines each. The sample users were "bohuie" and "mbojey", who have collaborated on some of their repos in the past (but not all of them). After running all the scripts, the resulting list of collaborations extracted from all the JSON files is a3ex-listcollabs.txt. To visualize this info as a graph, I converted it to a3ex-graphcollabs.dot.
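
The conversion from a collaboration matrix to a .dot file (step 7) can be sketched as follows. It is an assumption, not the format of a3ex-graphcollabs.dot: the graph name `collaborations` and the method name `to_dot` are hypothetical, each nonzero entry above the diagonal becomes one undirected edge (`--`) labelled with its count, and users with no collaborators still appear as isolated nodes.

```ruby
# Hypothetical step-7 helper: adjacency matrix -> Graphviz DOT text.
# matrix: square, symmetric array of shared-repo counts
# users:  logins in the same order as the matrix rows/columns
def to_dot(matrix, users)
  lines = ["graph collaborations {"]
  # Declare every user so isolated users still show up as nodes.
  users.each { |u| lines << "  \"#{u}\";" }
  # Walk only the upper triangle so each undirected edge is emitted once.
  users.each_index do |i|
    ((i + 1)...users.size).each do |j|
      weight = matrix[i][j]
      lines << "  \"#{users[i]}\" -- \"#{users[j]}\" [label=#{weight}];" if weight.positive?
    end
  end
  lines << "}"
  lines.join("\n") + "\n"
end
```

Writing the result with `File.write("graphcollabs.dot", to_dot(matrix, users))` and running `dot -Tpng graphcollabs.dot -o graphcollabs.png` renders the graph with Graphviz.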


Resources: