Level-Up using Gists with Python

Dec 7, 2018

Get the Gist

A lesser-known feature of GitHub is “gists” (code chunks and micro-repos used for quick reuse ⚡️). These bite-sized bits are searchable in the open community or accessible in secret. They can be stored indefinitely for free. Beware: secret gists are not truly private. Anyone who has the link can view them, so a link discovered by a bot or scraped by a third party makes the gist effectively public. This pseudo-private feature makes gists great for sharing one-on-one, but they lack the deep privacy that commercial repositories (“repos”) provide.

Through my years of working as a Data Scientist, I’ve relied heavily on my gists. I’ve created many useful snippets for financial time series in both R and Python, and have found them to be a quick way to share code among peers on Twitter and with teams, and even to keep track of personal, less fun repeat tasks such as updating Linux distros. I’ve been using gists since 2014, and love them!

The gist ecosystem, where half-completed code chunks across every programming language can be searched by title, is a place where seasoned coders go to quickly store useful chunks of code. We all use StackOverflow and version control systems (e.g., GitHub, Bitbucket, Google Cloud Source), and most of us have used gists, for good reason. For those who rely on their gists, this post is for you. For those who want to explore gists on a deeper level, this post is also for you.

The Crux: Over time I’ve noticed that searching for gists is limited: the dynamic search only returns part of the gist title, which makes it difficult to tell similarly named code apart. If anyone knows someone working for GitHub, I hope they fix this someday with a hover title after searching via keywords. When many similarly named gists exist in my own global gist repository, it is nearly impossible to filter the good from the merely useful. Therefore, I decided to write a multi-step blog post on downloading gists using Python code and exploring them openly, using the gists’ metadata in a simplified and practical way.

Step 1: Download all Public or Secret Gists from a User (Python 3.5)

## Import all gists from a GitHub user's account

import subprocess
import sys
import pandas as pd
import json 

if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

# The username is needed for both public and secret gists
username = input("Username:")
# GitHub no longer accepts account passwords for API authentication.
# Enter a personal access token to include secret gists, otherwise use "".
password = input("Token (blank for public only):")

    
proc = subprocess.Popen(["curl", "--user", username + ":" + 
                        password, "https://api.github.com/users/" + username + "/gists"], 
                        stdout = subprocess.PIPE)
      
output = StringIO(proc.communicate()[0].decode('utf-8'))
# print(output.getvalue())

# Convert json to dataframe
data = pd.DataFrame(json.loads(output.getvalue()))
data
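The request above returns a single page of results. The GitHub API paginates gist lists (30 per page by default, up to 100 with `per_page`), so a user with many gists needs several requests. Below is a minimal sketch of one way to collect every page, using only the standard library. The helper names `fetch_page` and `fetch_all_gists` are hypothetical, and sending a personal access token as a bearer token is an assumption to check against the current API documentation.

```python
import json
import urllib.request
from urllib.parse import urlencode


def fetch_page(username, page, per_page=100, token=None):
    """Fetch one page of a user's gists (hypothetical helper, stdlib only)."""
    query = urlencode({"per_page": per_page, "page": page})
    url = "https://api.github.com/users/" + username + "/gists?" + query
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    if token:
        # Assumption: a personal access token is sent as a bearer token.
        request.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all_gists(username, fetch=fetch_page, per_page=100):
    """Request pages until a short (or empty) page signals the end."""
    gists, page = [], 1
    while True:
        batch = fetch(username, page, per_page)
        gists.extend(batch)
        if len(batch) < per_page:
            return gists
        page += 1
```

The combined list can be handed to pandas exactly as before, for example `data = pd.DataFrame(fetch_all_gists(username))`. Passing the page fetcher in as an argument keeps the paging loop easy to try out without touching the network.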

Note the nested dict in the ‘files’ column, right after the ‘description’ column. We’ll unpack that later.
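To make the nesting concrete, here is a small sketch of one way the per-gist file dict could be flattened into one row per file. The helper name `flatten_gist_files` and the sample records are made up for illustration; only the field names (`description`, `files`, `language`, `raw_url`) follow the API response.

```python
def flatten_gist_files(gists):
    """Return one row (dict) per file across a list of gist records."""
    rows = []
    for gist in gists:
        files = gist.get("files") or {}
        for filename, meta in files.items():
            rows.append({
                "description": gist.get("description") or "",
                "filename": filename,
                "language": meta.get("language"),
                "raw_url": meta.get("raw_url"),
            })
    return rows


# Made-up sample shaped like the API response (not real gist data).
sample = [
    {"description": "Stock helpers",
     "files": {
         "stocks.py": {"language": "Python",
                       "raw_url": "https://example.com/raw/stocks.py"},
         "notes.md": {"language": "Markdown",
                      "raw_url": "https://example.com/raw/notes.md"},
     }},
    {"description": None,
     "files": {
         "update.sh": {"language": "Shell",
                       "raw_url": "https://example.com/raw/update.sh"},
     }},
]

rows = flatten_gist_files(sample)
for row in rows:
    print(row["filename"], row["raw_url"])
```

With the dataframe from Step 1, the same helper could be applied as `pd.DataFrame(flatten_gist_files(data.to_dict("records")))`, giving a flat table that is easier to filter by filename or language.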

Step 2: Choose Multiple Terms to Search by Title

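Type in “Py” when prompted, and you’ll get all gists with “Python” in the title. The input is lowercased and matched against each gist’s description, and several space-separated terms are treated as alternatives.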

## Filter gists by terms
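import numpy as np

text = input("Search terms:").lower()
searchfor = text.split()
# Keep rows whose description contains any term; na=False skips gists without a description
matches = data.description.str.lower().str.contains('|'.join(searchfor), na=False)
filteredData = data[matches][['description','files','url']]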
for i in filteredData.description:
    print(i)
filteredData
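Because the terms are joined with `|` and used as a regular expression, a term such as `c++` would be read as regex syntax rather than literal text. The sketch below shows one possible way to escape the terms first. It works on a plain list of descriptions, and the helper names `build_pattern` and `filter_descriptions` are hypothetical.

```python
import re


def build_pattern(text):
    """Turn space-separated search terms into one escaped, lowercased regex."""
    terms = [re.escape(term) for term in text.lower().split()]
    return re.compile("|".join(terms))


def filter_descriptions(descriptions, text):
    """Keep descriptions that contain any of the search terms."""
    pattern = build_pattern(text)
    # Gists without a description (None or "") are skipped.
    return [d for d in descriptions if d and pattern.search(d.lower())]


# Made-up descriptions for illustration.
descriptions = ["Python time series", "C++ helpers", None, "R plotting"]
print(filter_descriptions(descriptions, "c++"))
print(filter_descriptions(descriptions, "py plot"))
```

The escaped pattern string can also be passed to the pandas filter in this step, for example `data.description.str.lower().str.contains(build_pattern(text).pattern, na=False)`.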

Step 3: Get the Link to a Particular Gist

## Input index of filtered gists from table above
indexNum = int(input("Choose index # from Above:"))
filteredFiles = list(filteredData[filteredData.index == indexNum].files)[0]
keys = list(filteredFiles.keys())

## Print each sub gist
for key in keys:
    print(filteredFiles[key]['raw_url'])
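Each `raw_url` printed here includes a revision hash, so it points at one fixed version of the file. As far as is commonly documented, dropping that segment yields a URL that serves the latest revision; treat this as an assumption to verify for your own gists. A minimal sketch, with a hypothetical helper name `latest_raw_url`:

```python
from urllib.parse import urlsplit, urlunsplit


def latest_raw_url(raw_url):
    """Drop the revision segment from a gist raw URL, if one is present."""
    parts = urlsplit(raw_url)
    segments = parts.path.split("/")
    # Expected path shape: /<user>/<gist_id>/raw/<revision>/<filename>
    if len(segments) == 6 and segments[3] == "raw":
        del segments[4]
    return urlunsplit(parts._replace(path="/".join(segments)))


pinned = ("https://gist.githubusercontent.com/ZeccaLehn/"
          "9d4946bf1fe07c27c3ad10ef83093413/raw/"
          "f2d6f8c1c183a8c676cb2f797f3482c7989fce0e/multipleStocks.py")
print(latest_raw_url(pinned))
```

Keeping the pinned URL is the safer choice when the downloaded code must not change between runs; the shortened form is convenient when you always want the newest edit.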

Step 4: Executing Code Directly from Gists

Here we’ll download a dataframe of stock data spanning a few years from the tech industry by calling our gist from above.

## Executing code directly from Gists

# Confirm Jupyter Notebook uses Conda env Python version
    # Install: pip install fix_yahoo_finance --upgrade --no-cache-dir
    # Install: pip install git+https://github.com/pydata/pandas-datareader.git

import requests
url = "https://gist.githubusercontent.com/ZeccaLehn/9d4946bf1fe07c27c3ad10ef83093413/raw/f2d6f8c1c183a8c676cb2f797f3482c7989fce0e/multipleStocks.py"
response = requests.get(url)
exec(response.text)
allData

## Note: For JSON Direct downloads
    # proc = subprocess.Popen(["curl", url], stdout = subprocess.PIPE)
    # output = StringIO(proc.communicate()[0].decode('utf-8'))
    # output.getvalue()
    # output = json.loads(output.getvalue())
    # output
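Calling `exec` on downloaded text runs it in the notebook's own namespace, so any variable the gist defines can overwrite yours. One possible alternative, sketched below, runs the source in a fresh dict and reads results back by name. The helper `run_source` is hypothetical, and the sample source is a made-up stand-in for `response.text`, not the real gist contents.

```python
def run_source(source, label="<gist>"):
    """Execute source text in a fresh namespace and return that namespace."""
    namespace = {"__name__": "__gist__"}
    # compile() attaches the label so tracebacks name the gist, not "<string>".
    exec(compile(source, label, "exec"), namespace)
    return namespace


# Stand-in for response.text (made-up snippet for illustration).
sample_source = "prices = [10.0, 10.5, 11.0]\nlatest = prices[-1]\n"

result = run_source(sample_source)
print(result["latest"])
```

With the gist from this step, the equivalent call would be `allData = run_source(response.text)["allData"]`, assuming the gist defines `allData` and imports everything it needs itself. This keeps names tidy but is not a sandbox: the code still runs with full permissions, so it is worth reading `response.text` before executing it.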

Summary:

The above code should give you a unique view into other users’ working code and the types of projects they focus on. It may help you find code more easily in the gist ecosystem if collected by iterating through trees. There are other attributes of gist dicts which I may explore later, e.g., followers, stars, forks, NLP language statistics for users, and topic networks shared by gist followers.

Please feel free to use and share the above code, which I’ve also linked via a Jupyter notebook here, auto-rendered when viewed in your laptop browser.