How can we find out where AI talent is located? In this project, I use several data sources to approach an answer. The map below shows one example: the locations (cities) of individuals who have forked two important machine-learning repositories, tensorflow and scikit-learn, on GitHub. As a comparison, I also plot the locations of individuals who have forked the repository for Ruby on Rails, which is unrelated to machine learning.

You can switch the different data sources on and off using the buttons in the upper-right corner. The full-sized map is here.

A repository is a collection of computer code for a specific project – in this case, code for machine-learning applications. People fork a repository when they want their own copy of the code that they can modify. Forking a repository does not mean that one is actually using its code. Still, it seems safe to assume that only people with at least an interest in machine learning are going to fork these repositories.

Here is an example of users who have forked tensorflow.

To collect the data, I use GitHub’s API to download information about who has forked a given repository. I then take each individual’s self-provided location, geocode it, and display it on a map using Mapbox.
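
The collection step can be sketched in Python roughly as follows. This is a minimal illustration, not the exact code behind the map: it assumes GitHub's public REST endpoints for listing forks (/repos/{owner}/{repo}/forks) and for user profiles (/users/{login}), and the function names (fetch_json, fork_owners, user_location, location_counts) are made up for this example. Requests are unauthenticated, so GitHub's rate limits apply quickly; the geocoding and mapping steps are left out.

```python
import json
import urllib.request
from collections import Counter

API = "https://api.github.com"


def fetch_json(url):
    """Fetch a URL and decode its JSON body (unauthenticated, hence rate-limited)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fork_owners(owner, repo, fetch=fetch_json, per_page=100, max_pages=1):
    """Return the login names of accounts that forked owner/repo.

    The forks endpoint is paginated; max_pages caps how many pages are read.
    """
    logins = []
    for page in range(1, max_pages + 1):
        url = f"{API}/repos/{owner}/{repo}/forks?per_page={per_page}&page={page}"
        forks = fetch(url)
        if not forks:  # an empty page means there are no more forks
            break
        logins.extend(fork["owner"]["login"] for fork in forks)
    return logins


def user_location(login, fetch=fetch_json):
    """Return the user's self-provided location string, or None if it is empty."""
    location = fetch(f"{API}/users/{login}").get("location")
    return location.strip() if location else None


def location_counts(owner, repo, fetch=fetch_json, max_pages=1):
    """Count how many forkers report each (raw, not yet geocoded) location."""
    counts = Counter()
    for login in fork_owners(owner, repo, fetch=fetch, max_pages=max_pages):
        location = user_location(login, fetch=fetch)
        if location:
            counts[location] += 1
    return counts
```

Calling location_counts("tensorflow", "tensorflow") would then give a tally of raw location strings for the first page of forks. Because these strings are free text ("NYC", "New York, NY"), they still need to be geocoded before they can be placed on a map.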