Developer Documentation

At its core, a DataSource is a JavaScript function that gathers data. It can do this with event listeners, at regular time intervals, or in any other way you can think of. The datapoints it creates are passed to the Metro client (the metroClient object that we provide to your code), which sends them to Metro automatically.

Once you have a DataSource attached to a Project and some users have enabled it, the only other thing you need to do as a developer is collect your data at the other end of the system by clicking the "Download Data" button on your Project's page.


Here we walk through creating a DataSource that records a user's dwell time on Wikipedia pages, i.e. how long they stay on each page. It is one of the simplest DataSources possible, but it shows the whole process, from writing the code to seeing the datapoint Metro would send.

Getting the repository and setting up

The first step is to fork our DataSource repository, which can be found here. Then clone your fork onto your machine.

Copy the example DataSource directory and rename it, e.g. cp -R example-datasource my-datasource. Move into the new directory and you will see the three files that make up a DataSource, which you will edit to create your own: manifest.json, schema.json and plugin.js.

Structure of the files

manifest.json

This file contains metadata concerning your new datasource.

  • name

    DataSource name. Must match the directory name of your DataSource in the repository.

  • author

    Must match your Metro username or else the merge will silently fail.

  • description
    • A description of exactly what data your DataSource gathers.
    • It must be easy to understand, with no technical terms that an average user wouldn't know. Think about how you would explain what the DataSource does to your grandparents!
    • See the example DataSources for examples.
  • version

    The release version of the DataSource, in the form major.minor.maintenance (e.g. 1.3.1).

  • sites

    A list of regex-like strings describing the URLs on which the DataSource will be active.
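For reference, here is a sketch of a manifest for the walkthrough's Wikipedia DataSource, written as a JavaScript object (in manifest.json the same fields are written as JSON), together with a hypothetical checkManifest helper that tests the rules listed above before you push. The name is taken from the directory in the example fork URL used later on this page; the author is a placeholder, and the sites pattern syntax is an assumption, since this page only calls the entries "regex-like". Neither the helper nor its checks are part of Metro.

```javascript
// Hypothetical example manifest. Field names come from this page;
// the values (especially the "sites" pattern syntax) are assumptions.
const exampleManifest = {
    name: "wikipedia-dwell-time",
    author: "your-metro-username", // placeholder: use your real Metro username
    description: "Records when you open and leave Wikipedia pages, and which page it was.",
    version: "1.0.0",
    sites: ["https://en.wikipedia.org/wiki/.*"]
};

// Hypothetical local check, not part of Metro. Returns a list of problems found.
function checkManifest(manifest, directoryName) {
    const problems = [];
    for (const field of ["name", "author", "description", "version"]) {
        if (typeof manifest[field] !== "string" || manifest[field].length === 0) {
            problems.push(`missing or empty field: ${field}`);
        }
    }
    // The name must match the DataSource's directory name.
    if (manifest.name !== directoryName) {
        problems.push(`name "${manifest.name}" does not match directory "${directoryName}"`);
    }
    // major.minor.maintenance, e.g. 1.3.1
    if (!/^\d+\.\d+\.\d+$/.test(manifest.version)) {
        problems.push(`version "${manifest.version}" is not major.minor.maintenance`);
    }
    if (!Array.isArray(manifest.sites) || manifest.sites.length === 0) {
        problems.push("sites must be a non-empty list of URL patterns");
    }
    return problems;
}

console.log(checkManifest(exampleManifest, "wikipedia-dwell-time")); // []
```

The helper cannot check that author matches your Metro username; only you can confirm that.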

schema.json

This file describes the JSON object that the DataSource passes to the metroClient to send.
It must be a flat (i.e. not nested) JSON dictionary whose keys match the keys of each datapoint object.
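The rule above can be checked locally with a small sketch. Here the walkthrough's schema is written as a JavaScript object; its keys come from the walkthrough, but the values ("number", "string") are placeholders, because this page does not say what schema.json values should contain. The hypothetical matchesSchema helper only compares keys and checks that the datapoint is flat.

```javascript
// Hypothetical schema for the walkthrough. Keys are from this page;
// the values are placeholders and are ignored by the check below.
const exampleSchema = {
    loadTime: "number",
    leaveTime: "number",
    URL: "string"
};

// True if the datapoint is a flat object whose keys exactly match the schema's keys.
function matchesSchema(schema, datapoint) {
    if (datapoint === null || typeof datapoint !== "object" || Array.isArray(datapoint)) {
        return false;
    }
    // Flat: no value may itself be an object or an array.
    const isFlat = Object.values(datapoint).every(
        (value) => value === null || typeof value !== "object"
    );
    const schemaKeys = Object.keys(schema).sort();
    const dataKeys = Object.keys(datapoint).sort();
    const sameKeys = schemaKeys.length === dataKeys.length &&
        schemaKeys.every((key, i) => key === dataKeys[i]);
    return isFlat && sameKeys;
}

console.log(matchesSchema(exampleSchema, {
    loadTime: 1518027785021,
    leaveTime: 1518027785024,
    URL: "https://en.wikipedia.org/wiki/Falcon_Heavy"
})); // true
```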

plugin.js

  • This file holds the code that gathers the data. It must contain the function initDataSource(metroClient), which is the entry point of your code.
  • The metroClient object is described below.

First Steps

After writing your schema and manifest files, you can start on your plugin. Begin with a minimal initDataSource containing a log statement, so you can verify that your plugin is running:

function initDataSource(metroClient) {
    console.log("Datasource running!");
}

At this point you have two options: push this initial code to your fork of the repository, or serve it from a simple local HTTP server. All that matters is that the code is reachable over HTTP.

Testing a fork

  • This is the easiest way. First commit your code and push it somewhere publicly accessible, in this case GitHub.
  • Then navigate to the raw version of your plugin.js file, e.g. https://raw.githubusercontent.com/RoryOfByrne/MetroDataSources/wikipedia-dwell-time/datasources/wikipedia-dwell-time/plugin.js.
  • Remove plugin.js from the end of this URL, leaving the directory URL with its trailing slash.
  • Paste this base URL into the input box in the Metro settings dev mode view and press Enter.
  • Your DataSource should now be running!
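The URL trimming in the third step can also be expressed in code. The hypothetical baseUrlFromPluginUrl helper below is only an illustration; it uses the standard URL class to drop the final plugin.js segment while keeping the trailing slash.

```javascript
// Hypothetical helper: strip "plugin.js" from the raw file URL,
// leaving the directory URL (with trailing slash) for Metro's dev mode input.
function baseUrlFromPluginUrl(pluginUrl) {
    const url = new URL(pluginUrl);
    if (!url.pathname.endsWith("/plugin.js")) {
        throw new Error("Expected a URL ending in /plugin.js: " + pluginUrl);
    }
    // Remove the last path segment but keep the "/" before it.
    url.pathname = url.pathname.slice(0, -"plugin.js".length);
    return url.toString();
}

console.log(baseUrlFromPluginUrl(
    "https://raw.githubusercontent.com/RoryOfByrne/MetroDataSources/wikipedia-dwell-time/datasources/wikipedia-dwell-time/plugin.js"
));
// https://raw.githubusercontent.com/RoryOfByrne/MetroDataSources/wikipedia-dwell-time/datasources/wikipedia-dwell-time/
```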

Testing from localhost

  • You shouldn't need any help if you want to do it this way! Make your DataSource directory accessible over HTTP. With Python 3, you can do this by running python -m http.server 8000 from inside your DataSource directory (the one containing plugin.js).
  • Now open the Metro settings dev mode view, type the URL of your server and press Enter. In this example, that's http://127.0.0.1:8000/.

Open the developer tools with Ctrl+Shift+I (Cmd+Option+I on macOS) and switch to the Console tab. When you visit a page allowed by the sites list in your manifest, you should see your DataSource load and print to the console.
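If nothing is printed, the page URL may not match any of your sites entries. This page only describes those entries as "regex-like", so the sketch below ASSUMES each one can be treated as a JavaScript regular expression that must match the whole URL; Metro's real matching rules may differ. The matchingSites helper and the pattern are hypothetical.

```javascript
// ASSUMPTION: each "sites" entry is treated as a JavaScript regular
// expression anchored to the whole URL. Metro's actual rules may differ.
function matchingSites(sites, pageUrl) {
    // (?: ) keeps the anchors correct even if a pattern uses "|".
    return sites.filter((pattern) => new RegExp("^(?:" + pattern + ")$").test(pageUrl));
}

const sites = ["https://en.wikipedia.org/wiki/.*"]; // hypothetical pattern
console.log(matchingSites(sites, "https://en.wikipedia.org/wiki/Falcon_Heavy").length); // 1
console.log(matchingSites(sites, "https://www.google.com/").length); // 0
```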

Gathering Some Data

For this simple example, we want to gather dwell time information on Wikipedia pages. Each datapoint will contain these fields:

  • Page Load (loadTime): the Unix time in milliseconds at which the page was loaded.
  • Page Leave (leaveTime): the Unix time in milliseconds at which the user left the page.
  • URL: the URL of the Wikipedia page the user was on.

First we get the page load time:

function initDataSource(metroClient) {
    let loadTime = (new Date).getTime();
    console.log("loadTime: "+loadTime);
}

Then, still inside initDataSource, we record the time when the user leaves the page:

window.addEventListener("beforeunload", function() {
    let leaveTime = (new Date).getTime();
    console.log("leaveTime: " + leaveTime);
});

Now we want to get the URL of the page:

let URL = window.location.href;
console.log(URL);

Now that we have all the individual components, we can build the datapoint that Metro will send for us. It is an object with the same keys as the one described in schema.json, and it must be built inside the beforeunload listener, because that is the only place where leaveTime is defined:

let datapoint = {
    "loadTime": loadTime,
    "leaveTime": leaveTime,
    "URL": URL
}

Then, still inside the listener, use the metroClient to send it off!

metroClient.sendDatapoint(datapoint);
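Assembled into a single file, the steps above might look like the following sketch. It builds and sends the datapoint inside the beforeunload listener, where leaveTime is in scope, uses Date.now() (equivalent to (new Date).getTime()), and renames the URL variable to pageUrl so it does not shadow the global URL constructor; the renaming is a choice made here, not a Metro requirement, and the datapoint key stays "URL".

```javascript
// plugin.js: the walkthrough's fragments combined into one entry point.
function initDataSource(metroClient) {
    // Unix time in ms when the DataSource starts running on the page.
    const loadTime = Date.now();
    // URL of the Wikipedia page the user is on.
    const pageUrl = window.location.href;

    window.addEventListener("beforeunload", function () {
        // Unix time in ms when the user leaves the page.
        const leaveTime = Date.now();
        // Keys match schema.json.
        const datapoint = {
            "loadTime": loadTime,
            "leaveTime": leaveTime,
            "URL": pageUrl
        };
        metroClient.sendDatapoint(datapoint);
    });
}
```

Be aware that browsers do not fire beforeunload reliably in every situation, particularly on mobile; MDN recommends the visibilitychange or pagehide events for detecting the end of a user's session.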

Because you are in dev mode, Metro should tell you that the datapoint is not being published, but it should print the datapoint that would be sent if the DataSource were approved. It should look similar to the following, with your datapoint JSON-escaped inside the data field:

{
    "projects":"test-user",
    "timestamp":1518027785026,
    "data":"{\"loadTime\":1518027785021,\"leaveTime\":1518027785024,\"URL\":\"https://en.wikipedia.org/wiki/Falcon_Heavy\"}"
}
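Because the data field is a JSON string nested inside the outer object, reading a datapoint back takes a second JSON.parse. A minimal sketch using the sample printout above (copied as a JavaScript object) to compute the dwell time; the dwellTimeMs name is hypothetical.

```javascript
// The dev-mode sample above, as a JavaScript object.
// Its "data" field is itself a JSON-encoded string.
const printed = {
    "projects": "test-user",
    "timestamp": 1518027785026,
    "data": "{\"loadTime\":1518027785021,\"leaveTime\":1518027785024,\"URL\":\"https://en.wikipedia.org/wiki/Falcon_Heavy\"}"
};

// Decode the inner payload and return the dwell time in milliseconds.
function dwellTimeMs(record) {
    const data = JSON.parse(record.data);
    return data.leaveTime - data.loadTime;
}

console.log(dwellTimeMs(printed)); // 3
```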

Congratulations! You have made your first DataSource! If you want to submit it to the Metro website, open a pull request from your fork on GitHub and we will review it.