IAR Stats Gatherer

Installation

Add the gatherstats application to your INSTALLED_APPS configuration as usual.
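
As a minimal sketch, the relevant fragment of a project's settings module might look like the following. The other entries in the list are placeholders for whatever applications the project already uses; only the "gatherstats" entry comes from this documentation.

```python
# settings.py (fragment). The first two entries are illustrative placeholders.
INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    # ... other project applications ...
    "gatherstats",  # IAR Stats Gatherer
]
```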

Application configuration

class gatherstats.apps.Config(app_name, app_module)

Configuration for IAR Stats Gatherer application.

name = 'gatherstats'

The short name for this application. This should be the Python module name, because Django uses it to import the application.

ready()

Perform application initialisation once the Django platform has been initialised.

verbose_name = 'gatherstats'

The human-readable verbose name for this application.

Management commands

The following new management commands are provided by the gatherstats application.

gatherstats

Gather statistics from an IAR endpoint and write records to the DB.

class gatherstats.management.commands.gatherstats.Command(stdout=None, stderr=None, no_color=False, force_color=False)

Implementation of gatherstats management command.

add_arguments(parser)

Entry point for subclassed commands to add custom arguments.

handle(*args, **options)

The actual logic of the command. Subclasses must implement this method.

Models

class gatherstats.models.Statistic(*args, **kwargs)

Statistics from the IAR stats endpoint look like the following:

{
    "asset_counts": {
        "all": {
            "total": 1234,
            "completed": 1234,
            "with_personal_data": 1234
        },
        "by_institution": {
            "INSTA": {
                "total": 123,
                "completed": 123,
                "with_personal_data": 123
            }
            // ... etc
        }
    }
}

To allow this schema to change in the future, we flatten the nested structure into a series of rows, each recording a JavaScript-style (dot-separated) key path and its value. For example, one row from the above would be created as:

from django.utils import timezone
from gatherstats.models import Statistic

Statistic(
    endpoint='http://iar-backend.invalid/stats',
    key='asset_counts.by_institution.INSTA.completed',
    numeric_value=123,
    fetched_at=timezone.now(),
)
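
The flattening step described above can be sketched in plain Python. This is an illustration under assumptions, not the application's actual implementation: the helper name flatten_stats is hypothetical, and the sample body simply mirrors the response shown earlier.

```python
def flatten_stats(body, prefix=""):
    """Yield (key_path, value) pairs for every leaf of a nested dict.

    Hypothetical helper: key paths are built by joining nested keys with dots.
    """
    for name, value in body.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key path.
            yield from flatten_stats(value, path)
        else:
            yield path, value


# Sample body mirroring the response format documented above.
body = {
    "asset_counts": {
        "all": {"total": 1234, "completed": 1234, "with_personal_data": 1234},
        "by_institution": {
            "INSTA": {"total": 123, "completed": 123, "with_personal_data": 123},
        },
    }
}

rows = dict(flatten_stats(body))
print(rows["asset_counts.by_institution.INSTA.completed"])  # 123
```

Each (key path, value) pair corresponds to one row, so a change to the shape of the response only changes which key paths appear, not the table layout.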

Because key paths are plain strings, pattern-matching SQL can be used to query this table. For example, the following PostgreSQL-style query lists all institutions with available statistics:

SELECT DISTINCT
    SUBSTRING(key FROM '^asset_counts\.by_institution\.([^\.]+)\.[^\.]+$') AS institution
FROM
    gatherstats_statistic
WHERE key ~ '^asset_counts\.by_institution\.([^\.]+)\.[^\.]+$'
ORDER BY
    institution;

exception DoesNotExist

exception MultipleObjectsReturned

endpoint

URL for the endpoint this statistic was fetched from.

fetched_at

Time at which this statistic was fetched. By design, this schema “flattens” statistics into rows, so multiple rows may represent a single query to the stats endpoint. For this reason the field does not default to now() and must instead be set explicitly, so that all rows from one query can share the same timestamp.

key

Key path for statistic. E.g. “asset_counts.by_institution.UIS.total”.
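
Key paths can also be taken apart in Python. The sketch below assumes per-institution keys always have the form asset_counts.by_institution.<institution>.<statistic>, as in the example above; the function name split_institution_key is hypothetical.

```python
import re

# Matches per-institution key paths and captures the institution and statistic name.
INSTITUTION_KEY = re.compile(r"^asset_counts\.by_institution\.([^.]+)\.([^.]+)$")


def split_institution_key(key):
    """Return (institution, statistic) for a per-institution key, else None."""
    match = INSTITUTION_KEY.match(key)
    return match.groups() if match else None


print(split_institution_key("asset_counts.by_institution.UIS.total"))  # ('UIS', 'total')
print(split_institution_key("asset_counts.all.total"))  # None
```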

numeric_value

This field is called “numeric_value” to allow for other types to be stored in the future. When that future comes, this field will need to have null and blank set to True, and custom validation will need to be added to check that some value is set. In these simpler times, we can just rely on this being a non-NULL, non-blank field.

class gatherstats.models.StatisticManager

Custom object manager for Statistic. Accessed via Statistic.objects.

create_from_stats_response(endpoint, body, fetched_at=None)

Create Statistic instances from a dictionary representation of a stats endpoint response body.

The object creation is done within a database atomic transaction so other users of the database never see a partially processed object.

If fetched_at is None, timezone.now() is used.
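
A hedged usage sketch follows. It assumes the raw response is available as a JSON string; the wrapper name record_stats is hypothetical, and only the create_from_stats_response signature is taken from the documentation above. Its return value is not documented here, so the sketch does not rely on it.

```python
import json


def record_stats(endpoint, raw_body, fetched_at=None):
    """Parse a raw JSON stats response and store it as Statistic rows.

    Hypothetical wrapper around the documented manager method.
    """
    # Imported lazily so this helper can be imported before Django's
    # application registry is ready.
    from gatherstats.models import Statistic

    # The manager expects a dictionary, not the raw JSON text.
    body = json.loads(raw_body)

    # Leaving fetched_at as None lets the manager fall back to timezone.now().
    Statistic.objects.create_from_stats_response(endpoint, body, fetched_at=fetched_at)
```

Within a configured Django project this could be called as record_stats('http://iar-backend.invalid/stats', response_text), where response_text holds the body returned by the stats endpoint.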