With Django Datawatch you are able to implement arbitrary checks on data, review their status and even describe what to do to resolve them. Think of nagios/icinga for data.
Checks are executed by a configurable backend: synchronously by default, or asynchronously through Celery. More backends may be supported in the future.
$ pip install django-datawatch
Add django_datawatch to your INSTALLED_APPS.
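As a rough sketch, the relevant excerpt of a settings.py could look like this. The django.contrib entries are only placeholders for whatever your project already lists; the one line that matters here is 'django_datawatch'.

```python
# settings.py (excerpt); the contrib apps stand in for your existing list
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    # Registers Django Datawatch with the project
    "django_datawatch",
]
```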
Create checks.py inside your module.
from datetime import datetime

from dateutil.relativedelta import relativedelta

from django_datawatch.datawatch import datawatch
from django_datawatch.base import BaseCheck, CheckResponse
from django_datawatch.models import Result


@datawatch.register
class CheckTime(BaseCheck):
    # minutes=5 is a relative offset; minute=5 would pin the minute field to :05
    run_every = relativedelta(minutes=5)  # scheduler will execute this check every 5 minutes

    def generate(self):
        # datetime is the class imported above, so now() is called on it directly
        yield datetime.now()

    def check(self, payload):
        response = CheckResponse()
        if payload.hour <= 7:
            response.set_status(Result.STATUS.ok)
        elif payload.hour <= 12:
            response.set_status(Result.STATUS.warning)
        else:
            response.set_status(Result.STATUS.critical)
        return response

    def get_identifier(self, payload):
        # payload is the datetime object yielded by the generate method
        return payload

    def get_payload(self, identifier):
        # get_identifier returns the object itself, so no lookup is needed
        # and the identifier can be returned directly
        return identifier
generate must yield the payloads to be checked. The check method is then called for every payload.
check must return an instance of CheckResponse.
get_identifier must return a unique identifier for the payload.
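To make the contract between these methods more concrete, here is a minimal sketch that runs without Django. ToyTimeCheck, run_check and the plain string statuses are hypothetical stand-ins invented for this illustration; they are not part of django_datawatch and say nothing about how the library drives checks internally.

```python
from datetime import datetime

# Hypothetical stand-ins for the library's status values
OK, WARNING, CRITICAL = "ok", "warning", "critical"


class ToyTimeCheck:
    """Mirrors the method names of a check without importing Django."""

    def __init__(self, payloads):
        self._payloads = list(payloads)

    def generate(self):
        # Yield every payload that should be checked
        yield from self._payloads

    def check(self, payload):
        # Same hour thresholds as the CheckTime example
        if payload.hour <= 7:
            return OK
        if payload.hour <= 12:
            return WARNING
        return CRITICAL

    def get_identifier(self, payload):
        # Must be unique per payload; ISO format is chosen for illustration
        return payload.isoformat()


def run_check(check):
    """Call check() once per generated payload, keyed by its identifier."""
    return {check.get_identifier(p): check.check(p) for p in check.generate()}


results = run_check(ToyTimeCheck([
    datetime(2024, 1, 1, 6, 0),
    datetime(2024, 1, 1, 10, 30),
    datetime(2024, 1, 1, 18, 15),
]))
```

Each payload ends up with exactly one status under its identifier, which is why the identifier has to be unique.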
A management command is provided to queue the execution of all checks based on their schedule. Add a crontab entry that runs this command every minute; each run determines whether any check is due.
$ ./manage.py datawatch_run_checks
$ ./manage.py datawatch_run_checks --slug=example.checks.UserHasEnoughBalance
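The idea of a command that runs every minute and only queues what is due can be sketched as follows. This is an illustration under assumptions, not the library's scheduling code: is_due is a hypothetical helper, and timedelta is used in place of relativedelta so the sketch depends only on the standard library.

```python
from datetime import datetime, timedelta


def is_due(last_run, run_every, now):
    """Return True if a check never ran or its interval has elapsed."""
    if last_run is None:
        return True
    return now - last_run >= run_every


now = datetime(2024, 1, 1, 12, 0)
five_minutes = timedelta(minutes=5)

# A check without a previous run is always due
never_ran = is_due(None, five_minutes, now)
# Last run 2 minutes ago: not due yet
ran_recently = is_due(datetime(2024, 1, 1, 11, 58), five_minutes, now)
# Last run 10 minutes ago: due again
ran_long_ago = is_due(datetime(2024, 1, 1, 11, 50), five_minutes, now)
```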
A management command is provided to forcefully refresh all existing results for a check. This comes in handy if you change the logic of your check and don't want to wait until the periodic execution or an update trigger.
$ ./manage.py datawatch_refresh_results
$ ./manage.py datawatch_refresh_results --slug=example.checks.UserHasEnoughBalance
$ ./manage.py datawatch_list_checks
Remove the unnecessary check results if you've removed the code for a check.
$ ./manage.py datawatch_delete_ghost_results
DJANGO_DATAWATCH_BACKEND = 'django_datawatch.backends.synchronous'
DJANGO_DATAWATCH_CELERY_QUEUE_NAME = 'django_datawatch'
DJANGO_DATAWATCH_RUN_SIGNALS = True
DJANGO_DATAWATCH_BACKEND: Selects the backend that runs the tasks. Supported values are 'django_datawatch.backends.synchronous' and 'django_datawatch.backends.celery'.
Default: 'django_datawatch.backends.synchronous'
DJANGO_DATAWATCH_CELERY_QUEUE_NAME: Sets the celery queue name for asynchronous tasks (applies only if the celery backend is chosen).
Default: 'django_datawatch'
DJANGO_DATAWATCH_RUN_SIGNALS: Set this to False to disable running post_save updates, for example during unit tests.
Default: True
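As an example, a project that switches to the celery backend and turns off post_save updates while tests run might use settings along these lines. The setting names and values come from the list above; detecting a test run by looking for "test" in sys.argv is only an assumption made for this sketch, and your project may prefer a dedicated test settings module.

```python
# settings.py (excerpt)
import sys

# Run checks asynchronously instead of using the synchronous default
DJANGO_DATAWATCH_BACKEND = "django_datawatch.backends.celery"

# Only relevant for the celery backend
DJANGO_DATAWATCH_CELERY_QUEUE_NAME = "django_datawatch"

# Assumption: "manage.py test" puts "test" into sys.argv
DJANGO_DATAWATCH_RUN_SIGNALS = "test" not in sys.argv
```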
We've included an example app to show how django_datawatch works. Start by launching the included vagrant machine.
vagrant up
vagrant ssh
Then set up the example app environment.
./manage.py migrate
./manage.py loaddata example
The installed superuser is "example" with password "datawatch".
Run the development webserver.
./manage.py runserver 0.0.0.0:8000
Log in to the admin interface and open http://ddw.dev:8000/ afterwards. You'll be greeted by an empty dashboard. That's because we haven't run any checks yet. Let's enqueue an update.
./manage.py datawatch_run_checks --force
The checks for the example app are run synchronously and should be updated immediately. If you decide to switch to the celery backend, you should now start a celery worker to process the checks.
celery worker -A example -l DEBUG -Q django_datawatch
After refreshing the dashboard view, you will now see some failed checks.