Django Elastic Migrations

django-elastic-migrations is a Django app for creating, indexing and changing schemas of Elasticsearch indexes.

Overview

Elastic has given us basic python tools for working with its search indexes:

elasticsearch-py, a python interface to elasticsearch's REST API
elasticsearch-dsl-py, a Django-esque way of declaring Elasticsearch schemas, built upon elasticsearch-py

Django Elastic Migrations adapts these tools into a Django app which also:

Provides Django management commands for listing indexes, as well as performing create, update, activate and drop actions on them
Implements concurrent bulk indexing powered by python multiprocessing
Gives Django test hooks for Elasticsearch
Records a history of all actions that change Elasticsearch indexes
Supports AWS Elasticsearch 6.0, 6.1 (6.2 TBD; see #3 support elasticsearch-dsl 6.2)
Enables having two or more servers share the same Elasticsearch cluster

Models

Django Elastic Migrations comes with three Django models: Index, IndexVersion, and IndexAction:

Index - a logical reference to an Elasticsearch index. Each Index points to multiple IndexVersions, each of which contains a snapshot of that Index schema at a particular time. Each Index has an active IndexVersion to which all actions are directed.
IndexVersion - a snapshot of an Elasticsearch Index schema at a particular point in time. The Elasticsearch index name is the name of the Index plus the primary key id of the IndexVersion model, e.g. movies-1. When the schema is changed, a new IndexVersion is added with name movies-2, etc.
IndexAction - a record of a change that impacts an Index, such as updating the index or changing which IndexVersion is active in an Index.
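The naming convention described above can be sketched with two small helpers. This is an illustration only, not the package's code: `version_name` and `split_version_name` are hypothetical names, and the sketch ignores any environment prefix.

```python
def version_name(base_name, version_id):
    """Build an index version name such as 'movies-1'."""
    return "{}-{}".format(base_name, version_id)


def split_version_name(name):
    """Split 'movies-2' into ('movies', 2).

    A bare base name such as 'movies' yields ('movies', None), which a
    caller would then resolve to the active IndexVersion.
    """
    base, sep, suffix = name.rpartition("-")
    if sep and suffix.isdigit():
        return base, int(suffix)
    return name, None


print(version_name("movies", 1))
print(split_version_name("movies-2"))
print(split_version_name("movies"))
```

Splitting on the last hyphen keeps base names that themselves contain hyphens intact.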

Management Commands

Use ./manage.py es --help to see the list of all of these commands.

Read Only Commands

./manage.py es_list
- help: For each Index, list activation status and doc count for each of its IndexVersions
- usage: ./manage.py es_list

Action Commands

These management commands add an Action record in the database, so that the history of each Index is recorded.

./manage.py es_create - create a new index.
./manage.py es_activate - activate a new IndexVersion. All updates and reads for that Index will then go to that version.
./manage.py es_update - update the documents in the index.
./manage.py es_clear - remove the documents from an index.
./manage.py es_drop - drop an index.
./manage.py es_dangerous_reset - erase elasticsearch and reset the Django Elastic Migrations models.

For each of these, use --help to see the details.

Usage

Installation

pip install django-elastic-migrations; see django-elastic-migrations on PyPI
Put a reference to this package in your requirements.txt
Ensure that a valid elasticsearch-dsl-py version is accessible, and set the path to your singleton Elasticsearch client in your Django settings: DJANGO_ELASTIC_MIGRATIONS_ES_CLIENT = "tests.es_config.ES_CLIENT". Only one ES_CLIENT should be instantiated in your application.
Add django_elastic_migrations to INSTALLED_APPS in your Django settings file

Add the following information to your Django settings file:

import subprocess

DJANGO_ELASTIC_MIGRATIONS_ES_CLIENT = "path.to.your.singleton.ES_CLIENT"
# optional, any unique identifier for your releases to associate with indexes
DJANGO_ELASTIC_MIGRATIONS_GET_CODEBASE_ID = subprocess.check_output(['git', 'describe', '--tags']).strip()
# optional, can be used to have multiple servers share the same
# elasticsearch instance without conflicting
DJANGO_ELASTIC_MIGRATIONS_ENVIRONMENT_PREFIX = "qa1_"
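
Calling git directly in a settings file raises an exception when the code runs outside a git checkout or when git is not installed. A more defensive variant might look like the sketch below. `get_codebase_id` and its "unknown" fallback are hypothetical and not part of the package, and it is an assumption that the setting accepts any string.

```python
import subprocess


def get_codebase_id(default="unknown"):
    """Return the output of `git describe --tags` as text, or `default` on failure."""
    try:
        output = subprocess.check_output(
            ["git", "describe", "--tags"],
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not a git checkout, no tags yet, or git is not installed.
        return default
    # check_output returns bytes on Python 3; decode so the setting is a str.
    return output.decode("utf-8").strip() or default


DJANGO_ELASTIC_MIGRATIONS_GET_CODEBASE_ID = get_codebase_id()
```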

Create the django_elastic_migrations tables by running ./manage.py migrate

Create a DEMIndex:

from django_elastic_migrations.indexes import DEMIndex, DEMDocType
from .models import Movie
from elasticsearch_dsl import Text

MoviesIndex = DEMIndex('movies')


@MoviesIndex.doc_type
class MovieSearchDoc(DEMDocType):
    # a plain Text field; substitute your own analyzer-backed field if needed
    text = Text()

    @classmethod
    def get_queryset(cls, last_updated_datetime=None):
        """
        return a queryset or a sliceable list of items to pass to
        get_reindex_iterator
        """
        qs = Movie.objects.all()
        if last_updated_datetime:
            # filter() returns a new queryset, so the result must be reassigned
            qs = qs.filter(last_modified__gt=last_updated_datetime)
        return qs

    @classmethod
    def get_reindex_iterator(cls, queryset):
        # one document dict (including metadata) per item in the queryset
        return [
            MovieSearchDoc(
                text="a little sample text").to_dict(
                include_meta=True) for _ in queryset]

Add your new index to DJANGO_ELASTIC_MIGRATIONS_INDEXES in settings/common.py

Run ./manage.py es_list to see the index as available:

./manage.py es_list

Available Index Definitions:
+----------------------+-------------------------------------+---------+--------+-------+-----------+
|   Index Base Name    |         Index Version Name          | Created | Active | Docs  |    Tag    |
+======================+=====================================+=========+========+=======+===========+
| movies               |                                     | 0       | 0      | 0     | Current   |
|                      |                                     |         |        |       | (not      |
|                      |                                     |         |        |       | created)  |
+----------------------+-------------------------------------+---------+--------+-------+-----------+
Reminder: an index version name looks like 'my_index-4', and its base index name
looks like 'my_index'. Most Django Elastic Migrations management commands
take the base name (in which case the activated version is used)
or the specific index version name.

Create the movies index in elasticsearch with ./manage.py es_create movies:

$> ./manage.py es_create movies
The doc type for index 'movies' changed; created a new index version
'movies-1' in elasticsearch.
$> ./manage.py es_list

Available Index Definitions:
+----------------------+-------------------------------------+---------+--------+-------+-----------+
|   Index Base Name    |         Index Version Name          | Created | Active | Docs  |    Tag    |
+======================+=====================================+=========+========+=======+===========+
| movies               | movies-1                            | 1       | 0      | 0     | 07.11.005 |
|                      |                                     |         |        |       | -93-gd101 |
|                      |                                     |         |        |       | a1f       |
+----------------------+-------------------------------------+---------+--------+-------+-----------+

Reminder: an index version name looks like 'my_index-4', and its base index name
looks like 'my_index'. Most Django Elastic Migrations management commands
take the base name (in which case the activated version is used)
or the specific index version name.

Activate the movies-1 index version, so all updates and reads go to it.

./manage.py es_activate movies
For index 'movies', activating 'movies-1' because you said so.

Assuming you have implemented get_reindex_iterator, you can call ./manage.py es_update to update the index.

$> ./manage.py es_update movies

Handling update of index 'movies' using its active index version 'movies-1'
Checking the last time update was called:
 - index version: movies-1
 - update date: never
Getting Reindex Iterator...
Completed with indexing movies-1

$> ./manage.py es_list

Available Index Definitions:
+----------------------+-------------------------------------+---------+--------+-------+-----------+
|   Index Base Name    |         Index Version Name          | Created | Active | Docs  |    Tag    |
+======================+=====================================+=========+========+=======+===========+
| movies               | movies-1                            | 1       | 1      | 3     | 07.11.005 |
|                      |                                     |         |        |       | -93-gd101 |
|                      |                                     |         |        |       | a1f       |
+----------------------+-------------------------------------+---------+--------+-------+-----------+

Deployment

Creating and updating a new index schema can happen before you deploy. For example, if your app servers are running with the movies-1 index activated, and you have a new version of the schema you'd like to pre-index, then log into another server and run ./manage.py es_create movies followed by ./manage.py es_update movies --newer. This will update documents in all movies indexes that are newer than the active one.
After deploying, you can run ./manage.py es_activate movies to activate the latest version. Be sure to cycle your gunicorn workers to ensure the change is caught by your app servers.
During deployment, if get_reindex_iterator is implemented in such a way as to respond to the datetime of the last reindex date, then you can call ./manage.py es_update movies --resume, and it will index only those documents that have changed since the last reindexing. This way you can do most of the indexing ahead of time, and only reindex a portion at the time of the deployment.
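
The incremental behaviour described above depends on selecting only documents modified after the previous update. A minimal sketch of that selection in plain Python follows, so that it runs without Django or Elasticsearch. `MovieRecord` and `select_for_reindex` are hypothetical stand-ins for a model and a `get_queryset` implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class MovieRecord:
    """Stand-in for a database row with a modification timestamp."""
    title: str
    last_modified: datetime


def select_for_reindex(records: List[MovieRecord],
                       last_updated_datetime: Optional[datetime] = None) -> List[MovieRecord]:
    """Return all records on a full update, or only those changed since the last update."""
    if last_updated_datetime is None:
        return list(records)
    return [r for r in records if r.last_modified > last_updated_datetime]


records = [
    MovieRecord("Inception", datetime(2018, 7, 1)),
    MovieRecord("Closer", datetime(2018, 7, 20)),
]

# First run: nothing has been indexed yet, so every record is selected.
full = select_for_reindex(records)

# Resumed run: only records changed after the previous update are selected.
changed = select_for_reindex(records, datetime(2018, 7, 10))
print([r.title for r in changed])  # prints ['Closer']
```

With a real queryset the same idea is a `filter(last_modified__gt=...)` call, provided the model tracks a modification timestamp.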

Django Testing

Override TestCase to provide test isolation when search indexes are involved:

from django.test import TestCase

from django_elastic_migrations.utils.test_utils import DEMTestCaseMixin

class MyTestCase(DEMTestCaseMixin, TestCase):
    """
    Set up and tear down temporary elasticsearch test indexes for each test
    """

Excluding from Django's `dumpdata` command

When calling Django's dumpdata command, you will likely want to exclude the database tables used by this app:

from django.core.management import call_command
params = {
    'database': 'default',
    'exclude': [
        # we don't want to include django_elastic_migrations in dumpdata,
        # because it's environment specific
        'django_elastic_migrations.index',
        'django_elastic_migrations.indexversion',
        'django_elastic_migrations.indexaction'
    ],
    'indent': 3,
    'output': 'path/to/my/file.json'
}
call_command('dumpdata', **params)

An example of this is included with the moviegen management command.

Tuning Bulk Indexing Parameters

By default, ./manage.py es_update divides the result of DEMDocType.get_queryset() into batches of size DocType.BATCH_SIZE. Override this number to change the batch size.
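
To illustrate what dividing a sliceable result into batches means, here is a minimal sketch in plain Python. `iter_batches` is a hypothetical helper, not the package's implementation, which may divide work differently.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        # Slicing works for lists and, in Django, for querysets as well.
        yield items[start:start + batch_size]


batches = list(iter_batches(list(range(10)), batch_size=4))
print(batches)  # prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Smaller batches reduce the memory each worker needs, at the cost of more round trips to Elasticsearch.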

There are many configurable parameters to Elasticsearch's bulk updater. To provide a custom value, override DEMDocType.get_bulk_indexing_kwargs() and return the kwargs you would like to customize.
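
The override pattern might look like the sketch below. `DocTypeStandIn` is a hypothetical stand-in for DEMDocType so that the example runs without Django, and it is an assumption that `get_bulk_indexing_kwargs` is a classmethod returning a dict. `chunk_size` is a keyword accepted by the elasticsearch-py bulk helpers, but whether it is forwarded unchanged is also an assumption.

```python
class DocTypeStandIn:
    """Hypothetical stand-in for DEMDocType; not the package's class."""

    BATCH_SIZE = 100  # illustrative value, not the package's default

    @classmethod
    def get_bulk_indexing_kwargs(cls):
        # No custom bulk options by default in this sketch.
        return {}


class MovieSearchDoc(DocTypeStandIn):
    # Larger batches per worker than the stand-in's default.
    BATCH_SIZE = 500

    @classmethod
    def get_bulk_indexing_kwargs(cls):
        # Start from the parent's kwargs and customize only what is needed.
        kwargs = super().get_bulk_indexing_kwargs()
        kwargs["chunk_size"] = 200
        return kwargs


print(MovieSearchDoc.BATCH_SIZE, MovieSearchDoc.get_bulk_indexing_kwargs())
```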

Development

This project uses make to manage the build process. Type make help to see the available make targets.

Elasticsearch Docker Compose

This will enable you to serve elasticsearch via docker:

docker-compose up

See docs/docker_setup for more info

Requirements

This project uses pip-tools. The requirements.txt files are generated and pinned to latest versions with make upgrade:

run make requirements to run the pip install.
run make upgrade to upgrade the dependencies of the requirements to the latest versions. This process also excludes django and elasticsearch-dsl from the requirements/test.txt so they can be injected with different versions by tox during matrix testing.

Populating Local `tests_movies` Database Table With Data

It may be helpful for you to populate a local database with Movies test data to experiment with using django-elastic-migrations. First, migrate the database:

./manage.py migrate --run-syncdb --settings=test_settings

Next, load the basic fixtures:

./manage.py loaddata tests/100films.json

You may wish to add more movies to the database. A management command has been created for this purpose. Get a free OMDB API key, then run a query like this (replace MYAPIKEY with yours):

$> ./manage.py moviegen --title="Inception" --api-key="MYAPIKEY"
{'actors': 'Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen Page, Tom Hardy',
 'awards': 'Won 4 Oscars. Another 152 wins & 204 nominations.',
 'boxoffice': '$292,568,851',
 'country': 'USA, UK',
 'director': 'Christopher Nolan',
 'dvd': '07 Dec 2010',
 'genre': 'Action, Adventure, Sci-Fi',
 'imdbid': 'tt1375666',
 'imdbrating': '8.8',
 'imdbvotes': '1,721,888',
 'language': 'English, Japanese, French',
 'metascore': '74',
 'plot': 'A thief, who steals corporate secrets through the use of '
         'dream-sharing technology, is given the inverse task of planting an '
         'idea into the mind of a CEO.',
 'poster': 'https://m.media-amazon.com/images/M/MV5BMjAxMzY3NjcxNF5BMl5BanBnXkFtZTcwNTI5OTM0Mw@@._V1_SX300.jpg',
 'production': 'Warner Bros. Pictures',
 'rated': 'PG-13',
 'ratings': [{'Source': 'Internet Movie Database', 'Value': '8.8/10'},
             {'Source': 'Rotten Tomatoes', 'Value': '86%'},
             {'Source': 'Metacritic', 'Value': '74/100'}],
 'released': '16 Jul 2010',
 'response': 'True',
 'runtime': 148,
 'title': 'Inception',
 'type': 'movie',
 'website': 'http://inceptionmovie.warnerbros.com/',
 'writer': 'Christopher Nolan',
 'year': '2010'}

To save the movie to the database, use the --save flag. Also useful is the --noprint option, to suppress json. Also, if you add OMDB_API_KEY=MYAPIKEY to your environment variables, you don't have to specify it each time:

$ ./manage.py moviegen --title "Closer" --noprint --save
Saved 1 new movie(s) to the database: Closer

Now that it's been saved to the database, you may want to create a fixture, so you can get back to this state in the future.

$ ./manage.py moviegen --makefixture=tests/myfixture.json
dumping fixture data to tests/myfixture.json ...
[...........................................................................]

Later, you can restore this database with the regular loaddata command:

$ ./manage.py loaddata tests/myfixture.json
Installed 101 object(s) from 1 fixture(s)

There are already 100 films available using loaddata as follows:

$ ./manage.py loaddata tests/100films.json

Running Tests Locally

See README_TESTS.md for more information. High level summary:

Run make test. To run all tests and quality checks locally, run make test-all.

To run only the linters, run make quality. Please note that if any of the linters returns a nonzero exit code, tox reports an InvocationError at the end. See tox's documentation for InvocationError for more information.

We use edx_lint to compile pylintrc. To update the rules, change pylintrc_tweaks and run make pylintrc.

Cutting a New Version

optional: run make update to update dependencies
bump version in django_elastic_migrations/__init__.py.
update CHANGELOG.rst.
make clean
python3 setup.py sdist bdist_wheel
twine check dist/django-elastic-migrations-*.tar.gz to see if there are any syntax mistakes before tagging
submit PR bumping the version
ensure test matrix is passing on travis and merge PR
pull changes to master
make clean
python3 setup.py sdist bdist_wheel
twine check dist/django-elastic-migrations-*.tar.gz to see if there are any syntax mistakes before tagging
twine upload -r testpypi dist/django-elastic-migrations-*.tar.gz
Check it at https://test.pypi.org/project/django-elastic-migrations/
python3 setup.py tag to tag the new version
twine upload -r pypi dist/django-elastic-migrations-*.tar.gz
Update new release at https://github.com/HBS-HBX/django-elastic-migrations/releases
