FX rate forecasting in an ultra-high-frequency setting using deep learning techniques. The main focus of the research is to predict the price increments over the next few seconds for a set of FX pairs.
The project is explained in more detail in the documentation.
Execute the following command to start the container:
docker run -it --rm jpxkqx/dl-fx-forecasting:firsttry
If the data has already been processed on the host machine, the following command may be more appropriate.
docker run -it -v "/path/to/data:/app/data" --rm jpxkqx/dl-fx-forecasting:firsttry
The path /path/to/data refers to the host directory containing the data, laid out as shown in the project organization below. If all the processed data is available, every script can be executed.
Read, load, preprocess and save the data for the specified currency pair. To run this pipeline, the ZIP files must be on the host machine, and the path to the folder containing them must be set in an environment variable called PATH_RAW_DATA. The following command processes the data available on the host machine for the currency pair EUR/USD.
generate_datasets eur usd
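Before running the pipeline, it can help to check that PATH_RAW_DATA points to a folder that actually contains ZIP files. The snippet below is a minimal sketch of such a check. The helper name `find_raw_zip_files` is hypothetical and is not part of the project's code.

```python
import os
from pathlib import Path
from typing import List


def find_raw_zip_files(env_var: str = "PATH_RAW_DATA") -> List[Path]:
    """Return the ZIP files found in the folder named by an environment variable.

    Hypothetical helper for illustration; the project may validate its
    inputs differently.
    """
    raw_dir = os.environ.get(env_var)
    if raw_dir is None:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    folder = Path(raw_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} is not a directory")
    # Sorted so that the listing order is deterministic.
    return sorted(folder.glob("*.zip"))
```

Calling `find_raw_zip_files()` after setting PATH_RAW_DATA returns an empty list when the folder holds no ZIP files, which is a quick hint that the raw data is missing.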
In this case, the historical data was obtained from TrueFX. The first quotes are shown below.
FX pair | Timestamp | Low | High |
---|---|---|---|
EUR/USD | 20200401 00:00:00.094 | 1.10256 | 1.10269 |
EUR/USD | 20200401 00:00:00.105 | 1.10257 | 1.1027 |
EUR/USD | 20200401 00:00:00.193 | 1.10258 | 1.1027 |
EUR/USD | 20200401 00:00:00.272 | 1.10256 | 1.1027 |
EUR/USD | 20200401 00:00:00.406 | 1.10258 | 1.1027 |
EUR/USD | 20200401 00:00:00.415 | 1.10256 | 1.1027 |
EUR/USD | 20200401 00:00:00.473 | 1.10257 | 1.1027 |
EUR/USD | 20200401 00:00:00.557 | 1.10255 | 1.10268 |
This data is processed by the generate_datasets command shown above, which computes the mid price and the spread and filters out some erroneous data points. The processed data is stored in Apache Parquet format to achieve faster reading times.
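The mid price and spread computation can be sketched in a few lines. The example below is an illustration under stated assumptions, not the project's implementation: the `Tick` class and `process_quotes` function are hypothetical names, and the filtering rule (dropping non-positive prices and quotes whose high is below their low) is an assumed stand-in for whatever the project treats as erroneous.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Tick:
    timestamp: str
    mid: float
    spread: float


def process_quotes(quotes: Iterable[Tuple[str, float, float]]) -> List[Tick]:
    """Compute mid price and spread from (timestamp, low, high) quotes.

    The validity check is an assumed, illustrative rule.
    """
    ticks = []
    for timestamp, low, high in quotes:
        if low <= 0 or high <= 0 or high < low:
            continue  # skip quotes that look erroneous
        ticks.append(Tick(timestamp, (low + high) / 2, high - low))
    return ticks


quotes = [
    ("20200401 00:00:00.094", 1.10256, 1.10269),
    ("20200401 00:00:00.105", 1.10257, 1.1027),
    ("20200401 00:00:00.193", 1.10270, 1.10258),  # hypothetical bad quote: high < low
]
ticks = process_quotes(quotes)
```

With pandas, a table of such ticks could then be written with `DataFrame.to_parquet`. Whether the project does exactly that is not stated here.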
Then, plot the currency pair EUR/USD for the period from 25 May, 2020 to 30 May, 2020.
plot_currency_pair eur usd mid H T S --period 2020-05-25 2020-05-31
This produces the following image.
The cumulative distribution function can also be plotted using the following command
plot_cdf eur usd increment --period 2020-04-01 2020-06-01
which gives the image shown below,
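For reference, the empirical CDF behind a plot like this can be sketched as follows. This is an illustrative, standard-library version and says nothing about how plot_cdf is implemented. The function name `empirical_cdf` and the sample prices are assumptions.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def empirical_cdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Return F, where F(x) is the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def cdf(x: float) -> float:
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical mid prices; increments are differences of consecutive values.
mids = [1.1026, 1.1027, 1.1027, 1.1025, 1.1028]
increments = [b - a for a, b in zip(mids, mids[1:])]
cdf = empirical_cdf(increments)
```

Evaluating `cdf` on a grid of values gives the step function that an empirical CDF plot draws.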
To plot the distribution of the main daily statistics of the spread, the following command can be used.
plot_stats eur usd spread D --period 2020-04-01 2020-06-01
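A daily aggregation of this kind can be sketched as below. The choice of statistics (mean, median, min, max) and the function name `daily_spread_stats` are assumptions made for illustration, since the statistics plotted by plot_stats are not listed here. The timestamp format follows the sample quotes shown earlier.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median
from typing import Dict, Iterable, Tuple


def daily_spread_stats(ticks: Iterable[Tuple[str, float]]) -> Dict:
    """Group (timestamp, spread) observations by calendar day and summarise them."""
    by_day = defaultdict(list)
    for timestamp, spread in ticks:
        day = datetime.strptime(timestamp, "%Y%m%d %H:%M:%S.%f").date()
        by_day[day].append(spread)
    return {
        day: {
            "count": len(values),
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for day, values in sorted(by_day.items())
    }


# Hypothetical spread observations spanning two days.
spreads = [
    ("20200401 00:00:00.094", 0.00013),
    ("20200401 12:30:00.000", 0.00011),
    ("20200402 00:00:01.500", 0.00020),
]
stats = daily_spread_stats(spreads)
```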
In addition, the correlation between the different currency pairs, aggregated over any timeframe, can be plotted for any given period of time.
plot_pair_correlations increment --period 2020-04-01 2020-06-01 --agg_frame H
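The quantity being plotted can be illustrated with a pairwise Pearson correlation over already aggregated increments. The sketch below assumes Pearson correlation, which may differ from what plot_pair_correlations uses. The function names and the hourly figures are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(series: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Correlation for every ordered pair of named series."""
    names = sorted(series)
    return {(a, b): pearson(series[a], series[b]) for a in names for b in names}


# Hypothetical hourly increments, one value per hour and pair.
hourly_increments = {
    "EUR/USD": [0.0003, -0.0001, 0.0002, -0.0004],
    "GBP/USD": [0.0005, -0.0002, 0.0001, -0.0006],
    "USD/JPY": [-0.03, 0.01, -0.02, 0.04],
}
corr = correlation_matrix(hourly_increments)
```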
Lastly, the autocorrelation function of a currency pair can be plotted as follows,
plot_pair_acf increment eur usd --agg_frame 'H' --period 2020-04-01 2020-06-01
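As a reminder of what an autocorrelation function measures, the sketch below computes the sample autocorrelation of a series for lags 0 to `max_lag`. It uses the usual biased estimator (normalised by the lag-0 sum of squares), which is an assumption: the estimator used by plot_pair_acf is not specified here, and the function name is hypothetical.

```python
from typing import List, Sequence


def autocorrelation(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mu = sum(series) / n
    centered = [x - mu for x in series]
    total = sum(c * c for c in centered)
    if total == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]


# Hypothetical alternating increments: strongly negative lag-1 autocorrelation.
acf = autocorrelation([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], max_lag=2)
```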
├── LICENSE
├── Makefile              <- Makefile with commands like `make data` or `make train`
├── README.md             <- The top-level README for developers using this project.
├── data
│   ├── external          <- Data from third party sources.
│   ├── interim           <- Intermediate data that has been transformed.
│   ├── processed         <- The final, canonical data sets for modeling.
│   └── raw               <- The original, immutable data dump.
│
├── docs                  <- A default MkDocs project.
│   └── index.md
│
├── models                <- Trained and serialized models, model predictions, or model summaries
│   ├── configurations    <- YAML files with model configurations
│   ├── features          <- Contains model selection results, test results and fitted models, under the path
│   │                        models/features/{ model }/{ fx_pair }/{ aux_pair}/{ variables concat with _}
│   │                        In particular, the models used EWMAs of a fixed number of past observations.
│   └── raw               <- Contains model selection results, test results and fitted models, under the path
│                            models/features/{ model }/{ fx_pair }/{ aux_pair}/{ variables concat with _}
│                            In particular, the models used all the past observations.
│
├── notebooks             <- Jupyter notebooks containing the results of the training process for different models
│   ├── train...html      <- Output code to include in VC.
│   └── train...ipynb     <- Python notebooks considered. Not included in VC.
│
├── reports               <- Generated analysis as HTML, PDF, LaTeX, etc.
│   ├── figures           <- Generated graphics and figures to be used in reporting, README, and docs
│   ├── images            <- Generated graphics and figures of EDA. Not included in VC.
│   └── models            <- Generated graphics and figures of model results. Not included in VC.
│
├── requirements.txt      <- The requirements file for reproducing the analysis environment, e.g.
│                            generated with `pip freeze > requirements.txt`
│
├── setup.py              <- Makes project pip installable (pip install -e .) so src can be imported
├── src                   <- Source code for use in this project.
│   ├── __init__.py       <- Makes src a Python module
│   │
│   ├── data              <- Scripts to download or generate data
│   │   ├── __init__.py
│   │   ├── data_extract.py
│   │   ├── data_loader.py
│   │   ├── data_preprocess.py
│   │   ├── utils.py
│   │   └── constants.py
│   │
│   ├── features          <- Scripts to turn raw data into features for modeling
│   │   ├── __init__.py
│   │   ├── get_blocks.py
│   │   └── build_features.py
│   │
│   ├── models            <- Scripts to train models and then use trained models to make
│   │   │                    predictions
│   │   ├── __init__.py
│   │   ├── neural_network.py
│   │   ├── model_selection.py
│   │   ├── model_utils.py
│   │   └── train_model.py
│   │
│   ├── scripts           <- Scripts to create CLI entrypoints
│   │   ├── __init__.py
│   │   ├── click_utils.py
│   │   ├── generate_datasets.py
│   │   ├── plot_currency_pair.py
│   │   └── plot_pair_correlations.py
│   │
│   ├── visualization     <- Scripts to create exploratory and results oriented visualizations
│   │   ├── __init__.py
│   │   ├── line_plot.py
│   │   ├── plot_correlations.py
│   │   └── plot_results.py
│   └── currency_pair.py
│
├── tests
│   ├── data              <- Data needed to test the functionalities.
│   ├── mocks.py
│   ├── test_cli_scripts.py
│   ├── test_dataset_generation.py
│   └── test_visualization.py
│
└── tox.ini               <- tox file with settings for running tox; see tox.readthedocs.io
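The models/features entry in the tree above mentions EWMAs of a fixed number of past observations. A minimal sketch of such a feature is given below. It assumes the common recursive definition of an exponentially weighted moving average, seeded with the first value of the window. The function names, the window length and the smoothing factors are all hypothetical, and the project's build_features.py may compute them differently.

```python
from typing import List, Sequence


def ewma(window: Sequence[float], alpha: float) -> float:
    """Exponentially weighted moving average of a window, oldest value first."""
    if not window:
        raise ValueError("window must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    value = window[0]
    for x in window[1:]:
        # Recent observations get weight alpha, the running average 1 - alpha.
        value = alpha * x + (1 - alpha) * value
    return value


def ewma_features(series: Sequence[float], n_past: int, alphas: Sequence[float]) -> List[float]:
    """One EWMA per smoothing factor, computed over the last n_past observations."""
    window = series[-n_past:]
    return [ewma(window, alpha) for alpha in alphas]


# Hypothetical increments and settings.
features = ewma_features([5.0, 1.0, 2.0, 3.0], n_past=3, alphas=[0.5, 1.0])
```

Using several smoothing factors gives the model views of the recent past at different time scales while keeping the input size fixed.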
👨 Mario Santa Cruz López
BSc in Mathematics at Universidad de Cantabria
MSc in Statistics at Imperial College London
GitHub: @jpxkqx
LinkedIn: @mariosanta-cruz
Software developer at Predictia Intelligent Data Solutions