- Introduction
- Adding a Graph
- Graph Types
- Other Features
- Graphing Cohorts
- Performance
- Exporting graph data from CLI
- Limitations
RDMP has a basic graphing system called 'Aggregate Graphs'. These generate queries that produce Bar or Plot data that can then either:
- Be visualized in the RDMP Windows GUI Client
- Be visualized in the RDMP Cross Platform Console TUI
- Be exported as CSV for graphing in other tools (e.g. Excel)
Graphs can be added to any Catalogue through the right click context menu:
This will take you to the 'Edit' page where you can specify which dimensions to use, Axis, Pivot etc.
The RDMP graphing engine supports Bar and Plot graphs. Not all DBMS implement all graph types. The following table shows what is implemented:
Graph Type | Sql Server | MySql | Postgres | Oracle |
---|---|---|---|---|
Bar 1 Dimension | yes | yes | yes | yes |
Bar 2 Dimensions | yes | yes | no | yes |
Plot 1 Dimension | yes | yes | yes | no |
Plot 2 Dimensions | yes | yes | no | no |
Graph query generation is handled in the FAnsiSql library.
The simplest graph you can create has only a single Dimension (column). The following is an example created from the Biochemistry test dataset that is optionally provided with the RDMP installation.
You can create a 'Side-By-Side' bar chart by adding two Dimensions and marking one as PIVOT. The following example
shows a Dimension of Year(SampleDate)
and a PIVOT on healthboard. A filter has also been added to only show years
after 1980.
Plot graphs depict a time axis and a line for the data. The axis is continuous and can be set to Year/Quarter/Month increments. If no data exists for a date then the bar will drop to 0 (unlike in bar graphs where 0 counts are not shown).
Plot graphs can have a PIVOT dimension. The following example shows an AXIS of SampleDate
and a PIVOT on healthboard:
Behind every graph RDMP generates is a GROUP BY
statement (hence the word 'Aggregate'). The following SQL features are
supported for customizing what data is calculated.
All graphs come with a single 'count' column. This column cannot be removed and defaults to count(*)
. You can change this to any valid SQL e.g. count(distinct StudyInstanceUID)
. You are not restricted to count and can use any other aggregation function e.g. avg(Result)
.
Filters can be applied to the records returned. These use the normal RDMP filter system and can be added via the right click context menu.
In SQL a GROUP BY
query can contain a HAVING
block. This discards records after performing the aggregation (e.g. count). HAVING
differs from WHERE
because it is applied after counting all the data while WHERE
is applied before. WHERE
cannot reference the count
(since it has not been calculated yet) while HAVING
can.
You can add a HAVING
block e.g. count(*)>100
to show only bars / plot values where the count is over 100.
Take care when combining HAVING
with PIVOT
as HAVING
will discard plot points/bars. This can lead to a graph which appears to show 0 records
Bars with values <100 are not shown due to the HAVING condition. This results in several orange 'F' bars disapearing, even when there is a 'T' bar over 100
You can apply a TOP
(or LIMIT
in the case of MySql/Oracle). This will reduce the number of bars in a Bar chart or the number of series in a Plot.
Applying a TOP to PIVOT graphs limits the number of series in the PIVOT
One of the core strengths of the RDMP graphing system is the ability to run graphs on Filters, Cohorts and/or ExtractionConfiguration datasets. This lets you rapidly confirm that the cohort you are building does not have data holes or missing trends.
To graph a cohort right click it and go to Graph. Make sure you have set up a cohort caching database.
There are 2 options for generating graphs:
Explanation | |
---|---|
Records | Graph will show only rows returned by your cohort query (e.g. prescriptions for drug X) |
Matching Patients | Graph of all records held by people that appear in your cohort result set (e.g. all prescriptions that people on drug X are collecting) |
In the above example graphing the records would give a graph showing only drug X. While graphing patients will show a graph with a heavy bias for drug X (those records contributed patients to the result set) but also any other drugs they are on. Graphing patients can be done cross dataset e.g. you can view a Demography graph for a cohort built using Prescribing.
When running a cohort graph you may benefit from also running the normal graph so you can compare the two:
RDMP generates a single SQL query which is sent to the server. This puts the workload on the DBMS and not the local computer. The SQL query can be viewed through the RDMP client. It is normal for graphs to take several hours to run if your dataset contains billions of records. One way to improve performance is to run the SQL generated by RDMP through the Query Optimizer of your DBMS.
Other things to consider include:
- Adding indexes to PIVOT and/or AXIS columns
- Changing AXIS to a larger increment (e.g. YEAR instead of MONTH)
- Avoiding using complicated transforms for Dimensions (which may invalidate index use)
The RDMP CLI supports automated running and extraction of CSV data for any graphs you have created in RDMP. To do this use the ViewData command. You will need to know the ID of the graph you want to run:
./rdmp viewdata AggregateConfiguration:1428 ALL ./out.csv
Alternatively you can provide the name of the graph:
./rdmp viewdata "AggregateConfiguration:Record count over time (by healthboard)" ALL ./out.csv
If you want to pipe the results to another process (e.g. grep) you can skip the file argument (e.g. ./out.csv
) and add the -q
(quiet) option to surprsess extra logging:
./rdmp viewdata "AggregateConfiguration:1428" ALL -q | grep 1992
RDMP graphs are relatively simple and should not be considered a substitute for dedicated data exploration tools such as Power BI or Tableau. The strengths of RDMP's graph system is the ability to rapidly visualize table data without leaving RDMP and the ability to combine with the other RDMP systems (e.g. cohort building).