Blog post: "Why should I use a database?" #186
description:
type: text
excerpt_separator: <!--more-->
---
For reference, you can now add:
image:
path: /assets/images/database-blog-post/scale.jpg
or similar, to specify an image for a social card when the page link is shared on fb/twitter etc.
## Convinced?

Maybe. But why isn't everyone using databases all the time? Probably because of the skillset and time needed to set a database up. Microsoft Access is an option for many who want more than a spreadsheet, but it doesn't confer all of the benefits I've described "out of the box". Setting up a database on a web server, even if you're not developing it from scratch, is a skilled job and something you don't want to get wrong. A short-term option is to contact your I.T. services or Research Software Engineering team to see if they can help. Often this is not part of "standard service", so the costs need to be picked up by individual projects. Longer term, there are some other things we could do. I.T. services within organisations doing research could maintain database servers that researchers can configure through a web interface. The delineation of who does what would need to be worked out, particularly if sensitive data is involved. Something that really should be happening is better training for researchers, not just on "data management plans" but on the software skills they need to implement them.
I like it!
## Availability

If your database is on a web server that is secure and regularly backed up, your data is going to be much more available and reliable than if it's on a spreadsheet on your laptop.
Can't download an Access database? Can upload an Excel or CSV file (which others can download).
In general it feels VERY one-sided.
tl;dr: There are a lot of perfectly good reasons to use flat files
Validation
Spreadsheets support validation; it's not unique to databases.
https://exceljet.net/excel-data-validation-guide
Audit Trails
It's just as easy to turn on track changes in Microsoft Office. Similarly, databases don't have to track who made changes if they're not configured that way (e.g. SQLite).
You haven't considered performance.
e.g. dumping to a CSV flat file is thousands of times faster than SQLite (I once tried writing ~50k records per frame of a real-time model; it didn't go very well).
I expect proper databases sit somewhere in the middle in terms of performance (outside of expensive commercial systems).
There's always the argument to dump raw data to a flat file, then process it into a database afterwards.
Online Data Repositories
How do you upload an Access database to something like Mendeley Data? https://data.mendeley.com/ You likely don't have the funding to host your database attached to your paper indefinitely, but your journal's official data store can be assumed to exist for the lifetime of the journal.
Uploading a file structure with spreadsheets or similar is quite simple.
Ease
CSV export is the 101 of data logging; it's a lot easier to get started with zero expertise.
Everyone knows how to read a table/spreadsheet; some people may struggle with your database interface (e.g. if they don't know SQL).
There are probably other things I haven't considered too.
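For what it's worth, the "dump raw data to a flat file, then process it into a database afterwards" route can be sketched roughly as below. This is only an illustration under assumptions: the `record` table, the made-up `(id, value)` rows and the in-memory buffer standing in for a real file are all hypothetical. The second half loads the whole batch inside a single transaction with one `executemany` call rather than committing per row, which is the usual way to do bulk writes with SQLite.

```python
import csv
import io
import sqlite3

# Hypothetical data standing in for one "frame" of model output.
records = [(i, i * 0.5) for i in range(50_000)]

# Step 1: dump the raw rows to CSV (an in-memory buffer here, a file in practice).
buffer = io.StringIO()
csv.writer(buffer).writerows(records)
csv_lines = buffer.getvalue().count("\n")

# Step 2: afterwards, read the CSV back and load it into SQLite in one
# transaction, with a single executemany call for the whole batch.
buffer.seek(0)
rows = [(int(i), float(v)) for i, v in csv.reader(buffer)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, value REAL)")
with conn:  # commits once at the end of the block
    conn.executemany("INSERT INTO record (id, value) VALUES (?, ?)", rows)

stored = conn.execute("SELECT COUNT(*) FROM record").fetchone()[0]
```

No timings are claimed here; how the two routes compare will depend on the hardware and on how the writes are batched.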
I very much like the idea of a post on these topics and think this is timely given recent projects, however:
- I think the post conflates the benefits of using a database with the benefits of using a database-backed web application.
- As I mentioned, some spreadsheet solutions do include means for setting up data validation rules. However, I think a key difference between spreadsheets and databases is that schemas are required and enforced with the latter, but are very much opt-in with the former
- Is it worth expanding a little on the power of databases for linking tables and requiring valid foreign key entries (without using that term)?
- I think you're right to touch on 'sharability' but I think it's also worth mentioning transactions here (again, possibly without mentioning that term) as being a key mechanism for facilitating safer concurrent access.
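As a possible illustration of the last two points, here is a minimal sketch using Python's built-in `sqlite3` module. The `participant` and `measurement` tables and the `add_measurements` helper are hypothetical names invented for this example. It shows a measurement being rejected because it points at a participant that doesn't exist, and the whole batch being rolled back so that no partial write is left behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE participant (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE measurement (
        id INTEGER PRIMARY KEY,
        participant_id INTEGER NOT NULL REFERENCES participant(id),
        value REAL NOT NULL
    )
    """
)
conn.execute("INSERT INTO participant (id, name) VALUES (1, 'P001')")
conn.commit()


def add_measurements(rows):
    """Insert all rows or none of them (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO measurement (participant_id, value) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = add_measurements([(1, 3.2), (1, 3.4)])
bad = add_measurements([(1, 3.6), (99, 9.9)])  # participant 99 does not exist
count = conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]
```

After the second call, only the two rows from the first batch remain: the valid row `(1, 3.6)` is undone along with the invalid one.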
excerpt_separator: <!--more-->
---

Figuring out how to store research data is potentially a bit of a headache - we have to balance data security, integrity and availability. Databases offer a means of doing research data management better, but setting them up is out of reach for many researchers, especially those doing small or pilot studies.
"Databases offer a means of doing research data management better" - for all cases?
In a database, "validation rules" can be applied to each column. Some examples:

- The data must be a value chosen from a pre-existing list.
Such things are possible with Google Sheets (e.g. the validation rules used in the sheet for capturing menu preferences for the team Xmas party)
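For comparison, a column-level rule of the "value chosen from a pre-existing list" kind might look like this on the database side. This is a minimal sketch with Python's built-in `sqlite3`; the `sample` table, its columns and the allowed values are assumptions made up for illustration. The difference from a spreadsheet rule is that the rule lives in the schema, so every insert is checked regardless of which tool wrote it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sample (
        id INTEGER PRIMARY KEY,
        -- value must come from a fixed list
        site TEXT NOT NULL CHECK (site IN ('north', 'south', 'east')),
        -- value must fall inside a plausible range
        temperature_c REAL NOT NULL CHECK (temperature_c BETWEEN -50 AND 60)
    )
    """
)

conn.execute("INSERT INTO sample (site, temperature_c) VALUES (?, ?)", ("north", 12.5))

try:
    # A typo in the site name violates the CHECK constraint.
    conn.execute("INSERT INTO sample (site, temperature_c) VALUES (?, ?)", ("nroth", 12.5))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0]
```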
## Audit trail

Is there a word in English that conveys more joy and romance than "audit"? I doubt it. Databases are much better at keeping track of who altered what data, when and why than spreadsheets. If someone logs into a web interface to a database, their user name can be automatically tied to the edits and additions they make, and the facility to comment on data points can be made available. Data entries can be marked as "draft" or "verified", enabling quality and completeness of the data set to be reported on.
'Data entries can be marked as "draft" or "verified"' - true of database-backed web or desktop applications but not necessarily databases themselves.
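To make the distinction concrete, here is a rough sketch of what has to be built on top of a bare database to get this behaviour, using Python's built-in `sqlite3`. All of the names (`reading`, `reading_audit`, the `status` and `updated_by` columns, the trigger) are hypothetical. SQLite has no notion of a logged-in user, so the application has to supply `updated_by` itself; the trigger only copies it into the audit table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reading (
        id INTEGER PRIMARY KEY,
        value REAL NOT NULL,
        status TEXT NOT NULL DEFAULT 'draft' CHECK (status IN ('draft', 'verified')),
        updated_by TEXT NOT NULL  -- supplied by the application, not by SQLite
    );
    CREATE TABLE reading_audit (
        reading_id INTEGER NOT NULL,
        old_value REAL,
        new_value REAL,
        changed_by TEXT NOT NULL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Record every change to the value column in the audit table.
    CREATE TRIGGER reading_audit_update AFTER UPDATE OF value ON reading
    BEGIN
        INSERT INTO reading_audit (reading_id, old_value, new_value, changed_by)
        VALUES (OLD.id, OLD.value, NEW.value, NEW.updated_by);
    END;
    """
)

conn.execute("INSERT INTO reading (id, value, updated_by) VALUES (1, 4.0, 'alice')")
conn.execute("UPDATE reading SET value = 4.5, updated_by = 'bob' WHERE id = 1")

audit_rows = conn.execute(
    "SELECT reading_id, old_value, new_value, changed_by FROM reading_audit"
).fetchall()
status = conn.execute("SELECT status FROM reading WHERE id = 1").fetchone()[0]
```

None of this comes for free with the database itself: the audit table, the trigger and the draft/verified status all have to be designed and configured by someone.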
## Availability

If your database is on a web server that is secure and regularly backed up, your data is going to be much more available and reliable than if it's on a spreadsheet on your laptop.
But it's perfectly possible to use a SQLite database on a laptop drive and never back it up. Equally, one could use a Google Sheet, or access an Excel spreadsheet that resides on resilient, snapshotted/backed-up network storage.
Thanks for all the feedback! "Database" and "database-backed web application" were treated as the same thing here, for simplicity, but clearly they're not, and that leads to internal inconsistencies. I'll have another good look at the article, but I think the chances of keeping everybody happy are fairly low. Scoping up front who my audience is, what kind of data I'm talking about and what a database is will help. Hopefully people will look at it again once I've made some changes.
Anyone can review.
I got the pics from here https://unsplash.com/images/stock/public-domain