Blog post: "Why should I use a database?" #186

Draft
wants to merge 5 commits into
base: master
from

Conversation

bobturneruk
Collaborator

@bobturneruk bobturneruk commented Nov 20, 2019

Anyone can review.

I got the pics from here https://unsplash.com/images/stock/public-domain

description:
type: text
excerpt_separator: <!--more-->
---
Collaborator

For reference, you can now add:

image:
  path: /assets/images/database-blog-post/scale.jpg

or similar, to specify an image for a social card when the page link is shared on fb/twitter etc.


## Convinced?

Maybe. But why isn't everyone using databases all the time? Probably because of the skillset and time needed to set a database up. Microsoft Access is an option for many who want more than a spreadsheet, but it doesn't confer all of the benefits I've described "out of the box". Setting up a database on a web server, even if you're not developing it from scratch, is a skilled job and something you don't want to get wrong. A short term option is to contact your I.T. services or Research Software Engineering team to see if they can help. Often this is not part of "standard service", so costs need to be picked up by individual projects. Longer term, there are some other things we could do. I.T. services within organisations doing research could maintain database servers that researchers can configure through a web interface - the delineation of who does what would need to be worked out, particularly if sensitive data is involved. Something that really should be happening is better training for researchers, not just on "data management plans" but also in the software skills they need to implement them.
Collaborator

I like it!


## Availability

If your database is on a web server that is secure and regularly backed up, your data is going to be much more available and reliable than if it's on a spreadsheet on your laptop.
Member

You can't download an Access database from a web server? You can upload an Excel or CSV file (which others can download).

Member

@Robadob Robadob left a comment

In general it feels VERY one-sided.

tl;dr: There are a lot of perfectly good reasons to use flat files.


Validation

Spreadsheets support validation, it's not unique to databases.

https://exceljet.net/excel-data-validation-guide

Audit Trails

It's just as easy to turn on track changes in Microsoft Office. Similarly, databases don't have to track who made changes if they're not configured that way (e.g. SQLite).

You haven't considered performance.

e.g. dumping to a CSV flat file can be thousands of times faster than writing to SQLite (I once tried writing ~50k records per frame of a real-time model; it didn't go very well).

I expect proper databases sit somewhere in the middle in terms of performance (outside of expensive commercial systems).

There's always the argument to dump raw data to flat file, then process it into a database afterwards.
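
A minimal sketch of that "flat file first, database afterwards" workflow, using only Python's standard library. The `samples` table, its columns and the generated records are hypothetical stand-ins, and an in-memory buffer is used in place of a real CSV file so the example is self-contained:

```python
import csv
import io
import sqlite3

# Hypothetical records standing in for raw model output.
records = [(frame, agent_id, frame * 0.5) for frame in range(3) for agent_id in range(2)]

# Step 1: dump raw data to a flat file. An in-memory buffer is used here;
# a real run would write to a file opened with newline="".
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["frame", "agent_id", "value"])
writer.writerows(records)

# Step 2: afterwards, load the flat file into a database.
buffer.seek(0)
reader = csv.reader(buffer)
next(reader)  # skip the header row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (frame INTEGER, agent_id INTEGER, value REAL)")
with conn:  # one commit for the whole load, rather than one per row
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", reader)

row_count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

Wrapping the load in a single transaction is a common way to avoid paying a commit per row; whether that closes the gap for a real-time workload is not something this sketch measures.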

Online Data Repositories

How do you upload an Access database to something like Mendeley Data? https://data.mendeley.com/ You likely don't have the funding to host your database attached to your paper indefinitely, but your journal's official data store can be assumed to exist for the lifetime of the journal.

Uploading a file structure with spreadsheets or similar is quite simple.

Ease

CSV export is the 101 of data logging; it's a lot easier to get started with zero expertise.
Everyone knows how to read a table/spreadsheet, but some people may struggle with your database interface (e.g. if they don't know SQL).

There are probably other things I haven't considered too.

Collaborator

@willfurnass willfurnass left a comment

I very much like the idea of a post on these topics and think this is timely given recent projects, however:

  • I think the post conflates the benefits of using a database with the benefits of using a database-backed web application.
  • As I mentioned, some spreadsheet solutions do include means for setting up data validation rules. However, I think a key difference between spreadsheets and databases is that schemas are required and enforced with the latter, but are very much opt-in with the former.
  • Is it worth expanding a little on the power of databases for linking tables and requiring valid foreign key entries (without using that term)?
  • I think you're right to touch on 'sharability' but I think it's also worth mentioning transactions here (again, possibly without mentioning that term) as being a key mechanism for facilitating safer concurrent access.
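
To make the last two points concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is only a convenient stand-in, and the `participants` and `measurements` tables are hypothetical. It shows a linked table rejecting an entry that points at a non-existent row, and a group of changes being applied all-or-nothing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.execute("CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE measurements ("
    " id INTEGER PRIMARY KEY,"
    " participant_id INTEGER NOT NULL REFERENCES participants(id),"
    " weight_kg REAL NOT NULL)"
)

with conn:
    conn.execute("INSERT INTO participants (id, name) VALUES (1, 'P001')")

# Linking tables: a measurement must point at a participant that exists.
orphan_rejected = False
try:
    with conn:
        conn.execute("INSERT INTO measurements (participant_id, weight_kg) VALUES (99, 70.2)")
except sqlite3.IntegrityError:
    orphan_rejected = True

# All-or-nothing changes: if any statement in the block fails, none are kept.
try:
    with conn:
        conn.execute("INSERT INTO measurements (participant_id, weight_kg) VALUES (1, 70.2)")
        conn.execute("INSERT INTO measurements (participant_id, weight_kg) VALUES (99, 68.0)")
except sqlite3.IntegrityError:
    pass

measurement_count = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
```

The valid first insert in the second block is rolled back along with the failing one, so the table stays empty. This only illustrates atomic changes on one connection; safe concurrent access by several users depends on the database engine and how it is deployed.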

excerpt_separator: <!--more-->
---

Figuring out how to store research data is potentially a bit of a headache - we have to balance data security, integrity and availability. Databases offer a means of doing research data management better, but setting them up is out of reach for many researchers, especially those doing small or pilot studies.
Collaborator

"Databases offer a means of doing research data management better" - for all cases?


In a database, "validation rules" can be applied to each column. Some examples:

- The data must be a value chosen from a pre-existing list.
Collaborator

Such things are possible with Google Sheets (e.g. the validation rules used in the sheet for capturing menu preferences for the team Xmas party)
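
For comparison, the database-side version of "a value chosen from a pre-existing list" might look like the sketch below. It assumes SQLite via Python's `sqlite3` module, and the `observations` table and its site names are made up for illustration. The point of difference is that the rule lives in the schema, so it applies to every insert regardless of which tool writes the data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    " id INTEGER PRIMARY KEY,"
    # The column only accepts values from a fixed list.
    " site TEXT NOT NULL CHECK (site IN ('north', 'south', 'east')),"
    " reading REAL NOT NULL)"
)

with conn:
    conn.execute("INSERT INTO observations (site, reading) VALUES ('north', 4.2)")

# A misspelt site name is refused by the database itself.
rejected = False
try:
    with conn:
        conn.execute("INSERT INTO observations (site, reading) VALUES ('nroth', 3.9)")
except sqlite3.IntegrityError:
    rejected = True

stored_sites = [row[0] for row in conn.execute("SELECT site FROM observations")]
```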


## Audit trail

Is there a word in English that conveys more joy and romance than "audit"? I doubt it. Databases are much better at keeping track of who altered what data, when and why than spreadsheets. If someone logs into a web interface to a database, their user name can be automatically tied to the edits and additions they make, and the facility to comment on data points can be made available. Data entries can be marked at "draft" or "verified" enabling quality and completeness of the data set to be reported on.
Collaborator

'Data entries can be marked at "draft" or "verified"' - true of database-backed web or desktop applications but not necessarily databases themselves.
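
A rough sketch of where that line falls, assuming SQLite through Python's `sqlite3` module. The `entries` and `audit_log` tables, the `set_status` helper and the user name are all hypothetical. The database can restrict the status to "draft" or "verified", but SQLite has no notion of a logged-in user, so the application has to supply the name and write the log entry itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (
    id INTEGER PRIMARY KEY,
    value REAL NOT NULL,
    status TEXT NOT NULL DEFAULT 'draft' CHECK (status IN ('draft', 'verified'))
);
CREATE TABLE audit_log (
    entry_id INTEGER NOT NULL,
    changed_by TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    old_status TEXT,
    new_status TEXT
);
""")


def set_status(conn, entry_id, new_status, user):
    """Application-level helper: change a status and record who changed it."""
    with conn:  # the update and its log entry are committed together
        old_status = conn.execute(
            "SELECT status FROM entries WHERE id = ?", (entry_id,)
        ).fetchone()[0]
        conn.execute("UPDATE entries SET status = ? WHERE id = ?", (new_status, entry_id))
        conn.execute(
            "INSERT INTO audit_log (entry_id, changed_by, old_status, new_status)"
            " VALUES (?, ?, ?, ?)",
            (entry_id, user, old_status, new_status),
        )


with conn:
    conn.execute("INSERT INTO entries (id, value) VALUES (1, 3.7)")

set_status(conn, 1, "verified", "reviewer_a")
log = conn.execute(
    "SELECT entry_id, changed_by, old_status, new_status FROM audit_log"
).fetchall()
```

Anything that bypasses `set_status` and updates the table directly leaves no trace in the log, which is the sense in which the audit trail belongs to the application rather than to the database.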


## Availability

If your database is on a web server that is secure and regularly backed up, your data is going to be much more available and reliable than if it's on a spreadsheet on your laptop.
Collaborator

But it's perfectly possible to use a SQLite database on a laptop drive and never back it up. Equally, one could use a Google Sheet, or access an Excel spreadsheet that resides on resilient, snapshotted/backed-up network storage.

@bobturneruk
Collaborator Author

Thanks for all the feedback! "Database" and "database backed web application" were treated as the same thing here, for simplicity, but clearly they're not and that leads to internal inconsistencies. I'll have another good look at the article, but I think the chances of keeping everybody happy are fairly low. Scoping my audience, the kind of data I'm talking about and what a database is up-front will help. Hopefully people will look at it again once I've made some changes.

@bobturneruk bobturneruk marked this pull request as draft June 9, 2020 14:44
4 participants