Blog post: "Why should I use a database?" #186

bobturneruk · 2019-11-20T11:06:01Z

Anyone can review.

I got the pics from here https://unsplash.com/images/stock/public-domain

davidwilby · 2019-11-20T11:19:30Z

_posts/2019-11-05-why-database.md

+description:
+type: text
+excerpt_separator: <!--more-->
+---


For reference, you can now add:

image:
  path: /assets/images/database-blog-post/scale.jpg

or similar, to specify an image for a social card when the page link is shared on Facebook, Twitter, etc.

davidwilby · 2019-11-20T11:20:46Z

_posts/2019-11-05-why-database.md

+
+## Convinced?
+
+Maybe. But why isn't everyone using databases all the time? Probably because of the skillset and time needed to set a database up. Microsoft Access is an option for many who want more than a spreadsheet, but it doesn't confer all of the benefits I've described "out of the box". Setting up database on a web server, even if you're not developing it from scratch is a skilled job and something you don't want to get wrong. A short term option is to contact your I.T. services or Research Software Engineering team to see if they can help. Often this is not part of "standard service" and costs therefore need to be picked up by individual projects. Longer term, there are some other things we could do. I.T. services within organisations doing research could maintain database servers that can be configured using a web interface for researchers to use - the delineation of who does what would need to be worked out, particularly if sensitive data is involved. Something that really should be happening is to provide researchers with better training not just on "data management plans" but providing them with the software skills they need to implement them.


Robadob · 2019-11-20T11:29:45Z

_posts/2019-11-05-why-database.md

+
+## Availability
+
+If your database is on a web server that is secure and regularly backed up, your data is going to be much more available and reliable than if it's on a spreadsheet on your laptop.


Can't download an Access database? You can upload an Excel or CSV file (which others can download).

Robadob

In general it feels VERY one-sided.

tl;dr: There are a lot of perfectly good reasons to use flat files.

Validation

Spreadsheets support validation; it's not unique to databases.

https://exceljet.net/excel-data-validation-guide

Audit Trails

It's just as easy to turn on track changes in Microsoft Office. Similarly, databases don't have to track who made changes if they're not configured that way (e.g. SQLite).
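To illustrate the point that an audit trail has to be configured rather than coming for free, here is a minimal sketch using Python's standard `sqlite3` module and a trigger. The table and column names (`readings`, `readings_audit`, `edited_by`) are hypothetical, and the user name is assumed to be supplied by whatever application sits in front of the database, since SQLite itself has no notion of users.

```python
import sqlite3


def make_audited_db():
    """Create an in-memory database whose updates are logged by a trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE readings (
            id INTEGER PRIMARY KEY,
            value REAL NOT NULL,
            edited_by TEXT NOT NULL  -- hypothetical: set by the application
        );
        CREATE TABLE readings_audit (
            reading_id INTEGER,
            old_value REAL,
            new_value REAL,
            edited_by TEXT,
            edited_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        -- Without this trigger, SQLite keeps no record of changes at all.
        CREATE TRIGGER log_reading_update
        AFTER UPDATE OF value ON readings
        BEGIN
            INSERT INTO readings_audit (reading_id, old_value, new_value, edited_by)
            VALUES (OLD.id, OLD.value, NEW.value, NEW.edited_by);
        END;
        """
    )
    return conn


def audit_rows(conn):
    """Return the audit log as (reading_id, old_value, new_value, edited_by) tuples."""
    return conn.execute(
        "SELECT reading_id, old_value, new_value, edited_by FROM readings_audit"
    ).fetchall()


conn = make_audited_db()
conn.execute("INSERT INTO readings VALUES (1, 10.0, 'alice')")
conn.execute("UPDATE readings SET value = 12.5, edited_by = 'bob' WHERE id = 1")
log = audit_rows(conn)
```

The "who" column is only as trustworthy as the application that fills it in, which is part of why the audit trail is really a property of the whole system and not of the database file.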

You haven't considered performance.

E.g. dumping to a CSV flat file is thousands of times faster than writing to SQLite (I once tried writing ~50k records per frame of a real-time model; it didn't go very well).

I expect proper databases sit somewhere in the middle in terms of performance (outside of expensive commercial systems).

There's always the argument for dumping raw data to a flat file, then processing it into a database afterwards.
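A minimal sketch of that "flat file first, database afterwards" workflow, assuming Python's standard `csv` and `sqlite3` modules. The file names, the `records` table and its columns (`frame`, `agent_id`, `value`) are hypothetical, and the sketch makes no claim about relative speed; it only shows the shape of the two stages.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def log_to_csv(path, rows):
    """Append raw records to a flat CSV file while the model is running."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)


def load_csv_into_sqlite(csv_path, db_path):
    """After the run, bulk-load the CSV into SQLite in a single transaction."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # one commit at the end, or a rollback if anything fails
            conn.execute(
                "CREATE TABLE IF NOT EXISTS records "
                "(frame INTEGER, agent_id INTEGER, value REAL)"
            )
            with open(csv_path, newline="") as f:
                conn.executemany(
                    "INSERT INTO records VALUES (?, ?, ?)", csv.reader(f)
                )
        return conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    finally:
        conn.close()


def demo():
    """Log three frames of four hypothetical records each, then load them."""
    workdir = Path(tempfile.mkdtemp())
    csv_path = workdir / "records.csv"
    db_path = workdir / "records.db"
    for frame in range(3):
        log_to_csv(csv_path, [(frame, agent, agent * 0.5) for agent in range(4)])
    return load_csv_into_sqlite(csv_path, db_path)
```

Calling `demo()` returns the number of rows that ended up in the database. The raw CSV stays on disk as the archival copy, which also suits the data repository point below.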

Online Data Repositories

How do you upload an Access database to something like Mendeley Data (https://data.mendeley.com/)? You likely don't have the funding to host your database attached to your paper indefinitely, but your journal's official data store can be assumed to exist for the lifetime of the journal.

Uploading a file structure with spreadsheets or similar is quite simple.

Ease

- CSV export is the 101 of data logging; it's a lot easier to get started with zero expertise.
- Everyone knows how to read a table/spreadsheet; some people may struggle with your database interface (e.g. if they don't know SQL).

There are probably other things I haven't considered too.

willfurnass

I very much like the idea of a post on these topics and think this is timely given recent projects, however:

- I think the post conflates the benefits of using a database with the benefits of using a database-backed web application.
- As I mentioned, some spreadsheet solutions do include means for setting up data validation rules. However, I think a key difference between spreadsheets and databases is that schemas are required and enforced with the latter, but are very much opt-in with the former.
- Is it worth expanding a little on the power of databases for linking tables and requiring valid foreign key entries (without using that term)?
- I think you're right to touch on 'sharability' but I think it's also worth mentioning transactions here (again, possibly without mentioning that term) as being a key mechanism for facilitating safer concurrent access.
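For anyone wanting a concrete picture of the last three points, here is a minimal sketch using Python's standard `sqlite3` module. The tables and columns (`participants`, `measurements`, `sex`, `weight_kg`) are hypothetical examples, not anything taken from the post. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3


def make_db():
    """Create an in-memory database with an enforced schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(
        """
        CREATE TABLE participants (
            id INTEGER PRIMARY KEY,
            -- value must come from a pre-existing list
            sex TEXT NOT NULL CHECK (sex IN ('F', 'M', 'other'))
        );
        CREATE TABLE measurements (
            id INTEGER PRIMARY KEY,
            -- every measurement must link to a real participant
            participant_id INTEGER NOT NULL REFERENCES participants(id),
            weight_kg REAL NOT NULL CHECK (weight_kg > 0)
        );
        """
    )
    return conn


def try_insert(conn, sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def add_participant_with_measurements(conn, pid, sex, weights):
    """All-or-nothing: one bad measurement rolls back the participant too."""
    try:
        with conn:  # a single transaction around both inserts
            conn.execute("INSERT INTO participants VALUES (?, ?)", (pid, sex))
            conn.executemany(
                "INSERT INTO measurements (participant_id, weight_kg) VALUES (?, ?)",
                [(pid, w) for w in weights],
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, an insert with `sex` set to something outside the list, or a measurement pointing at a participant who does not exist, is refused by the database itself, however the data arrives. The transaction in `add_participant_with_measurements` means a half-finished edit is never visible to anyone else reading the data.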

willfurnass · 2019-11-20T11:06:46Z

_posts/2019-11-05-why-database.md

+excerpt_separator: <!--more-->
+---
+
+Figuring out how to store research data is potentially a bit of a headache - we have to balance data security, integrity and availability. Databases offer a means of doing research data management better, but setting them up is out of reach for many researchers. Especially those doing small or pilot studies.


"Databases offer a means of doing research data management better" - for all cases?

willfurnass · 2019-11-20T11:18:40Z

_posts/2019-11-05-why-database.md

+
+In a database, "validation rules" can be applied to each column. Some examples:
+
+- The data must be a value chosen from a pre-existing list.


Such things are possible with Google Sheets (e.g. the validation rules used in the sheet for capturing menu preferences for the team Xmas party)

willfurnass · 2019-11-20T11:20:29Z

_posts/2019-11-05-why-database.md

+
+## Audit trail
+
+Is there a word in English that conveys more joy and romance that "audit"? I doubt it. Databases are much better at keeping track of who altered what data, when and why than spreadsheets. If someone logs into a web interface to a database, their user name can be automatically tied to the edits and additions they make, and the facility to comment on data points can be made available. Data entries can be marked at "draft" or "verified" enabling quality and completeness of the data set to be reported on.


'Data entries can be marked at "draft" or "verified"' - true of database-backed web or desktop applications but not necessarily databases themselves.

willfurnass · 2019-11-20T11:23:03Z

_posts/2019-11-05-why-database.md

+
+## Availability
+
+If your database is on a web server that is secure and regularly backed up, your data is going to be much more available and reliable than if it's on a spreadsheet on your laptop.


But it's perfectly possible to use a SQLite database on a laptop drive and never back it up. Equally, one could use a Google Sheet, or an Excel spreadsheet that resides on resilient, snapshotted/backed-up network storage.

bobturneruk · 2019-11-20T16:30:04Z

Thanks for all the feedback! "Database" and "database-backed web application" were treated as the same thing here, for simplicity, but clearly they're not, and that leads to internal inconsistencies. I'll have another good look at the article, but I think the chances of keeping everybody happy are fairly low. Scoping my audience, the kind of data I'm talking about and what a database is up-front will help. Hopefully people will look at it again once I've made some changes.

bobturneruk added 5 commits November 12, 2019 09:28

some more text

d75f387

remove stuff on physical storage

a78e422

added more text

3c78763

formatting

b497a01

complete draft and pics

a5d6a13

davidwilby reviewed Nov 20, 2019

Robadob reviewed Nov 20, 2019

willfurnass reviewed Nov 20, 2019

bobturneruk marked this pull request as draft June 9, 2020 14:44
