Always use auto-generated doc values as a back-up for Avro doc-related metadata retrieval. #377
base: master
Codecov Report
@@ Coverage Diff @@
## master #377 +/- ##
============================================
+ Coverage 91.47% 91.90% +0.42%
- Complexity 243 258 +15
============================================
Files 26 27 +1
Lines 927 963 +36
Branches 67 71 +4
============================================
+ Hits 848 885 +37
+ Misses 52 50 -2
- Partials 27 28 +1
d232488 to bcd97b7
Hi. Sorry it took me a while to get here, as I have little time to spend on this project... Is this PR still relevant? It has some conflicts with the just-merged #380. If so, can you elaborate a bit further on the need for these changes? Specifically: what do we mean by "more fault-tolerant"? And what problem does "less dependent on a user-supplied schema" solve?
…en supplied schema (expected data format) and an actual data format, returned by a SQL query. Reorganize some code to make locations more logical. Always use generated Avro schema. Optional user provided schema used for `doc` fields retrieval.
bcd97b7 to 12989f1
Updated the description.
Add more tests and update docs.
…n by a SQL result).
This PR is meant to be a solution for issue #579.
It also makes the schema generation process less dependent on a user-provided schema and more fault-tolerant.
Current implementation
dbeam always generates an Avro schema, including its doc-related properties, based on the input parameters and the `ResultSet` value. Optionally, a user can provide a custom "handwritten" schema. A user-provided schema is only used for Avro `doc` values. Thus field names, types and type lengths are taken from the auto-generated schema.
Drawback(s)
One drawback of this behaviour is that when a new field appears in a DB table, and consequently in the source SQL `ResultSet` (e.g. when `SELECT *` is used), and the user-provided schema doesn't contain this field, the process throws an error.
Solution
dbeam's auto-generated schema is always used as a back-up when the user-provided schema doesn't contain the field in question.
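The fallback rule can be illustrated with a minimal sketch. It is not dbeam's actual code: the class `DocFallback`, its methods and the sample doc strings are hypothetical, and plain maps of field name to `doc` text stand in for real Avro schema objects. The point is only that a field missing from the user-provided schema keeps its auto-generated `doc` instead of causing an error.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: maps of field name -> doc text stand in
// for the auto-generated and user-provided Avro schemas.
final class DocFallback {

  private DocFallback() {}

  // Prefer the user-provided doc; fall back to the generated one when the
  // user-provided schema has no (non-empty) doc for this field.
  static String resolveDoc(String fieldName, Map<String, String> userDocs, String generatedDoc) {
    String userDoc = userDocs.get(fieldName);
    return (userDoc != null && !userDoc.isEmpty()) ? userDoc : generatedDoc;
  }

  // The generated schema drives the field list, so a new column in the
  // ResultSet is always covered. User entries for unknown fields are ignored.
  static Map<String, String> resolveDocs(
      Map<String, String> generatedDocs, Map<String, String> userDocs) {
    Map<String, String> resolved = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : generatedDocs.entrySet()) {
      resolved.put(entry.getKey(), resolveDoc(entry.getKey(), userDocs, entry.getValue()));
    }
    return resolved;
  }

  public static void main(String[] args) {
    // Placeholder doc strings, not dbeam's real generated text.
    Map<String, String> generated = new LinkedHashMap<>();
    generated.put("id", "generated doc for id");
    generated.put("new_column", "generated doc for new_column");

    // The user-provided schema predates new_column and describes only id.
    Map<String, String> user = Map.of("id", "Primary key of the table");

    resolveDocs(generated, user)
        .forEach((field, doc) -> System.out.println(field + ": " + doc));
  }
}
```

Iterating over the generated fields rather than the user-provided ones is the design choice that matters here: the user-provided schema can then only contribute descriptions and never decides which fields exist.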
Additional use-case
An unplanned positive side effect is that a user-provided schema can serve as a dictionary of descriptions (`doc`s) for various fields, so one schema file can be used for multiple tables. We are going to use this side effect.
Checklist for PR author(s)
`mvn com.coveo:fmt-maven-plugin:format org.codehaus.mojo:license-maven-plugin:update-file-header`