KETTLE-63: Added an recursive file datasource #37

Open: kaspermarkus wants to merge 5 commits into main

kaspermarkus
Contributor

No description provided.

@kaspermarkus changed the title from "KETTLE-63: Added an asynchronous file datasource" to "KETTLE-63: Added an recursive file datasource" on Sep 25, 2017
}
var i = 0;

(function next() {
Member

Nuts and deeply unidiomatic!
This can all be vastly simpler than as implemented. We don't require this to be backed by a filesystem that changes after startup; it is only ever used for test data in unit or acceptance tests. We can simply do one grand scan at startup, build a table recording which directory each file is in, and from then on just look the file up in that table.
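
A minimal sketch of the scan-once idea, assuming plain Node.js `fs` and `path` rather than any Kettle API. The names `buildFileIndex` and `lookupFile` are hypothetical, and the sketch assumes file names are unique under the scanned root (a later duplicate simply overwrites an earlier entry):

```javascript
"use strict";

var fs = require("fs");
var path = require("path");

// Hypothetical helper: walk `rootDir` once, synchronously, and record for
// every file name the directory that contains it.
function buildFileIndex(rootDir) {
    var index = Object.create(null); // no inherited keys to collide with file names
    (function scan(dir) {
        fs.readdirSync(dir, {withFileTypes: true}).forEach(function (entry) {
            if (entry.isDirectory()) {
                scan(path.join(dir, entry.name));
            } else if (entry.isFile()) {
                index[entry.name] = dir;
            }
        });
    })(rootDir);
    return index;
}

// Hypothetical helper: resolve a file name against the prebuilt table.
// Returns the full path, or undefined if the file was not seen during the scan.
function lookupFile(index, fileName) {
    var dir = index[fileName];
    return dir === undefined ? undefined : path.join(dir, fileName);
}
```

Since the scan runs only once, a synchronous walk is probably acceptable here; files added after startup would not be found, which matches the stated assumption that the data does not change.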

Contributor Author

This would mean that we would need to index the entire filesystem on startup though, wouldn't it? We don't know which folder a request is going to be for until the request is made (i.e. how would we know that it's the testData/* folders, for example, that are relevant until we get a request for them? It might just as well be the /tmp folder or the like).

Contributor Author

Or am I missing something vital?

Member

Yes, that's right: we would index the filesystem tree under the mount point on startup.

}

// resolve path and filename
var resolvedPath = kettle.dataSource.URL.resolveUrl(that.options.path, that.options.termMap, directModel, true).replace("//", "/");
Member

We should take the opportunity to factor datasource-file properly: the two datasources can share a front-end which does this resolution, and share a back-end which actually sources the data.
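
One way to picture that split, as a rough sketch only. The names `resolveFilePath`, `directBackend`, `makeIndexedBackend` and `readDataSource` are hypothetical and are not Kettle APIs, and the `%term` substitution is a simplified stand-in, assumed for illustration, for whatever `kettle.dataSource.URL.resolveUrl` does in the diff:

```javascript
"use strict";

var fs = require("fs");
var path = require("path");

// Shared front end (hypothetical): expand "%term" placeholders in a path
// template from the request model, then normalise the separators.
function resolveFilePath(template, model) {
    var expanded = template.replace(/%(\w+)/g, function (match, term) {
        return Object.prototype.hasOwnProperty.call(model, term) ? String(model[term]) : match;
    });
    return path.normalize(expanded);
}

// Back end 1 (hypothetical): read the resolved path directly.
var directBackend = {
    read: function (resolvedPath) {
        return fs.readFileSync(resolvedPath, "utf8");
    }
};

// Back end 2 (hypothetical): ignore the directory part and look the file
// name up in a {fileName: directory} table built by a startup scan.
function makeIndexedBackend(index) {
    return {
        read: function (resolvedPath) {
            var name = path.basename(resolvedPath);
            if (!Object.prototype.hasOwnProperty.call(index, name)) {
                throw new Error("File not found in index: " + name);
            }
            return fs.readFileSync(path.join(index[name], name), "utf8");
        }
    };
}

// Shared entry point: the same resolution step feeds whichever back end is supplied.
function readDataSource(backend, template, model) {
    return backend.read(resolveFilePath(template, model));
}
```

Under this arrangement only the `read` step would differ between the plain and the recursive datasource; whether that much overlap actually exists depends on how the recursive variant ends up using the path.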

Contributor Author

I'm not sure this comment is still relevant. Since the code was changed to do a scan at startup, the path is no longer used, and there is generally very little overlap between the two components.

@kaspermarkus
Contributor Author

@amb26 This is ready for another round of review

@amb26 added the "mothballed" label (a PR that is indefinitely suspended, but not cancelled outright, and may resume) on Aug 16, 2018
@jobara changed the base branch from master to main on October 26, 2020 13:55