Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FDB-303 added decoupling between Key and fully typed 'CanonicalKey' #26

Merged
merged 21 commits into from
Sep 23, 2024

Conversation

danovaro
Copy link
Member

No description provided.

@codecov-commenter
Copy link

codecov-commenter commented Jun 17, 2024

Codecov Report

Attention: Patch coverage is 72.89377% with 148 lines in your changes missing coverage. Please review.

Project coverage is 64.12%. Comparing base (7f8e64c) to head (3bbb824).

Files with missing lines Patch % Lines
src/fdb5/database/Key.cc 76.77% 36 Missing ⚠️
src/fdb5/database/Catalogue.h 19.35% 25 Missing ⚠️
src/fdb5/database/Index.cc 52.94% 16 Missing ⚠️
src/fdb5/database/RetrieveVisitor.cc 0.00% 12 Missing ⚠️
src/fdb5/remote/RemoteFieldLocation.cc 0.00% 7 Missing ⚠️
src/fdb5/toc/AdoptVisitor.cc 0.00% 5 Missing ⚠️
src/fdb5/toc/TocStats.h 54.54% 5 Missing ⚠️
src/fdb5/tools/fdb-hide.cc 0.00% 4 Missing ⚠️
src/fdb5/tools/fdb-root.cc 0.00% 4 Missing ⚠️
src/fdb5/toc/TocCatalogueWriter.cc 80.00% 3 Missing ⚠️
... and 26 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #26      +/-   ##
===========================================
+ Coverage    63.84%   64.12%   +0.27%     
===========================================
  Files          237      238       +1     
  Lines        13583    13610      +27     
  Branches      1325     1313      -12     
===========================================
+ Hits          8672     8727      +55     
+ Misses        4911     4883      -28     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/fdb5/database/Catalogue.h Outdated Show resolved Hide resolved
void Schema::expand(const Key &field, WriteVisitor &visitor) const {
Key full(registry());
std::vector<Key> keys(3);
void Schema::expand(const CanonicalKey& field, WriteVisitor &visitor) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels the wrong way around - we are expanding a canonicalised key, and producing a non-canonicalised key?

@@ -75,7 +75,7 @@ class FDB {
// warning: not high-perf API - makes sure that all the requested fields are archived and there are no data exceeding the request
void archive(const metkit::mars::MarsRequest& request, eckit::DataHandle& handle);
// disclaimer: this is a low-level API. The provided key and the corresponding data are not checked for consistency
void archive(const Key& key, const void* data, size_t length);
void archive(const CanonicalKey& key, const void* data, size_t length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting the interface to use ApiKey?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Or just "Key" to make it nice and clean)

public: // methods

explicit CanonicalKey(const std::shared_ptr<TypesRegistry> reg = nullptr);
explicit CanonicalKey(eckit::Stream &, const std::shared_ptr<TypesRegistry> reg = nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand (a) why CanonicalKey and ApiKey have the same constructors, (b) why they both accept knowledge of the TypesRegistry.

Ideally neither of them will have the TypesRegistry. We should have an uncanonicalised key, then a component (the schema?) which has the TypesRegistry and produces a canonicalised key.

@danovaro danovaro changed the title FDB-303 added decoupling between CanonicalKey and fully typed 'ApiKey' FDB-303 added decoupling between Key and fully typed 'TypedKey' Jun 25, 2024
Copy link

@tbkr tbkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comment, mostly things of which I'm sure I'm just lacking information, as well as some commented code which is probably not necessary anymore.

@@ -112,7 +112,7 @@ void FDB::archive(const Key& key, const void* data, size_t length) {
if (stepunit->second.size()>0 && static_cast<char>(tolower(stepunit->second[0])) != 'h') {
auto step = keyInternal.find("step");
if (step != keyInternal.end()) {
std::string canonicalStep = config().schema().registry()->lookupType("step").toKey("step", step->second + static_cast<char>(tolower(stepunit->second[0])));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the toKey method now is the same as the tidy method. We should get rid of one

@@ -35,13 +35,11 @@ const std::string &Type::type() const {
return type_;
}

std::string Type::tidy(const std::string&,
const std::string &value) const {
std::string Type::tidy(const std::string &value) const {
return value;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

@@ -380,27 +380,15 @@ Key::Key() :
// }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are quite a lot of comments in the class. Is the code still needed? If not, remove it.

@@ -62,7 +62,7 @@ bool TocCatalogueWriter::selectIndex(const Key& idxKey) {
fdb5LustreapiFileCreate(indexPath.localPath(), stripeIndexLustreSettings());
}

indexes_[idxKey] = Index(new TocIndex(idxKey, indexPath, 0, TocIndex::WRITE));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole Catalouge is inserted here although only the schema (and maybe the key functionality/ dbKey) is needed. Is there any reason to insert the whole thing?

How are we treating raw pointer? Is this, per convention, interpreted as a transferred ownership? If so, we need a shared_pointer or whatever the convention is here.

@@ -1035,7 +1035,7 @@ std::vector<Index> TocHandler::loadIndexes(bool sorted,
s >> offset;
s >> type;
LOG_DEBUG(debug, LibFdb5) << "TocRecord TOC_INDEX " << path << " - " << offset << std::endl;
indexes.push_back( new TocIndex(s, r->header_.serialisationVersion_, currentDirectory(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remark above would probably get rid of this dynamic_cast, as well? Not entirely sure, whether this can be done like suggested.

@@ -34,7 +34,11 @@ TypeClimateMonthly::~TypeClimateMonthly() {

static int month(const std::string &value) {
if (isdigit(value[0])) {
eckit::Date date(value);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this coming from? Could you add a comment highlighting why this differentiation is necessary?

@@ -19,6 +19,7 @@ if (HAVE_FDB_BUILD_TOOLS) # test scripts use the fdb tools
add_subdirectory(FDB-282)
add_subdirectory(FDB-291)
add_subdirectory(FDB-292)
add_subdirectory(FDB-303)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no FDB-303 directory, forgot to commit ?

@@ -48,8 +52,7 @@ static int month(const std::string &value) {
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation is weird here (predates this PR)

@danovaro danovaro changed the title FDB-303 added decoupling between Key and fully typed 'TypedKey' FDB-303 added decoupling between Key and fully typed 'CanonicalKey' Sep 10, 2024

return ret;
}
BaseKey::~BaseKey() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer = default;

Comment on lines 98 to 101
for (eckit::StringDict::const_iterator i = keys_.begin(); i != keys_.end(); ++i) {
s << i->first << (registry ? canonicalise(i->first, i->second) : i->second);
s << i->first << canonicalise(i->first, i->second);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readable ?
for (const auto& [keyword, value] : keys_) { s << keyword << canonicalise(keyword, value); }

std::vector<std::string>::const_iterator k = p.begin();
std::vector<std::string>::const_iterator kend = p.end();
for (; k != kend; ++k) {
for (const auto& k : request.params()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming; k => keyword

bool found=false;
auto values = request.values(k);
std::string can = canonicalise(k, j->second);
for (auto it = values.begin(); !found && it != values.end(); it++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cbegin() would be better

it++ intentional ?

//----------------------------------------------------------------------------------------------------------------------

Key::Key() :
BaseKey() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant BaseKey() init

BaseKey(keys) {}

Key::Key(eckit::Stream& s) :
BaseKey() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant BaseKey() init

@@ -42,11 +42,11 @@ class RetrieveVisitor : public ReadVisitor {

// From Visitor

virtual bool selectDatabase(const Key &key, const Key &full) override;
virtual bool selectDatabase(const Key& dbKey, const TypedKey& fullComputedKey) override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove virtual, also below

@@ -36,7 +36,7 @@ class Store {
virtual ~Store() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer = default;

@@ -36,7 +36,7 @@ class Store {
virtual ~Store() {}

virtual eckit::DataHandle* retrieve(Field& field) const = 0;
virtual std::unique_ptr<FieldLocation> archive(const Key &key, const void *data, eckit::Length length) = 0;
virtual std::unique_ptr<FieldLocation> archive(const Key& idxKey, const void *data, eckit::Length length) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing include memory

@@ -85,7 +85,7 @@ static std::vector<metkit::mars::MarsRequest> make_filter_requests(const std::st

if(str.empty()) return std::vector<metkit::mars::MarsRequest>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be return {};

@@ -85,7 +85,7 @@ static std::vector<metkit::mars::MarsRequest> make_filter_requests(const std::st

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can move make_filter_requests to anonym namespace and remove static

@@ -26,6 +26,7 @@ namespace metkit { class MarsRequest; }

namespace fdb5 {

class Key;
class Key;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be TypedKey


if ((*cur)->match(k))
expand(request, next, depth, keys, full, visitor);
if ((*cur)->match(k.canonical()))
Copy link
Contributor

@mcakircali mcakircali Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attention, do we need k.canonical here, since visitor::values are canonical ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that match() takes an argument of type BaseKey. We should either:
i. Call canonical() inside the match() function, or
ii. Be incrementally canonicalising such that we don't fully canonicalise k (with lots of work and allocations) for each iteration of the for loop.

@@ -34,8 +34,8 @@ namespace fdb5 {

//----------------------------------------------------------------------------------------------------------------------

DaosIndex::DaosIndex(const Key& key, const fdb5::DaosName& name) :
IndexBase(key, "daosKeyValue"),
DaosIndex::DaosIndex(const Key& key, const Catalogue* catalogue, const fdb5::DaosName& name) :
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the policy on handling raw pointers? Is this owner ship transferal? Would a shared pointer be ok, otherwise?

@@ -27,9 +27,9 @@ class DaosIndex : public IndexBase {
public: // methods

/// @note: creates a new index in DAOS, in the container pointed to by 'name'
DaosIndex(const Key& key, const fdb5::DaosName& name);
DaosIndex(const Key& key, const Catalogue* catalogue, const fdb5::DaosName& name);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See raw pointer comment above.

@@ -121,6 +123,9 @@ class IndexBase : public eckit::Counted {

Indexer indexer_;

const Catalogue* catalogue_;
mutable std::shared_ptr<TypesRegistry> registry_;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also consistency between the raw pointer and shared pointer concept.

Key::Key(const std::shared_ptr<TypesRegistry> reg, bool canonical) :
keys_(),
registry_(reg), canonical_(canonical) {}
// BaseKey::BaseKey(const std::shared_ptr<TypesRegistry> reg) :
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments can be deleted.

void Key::encode(eckit::Stream& s) const {
const TypesRegistry* registry = canonical_ ? nullptr : &this->registry();
void BaseKey::encode(eckit::Stream& s) const {
// const TypesRegistry* registry = (registry_ ? registry_.get() : nullptr);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment.


for (std::vector<Rule *>::const_iterator i = rules_.begin(); i != rules_.end(); ++i ) {
for (Rule* r : rules_) {
// eckit::Log::info() << "Rule " << **i << std::endl;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment.

std::vector<PathName> RootManager::allRoots(const Key& key)
{
eckit::StringSet roots;
// std::vector<PathName> RootManager::allRoots(const Key& key)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment.


/// Lists the roots that can be visited given a DB key
std::vector<eckit::PathName> allRoots(const Key& key);
// std::vector<eckit::PathName> allRoots(const Key& key);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation has been removed. So remove this.

@@ -74,20 +74,20 @@ bool TocCatalogueWriter::selectIndex(const Key& key) {

if (useSubToc()) {

if (fullIndexes_.find(key) == fullIndexes_.end()) {
if (fullIndexes_.find(idxKey) == fullIndexes_.end()) {

// TODO TODO TODO .master.index
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be rather important, if it is stated thrice. Or simply delete it.

@@ -151,11 +151,11 @@ void TocCatalogueWriter::reconsolidateIndexesAndTocs() {
writer_(writer) {}
~ConsolidateIndexVisitor() override {}
private:
void visitDatum(const Field& field, const Key& key) override {
void visitDatum(const Field& field, const TypedKey& datumKey) override {
// TODO: Do a sneaky schema.expand() here, prepopulated with the current DB/index/Rule,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this todo still valid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Remove it.

@@ -129,7 +129,7 @@ void MultiRetrieveVisitor::values(const metkit::mars::MarsRequest &request,
}

for(auto l: list) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid copy; for (const auto& l : list) {
naming; l -> value

@@ -21,6 +21,7 @@

#include "fdb5/database/DB.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused include
(in general, the DB-Catalogue-Store trio has include issues among each other)

auto step = keyInternal.find("step");
if (step != keyInternal.end()) {
std::string canonicalStep = keyInternal.registry().lookupType("step").toKey("step", step->second+stepunit->second);
std::string canonicalStep = config().schema().registry()->lookupType("step").toKey(step->second + static_cast<char>(tolower(stepunit->second[0])));
keyInternal.set("step", canonicalStep);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elegant to move registry out of the key
can we pin the registry above and reuse it in the loop, such as;

    Key keyInternal(key);
    const auto registry = config().schema().registry();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which loop are you referring to here? I don't see a loop.

Comment on lines 22 to 25
#include "eckit/utils/StringTools.h"

#include "eckit/system/Plugin.h"
#include "eckit/system/LibraryManager.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused includes

@@ -26,7 +26,6 @@ class Catalogue;
class Store;
class FDBToolRequest;
class Index;
class Key;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypedKey ?

@@ -125,28 +128,38 @@ void IndexBase::encodeLegacy(eckit::Stream& s, const int version) const {
s << type_;
}

void IndexBase::put(const Key &key, const Field &field) {
void IndexBase::put(const Key& key, const Field &field) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe align both refs to left for consistency

@@ -33,14 +33,14 @@ class AdoptVisitor : public BaseArchiveVisitor {
public: // methods

AdoptVisitor(Archiver &owner,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align all refs to left ?

Comment on lines 42 to 46

virtual bool selectDatum(const Key &key, const Key &full) override;
virtual bool selectDatum(const TypedKey& datumKey, const TypedKey& fullComputedKey) override;

virtual void print( std::ostream &out ) const override;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing (forward) decl -> TypedKey
redundant virtuals

@@ -20,23 +20,23 @@ namespace fdb5 {

//----------------------------------------------------------------------------------------------------------------------

AdoptVisitor::AdoptVisitor(Archiver &owner, const Key &field, const PathName &path, Offset offset, Length length) :
BaseArchiveVisitor(owner, field),
AdoptVisitor::AdoptVisitor(Archiver &owner, const Key& initialFieldKey, const PathName &path, Offset offset, Length length) :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good practice -> add const to offset and length

@@ -173,7 +173,7 @@ void DB::reconsolidate() {
cat->reconsolidate();
}

void DB::index(const Key &key, const eckit::PathName &path, eckit::Offset offset, eckit::Length length) {
void DB::index(const Key& key, const eckit::PathName &path, eckit::Offset offset, eckit::Length length) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good practice -> const offset and length

Comment on lines +111 to 114
for (const auto& m : matching_) {
const Index& idx(m->first);
Key remapKey = m->second;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safer:

for (const auto* match : matching_) {
    *if (!match) { continue; }*
    const auto& [idx, remapKey] = *match;

better, matching_ -> replace with std::forward_list<const pair<>&> or use std::vector<std::reference_wrapper<>>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attention, matching_ contains ptr (not ref); guard as in my comment above or ASSERT()

@@ -151,11 +151,11 @@ void TocCatalogueWriter::reconsolidateIndexesAndTocs() {
writer_(writer) {}
~ConsolidateIndexVisitor() override {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

off topic:
redundant dtor, avoid it! no use, on top it disables move semantics (if any) on inheritance

@@ -52,7 +51,7 @@ class EntryVisitor : public eckit::NonCopyable {

private: // methods

virtual void visitDatum(const Field& field, const Key& key) = 0;
virtual void visitDatum(const Field& field, const TypedKey& datumKey) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical decision to choose TypeKey here..
if all visitDatum implementations need Key (hence each calls canocinal), then leave it as Key and let parent do it ..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, using TypedKey is correct.
This defers the decision on whether we need to do canonicalisation (which is expensive) to the class which does the using.

src/fdb5/toc/TocCatalogueWriter.cc Show resolved Hide resolved
Comment on lines 65 to 67
virtual bool selectIndex(const Key& idxKey) override;
virtual void deselectIndex() override;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chance to remove virtual

@@ -53,8 +53,8 @@ class IndexBase : public eckit::Counted {

public: // methods

IndexBase(const Key& key, const std::string& type);
IndexBase(eckit::Stream& s, const int version);
IndexBase(const Key& key, const std::string& type, const Catalogue* catalogue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not this PR but mismatched ordering

@@ -161,7 +161,7 @@ class TocStatsReportVisitor : public virtual StatsReportVisitor {

bool visitDatabase(const Catalogue& catalogue, const Store& store) override;
void visitDatum(const Field& field, const std::string& keyFingerprint) override;
void visitDatum(const Field& field, const Key& key) override { NOTIMP; }
void visitDatum(const Field& field, const TypedKey& datumKey) override { NOTIMP; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attention #26 (comment)

@@ -128,7 +128,7 @@ std::unique_ptr<FieldLocation> TocStore::archive(const Key &key, const void *dat

ASSERT(len == length);

return std::unique_ptr<TocFieldLocation>(new TocFieldLocation(dataPath, position, length, Key(nullptr, true)));
return std::unique_ptr<TocFieldLocation>(new TocFieldLocation(dataPath, position, length, Key()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while at it, fix -> std::make_unique(dataPath, position, length, Key())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do. It doesn't really matter (the only case where std::make_unique makes a difference is if you have a constructor to an object which owns multiple dynamically allocated objects, where the constructor of one of those objects throws after the other has been constructed, and you want to ensure there is no memory leak.

ASSERT(schema.expandFirstLevel(dbrequest.request(), dbkey));

std::unique_ptr<Catalogue> db = CatalogueFactory::instance().build(dbkey, conf, true);
std::unique_ptr<Catalogue> db = CatalogueFactory::instance().build(dbkey.canonical(), conf, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db and dbWrited (below) could share a const auto dbCanonicalKey = dbkey.canonical();

@@ -65,16 +65,16 @@ void FdbRoot::execute(const eckit::option::CmdArgs& args) {

Config conf = config(args);
const Schema& schema = conf.schema();
Key result;
TypedKey result{conf.schema().registry()};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dbReader and dbWriter (below) could share a const auto key = result.canonical();

Comment on lines +498 to +502
for (BaseKey::const_iterator j = other.begin(); j != other.end(); ++j) {
const std::string& keyword = (*j).first;
BaseKey::const_iterator k = find(keyword);
if (k == keys_.end()) {
missing.insert(keyword);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!
auto ?
keys_.end() -> end()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small thing but keys_.end() -> end()

Copy link
Contributor

@mcakircali mcakircali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well done. There's opportunity to fix some non-pr code quality issues. Couple of things need attention, but there's a major issue of segfault, which's why I cannot approve it. Please see the attention/critical comments.

Copy link
Contributor

@mcakircali mcakircali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's an object slicing, pls see my comment above.

protected: // members

const Key& field_;
const Key& initialFieldKey() { return initialFieldKey_; }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing const -> const Key& initialFieldKey() const { return initialFieldKey_; }

Comment on lines +498 to +502
for (BaseKey::const_iterator j = other.begin(); j != other.end(); ++j) {
const std::string& keyword = (*j).first;
BaseKey::const_iterator k = find(keyword);
if (k == keys_.end()) {
missing.insert(keyword);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small thing but keys_.end() -> end()

Comment on lines 128 to 129
const std::string& keyStr = uri.query("remapKey");
if (!keyStr.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simply const std::string, no ref

Comment on lines 339 to 340
BaseKey(key), registry_(reg) {}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attention, object slicing will cause issues later
we can reduce contention as registry_(std::move(req))

BaseKey(key), registry_(reg) {}

TypedKey::TypedKey(std::shared_ptr<const TypesRegistry> reg) :
BaseKey({}), registry_(reg) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove BaseKey({}) since we have default ctor
we can reduce contention as registry_(std::move(req))

Comment on lines 33 to 34
virtual void getValues(const metkit::mars::MarsRequest &request,
const std::string &keyword,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could remove virtual


virtual std::string toKey(const std::string& keyword,
const std::string& value) const override;
std::string toKey(const std::string& value) const override;

virtual void getValues(const metkit::mars::MarsRequest &request,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could remove virtual

@@ -28,7 +28,7 @@ class TypeParam : public Type {

TypeParam(const std::string &name, const std::string &type);

virtual ~TypeParam() override;
~TypeParam() override;

virtual void getValues(const metkit::mars::MarsRequest &request,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more virtual

@@ -28,22 +28,21 @@ class TypeStep : public Type {

TypeStep(const std::string &name, const std::string &type);

virtual ~TypeStep() override;
~TypeStep() override;

virtual void getValues(const metkit::mars::MarsRequest &request,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove virtual

Comment on lines +111 to 114
for (const auto& m : matching_) {
const Index& idx(m->first);
Key remapKey = m->second;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attention, matching_ contains ptr (not ref); guard as in my comment above or ASSERT()

Copy link
Contributor

@simondsmart simondsmart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look in Schema.h/cc at the ownership of the TypesRegistry. It is currently a shared_ptr - this was because the types registry could be accessed by the Key object, and we could not guarantee that the lifetime of Key objects was less than the lifetime of the schema, as they were returned by the API.

I believe that now all items returned over the API are 'clean' in this regard, and do not require extension of the life of the TypesRegistry - so this should be able to belong to the Schema (and be accessed by reference from elsewhere). If we can fix this, it is a strong indication that we have removed a code smell.

Overall I am really happy with this PR. Once the comments have been addressed, I'm happy for Emanuele to merge without further review from me.


//TODO add unit test for each type
std::string canonicalise(const std::string& keyword, const std::string& value) const;
virtual std::string canonicalise(const std::string& keyword, const std::string& value) const = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do canonicalise() and type() belong to the base class - surely these are functions that can only be called on one of the derived classes?


private: // members

std::string canonicalise(const std::string& keyword, const std::string& value) const override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment to effect that this method is used in the comparison function when passed a BaseKey (otherwise it is odd that this method exists)

private: // members

std::string canonicalise(const std::string& keyword, const std::string& value) const override;
std::string type(const std::string& keyword) const override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should never be called on Key (only ever on TypedKey). Please remove it from here and from the base class.

auto step = keyInternal.find("step");
if (step != keyInternal.end()) {
std::string canonicalStep = keyInternal.registry().lookupType("step").toKey("step", step->second+stepunit->second);
std::string canonicalStep = config().schema().registry()->lookupType("step").toKey(step->second + static_cast<char>(tolower(stepunit->second[0])));
keyInternal.set("step", canonicalStep);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which loop are you referring to here? I don't see a loop.

@@ -52,7 +51,7 @@ class EntryVisitor : public eckit::NonCopyable {

private: // methods

virtual void visitDatum(const Field& field, const Key& key) = 0;
virtual void visitDatum(const Field& field, const TypedKey& datumKey) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, using TypedKey is correct.
This defers the decision on whether we need to do canonicalisation (which is expensive) to the class which does the using.

@@ -151,11 +151,11 @@ void TocCatalogueWriter::reconsolidateIndexesAndTocs() {
writer_(writer) {}
~ConsolidateIndexVisitor() override {}
private:
void visitDatum(const Field& field, const Key& key) override {
void visitDatum(const Field& field, const TypedKey& datumKey) override {
// TODO: Do a sneaky schema.expand() here, prepopulated with the current DB/index/Rule,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Remove it.

@@ -1035,7 +1035,7 @@ std::vector<Index> TocHandler::loadIndexes(bool sorted,
s >> offset;
s >> type;
LOG_DEBUG(debug, LibFdb5) << "TocRecord TOC_INDEX " << path << " - " << offset << std::endl;
indexes.push_back( new TocIndex(s, r->header_.serialisationVersion_, currentDirectory(),
indexes.push_back( new TocIndex(s, *(dynamic_cast<const TocCatalogue*>(this)), r->header_.serialisationVersion_, currentDirectory(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this at all.

The purpose of dynamic_cast is that it uses RTTI to check that the cast is valid - and returns null if not. Here we are assuming that the return value is non-null without checking - either negating the point (which is that it should be used where we cannot assume this), or creating a possible segfault location.

One alternative would be to use dynamic_cast<const TocCatalogue&> which would throw an exception in the case that this was wrong.

Another would be to use a static_cast. Or (at the top of the function) we could ASSERT that the dynamic cast is non-null, and then use the returned value elsewhere in the function.

But this is telling us that something is not right with the type hierarchy. Either this function ought to move to TocCatalogue, or it ought to accept an argument of type "const Catalogue&". Probably the latter.

This code is smelly.

@@ -128,7 +128,7 @@ std::unique_ptr<FieldLocation> TocStore::archive(const Key &key, const void *dat

ASSERT(len == length);

return std::unique_ptr<TocFieldLocation>(new TocFieldLocation(dataPath, position, length, Key(nullptr, true)));
return std::unique_ptr<TocFieldLocation>(new TocFieldLocation(dataPath, position, length, Key()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do. It doesn't really matter (the only case where std::make_unique makes a difference is if you have a constructor to an object which owns multiple dynamically allocated objects, where the constructor of one of those objects throws after the other has been constructed, and you want to ensure there is no memory leak.

@@ -34,7 +34,11 @@ TypeClimateMonthly::~TypeClimateMonthly() {

static int month(const std::string &value) {
if (isdigit(value[0])) {
eckit::Date date(value);
int n = stoi(value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check that the string length <= 2 here. Otherwise we will be doing a lot of extra string work in most cases.

set +e
grep "Invalid expver value xxxx" out.1 # should fail
set -e
! grep "Invalid expver value xxxx" out.1 # should fail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice trick. I didn't know this one.

@danovaro danovaro merged commit a310849 into develop Sep 23, 2024
101 of 111 checks passed
mcakircali added a commit that referenced this pull request Oct 8, 2024
@danovaro danovaro deleted the feature/canonicalKey branch October 31, 2024 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants