Review comments from MarkusK #32

Open
wants to merge 1 commit into
base: copyFromPaper
from

mkuehbach
Collaborator

@lukaspie agree on this one

Collaborator

lukaspie left a comment

Mostly good, but for some changes, I don't see the reason. Eventually, it would be nice if the suggested changes on the individual content elements (not on the overall structure) were added to the PR with the original draft, so that we avoid having the same conversation across multiple PRs.

@@ -95,17 +98,17 @@ with the available options:
converted, ensuring version consistency.
--do-not-store-nxdl Prevent the input NXDL file from being stored as a
comment at the end of the output YAML file.
--verbose Display keywords and value types in standard output to
assist in identifying issues in YAML files.
--verbose Print keywords and value types to the standard output stream
Collaborator

This is just the output of `nyaml2nxdl --help`, so if we want to change this, we should do it directly in the code.

Collaborator Author

Yes, these changes should be made directly in the code.
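
The options in the `--help` output above can also be assembled from a script, for example to batch-convert several schemas. A minimal sketch, assuming the installed command is `nyaml2nxdl` and that it takes the input file as a positional argument; `build_nyaml2nxdl_args` is a hypothetical helper and not part of `nyaml`:

```python
from typing import List, Optional


def build_nyaml2nxdl_args(
    input_file: str,
    output_file: Optional[str] = None,
    check_consistency: bool = False,
    do_not_store_nxdl: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Assemble an argv list for nyaml2nxdl from the options listed in --help."""
    args = ["nyaml2nxdl"]
    if output_file is not None:
        # --output-file takes the desired file name (including the extension).
        args += ["--output-file", output_file]
    if check_consistency:
        args.append("--check-consistency")
    if do_not_store_nxdl:
        args.append("--do-not-store-nxdl")
    if verbose:
        args.append("--verbose")
    # Assumption: the input file is a positional argument given last.
    args.append(input_file)
    return args
```

The resulting list could then be passed to `subprocess.run(..., check=True)`; keeping the argument list separate from the call makes it testable without invoking the converter.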

The [nyaml](https://github.com/FAIRmat-NFDI/nyaml/tree/main) Python package is a tool for bidirectional conversion between NXDL data schemas expressed in `YAML` and in `XML`.
This `README.md` documents the simplified notation with which users can write base class schemas or application definition schemas in `YAML`. Note that this `README.md` does not introduce the capabilities of the NeXus Definition Language (specifically its NeXus objects, terms, or types); for these, please refer to the official NeXus documentation on the [NeXus official site](https://www.nexusformat.org/).

## Table of contents
Collaborator

I think the actual order of contents should be discussed after we have finished the contents for each section.

Collaborator Author

I don't mind when the order is discussed. Installation, though, as you point out, is also one of the first needs when reading linearly through this README.md.

Make working with the NeXus Schema Definition Language more convenient for end users.
NeXus and its NeXus Definition Language (NXDL) represent a concerted effort to standardize the terminology and granularity for the exchange of serialized data and metadata within and across scientific communities. NeXus is rooted in neutron, X-ray, and muon research [J. Appl. Cryst. (2015). 48, 301-305](https://doi.org/10.1107/S1600576714027575). The tool nyaml is an effort of members of the solid-state physics community within the German National Research Data Infrastructure ([German NFDI](https://www.github.com/FAIRmat-NFDI)) to extend NeXus for standardized information exchange in the research fields of materials characterization.

NeXus describes concepts through general data storage objects (so-called base classes). From these building blocks, a so-called application definition is composed. This is a measurement- and instrument-specific graph that defines which pieces of information are communicated with instance data such as files or database artifacts. Base classes and application definitions are defined in a so-called NXDL schema definition file written in the Extensible Markup Language, [XML](https://www.w3.org/TR/REC-xml/REC-xml-20081126.xml). The nyaml tool makes working with and editing NXDL schema definitions more efficient by using YAML Ain't Markup Language ([YAML](https://yaml.org/)), whose indentation-driven approach eliminates the need for editing start and end XML tags. As a result, the schema definitions read more concisely, and the class inheritance that NeXus allows for is easier to grasp.
Collaborator

I like the idea of introducing NeXus a bit earlier than what was in the initial draft. However, here there are some things removed/shortened which should still be there IMO, so we should aim to merge the two paragraphs here and in the original draft.

Collaborator Author

Happy to take suggestions; the introductory part was mostly a copy of content from a NIAC site. I see value in this being shorter. Indeed, when writing the documentation for these tools, I would rather test the idea of keeping it less verbose but to the point this time.

NXDL is used to define general data storage objects (base classes) and to use them as the building blocks for defining measurement-specific or even instrument-specific data storage objects (application definitions). In this process, members and definitions of individual base classes can be used as is or customized. In essence, developing a schema, whether a base class or an application definition, entails writing an NXDL schema definition file with the extension `nxdl.xml` in the Extensible Markup Language, [XML](https://www.w3.org/TR/REC-xml/REC-xml-20081126.xml).

To expedite the schema development process, we have introduced a notation based on [YAML](https://yaml.org/) that is specifically tailored for defining scientific domain-driven schemas with NXDL. One significant advantage of YAML over XML is its indentation-driven approach, which eliminates the need for start and end tags for each entity within the schema. The `YAML` notation reduces the repetition of NXDL keywords, and its `Name(Parent)` notation mirrors the Python syntax for class inheritance, which makes inheritance easier to grasp. These benefits are attained without compromising the integrity of the original NeXus schema, which is traditionally expressed in XML format.
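
To illustrate why no closing tags have to be typed: once the indentation-structured YAML has been parsed into a nested Python dictionary, the XML nesting, including every end tag, can be generated mechanically. The sketch below uses only the standard library and a deliberately tiny, hypothetical subset of the notation (group keys of the form `name(NXclass)` plus `doc`); it is not nyaml's implementation, and the schema name `NXexample` is made up:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified subset: "name(NXclass)" becomes a <group>,
# "doc" becomes a <doc> child. The real converter handles many more keywords.
GROUP_KEY = re.compile(r"^(?P<name>\w*)\((?P<type>NX\w+)\)$")


def mapping_to_xml(parent: ET.Element, mapping: dict) -> None:
    """Recursively turn a YAML-style nested mapping into nested XML elements."""
    for key, value in mapping.items():
        if key == "doc":
            ET.SubElement(parent, "doc").text = str(value)
            continue
        match = GROUP_KEY.match(key)
        if match is None:
            raise ValueError(f"unsupported key in this sketch: {key!r}")
        attrs = {}
        if match["name"]:  # "(NXentry)" has no explicit name
            attrs["name"] = match["name"]
        attrs["type"] = match["type"]
        group = ET.SubElement(parent, "group", attrs)
        mapping_to_xml(group, value or {})


# The indentation of the YAML source becomes the nesting of this dict ...
schema = {
    "(NXentry)": {
        "doc": "An entry.",
        "source_TYPE(NXsource)": {"doc": "A source."},
    }
}
root = ET.Element("definition", {"name": "NXexample", "extends": "NXobject"})
mapping_to_xml(root, schema)
# ... and ElementTree writes every closing tag, so none is typed by hand.
print(ET.tostring(root, encoding="unicode"))
```
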
## Getting started with nyaml
Collaborator

This should come close to the beginning (maybe renamed to Installation), since we want people to use this tool.

Collaborator Author

Agree on where to place the benefit; however, why not as bullet points rather than prose?

@@ -64,27 +83,11 @@ graph TD;
id6-->id8
```

Given an input file, the `nyaml` converter checks the file type and calls the appropriate converter. For an XML file, the XML converter parses the `XML` file into an `XML` tree object using the [lxml](https://lxml.de/) Python library. Following the NXDL rules, the converter writes the application definition or base class into a `yaml` file using the `nyaml` syntax. If the input file is `yaml`, the `yaml` converter collects the comments in a `Comments` object and parses the `yaml` file into a Python `dictionary` object. The application definition or base class is then written into an `XML` file from the `Comments` and Python `dictionary` objects.
The tool is a command-line application that accepts input in either YAML or XML and automatically converts it into the other format. Functionalities of the [lxml](https://lxml.de/) Python library are used to process an `XML` tree object. Following the NXDL rules, the converter writes the application definition or base class into a `yaml` file using the `nyaml` syntax. If the input file is `yaml`, the `yaml` converter collects the comments in a `Comments` object and parses the `yaml` file into a Python `dictionary` object. The application definition or base class is then written into an `XML` file from the `Comments` and Python `dictionary` objects.
Collaborator

Not a big fan of the first sentence, seems a bit verbose to me. I think combining the two drafts here would be good.

--help Show this message and exit.
```

Use the `--output-file` option to define the output file name (including the extension); otherwise, the converter derives the output file name from the input, e.g. the input file `NXapplication.nxdl.xml` (`NXapplication.yaml`) results in `NXapplication_parser.yaml` (`NXapplication.nxdl.xml`). With the option `--check-consistency`, the converter produces the same type of file as the input, e.g. for the input `NXapplication.nxdl.xml` the output file is `NXapplication_consistency.nxdl.xml`. This option is intended to verify that the file and its version are converted properly. When converting an `nxdl.xml` file into `yaml`, the converter also stores the `nxdl.xml` content at the end of the `yaml` file together with a hash. The option `--do-not-store-nxdl` prevents the `nxdl.xml` text from being stored in the `yaml` file. The `--verbose` option helps to identify issues when a conversion from one format to the other gives unexpected results.
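
The default naming rule described above can be written as a small function that reproduces the `NXapplication` examples. This is a hypothetical helper for illustration, not code from `nyaml`; the name produced by a `--check-consistency` run on a YAML input is an assumption, since only the XML case is documented here:

```python
def default_output_name(input_file: str, check_consistency: bool = False) -> str:
    """Derive an output file name following the documented examples.

    Hypothetical helper; it is not taken from the nyaml source code.
    """
    if input_file.endswith(".nxdl.xml"):
        stem = input_file[: -len(".nxdl.xml")]
        if check_consistency:
            # Same file type as the input: NXapplication_consistency.nxdl.xml
            return f"{stem}_consistency.nxdl.xml"
        # XML -> YAML: NXapplication_parser.yaml
        return f"{stem}_parser.yaml"
    if input_file.endswith(".yaml"):
        stem = input_file[: -len(".yaml")]
        if check_consistency:
            # Assumption: the YAML round trip mirrors the XML naming.
            return f"{stem}_consistency.yaml"
        # YAML -> XML: NXapplication.nxdl.xml
        return f"{stem}.nxdl.xml"
    raise ValueError(f"expected a .yaml or .nxdl.xml file, got {input_file!r}")
```

Checking `.nxdl.xml` before `.yaml` keeps the two-part XML extension from being misread; any other extension is rejected rather than guessed.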

## Conversion from YAML to XML
Presented below is a concise, trimmed example of the `NXmpes` application definition in `YAML` format alongside its translation into `XML` format. Subsequently, the fundamental rules governing this conversion process are explained. For a comprehensive understanding of the basic structure of NXDL, readers are encouraged to explore the [NeXus Manual](https://manual.nexusformat.org/user_manual.html). Throughout the following discussion, various components of the NXmpes application definition are discussed in the light of the `nyaml` converter.
## Conversion from `YAML` to `YAML` to `XML`
Collaborator

Why YAML twice?

Collaborator Author

Mistake


### Root section for base classes and application definitions:
Within the YAML format, the root section denotes the top-level description of the application definition or base class schema, comprising the `category`, `type`, `doc`, `symbols` block, and the name of the schema (e.g. `NXmpes(NXobject)`). Correspondingly, the root section refers to the XML element `definition`, encompassing the first `doc` child of the `definition` and `symbols`. The definition element encapsulates essential XML attributes such as the schema's `name` (an XML attribute), the object it `extends` (an XML attribute), and the schema `type` (an XML attribute), with additional XML attributes (e.g. `xmlns:xsi`) handled by the nyaml converter. The accurate designation of the category as either `base` or `application` distinguishes between an `application definition` and a `base class`. The parentheses in the schema name (e.g. `NXmpes(NXobject)`) indicate what the current application definition extends, noting that base classes must extend `NXobject`, whereas application definitions may extend either `NXobject` or another `application definition` (excluding base classes). Schemas may incorporate one or multiple symbols, each imbued with specialized physical meanings beyond their literal interpretation, which are used throughout the application definition.
Within the YAML format, the root section denotes the top-level description of the application definition or base class schema, comprising the `category`, `type`, `doc`, `symbols` block, and the name of the schema (e.g. `NXmpes(NXobject)`). Correspondingly, the root section refers to the XML element `definition`, encompassing the first `doc` child of the `definition` and `symbols`. The definition element encapsulates essential XML attributes such as the `name` (an XML attribute), the object it `extends` (an XML attribute), and the schema `type` (an XML attribute), with additional XML attributes (e.g. `xmlns:xsi`) handled by the nyaml converter. The category specifies whether a `base class` or an `application definition` is defined. In the schema name (e.g. `NXmpes(NXobject)`), `NXmpes` is supplemented with a class name in parentheses that defines which concept NXmpes extends. A base class can only extend `NXobject`, whereas an application definition extends either `NXobject` or another `application definition` (excluding base classes). Schemas may incorporate one or multiple symbols, each imbued with specialized physical meanings beyond their literal interpretation.
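
The root-section rules can be sketched with the Python standard library: split the schema name `NXmpes(NXobject)` into `name` and `extends`, reject a base class that extends anything other than `NXobject`, and build the `definition` element with its first `doc` child. `root_definition` is a hypothetical, simplified helper; it does not check whether the parent of an application definition is itself an application definition, and it leaves out `symbols` and namespace attributes such as `xmlns:xsi`. The default `type="group"` and the `category` attribute follow the values used in NXDL files:

```python
import re
import xml.etree.ElementTree as ET

# "NXmpes(NXobject)" -> name "NXmpes", extends "NXobject" (simplified pattern).
ROOT_NAME = re.compile(r"^(?P<name>NX\w+)\((?P<extends>NX\w+)\)$")


def root_definition(
    header: str, category: str, doc: str, schema_type: str = "group"
) -> ET.Element:
    """Build an NXDL <definition> element from a YAML root section (sketch)."""
    match = ROOT_NAME.match(header)
    if match is None:
        raise ValueError(f"expected 'NXname(NXparent)', got {header!r}")
    if category not in ("base", "application"):
        raise ValueError(f"unknown category {category!r}")
    if category == "base" and match["extends"] != "NXobject":
        # Base classes may only extend NXobject.
        raise ValueError("a base class must extend NXobject")
    definition = ET.Element(
        "definition",
        {
            "name": match["name"],
            "extends": match["extends"],
            "type": schema_type,
            "category": category,
        },
    )
    # The first <doc> child carries the schema-level description.
    ET.SubElement(definition, "doc").text = doc
    return definition
```

For instance, `root_definition("NXmpes(NXobject)", "application", "...")` yields a `definition` element named `NXmpes` that extends `NXobject`.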
Collaborator

I like this paragraph, it is a bit more concise than originally while still retaining the same meaning.

@@ -303,19 +306,22 @@ symbols:
NXmpes(NXobject):
```

### NeXus Group
[NeXus groups](https://manual.nexusformat.org/design.html#design-groups), as instances of NeXus base classes, embody the compositional structure of application definitions. These groups can be initialized dynamically or statically, each approach offering distinct advantages.
### Uniqueness of NeXus concepts
Collaborator

This paragraph does not seem very easy to understand for outsiders.

Collaborator Author

Agree. For end users, I think it can be as simple as: base classes are a concept glossary with some value constraints, and application definitions are building-block models with constraints (on values and existence). Much of this should be hyperlinks, maybe to specific portions of the NIAC docs (no point in repeating them), or even better to an edited version of our tech paper about NeXus, which should i) go deeper than the building-block analogy above, ii) not repeat what people can find in the NIAC documentation, but iii) contain further conceptual additions to NeXus.


Descriptive information about NeXus groups is encapsulated within the `doc` child of the respective group. Note that the group annotation `source_TYPE(NXsource)` or `(NXsource)source_TYPE` denotes a group whose `name` is `source_TYPE` and whose type is the `NXsource` base class. The two syntaxes are equivalent; only the order of `name` and `type` is inverted.

Furthermore, the uppercase part of the group's name can be dynamically overwritten, allowing for multiple instances. For example, `source_electric` and `source_magnetic` can coexist as instances of `NXsource`. It is essential to adhere to these uppercase naming rules for NeXus groups, fields, and attributes.
Uppercase substrings of a name can be dynamically overwritten, allowing for multiple instances. For example, `source_electric` and `source_magnetic` can coexist as instances of `NXsource`.
Collaborator

good change IMO

@@ -465,10 +471,12 @@ The `xref` keyword used inside the `doc` to refer any other ontology or any othe
```
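
For the group notation discussed in the NeXus group hunk above, both `source_TYPE(NXsource)` and `(NXsource)source_TYPE` describe a group named `source_TYPE` of type `NXsource`, and the uppercase part of the name may be replaced in instance data (e.g. `source_electric`). A simplified Python sketch of both checks; `parse_group_key` and `matches_template` are hypothetical helpers, and the replacement rule (an uppercase run becomes lowercase letters, digits, or underscores) is an assumption that is narrower than the full NeXus naming rules:

```python
import re
from typing import Tuple

# The two equivalent notations: "source_TYPE(NXsource)" and "(NXsource)source_TYPE".
NAME_THEN_TYPE = re.compile(r"^(?P<name>\w*)\((?P<type>NX\w+)\)$")
TYPE_THEN_NAME = re.compile(r"^\((?P<type>NX\w+)\)(?P<name>\w*)$")


def parse_group_key(key: str) -> Tuple[str, str]:
    """Return (name, type) for a YAML group key; the name may be empty."""
    for pattern in (NAME_THEN_TYPE, TYPE_THEN_NAME):
        match = pattern.match(key)
        if match:
            return match["name"], match["type"]
    raise ValueError(f"not a group key: {key!r}")


def matches_template(instance: str, template: str) -> bool:
    """Check whether a concrete name fits a template such as 'source_TYPE'.

    Simplified rule (assumption): each uppercase run in the template may be
    replaced by one or more lowercase letters, digits, or underscores;
    all other characters must match literally.
    """
    # The capturing group keeps the uppercase runs in the split result.
    parts = re.split(r"([A-Z]+)", template)
    regex = "".join(
        "[a-z0-9_]+" if part.isupper() else re.escape(part) for part in parts
    )
    return re.fullmatch(regex, instance) is not None
```

With these helpers, `source_electric` and `source_magnetic` both match `source_TYPE`, while an unrelated name such as `detector` does not.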

## Conclusion
Defining a NeXus application definition or base class in YAML format is not an official structure of NeXus, but a format that reduces the effort for the application developer to construct an application definition or base class. `nyaml` is the tool that converts application definitions or base classes from `YAML` format to `nxdl.xml` (`XML`) format without requiring any knowledge of `XML` style or syntax. This is open-source software funded by [NFDI](https://www.nfdi.de/) under the FAIRmat project and hosted in a GitHub repository; therefore, anyone can create an issue to report a bug or suggest an improvement, and contributions are welcome. `nyaml` is also [published on PyPI](https://pypi.org/project/nyaml/) and can be installed with the `pip` Python package manager.
The Python software `nyaml` is an open-source project to reduce typing effort when working with NeXus data schemas. As a tool developed for software developers, data stewards, and scientists, `nyaml` converts base classes and application definitions bidirectionally between `YAML` and the `XML` serialization of the NeXus Definition Language. The tool is open source and available through [PyPI](https://pypi.org/project/nyaml/).
Contributions in the form of bug reports and other suggestions for improvement are welcome on this git repository. The work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 460197019 (FAIRmat). FAIRmat is a consortium within the German National Research Data Infrastructure ([German NFDI](https://www.nfdi.de/)). The software has been accepted as a community contribution by the NeXus International Advisory Committee (NIAC), which substantiates the cross-community interaction and the efforts to improve the interoperability of serialized data artifacts and to describe them more expressively and comprehensively using knowledge graphs and semantic technology. The work on `nyaml` is connected to recent efforts within NeXus to express the concepts of NeXus as rigorous semantic artifacts, for example the [NeXusOntology](https://github.com/FAIRmat-NFDI/NeXusOntology).
Collaborator

This is in line with what I proposed on the original draft

@mkuehbach
Collaborator Author

Thanks @lukaspie for your feedback. I will formulate an updated version; then you can check again, and this can be merged thereafter.

3 participants