Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add WRROC models, validators & unit tests #20

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
67 changes: 34 additions & 33 deletions .github/workflows/ci.yml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine, but with the exception of the last couple of lines, the changes here are not related to this PR. And given that the PR is already huge and difficult to review, I would strongly suggest not to blow it up with anything that isn't directly related to its core focus.

My suggestion:

  • Open a new PR ci: reformat, reset branches & upgrade actions (or similar) from a branch created from the default branch that introduces all of the changes introduced here, except for adding the job for running the unit tests (however, do not include the commented stuff, just remove it for now). We can then merge this PR immediately and you can merge the changes into this branch.
  • In this branch, you then just add the job for running the unit tests.

In this way, we already merge some unrelated changes from this PR and make it a bit smaller.

I will probably suggest some more such points, because this PR is dragging along waaay too long, which often happens when there are too many different things being cobbled together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll move the changes related to .github/workflows/ci.yml into a new PR, as suggested. I’ll only include the job for running the unit tests in this current PR. This should make the current PR smaller and easier to review.

Original file line number Diff line number Diff line change
Expand Up @@ -2,43 +2,44 @@ name: CI

on:
push:
branches: [ main ]
branches: '*'
pull_request:
branches: [ main ]
branches: '*'

jobs:
build:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2

- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.11'

- name: Install Poetry
run: |
curl -sSL https://install.python-poetry.org | python3 -
poetry install

- name: Lint with Ruff
run: |
poetry run ruff check crategen/

- name: Type check with Mypy
run: |
poetry run mypy crategen/

- name: Run security checks with Bandit
run: |
poetry run bandit -r crategen/

- name: Install test dependencies
run: |
poetry add pytest pytest-cov pytest-mock

# - name: Run tests
# run: |
# poetry run pytest --cov=crategen
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'

- name: Install Poetry
run: |
curl -sSL https://install.python-poetry.org | python3 -
poetry install

- name: Lint with Ruff
run: |
poetry run ruff check crategen/
if: ${{ success() }}

- name: Type check with Mypy
run: |
poetry run mypy crategen/

- name: Run security checks with Bandit
run: |
poetry run bandit -r crategen/

- name: Install test dependencies
run: |
poetry add pytest pytest-cov pytest-mock

- name: Run tests
run: |
poetry run pytest --cov=crategen
33 changes: 20 additions & 13 deletions crategen/cli.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, all the changes introduced here are linting/style-related changes. They are all fine, and I do think they should be made, but please not in this PR.

Please create a separate PR for linting/style changes, where you can do reformatting changes across the entire codebase. I suggest you create that one ASAP, we quickly review and merge it into the default branch, and then you can merge the default branch into this branch to get rid of many of the suggested changes in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll create a separate PR for the linting and style-related changes in crategen/cli.py and other files, as advised. This will help keep the PR focused on core functionality.

Original file line number Diff line number Diff line change
@@ -1,30 +1,37 @@
import click
import json

import click

from crategen.converter_manager import ConverterManager


@click.command()
@click.option('--input', prompt='Input file', help='Path to the input JSON file.')
@click.option('--output', prompt='Output file', help='Path to the output JSON file.')
@click.option('--conversion-type', prompt='Conversion type', type=click.Choice(['tes-to-wrroc', 'wes-to-wrroc']), help='Type of conversion to perform.')
@click.option("--input", prompt="Input file", help="Path to the input JSON file.")
@click.option("--output", prompt="Output file", help="Path to the output JSON file.")
@click.option(
"--conversion-type",
prompt="Conversion type",
type=click.Choice(["tes-to-wrroc", "wes-to-wrroc"]),
help="Type of conversion to perform.",
)
def cli(input, output, conversion_type):
"""
Command Line Interface for converting TES/WES to WRROC.
"""
"""Command Line Interface for converting TES/WES to WRROC."""
manager = ConverterManager()

# Load input data from JSON file
with open(input, 'r') as input_file:
with open(input) as input_file:
data = json.load(input_file)

# Perform the conversion based on the specified type
if conversion_type == 'tes-to-wrroc':
if conversion_type == "tes-to-wrroc":
result = manager.convert_tes_to_wrroc(data)
elif conversion_type == 'wes-to-wrroc':
elif conversion_type == "wes-to-wrroc":
result = manager.convert_wes_to_wrroc(data)

# Save the result to the output JSON file
with open(output, 'w') as output_file:
with open(output, "w") as output_file:
json.dump(result, output_file, indent=4)

if __name__ == '__main__':

if __name__ == "__main__":
cli()
1 change: 1 addition & 0 deletions crategen/converter_manager.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style/format change unrelated to this PR: See above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll move the style and formatting changes from crategen/converter_manager.py into the linting/style PR.

Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
from .converters.tes_converter import TESConverter
from .converters.wes_converter import WESConverter


class ConverterManager:
def __init__(self):
self.tes_converter = TESConverter()
Expand Down
5 changes: 3 additions & 2 deletions crategen/converters/abstract_converter.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just style/format changes: See above

Exception: The wrroc_data to data renaming should stay in this PR (or again a separate PR, I will tell you later when I've looked through everything).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll retain the wrroc_data to data renaming in this PR, as you mentioned. I’ll move the other style/format changes into a separate PR for linting and formatting.

Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
from abc import ABC, abstractmethod


class AbstractConverter(ABC):
@abstractmethod
def convert_to_wrroc(self, data):
"""Convert data to WRROC format"""

@abstractmethod
def convert_from_wrroc(self, wrroc_data):
def convert_from_wrroc(self, data):
"""Convert WRROC data to the original format"""
149 changes: 105 additions & 44 deletions crategen/converters/tes_converter.py
Original file line number Diff line number Diff line change
@@ -1,52 +1,113 @@
from datetime import datetime

from pydantic import AnyUrl, ValidationError

from ..models.tes_models import (
TESData,
TESExecutor,
TESInput,
TESOutput,
TESState,
TESTaskLog,
)
from ..models.wrroc_models import WRROCDataTES
from ..validators import validate_wrroc_tes
from .abstract_converter import AbstractConverter
uniqueg marked this conversation as resolved.
Show resolved Hide resolved
from .utils import convert_to_iso8601


class TESConverter(AbstractConverter):
def convert_to_wrroc(self, data: dict) -> dict:
"""
Convert TES data to WRROC format.

Args:
data: The input TES data.

Returns:
The converted WRROC data.

Raises:
ValidationError: If TES data is invalid.
"""
# Validate TES data
uniqueg marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove all comments from the code in this entire module, I think the code is trivial enough to understand without comments (and the comments are distracting and sometimes misleading).

Please also remove all trivial comments from any other modules. I would say that this should be all or almost all comments. There are no complex algorithms in the project and no workarounds for bugs in other libraries etc so far (as far as I remember). If you feel that your code is too complex to understand just by reading it and the associated docstring, consider refactoring it to make it simpler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll remove the comments from crategen/converters/tes_converter.py and other modules as requested. I’ll focus on refactoring the code where necessary to ensure clarity without comments.

try:
data_tes = TESData(**data)
except ValidationError as e:
raise ValueError(f"Invalid TES data: {e.errors()}") from e

executors = data_tes.executors
end_time = data_tes.logs[0].end_time if data_tes.logs else None

def convert_to_wrroc(self, tes_data):
# Validate and extract data with defaults
id = tes_data.get("id", "")
name = tes_data.get("name", "")
description = tes_data.get("description", "")
executors = tes_data.get("executors", [{}])
inputs = tes_data.get("inputs", [])
outputs = tes_data.get("outputs", [])
creation_time = tes_data.get("creation_time", "")
end_time = tes_data.get("logs", [{}])[0].get("end_time", "") # Corrected to fetch from logs

# Convert to WRROC
wrroc_data = {
"@id": id,
"name": name,
"description": description,
"instrument": executors[0].get("image", None) if executors else None,
"object": [{"@id": input.get("url", ""), "name": input.get("path", "")} for input in inputs],
"result": [{"@id": output.get("url", ""), "name": output.get("path", "")} for output in outputs],
"startTime": convert_to_iso8601(creation_time),
"endTime": convert_to_iso8601(end_time),
"@id": data_tes.id,
"name": data_tes.name,
"description": data_tes.description,
"instrument": executors[0].image if executors else None,
uniqueg marked this conversation as resolved.
Show resolved Hide resolved
uniqueg marked this conversation as resolved.
Show resolved Hide resolved
"object": [
{"@id": input.url, "name": input.path} for input in data_tes.inputs
],
"result": [
{"@id": output.url, "name": output.path} for output in data_tes.outputs
],
"startTime": data_tes.creation_time,
"endTime": end_time,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to validate your wrroc_data before you return it? Or do you do this somewhere else upstream?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I’ll add a validation step for wrroc_data in the convert_to_wrroc method of tes_converter.py to ensure the data conforms to the expected schema before returning it.


validate_wrroc_tes(wrroc_data)
return wrroc_data

def convert_from_wrroc(self, wrroc_data):
# Validate and extract data with defaults
id = wrroc_data.get("@id", "")
name = wrroc_data.get("name", "")
description = wrroc_data.get("description", "")
instrument = wrroc_data.get("instrument", "")
object_data = wrroc_data.get("object", [])
result_data = wrroc_data.get("result", [])
start_time = wrroc_data.get("startTime", "")
end_time = wrroc_data.get("endTime", "")

# Convert from WRROC to TES
tes_data = {
"id": id,
"name": name,
"description": description,
"executors": [{"image": instrument}],
"inputs": [{"url": obj.get("@id", ""), "path": obj.get("name", "")} for obj in object_data],
"outputs": [{"url": res.get("@id", ""), "path": res.get("name", "")} for res in result_data],
"creation_time": start_time,
"logs": [{"end_time": end_time}], # Added to logs
}
return tes_data
def convert_from_wrroc(self, data: dict) -> dict:
"""
Convert WRROC data to TES format.

Args:
data: The input WRROC data.

Returns:
The converted TES data.

Raises:
ValidationError: If WRROC data is invalid.
"""
# Validate WRROC data
try:
data_wrroc = WRROCDataTES(**data)
except ValidationError as e:
raise ValueError(f"Invalid WRROC data: {e.errors()}") from e

# Convert URL strings to AnyUrl
tes_inputs = [TESInput(url=AnyUrl(url=obj.id), path=obj.name) for obj in data_wrroc.object]
tes_outputs = [TESOutput(url=AnyUrl(url=res.id), path=res.name) for res in data_wrroc.result]

# Ensure 'image' and 'command' fields are provided
tes_executors = [TESExecutor(image=data_wrroc.instrument or "", command=[])] # Provide default empty list for command

# Ensure correct type for end_time (datetime)
end_time = datetime.fromisoformat(data_wrroc.endTime) if data_wrroc.endTime else None

tes_logs = [
TESTaskLog(
logs=[],
metadata=None,
start_time=None,
end_time=end_time,
outputs=[],
system_logs=None
)
]

tes_data = TESData(
id=data_wrroc.id,
name=data_wrroc.name,
description=data_wrroc.description,
executors=tes_executors,
inputs=tes_inputs,
outputs=tes_outputs,
creation_time=None,
logs=tes_logs,
state=TESState.UNKNOWN
)

# Validate TES data before returning
tes_data = TESData(**tes_data.dict())
return tes_data.dict()
Loading
Loading