[Content Import Job Management] Implement the Content Import Validate Job REST endpoint #30771

Closed
2 of 6 tasks
Tracked by #30550
valentinogiardino opened this issue Nov 26, 2024 · 6 comments · Fixed by #30783

Comments

@valentinogiardino
Contributor

valentinogiardino commented Nov 26, 2024

Parent Issue

#30550

Task

Implement the POST /content/_import/_validate endpoint to create and enqueue new content import jobs in preview mode. The endpoint starts a dry run of the content import and returns a unique job identifier.

POST /content/_import/_validate

Parameters:

  • Content Type: Specify the content type for the import (e.g., page, asset).
  • CSV File Reference: Reference to the CSV file containing the content to be imported.
  • Language Code(s): Define the language(s) for the imported content.
  • Workflow Action ID: Define the workflow action to apply to the imported content (e.g., publish, review).
  • Key Fields: Specify the fields used to match and update existing content (e.g., contentId, title).
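For illustration, a client-side sketch of how these parameters might be sent. It assumes a multipart request with a `file` part for the CSV and a JSON `form` part (a guess prompted by the `jsonForm` field name in a later comment), the JSON keys `contentType`, `language`, `workflowActionId` and `fields`, and an `/api/v1` prefix. None of these names are confirmed by this issue. The function only builds the request; it does not send it.

```python
import json
import urllib.request
import uuid


def build_validate_request(base_url, csv_name, csv_bytes, content_type,
                           language, workflow_action_id, key_fields=()):
    """Build (without sending) a POST to the validate endpoint.

    Hypothetical: the /api/v1 prefix, the part names "form" and "file",
    and the JSON keys inside "form" are assumptions, not a confirmed API.
    """
    form = {
        "contentType": content_type,
        "language": language,
        "workflowActionId": workflow_action_id,
        "fields": list(key_fields),  # key fields used to match existing content
    }
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Part 1: import parameters as JSON
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="form"\r\n'
         "Content-Type: application/json\r\n\r\n").encode()
        + json.dumps(form).encode() + b"\r\n",
        # Part 2: the CSV file itself
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{csv_name}"\r\n'
         "Content-Type: text/csv\r\n\r\n").encode()
        + csv_bytes + b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/content/_import/_validate",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Against a real instance, the returned request would be passed to `urllib.request.urlopen`, plus whatever authentication header the server requires.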

Description:

This endpoint creates a new content import job in preview mode, enqueues it into the job queue, and returns a unique job identifier. The dry run proceeds according to the provided parameters and failure-handling options.
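Because the endpoint only returns a job identifier, a client has to poll the job until it finishes. A minimal sketch, assuming the `/api/v1/jobs/{id}/status` path that appears in a later comment, a `state` key in the status payload, and the terminal state names below; all three are assumptions to check against the jobs API. The HTTP call is injected as `fetch_status` so the polling logic can run without a server.

```python
import time

# Assumed terminal states; verify against the real jobs API.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def wait_for_job(job_id, fetch_status, interval=1.0, timeout=60.0,
                 sleep=time.sleep):
    """Poll a job's status until it reaches a terminal state.

    fetch_status(url) must return the parsed JSON status as a dict;
    the "state" key is a hypothetical field name.
    """
    url = f"/api/v1/jobs/{job_id}/status"
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(url)
        if status.get("state") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        sleep(interval)  # wait before asking again
```

Injecting `sleep` as well keeps tests fast and lets a caller swap in exponential backoff without touching the loop.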

Proposed Objective

Core Features

Proposed Priority

Priority 2 - Important

Acceptance Criteria

  • The POST /content/_import/_validate endpoint is implemented and returns a unique job identifier.
  • Basic validation is performed on input parameters (e.g., file type, content type).
  • Appropriate HTTP status codes are returned based on the request outcome.
  • Error messages are clear and actionable.
  • The import job is successfully validated.
  • Postman tests are implemented.
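As a rough illustration of the "basic validation" and "appropriate HTTP status codes" criteria, a sketch that checks the upload's extension, the content type, the language and the workflow action ID before anything is enqueued. The field names, the in-memory `known_*` sets and the 200/400 split are assumptions; a real implementation would look these up server-side and might inspect the file's content rather than trusting its extension.

```python
import re

UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$")


def validate_params(filename, content_type, language, workflow_action_id,
                    known_content_types, known_languages):
    """Basic synchronous checks; returns (http_status, errors).

    known_content_types / known_languages stand in for server-side
    lookups; all names here are hypothetical.
    """
    errors = []

    def fail(field, message):
        errors.append({"field": field, "message": message})

    if not filename or not filename.lower().endswith(".csv"):
        fail("file", f"File '{filename}' is not a CSV. Upload a file with a .csv extension.")
    if content_type not in known_content_types:
        fail("contentType", f"Content Type '{content_type}' not found. "
             f"Available types are: {', '.join(sorted(known_content_types))}.")
    if language not in known_languages:
        fail("language", f"Language '{language}' not found. "
             f"Available languages are: {', '.join(sorted(known_languages))}.")
    if not UUID_RE.match(workflow_action_id or ""):
        fail("workflowActionId", f"Invalid workflow action ID '{workflow_action_id}'. "
             "It must be the UUID of an existing workflow action.")
    # 400 Bad Request if any check failed; otherwise the job may be enqueued.
    return (400 if errors else 200), errors
```

Collecting every failure instead of stopping at the first one lets a single response tell the user everything that needs fixing.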

External Links... Slack Conversations, Support Tickets, Figma Designs, etc.

No response

Assumptions & Initiation Needs

No response

Quality Assurance Notes & Workarounds

No response

Sub-Tasks & Estimates

No response

@valentinogiardino valentinogiardino self-assigned this Nov 26, 2024
@valentinogiardino valentinogiardino moved this from New to In Progress in dotCMS - Product Planning Nov 26, 2024
@valentinogiardino valentinogiardino changed the title [Content Import Job Management] Implement the Validate Import Job REST endpoint #30669 [Content Import Job Management] Implement the Content Import Validate Job REST endpoint #30669 Nov 26, 2024
@valentinogiardino valentinogiardino changed the title [Content Import Job Management] Implement the Content Import Validate Job REST endpoint #30669 [Content Import Job Management] Implement the Content Import Validate Job REST endpoint Nov 26, 2024
valentinogiardino added 4 commits that referenced this issue Nov 26, 2024
@valentinogiardino valentinogiardino linked a pull request Nov 26, 2024 that will close this issue

valentinogiardino added a commit that referenced this issue Nov 27, 2024
@valentinogiardino valentinogiardino moved this from In Progress to In Review in dotCMS - Product Planning Nov 27, 2024
github-merge-queue bot pushed a commit that referenced this issue Nov 28, 2024
### Proposed Changes
* Added a new class `ContentImportParamsSchema` to define the schema for
content import parameters, ensuring better API documentation and
clarity.
* Updated the `ContentImportResource` to include validation for import
parameters and support for a new `_validate` endpoint.
* Refactored `ContentImportResource` to utilize
`ContentImportParamsSchema` for improved parameter handling.
* Enhanced Swagger documentation to include detailed descriptions and
examples for the import parameters.
* Renamed `_import` package to `dotimport` for better naming
consistency.

### Checklist
- [x] Tests  
- [x] Translations  
- [x] Security Implications Contemplated (Added input validation and
restricted endpoint usage based on user permissions.)

### Additional Info
The `_validate` endpoint allows for previewing the content import
process. This enhancement helps users validate their CSV and settings
before committing to an import job.

The changes also address the issue where `ContentImportParamsSchema` was
missing, leading to inaccurate Swagger documentation.

### Swagger Screenshot
<img width="946" alt="image"
src="https://github.com/user-attachments/assets/93934cba-ae73-4540-a8c1-fe5b3aa6d74c">
@github-project-automation github-project-automation bot moved this from In Review to Internal QA in dotCMS - Product Planning Nov 28, 2024
@github-project-automation github-project-automation bot moved this from Internal QA to Current Sprint Backlog in dotCMS - Product Planning Nov 29, 2024
@valentinogiardino valentinogiardino moved this from Current Sprint Backlog to Internal QA in dotCMS - Product Planning Nov 29, 2024
@rjvelazco rjvelazco self-assigned this Dec 2, 2024
@rjvelazco
Contributor

Passed Internal QA

  • Tested on docker: [dotcms/dotcms:trunk_910d7e2]

Video

issue-30771-content-import-job-management-implement-the-content-import-validate-job-rest-endpoint-iqa.mov

@fmontes
Member

fmontes commented Dec 5, 2024

Testing findings and improvements needed

Current issues:

  • Empty CSV files still create a job that fails later
  • CSV files with wrong headers create a job that fails later
  • Invalid relationship field IDs create jobs that fail during import with DotValidationException
  • Invalid image paths create jobs but failures aren't properly logged
  • Error responses lack detail and guidance for troubleshooting

Improvements needed:

  1. Pre-job validation (synchronous):

    • Validate CSV is not empty
    • Validate CSV headers match content type fields
    • Validate basic file structure/format
    • Return proper error responses for these cases without creating jobs
  2. Async validation (during job):

    • Keep relationship field validation
    • Keep image path validation
    • Keep content-specific validations
    • Improve error logging for image-related failures
  3. API response improvements:

    • Include job status URL in response
    • Add more descriptive error messages
    • Include example of correct format when validation fails
    • Return consistent error structure for all failure cases
  4. Every error message should include these three key elements (where possible):

    • What went wrong - Clear description of the error
    • Why it went wrong - Context and reason for the failure
    • How to fix it - Specific steps or examples to resolve the issue
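The synchronous checks in item 1, combined with the three-part message from item 4, could look roughly like this. It assumes the content type's field variables (and which of them are required) are already known; the error codes and the `what`/`why`/`how` keys are illustrative, not an agreed format. Relationship and image validation stay in the async job, as item 2 suggests.

```python
import csv
import io


def precheck_csv(csv_text, content_type_fields, required_fields=()):
    """Synchronous pre-job checks; returns a list of errors (empty = OK).

    Each error carries what/why/how elements. Error codes are hypothetical.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or not any(cell.strip() for cell in rows[0]):
        return [{
            "code": "EMPTY_FILE",
            "what": "The CSV file is empty.",
            "why": "A header row naming Content Type fields is required.",
            "how": "Add a header row such as 'title,body' followed by data rows.",
        }]
    errors = []
    headers = [h.strip() for h in rows[0]]
    if not set(headers) & set(content_type_fields):
        errors.append({
            "code": "INVALID_HEADERS",
            "what": f"None of the headers {headers} match the Content Type.",
            "why": "Columns that match no field are ignored, so nothing could be imported.",
            "how": f"Use field variables as headers, e.g. {', '.join(content_type_fields)}.",
        })
    missing = [f for f in required_fields if f not in headers]
    if missing:
        errors.append({
            "code": "MISSING_REQUIRED_HEADERS",
            "what": f"Required columns are missing: {', '.join(missing)}.",
            "why": "Every imported row must provide values for required fields.",
            "how": "Add the missing columns to the header row.",
        })
    if len(rows) < 2:
        errors.append({
            "code": "NO_DATA_ROWS",
            "what": "The file has a header row but no data.",
            "why": "There is nothing to import.",
            "how": "Add at least one data row below the headers.",
        })
    return errors
```

If this list is non-empty the endpoint would return it with a 400 and never create a job, which addresses the "empty CSV" and "wrong headers" findings above.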

Suggestions:

{
  "status": 400,
  "statusText": "",
  "error": [
    {
      "errorCode": null,
      "fieldName": "jsonForm",
      "message": "Form data is required to specify content type, language and workflow. Please include a valid JSON form object."
    }
  ]
}

{
  "status": 400,
  "statusText": "",
  "error": {
    "message": "Content Type 'NonExistentType' not found. Available types are: Blog, News, FakeContent."
  }
}

{
  "status": 400,
  "statusText": "",
  "error": {
    "message": "Invalid workflow action ID 'invalid-uuid'. Workflow action must be a valid UUID from an existing workflow."
  }
}

{
  "status": 400,
  "statusText": "",
  "error": {
    "message": "File must be a valid CSV with required headers: contentHost, title. Please check file format and try again."
  }
}

{
  "status": 400,
  "statusText": "",
  "error": {
    "message": "Language code 'invalid' not found. Available languages are: 1 (English), 2 (Spanish)."
  }
}
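Note that the first suggestion returns `error` as an array while the others return a single object. One way to meet the "consistent error structure" goal is to always emit an array, even for a single error. A small sketch of such a helper, assuming the `errorCode`/`fieldName`/`message` keys from the first example:

```python
def error_response(status, *errors):
    """Build an error body whose "error" value is always a list.

    Each input error is a dict with at least "message"; "errorCode"
    and "fieldName" default to None, as in the first example.
    """
    if not errors:
        raise ValueError("at least one error is required")
    return {
        "status": status,
        "statusText": "",
        "error": [
            {
                "errorCode": e.get("errorCode"),
                "fieldName": e.get("fieldName"),
                "message": e["message"],
            }
            for e in errors
        ],
    }
```

With one shape for every failure, a client can iterate over `error` without first checking whether it received an object or an array.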

@fmontes
Member

fmontes commented Dec 5, 2024

Improvements needed

  1. A bad job ID returns an error with a huge stack trace: http://localhost:8080/api/v1/jobs/undefined/status

  2. The validation response object needs improvement. Some of the issues:

  • Unclear data types - mixing arrays and strings inconsistently
  • Redundant information presented as raw messages rather than structured data
  • Missing clear categorization - metadata mixes different concerns
  • Non-standardized error/warning format - just string messages

Current response:

{
  "result": {
    "errorDetail": null,
    "metadata": {
      "counters": [],
      "errors": [],
      "identifiers": [],
      "lastInode": [],
      "messages": [
        "1 headers found on the file matches all the Content Type fields.",
        "4 lines of data were read.",
        "Attempting to create 3 contentlets - check below for errors affecting input"
      ],
      "results": [
        "3 New \"Fake Content\" were created.",
        "0 \"Fake Content\" content updated corresponding to 0 repeated content based on the key provided"
      ],
      "updatedInodes": [],
      "warnings": [
        "Header \"contentHost\" doesn't match any Content Type field; this column of data will be ignored.",
        "No key fields were chosen, this could result in duplicated content.",
        "Not all the Content Type fields match the file headers. Some fields may be empty."
      ],
      "wfActionId": []
    }
  }
}

It could use a more structured approach, like this:

{
  "results": {
    "file": {
      "totalRows": 4,
      "parsedRows": 3,
      "headers": {
        "valid": ["title"],
        "invalid": ["contentHost"],
        "missing": ["image", "blogs"]
      }
    },
    "data": {
      "processed": {
        "valid": 3,
        "invalid": 0
      },
      "summary": {
        "created": 3,
        "updated": 0,
        "contentType": "Fake Content"
      }
    },
    "warnings": [
      {
        "code": "INVALID_HEADER",
        "field": "contentHost",
        "message": "Header doesn't match any Content Type field; column will be ignored"
      },
      {
        "code": "MISSING_KEY_FIELD",
        "message": "No key fields specified, may result in duplicate content"
      },
      {
        "code": "INCOMPLETE_HEADERS",
        "message": "Not all Content Type fields match file headers. Some fields may be empty"
      }
    ],
    "errors": [
      {
        "code": "INVALID_FILE_TYPE",
        "message": "File type is not supported"
      },
      {
        "code": "INVALID_IMAGE_PATH",
        "row": 2,
        "field": "image",
        "value": "/invalid/path.jpg",
        "message": "Image path not found in Site Browser"
      },
      {
        "code": "INVALID_RELATIONSHIP",
        "row": 3,
        "field": "blogs",
        "value": "invalid-blog-id",
        "message": "Blog with ID 'invalid-blog-id' not found"
      },
      {
        "code": "REQUIRED_FIELD_MISSING",
        "row": 4,
        "field": "title",
        "message": "Required field 'title' is missing"
      }
    ]
  }
}

The improved version is better because it:

  • Uses consistent data structures
  • Groups related data logically
  • Standardizes error/warning formats with codes
  • Removes redundant information
  • Makes the data machine-parseable while remaining human-readable
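Until the backend emits structured data directly, part of the proposed shape can be approximated from the current response. A rough sketch that maps the free-text `messages`, `results` and `warnings` shown above into counters and coded warnings. The regular expressions match only the exact strings quoted in this thread and will break if the wording changes, which is itself an argument for emitting codes on the server. The `UNKNOWN` code is hypothetical.

```python
import re

# Patterns for the free-text strings quoted in the current response above.
# Hypothetical and brittle: any change in server wording breaks them.
LINES_READ = re.compile(r"^(\d+) lines of data were read")
CREATED = re.compile(r'^(\d+) New "([^"]+)" were created')
UPDATED = re.compile(r'^(\d+) "[^"]+" content updated')
WARNING_PATTERNS = [
    (re.compile("^Header \"(?P<field>[^\"]+)\" doesn't match any Content Type field"),
     "INVALID_HEADER"),
    (re.compile(r"^No key fields were chosen"), "MISSING_KEY_FIELD"),
    (re.compile(r"^Not all the Content Type fields match"), "INCOMPLETE_HEADERS"),
]


def structure_metadata(metadata):
    """Map the legacy flat "metadata" object to a structured result."""
    out = {"file": {}, "data": {"summary": {}}, "warnings": [], "errors": []}
    for msg in metadata.get("messages", []):
        m = LINES_READ.match(msg)
        if m:
            out["file"]["totalRows"] = int(m.group(1))
    summary = out["data"]["summary"]
    for msg in metadata.get("results", []):
        m = CREATED.match(msg)
        if m:
            summary["created"] = int(m.group(1))
            summary["contentType"] = m.group(2)
            continue
        m = UPDATED.match(msg)
        if m:
            summary["updated"] = int(m.group(1))
    for msg in metadata.get("warnings", []):
        for pattern, code in WARNING_PATTERNS:
            m = pattern.match(msg)
            if m:
                entry = {"code": code, "message": msg}
                if "field" in pattern.groupindex:
                    entry["field"] = m.group("field")
                out["warnings"].append(entry)
                break
        else:  # no pattern matched: keep the text, flag it as unclassified
            out["warnings"].append({"code": "UNKNOWN", "message": msg})
    # Legacy errors are plain strings too; keep them, unclassified.
    out["errors"] = [{"code": "UNKNOWN", "message": e} for e in metadata.get("errors", [])]
    return out
```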

@nollymar
Contributor

nollymar commented Dec 6, 2024

@fmontes please create separate issues for 1 and 2. The errors you found belong to the jobs endpoint (api/v1/jobs), while this card covers the new /content/_import/_validate endpoint.

@nollymar
Contributor

nollymar commented Dec 6, 2024

@nollymar nollymar moved this from QA - Backlog to Done in dotCMS - Product Planning Jan 10, 2025
Projects
Archived in project

4 participants