[Content Import Job Management] Implement the Content Import Validate Job REST endpoint #30771

valentinogiardino · 2024-11-26T02:55:47Z

Parent Issue

Task

Implement the POST /content/_import/_validate endpoint to allow the creation and enqueuing of new content import jobs in preview mode. This endpoint will initiate the content import dry run process and return a unique job identifier.

POST /content/_import/_validate

Parameters:

Content Type: Specify the content type for the import (e.g., page, asset).
CSV File Reference: Reference to the CSV file containing the content to be imported.
Language Code(s): Define the language(s) for the imported content.
Workflow Action Id: Define the workflow action to apply to the imported content (e.g., publish, review).
Key Fields: Specify the fields used to match and update existing content (e.g., contentId, title).

Description:

This endpoint creates a new content import job in preview mode, enqueues it into the job queue, and returns a unique job identifier. The content import will proceed based on the provided parameters and failure handling options.
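As a rough illustration of the parameters listed above, the Python sketch below assembles the JSON form a client might send along with the CSV file. The helper `build_validate_form` and the field names `contentType`, `language`, `workflowActionId`, and `fields` are assumptions made for this example; the issue does not spell out the actual request schema.

```python
import json


def build_validate_form(content_type, language, workflow_action_id, key_fields=()):
    """Build the JSON form for POST /content/_import/_validate.

    All JSON field names here are hypothetical; check the Swagger
    documentation of the real endpoint before relying on them.
    """
    required = {
        "content_type": content_type,
        "language": language,
        "workflow_action_id": workflow_action_id,
    }
    # Fail early on the client side if a required parameter is empty.
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("Missing required parameter(s): " + ", ".join(missing))

    return json.dumps({
        "contentType": content_type,
        "language": language,
        "workflowActionId": workflow_action_id,
        "fields": list(key_fields),  # key fields used to match existing content
    })
```

The resulting string would be sent as one part of the request, with the CSV file as another; the job identifier returned by the endpoint can then be used to follow the dry run.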

Proposed Objective

Core Features

Proposed Priority

Priority 2 - Important

Acceptance Criteria

The POST /content/_import/_validate endpoint is implemented and returns a unique job identifier.
Basic validation is performed on input parameters (e.g., file type, content type).
Appropriate HTTP status codes are returned based on the request outcome.
Error messages are clear and actionable.
The import job is successfully validated.
Postman tests are implemented.

External Links... Slack Conversations, Support Tickets, Figma Designs, etc.

No response

Assumptions & Initiation Needs

No response

Quality Assurance Notes & Workarounds

No response

Sub-Tasks & Estimates

No response


github-actions · 2024-11-26T21:25:00Z

PRs:

feat(contentImport) Implement Validate endpoint #30771 #30783

### Proposed Changes

* Added a new class `ContentImportParamsSchema` to define the schema for content import parameters, ensuring better API documentation and clarity.
* Updated the `ContentImportResource` to include validation for import parameters and support for a new `_validate` endpoint.
* Refactored `ContentImportResource` to utilize `ContentImportParamsSchema` for improved parameter handling.
* Enhanced Swagger documentation to include detailed descriptions and examples for the import parameters.
* Renamed `_import` package to `dotimport` for better naming consistency.

### Checklist

- [x] Tests
- [x] Translations
- [x] Security Implications Contemplated (Added input validation and restricted endpoint usage based on user permissions.)

### Additional Info

The `_validate` endpoint allows for previewing the content import process. This enhancement helps users validate their CSV and settings before committing to an import job. The changes also address the issue where `ContentImportParamsSchema` was missing, leading to inaccurate Swagger documentation.

### Swagger Screenshot

<img width="946" alt="image" src="https://github.com/user-attachments/assets/93934cba-ae73-4540-a8c1-fe5b3aa6d74c">

rjvelazco · 2024-12-02T18:48:23Z

Passed Internal QA

Tested on docker: [dotcms/dotcms:trunk_910d7e2]

Video

issue-30771-content-import-job-management-implement-the-content-import-validate-job-rest-endpoint-iqa.mov

fmontes · 2024-12-05T20:47:12Z

Testing findings and improvements needed

Current issues:

Empty CSV files still create a job that fails later
CSV files with wrong headers create a job that fails later
Invalid relationship field IDs create jobs that fail during import with DotValidationException
Invalid image paths create jobs but failures aren't properly logged
Error responses lack detail and guidance for troubleshooting

Improvements needed:

Pre-job validation (synchronous):
- Validate CSV is not empty
- Validate CSV headers match content type fields
- Validate basic file structure/format
- Return proper error responses for these cases without creating jobs
Async validation (during job):
- Keep relationship field validation
- Keep image path validation
- Keep content-specific validations
- Improve error logging for image-related failures
API response improvements:
- Include job status URL in response
- Add more descriptive error messages
- Include example of correct format when validation fails
- Return consistent error structure for all failure cases
Every error message should include these three key elements (where possible):
- What went wrong - Clear description of the error
- Why it went wrong - Context and reason for the failure
- How to fix it - Specific steps or examples to resolve the issue
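The synchronous pre-job checks and the three-part error messages described above could be sketched roughly as follows. This is a minimal Python illustration, not the dotCMS implementation: the function name `prevalidate_csv`, the `what`/`why`/`how` keys, and the choice to treat unknown headers as a blocking error are all assumptions made for the example.

```python
import csv
import io


def prevalidate_csv(text, content_type_fields):
    """Return a list of error dicts; an empty list means a job may be created.

    Each error carries the three elements requested in the review:
    what went wrong, why, and how to fix it (key names are hypothetical).
    """
    rows = list(csv.reader(io.StringIO(text)))

    # Check 1: the file must contain at least a non-blank header row.
    if not rows or not any(cell.strip() for cell in rows[0]):
        return [{
            "what": "The CSV file is empty.",
            "why": "No header row was found, so no column can be mapped to a field.",
            "how": "Upload a CSV whose first line lists field names, e.g. 'title'.",
        }]

    # Check 2: every header must correspond to a Content Type field.
    headers = [header.strip() for header in rows[0]]
    unknown = [header for header in headers if header not in content_type_fields]
    if unknown:
        return [{
            "what": "Unknown header(s): " + ", ".join(unknown) + ".",
            "why": "These columns do not match any field of the Content Type.",
            "how": "Use only these headers: " + ", ".join(content_type_fields) + ".",
        }]

    # Check 3: there must be at least one data row after the header.
    if len(rows) < 2:
        return [{
            "what": "The CSV file has a header row but no data rows.",
            "why": "There is no content to import.",
            "how": "Add at least one row of values below the header line.",
        }]

    return []
```

Under this sketch, a job would be enqueued only when the returned list is empty; relationship, image path, and content-specific checks would still run asynchronously inside the job.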

Suggestions:

{
  "status": 400,
  "statusText": "",
  "error": [
    {
      "errorCode": null,
      "fieldName": "jsonForm",
      "message": "Form data is required to specify content type, language and workflow. Please include a valid JSON form object."
    }
  ]
}

{
  "status": 400,
  "statusText": "",
  "error": {
    "message": "Content Type 'NonExistentType' not found. Available types are: Blog, News, FakeContent."
  }
}

{
  "status": 400,
  "statusText": "",
  "error": {
    "message": "Invalid workflow action ID 'invalid-uuid'. Workflow action must be a valid UUID from an existing workflow."
  }
}

{
  "status": 400,
  "statusText": "",
  "error": {
    "message": "File must be a valid CSV with required headers: contentHost, title. Please check file format and try again."
  }
}

{
  "status": 400,
  "statusText": "",
  "error": {
    "message": "Language code 'invalid' not found. Available languages are: 1 (English), 2 (Spanish)."
  }
}
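Note that the suggestions above use two shapes for `error`: a list of objects in the first example and a single object in the others. One way to satisfy the "consistent error structure" requirement is to always emit the list form. The helper below is a hypothetical Python sketch of that idea; the name `error_response` and the decision to standardize on the list shape are assumptions, not something decided in this thread.

```python
def error_response(status, message, field_name=None, error_code=None):
    """Build an error body that always uses the list-of-objects shape."""
    return {
        "status": status,
        "statusText": "",
        "error": [
            {
                "errorCode": error_code,
                "fieldName": field_name,
                "message": message,
            }
        ],
    }


# Example: the language error from the suggestions, in the uniform shape.
body = error_response(
    400,
    "Language code 'invalid' not found. "
    "Available languages are: 1 (English), 2 (Spanish).",
)
```

With a single shape, clients can always iterate over `error` without first checking whether it is a list or an object.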

fmontes · 2024-12-05T21:50:53Z

Improvements needed

Bad job id returns error with huge stacktrace: http://localhost:8080/api/v1/jobs/undefined/status
The validation response object needs improvement. Some of the issues:

Unclear data types - mixing arrays and strings inconsistently
Redundant information presented as raw messages rather than structured data
Missing clear categorization - metadata mixes different concerns
Non-standardized error/warning format - just string messages

Current response:

{
  "result": {
    "errorDetail": null,
    "metadata": {
      "counters": [],
      "errors": [],
      "identifiers": [],
      "lastInode": [],
      "messages": [
        "1 headers found on the file matches all the Content Type fields.",
        "4 lines of data were read.",
        "Attempting to create 3 contentlets - check below for errors affecting input"
      ],
      "results": [
        "3 New \"Fake Content\" were created.",
        "0 \"Fake Content\" content updated corresponding to 0 repeated content based on the key provided"
      ],
      "updatedInodes": [],
      "warnings": [
        "Header \"contentHost\" doesn't match any Content Type field; this column of data will be ignored.",
        "No key fields were chosen, this could result in duplicated content.",
        "Not all the Content Type fields match the file headers. Some fields may be empty."
      ],
      "wfActionId": []
    }
  }
}

Can have a more structured approach like this:

{
  "results": {
    "file": {
      "totalRows": 4,
      "parsedRows": 3,
      "headers": {
        "valid": ["title"],
        "invalid": ["contentHost"],
        "missing": ["image", "blogs"]
      }
    },
    "data": {
      "processed": {
        "valid": 3,
        "invalid": 0
      },
      "summary": {
        "created": 3,
        "updated": 0,
        "contentType": "Fake Content"
      }
    },
    "warnings": [
      {
        "code": "INVALID_HEADER",
        "field": "contentHost",
        "message": "Header doesn't match any Content Type field; column will be ignored"
      },
      {
        "code": "MISSING_KEY_FIELD",
        "message": "No key fields specified, may result in duplicate content"
      },
      {
        "code": "INCOMPLETE_HEADERS",
        "message": "Not all Content Type fields match file headers. Some fields may be empty"
      }
    ],
    "errors": [
      {
        "code": "INVALID_FILE_TYPE",
        "message": "File type is not supported"
      },
      {
        "code": "INVALID_IMAGE_PATH",
        "row": 2,
        "field": "image",
        "value": "/invalid/path.jpg",
        "message": "Image path not found in Site Browser"
      },
      {
        "code": "INVALID_RELATIONSHIP",
        "row": 3,
        "field": "blogs",
        "value": "invalid-blog-id",
        "message": "Blog with ID 'invalid-blog-id' not found"
      },
      {
        "code": "REQUIRED_FIELD_MISSING",
        "row": 4,
        "field": "title",
        "message": "Required field 'title' is missing"
      }
    ]
  }
}

The improved version is better because it:

Uses consistent data structures
Groups related data logically
Standardizes error/warning formats with codes
Removes redundant information
Makes the data machine-parseable while remaining human-readable
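To make the difference concrete, the Python sketch below converts the free-text warning strings from the current response into coded entries like those in the proposed structure. It is only an illustration of the idea: the regular expressions are derived from the three sample messages above, and the fallback code `UNCLASSIFIED` is a placeholder invented for this example.

```python
import re

# Patterns derived from the sample warning strings in the current response.
WARNING_PATTERNS = [
    ("INVALID_HEADER",
     re.compile(r'^Header "(?P<field>[^"]+)" doesn\'t match any Content Type field')),
    ("MISSING_KEY_FIELD",
     re.compile(r"^No key fields were chosen")),
    ("INCOMPLETE_HEADERS",
     re.compile(r"^Not all the Content Type fields match the file headers")),
]


def structure_warnings(messages):
    """Map raw warning strings to dicts with a code and optional field."""
    structured = []
    for message in messages:
        for code, pattern in WARNING_PATTERNS:
            match = pattern.match(message)
            if match:
                entry = {"code": code}
                entry.update(match.groupdict())  # adds "field" when captured
                entry["message"] = message
                structured.append(entry)
                break
        else:
            # No pattern matched: keep the message under a placeholder code.
            structured.append({"code": "UNCLASSIFIED", "message": message})
    return structured
```

Parsing strings this way is fragile, which is itself an argument for the backend emitting the codes directly rather than having clients reverse-engineer them.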

nollymar · 2024-12-06T18:54:43Z

@fmontes, please create separate issues for 1 and 2. The errors you got belong to the jobs endpoint (api/v1/jobs), whereas this card covers the new /content/_import/_validate endpoint.

nollymar · 2024-12-06T22:21:01Z

We have created these cards to address the issues and suggested improvements:

valentinogiardino added Team : Scout Type : Task labels Nov 26, 2024

valentinogiardino self-assigned this Nov 26, 2024

valentinogiardino added this to dotCMS - Product Planning Nov 26, 2024

github-project-automation bot moved this to New in dotCMS - Product Planning Nov 26, 2024

valentinogiardino mentioned this issue Nov 26, 2024

New REST Endpoints for Content Import Job Management #30550

Open

valentinogiardino moved this from New to In Progress in dotCMS - Product Planning Nov 26, 2024

valentinogiardino changed the title ~~[Content Import Job Management] Implement the Validate Import Job REST endpoint #30669~~ [Content Import Job Management] Implement the Content Import Validate Job REST endpoint #30669 Nov 26, 2024

valentinogiardino changed the title ~~[Content Import Job Management] Implement the Content Import Validate Job REST endpoint #30669~~ [Content Import Job Management] Implement the Content Import Validate Job REST endpoint Nov 26, 2024

valentinogiardino added a commit that referenced this issue Nov 26, 2024

#30771 add swagger schema

2718697

valentinogiardino added a commit that referenced this issue Nov 26, 2024

#30771 add /_validate endpoint

b1af973

valentinogiardino added a commit that referenced this issue Nov 26, 2024

#30771 add it

f90b84f

valentinogiardino added a commit that referenced this issue Nov 26, 2024

#30771 add postman tests

c3faf06

valentinogiardino added a commit that referenced this issue Nov 26, 2024

#30771 rename package

4dfe390

valentinogiardino linked a pull request Nov 26, 2024 that will close this issue

feat(contentImport) Implement Validate endpoint #30771 #30783

Merged

3 tasks

valentinogiardino added a commit that referenced this issue Nov 27, 2024

#30771 fix path in postman tests

2b65763

valentinogiardino added a commit that referenced this issue Nov 27, 2024

#30771 add swagger response schema

e0c033a

valentinogiardino added a commit that referenced this issue Nov 27, 2024

#30771 add it

2bfda10

valentinogiardino moved this from In Progress to In Review in dotCMS - Product Planning Nov 27, 2024

valentinogiardino closed this as completed in #30783 Nov 28, 2024

github-project-automation bot moved this from In Review to Internal QA in dotCMS - Product Planning Nov 28, 2024

github-actions bot mentioned this issue Nov 28, 2024

feat(contentImport) Implement Validate endpoint #30771 #30783

Merged

3 tasks

valentinogiardino reopened this Nov 29, 2024

github-project-automation bot moved this from Internal QA to Current Sprint Backlog in dotCMS - Product Planning Nov 29, 2024

valentinogiardino moved this from Current Sprint Backlog to Internal QA in dotCMS - Product Planning Nov 29, 2024

valentinogiardino added Merged QA : Needs Internal labels Nov 29, 2024

nollymar unassigned valentinogiardino Nov 29, 2024

rjvelazco self-assigned this Dec 2, 2024

rjvelazco closed this as completed Dec 2, 2024

rjvelazco moved this from Internal QA to QA - Backlog in dotCMS - Product Planning Dec 2, 2024

rjvelazco added QA : Passed Internal and removed QA : Needs Internal labels Dec 2, 2024

rjvelazco removed their assignment Dec 2, 2024

rjvelazco added OKR : Core Features Owned by Will Priority : 2 High labels Dec 2, 2024

nollymar added Doc : Needs Doc Release : 24.12.05 labels Dec 4, 2024

nollymar mentioned this issue Dec 6, 2024

Spike: Define Requirements for Improving Error Handling in Content Import Logic #30884

Open

nollymar moved this from QA - Backlog to Done in dotCMS - Product Planning Jan 10, 2025
