diff --git a/README.md b/README.md
index 61c08688..13e1a122 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,21 @@
# JBZoo / CSV Blueprint
-[![CI](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/main.yml/badge.svg?branch=master)](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/main.yml?query=branch%3Amaster) [![CI](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/demo.yml/badge.svg)](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/demo.yml) [![CI](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/publish.yml/badge.svg)](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/publish.yml) [![Coverage Status](https://coveralls.io/repos/github/JBZoo/Csv-Blueprint/badge.svg?branch=master)](https://coveralls.io/github/JBZoo/Csv-Blueprint?branch=master) [![Psalm Coverage](https://shepherd.dev/github/JBZoo/Csv-Blueprint/coverage.svg)](https://shepherd.dev/github/JBZoo/Csv-Blueprint) [![GitHub License](https://img.shields.io/github/license/jbzoo/csv-blueprint)](https://github.com/JBZoo/Csv-Blueprint/blob/master/LICENSE)
-[![GitHub Release](https://img.shields.io/github/v/release/jbzoo/csv-blueprint?label=Latest)](https://github.com/jbzoo/csv-blueprint/releases) [![Total Downloads](https://poser.pugx.org/jbzoo/csv-blueprint/downloads)](https://packagist.org/packages/jbzoo/csv-blueprint/stats) [![Docker Pulls](https://img.shields.io/docker/pulls/jbzoo/csv-blueprint.svg)](https://hub.docker.com/r/jbzoo/csv-blueprint/tags) [![Docker Image Size](https://img.shields.io/docker/image-size/jbzoo/csv-blueprint)](https://hub.docker.com/r/jbzoo/csv-blueprint/tags)
+
+[![CI](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/main.yml/badge.svg?branch=master)](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/main.yml?query=branch%3Amaster)
+[![CI](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/demo.yml/badge.svg)](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/demo.yml)
+[![Coverage Status](https://coveralls.io/repos/github/JBZoo/Csv-Blueprint/badge.svg?branch=master)](https://coveralls.io/github/JBZoo/Csv-Blueprint?branch=master)
+[![Psalm Coverage](https://shepherd.dev/github/JBZoo/Csv-Blueprint/coverage.svg)](https://shepherd.dev/github/JBZoo/Csv-Blueprint)
+[![GitHub Release](https://img.shields.io/github/v/release/jbzoo/csv-blueprint?label=Latest)](https://github.com/jbzoo/csv-blueprint/releases)
+[![Total Downloads](https://poser.pugx.org/jbzoo/csv-blueprint/downloads)](https://packagist.org/packages/jbzoo/csv-blueprint/stats)
+[![Docker Pulls](https://img.shields.io/docker/pulls/jbzoo/csv-blueprint.svg)](https://hub.docker.com/r/jbzoo/csv-blueprint/tags)
+
-[![Static Badge](https://img.shields.io/badge/Rules-364-green?label=Total%20number%20of%20rules&labelColor=darkgreen&color=gray)](schema-examples/full.yml) [![Static Badge](https://img.shields.io/badge/Rules-153-green?label=Cell%20rules&labelColor=blue&color=gray)](src/Rules/Cell) [![Static Badge](https://img.shields.io/badge/Rules-206-green?label=Aggregate%20rules&labelColor=blue&color=gray)](src/Rules/Aggregate) [![Static Badge](https://img.shields.io/badge/Rules-5-green?label=Extra%20checks&labelColor=blue&color=gray)](#extra-checks) [![Static Badge](https://img.shields.io/badge/Rules-32/54/8-green?label=Plan%20to%20add&labelColor=gray&color=gray)](tests/schemas/todo.yml)
+[![Static Badge](https://img.shields.io/badge/Rules-366-green?label=Total%20number%20of%20rules&labelColor=darkgreen&color=gray)](schema-examples/full.yml)
+[![Static Badge](https://img.shields.io/badge/Rules-153-green?label=Cell%20rules&labelColor=blue&color=gray)](src/Rules/Cell)
+[![Static Badge](https://img.shields.io/badge/Rules-206-green?label=Aggregate%20rules&labelColor=blue&color=gray)](src/Rules/Aggregate)
+[![Static Badge](https://img.shields.io/badge/Rules-7-green?label=Extra%20checks&labelColor=blue&color=gray)](#extra-checks)
+[![Static Badge](https://img.shields.io/badge/Rules-32/54/13-green?label=Plan%20to%20add&labelColor=gray&color=gray)](tests/schemas/todo.yml)
A console utility designed for validating CSV files against a strictly defined schema and validation rules outlined
@@ -12,6 +23,37 @@ in [YAML files](#schema-definition) serves an essential purpose in ensuring data
This utility facilitates automated checks to verify that the structure and content of CSV files adhere to predefined
specifications, making it invaluable in scenarios where data quality and consistency are critical.
+
+
+
+
+- [Introduction](#introduction)
+ - [Why?](#why)
+ - [Features](#features)
+ - [Live Demo](#live-demo)
+- [Usage](#usage)
+ - [GitHub Action](#github-action)
+ - [Docker container](#docker-container)
+ - [PHP binary](#php-binary)
+- [Schema definition](#schema-definition)
+ - [Full description of the schema](#full-description-of-the-schema)
+ - [Extra checks](#extra-checks)
+- [Complete CLI Help Message](#complete-cli-help-message)
+- [Report examples](#report-examples)
+- [Benchmarks](#benchmarks)
+ - [Brief conclusions](#brief-conclusions)
+ - [Examples of CSV files](#examples-of-csv-files)
+ - [Run benchmark locally](#run-benchmark-locally)
+- [Disadvantages?](#disadvantages)
+- [Coming soon](#coming-soon)
+- [Contributing](#contributing)
+- [License](#license)
+- [See Also](#see-also)
+
+
+
+## Introduction
+
### Why?
* **Data Integration:** When integrating data from multiple sources, ensuring that incoming CSV files meet expected
@@ -52,6 +94,113 @@ specifications, making it invaluable in scenarios where data quality and consist
* [demo_valid.yml](tests/schemas/demo_valid.yml)
* [demo.csv](tests/fixtures/demo.csv)
+## Usage
+
+You can find launch examples in the [workflow demo](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/demo.yml).
+
+### GitHub Action
+
+
+```yml
+- uses: jbzoo/csv-blueprint@master # See the specific version on the releases page
+ with:
+    # Path(s) to validate. You can specify a path in which CSV files will be searched. Feel free to use glob patterns. Usage examples: /full/path/file.csv, p/file.csv, p/*.csv, p/**/*.csv, p/**/name-*.csv, **/*.csv, etc.
+ # Required: true
+ csv: './tests/**/*.csv'
+
+    # Schema filepath. It can be a YAML, JSON or PHP file. See examples on GitHub.
+ # Required: true
+ schema: './tests/**/*.yml'
+
+ # Report format. Available options: text, table, github, gitlab, teamcity, junit.
+ # Default value: table
+ # You can skip it
+ report: table
+
+ # Quick mode. It will not validate all rows. It will stop after the first error.
+ # Default value: no
+ # You can skip it
+ quick: no
+
+ # Skip schema validation. If you are sure that the schema is correct, you can skip this check.
+ # Default value: no
+ # You can skip it
+ skip-schema: no
+```
+
+
+You can specify `report: github` to see friendly error output in your PRs
+using [annotations](https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-a-warning-message).
+This allows you to see errors in the GitHub interface at the PR level.
+See [the PR as a live demo](https://github.com/JBZoo/Csv-Blueprint-Demo/pull/1/files). Each error is shown at the exact place
+in the CSV file, right in the diff of your pull request!
+
+![GitHub Actions - PR](.github/assets/github-actions-pr.png)
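GitHub builds these annotations from specially formatted lines that a job prints to its log (workflow commands). The sketch below shows the general shape of such a line, using GitHub's documented `::error file=...,line=...,title=...::message` syntax and its escaping rules. It is only an illustration: `githubAnnotation()` is a hypothetical helper, and the exact output of the tool's `github` report may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: formats one validation error as a GitHub Actions
 * "error" workflow command, which GitHub renders as an annotation.
 */
function githubAnnotation(string $file, int $line, string $title, string $message): string
{
    // Escaping applied to the message part of a workflow command.
    $escapeData = static fn (string $s): string => str_replace(
        ['%', "\r", "\n"],
        ['%25', '%0D', '%0A'],
        $s,
    );
    // Property values (file, title) additionally escape ':' and ','.
    $escapeProp = static fn (string $s): string => str_replace(
        [':', ','],
        ['%3A', '%2C'],
        $escapeData($s),
    );

    return sprintf(
        '::error file=%s,line=%d,title=%s::%s',
        $escapeProp($file),
        $line,
        $escapeProp($title),
        $escapeData($message),
    );
}

echo githubAnnotation('tests/fixtures/demo.csv', 6, 'length_min', 'The length of the value "Carl" is 4') . PHP_EOL;
```

Running it prints `::error file=tests/fixtures/demo.csv,line=6,title=length_min::The length of the value "Carl" is 4`. GitHub can attach a message like this to line 6 of that file in the PR diff.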
+
+
+
+![GitHub Actions - Terminal](.github/assets/github-actions-termintal.png)
+
+
+
+### Docker container
+
+Ensure you have Docker installed on your machine.
+
+```sh
+# Pull the Docker image
+docker pull jbzoo/csv-blueprint:latest
+
+# Run the tool inside Docker
+docker run --rm \
+ --workdir=/parent-host \
+ -v $(pwd):/parent-host \
+ jbzoo/csv-blueprint:latest \
+ validate:csv \
+ --csv=./tests/fixtures/demo.csv \
+ --schema=./tests/schemas/demo_invalid.yml \
+ --ansi -vvv
+
+# OR build it from source.
+git clone git@github.com:JBZoo/Csv-Blueprint.git csv-blueprint
+cd csv-blueprint
+make docker-build # local tag is "jbzoo/csv-blueprint:local"
+```
+
+### PHP binary
+
+
+
+Ensure you have PHP installed on your machine.
+
+```sh
# Download the latest version
+
+wget https://github.com/JBZoo/Csv-Blueprint/releases/latest/download/csv-blueprint.phar
+chmod +x ./csv-blueprint.phar
+./csv-blueprint.phar validate:csv \
+ --csv=./tests/fixtures/demo.csv \
+ --schema=./tests/schemas/demo_invalid.yml
+
+# OR create a project via Composer (--no-dev is optional)
+composer create-project --no-dev jbzoo/csv-blueprint
+cd ./csv-blueprint
+./csv-blueprint validate:csv \
+ --csv=./tests/fixtures/demo.csv \
+ --schema=./tests/schemas/demo_invalid.yml
+
+# OR build from source
+git clone git@github.com:jbzoo/csv-blueprint.git csv-blueprint
+cd csv-blueprint
+make build
+./csv-blueprint validate:csv \
+ --csv=./tests/fixtures/demo.csv \
+ --schema=./tests/schemas/demo_invalid.yml
+```
+
+
## Schema definition
Define your CSV validation schema in a [YAML](schema-examples/full.yml) file. Other formats are also available: [JSON](schema-examples/full.json), [PHP](schema-examples/full.php).
@@ -59,22 +208,33 @@ Define your CSV validation schema in a [YAML](schema-examples/full.yml). Other f
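This example defines a simple schema for a `;`-delimited CSV file with a header row whose file name must match `filename_pattern`.
The `id` column must not be empty and must contain integer values; across the whole column, the values must be unique and sorted in ascending numeric order.
The `name` column must have a minimum length of 3 characters, and the `count` aggregate rule expects 10 values in that column.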
+
+
```yml
+name: Simple CSV Schema
+filename_pattern: /my-favorite-csv-\d+\.csv$/i
csv:
- delimiter: ;
+ delimiter: ';'
columns:
- name: id
rules:
not_empty: true
is_int: true
+ aggregate_rules:
+ is_unique: true
+ sorted: [ asc, numeric ]
- name: name
rules:
- min_length: 3
+ length_min: 3
+ aggregate_rules:
+ count: 10
```
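Unlike `rules`, which are checked cell by cell, `aggregate_rules` look at the whole column. As a rough illustration, here is what the three aggregate rules from this example check, written as plain PHP. It assumes `sorted: [ asc, numeric ]` means values never decrease when compared as numbers and `count: 10` means the column holds exactly 10 values; `failedAggregateRules()` is hypothetical and not part of the tool.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical illustration of the aggregate rules from the sample schema.
 * Returns the names of the rules that fail for the given column values.
 *
 * @param list<string> $ids   Values of the "id" column.
 * @param list<string> $names Values of the "name" column.
 * @return list<string>
 */
function failedAggregateRules(array $ids, array $names): array
{
    $failed = [];

    // is_unique: no value occurs more than once in the column.
    if (count(array_unique($ids)) !== count($ids)) {
        $failed[] = 'id:is_unique';
    }

    // sorted: [ asc, numeric ]: each value, read as a number, is >= the previous one.
    for ($i = 1, $n = count($ids); $i < $n; $i++) {
        if ((float)$ids[$i] < (float)$ids[$i - 1]) {
            $failed[] = 'id:sorted';
            break;
        }
    }

    // count: 10: the column holds exactly 10 values (assumed meaning).
    if (count($names) !== 10) {
        $failed[] = 'name:count';
    }

    return $failed;
}

print_r(failedAggregateRules(['1', '2', '2', '10'], array_fill(0, 10, 'Alice')));
```

For the sample input it reports only `id:is_unique`, because the value `2` is repeated. The pair `2`, `10` passes the `sorted` check because the comparison is numeric; compared as strings, `10` would come before `2`.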
+
+
### Full description of the schema
@@ -96,9 +256,6 @@ This part of the readme is also covered by autotests, so these code are always u
In any unclear situation, look into it first.
-
- CLICK HERE to see the most complete description of ALL features!
-
```yml
# It's a complete example of the CSV schema file in YAML format.
@@ -113,14 +270,12 @@ description: | # Any description of the CSV file. Not u
supporting a wide range of data validation rules from basic type checks to complex regex validations.
This example serves as a comprehensive guide for creating robust CSV file validations.
-
# Regular expression to match the file name. If not set, then no pattern check.
# This allows you to pre-validate the file name before processing its contents.
# Feel free to check parent directories as well.
-# See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
+# See: https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
filename_pattern: /demo(-\d+)?\.csv$/i
-
# Here are default values to parse CSV file.
# You can skip this section if you don't need to override the default values.
csv:
@@ -131,6 +286,12 @@ csv:
encoding: utf-8 # (Experimental) Only utf-8, utf-16, utf-32.
bom: false # (Experimental) If the file has a BOM (Byte Order Mark) at the beginning.
+# Structural rules for the CSV file. These rules are applied to the entire CSV file.
+# They are not(!) related to the data in the columns.
+# You can skip this section if you don't need to override the default values.
+structural_rules: # Here are default values.
+ strict_column_order: true # Ensure columns in CSV follow the same order as defined in this YML schema. It works only if "csv.header" is true.
+ allow_extra_columns: false # Allow CSV files to have more columns than specified in this YML schema.
# Description of each column in CSV.
# It is recommended to present each column in the same order as presented in the CSV file.
@@ -165,7 +326,7 @@ columns:
allow_values: [ y, n, "" ] # Strict set of values that are allowed.
not_allow_values: [ invalid ] # Strict set of values that are NOT allowed.
- # Any valid regex pattern. See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
+ # Any valid regex pattern. See: https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
# Of course it's a super powerful tool to verify any sort of string data.
# Please, be careful. Regex is a powerful tool, but it can be very dangerous if used incorrectly.
# Remember that if you want to solve a problem with regex, you now have two problems.
@@ -498,7 +659,7 @@ columns:
contraharmonic_mean_max: 9.0 # x <= 9.0
# Root mean square (quadratic mean) The square root of the arithmetic mean of the squares of a set of numbers.
- # See https://en.wikipedia.org/wiki/Root_mean_square
+ # See: https://en.wikipedia.org/wiki/Root_mean_square
root_mean_square_min: 1.0 # x >= 1.0
root_mean_square_greater: 2.0 # x > 2.0
root_mean_square_not: 5.0 # x != 5.0
@@ -652,7 +813,6 @@ columns:
```
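The `structural_rules` section is optional. In this version, `ParseConfig` reads `structural_rules.strict_column_order` and `structural_rules.allow_extra_columns` with a dotted-path lookup and falls back to `true` and `false` respectively when a key is missing. A minimal sketch of that fallback on a plain PHP array; `structuralRule()` and `STRUCTURAL_DEFAULTS` are hypothetical names, not the tool's API.

```php
<?php

declare(strict_types=1);

// Defaults applied when the schema omits a rule (same values as documented above).
const STRUCTURAL_DEFAULTS = [
    'strict_column_order' => true,
    'allow_extra_columns' => false,
];

/**
 * Hypothetical helper: reads structural_rules.<rule> from a decoded schema
 * array, falling back to the default when the section or the key is absent.
 */
function structuralRule(array $schema, string $rule): bool
{
    return (bool)($schema['structural_rules'][$rule] ?? STRUCTURAL_DEFAULTS[$rule]);
}

$schema = ['structural_rules' => ['allow_extra_columns' => true]];
var_dump(structuralRule($schema, 'strict_column_order')); // bool(true): default
var_dump(structuralRule($schema, 'allow_extra_columns')); // bool(true): set in the schema
```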
-
### Extra checks
@@ -663,114 +823,14 @@ Behind the scenes to what is outlined in the yml above, there are additional che
* With `filename_pattern` rule, you can check if the file name matches the pattern.
* If `csv.header: true`, each column in the schema must define the `name` property.
* Check that each row matches the number of columns.
-* If `csv.header: true`. Schema contains an unknown column `name` that is not found in the CSV file.
-* If `csv.header: false`. Compare the number of columns in the schema and the CSV file.
+* With the `strict_column_order` rule, you can check that the columns in the CSV file are in the same order as in the schema.
+* With the `allow_extra_columns` rule set to `false`, the schema and the CSV file are compared:
+  * If `csv.header: true`, each column `name` from the schema must be found in the CSV header.
+  * If `csv.header: false`, the CSV file must have at least as many columns as the schema.
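Both header checks can be sketched in plain PHP. `columnsInOrder()` follows the same idea as `Utils::isArrayInOrder()`: schema columns that exist in the CSV header must appear there in the same relative order, while columns absent from the header are skipped (they are reported as `Column(s) not found in CSV` instead). Both function names are hypothetical and the code is an illustration, not the tool's implementation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: schema columns found in the CSV header must appear
 * there in the same relative order. Missing columns are skipped.
 */
function columnsInOrder(array $schemaColumns, array $csvHeader): bool
{
    $csvHeader = array_values($csvHeader);
    $lastIndex = -1; // Header position of the previously matched schema column.

    foreach ($schemaColumns as $column) {
        $index = array_search($column, $csvHeader, true);
        if ($index === false) {
            continue; // Reported by the missing-columns check instead.
        }
        if ($index < $lastIndex) {
            return false; // This column comes earlier in the CSV than the previous one.
        }
        $lastIndex = $index;
    }

    return true;
}

/**
 * Hypothetical sketch: schema columns that are absent from the CSV header.
 *
 * @return list<string>
 */
function missingColumns(array $schemaColumns, array $csvHeader): array
{
    return array_values(array_diff($schemaColumns, $csvHeader));
}

$header = ['id', 'name', 'city'];
var_dump(columnsInOrder(['id', 'city'], $header)); // bool(true)
var_dump(columnsInOrder(['city', 'id'], $header)); // bool(false)
echo implode(', ', missingColumns(['id', 'email'], $header)), PHP_EOL; // email
```

With the header `id, name, city`, the schema order `id, city` passes, `city, id` fails, and `email` is reported as missing.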
-
-
-## Usage
-
-You can find launch examples in the [workflow demo](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/demo.yml).
-
-
-### GitHub Action
-
-
-```yml
-- uses: jbzoo/csv-blueprint@master # See the specific version on releases page
- with:
- # Path(s) to validate. You can specify path in which CSV files will be searched. Feel free to use glob pattrens. Usage examples: /full/path/file.csv, p/file.csv, p/*.csv, p/**/*.csv, p/**/name-*.csv, **/*.csv, etc.
- # Required: true
- csv: './tests/**/*.csv'
-
- # Schema filepath. It can be a YAML, JSON or PHP. See examples on GitHub.
- # Required: true
- schema: './tests/**/*.yml'
-
- # Report format. Available options: text, table, github, gitlab, teamcity, junit.
- # Default value: table
- # You can skip it
- report: table
-
- # Quick mode. It will not validate all rows. It will stop after the first error.
- # Default value: no
- # You can skip it
- quick: no
-
- # Skip schema validation. If you are sure that the schema is correct, you can skip this check.
- # Default value: no
- # You can skip it
- skip-schema: no
-
-```
-
-
-**Note**. GitHub Actions report format is `table` by default.
-
-But you can specify `report: github` to see friendly error output in your PRs using [annotations](https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-a-warning-message).
-This allows you to see bugs in the GitHub interface at the PR level. See [the PR as a live demo](https://github.com/JBZoo/Csv-Blueprint-Demo/pull/1/files).
-That is, the error will be shown in a specific place in the CSV file right in diff of your Pull Requests!
-
-![GitHub Actions - PR](.github/assets/github-actions-pr.png)
-
-
- Click to see example in GitHub Actions terminal
-
- ![GitHub Actions - Terminal](.github/assets/github-actions-termintal.png)
-
-
-
-
-### Docker container
-Ensure you have Docker installed on your machine.
-
-```sh
-# Pull the Docker image
-docker pull jbzoo/csv-blueprint:latest
-
-# Run the tool inside Docker
-docker run --rm \
- --workdir=/parent-host \
- -v $(pwd):/parent-host \
- jbzoo/csv-blueprint:latest \
- validate:csv \
- --csv=./tests/fixtures/demo.csv \
- --schema=./tests/schemas/demo_invalid.yml \
- --ansi -vvv
-
-
-# OR build it from source.
-git clone git@github.com:JBZoo/Csv-Blueprint.git csv-blueprint
-cd csv-blueprint
-make docker-build # local tag is "jbzoo/csv-blueprint:local"
-```
-
-
-### PHP binary
-Ensure you have PHP installed on your machine.
-
-```sh
-# download the latest version
-
-wget https://github.com/JBZoo/Csv-Blueprint/releases/latest/download/csv-blueprint.phar
-chmod +x ./csv-blueprint.phar
-./csv-blueprint.phar validate:csv \
- --csv=./tests/fixtures/demo.csv \
- --schema=./tests/schemas/demo_invalid.yml
-
-
-# OR build from source
-git clone git@github.com:jbzoo/csv-blueprint.git csv-blueprint
-cd csv-blueprint
-make build
-./csv-blueprint validate:csv \
- --csv=./tests/fixtures/demo.csv \
- --schema=./tests/schemas/demo_invalid.yml
-```
-
-### Complete CLI Help Message
+## Complete CLI Help Message
Here you can see all available options and commands. The tool uses the [JBZoo/Cli](https://github.com/JBZoo/Cli) package for the
CLI interface.
@@ -863,23 +923,23 @@ CSV file validation: 1
(1/1) Schema: ./tests/schemas/demo_invalid.yml
(1/1) CSV : ./tests/fixtures/demo.csv; Size: 123.34 MB
(1/1) Issues: 10
-+------+------------------+--------------+------------------------- demo.csv -------------------------------------------------------------------+
-| Line | id:Column | Rule | Message |
-+------+------------------+--------------+------------------------------------------------------------------------------------------------------+
-| 1 | | csv.header | Columns not found in CSV: "wrong_column_name" |
-| 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
-| 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
-| 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
-| 2 | 2:Float | num_max | The value "4825.185" is greater than the expected "4825.184" |
-| 1 | 2:Float | ag:nth_num | The N-th value in the column is "74", which is not equal than the expected "0.001" |
-| 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
-| | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
-| 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
-| | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
-| 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
-| | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
-| 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
-+------+------------------+--------------+------------------------- demo.csv -------------------------------------------------------------------+
++------+------------------+---------------------+---------------------- demo.csv ----------------------------------------------------------------------+
+| Line | id:Column | Rule | Message |
++------+------------------+---------------------+------------------------------------------------------------------------------------------------------+
+| 1 | | allow_extra_columns | Column(s) not found in CSV: "wrong_column_name" |
+| 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
+| 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
+| 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
+| 2 | 2:Float | num_max | The value "4825.185" is greater than the expected "4825.184" |
+| 1 | 2:Float | ag:nth_num | The N-th value in the column is "74", which is not equal than the expected "0.001" |
+| 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+| | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+| 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+| | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+| 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
+| | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
+| 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
++------+------------------+---------------------+---------------------- demo.csv ----------------------------------------------------------------------+
Summary:
@@ -1211,15 +1271,6 @@ Since I don't know under what conditions the code will be used, everything I can
So... as strictly as possible in today's PHP world. I think it works as expected.
-## Interesting fact
-
-I've set a personal record. The [first version](https://github.com/JBZoo/Csv-Blueprint/releases/tag/0.1) was written
-from scratch in about 3 days (with really frequent breaks to take care of 4 month baby).
-I'm looking at the first commit and the very first git tag. I'd say over the weekend, in my spare time on my personal
-laptop. Well... AI was only used for this Readme file because I'm not very good at English. 🤔
-
-I seem to be typing fast and I had really great inspiration. I hope my wife doesn't divorce me. 😅
-
## Coming soon
These are random ideas and plans, with no promises or deadlines. Feel free to [help me!](#contributing).
@@ -1328,3 +1379,15 @@ make codestyle
- [Image](https://github.com/JBZoo/Image) - Package provides object-oriented way to manipulate with images as simple as possible.
- [Data](https://github.com/JBZoo/Data) - Extended implementation of ArrayObject. Use Yml/PHP/JSON/INI files as config. Forget about arrays.
- [Retry](https://github.com/JBZoo/Retry) - Tiny PHP library providing retry/backoff functionality with strategies and jitter.
+
+
+**Interesting fact**
+
+I've set a personal record. The [first version](https://github.com/JBZoo/Csv-Blueprint/releases/tag/0.1) was written
+from scratch in about 3 days (with really frequent breaks to take care of my 4-month-old baby).
+I'm looking at the first commit and the very first git tag. I'd say over the weekend, in my spare time on my personal
+laptop. Well... AI was only used for this Readme file because I'm not very good at English. 🤔
+
+I seem to be typing fast and I had really great inspiration. I hope my wife doesn't divorce me. 😅
+
+
diff --git a/schema-examples/full.json b/schema-examples/full.json
index 02fedaab..1c1c2e82 100644
--- a/schema-examples/full.json
+++ b/schema-examples/full.json
@@ -13,6 +13,11 @@
"bom" : false
},
+ "structural_rules" : {
+ "strict_column_order" : true,
+ "allow_extra_columns" : false
+ },
+
"columns" : [
{
"name" : "Column Name (header)",
diff --git a/schema-examples/full.php b/schema-examples/full.php
index a2c61958..acf9dc43 100644
--- a/schema-examples/full.php
+++ b/schema-examples/full.php
@@ -34,6 +34,11 @@
'bom' => false,
],
+ 'structural_rules' => [
+ 'strict_column_order' => true,
+ 'allow_extra_columns' => false,
+ ],
+
'columns' => [
[
'name' => 'Column Name (header)',
diff --git a/schema-examples/full.yml b/schema-examples/full.yml
index e8fe147f..0d9ff045 100644
--- a/schema-examples/full.yml
+++ b/schema-examples/full.yml
@@ -22,14 +22,12 @@ description: | # Any description of the CSV file. Not u
supporting a wide range of data validation rules from basic type checks to complex regex validations.
This example serves as a comprehensive guide for creating robust CSV file validations.
-
# Regular expression to match the file name. If not set, then no pattern check.
# This allows you to pre-validate the file name before processing its contents.
# Feel free to check parent directories as well.
-# See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
+# See: https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
filename_pattern: /demo(-\d+)?\.csv$/i
-
# Here are default values to parse CSV file.
# You can skip this section if you don't need to override the default values.
csv:
@@ -40,6 +38,12 @@ csv:
encoding: utf-8 # (Experimental) Only utf-8, utf-16, utf-32.
bom: false # (Experimental) If the file has a BOM (Byte Order Mark) at the beginning.
+# Structural rules for the CSV file. These rules are applied to the entire CSV file.
+# They are not(!) related to the data in the columns.
+# You can skip this section if you don't need to override the default values.
+structural_rules: # Here are default values.
+ strict_column_order: true # Ensure columns in CSV follow the same order as defined in this YML schema. It works only if "csv.header" is true.
+ allow_extra_columns: false # Allow CSV files to have more columns than specified in this YML schema.
# Description of each column in CSV.
# It is recommended to present each column in the same order as presented in the CSV file.
@@ -74,7 +78,7 @@ columns:
allow_values: [ y, n, "" ] # Strict set of values that are allowed.
not_allow_values: [ invalid ] # Strict set of values that are NOT allowed.
- # Any valid regex pattern. See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
+ # Any valid regex pattern. See: https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
# Of course it's a super powerful tool to verify any sort of string data.
# Please, be careful. Regex is a powerful tool, but it can be very dangerous if used incorrectly.
# Remember that if you want to solve a problem with regex, you now have two problems.
@@ -407,7 +411,7 @@ columns:
contraharmonic_mean_max: 9.0 # x <= 9.0
# Root mean square (quadratic mean) The square root of the arithmetic mean of the squares of a set of numbers.
- # See https://en.wikipedia.org/wiki/Root_mean_square
+ # See: https://en.wikipedia.org/wiki/Root_mean_square
root_mean_square_min: 1.0 # x >= 1.0
root_mean_square_greater: 2.0 # x > 2.0
root_mean_square_not: 5.0 # x != 5.0
diff --git a/schema-examples/full_clean.yml b/schema-examples/full_clean.yml
index af5cbb0f..45d2d25d 100644
--- a/schema-examples/full_clean.yml
+++ b/schema-examples/full_clean.yml
@@ -31,6 +31,10 @@ csv:
encoding: utf-8
bom: false
+structural_rules:
+ strict_column_order: true
+ allow_extra_columns: false
+
columns:
- name: 'Column Name (header)'
description: 'Lorem ipsum'
diff --git a/schema-examples/readme_sample.yml b/schema-examples/readme_sample.yml
new file mode 100644
index 00000000..13bc2181
--- /dev/null
+++ b/schema-examples/readme_sample.yml
@@ -0,0 +1,31 @@
+#
+# JBZoo Toolbox - Csv-Blueprint.
+#
+# This file is part of the JBZoo Toolbox project.
+# For the full copyright and license information, please view the LICENSE
+# file that was distributed with this source code.
+#
+# @license MIT
+# @copyright Copyright (C) JBZoo.com, All rights reserved.
+# @see https://github.com/JBZoo/Csv-Blueprint
+#
+
+name: Simple CSV Schema
+filename_pattern: /my-favorite-csv-\d+\.csv$/i
+csv:
+ delimiter: ';'
+
+columns:
+ - name: id
+ rules:
+ not_empty: true
+ is_int: true
+ aggregate_rules:
+ is_unique: true
+ sorted: [ asc, numeric ]
+
+ - name: name
+ rules:
+ length_min: 3
+ aggregate_rules:
+ count: 10
diff --git a/src/Csv/ParseConfig.php b/src/Csv/ParseConfig.php
index a5b78673..41ed7e37 100644
--- a/src/Csv/ParseConfig.php
+++ b/src/Csv/ParseConfig.php
@@ -25,15 +25,17 @@ final class ParseConfig
public const ENCODING_UTF32 = 'utf-32';
private const FALLBACK_VALUES = [
- 'inherit' => null,
- 'bom' => false,
- 'delimiter' => ',',
- 'quote_char' => '\\',
- 'enclosure' => '"',
- 'encoding' => 'utf-8',
- 'header' => true,
- 'strict_column_order' => false,
- 'other_columns_possible' => false,
+ 'inherit' => null,
+ 'bom' => false,
+ 'delimiter' => ',',
+ 'quote_char' => '\\',
+ 'enclosure' => '"',
+ 'encoding' => 'utf-8',
+ 'header' => true,
+
+        // Structural rules, read from the "structural_rules" section of the schema
+ 'strict_column_order' => true,
+ 'allow_extra_columns' => false,
];
private Data $structure;
@@ -109,12 +111,18 @@ public function isHeader(): bool
public function isStrictColumnOrder(): bool
{
- return $this->structure->getBool('strict_column_order', self::FALLBACK_VALUES['strict_column_order']);
+ return $this->structure->findBool(
+ 'structural_rules.strict_column_order',
+ self::FALLBACK_VALUES['strict_column_order'],
+ );
}
- public function isOtherColumnsPossible(): bool
+ public function isAllowExtraColumns(): bool
{
- return $this->structure->getBool('other_columns_possible', self::FALLBACK_VALUES['other_columns_possible']);
+ return $this->structure->findBool(
+ 'structural_rules.allow_extra_columns',
+ self::FALLBACK_VALUES['allow_extra_columns'],
+ );
}
public function getArrayCopy(): array
diff --git a/src/Rules/Aggregate/ComboRootMeanSquare.php b/src/Rules/Aggregate/ComboRootMeanSquare.php
index da682cb6..e86d265d 100644
--- a/src/Rules/Aggregate/ComboRootMeanSquare.php
+++ b/src/Rules/Aggregate/ComboRootMeanSquare.php
@@ -31,7 +31,7 @@ public function getHelpMeta(): array
[
'Root mean square (quadratic mean) ' .
'The square root of the arithmetic mean of the squares of a set of numbers.',
- 'See https://en.wikipedia.org/wiki/Root_mean_square',
+ 'See: https://en.wikipedia.org/wiki/Root_mean_square',
],
[],
];
diff --git a/src/Rules/Cell/Regex.php b/src/Rules/Cell/Regex.php
index 1494e28b..e36ed907 100644
--- a/src/Rules/Cell/Regex.php
+++ b/src/Rules/Cell/Regex.php
@@ -24,7 +24,7 @@ public function getHelpMeta(): array
{
return [
[
- 'Any valid regex pattern. See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php',
+ 'Any valid regex pattern. See: https://www.php.net/manual/en/reference.pcre.pattern.syntax.php',
"Of course it's a super powerful tool to verify any sort of string data.",
'Please, be careful. Regex is a powerful tool, but it can be very dangerous if used incorrectly.',
'Remember that if you want to solve a problem with regex, you now have two problems.',
diff --git a/src/Utils.php b/src/Utils.php
index df65e85d..bfdbd003 100644
--- a/src/Utils.php
+++ b/src/Utils.php
@@ -29,6 +29,22 @@ final class Utils
{
public const MAX_DIRECTORY_DEPTH = 10;
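+    /**
+     * Checks that the elements of $array that also occur in $correctOrder
+     * appear in the same relative order as in $correctOrder.
+     * Elements that are not present in $correctOrder are ignored.
+     */
+    public static function isArrayInOrder(array $array, array $correctOrder): bool
+    {
+        $correctOrder = \array_values($correctOrder); // Normalize keys to 0..n-1.
+        $orderIndex   = 0;                            // First position in $correctOrder not yet consumed.
+
+        foreach ($array as $element) {
+            // Search only in the part of $correctOrder after the last match.
+            $foundIndex = \array_search($element, \array_slice($correctOrder, $orderIndex), true);
+            if ($foundIndex !== false) {
+                $orderIndex += (int)$foundIndex + 1;
+            } elseif (\in_array($element, $correctOrder, true)) {
+                // The element exists, but before the current position: wrong order.
+                return false;
+            }
+        }
+
+        return true;
+    }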
+
public static function printList(null|array|bool|float|int|string $items, string $color = ''): string
{
if (!\is_array($items)) {
diff --git a/src/Validators/ValidatorCsv.php b/src/Validators/ValidatorCsv.php
index 5a0a941f..f12125cd 100644
--- a/src/Validators/ValidatorCsv.php
+++ b/src/Validators/ValidatorCsv.php
@@ -97,6 +97,27 @@ private function validateHeader(bool $quickStop = false): ErrorSuite
}
}
+ if ($this->schema->getCsvStructure()->isStrictColumnOrder()) {
+ $realColumns = $this->csv->getHeader();
+ $schemaColumns = $this->schema->getSchemaHeader();
+
+ if (!Utils::isArrayInOrder($schemaColumns, $realColumns)) {
+ $error = new Error(
+ 'strict_column_order',
+                    "Real columns order doesn't match schema. " .
+                    'Expected: ' . Utils::printList($schemaColumns) . '. ' .
+                    'Actual: ' . Utils::printList($realColumns),
+ '',
+ ValidatorColumn::FALLBACK_LINE,
+ );
+
+ $errors->addError($error);
+ if ($quickStop && $errors->count() > 0) {
+ return $errors;
+ }
+ }
+ }
+
return $errors;
}
@@ -196,8 +217,6 @@ private function validateFile(bool $quickStop = false): ErrorSuite
'filename_pattern',
'Filename "' . Utils::cutPath($this->csv->getCsvFilename()) . '" ' .
"does not match pattern: \"{$filenamePattern}\"",
- '',
- Error::UNDEFINED_LINE,
);
$errors->addError($error);
@@ -214,38 +233,41 @@ private function validateColumn(bool $quickStop): ErrorSuite
{
$errors = new ErrorSuite();
- if ($this->schema->getCsvStructure()->isHeader()) {
- $realColumns = $this->csv->getHeader();
- $schemaColumns = $this->schema->getSchemaHeader();
- $notFoundColums = \array_diff($schemaColumns, $realColumns);
-
- if (\count($notFoundColums) > 0) {
- $error = new Error(
- 'csv.header',
- 'Columns not found in CSV: ' . Utils::printList($notFoundColums, 'c'),
- '',
- ValidatorColumn::FALLBACK_LINE,
- );
-
- $errors->addError($error);
- if ($quickStop) {
- return $errors;
+ if (!$this->schema->getCsvStructure()->isAllowExtraColumns()) {
+ if ($this->schema->getCsvStructure()->isHeader()) {
+ $realColumns = $this->csv->getHeader();
+ $schemaColumns = $this->schema->getSchemaHeader();
+ $notFoundColumns = \array_diff($schemaColumns, $realColumns);
+
+ if (\count($notFoundColumns) > 0) {
+ $error = new Error(
+ 'allow_extra_columns',
+ 'Column(s) not found in CSV: ' . Utils::printList($notFoundColumns, 'c'),
+ '',
+ ValidatorColumn::FALLBACK_LINE,
+ );
+
+ $errors->addError($error);
+ if ($quickStop) {
+ return $errors;
+ }
}
- }
- } else {
- $schemaColumns = \count($this->schema->getColumns());
- $realColumns = $this->csv->getRealColumNumber();
- if ($realColumns < $schemaColumns) {
- $error = new Error(
- 'csv.header',
- 'Real number of columns is less than schema: ' . $realColumns . ' < ' . $schemaColumns,
- '',
- ValidatorColumn::FALLBACK_LINE,
- );
-
- $errors->addError($error);
- if ($quickStop) {
- return $errors;
+ } else {
+ $schemaColumns = \count($this->schema->getColumns());
+ $realColumns = $this->csv->getRealColumNumber();
+ if ($realColumns < $schemaColumns) {
+ $error = new Error(
+ 'allow_extra_columns',
+ "Schema number of columns \"{$schemaColumns}\" greater " .
+ "than real \"{$realColumns}\"",
+ '',
+ ValidatorColumn::FALLBACK_LINE,
+ );
+
+ $errors->addError($error);
+ if ($quickStop) {
+ return $errors;
+ }
}
}
}
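The reworked `validateColumn()` check compares schema column names against the CSV header with `array_diff()` when `csv.header: true`, and falls back to comparing column counts when there is no header. A hypothetical standalone sketch of that branching is shown below; `checkColumnPresence` is an illustrative name, it returns plain strings instead of `Error` objects, and its quoting only approximates `Utils::printList()`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the column-presence check; returns an error message or null.
 *
 * @param string[]      $schemaColumns   column names declared in the schema
 * @param string[]|null $realHeader      CSV header row, or null when csv.header is false
 * @param int           $realColumnCount number of columns found in the CSV
 */
function checkColumnPresence(array $schemaColumns, ?array $realHeader, int $realColumnCount): ?string
{
    if ($realHeader !== null) {
        // With a header: every schema column name must occur somewhere in the CSV header
        $notFound = array_values(array_diff($schemaColumns, $realHeader));
        if (count($notFound) > 0) {
            return 'Column(s) not found in CSV: "' . implode('", "', $notFound) . '"';
        }

        return null;
    }

    // Without a header only the number of columns can be compared
    $schemaCount = count($schemaColumns);
    if ($realColumnCount < $schemaCount) {
        return "Schema number of columns \"{$schemaCount}\" greater than real \"{$realColumnCount}\"";
    }

    return null;
}

echo checkColumnPresence(['Name', 'wrong_column_name'], ['Name', 'City'], 2), PHP_EOL;
echo checkColumnPresence(['a', 'b', 'c'], null, 2), PHP_EOL;
```

As in the diff, a CSV column that is missing from the schema is not reported by this check; only schema columns absent from the CSV (or a CSV with fewer columns than the schema) produce an error.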
diff --git a/tests/Commands/ValidateCsvBasicTest.php b/tests/Commands/ValidateCsvBasicTest.php
index 5ccc7ffb..79ab3151 100644
--- a/tests/Commands/ValidateCsvBasicTest.php
+++ b/tests/Commands/ValidateCsvBasicTest.php
@@ -127,23 +127,23 @@ public function testValidateOneCsvWithInvalidSchemaNegative(): void
(1/1) Schema: ./tests/schemas/demo_invalid.yml
(1/1) CSV : ./tests/fixtures/demo.csv; Size: 123.34 MB
(1/1) Issues: 10
- +------+------------------+--------------+------------------------- demo.csv -------------------------------------------------------------------+
- | Line | id:Column | Rule | Message |
- +------+------------------+--------------+------------------------------------------------------------------------------------------------------+
- | 1 | | csv.header | Columns not found in CSV: "wrong_column_name" |
- | 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
- | 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
- | 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
- | 2 | 2:Float | num_max | The value "4825.185" is greater than the expected "4825.184" |
- | 1 | 2:Float | ag:nth_num | The N-th value in the column is "74", which is not equal than the expected "0.001" |
- | 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
- | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
- | 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
- | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
- | 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
- | | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
- | 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
- +------+------------------+--------------+------------------------- demo.csv -------------------------------------------------------------------+
+ +------+------------------+---------------------+---------------------- demo.csv ----------------------------------------------------------------------+
+ | Line | id:Column | Rule | Message |
+ +------+------------------+---------------------+------------------------------------------------------------------------------------------------------+
+ | 1 | | allow_extra_columns | Column(s) not found in CSV: "wrong_column_name" |
+ | 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
+ | 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
+ | 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
+ | 2 | 2:Float | num_max | The value "4825.185" is greater than the expected "4825.184" |
+ | 1 | 2:Float | ag:nth_num | The N-th value in the column is "74", which is not equal than the expected "0.001" |
+ | 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+ | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+ | 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+ | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+ | 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
+ | | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
+ | 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
+ +------+------------------+---------------------+---------------------- demo.csv ----------------------------------------------------------------------+
Summary:
diff --git a/tests/Commands/ValidateCsvBatchCsvTest.php b/tests/Commands/ValidateCsvBatchCsvTest.php
index 9422c95b..100bccbf 100644
--- a/tests/Commands/ValidateCsvBatchCsvTest.php
+++ b/tests/Commands/ValidateCsvBatchCsvTest.php
@@ -104,42 +104,42 @@ public function testValidateManyCsvNegative(): void
(1/3) Schema: ./tests/schemas/demo_invalid.yml
(1/3) CSV : ./tests/fixtures/batch/demo-1.csv; Size: 123.34 MB
(1/3) Issues: 5
- +------+------------------+--------------+------------------------ demo-1.csv ------------------------------------------------------------------+
- | Line | id:Column | Rule | Message |
- +------+------------------+--------------+------------------------------------------------------------------------------------------------------+
- | 1 | | csv.header | Columns not found in CSV: "wrong_column_name" |
- | 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 1, total: 2 |
- | 1 | 2:Float | ag:nth_num | The column does not have a line 4, so the value cannot be checked. |
- | 1 | 3:Birthday | ag:nth | The value on line 2 in the column is "1998-02-28", which is not equal than the expected "2000-12-01" |
- | 3 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
- +------+------------------+--------------+------------------------ demo-1.csv ------------------------------------------------------------------+
+ +------+------------------+---------------------+--------------------- demo-1.csv ---------------------------------------------------------------------+
+ | Line | id:Column | Rule | Message |
+ +------+------------------+---------------------+------------------------------------------------------------------------------------------------------+
+ | 1 | | allow_extra_columns | Column(s) not found in CSV: "wrong_column_name" |
+ | 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 1, total: 2 |
+ | 1 | 2:Float | ag:nth_num | The column does not have a line 4, so the value cannot be checked. |
+ | 1 | 3:Birthday | ag:nth | The value on line 2 in the column is "1998-02-28", which is not equal than the expected "2000-12-01" |
+ | 3 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
+ +------+------------------+---------------------+--------------------- demo-1.csv ---------------------------------------------------------------------+
(2/3) Schema: ./tests/schemas/demo_invalid.yml
(2/3) CSV : ./tests/fixtures/batch/demo-2.csv; Size: 123.34 MB
(2/3) Issues: 7
- +------+------------+------------+---------------------------- demo-2.csv --------------------------------------------------------------+
- | Line | id:Column | Rule | Message |
- +------+------------+------------+------------------------------------------------------------------------------------------------------+
- | 1 | | csv.header | Columns not found in CSV: "wrong_column_name" |
- | 2 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
- | 7 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
- | 2 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
- | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
- | 4 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
- | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
- | 5 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
- | | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
- | 1 | 3:Birthday | ag:nth | The value on line 2 in the column is "1989-05-15", which is not equal than the expected "2000-12-01" |
- +------+------------+------------+---------------------------- demo-2.csv --------------------------------------------------------------+
+ +------+------------+---------------------+------------------------ demo-2.csv ------------------------------------------------------------------+
+ | Line | id:Column | Rule | Message |
+ +------+------------+---------------------+------------------------------------------------------------------------------------------------------+
+ | 1 | | allow_extra_columns | Column(s) not found in CSV: "wrong_column_name" |
+ | 2 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
+ | 7 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
+ | 2 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+ | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+ | 4 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+ | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+ | 5 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
+ | | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
+ | 1 | 3:Birthday | ag:nth | The value on line 2 in the column is "1989-05-15", which is not equal than the expected "2000-12-01" |
+ +------+------------+---------------------+------------------------ demo-2.csv ------------------------------------------------------------------+
(3/3) Schema: ./tests/schemas/demo_invalid.yml
(3/3) CSV : ./tests/fixtures/batch/sub/demo-3.csv; Size: 123.34 MB
(3/3) Issues: 1
- +------+-----------+------------+- demo-3.csv ----------------------------------+
- | Line | id:Column | Rule | Message |
- +------+-----------+------------+-----------------------------------------------+
- | 1 | | csv.header | Columns not found in CSV: "wrong_column_name" |
- +------+-----------+------------+- demo-3.csv ----------------------------------+
+ +------+-----------+-------------------- demo-3.csv ---------------------------------------+
+ | Line | id:Column | Rule | Message |
+ +------+-----------+---------------------+-------------------------------------------------+
+ | 1 | | allow_extra_columns | Column(s) not found in CSV: "wrong_column_name" |
+ +------+-----------+-------------------- demo-3.csv ---------------------------------------+
Summary:
diff --git a/tests/Commands/ValidateCsvBatchSchemaTest.php b/tests/Commands/ValidateCsvBatchSchemaTest.php
index 0c4a8411..4b91acde 100644
--- a/tests/Commands/ValidateCsvBatchSchemaTest.php
+++ b/tests/Commands/ValidateCsvBatchSchemaTest.php
@@ -73,23 +73,23 @@ public function testMultiSchemaDiscovery(): void
(1/2) Schema: ./tests/schemas/demo_invalid.yml
(1/2) CSV : ./tests/fixtures/demo.csv; Size: 123.34 MB
(1/2) Issues: 10
- +------+------------------+--------------+------------------------- demo.csv -------------------------------------------------------------------+
- | Line | id:Column | Rule | Message |
- +------+------------------+--------------+------------------------------------------------------------------------------------------------------+
- | 1 | | csv.header | Columns not found in CSV: "wrong_column_name" |
- | 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
- | 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
- | 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
- | 2 | 2:Float | num_max | The value "4825.185" is greater than the expected "4825.184" |
- | 1 | 2:Float | ag:nth_num | The N-th value in the column is "74", which is not equal than the expected "0.001" |
- | 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
- | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
- | 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
- | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
- | 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
- | | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
- | 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
- +------+------------------+--------------+------------------------- demo.csv -------------------------------------------------------------------+
+ +------+------------------+---------------------+---------------------- demo.csv ----------------------------------------------------------------------+
+ | Line | id:Column | Rule | Message |
+ +------+------------------+---------------------+------------------------------------------------------------------------------------------------------+
+ | 1 | | allow_extra_columns | Column(s) not found in CSV: "wrong_column_name" |
+ | 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
+ | 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
+ | 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
+ | 2 | 2:Float | num_max | The value "4825.185" is greater than the expected "4825.184" |
+ | 1 | 2:Float | ag:nth_num | The N-th value in the column is "74", which is not equal than the expected "0.001" |
+ | 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+ | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+ | 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+ | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+ | 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
+ | | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
+ | 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
+ +------+------------------+---------------------+---------------------- demo.csv ----------------------------------------------------------------------+
(2/2) Schema: ./tests/schemas/demo_valid.yml
(2/2) CSV : ./tests/fixtures/demo.csv; Size: 123.34 MB
diff --git a/tests/Commands/ValidateCsvQuickTest.php b/tests/Commands/ValidateCsvQuickTest.php
index 4f2b5a47..56a0cabb 100644
--- a/tests/Commands/ValidateCsvQuickTest.php
+++ b/tests/Commands/ValidateCsvQuickTest.php
@@ -45,7 +45,7 @@ public function testEnabled(): void
(1/3) Schema: ./tests/schemas/demo_invalid.yml
(1/3) CSV : ./tests/fixtures/batch/demo-1.csv; Size: 123.34 MB
(1/3) Issues: 1
- "csv.header" at line 1. Columns not found in CSV: "wrong_column_name".
+ "allow_extra_columns" at line 1. Column(s) not found in CSV: "wrong_column_name".
(2/3) Schema: ./tests/schemas/demo_invalid.yml
(2/3) CSV : ./tests/fixtures/batch/demo-2.csv; Size: 123.34 MB
@@ -84,7 +84,7 @@ public function testDisabled(): void
(1/3) Schema: ./tests/schemas/demo_invalid.yml
(1/3) CSV : ./tests/fixtures/batch/demo-1.csv; Size: 123.34 MB
(1/3) Issues: 5
- "csv.header" at line 1. Columns not found in CSV: "wrong_column_name".
+ "allow_extra_columns" at line 1. Column(s) not found in CSV: "wrong_column_name".
"ag:is_unique" at line 1, column "1:City". Column has non-unique values. Unique: 1, total: 2.
"ag:nth_num" at line 1, column "2:Float". The column does not have a line 4, so the value cannot be checked.
"ag:nth" at line 1, column "3:Birthday". The value on line 2 in the column is "1998-02-28", which is not equal than the expected "2000-12-01".
@@ -93,7 +93,7 @@ public function testDisabled(): void
(2/3) Schema: ./tests/schemas/demo_invalid.yml
(2/3) CSV : ./tests/fixtures/batch/demo-2.csv; Size: 123.34 MB
(2/3) Issues: 7
- "csv.header" at line 1. Columns not found in CSV: "wrong_column_name".
+ "allow_extra_columns" at line 1. Column(s) not found in CSV: "wrong_column_name".
"length_min" at line 2, column "0:Name". The length of the value "Carl" is 4, which is less than the expected "5".
"length_min" at line 7, column "0:Name". The length of the value "Lois" is 4, which is less than the expected "5".
"date_min" at line 2, column "3:Birthday". The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the expected "1955-05-15 00:00:00 +00:00 (1955-05-15)".
@@ -104,7 +104,7 @@ public function testDisabled(): void
(3/3) Schema: ./tests/schemas/demo_invalid.yml
(3/3) CSV : ./tests/fixtures/batch/sub/demo-3.csv; Size: 123.34 MB
(3/3) Issues: 1
- "csv.header" at line 1. Columns not found in CSV: "wrong_column_name".
+ "allow_extra_columns" at line 1. Column(s) not found in CSV: "wrong_column_name".
Summary:
diff --git a/tests/Commands/ValidateCsvReportsTest.php b/tests/Commands/ValidateCsvReportsTest.php
index b6069ddd..bdfcab27 100644
--- a/tests/Commands/ValidateCsvReportsTest.php
+++ b/tests/Commands/ValidateCsvReportsTest.php
@@ -47,23 +47,23 @@ public function testDefault(): void
(1/1) Schema: ./tests/schemas/demo_invalid.yml
(1/1) CSV : ./tests/fixtures/demo.csv; Size: 123.34 MB
(1/1) Issues: 10
- +------+------------------+--------------+------------------------- demo.csv -------------------------------------------------------------------+
- | Line | id:Column | Rule | Message |
- +------+------------------+--------------+------------------------------------------------------------------------------------------------------+
- | 1 | | csv.header | Columns not found in CSV: "wrong_column_name" |
- | 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
- | 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
- | 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
- | 2 | 2:Float | num_max | The value "4825.185" is greater than the expected "4825.184" |
- | 1 | 2:Float | ag:nth_num | The N-th value in the column is "74", which is not equal than the expected "0.001" |
- | 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
- | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
- | 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
- | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
- | 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
- | | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
- | 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
- +------+------------------+--------------+------------------------- demo.csv -------------------------------------------------------------------+
+ +------+------------------+---------------------+---------------------- demo.csv ----------------------------------------------------------------------+
+ | Line | id:Column | Rule | Message |
+ +------+------------------+---------------------+------------------------------------------------------------------------------------------------------+
+ | 1 | | allow_extra_columns | Column(s) not found in CSV: "wrong_column_name" |
+ | 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
+ | 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
+ | 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
+ | 2 | 2:Float | num_max | The value "4825.185" is greater than the expected "4825.184" |
+ | 1 | 2:Float | ag:nth_num | The N-th value in the column is "74", which is not equal than the expected "0.001" |
+ | 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+ | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+ | 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
+ | | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
+ | 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
+ | | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
+ | 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
+ +------+------------------+---------------------+---------------------- demo.csv ----------------------------------------------------------------------+
Summary:
@@ -96,7 +96,7 @@ public function testText(): void
(1/1) Schema: ./tests/schemas/demo_invalid.yml
(1/1) CSV : ./tests/fixtures/demo.csv; Size: 123.34 MB
(1/1) Issues: 10
- "csv.header" at line 1. Columns not found in CSV: "wrong_column_name".
+ "allow_extra_columns" at line 1. Column(s) not found in CSV: "wrong_column_name".
"length_min" at line 6, column "0:Name". The length of the value "Carl" is 4, which is less than the expected "5".
"length_min" at line 11, column "0:Name". The length of the value "Lois" is 4, which is less than the expected "5".
"ag:is_unique" at line 1, column "1:City". Column has non-unique values. Unique: 9, total: 10.
@@ -139,7 +139,7 @@ public function testGithub(): void
(1/1) Schema: ./tests/schemas/demo_invalid.yml
(1/1) CSV : ./tests/fixtures/demo.csv; Size: 123.34 MB
(1/1) Issues: 10
- ::error file=/tests/fixtures/demo.csv,line=1::csv.header at column%0A"csv.header" at line 1. Columns not found in CSV: "wrong_column_name".
+ ::error file=/tests/fixtures/demo.csv,line=1::allow_extra_columns at column%0A"allow_extra_columns" at line 1. Column(s) not found in CSV: "wrong_column_name".
::error file=/tests/fixtures/demo.csv,line=6::length_min at column 0:Name%0A"length_min" at line 6, column "0:Name". The length of the value "Carl" is 4, which is less than the expected "5".
@@ -194,9 +194,9 @@ public function testTeamcity(): void
##teamcity[testSuiteStarted name='demo.csv' flowId='42']
- ##teamcity[testStarted name='csv.header at column' locationHint='php_qn:///tests/fixtures/demo.csv' flowId='42']
- "csv.header" at line 1. Columns not found in CSV: "wrong_column_name".
- ##teamcity[testFinished name='csv.header at column' flowId='42']
+ ##teamcity[testStarted name='allow_extra_columns at column' locationHint='php_qn:///tests/fixtures/demo.csv' flowId='42']
+ "allow_extra_columns" at line 1. Column(s) not found in CSV: "wrong_column_name".
+ ##teamcity[testFinished name='allow_extra_columns at column' flowId='42']
##teamcity[testStarted name='length_min at column 0:Name' locationHint='php_qn:///tests/fixtures/demo.csv' flowId='42']
"length_min" at line 6, column "0:Name". The length of the value "Carl" is 4, which is less than the expected "5".
@@ -261,8 +261,8 @@ public function testJunit(): void
-
- "csv.header" at line 1. Columns not found in CSV: "wrong_column_name".
+
+ "allow_extra_columns" at line 1. Column(s) not found in CSV: "wrong_column_name".
"length_min" at line 6, column "0:Name". The length of the value "Carl" is 4, which is less than the expected "5".
@@ -331,7 +331,7 @@ public function testGitlab(): void
[
{
- "description": "csv.header at column\n\"csv.header\" at line 1. Columns not found in CSV: \"wrong_column_name\".",
+ "description": "allow_extra_columns at column\n\"allow_extra_columns\" at line 1. Column(s) not found in CSV: \"wrong_column_name\".",
"fingerprint": "_replaced_",
"severity": "major",
"location": {
diff --git a/tests/GithubActionsTest.php b/tests/GithubActionsTest.php
index 7ece74a3..960113ed 100644
--- a/tests/GithubActionsTest.php
+++ b/tests/GithubActionsTest.php
@@ -87,9 +87,7 @@ public function testGitHubActionsReadMe(): void
$expectedMessage[] = '';
}
- $expectedMessage[] = '```';
-
- $text = \implode("\n", $expectedMessage);
+ $text = \trim(\implode("\n", $expectedMessage)) . "\n```";
Tools::insertInReadme('github-actions-yml', $text);
}
}
diff --git a/tests/PackageTest.php b/tests/PackageTest.php
index c0b9a95a..b5d89355 100644
--- a/tests/PackageTest.php
+++ b/tests/PackageTest.php
@@ -64,18 +64,11 @@ final class PackageTest extends \JBZoo\Codestyle\PHPUnit\AbstractPackageTest
protected array $badgesTemplate = [
'github_actions',
'github_actions_demo',
- 'github_actions_release_docker',
'coveralls',
'psalm_coverage',
- 'psalm_level',
- 'codefactor',
- 'github_license',
- '__BR__',
'github_latest_release',
'packagist_downloads_total',
'docker_pulls',
- 'docker_image_size',
- 'packagist_dependents',
];
protected function setUp(): void
@@ -96,6 +89,32 @@ public function testComposerType(): void
isSame('project', $composer->find('type'));
}
+ public function testReadmeHeader(): void
+ {
+ $expectedBadges = [];
+
+ foreach ($this->badgesTemplate as $badgeName) {
+ if ($badgeName === '__BR__') {
+ $expectedBadges[$badgeName] = ' ';
+ } else {
+ $testMethod = 'checkBadge' . \str_replace('_', '', \ucwords($badgeName, '_'));
+
+ if (\method_exists($this, $testMethod)) {
+ $tmpBadge = $this->{$testMethod}();
+ if ($tmpBadge !== null) {
+ $expectedBadges[$badgeName] = $tmpBadge;
+ }
+ } else {
+ fail("Method not found: '{$testMethod}'");
+ }
+ }
+ }
+
+ $expectedBadgeLine = \implode("\n", $expectedBadges);
+
+ Tools::insertInReadme('top-badges', $expectedBadgeLine);
+ }
+
protected function checkBadgeGithubActionsDemo(): ?string
{
$path = 'https://github.com/__VENDOR_ORIG__/__PACKAGE_ORIG__/actions/workflows';
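`testReadmeHeader()` derives a `checkBadge*` method name from each `$badgesTemplate` entry by calling `ucwords()` with `_` as the separator and then stripping the underscores. A quick standalone check of that mapping follows; the helper name `badgeMethodName` is hypothetical.

```php
<?php

declare(strict_types=1);

// Hypothetical helper reproducing the name mapping used in testReadmeHeader()
function badgeMethodName(string $badgeName): string
{
    // ucwords() with '_' as separator upper-cases each segment's first letter:
    // "github_actions_demo" -> "Github_Actions_Demo"; str_replace() then drops the '_'
    return 'checkBadge' . str_replace('_', '', ucwords($badgeName, '_'));
}

echo badgeMethodName('github_actions_demo'), PHP_EOL; // checkBadgeGithubActionsDemo
echo badgeMethodName('coveralls'), PHP_EOL;           // checkBadgeCoveralls
```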
diff --git a/tests/ReadmeTest.php b/tests/ReadmeTest.php
index cca01d33..7eb42ecf 100644
--- a/tests/ReadmeTest.php
+++ b/tests/ReadmeTest.php
@@ -27,8 +27,10 @@ final class ReadmeTest extends TestCase
'* With `filename_pattern` rule, you can check if the file name matches the pattern.',
'* Property `name` is not defined in a column. If `csv.header: true`.',
'* Check that each row matches the number of columns.',
- '* If `csv.header: true`. Schema contains an unknown column `name` that is not found in the CSV file.',
- '* If `csv.header: false`. Compare the number of columns in the schema and the CSV file.',
+ '* With `strict_column_order` rule, you can check that the columns are in the correct order.',
+ '* With `allow_extra_columns` rule, you can check that there are no extra columns in the CSV file.',
+ ' * If `csv.header: true`. Schema contains an unknown column `name` that is not found in the CSV file.',
+ ' * If `csv.header: false`. Compare the number of columns in the schema and the CSV file.',
];
public function testCreateCsvHelp(): void
@@ -82,7 +84,7 @@ public function testBadgeOfRules(): void
$todoYml = yml(Tools::SCHEMA_TODO);
$planToAdd = \count($todoYml->findArray('columns.0.rules')) . '/' .
(\count($todoYml->findArray('columns.0.aggregate_rules')) * 6) . '/' .
- \count([
+ (\count([
'required',
'null_values',
'multiple + separator',
@@ -91,7 +93,7 @@ public function testBadgeOfRules(): void
'complex_rules. one example',
'inherit',
'rule not found',
- ]);
+ ]) + \count($todoYml->findArray('structural_rules')));
$badge = static function (string $label, int|string $count, string $url, string $color): string {
$label = \str_replace(' ', '%20', $label);
@@ -105,7 +107,7 @@ public function testBadgeOfRules(): void
return $badge;
};
- $text = \implode(' ', [
+ $text = \implode("\n", [
$badge('Total number of rules', $totalRules, 'schema-examples/full.yml', 'darkgreen'),
$badge('Cell rules', $cellRules, 'src/Rules/Cell', 'blue'),
$badge('Aggregate rules', $aggRules, 'src/Rules/Aggregate', 'blue'),
@@ -128,6 +130,18 @@ public function testCheckYmlSchemaExampleInReadme(): void
Tools::insertInReadme('full-yml', $text);
}
+ public function testCheckSimpleYmlSchemaExampleInReadme(): void
+ {
+ $ymlContent = \implode(
+ "\n",
+ \array_slice(\explode("\n", \file_get_contents('./schema-examples/readme_sample.yml')), 12),
+ );
+
+ $text = \implode("\n", ['```yml', $ymlContent, '```']);
+
+ Tools::insertInReadme('readme-sample-yml', $text);
+ }
+
public function testAdditionalValidationRules(): void
{
$list[] = '';
diff --git a/tests/SchemaTest.php b/tests/SchemaTest.php
index e43391da..3b217040 100644
--- a/tests/SchemaTest.php
+++ b/tests/SchemaTest.php
@@ -216,6 +216,9 @@ public function testValidateValidSchemaFixtures(): void
{
$schemas = (new Finder())
->in(PROJECT_ROOT . '/tests/schemas')
+ ->in(PROJECT_ROOT . '/tests/Benchmarks')
+ ->in(PROJECT_ROOT . '/schema-examples')
+ ->name('*.yml')
->notName([
'todo.yml',
'invalid_schema.yml',
diff --git a/tests/UtilsTest.php b/tests/UtilsTest.php
index 13838504..88c364aa 100644
--- a/tests/UtilsTest.php
+++ b/tests/UtilsTest.php
@@ -235,6 +235,24 @@ public function testColorOfCellValue(): void
}
}
+ public function testIsArrayInOrder(): void
+ {
+ isTrue(Utils::isArrayInOrder(['a', 'b', 'c'], ['a', 'b', 'c']));
+ isTrue(Utils::isArrayInOrder(['a', 'b'], ['a', 'b', 'c']));
+ isTrue(Utils::isArrayInOrder(['b', 'c'], ['a', 'b', 'c']));
+ isTrue(Utils::isArrayInOrder(['a', 'c'], ['a', 'b', 'c']));
+ isTrue(Utils::isArrayInOrder(['a'], ['a', 'b', 'c']));
+ isTrue(Utils::isArrayInOrder(['b'], ['a', 'b', 'c']));
+ isTrue(Utils::isArrayInOrder(['c'], ['a', 'b', 'c']));
+ isTrue(Utils::isArrayInOrder([], ['a', 'b', 'c']));
+
+ isTrue(Utils::isArrayInOrder(['d'], ['a', 'b', 'c'])); // ignore extra
+
+ isFalse(Utils::isArrayInOrder(['a', 'c', 'b'], ['a', 'b', 'c']));
+ isFalse(Utils::isArrayInOrder(['c', 'a', 'b'], ['a', 'b', 'c']));
+ isFalse(Utils::isArrayInOrder(['b', 'a'], ['a', 'b', 'c']));
+ }
+
/**
* @param SplFileInfo[] $files
* @return string[]
diff --git a/tests/Validators/CsvValidatorTest.php b/tests/Validators/CsvValidatorTest.php
index ceec7d3e..f0ea7411 100644
--- a/tests/Validators/CsvValidatorTest.php
+++ b/tests/Validators/CsvValidatorTest.php
@@ -41,7 +41,7 @@ public function testInvalidWithoutHeader(): void
$csv = new CsvFile(Tools::CSV_SIMPLE_NO_HEADER, Tools::SCHEMA_SIMPLE_NO_HEADER);
isSame(
<<<'TEXT'
- "csv.header" at line 1. Real number of columns is less than schema: 2 < 3.
+ "allow_extra_columns" at line 1. Schema number of columns "3" greater than real "2".
TEXT,
\strip_tags((string)$csv->validate()),
@@ -118,7 +118,7 @@ public function testCellRuleNoName(): void
isSame(
<<<'TXT'
"csv.header" at line 1, column "0:". Property "name" is not defined in schema: "_custom_array_".
- "csv.header" at line 1. Columns not found in CSV: "0".
+ "allow_extra_columns" at line 1. Column(s) not found in CSV: "0".
TXT,
\strip_tags((string)$csv->validate()),
@@ -236,4 +236,91 @@ public function testHeaderMatchingIfHeaderDisabled(): void
['Favorite color', '3:Favorite color'], // 3 is important here
], $names);
}
+
+ public function testStrictColumnOrderValid(): void
+ {
+ $csv = new CsvFile(Tools::DEMO_CSV, [
+ 'columns' => [
+ ['name' => 'Name'],
+ ['name' => 'City'],
+ ['name' => 'Float'],
+ ['name' => 'Birthday'],
+ ['name' => 'Favorite color'],
+ ],
+ ]);
+ isSame(null, $csv->validate()->render());
+
+ $csv = new CsvFile(Tools::DEMO_CSV, [
+ 'columns' => [
+ ['name' => 'Name'],
+ ['name' => 'City'],
+ ['name' => 'Float'],
+ ['name' => 'Birthday'],
+ ],
+ ]);
+ isSame(null, $csv->validate()->render());
+
+ $csv = new CsvFile(Tools::DEMO_CSV, [
+ 'columns' => [
+ ['name' => 'City'],
+ ['name' => 'Float'],
+ ['name' => 'Birthday'],
+ ['name' => 'Favorite color'],
+ ],
+ ]);
+ isSame(null, $csv->validate()->render());
+
+ $csv = new CsvFile(
+ Tools::DEMO_CSV,
+ ['columns' => [['name' => 'City'], ['name' => 'Float'], ['name' => 'Birthday']]],
+ );
+ isSame(null, $csv->validate()->render());
+
+ $csv = new CsvFile(Tools::DEMO_CSV, ['columns' => [['name' => 'City'], ['name' => 'Birthday']]]);
+ isSame(null, $csv->validate()->render());
+
+ $csv = new CsvFile(Tools::DEMO_CSV, ['columns' => [['name' => 'City']]]);
+ isSame(null, $csv->validate()->render());
+
+ $csv = new CsvFile(Tools::DEMO_CSV);
+ isSame(null, $csv->validate()->render());
+ }
+
+ public function testStrictColumnOrderInvalid(): void
+ {
+ $columns = [
+ ['name' => 'City'],
+ ['name' => 'Name'], // Wrong order here
+ ['name' => 'Float'],
+ ['name' => 'Birthday'],
+ ['name' => 'Favorite color'],
+ ];
+
+ $csv = new CsvFile(Tools::DEMO_CSV, ['columns' => $columns]);
+
+ isSame(
+ '"strict_column_order" at line 1. Real columns order doesn\'t match schema. ' .
+ 'Expected: ["Name", "City", "Float", "Birthday", "Favorite color"]. ' .
+ 'Actual: ["City", "Name", "Float", "Birthday", "Favorite color"].' . "\n",
+ $csv->validate()->render(),
+ );
+
+ $columns = [
+ ['name' => 'City'],
+ ['name' => 'Name'], // Wrong order here
+ ['name' => 'Float'],
+ ['name' => 'Favorite color'],
+ ['name' => 'Birthday'],
+ ['name' => 'Birthday'],
+ ];
+
+ $csv = new CsvFile(Tools::DEMO_CSV, ['columns' => $columns]);
+
+ isSame(
+ '"strict_column_order" at line 1. Real columns order doesn\'t match schema. ' .
+ 'Expected: ["Name", "City", "Float", "Birthday", "Favorite color"]. ' .
+ 'Actual: ["City", "Name", "Float", "Favorite color", "Birthday"].' . "\n",
+ $csv->validate()->render(),
+ );
+ }
}
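Taken together, `testStrictColumnOrderValid()` and `testStrictColumnOrderInvalid()` imply the following rule. Schema columns may cover only part of the CSV header, but the columns they do name must keep the header's order. In the failure message, "Expected" lists the real header and "Actual" lists the schema names with duplicates collapsed. The sketch below reproduces that behaviour with a filter-and-compare approach. `checkStrictColumnOrderSketch()` is a hypothetical helper. The message text is copied from the test, but the project's own validator may build it differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of a "strict_column_order" check.
 * $header: real CSV header; $schemaNames: column names listed in the schema.
 * Returns null when the order is valid, otherwise an error message.
 */
function checkStrictColumnOrderSketch(array $header, array $schemaNames): ?string
{
    // Repeated schema names (e.g. two "Birthday" entries) are collapsed.
    $schemaNames = \array_values(\array_unique($schemaNames));

    // Header columns the schema mentions, kept in CSV order...
    $headerInSchema = \array_values(\array_intersect($header, $schemaNames));
    // ...and schema columns present in the CSV, kept in schema order.
    $schemaInHeader = \array_values(\array_intersect($schemaNames, $header));

    // Same names in the same relative order means the check passes.
    if ($headerInSchema === $schemaInHeader) {
        return null;
    }

    $quote = static fn (array $list): string => '["' . \implode('", "', $list) . '"]';

    return '"strict_column_order" at line 1. Real columns order doesn\'t match schema. '
        . 'Expected: ' . $quote($header) . '. '
        . 'Actual: ' . $quote($schemaNames) . '.';
}
```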
diff --git a/tests/schemas/todo.yml b/tests/schemas/todo.yml
index b5abe540..cff73e45 100644
--- a/tests/schemas/todo.yml
+++ b/tests/schemas/todo.yml
@@ -21,10 +21,17 @@ includes: # Alias is always required
csv: # How to parse file before validation
inherit: alias_1 # Inherited from another schema. Options above will overwrite inherited options.
- strict_column_order: true # true - columns must be in the same order as in the schema, false - no strict
- other_columns_possible: true # true - other columns are allowed, false - no other columns
null_values: [ "none", "nil" ] # List of values that will be treated as empty
+
+structural_rules:
+ duplicate_column_names: false # Allow duplicate column names in the CSV header.
+ columns_count_min: 3 # Minimum number of columns in the file. By default, it is equal to the number of columns in the schema.
+ columns_count: 5 # Exact number of columns in the file. By default, it is equal to the number of columns in the schema.
+ columns_count_max: 5 # Maximum number of columns in the file. By default, it is equal to the number of columns in the schema.
+ ignore_duplicate_rows: false # If true, then duplicate rows will be ignored. Duplicate rows are rows that have the same values in all columns - 100% match.
+
+
columns:
- required: true # If true, then column must be present in the file
null_values: # (Override csv\empty_values) List of values that will be treated as empty
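The planned `structural_rules` block in `todo.yml` defines three column-count limits. Per the comments, each limit defaults to the number of columns declared in the schema. The sketch below reads those comments literally. `checkColumnCountSketch()` and its error strings are hypothetical, because the feature is still a TODO and no implementation appears in this diff.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the planned columns_count_min / columns_count /
 * columns_count_max rules. An unset limit (null) falls back to the number
 * of columns declared in the schema, as the todo.yml comments describe.
 * Returns a list of (made-up) error strings; empty means all checks pass.
 */
function checkColumnCountSketch(
    int $realCount,
    int $schemaCount,
    ?int $min = null,
    ?int $exact = null,
    ?int $max = null,
): array {
    $min ??= $schemaCount;
    $exact ??= $schemaCount;
    $max ??= $schemaCount;

    $errors = [];
    if ($realCount < $min) {
        $errors[] = "columns_count_min: {$realCount} < {$min}";
    }
    if ($realCount !== $exact) {
        $errors[] = "columns_count: {$realCount} != {$exact}";
    }
    if ($realCount > $max) {
        $errors[] = "columns_count_max: {$realCount} > {$max}";
    }

    return $errors;
}
```

Under this literal reading, setting only `columns_count_min` and `columns_count_max` still leaves `columns_count` pinned to the schema's column count. The `todo.yml` comments do not say how those fallbacks should interact.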