Add GcsExporter (#1377)
Kesin11 authored Dec 27, 2024
2 parents 6f8c246 + 4264c89 commit a3dd9e1
Showing 11 changed files with 349 additions and 27 deletions.
57 changes: 43 additions & 14 deletions README.md
Original file line number Diff line number Diff line change
@@ -71,6 +71,7 @@ See full schema: [test_report.proto](./proto/test_report.proto)
- Export
- BigQuery
- Local file (output JSON or JSON Lines)
- Google Cloud Storage (GCS)

# USAGE
```bash
@@ -118,8 +119,9 @@ The most recommended tag for users is `v{major}`. If you prefer a more conservative vers
- LastRunStore
- GOOGLE_APPLICATION_CREDENTIALS

## Setup BigQuery (Recommend)
If you want to use `bigquery_exporter`, you have to create dataset and table that CIAnalyzer will export data to it.
## Setup Exporter
### Setup BigQuery table (Recommended)
If you want to use `exporter.bigquery`, you have to create a dataset and table that CIAnalyzer will export data to.

```bash
# Prepare bigquery schema json files
@@ -153,7 +155,24 @@ bq mk \

The GCP service account used for CIAnalyzer also needs some BigQuery permissions. Please attach `roles/bigquery.dataEditor` and `roles/bigquery.jobUser`. For more detail, check the [BigQuery access control document](https://cloud.google.com/bigquery/docs/access-control).

## Setup GCS bucket (Recommend)
### Setup GCS
If you want to use `exporter.gcs`, you have to create a bucket that CIAnalyzer will export data to.

BigQuery can also read JSONL formatted data stored in GCS as [external tables](https://cloud.google.com/bigquery/docs/external-data-cloud-storage), so it is useful to save data to GCS instead of exporting directly to a BigQuery table. In that case, it is recommended to save data in a path that includes the DATE to be recognized as a Hive partition for efficient querying from BigQuery.

see: https://cloud.google.com/bigquery/docs/hive-partitioned-queries

CIAnalyzer can save data to a path with date partitions by specifying a `prefixTemplate` in the configuration file as follows:

```yaml
exporter:
gcs:
project: $GCP_PROJECT_ID
bucket: $BUCKET_NAME
prefixTemplate: ci_analyzer/{reportType}/dt={YYYY}-{MM}-{DD}/
```
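The placeholders are plain string substitutions: `{reportType}` is replaced by the report kind (`workflow`, `test`, or `custom`) and `{YYYY}`/`{MM}`/`{DD}` by the report's creation date. A minimal sketch of such an expansion — the function name and exact semantics here are illustrative, not CIAnalyzer's actual implementation:

```typescript
// Sketch: expand the {reportType}, {YYYY}, {MM}, {DD} placeholders of a
// prefixTemplate. Illustrative only, not CIAnalyzer's actual code.
function expandPrefix(template: string, reportType: string, date: Date): string {
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return template
    .replace("{reportType}", reportType)
    .replace("{YYYY}", pad(date.getUTCFullYear(), 4))
    .replace("{MM}", pad(date.getUTCMonth() + 1, 2))
    .replace("{DD}", pad(date.getUTCDate(), 2));
}

// A workflow report created on 2023-01-01 lands in a Hive-style date partition:
const prefix = expandPrefix(
  "ci_analyzer/{reportType}/dt={YYYY}-{MM}-{DD}/",
  "workflow",
  new Date("2023-01-01T12:34:56Z"),
);
// → "ci_analyzer/workflow/dt=2023-01-01/"
```

Because the partition key is in the object path itself, BigQuery can prune partitions by `dt` when querying the external table.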
## Setup LastRunStore
### What is LastRunStore
CIAnalyzer collects build data from each CI service API, but some of it may duplicate data collected in a previous run. To remove the duplicates, it is necessary to save the last build number of the previous run and output only the difference from it.
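The mechanism can be sketched as follows; the class and method names are illustrative, not CIAnalyzer's actual API:

```typescript
// Sketch of the LastRunStore idea: remember the last collected build number
// per repository and keep only builds newer than it. Illustrative only.
type Build = { number: number };

class InMemoryLastRunStore {
  private lastRun = new Map<string, number>();

  getLastRun(repo: string): number {
    return this.lastRun.get(repo) ?? 0;
  }

  setLastRun(repo: string, buildNumber: number): void {
    this.lastRun.set(repo, buildNumber);
  }

  // Keep only builds that were not collected in the previous run,
  // then advance the stored last build number.
  filterNewBuilds(repo: string, builds: Build[]): Build[] {
    const last = this.getLastRun(repo);
    const fresh = builds.filter((b) => b.number > last);
    const max = Math.max(last, ...builds.map((b) => b.number));
    this.setLastRun(repo, max);
    return fresh;
  }
}
```

Swapping the `Map` for a JSON file or a GCS object gives the local and GCS backends described below.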
@@ -163,7 +182,7 @@ By default, CIAnalyzer uses a local JSON file as a backend for LastRunStore. How
To resolve these problems, CIAnalyzer can use GCS as a LastRunStore backend to read/write the last build number from any machine. It is inspired by the [Terraform backend](https://www.terraform.io/docs/backends/index.html).
### Create GCS bucket
### Setup GCS bucket (Recommended)
If you want to use `lastRunStore.backend: gcs`, you have to create a GCS bucket before executing CIAnalyzer.

```bash
@@ -385,15 +404,25 @@ To load your custom schema JSON from CIAnalyzer that runs inside of container, y

See sample [cron.jenkinsfile](./sample/cron.jenkinsfile).

# Roadmap
- [x] Collect test data
- [x] Collect any of JSON format from build artifacts
- [x] Support Bitrise
- [x] Support CircleCI API v2
- [x] Implement better logger
- [x] Better error message
- [x] Export commit message
- [x] Export executor data (CircleCI, Bitrise)
# Roadmap and features
- Supported CI services
- [x] GitHub Actions
- [x] CircleCI API v2
- [x] Bitrise
- [x] Jenkins
- Supported data
- [x] Workflow, Job
- [x] Test data (JUnit format)
- [x] Any of JSON format from build artifacts
- Supported exporters
- [x] Local file
- [x] BigQuery
- [x] Google Cloud Storage
- [ ] S3/S3 compatible storage
- Supported LastRunStore
- [x] Local file
- [x] Google Cloud Storage
- [ ] S3/S3 compatible storage

# Debug options
- Fetch only selected service
@@ -405,7 +434,7 @@ See sample [cron.jenkinsfile](./sample/cron.jenkinsfile).
- Enable debug mode
- `--debug`
- Limit fetching build results to only 10 per service
- Export result to local only
- Export results to local only if `--only-exporters` is omitted
- Don't load or store the last build number
- Enable debug log
- `export CI_ANALYZER_DEBUG=1`
2 changes: 1 addition & 1 deletion __tests__/exporter/bigquery_exporter.test.ts
@@ -1,7 +1,7 @@
import path from "node:path";
import { vi, describe, it, expect, beforeEach } from "vitest";
import { BigqueryExporter } from "../../src/exporter/bigquery_exporter";
import type { BigqueryExporterConfig } from "../../src/config/config";
import type { BigqueryExporterConfig } from "../../src/config/schema";
import { CustomReportCollection } from "../../src/custom_report_collection";
import { Logger } from "tslog";

137 changes: 137 additions & 0 deletions __tests__/exporter/gcs_exporter.test.ts
@@ -0,0 +1,137 @@
import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
import { GcsExporter } from "../../src/exporter/gcs_exporter";
import type { GcsExporterConfig } from "../../src/config/schema";
import { Logger } from "tslog";

const mockStorage = {
bucket: vi.fn().mockReturnThis(),
file: vi.fn().mockReturnThis(),
save: vi.fn(),
};
const logger = new Logger({ type: "hidden" });

describe("GcsExporter", () => {
const baseConfig: GcsExporterConfig = {
project: "project",
bucket: "bucket",
prefixTemplate: "ci_analyzer/{reportType}/dt={YYYY}-{MM}-{DD}/",
};

beforeEach(() => {
// Mock the current time for `now = dayjs()`
vi.useFakeTimers();
vi.setSystemTime(new Date("2023-01-01T12:34:56Z"));
});

afterEach(() => {
vi.useRealTimers();
});

describe("new", () => {
it("should not throw when all required params are provided", () => {
expect(() => {
new GcsExporter(logger, "github", baseConfig);
}).not.toThrow();
});

it("should throw when prefixTemplate does not include {reportType}", () => {
const config = {
...baseConfig,
prefixTemplate: "ci_analyzer/dt={YYYY}-{MM}-{DD}/",
};
expect(() => {
new GcsExporter(logger, "github", config);
}).toThrow();
});
});

describe("export", () => {
let exporter: GcsExporter;

beforeEach(() => {
exporter = new GcsExporter(logger, "github", baseConfig);
exporter.storage = mockStorage as any;
});

it("exportWorkflowReports should create correct file path when all reports have the same createdAt", async () => {
const report = [{ createdAt: "2023-01-01T12:34:56Z" }];
await exporter.exportWorkflowReports(report as any);

expect(mockStorage.file).toHaveBeenCalledWith(
"ci_analyzer/workflow/dt=2023-01-01/20230101-123456-workflow-github.json",
);
});

it("exportWorkflowReports should create correct file paths when reports have different createdAt", async () => {
const reports = [
{ createdAt: "2023-01-01T12:34:56Z" },
{ createdAt: "2022-12-31T12:34:56Z" },
{ createdAt: "2023-01-01T12:34:56Z" },
];
await exporter.exportWorkflowReports(reports as any);

expect(mockStorage.file).toHaveBeenCalledWith(
"ci_analyzer/workflow/dt=2023-01-01/20230101-123456-workflow-github.json",
);
expect(mockStorage.file).toHaveBeenCalledWith(
"ci_analyzer/workflow/dt=2022-12-31/20230101-123456-workflow-github.json",
);
});

it("exportTestReports should create correct file path when all reports have the same createdAt", async () => {
const report = [{ createdAt: "2023-01-01T12:34:56Z" }];
await exporter.exportTestReports(report as any);

expect(mockStorage.file).toHaveBeenCalledWith(
"ci_analyzer/test/dt=2023-01-01/20230101-123456-test-github.json",
);
});

it("exportTestReports should create correct file paths when reports have different createdAt", async () => {
const reports = [
{ createdAt: "2023-01-01T12:34:56Z" },
{ createdAt: "2022-12-31T12:34:56Z" },
{ createdAt: "2023-01-01T12:34:56Z" },
];
await exporter.exportTestReports(reports as any);

expect(mockStorage.file).toHaveBeenCalledWith(
"ci_analyzer/test/dt=2023-01-01/20230101-123456-test-github.json",
);
expect(mockStorage.file).toHaveBeenCalledWith(
"ci_analyzer/test/dt=2022-12-31/20230101-123456-test-github.json",
);
});

it("exportCustomReports should create correct file path when all reports have the same createdAt", async () => {
const report = [{ createdAt: "2023-01-01T12:34:56Z" }];
const customReportCollection = {
customReports: new Map([["custom", report]]),
};
await exporter.exportCustomReports(customReportCollection as any);

expect(mockStorage.file).toHaveBeenCalledWith(
"ci_analyzer/custom/dt=2023-01-01/20230101-123456-custom-github.json",
);
});

it("exportCustomReports should create correct file paths when reports have different createdAt", async () => {
const reports = [
{ createdAt: "2023-01-01T12:34:56Z" },
{ createdAt: "2022-12-31T12:34:56Z" },
{ createdAt: "2023-01-01T12:34:56Z" },
];
const customReportCollection = {
customReports: new Map([["custom", reports]]),
};
await exporter.exportCustomReports(customReportCollection as any);

expect(mockStorage.file).toHaveBeenCalledWith(
"ci_analyzer/custom/dt=2023-01-01/20230101-123456-custom-github.json",
);
expect(mockStorage.file).toHaveBeenCalledWith(
"ci_analyzer/custom/dt=2022-12-31/20230101-123456-custom-github.json",
);
});
});
});
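These tests pin down the partitioning behavior: reports are grouped by their `createdAt` date so each `dt=` partition gets its own object, while the filename timestamp comes from the (mocked) run time. The grouping step could look roughly like this sketch — an illustration inferred from the tests, not the exporter's actual source:

```typescript
// Sketch: group reports by their createdAt date so each Hive partition
// (dt=YYYY-MM-DD) receives its own JSON object. Illustrative only.
type Report = { createdAt: string };

function groupByDate(reports: Report[]): Map<string, Report[]> {
  const groups = new Map<string, Report[]>();
  for (const report of reports) {
    const dt = report.createdAt.slice(0, 10); // e.g. "2023-01-01"
    const bucket = groups.get(dt) ?? [];
    bucket.push(report);
    groups.set(dt, bucket);
  }
  return groups;
}

const groups = groupByDate([
  { createdAt: "2023-01-01T12:34:56Z" },
  { createdAt: "2022-12-31T12:34:56Z" },
  { createdAt: "2023-01-01T12:34:56Z" },
]);
// Two partitions: dt=2023-01-01 (2 reports) and dt=2022-12-31 (1 report).
```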
2 changes: 1 addition & 1 deletion __tests__/exporter/local_exporter.test.ts
@@ -1,7 +1,7 @@
import { vi, describe, it, expect, beforeEach } from "vitest";
import { LocalExporter } from "../../src/exporter/local_exporter";
import path from "node:path";
import type { LocalExporterConfig } from "../../src/config/config";
import type { LocalExporterConfig } from "../../src/config/schema";
import { Logger } from "tslog";

const mockFsPromises = {
7 changes: 7 additions & 0 deletions ci_analyzer.yaml
Expand Up @@ -31,6 +31,13 @@ github:
table: $CUSTOM_REPORT_TABLE
schema: ./$SCHEMA_DIR/$CUSTOM_REPORT_TABLE_SCHEMA.json # It accepts absolute path or relative path from this config yaml.
maxBadRecords: 0 # (Optional) default: 0. If set > 0, skip bad record. This option should only be used for workaround.
gcs:
project: $GCP_PROJECT_ID
bucket: $BUCKET_NAME
    # The {reportType} placeholder is required.
    # The {YYYY}, {MM}, {DD} placeholders are optional.
    # If you want to use BigQuery external tables, the GCS path should be in a format supported by Hive partitions, like this:
prefixTemplate: ci_analyzer/{reportType}/dt={YYYY}-{MM}-{DD}/
lastRunStore:
backend: gcs # Recommend using 'gcs' backend
project: $GCP_PROJECT_ID
4 changes: 2 additions & 2 deletions package.json
Expand Up @@ -32,8 +32,8 @@
"biome:ci": "biome ci .",
"lint:fix": "biome lint --apply-unsafe .",
"fmt:fix": "biome format --write .",
"test": "vitest",
"test:ci": "vitest --run --coverage",
"test": "TZ=UTC vitest",
"test:ci": "TZ=UTC vitest --run --coverage",
"proto": "earthly --strict --remote-cache=ghcr.io/kesin11/ci_analyzer_earthly:cache +proto",
"docker": "earthly --strict --remote-cache=ghcr.io/kesin11/ci_analyzer_earthly:cache +docker",
"schema": "earthly --strict --remote-cache=ghcr.io/kesin11/ci_analyzer_earthly:cache +schema"
20 changes: 20 additions & 0 deletions schema.json
Expand Up @@ -89,6 +89,26 @@
}
},
"additionalProperties": false
},
"gcs": {
"type": "object",
"properties": {
"project": {
"type": "string"
},
"bucket": {
"type": "string"
},
"prefixTemplate": {
"type": "string"
}
},
"required": [
"project",
"bucket",
"prefixTemplate"
],
"additionalProperties": false
}
},
"additionalProperties": false
8 changes: 8 additions & 0 deletions src/config/schema.ts
Expand Up @@ -28,9 +28,17 @@ const bigqueryExporterSchema = z.object({
});
export type BigqueryExporterConfig = z.infer<typeof bigqueryExporterSchema>;

const gcsExporterSchema = z.object({
project: z.string(),
bucket: z.string(),
prefixTemplate: z.string(),
});
export type GcsExporterConfig = z.infer<typeof gcsExporterSchema>;

const exporterSchema = z.object({
local: localExporterSchema.optional(),
bigquery: bigqueryExporterSchema.optional(),
gcs: gcsExporterSchema.optional(),
});
export type ExporterConfig = z.infer<typeof exporterSchema>;
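The zod schema above makes all three `gcs` fields required strings, matching the `required` list added to `schema.json`. For illustration only, a hand-rolled equivalent of that check (not CIAnalyzer's code, which uses zod):

```typescript
// Hand-rolled equivalent of gcsExporterSchema: all three fields must be
// present strings, otherwise config loading fails fast. Illustrative only.
interface GcsConfig {
  project: string;
  bucket: string;
  prefixTemplate: string;
}

function validateGcsConfig(input: unknown): GcsConfig {
  const obj = input as Record<string, unknown>;
  for (const key of ["project", "bucket", "prefixTemplate"] as const) {
    if (typeof obj?.[key] !== "string") {
      throw new Error(`gcs exporter config: "${key}" must be a string`);
    }
  }
  return obj as unknown as GcsConfig;
}
```

Validating at load time means a missing `prefixTemplate` is reported immediately instead of surfacing later as a malformed GCS path.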

16 changes: 14 additions & 2 deletions src/exporter/exporter.ts
Expand Up @@ -4,11 +4,13 @@ import type {
ExporterConfig,
LocalExporterConfig,
BigqueryExporterConfig,
GcsExporterConfig,
} from "../config/schema.js";
import { BigqueryExporter } from "./bigquery_exporter.js";
import type { CustomReportCollection } from "../custom_report_collection.js";
import type { ArgumentOptions } from "../arg_options.js";
import type { Logger } from "tslog";
import { GcsExporter } from "./gcs_exporter.js";

export interface Exporter {
exportWorkflowReports(reports: WorkflowReport[]): Promise<void>;
@@ -26,7 +28,7 @@ export class CompositExporter implements Exporter {
service: string,
config?: ExporterConfig,
) {
if (options.debug || !config) {
if ((options.debug && options.onlyExporters === undefined) || !config) {
this.exporters = [
new LocalExporter(logger, service, options.configDir, {}),
];
@@ -41,7 +43,10 @@

this.exporters = exporters
.map((exporter) => {
let _config: LocalExporterConfig | BigqueryExporterConfig;
let _config:
| LocalExporterConfig
| BigqueryExporterConfig
| GcsExporterConfig;
switch (exporter) {
case "local":
_config = config[exporter] ?? {};
@@ -54,6 +59,13 @@
case "bigquery":
_config = config[exporter] ?? {};
return new BigqueryExporter(logger, _config, options.configDir);
case "gcs":
_config = config[exporter] ?? {};
return new GcsExporter(
logger,
service,
_config as GcsExporterConfig,
);
}
})
.filter((exporter) => exporter !== undefined);
(The remaining changed files, including `src/exporter/gcs_exporter.ts`, were not loaded in this view.)
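The body of the new `src/exporter/gcs_exporter.ts` is not rendered in this view, but from the tests above its core save step plausibly reduces to something like the following sketch. The interface and function names here are assumptions, not the actual code, and the real exporter wraps the `@google-cloud/storage` client:

```typescript
// Sketch of what GcsExporter's save step plausibly does, inferred from the
// tests: write serialized reports to bucket/<expanded prefix><file name>.
// Names and structure are assumptions, not the actual source.
interface StorageLike {
  bucket(name: string): {
    file(path: string): { save(data: string): Promise<void> };
  };
}

async function exportReports(
  storage: StorageLike,
  bucket: string,
  prefix: string, // already expanded, e.g. "ci_analyzer/workflow/dt=2023-01-01/"
  fileName: string, // e.g. "20230101-123456-workflow-github.json"
  reports: object[],
): Promise<void> {
  // Serialize the reports (the real exporter may emit JSON or JSON Lines).
  const body = JSON.stringify(reports);
  await storage.bucket(bucket).file(`${prefix}${fileName}`).save(body);
}
```

Injecting `StorageLike` is also what makes the unit tests above possible: they substitute a mock for `exporter.storage` and assert on the paths passed to `file()`.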
