diff --git a/README.md b/README.md
index 3668404a..401cf915 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,11 @@ The list is used in a variety of ways, which include:
   by tools such as [ReSpec](https://respec.org/docs/) and
   [Bikeshed](https://speced.github.io/bikeshed/) to create terminology and
   reference links between Web specifications.
+* [BCD](https://github.com/mdn/browser-compat-data) and
+  [web-features](https://github.com/web-platform-dx/web-features) to validate
+  specification URLs
+* [Specref](https://www.specref.org/) to complete the list of specifications
+  that can be referenced.
 * Analyzers of browser technologies to create reports on test coverage,
   WebIDL, and specification quality.

@@ -186,10 +191,11 @@ The `shortname` property is always set.

 ### `title`

-The title of the spec. The title is either retrieved from the
-[W3C API](https://w3c.github.io/w3c-api/) for W3C specs,
-[Specref](https://www.specref.org/) or from the spec itself. The
-[`source`](#source) property details the actual provenance.
+The title of the spec. The title is either retrieved from an official source
+(the [W3C API](https://w3c.github.io/w3c-api/) for W3C specs, the
+[workstreams database](https://github.com/whatwg/sg/blob/main/db.json) for
+WHATWG specs, etc.), or from the spec itself. The [`source`](#source) property
+details the actual provenance.

 The `title` property is always set.

@@ -485,11 +491,12 @@ available.

 The URL of the latest Editor's Draft or of the living standard.

-The URL is either retrieved from the [W3C API](https://w3c.github.io/w3c-api/)
-for W3C specs, or [Specref](https://www.specref.org/). The document at the
-versioned URL is considered to be the latest Editor's Draft if the spec does
-neither exist in the W3C API nor in Specref. The [`source`](#source) property
-details the actual provenance.
+The URL is either retrieved from an official source (the
+[W3C API](https://w3c.github.io/w3c-api/) for W3C specs, the
+[workstreams database](https://github.com/whatwg/sg/blob/main/db.json) for
+WHATWG specs, etc.) when possible. The document at the versioned URL is
+considered to be the latest Editor's Draft otherwise. The [`source`](#source)
+property details the actual provenance.

 The URL should be relatively stable but may still change over time. See
 [Spec identifiers](#spec-identifiers) for details.
@@ -552,8 +559,7 @@ The `pages` property is only set for specs identified as multipage specs.
 The URL of the repository that contains the source of the Editor's Draft or of
 the living standard.

-The URL is either retrieved from the [Specref](https://www.specref.org/) or
-computed from `nightly.url`.
+The URL is computed from `nightly.url`.

 The `repository` property is always set except for IETF specs where such a repo
 does not always exist.
@@ -621,7 +627,7 @@ The `excludePaths` property is seldom set.

 The provenance for the `title` and `nightly` property values. Can be one of:
 - `w3c`: information retrieved from the [W3C API](https://w3c.github.io/w3c-api/)
-- `specref`: information retrieved from [Specref](https://www.specref.org/)
+- `whatwg`: information retrieved from [WHATWG](https://spec.whatwg.org/)
 - `ietf`: information retrieved from the [IETF datatracker](https://datatracker.ietf.org)
 - `spec`: information retrieved from the spec itself

diff --git a/schema/definitions.json b/schema/definitions.json
index 16159c30..220dad85 100644
--- a/schema/definitions.json
+++ b/schema/definitions.json
@@ -60,7 +60,7 @@

     "source": {
       "type": "string",
-      "enum": ["w3c", "specref", "spec", "ietf", "whatwg"]
+      "enum": ["w3c", "spec", "ietf", "whatwg"]
     },

     "nightly": {
diff --git a/src/fetch-info.js b/src/fetch-info.js
index c45932f1..966f5de8 100644
--- a/src/fetch-info.js
+++ b/src/fetch-info.js
@@ -2,8 +2,8 @@
  * Module that exports a function that takes an array of specifications objects
  * that each have at least a "url" and a "short" property. The function returns
  * an object indexed by specification "shortname" with additional information
- * about the specification fetched from the W3C API, Specref, or from the spec
- * itself. Object returned for each specification contains the following
+ * about the specification fetched from the W3C API, WHATWG, IETF or from the
+ * spec itself. Object returned for each specification contains the following
  * properties:
  *
  * - "nightly": an object that describes the nightly version. The object will
@@ -15,8 +15,8 @@
  *   feature the URL of the TR document for W3C specs when it exists, and is not
  *   present for specs that don't have release versions (WHATWG specs, CG drafts).
  * - "title": the title of the specification. Always set.
- * - "source": one of "w3c", "specref", "spec", depending on how the information
- *   was determined.
+ * - "source": one of "w3c", "ietf", "whatwg", "spec", depending on how the
+ *   information was determined.
  *
  * The function throws when something goes wrong, e.g. if the given spec object
  * describes a /TR/ specification but the specification has actually not been
@@ -25,17 +25,14 @@
  *
  * The function will start by querying the W3C API, using the given "shortname"
  * properties. For specifications where this fails, the function will query
- * SpecRef, using the given "shortname" as well. If that too fails, the function
- * assumes that the given "url" is the URL of the Editor's Draft, and will fetch
- * it to determine the title.
+ * IETF, then WHATWG, using the given "shortname" as well. If that too fails,
+ * the function assumes that the given "url" is the URL of the Editor's Draft,
+ * and will fetch it to determine the title.
  *
  * If the function needs to retrieve the spec itself, note that it will parse
  * the HTTP response body as a string, applying regular expressions to extract
  * the title. It will not parse it as HTML in particular. This means that the
  * function will fail if the title cannot easily be extracted for some reason.
- *
- * Note: the function operates on a list of specs and not only on one spec to
- * bundle requests to Specref.
  */

 import puppeteer from "puppeteer";
@@ -45,17 +42,6 @@ import Octokit from "./octokit.js";
 import ThrottledQueue from "./throttled-queue.js";
 import fetchJSON from "./fetch-json.js";

-// Map spec statuses returned by Specref to those used in specs
-// Note we typically won't get /TR statuses from Specref, since all /TR URLs
-// are handled through the W3C API. Also, "Proposal for a CSS module" entries
-// were probably manually hardcoded in Specref, they are really just Editor's
-// Drafts in practice.
-const specrefStatusMapping = {
-  "ED": "Editor's Draft",
-  "Proposal for a CSS module": "Editor's Draft",
-  "cg-draft": "Draft Community Group Report"
-};
-
 async function useLastInfoForDiscontinuedSpecs(specs) {
   const results = {};
   for (const spec of specs) {
@@ -215,97 +201,6 @@ async function fetchInfoFromWHATWG(specs, options) {
   return specInfo;
 }

-async function fetchInfoFromSpecref(specs, options) {
-  function chunkArray(arr, len) {
-    let chunks = [];
-    let i = 0;
-    let n = arr.length;
-    while (i < n) {
-      chunks.push(arr.slice(i, i += len));
-    }
-    return chunks;
-  }
-
-  // Browser-specs contributes specs to Specref. By definition, we cannot rely
-  // on information from Specref about these specs. Unfortunately, the Specref
-  // API does not return the "source" field, so we need to retrieve the list
-  // ourselves from Specref's GitHub repository.
-  const specrefBrowserspecsUrl = "https://raw.githubusercontent.com/tobie/specref/main/refs/browser-specs.json";
-  const browserSpecs = await fetchJSON(specrefBrowserspecsUrl, options);
-  specs = specs.filter(spec => !browserSpecs[spec.shortname.toUpperCase()]);
-
-  // Browser-specs now acts as source for Specref for the WICG specs and W3C
-  // Editor's Drafts that have not yet been published to /TR. Let's filter out
-  // these specs to avoid a catch-22 where the info in browser-specs gets stuck
-  // to the that in Specref.
-  const filteredSpecs = specs.filter(spec =>
-    !spec.url.match(/\/\/(wicg|w3c)\.github\.io\//) &&
-    !spec.url.match(/\/\/www\.w3\.org\//) &&
-    !spec.url.match(/\/\/drafts\.csswg\.org\//));
-
-  const chunks = chunkArray(filteredSpecs, 50);
-  const chunksRes = await Promise.all(chunks.map(async chunk => {
-    let specrefUrl = "https://api.specref.org/bibrefs?refs=" +
-      chunk.map(spec => spec.shortname).join(',');
-    return fetchJSON(specrefUrl, options);
-  }));
-
-  const results = {};
-  chunksRes.forEach(chunkRes => {
-
-    // Specref manages aliases, let's follow the chain to the final spec
-    function resolveAlias(name, counter) {
-      counter = counter || 0;
-      if (counter > 100) {
-        throw "Too many aliases returned by Respec";
-      }
-      if (chunkRes[name].aliasOf) {
-        return resolveAlias(chunkRes[name].aliasOf, counter + 1);
-      }
-      else {
-        return name;
-      }
-    }
-    Object.keys(chunkRes).forEach(name => {
-      if (specs.find(spec => spec.shortname === name)) {
-        const info = chunkRes[resolveAlias(name)];
-        if (info.edDraft?.startsWith('http:')) {
-          console.warn(`[warning] force HTTPS for nightly of ` +
-            `"${spec.shortname}", Specref returned "${info.edDraft}"`);
-        }
-        if (info.href?.startsWith('http:')) {
-          console.warn(`[warning] force HTTPS for nightly of ` +
-            `"${spec.shortname}", Specref returned "${info.href}"`);
-        }
-        const nightly =
-          info.edDraft?.replace(/^http:/, 'https:') ??
-          info.href?.replace(/^http:/, 'https:') ??
-          null;
-        const status =
-          specrefStatusMapping[info.status] ??
-          info.status ??
-          "Editor's Draft";
-        if (nightly?.startsWith("https://www.iso.org/")) {
-          // The URL is to a page that describes the spec, not to the spec
-          // itself (ISO specs are not public).
-          results[name] = {
-            title: info.title
-          }
-        }
-        else {
-          results[name] = {
-            nightly: { url: nightly, status },
-            title: info.title
-          };
-        }
-      }
-    });
-  });
-
-  return results;
-}
-
-
 async function fetchInfoFromIETF(specs, options) {
   async function fetchRFCName(docUrl) {
     const body = await fetchJSON(docUrl, options);
@@ -611,7 +506,6 @@ async function fetchInfo(specs, options) {
     { name: 'w3c', fn: fetchInfoFromW3CApi },
     { name: 'ietf', fn: fetchInfoFromIETF },
     { name: 'whatwg', fn: fetchInfoFromWHATWG },
-    { name: 'specref', fn: fetchInfoFromSpecref },
     { name: 'spec', fn: fetchInfoFromSpecs }
   ];
   let remainingSpecs = specs;
diff --git a/test/fetch-info.js b/test/fetch-info.js
index 0f135586..b32c2233 100644
--- a/test/fetch-info.js
+++ b/test/fetch-info.js
@@ -35,45 +35,6 @@ describe("fetch-info module", function () {
     });
   });

-  describe("fetch from Specref", () => {
-    it("works on an ISO spec", async () => {
-      const spec = {
-        url: "https://www.iso.org/standard/85253.html",
-        shortname: "iso18181-2"
-      };
-      const info = await fetchInfo([spec]);
-      assert.ok(info[spec.shortname]);
-      assert.equal(info[spec.shortname].source, "specref");
-      assert.equal(info[spec.shortname].title, "Information technology — JPEG XL image coding system — Part 2: File format");
-      assert.equal(info[spec.shortname].nightly, undefined);
-    });
-
-    it("can operate on multiple specs at once", async () => {
-      const spec = getW3CSpec("presentation-api");
-      const other = getW3CSpec("hr-time-2");
-      const info = await fetchInfo([spec, other]);
-      assert.ok(info[spec.shortname]);
-      assert.equal(info[spec.shortname].source, "w3c");
-      assert.equal(info[spec.shortname].nightly.url, "https://w3c.github.io/presentation-api/");
-      assert.equal(info[spec.shortname].title, "Presentation API");
-
-      assert.ok(info[other.shortname]);
-      assert.equal(info[other.shortname].source, "w3c");
-      assert.equal(info[other.shortname].nightly.url, "https://w3c.github.io/hr-time/");
-      assert.equal(info[other.shortname].title, "High Resolution Time Level 2");
-    });
-
-    it("does not retrieve info from a spec that got contributed to Specref", async () => {
-      const spec = {
-        url: "https://registry.khronos.org/webgl/extensions/ANGLE_instanced_arrays/",
-        shortname: "ANGLE_instanced_arrays"
-      };
-      const info = await fetchInfo([spec]);
-      assert.ok(info[spec.shortname]);
-      assert.equal(info[spec.shortname].source, "spec");
-    });
-  });
-
   describe("fetch from IETF datatracker", () => {
     it("fetches info about RFCs from datatracker", async () => {
       const spec = {
@@ -337,6 +298,21 @@ describe("fetch-info module", function () {
         fetchInfo([spec]),
         /^Error: W3C API redirects "webaudio" to "webaudio-.*"/);
     });
+
+    it("can operate on multiple specs at once", async () => {
+      const spec = getW3CSpec("presentation-api");
+      const other = getW3CSpec("hr-time-2");
+      const info = await fetchInfo([spec, other]);
+      assert.ok(info[spec.shortname]);
+      assert.equal(info[spec.shortname].source, "w3c");
+      assert.equal(info[spec.shortname].nightly.url, "https://w3c.github.io/presentation-api/");
+      assert.equal(info[spec.shortname].title, "Presentation API");
+
+      assert.ok(info[other.shortname]);
+      assert.equal(info[other.shortname].source, "w3c");
+      assert.equal(info[other.shortname].nightly.url, "https://w3c.github.io/hr-time/");
+      assert.equal(info[other.shortname].title, "High Resolution Time Level 2");
+    });
   });

   describe("fetch from all sources", () => {