Incorrect time zone offsets in arts organization's Event structured data #123

Open
fjjulien opened this issue Aug 5, 2024 · 3 comments

@fjjulien
Contributor

fjjulien commented Aug 5, 2024

Problem

Several arts organizations have an incorrect time zone offset value in their Event startDate.

There are three common mistakes I regularly observe:

1. The time zone offset value is not adjusted to take into account daylight saving.

  • Example: An event is taking place on September 1st at 8:00 pm in New Brunswick. The proper dateTime value for this event should be "2024-09-01T20:00:00-03:00". However, if the daylight saving adjustment is not taken into consideration, the dateTime value might be populated as if it were in Atlantic Standard Time (which would be "2024-09-01T20:00:00-04:00"). The wrong dateTime value is the equivalent of midnight (00:00) UTC on September 2nd, or 9:00 pm in Atlantic Daylight Time.
  • Possible causes:
    • Human error: The structured data is manually populated and the person responsible for this task forgets to adjust the time zone offset.
    • Incorrect settings in website's CMS: If the website's time zone is defined as an absolute value (for example, "-04:00") rather than a relative value (for example, "Halifax - North America (Atlantic Time)"), the CMS will not automatically adjust for daylight saving time.
    • Coding error: Custom code is populating the dateTime value, but the developer who wrote it didn't know how to factor in daylight saving adjustments.

2. The timezone offset value is defaulted to +00:00

  • Example: An event is taking place on September 1st at 8:00 pm in New Brunswick. The proper dateTime value for this event should be "2024-09-01T20:00:00-03:00", but it reads "2024-09-01T20:00:00+00:00" instead. This wrong dateTime value is the equivalent of 5:00 pm in Atlantic Daylight Time.
  • Possible cause:
    • Incorrect settings in website's CMS: If the website's time zone is not defined in the CMS's general settings, then any event plugin that relies on the website's time zone will fall back to +00:00 instead of the proper time zone offset. In a WordPress site, the default time zone for the entire site is set in Settings > General > Timezone.
      • Note: If this problem is found and solved, any dateTime value that was previously populated may be automatically modified (e.g. 20:00:00+00:00 may become 16:00:00-04:00).

3. The timezone offset value is missing

  • Example: An event is taking place on September 1st at 8:00 pm in New Brunswick. The proper dateTime value for this event should be "2024-09-01T20:00:00-03:00", but it reads "2024-09-01T20:00:00" instead.
  • Possible causes:
    • Human error: The structured data is manually populated and the person responsible for this task chooses not to include the time zone offset.
    • Coding error (or decision): A developer might deliberately omit the time zone value.
      • Note: This is not as big a problem as the other two. When no time zone offset value is specified, Google presumes that the time zone is the local time zone for the place.
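
The arithmetic behind the three examples can be checked with a short script. This is only a sketch: it assumes Python 3.9+ with the standard `zoneinfo` module, and it assumes `America/Moncton` is the right IANA zone for the New Brunswick example. The helper name `local_reading` is hypothetical.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumption: the New Brunswick event follows the America/Moncton zone rules.
MONCTON = ZoneInfo("America/Moncton")

def local_reading(value: str, zone: ZoneInfo) -> str:
    """Render an offset-bearing dateTime as wall-clock time in the given zone."""
    return datetime.fromisoformat(value).astimezone(zone).isoformat()

# The proper value reads as 8:00 pm local time.
correct = local_reading("2024-09-01T20:00:00-03:00", MONCTON)
# Mistake 1: standard-time offset used during daylight saving time.
not_adjusted = local_reading("2024-09-01T20:00:00-04:00", MONCTON)
# Mistake 2: offset defaulted to +00:00.
utc_default = local_reading("2024-09-01T20:00:00+00:00", MONCTON)
# Mistake 3: no offset at all, so the parsed value carries no time zone.
missing = datetime.fromisoformat("2024-09-01T20:00:00").tzinfo is None

print(correct)       # 2024-09-01T20:00:00-03:00
print(not_adjusted)  # 2024-09-01T21:00:00-03:00
print(utc_default)   # 2024-09-01T17:00:00-03:00
print(missing)       # True
```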

While solving these kinds of problems at the source is usually the preferred course of action, time zone offset errors are so frequent that Artsdata cannot reasonably contact and support every organization that has a wrong time zone offset value.

Potential solutions

If we want Artsdata to provide quality event data, I believe we must develop a capacity:

  • to infer the local time zone for an event; and
  • to validate dateTime values against the inferred time zone, in order to:
    • fill in missing time zones; and
    • correct wrong time zone values.
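
As a rough illustration of what the validate / fill / correct step could look like once a local time zone has been inferred, here is a minimal sketch. It assumes Python 3.9+ (`zoneinfo`), and it assumes that the wall-clock time in the source is right and only the offset is in doubt. That assumption does not hold for sources that deliberately publish true UTC values, so a "mismatch" should probably be treated as a flag for review rather than an automatic correction. The function name `check_start_date` and the status labels are hypothetical.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def check_start_date(value: str, zone_name: str) -> tuple[str, str]:
    """Compare a startDate with the offset expected for the inferred zone.

    Returns (status, suggested_value) where status is one of:
    "filled"   - the offset was missing and has been added
    "ok"       - the offset matches the inferred zone on that date
    "mismatch" - the offset differs; the suggestion keeps the wall-clock time
    """
    parsed = datetime.fromisoformat(value)
    # Keep the wall-clock time and attach the inferred zone to it.
    wall_clock = parsed.replace(tzinfo=None)
    expected = wall_clock.replace(tzinfo=ZoneInfo(zone_name))
    if parsed.tzinfo is None:
        return "filled", expected.isoformat()
    if parsed.utcoffset() == expected.utcoffset():
        return "ok", parsed.isoformat()
    return "mismatch", expected.isoformat()

print(check_start_date("2024-09-01T20:00:00", "America/Moncton"))
print(check_start_date("2024-09-01T20:00:00-04:00", "America/Moncton"))
```

Because the expected offset is looked up for the event's own date, the daylight saving adjustment is handled by the time zone database rather than by hand.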

Inference of the local time zone based on the Place entity

If a place entity can be clearly identified, the local time zone can be inferred based on the city or the province. This method was used in the Spreadsheet to Artsdata MVP developed by A10s. Their script calls an API to retrieve the time zone based on the province's two-letter code.

The key for this kind of programmatic inference is to have a clearly identified Place entity. This is not a given. Very few structured data sources have a location.id with a proper URI identifying the place, and even fewer have a location.sameAs pointing to an external persistent identifier.
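
To make the province-based idea concrete, here is a small sketch. It is not the A10s script: that script calls an API, whereas this uses a local lookup table as a stand-in. The table is deliberately partial and illustrative, and the names `PROVINCE_ZONE`, `MULTI_ZONE_PROVINCES` and `zone_for_province` are hypothetical. Provinces and territories that span more than one time zone are left unresolved here, since the province alone is not enough and the city would be needed.

```python
from typing import Optional

# Illustrative, partial mapping from province code to IANA time zone.
PROVINCE_ZONE = {
    "NB": "America/Moncton",
    "NS": "America/Halifax",
    "PE": "America/Halifax",
    "MB": "America/Winnipeg",
    "AB": "America/Edmonton",
}

# Provinces and territories that observe more than one time zone.
MULTI_ZONE_PROVINCES = {"BC", "ON", "QC", "SK", "NL", "NU"}

def zone_for_province(code: str) -> Optional[str]:
    """Return an IANA zone for a province code, or None if it cannot be decided."""
    code = code.strip().upper()
    if code in MULTI_ZONE_PROVINCES:
        return None  # needs a finer-grained Place (city or coordinates)
    return PROVINCE_ZONE.get(code)

print(zone_for_province("NB"))  # America/Moncton
print(zone_for_province("ON"))  # None
```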

Inference of the local time zone based on the source

All graphs in Artsdata come from a clearly identified source: sometimes a single organization, sometimes an aggregator serving a geographic area. Events from such sources usually all take place within the same time zone (with possible rare exceptions). Until programmatic inference is possible, I propose that we manually assign a local time zone to Artsdata graphs (where possible). That would be a cost-efficient means of inferring a local time zone for events.
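
A manual assignment like this could be as simple as a lookup keyed by graph, consulted only when nothing better is known about the Place. The sketch below is an assumption about how that might be wired together, not a description of anything in Artsdata today. The graph URIs and the name `infer_zone` are hypothetical.

```python
from typing import Optional

# Hypothetical graph URIs and manually assigned zones, for illustration only.
GRAPH_DEFAULT_ZONE = {
    "http://example.org/graph/single-venue-nb": "America/Moncton",
    "http://example.org/graph/regional-aggregator-on": "America/Toronto",
}

def infer_zone(graph_uri: str, place_zone: Optional[str] = None) -> Optional[str]:
    """Prefer a zone inferred from the Place; otherwise use the graph's default."""
    if place_zone is not None:
        return place_zone
    return GRAPH_DEFAULT_ZONE.get(graph_uri)

# No Place-based zone available: fall back to the graph's manual assignment.
print(infer_zone("http://example.org/graph/single-venue-nb"))
# A Place-based zone wins, which covers the rare out-of-region event.
print(infer_zone("http://example.org/graph/single-venue-nb", "America/Halifax"))
```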

Inference of time zone was initially in scope for the Nebula project, but it was removed from the scope. This capacity is now needed for the Algorithm for an Event Structured Data Score (discussion #120). We should reconsider the priority for this work.

@saumier
Member

saumier commented Aug 5, 2024

@fjjulien Thanks for your analysis. I can add that, of the 3 common mistakes you mention, there are additional considerations and that only the 3rd case can be detected fully automatically. Here is my breakdown:

  1. The time zone offset value is not adjusted to take into account daylight saving
    This case cannot be easily detected. Today, there are 10 time-sensitive regions in Canada that need to be tracked for accurate time. Recently some regions have stopped using daylight savings, and should therefore not be adjusted. The Footlight CMS has the 10 regions listed. The only way I see this type of problem getting fixed is for a human to detect that a specific website has this problem across all events in the same time-sensitive region, and in this case it can be automatically fixed. I don't have an example of this case. @fjjulien Can you provide a website with this problem?

  2. The timezone offset value is defaulted to +00:00
    This case cannot be easily detected. Sometimes the time zone is specifically set to UTC and is correct. It is fairly common for computer systems to do this and still maintain accurate time (8 PM on 2024-08-31 becomes 2024-09-01T00:00:00+00:00 in the data). For example, GTQ uses +00:00 for all their startDates and they are correct. Also, OSAC has specifically decided to use +00:00 until they can find a better solution. As in the previous case, the only way I see this type of problem getting fixed is for a human to detect that a specific website has this problem across all events, and in this case it can be automatically fixed by assigning the website to a time-sensitive region and adjusting all events from the website. I don't have an example of this case. @fjjulien Can you provide a website with this problem?

  3. The timezone offset value is missing
    This case is possible to auto-detect. Once detected, it can be fixed by assigning the website to a time-sensitive region. There is only one website loaded into Artsdata with this problem, spectart.com, and I propose we use it as an example that can be fixed by assigning "Eastern Time" as the time-sensitive region. The task in Nebula will be to add a "post import" transform (SPARQL) that will be assigned to this artifact and will run every time this artifact is imported into Artsdata.
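
To illustrate why a +00:00 offset on its own does not prove an error (case 2), here is a quick check of the 8 PM example. It is a sketch that assumes Python 3.9+ and assumes the event is in Eastern Time (`America/Toronto`), which is consistent with the four-hour difference in the example but is not stated in the data.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the 8 PM event takes place in the Eastern time zone.
local = datetime(2024, 8, 31, 20, 0, tzinfo=ZoneInfo("America/Toronto"))

# A system that stores everything in UTC would publish this value instead.
as_utc = local.astimezone(timezone.utc)

print(local.isoformat())   # 2024-08-31T20:00:00-04:00
print(as_utc.isoformat())  # 2024-09-01T00:00:00+00:00
print(local == as_utc)     # True: both strings denote the same instant
```

Both values are correct, so telling a legitimate UTC value apart from a defaulted +00:00 still requires knowing how the source populates its data.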

@fjjulien
Contributor Author

fjjulien commented Aug 5, 2024

@saumier

1. The time zone offset value is not adjusted to take into account daylight saving
I don't know of any website that has this problem across all events. @dlh28 may (or may not) have stumbled upon this case among her Digital Discoverability Program clients.

2. The time zone offset value is defaulted to +00:00
The Imperial Theatre (Saint John, NB) has this issue. However, it's not their only dateTime issue: their date follows a YYYY-DD-MM syntax, and there is no time value. Again, we'll try to look for a case among program clients.

@fjjulien
Contributor Author

fjjulien commented Aug 9, 2024

@saumier Here is another case of an incorrect time zone.

Theatre New Brunswick has structured data on its ticketing platform (Thundertix). This data includes the properties required by Artsdata. However, the time zone value is wrong: the default time zone is Central rather than Atlantic. This time zone may be set by the ticketing company's server.
Example event
Data validator result
