develop gtfs cleaner to handle repeated pair (trip_id, departure_time) warnings #73

Closed
ethan-moss opened this issue Aug 17, 2023 · 2 comments
Labels: GTFS; technical debt (A better way is available. Fix later approach has been adopted.); wontfix (This will not be worked on)

Comments

@ethan-moss
Collaborator

Description of the Feature to be Added

Calling is_valid() regularly raises "repeated pair (trip_id, departure_time)" warnings for stop_times.txt, and clean_feed() does not clean them. Further investigation shows that:

  • typically this is because neighbouring rows in stop_times.txt, within a given trip, have duplicate trip_id, arrival_time, departure_time, and stop_id. It's clear this circumstance is an erroneous duplication.
  • less often, there are neighbouring stops very close together, but they have different stop_ids - it's possible for these to be timetabled with the same time because of their proximity. Here is an example where a bus turns around and serves two stops on either side of the road (near Dingestow):
    [Image: screenshot of the Dingestow example, taken 2023-08-14; not reproduced here]
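
To make the two cases above concrete, here is a minimal pandas sketch that separates exact duplicates from distinct stops that share a time. The toy rows and column values are made up for illustration (they are not taken from a real feed), and the sketch assumes stop_times.txt has already been loaded into a DataFrame.

```python
import pandas as pd

# Hypothetical stop_times rows: trip t1 repeats stop B exactly (case 1),
# trip t2 serves two different stops, D and E, at the same time (case 2).
times = ["08:00:00", "08:05:00", "08:05:00", "09:00:00", "09:04:00", "09:04:00"]
stop_times = pd.DataFrame(
    {
        "trip_id": ["t1", "t1", "t1", "t2", "t2", "t2"],
        "arrival_time": times,
        "departure_time": times,
        "stop_id": ["A", "B", "B", "C", "D", "E"],
        "stop_sequence": [1, 2, 3, 1, 2, 3],
    }
)

pair = ["trip_id", "departure_time"]
full_key = ["trip_id", "arrival_time", "departure_time", "stop_id"]

# Every row involved in a repeated (trip_id, departure_time) pair.
repeated = stop_times[stop_times.duplicated(subset=pair, keep=False)].copy()

# Flag rows that also share arrival_time and stop_id: these are the exact
# duplicates. Rows left unflagged are different stops timed identically.
repeated["exact_duplicate"] = repeated.duplicated(subset=full_key, keep=False)

print(repeated[["trip_id", "stop_id", "departure_time", "exact_duplicate"]])
```

In this toy data the two B rows are flagged as exact duplicates, while the D and E rows are not, which mirrors the turnaround example.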

(OPTIONAL) Suggested Implementations

In stop_times.txt, records with duplicate trip_id, arrival_time, departure_time, and stop_id can be safely dropped, since they represent the same trip, stop, and time. Develop a cleaner that drops these records from the stop_times table, so that the warnings no longer remain after cleaning.
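
A rough sketch of what such a cleaner could look like, assuming stop_times is available as a pandas DataFrame. The function name drop_duplicate_stop_times is hypothetical and is not part of clean_feed() or any existing API; the sample rows are invented for illustration.

```python
import pandas as pd

# Columns that together identify "the same trip, stop and time".
DUPLICATE_KEY = ["trip_id", "arrival_time", "departure_time", "stop_id"]


def drop_duplicate_stop_times(stop_times: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaner: keep the first of each fully duplicated record.

    Rows that share a time but have different stop_ids are left untouched.
    """
    deduped = stop_times.drop_duplicates(subset=DUPLICATE_KEY, keep="first")
    return deduped.reset_index(drop=True)


# Invented example rows: stop B is duplicated within trip t1, while stops
# D and E in trip t2 are distinct stops timetabled at the same time.
times = ["08:00:00", "08:05:00", "08:05:00", "09:00:00", "09:04:00", "09:04:00"]
stop_times = pd.DataFrame(
    {
        "trip_id": ["t1", "t1", "t1", "t2", "t2", "t2"],
        "arrival_time": times,
        "departure_time": times,
        "stop_id": ["A", "B", "B", "C", "D", "E"],
        "stop_sequence": [1, 2, 3, 1, 2, 3],
    }
)

cleaned = drop_duplicate_stop_times(stop_times)
print(cleaned)
```

Note that this version drops matching records anywhere in the table, not only neighbouring rows, and it leaves a gap in stop_sequence where a row was removed. A "repeated pair" warning would still be expected for trip t2 here, because D and E are different stops.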

Additional context

@ethan-moss added the needs triage, technical debt, and GTFS labels Aug 17, 2023
@CBROWN-ONS self-assigned this Oct 9, 2023
@CBROWN-ONS added this to the sprint 5 end milestone Oct 9, 2023
@r-leyshon
Contributor

@ethan-moss
I need help in understanding the issue:

1. typically this is because neighbouring rows in stop_times.txt, within a given trip, have duplicate trip_id, arrival_time, departure_time, and stop_id. It's clear this circumstance is an erroneous duplication.

2. less often, there are neighbouring stops very close together, but they have different stop_ids - it's possible for these to be timetabled with the same time because of their proximity. Here is an example where a bus turns around and serves two stops on either side of the road (near Dingestow):

Point 1 seems straightforward: whenever consecutive rows within a trip have the same trip_id, arrival_time, departure_time, and stop_id, drop the duplicate.

Point 2: I am uncertain whether it asks for any intervention. It describes two different stops that are very close together and have the same arrival and departure times (not specified above). As these have different stop_ids, they would not qualify for cleaning under point 1, so I am unsure whether this point calls for any treatment. Could you let me know if I have misunderstood the ask?

@r-leyshon
Contributor

Migrated to datasciencecampus/assess_gtfs#5

@r-leyshon closed this as not planned Aug 26, 2024