Entire Section text version #171
Comments
@TimTCM - thanks for the suggestion. We'll look at feasibility of this. From an acceptance criteria point of view, would having a complete package html/text file meet the need? This would include the text for the entire daily issue. For this:
Could you provide an example where a single Senate speaker's remarks are split across multiple granules? That will help us understand that portion a bit better. If they are speaking on different subjects, it makes sense to me that they would have separate granules in GovInfo, but perhaps there's a scenario that I'm not thinking of at the moment.
That would work. Thank you!
Almost every day, the Senate leaders' remarks at the beginning of the day, which usually cover different subjects, get broken up by topic. When remarks are divided this way, pages after the first one sometimes lack the speaker name at the beginning of the remarks.
Here are some examples, from 7/31:
Here, from 9/25, the China page doesn't have the speaker name on it:
If they're not broken up by topic, then that's because they get rolled into a long Legislative Session or Executive Session page. For example, from 7/25:
Either way, I've never seen a single link from the source to a leader's full remarks and only full remarks, notwithstanding the times a leader only spoke on one topic. It's not just leaders that get their remarks split up over multiple pages. Here are some other examples, from 9/25:
That last example with Senator Kennedy is mixed. He starts by saying, "Three quick points," and then his remarks get split over three pages with only the first having his name at the beginning. What's mixed about this one is his third point becomes a unanimous consent request. UC debates have multiple Senators speaking and it makes sense to have the whole debate on one page. One way to tell from GPO if sometimes-long pages are broken up into shorter ones by speaker is if the title is in all caps or not. All-caps "SESSION" pages can be very long, while if title words are mostly in lower-case, then it seems the Reporters of Congressional Debate added more dividers in the content. Senate 9/24 has lots of lowercase. In contrast, 9/19 Senate has a lengthy page with lots of things combined:
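The title heuristic described above could be sketched roughly as follows. This is only an illustration of the all-caps check: the function name `looks_combined` is hypothetical, and the lowercase sample title is made up for the example rather than taken from the Record.

```python
def looks_combined(title: str) -> bool:
    """Return True when every letter in the title is upper-case.

    Assumption (from the observation above): all-caps titles such as
    "EXECUTIVE SESSION" tend to mark long pages that combine many speakers,
    while mixed-case titles tend to mark shorter, divided pages.
    """
    letters = [ch for ch in title if ch.isalpha()]
    return bool(letters) and all(ch.isupper() for ch in letters)


print(looks_combined("EXECUTIVE SESSION"))          # all caps
print(looks_combined("Unanimous Consent Request"))  # hypothetical mixed-case title
```

A check like this only flags candidates for closer inspection; it says nothing about how the page is actually divided.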
Topical divisions instead of speaker divisions tend to cause speaker names to be missing from the beginning of subsequent pages. What happens is a Senator gives a speech, another Senator arrives on the Senate floor to talk about something else, listens to the current speaker, and then when the next Senator goes to speak, the Senator first starts with some comments about the previous speaker before speaking on their main topic. Then, in the Record, the subsequent speaker's name is on the previous page, and not at the beginning—or sometimes even at all—on the page where the main substance of their remarks is found. For instance, from 9/18, there's no Senator name on the second page because it already appeared on the first:
UC requests don't always get their own pages, either. Sometimes they do, like on 9/25:
Sometimes they don't, like on 9/17:
To bring it back to your question about the usefulness of a comprehensive page that has everything in succession, particularly for the Senate, yes, that would help deal with the many different ways pages and the same types of content get divided in the Congressional Record. I realize some of these things may be artifacts of how the content appears in print. Adding the name on pages where it often seems missing may not happen because of this. Having a comprehensive version makes it easier to divide content by speaker.
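To illustrate why a comprehensive version helps, here is a minimal sketch of dividing one continuous text by speaker. It assumes a speaker turn begins a line with a title and a capitalized surname followed by a period (for example "Mr. KENNEDY."); that pattern, the function name `split_by_speaker`, and the sample text are all assumptions for illustration, and real pages will have cases this does not cover.

```python
import re

# Assumed speaker label: optional indentation, a title, then a surname that
# starts and ends with a capital letter (e.g. "KENNEDY"), then a period and a
# space. "Mr. President." is not matched because it ends in lowercase.
SPEAKER = re.compile(
    r"^[ \t]*(?:Mr|Ms|Mrs)\. ([A-Z][A-Za-z'\-]*[A-Z])\. ", re.MULTILINE
)


def split_by_speaker(text):
    """Return (speaker, remarks) pairs in order of appearance."""
    matches = list(SPEAKER.finditer(text))
    turns = []
    for i, m in enumerate(matches):
        # Remarks run until the next speaker label, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        turns.append((m.group(1), text[m.end():end].strip()))
    return turns


# Made-up sample text, not a quotation from the Record.
sample = (
    "  Mr. KENNEDY. Mr. President, three quick points.\n"
    "  My second point follows on what was a separate page.\n"
    "  Mr. SMITH. I thank my colleague.\n"
)
for speaker, remarks in split_by_speaker(sample):
    print(speaker, "->", remarks)
```

Because the text is continuous, remarks that were divided by topic stay attached to the last speaker label seen, even when later pages omit the name.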
Thanks for the additional detail. From the content originator's perspective, breaking by subject was the original request, but I see where consolidating in a different manner would be helpful. At this time, we are looking at providing the text files at a package or book level.
This is something we are looking at as a March 2025 item.
Even with the current Microcomp format, would GPO be willing to publish text files of the Congressional Record's four sections in their entirety?
Entire sections are available in PDF, but not in text.
One can create a combined version from the downloaded zip file, and I am doing so right now, but confidence in the product would be greater if the official source made this available.
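For readers wanting to reproduce that combining step, a minimal sketch follows. It assumes the zip contains one HTML file per page with a `.htm` extension and that sorting the member names yields page order; both are assumptions about the package layout, and `combine_zip_text` is a hypothetical helper name, not anything GPO provides.

```python
import zipfile
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect the text content of an HTML document, discarding tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def combine_zip_text(zip_source, suffix=".htm"):
    """Concatenate the text of every member ending in `suffix`, in name order.

    `zip_source` may be a path or a file-like object. Name order is assumed
    to match page order; verify this against the actual package.
    """
    chunks = []
    with zipfile.ZipFile(zip_source) as zf:
        for name in sorted(zf.namelist()):
            if not name.lower().endswith(suffix):
                continue  # skip PDFs, metadata, and anything else
            parser = _TextOnly()
            parser.feed(zf.read(name).decode("utf-8", errors="replace"))
            parser.close()
            chunks.append("".join(parser.parts).strip())
    return "\n\n".join(chunks)
```

Calling `combine_zip_text("path/to/downloaded.zip")` (a placeholder path) returns one string that can be saved or split further.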
Over time, I don't plan to store the Record indefinitely, and so if later there is content that needs re-caching, I'd like to be able to pull from the official source without reprocessing the whole zip file on the fly each time.
This would be especially helpful for content that spans pages, such as House Morning Hour debate and one-minute speeches, and even more so in the Senate, where a speaker's remarks can cross multiple pages as they are currently divided.
Thank you