-
-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge "sibling" RSS item that belong to a multi-entry docket item #217
base: main
Are you sure you want to change the base?
Merge "sibling" RSS item that belong to a multi-entry docket item #217
Conversation
Attempt to explain why we override this function. I...actually don't understand why we do, so this explanation may be wrong. It seems like we could have fit within the BaseDocketReport framework of returning our information as `docket_entries` rather than `data` but maybe I'm missing something.
Snazzier than setting an old= var at the end of the loop.
prevdata -> lastdata (well, not really) preventry -> previous_entry Elminate `de` and just use data['docket_entries'] which we moved up
Because otherwise tests break. I think there may be a platform issue here, but this fix is obviously correct, so I'm not worried about masking it.
# the previous entry's and continue the loop. | ||
if ( | ||
data_list and data_list[-1][u'docket_entries'] | ||
and data[u'docket_entries'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like these two lines are of a different nature than the rest of the equality lines. Is it worth pulling them out of the if statement here? For example, maintain the logic, but do these types of tests in an earlier if statement? Something like:
if data_list and data_list[-1][u'docket_entries'] and data[u'docket_entries']:
if all([entry.title == previous_entry.title, ...]):
It might allow you to put the comments about equality closer to where the equality is tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err...I think it is advantageous to not materially increase the nesting level of logic with gratuitous nested ifs.
The comments didn't seem to me to be far removed from equality — to me they made sense before the word if
.
Perhaps your reaction is to the fact that the comments don't discuss the first 2 lines of the if expressions…that's because I was focusing on what was maybe a little specialized. But certainly the comment could be broadened to discus the first 2 lines, i.e. "If there are docket entries associated with the previous entry and the current entry, and if those two entries match in metdata…"
But also, there's nothing stopping us from putting comments in the midst of the if expressions, so if there's really a desire for "Instructions at the point of need" then that can just happen, there's no need to change the structure of the if
statement.
So if you want to change the structure of the if
, probably a better reason is required. I actually did try the all()
style and concluded erroneously that it didn't work here (I think I omitted the []
), but that said it's not terribly familiar to me so I wouldn't vote for it though I don't oppose it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the extraneous nesting bugged me too. If you pull all this logic into a function, you could reverse the logic here and do:
if not all([this, that, the other thing]):
return # Or what have you
That'd avoid the deep logic and keep the two things apart?
and entry.id == previous_entry.id | ||
and entry.published == previous_entry.published | ||
): | ||
data_list[-1][u'docket_entries'][0][u'short_description'] += ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like a line like this deserves a comment, if even a short one. Or maybe break the bits into variables for readability?
previous_entry_desc = data_list[-1][u'docket_entries'][0][u'short_description']
current_entry_desc = data[u'docket_entries'][0][u'short_description']
combined_desc = ' AND '.join([previous_entry_desc, current_entry_desc])
data_list[-1][u'docket_entries'][0][u'short_description'] = combined_desc
That's a bit wordy, but clearer. This multiline thing is pretty bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't think that is clearer.
Forcing the repetition of data_list[-1][u'docket_entries'][0][u'short_description']
is something I would like to avoid, because it requires the reader to parse the 5 levels of hierarchy in their brain twice, and it's not obvious until inspection that they are the same. Edit: esp. when they're not consecutive.
Of course, the flipside is that I suspect you have an unfamiliarity with the +=
operator and do not love it the way you should, and no amount of cajoling will instill that love in you until you have worked with it for a long time. To my eye the avoidance of complex repetition is the strongest case for the so-called "addition assignment" operator. (The real problem is you can't do many of the usual fixes because the complexity is in an lvalue not just an rvalue.
And I definitely don't think that ' AND '.join([a, b])
is somehow clearer than a+' AND '+b
.
I'll think about how to write this better.
At one point I had prevdata
(or previous_data
) instead of data_list[-1]
but I axed it because I felt it wasn't helping. (d59e007)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm familiar with the +=
. It's more about how you have three lines of code to do one small thing. That's a smell, but I'll await further approaches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me:
I suspect A and B
Mike:
Not A.
So B then. As I feared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I suspect you won't go for this one, but it kind of jumped out at me so I thought I'd offer it here:
def entry(d):
return d[u'docket_entries'][0]
def get_append_shorttext(de, text=''):
de[u'short_description'] += text
return de[u'short_description']
previous_entry = entry(data_list[-1])
current_text = get_append_shorttext(entry(data))
get_append_shorttext(previous_entry, ' AND ' + current_text)
Think of it as a conversation-starter :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there's a lot of indirection in there, and get_append_shorttext
is either getting or setting depending on the text
argument....not great.
Hm. What about something like:
previous_entry = data_list[-1][u'docket_entries'][0]
current_entry = data[u'docket_entries'][0]
previous_entry[u'short_description'] += ' AND ' + current_entry[u'short_description']
That gets us back down to three lines, avoids repetition of the 5 levels of hierarchy, and uses +=
instead of a join? No additional helper functions needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, while I'm functionally happen with that, it fails PEP-8. And it's actually kind of tricky to fix, and it's why I introduced the helper function, which I agree is totally ridiculous and I'm not sure I could dignify by calling it a strawman:
# 1 2 3 4 5 6 7 8
#2345678901234567890123456789012345678901234567890123456789012345678901234567890
previous_entry[u'short_description'] += ' AND ' +current_entry[u'short_description'] #1
previous_entry[u'short_description'] += ( #2
' AND ' + current_entry[u'short_description'])
previous_entry[u'short_description'] += ( #3
' AND ' +
current_entry[u'short_description'])
previous_entry[u'short_description'] += ( #4
' AND ' +
current_entry[u'short_description']
)
previous_entry[u'short_description'] += \
' AND ' + current_entry[u'short_description'] #5
Although I find 5 tempting, I think that backslash terminates are disfavored (although not against PEP8?).
And of 2,3,4 I dunno. I had the sense you wanted to avoid multiple lines for this expression, and yet there's a certain additional clarity. What's your preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 5 has my vote followed closely by 1 (a minor PEP8 violation isn't the end of the world in my book). 2, 3, and 4 just don't seem worth it. I leave the rest of this to you. Let's land this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One other thing that doesn't work in a line comment: What about pulling all this logic, comments, etc into its own little method/function so that the data method is still small? This feels like a lot for it.
Well, doing that would basically pull 90% of the bulk out of I don't think there's much benefit but I'll give it a whirl. |
It's super-confusing to talk about "docket entries" as well as "RSS entries" and variables called "entry" don't help disambiguate this. Unfortunately we can't do much about the fact that feedparser's dict of items is `.entries` but hopefully this is still a readability improvement.
Err, so...maybe you had some better idea, but doing this really doesn't seem [to me] to work: johnhawkinson@ad5be23 (specifically these lines). I mean, the function needs access to a lot of internal state from the principle And it also has to tell that So I don't think this is a great way to go. But I'm a little tired, so maybe I'm missing something important. On the other hand, we could also duplicate the Hoping for a better suggestion... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hoping for a better suggestion...
I can't figure out Github's reply mechanism for your comment, but what about doing this as a separate step after we loop through and create all the dockets. So, instead of injecting doing the merging in the currently loop, we create a separate function after the loop terminated that does the merging?
and entry.id == previous_entry.id | ||
and entry.published == previous_entry.published | ||
): | ||
data_list[-1][u'docket_entries'][0][u'short_description'] += ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there's a lot of indirection in there, and get_append_shorttext
is either getting or setting depending on the text
argument....not great.
Hm. What about something like:
previous_entry = data_list[-1][u'docket_entries'][0]
current_entry = data[u'docket_entries'][0]
previous_entry[u'short_description'] += ' AND ' + current_entry[u'short_description']
That gets us back down to three lines, avoids repetition of the 5 levels of hierarchy, and uses +=
instead of a join? No additional helper functions needed?
Ha! It was just a regular "reply" at the bottom of the page.
Umm, sure. It's what I suggested above:
I figured the ugliness of looping twice would doom this idea, but if it's your preference I can do that. |
I actually don't mind processing data multiple times like that. Reminds me of map/reduce/filter kinds of processing. |
@johnhawkinson hi 👋 |
Fixes freelawproject/recap#248.
Sorry this history is a little messier than it should be, but it's all going to be squashed so there's probably not much point in cleaning it up. I am really sad that we don't get to keep 49cbaf2 which the non-Python progarmmer in me really likes as a solution to avoid if expressions that don't line up:
I'm also not 100% the explanation in a978447 is accurate; and if it is, why was that the right way to go?: