Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

teuthology/scrape.py: Remove empty string in _get_service_types #1889

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kamoltat
Copy link
Member

@kamoltat kamoltat commented Sep 6, 2023

Problem:

the function grep returns a list contianing empty string which results in scrape.py throwing the warning "Misunderstood line: ".

Solution:

filter out empty strings, blank space and None
before getting match with regex.

Fixes: https://tracker.ceph.com/issues/62534

@@ -410,7 +410,10 @@ def _get_service_types(self, job):
result = defaultdict(list)
# Lines like:
# 2014-08-22T20:07:18.668 ERROR:tasks.ceph:saw valgrind issue <kind>Leak_DefinitelyLost</kind> in /var/log/ceph/valgrind/osd.3.log.gz
for line in grep(os.path.join(job.path, "teuthology.log"), "</kind> in "):
valgrind_err_line = grep(os.path.join(job.path, "teuthology.log"), "</kind> in ")
valgrind_err_line = list(filter(None, valgrind_err_line)) # remove empty strings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like the line below covers these cases. no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so I wanted to get rid of both empty string, empty space and None. I just revised should be a one liner and covers all cases

…ypes

Problem:

the function grep returns a list contianing empty string which
results in scrape.py throwing the warning "Misunderstood line: ".

Solution:

filter out empty strings, blank space and None
before getting match with regex.

Fixes: https://tracker.ceph.com/issues/62534

Signed-off-by: Kamoltat Sirivadhna <[email protected]>
@kamoltat
Copy link
Member Author

kamoltat commented Sep 11, 2023

https://pulpito.ceph.com/ksirivad-2023-09-08_18:32:34-rados-main-distro-default-smithi/

Tried testing with the new change by checking out wip-ksirivad-fix-62534 in teuthology server and then ran the command:

./virtualenv/bin/teuthology-suite -v --ceph-repo https://github.com/ceph/ceph-ci.git --suite-repo https://github.com/ceph/ceph-ci.git --ceph main --rerun yuriw-2023-08-17_21:18:20-rados-wip-yuri11-testing-2023-08-17-0823-distro-default-smithi -m smithi -e [email protected] -k distro --filter-all "valgrind" -p 80

However, it doesn't seem like the problem goes away.

@kamoltat
Copy link
Member Author

@zmc Okay I think once this is merged, I will test it in production to see if the issue is resolved. Let me know when you approved this.

@kamoltat
Copy link
Member Author

Think maybe I need to push to upstream, instead of my origin (forked) and put --teuthology-branch wip-ksirivad-fix-62534.

So I did just that, waiting to see the result http://pulpito.front.sepia.ceph.com/ksirivad-2023-09-27_14:03:08-rados-main-distro-default-smithi/

for line in grep(os.path.join(job.path, "teuthology.log"), "</kind> in "):
valgrind_err_line = grep(os.path.join(job.path, "teuthology.log"), "</kind> in ")
# removes blank space, empty string and None
valgrind_err_line = [line for line in valgrind_err_line if line and line.strip()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

teuthology logs are usually huge, and grep function is already returns list of lines, using of list comprehension is quite expensive here, maybe it is better to use a generator expression?
but I've got more general question, is there any sense to call grep in Call out to native grep rather than feeding massive log files through python line by line, are python regex are that slow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my benchmarks, using a list comprehension is ~2x as fast as grep for a statement like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My points were:

  1. use generator expression instead of list comprehension here.
  2. remove scrape.grep methods and replace with pure python equivalent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's more clear. I don't disagree, but I think replacing grep() can be considered a separate issue from this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's more clear. I don't disagree, but I think replacing grep() can be considered a separate issue from this PR.

There are just two places where grep is used, this place and another one with similar logic. At least with this PR in this place just drop grep usage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or, do you want to merge this PR as is?

@kshtsk
Copy link
Contributor

kshtsk commented Jul 2, 2024

Does anyone plan to work on this PR?
Though change is pretty simple, maybe stupid question, why don't make a test mocking grep covering the case?

@kshtsk
Copy link
Contributor

kshtsk commented Aug 5, 2024

@kamoltat ping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants