Add summary for scraped sources from specfiles #216

mfocko · 2024-08-15T10:34:32Z

Closes packit/packit-service#2390 Signed-off-by: Matej Focko <[email protected]>

lbarcziova

nice work, thanks!

research/specfiles/hosting-sources/process.py

lbarcziova · 2024-08-15T11:11:22Z

research/specfiles/hosting-sources/index.md

+This “research” provides the scripts that have been used to process the scraped
+sources.
+
+### Domains with ≥ 10 occurrences


So, what do you think would be a reasonable threshold for us to ask for the firewall adjustments?

mfocko · Aug 27, 2024

I added some of the bigger domains; it appears that once dependencies are counted, it's easy to reach ≥ 10 occurrences. Additionally, we don't have many packages blocked by the firewall, so I would consider only the really big hosts (such as forges).
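To make the threshold discussion concrete, here is a minimal sketch of how per-domain occurrences could be counted and filtered. It is not the `process.py` from this PR; the function names, the `THRESHOLD` constant, and the assumption that each scraped source is a plain URL string are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical cut-off, mirroring the "≥ 10 occurrences" heading in index.md.
THRESHOLD = 10


def count_domains(sources):
    """Count how often each host appears among the given source values."""
    counts = Counter()
    for source in sources:
        host = urlparse(source.strip()).hostname
        # Local file names and values without a scheme have no host; skip them.
        if host:
            counts[host] += 1
    return counts


def domains_at_or_above(counts, threshold=THRESHOLD):
    """Return (domain, count) pairs with count >= threshold, most frequent first."""
    return [(domain, n) for domain, n in counts.most_common() if n >= threshold]
```

Raising `threshold` is one way to narrow the list down to only the really big hosts mentioned above.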

lbarcziova · 2024-08-15T11:19:06Z

research/specfiles/hosting-sources/index.md

+All `SourceX` fields of the specfiles have been initially scraped by the @msuchy.
+This “research” provides the scripts that have been used to process the scraped
+sources.
+


Could you please include some instructions here on how the data can be obtained (how the script should be run)?

mfocko · Aug 19, 2024

Do you mean the script in this repo or the one that scrapes the specfiles?

lbarcziova · Aug 22, 2024

Would it make sense to include both? It would be enough to mention here which files represent what.
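For context on what the input data looks like, the following is only an illustrative sketch of collecting `SourceX` values from the text of a spec file. It is neither the scraper used by @msuchy nor the script in this PR; the regex, the `extract_sources` name, and the choice to do no macro expansion are assumptions.

```python
import re

# Matches lines such as "Source0: https://..." or "Source: foo.tar.gz".
# Tags in spec files are case-insensitive; commented-out lines do not match.
SOURCE_RE = re.compile(
    r"^[ \t]*(Source\d*)[ \t]*:[ \t]*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def extract_sources(spec_text):
    """Return (tag, value) pairs for every SourceX line, without expanding macros."""
    return SOURCE_RE.findall(spec_text)
```

Because macros such as `%{version}` are left unexpanded in this sketch, any later processing would have to tolerate them in the values.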

lachmanfrantisek

Nice job! The outcome looks really reasonable.

Add summary for scraped sources from specfiles

c1b0798

Closes packit/packit-service#2390 Signed-off-by: Matej Focko <[email protected]>

mfocko self-assigned this Aug 15, 2024

lbarcziova reviewed Aug 15, 2024

lachmanfrantisek approved these changes Aug 15, 2024

mfocko and others added 3 commits August 27, 2024 15:08

Use the constant instead of hard-coded enumeration

9288bbf

Co-authored-by: Laura Barcziová <[email protected]> Signed-off-by: Matej Focko <[email protected]>

Add steps on how to reproduce

7d783d6

Signed-off-by: Matej Focko <[email protected]>

Add outcome of the research of hosting of sources

8238bb5

Signed-off-by: Matej Focko <[email protected]>

mfocko enabled auto-merge August 27, 2024 13:27

mfocko added this pull request to the merge queue Aug 27, 2024

Merged via the queue into packit:main with commit caba156 Aug 27, 2024
2 checks passed

mfocko commented Aug 15, 2024

lbarcziova left a comment

lbarcziova Aug 15, 2024

mfocko Aug 27, 2024

lbarcziova Aug 15, 2024

mfocko Aug 19, 2024

lbarcziova Aug 22, 2024

lachmanfrantisek left a comment
