Add robust median to gopher filter #98
Merged: soldni merged 16 commits into allenai:main from KennethEnevoldsen:robust-median-for-gopher-filter on Jan 30, 2024
Commits:
e429d56 Added robust median to gopher filter (KennethEnevoldsen)
fbe2682 Added robust median to gopher filter (KennethEnevoldsen)
9aa8674 Merge branch 'robust-median-for-gopher-filter' of https://github.com/… (KennethEnevoldsen)
d213bd7 Added robust median to gopher filter (KennethEnevoldsen)
6868fcb Merge branch 'robust-median-for-gopher-filter' of https://github.com/… (KennethEnevoldsen)
4d8d6c2 Added robust median to gopher filter (KennethEnevoldsen)
7ff7de4 Merge branch 'robust-median-for-gopher-filter' of https://github.com/… (KennethEnevoldsen)
69a30a7 Merge branch 'main' into robust-median-for-gopher-filter (soldni)
215293f fixed typing to use union (KennethEnevoldsen)
5a9661f reformatted with black (KennethEnevoldsen)
06cabdd Merge branch 'robust-median-for-gopher-filter' of https://github.com/… (KennethEnevoldsen)
82beb0e formatted using `make style` (KennethEnevoldsen)
6194c57 Merge branch 'main' into robust-median-for-gopher-filter (soldni)
9ce8c0b attempting to fix style issues (soldni)
2f34d34 one more style change using python 3.10 (soldni)
a71641b more style, make robust median always a float (soldni)
@@ -5,6 +5,7 @@
@kylel, @soldni

"""

from abc import abstractmethod
from typing import List

@@ -5,6 +5,7 @@
@akshitab

"""

import logging
import re
from typing import List

@@ -5,6 +5,7 @@
@akshitab, @soldni

"""

import json
import logging
from pathlib import Path

@@ -5,6 +5,7 @@
@kylel, @soldni

"""

from typing import TYPE_CHECKING, Iterable, List, Tuple

import necessary

@@ -6,7 +6,6 @@

"""

import re
from typing import List
from warnings import warn

@@ -3,4 +3,4 @@

@add_tagger("extra_v1")
class ExtraV1Tagger(BaseTagger):
    ...
    pass

@@ -3,4 +3,4 @@

@add_tagger("extra_v3")
class ExtraV1Tagger(BaseTagger):
    ...
    pass

@@ -3,4 +3,4 @@

@add_tagger("extra_v2")
class ExtraV2Tagger(BaseTagger):
    ...
    pass

@@ -6,7 +6,6 @@

"""

from unittest import TestCase

from dolma.core.data_types import TextSlice

Given that `median_word_length` is `bool | float`, wouldn't this make `score` potentially a `bool`? `score` is supposed to be a `float`, so we would have to cast back.

It seems to start out as `False`, so I tried to match the existing pattern. However, the median can be undefined (for an empty list), and multiple values could represent that (`np.nan`, `0`, `False`). I would probably go for `np.nan` or `None`, if that is valid?
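
To make the typing concern in this thread concrete, here is a minimal sketch of a median helper that always returns a `float`. It is not the code merged in this PR: the name `robust_median`, its signature, and the choice of NaN as the "undefined" value are assumptions for illustration, and it uses only the standard library rather than NumPy.

```python
import math
from statistics import median
from typing import List


def robust_median(values: List[float]) -> float:
    """Hypothetical helper: a median that never raises and always returns a float."""
    if not values:
        # Assumption: signal "undefined" with NaN instead of False or 0,
        # so the return type stays a plain float.
        return math.nan
    # statistics.median may return an int for int inputs, so cast explicitly.
    return float(median(values))


# bool is a subclass of int in Python, so a False sentinel would
# silently behave like 0 wherever a numeric score is expected.
assert isinstance(False, int) and False == 0

word_lengths = [len(word) for word in "the quick brown fox".split()]
print(robust_median(word_lengths))  # median of [3, 5, 5, 3]
print(robust_median([]))            # NaN marks the undefined case
```

With a NaN sentinel, callers need `math.isnan` to detect the empty case, since NaN never compares equal to itself. Returning `None` instead would make the empty case explicit, but it would change the return type to `Optional[float]`.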