Sparse array of string types #512
Comments
In theory the empty string should be
julia> one(String)
"" |
Interesting... I think the empty string still makes sense in sparse string vectors though. |
You can always add these at your own risk, as I have no idea what they will break.
julia> Base.zero(::String) = ""
julia> Base.iszero(x::String) = length(x) == 0
julia> sparse(["123", "", "234"])
3-element SparseVector{String, Int64} with 2 stored entries:
  [1] = "123"
  [3] = "234"
IMHO, it might make sense to have an internal zero and iszero so that it's possible to overwrite them for sparse arrays only. |
If I were writing an application, sure. But I'm writing a library, so changing Having |
I agree. @ViralBShah @dkarrasch @LilithHafner Does it make sense to you? |
I'd like to have an overlay structure similar to Jutho/SparseArrayKit.jl#13 . However SparseArrays.jl can't depend on FillArrays.jl and I don't know how to work around nicely. |
That sounds very sensible. It provides more flexibility at seemingly no cost. I wonder though what applications one has in mind, if one defines |
First, Second, Having |
The problem is of abstractions. The SparseMatrix data type and its numeric types are designed for numerical sparse matrix linear algebra calculations. So extending to non-numeric types is always going to feel unnatural. It would be nice to get as much as we can get for free within this design - I do get that. For strings, isn't it better to use a different data structure like a dictionary? |
There is a proposal floating around somewhere to support sparse arrays with custom "zero" values. Implementing that (either here or in a different package) would make a sparse array of strings fit naturally into the abstraction. |
I don't like this SparseArrays.zero(::Type{String}) == "" because SparseArrays assumes zero * x == zero, so a sparse array of strings will behave oddly under concatenation. A structural empty string concatenated with anything results in the empty string, while a stored empty string concatenated with anything (x) results in that thing (x). |
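To make the mismatch concrete, here is a minimal sketch in plain Julia (no SparseArrays involved). The variable names are illustrative only. It shows that the empty string is the identity for `*` on strings, not an absorbing element, which is what the `zero * x == zero` assumption would require.

```julia
# The empty string is the multiplicative identity for String in Base Julia.
identity_str = one(String)

stored = "abc"

# What dense concatenation actually produces when one operand is "".
dense_result = identity_str * stored

# What a kernel relying on zero * x == zero would assume for a structural
# entry if "" were treated as the zero element (illustrative assumption).
assumed_sparse_result = identity_str

println(dense_result == stored)
println(dense_result == assumed_sparse_result)
```

The two results differ, so any sparse routine that skips structural entries during `*` would disagree with the dense computation.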
That's the idea.
Depends on the use case (of course). In my case, definitely not. Vectors have properties like length, and can be sliced, and masked, and so on.
(As an aside, I would have been happier if string concatenation was considered to be addition rather than multiplication, commutativity be damned. E.g. In what context does the above matter? Note that you can't add strings so trying to do linear algebra on strings would fail anyway.
That would be nice. |
When performing broadcasted
julia> x = sprandstring(10, .1)
10-element SparseVector{String, Int64} with 2 stored entries:
  [8 ] = "VlnHeUDF"
  [10] = "LumRtFcf"
julia> y = sprandstring(10, .1)
10-element SparseVector{String, Int64} with 1 stored entry:
  [7] = "SlumPG6z"
julia> x .* y
10-element SparseVector{String, Int64} with 0 stored entries
That looks like a bug, and it would be hard to avoid without adding special cases to support the zero element not being a multiplicative zero. (note: this example does not run, it is extrapolated from the results of |
I see - good point, this wouldn't be good :-( (If string concatenation was addition, this wouldn't have been a problem...) |
There's a lot to consider when deciding whether string concatenation is +, *, or something else. When/if I develop a new language or stdlib spec from scratch, this sort of thing will be worth considering. Within Julia, though, it's way too late to reconsider. You're right though. This would "just work" if that language design decision were different. |
Maybe the solution is adding a wrapper that does the job? Something like
struct Wrap
    s::Union{Nothing,String}
end
Wrap() = Wrap(nothing)
Base.zero(::Wrap) = Wrap()
Base.iszero(x::Wrap) = x.s === nothing
sparse(Wrap.(["123", nothing, "234"])) |
Nice workaround, should have thought of that. It would bubble up to everywhere in the client code (e.g. |
I tried to create a sparse vector of strings, and it failed due to zero(String) not being defined. I'm using the empty string "" as the zero value. Any chance of enhancing SparseArrays to support this?