Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dev.fuzz] testing: use generics with testing.F #46890

Closed
katiehockman opened this issue Jun 23, 2021 · 15 comments
Closed

[dev.fuzz] testing: use generics with testing.F #46890

katiehockman opened this issue Jun 23, 2021 · 15 comments
Labels
FeatureRequest Issues asking for a new feature that does not need a proposal. FrozenDueToAge fuzz Issues related to native fuzzing support NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@katiehockman
Copy link
Contributor

katiehockman commented Jun 23, 2021

This API idea and explanation comes from @rogpeppe. He suggests a change to the testing.F type which uses generics, rather than interface{} types, for the Fuzz and Add functions. I haven't yet formed an opinion on whether or not we should do this, but I think this issue can help start that conversation.

/cc @golang/fuzzing

In Roger's own words:

package testing

// F is a type passed to fuzz targets for fuzz testing.
// V is the type of the argument to be fuzzed.
type F[V any] struct {
	...
}

// Fuzz runs the fuzz function, ff, for fuzz testing.
func (f *F[V]) Fuzz(ff func(t *T, x V))

// Add will add the argument to the seed corpus for the fuzz target. This will
// be a no-op if called after or within the Fuzz function.
func (f *F[V]) Add(x V)

// example fuzz function.
func FuzzFoo(f *testing.F[string]) {
	f.Add("some-piece-of-corpus")
	f.Fuzz(func(t *testing.T, x string)) { ... })
}

Essentially when you define a fuzz function, you also declare what type of arguments you expect to receive too. This is all compatible with the current generics proposal.

In my view, this does give some nice advantages:

  • the API is very clear (no need for interface{} function arguments)
  • there's a good performance improvement to be had from the generic version because you can avoid the need to use the rather slow reflect.Value.Call

Here's a little micro-benchmark that demonstrates some performance difference (the overhead of the call goes from 150ns to 6ns): https://go2goplay.golang.org/p/Nz0CRLC2DA0

Here's a little piece of code to demonstrate that the technique could actually work for real: https://go2goplay.golang.org/p/ygvwbHw_S11
[Update: using testing/quick for a tiny bit more realism: https://go2goplay.golang.org/p/FOlQQIH5YXI ]

The main thing I'm arguing for here is that I think it's a good idea to restrict fuzz functions to have a single argument. That means they're amenable to the type-safe approach outlined above, with the arguably minor restriction that you'd have to define a struct in order to pass multiple arguments.

@katiehockman katiehockman added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. FeatureRequest Issues asking for a new feature that does not need a proposal. fuzz Issues related to native fuzzing support labels Jun 23, 2021
@katiehockman katiehockman added this to the Backlog milestone Jun 23, 2021
@jayconrod
Copy link
Contributor

+1, I like this idea. We talked about this while drafting the proposal, but at that time, we weren't comfortable blocking development on generics (it seemed like we might land fuzzing in 1.17).

The performance difference is a bit surprising, but I don't think it's decisive yet. It depends a lot of how generics are implemented. It sounds like there will be separate, specialized code generated for each type (with some deduplication), so we might be able to avoid dynamic dispatch altogether. Hard to say though.

Mainly I like how the API is more obvious. Methods that accept interface{} are always a bit confusing.

Agree that the key tradeoff is that we lose multiple arguments though.

@jayconrod
Copy link
Contributor

#46218 is somewhat related: we can use vet to type-check fuzz targets even without generics. We may want to do that anyway to detect misusage of testing.F, e.g., calling F.Fatal instead of T.Fatal in the fuzz function.

@rogpeppe
Copy link
Contributor

The performance difference is a bit surprising, but I don't think it's decisive yet. It depends a lot of how generics are implemented. It sounds like there will be separate, specialized code generated for each type (with some deduplication), so we might be able to avoid dynamic dispatch altogether. Hard to say though.

FWIW I ran the benchmark on the Go tip (using -gcflags=-G=3), not using go2go, so the results might not be too inaccurate. However, regardless of that, ISTM that the generic code will always be much faster than using reflect.Value.Call just because it has so much less work to do (no intermediate slices, no need to type check other than the cheap .(V) type conversion). It's perhaps worth mentioning that this particular call will always be dynamically dispatched because we're calling a closure.

#46218 is somewhat related: we can use vet to type-check fuzz targets even without generics.

Not in all cases - I could imagine people using helper packages for fuzzing that end up invoking
Add with an interface{}-valued argument which might be hard to analyse.

However, although the vet check would work in almost all cases, it's still a poor cousin of the
compiler type checking, and the generic type signatures do make things much clearer.
For example, the current fuzz docs on the dev.fuzz branch don't actually describe the required
signature of the ff argument, but that's not necessary if it's there in the type signature.

We may want to do that anyway to detect misusage of testing.F, e.g., calling F.Fatal instead of T.Fatal in the fuzz function.

Yup, some vet checking would definitely still be useful.

@ianlancetaylor
Copy link
Member

Just a note that performance measurements of the current implementation on the dev.typeparams branch are completely meaningless. Please don't try to draw any conclusions from how the branch behaves today.

@katiehockman
Copy link
Contributor Author

katiehockman commented Jun 29, 2021

I spent some time playing around with this, and doing some prototyping. I ran into a few roadblocks. I'm still pretty new in my understanding/use of generics, so these may very well be solvable problems, and if so, LMK.

  • You can't alias a generic type. This is going to be problematic for us, as we use aliasing to avoid importing the internal/fuzz package into the testing package. We might be able to work around this, but the readability and maintainability may suffer as a result. LMK if anyone has workarounds/ideas for this.
  • If we make testing.F a generic type, as is described in this proposal, then the types which use testing.F must also be generic. We use a cross-package InternalFuzzTarget type which has the fuzz target as a field. Making InternalFuzzTarget a generic type will be painful if not impossible with the way it is used and generated. Examples: [1] [2] [3]
    • One potential workaround would be to make InternalFuzzTarget.Fn an interface{} then build the function using reflect inside testing/fuzz.go, but that will hurt readability, and I'm still not 100% sure that will work.

I haven't investigated this fully, so there may be other roadblocks as well.

Assuming we can get around these issues and can support generics, like Ian said, we can't assume anything about the performance yet. That said, it seems to me that the decision then comes down to which we value more: compile-time checking or easier support for fuzzing multiple types.

That's a hard call to make right now. Although compile-time checking would be nice, Jay is right in that vet checks would probably cover the majority of the use cases for this. We can always add more documentation and examples to make this clearer, too.
I'm not sure how often people will want to fuzz multiple types, but requiring them to use a struct in order to do this is kind of unfortunate. For example, I can see someone wanting to do something like this:

func FuzzCmp(f *testing.F) {
  f.Fuzz(func(t *testing.T, a, b string) {
    strings.Compare(a, b)
  }
}

with generics, it would have to be something like this:

func FuzzCmp(f *testing.F) {
type fuzzType struct {
a, b string
}
f.Fuzz(func(t *testing.T, x fuzzType) {
strings.Compare(x.a, x.b)
}
}

(Edit: updated code below, see following comments)

type fuzzCmpType struct {
  a, b string
}
func FuzzCmp(f *testing.F[fuzzCmpType]) {
  f.Fuzz(func(t *testing.T, x fuzzType) {
    strings.Compare(x.a, x.b)
  }
}

which really hurts readability and usability in my opinion.

@ianlancetaylor
Copy link
Member

There is some discussion of aliasing of generic types over on #46477. A description of the use case here would be helpful there. Thanks.

It does seem clear that this proposal requires that f.Fuxx take only a single argument. That said, I think your example can be written as

func FuzzCmp(f *testing.F) {
    f.Fuzz(func(t *testing.T, x struct { a, b string }) {
        strings.Compare(x.a, x.b)
    }
}

which is still far from ideal but perhaps not quite as awful. One could also consider f.Fuzz2, f.Fuzz3, etc., for more arguments. (The ideal here would be generics with variadic type parameters, but that will certainly not be in the first release and I don't yet see a way to ever implement it.)

@katiehockman
Copy link
Contributor Author

Oh, actually, I forgot that with generics the testing.F type also has to be generic. So I think that fuzz target would have to be more like this:

type fuzzCmpType struct {
  a, b string
}
func FuzzCmp(f *testing.F[fuzzCmpType]) {
  f.Add(fuzzCmpType{"abc", "acb"))
  f.Fuzz(func(t *testing.T, x fuzzType) {
    strings.Compare(x.a, x.b)
  }
}

vs.

func FuzzCmp(f *testing.F) {
  f.Add("abc", "acb")
  f.Fuzz(func(t *testing.T, a, b string) {
    strings.Compare(a, b)
  }
}

Which is much worse. LMK if I'm missing something here though.

@ianlancetaylor
Copy link
Member

Good point, I think you are right. I think you'll have to pick either generic fuzzing or single fuzzing argument.

@katiehockman
Copy link
Contributor Author

katiehockman commented Jun 29, 2021

It's becoming apparent to me that the cost of supporting generics in this design would be pretty high, and I'm not yet convinced that fuzz testing is a good application of generics. There are a lot of situations where developers will benefit from fuzzing more than one argument at a time, and the overhead/complexity of needing to use a struct as shown in #46890 (comment) outweigh the benefits.

Here are some real-world examples of fuzz targets which use multiple types today (/cc @dsnet):

@dsnet
Copy link
Member

dsnet commented Jun 30, 2021

I don't have a strong opinion whether the generic approach is better or not.

The readability of the generic version is worse, but not so much worse that I find #46890 (comment) unbearable. If we had type inferencing for composite literals (see #12854), we could avoid the named type when calling f.Add.

On the other hand, the potential for performance gain is a non-trivial strength of the generic approach. The non-generic approach must use reflect.Call, which is fairly slow. One reason for the slowness is that the reflect.Call API forces many allocations. Even if that could be fixed (see #43732), it would only improve it by 1.5x. However, you would need something like a 20x improvement to match the performance of calling a function directly. The ability to call the fuzz function faster, means that we can fuzz more inputs per unit of time, which seems like a pretty strong strength of a generic approach.

Also, the type safety of the generic approach avoids issues like #45593.

@katiehockman
Copy link
Contributor Author

I am going to close this issue as it seems like the discussions has fizzled to an end, and the general consensus is not to support generics.

@rogpeppe
Copy link
Contributor

Sorry for the late comment here - I was on holiday when the above discussion happened.

I agree with @dsnet that the performance issue could be significant.

Also, thinking forward to generics in general, I suspect it is going to be quite common for a generic API to support calling an arbitrary function with a single argument, but it's not possible with the current proposal to support generalising such an API across functions with multiple arguments.

That is, a usability like the one found here seems likely to be a common issue across many generic APIs.

I hope that such a usability issue doesn't push all those APIs towards using interface{} and dynamic typing as is happening here, because I think the usability, safety and performance benefits of using a generic API are worth having. Why should we need to add special vet rules when the compiler will be able do a better job itself?

If we agree that this is a more general problem than just in this one API, then perhaps a more general solution might work.

Without changing the language, we could implement a tuple package:

package tuple

type T0 struct{}
type T2[A0, A1 any] struct {
    T0 A0
    T1 A1
}

// T returns all the tuple's values.
func (t T2[A0, A1]) T() (A0, A1) {
	return t.T0, t.T1
}

type Pair = T2

type T3[A0, A1, A2 any] struct {
    T0 A0
    T1 A1
    T2 A2
}

... etc

Then you could write your fuzzer like this, without the need to declare another type:

func FuzzCoder(f *testing.F[tuple.T2[int64, []byte]) {
	// Add a number of inputs to the corpus including valid and invalid data.
	for _, td := range coderTestdata {
		f.Add(tuple.T2{0, []byte(td.in)})
	}

	f.Fuzz(func(t *testing.T, arg tuple.T2[int64, []byte]) {
		seed, b := arg.T()
		rn := rand.NewSource(seed)
		etc
	})
}

Looking forward a bit further, I don't believe there's any reason we couldn't support tuples directly in the language.

It's still a little awkward compared to using the reflect.Call approach, but that's often true of DSL-type APIs - I believe it's worth paying a small usability price for the additional clarity and performance brought from the use of static types.

@jayconrod
Copy link
Contributor

@rogpeppe Thanks for adding this. I don't think we'd really thought about adding TupleN types. It seems a little better than having to define a struct type for arguments, but the readability still doesn't seem good to me.

@katiehockman, @rolandshoemaker and I talked about this over a few calls. I think we reached a consensus internally, so I'll try to sum that up here.

Pros:

  • Parameterizing testing.F, F.Add, and F.Fuzz would improve type safety. Getting the compiler to tell us about errors is preferable to a vet analysis (not implemented yet) or run-time check (already implemented), though vet results would always be surfaced through go test at least.
  • From type signatures in documentation, testing.F may be more intuitive to use if it's clear that the user needs to specify a type and stick with it.
  • Performance of calling the fuzz function with generics might be better than reflect.Call.

Cons:

  • We're limited to a single fuzzable parameter. Currently, the ability to easily fuzz functions with mulitple parameters is very appealing. It's hard to estimate the value of this since most other stuff out there starts with []byte.
  • Users can work around this limitation by defining a struct type or using a generic TupleN type as above. The examples seem a lot less readable than what we have though.

Concerns:

  • If the generics proposal allowed variadic type arguments, I think we'd be much more tempted to use that. Maybe the langauge will support that some day, but not in 1.18. We don't want to hold our breaths.
  • We can't quantify the performance advantage of a generic call over reflect.Call. We could probably estimate an upper bound improvement by implementing a specialized version that only accepted []byte. Even so, the overhead of reflect.Call may be small compared with the mutator or the fuzz function itself.
  • As a pratical concern, we're running short on time to make big changes, and this would a lot of rewriting. If we did this, even if we spent much more time investigating and quantifying this, we'd need to cross some things off all: umbrella issue for release-blocking fuzzing work #47037 to ship in 1.18. @katiehockman already spent a few days prototyping this and timeboxed it.

@rogpeppe
Copy link
Contributor

Another pro: you need less type conversions, as the compiler knows what the type is. For example, in the FuzzCoder function above, there was no need to use int64(0) because the usual constant rules do the conversion for us.

Performance of calling the fuzz function with generics might be better than reflect.Call.

It seems clear to me that the performance would definitely be better than reflect.Call, although you're of course right that the improvement might be dominated by the overhead of the mutator (we can't say anything about the fuzz function itself).
My reasoning for that is that we're seriously proposing a slices package to do operations on slices, including some operations on function values. If calling a generic function value was anywhere near the performance of reflect.Call, it wouldn't be particularly viable IMHO. Also, reflect.Call needs to do more work than any reasonable generic call: it always needs to check the type compatibility of the provided arguments, whereas a generic call mechanism does not because the compiler has done that already.

If the generics proposal allowed variadic type arguments, I think we'd be much more tempted to use that. Maybe the langauge will support that some day, but not in 1.18. We don't want to hold our breaths.

I don't see variadic type arguments on the horizon any time soon (I don't think anyone has any idea how they might work), but tuples themselves might fit in quite easily, leading to something like this, which is comparable in readability to the current proposal, IMHO.

func FuzzCoder(f *testing.F[(int64, []byte)]) {
	// Add a number of inputs to the corpus including valid and invalid data.
	for _, td := range coderTestdata {
		f.Add((0, []byte(td.in)))
	}

	f.Fuzz(func(t *testing.T, arg (int64, []byte)) {
		seed, b := arg.T()
		rn := rand.NewSource(seed)
		etc
	})
}

As a practical concern, we're running short on time to make big changes, and this would a lot of rewriting.

FWIW I took a brief look at this, and at first glance I don't think it would require a lot of rewriting. The main things that need to change AFAICS are the code in cmd/go that checks whether a function looks like a fuzz function, and the code inside F.Fuzz that actually does the invocation. That should be enough initially, although you'd probably also want to simplify all the code that currently uses an array of types/values to use a single type/value too.

A sketch of the changes to the testing package (diff against 988d024): https://gist.github.com/rogpeppe/ff213c3553143a1600005b0f71ad3628

The main problem with this for landing in 1.18 is that the Go compiler on tip doesn't currently support exported generic functions, and might not until later in the cycle, so that may well be a show stopper even if this proposal is deemed to be a good idea.

@rsc
Copy link
Contributor

rsc commented Aug 23, 2021

Using generics doesn't seem like a good user experience.
The non-generic version requires listing the argument types once,
while the generic version requires listing the argument types twice.

If there is a performance problem being solved by generics,
perhaps we can solve it with generated code instead?
The go command is already reading the test file and generating code.
If more needs to be generated, no big deal.
And then the user experience is not degraded just for performance.

@golang golang locked and limited conversation to collaborators Aug 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FeatureRequest Issues asking for a new feature that does not need a proposal. FrozenDueToAge fuzz Issues related to native fuzzing support NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: No status
Development

No branches or pull requests

7 participants