parameters for Adams2019 auto-scheduler #5090

benzwt · 2020-07-07T07:13:03Z

benzwt
Jul 7, 2020

Are there any options I can provide to the Adams2019 auto-scheduler other than machine_params?

I want to build well-scheduled functions for a specific platform that differs from my host machine.
To my understanding, the result of the auto-scheduler differs between machines (depending on, for example, L1/L2 cache size).

Can I specify fine-grained machine parameters for the auto-scheduler, such as L1/L2 cache size?
Or is there a good way to obtain a schedule without running the auto-scheduler on every machine with a different specification?

./lesson_21_generate -o . -g auto_schedule_gen -f auto_schedule_true -e static_library,h,schedule target=host  -p /opt/Halide/bin/libauto_schedule.so -s Adams2019 auto_schedule=true machine_params=32,16777216,40

abadams · 2020-07-07T17:10:06Z

abadams
Jul 7, 2020
Maintainer

That autoscheduler doesn't actually pay attention to machine_params except for the number of cores / parallelism term. It also uses the target to pick a good vector width. So setting the target accurately for your target machine (i.e. not "host") and setting the number of cores accurately in machine_params should be enough. If you want to tailor to a specific machine even more, you can autotune on that machine (see apps/autoscheduler/autotune_loop.sh).

benzwt · 2020-07-08T01:31:04Z

benzwt
Jul 8, 2020
Author

Thanks for the quick reply.

I have a few more questions.
I wrote a mean filter with a filter size of 21x21.

The auto-scheduler split the image into tiles of size 512x128. Is this decision machine dependent?
Will the same decision be made if I run the auto-scheduler on a different host? I just want to make sure that Adams2019 doesn't take L1/L2 size into consideration.
The scheduler put data in stack memory (.store_in(MemoryType::Stack)) for the sake of speed. Is there any way to restrict the stack-memory buffer? The stack might overflow if the call is deep or the stack memory is small.

inline void apply_schedule_mean21(
    ::Halide::Pipeline pipeline,
    ::Halide::Target target
) {
    using ::Halide::Func;
    using ::Halide::MemoryType;
    using ::Halide::RVar;
    using ::Halide::TailStrategy;
    using ::Halide::Var;
    Func output_mean = pipeline.get_func(3);
    Func f0 = pipeline.get_func(2);
    Func f2 = pipeline.get_func(1);
    Var x(output_mean.get_schedule().dims()[0].var);
    Var xi("xi");
    Var xii("xii");
    Var y(output_mean.get_schedule().dims()[1].var);
    Var yi("yi");
    output_mean
        .split(x, x, xi, 512, TailStrategy::ShiftInwards)
        .split(y, y, yi, 128, TailStrategy::ShiftInwards)
        .split(xi, xi, xii, 32, TailStrategy::ShiftInwards)
        .vectorize(xii)
        .compute_root()
        .reorder(xii, xi, yi, x, y);
    f0.update(0)
        .split(x, x, xi, 16, TailStrategy::GuardWithIf)
        .vectorize(xi)
        .reorder(xi, x, y);
    f2
        .store_in(MemoryType::Stack)
        .split(x, x, xi, 32, TailStrategy::ShiftInwards)
        .vectorize(xi)
        .compute_at(f0, y)
        .reorder(xi, x, y);
    f0
        .split(x, x, xi, 16, TailStrategy::RoundUp)
        .vectorize(xi)
        .compute_at(output_mean, x)
        .reorder(xi, x, y);

}
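
For readers unfamiliar with the generated schedule, the two outer splits on output_mean (512 in x, 128 in y, both with TailStrategy::ShiftInwards) correspond roughly to the loop nest sketched below. This is a plain C++ illustration, not Halide output: the function name `visit_counts` is hypothetical, the innermost split by 32 with vectorization is folded into a scalar loop, and the sketch assumes the image is at least one tile wide and one tile tall. With ShiftInwards the last tile is moved back inside the image, so some pixels are computed more than once.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how often each pixel is visited by a tiled loop
// nest with ShiftInwards-style tails. Assumes width >= tile_w, height >= tile_h.
inline std::vector<int> visit_counts(int width, int height,
                                     int tile_w = 512, int tile_h = 128) {
    std::vector<int> counts(static_cast<std::size_t>(width) * height, 0);
    const int tiles_x = (width + tile_w - 1) / tile_w;   // ceil(width / tile_w)
    const int tiles_y = (height + tile_h - 1) / tile_h;  // ceil(height / tile_h)
    for (int ty = 0; ty < tiles_y; ty++) {               // outer y
        // ShiftInwards: the last tile starts early enough to stay in bounds.
        const int y0 = std::min(ty * tile_h, height - tile_h);
        for (int tx = 0; tx < tiles_x; tx++) {           // outer x
            const int x0 = std::min(tx * tile_w, width - tile_w);
            for (int yi = 0; yi < tile_h; yi++) {        // rows within a tile
                // In the real schedule this loop is split again by 32 and the
                // innermost part is vectorized; here it stays scalar.
                for (int xi = 0; xi < tile_w; xi++) {
                    const std::size_t idx =
                        static_cast<std::size_t>(y0 + yi) * width + (x0 + xi);
                    counts[idx]++;
                }
            }
        }
    }
    return counts;
}
```

Calling `visit_counts(1100, 300)` visits every pixel at least once, and pixels in the overlap between the shifted last tile and its neighbour are visited twice.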

abadams · 2020-07-08T03:23:28Z

abadams
Jul 8, 2020
Maintainer

The things that influence it are the target (e.g. x86-64-sse41), the machine params (just the parallelism parameter), and the estimates specified in the source. Changing those might change the tile size. It doesn't matter what machine you run the compiler on (unless you use the target "host").

store_in(Stack) is a bit of a misnomer. It means the storage is stack-scoped, so it's allocated once at function entry and freed at function exit, rather than being allocated and freed inside the compute_at location. The actual allocation will be on the heap, though, if it's more than a few kb.
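
The lifetime difference described above can be illustrated in plain C++. This is only a sketch of the idea, not what Halide generates: `AllocStats`, `per_iteration` and `function_scoped` are hypothetical names, and std::vector stands in for whatever storage the compiler actually picks.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical counter so the two variants can be compared.
struct AllocStats {
    int allocations = 0;
};

// Scratch storage allocated and freed at the point of use, once per row.
// Assumes row_elems > 0.
inline void per_iteration(int rows, std::size_t row_elems, AllocStats &stats) {
    for (int y = 0; y < rows; y++) {
        std::vector<std::uint16_t> scratch(row_elems);
        stats.allocations++;
        scratch[0] = static_cast<std::uint16_t>(y);  // stand-in for real work
    }
}

// Scratch storage with function lifetime: allocated once on entry, reused by
// every row, and released on exit. Assumes row_elems > 0.
inline void function_scoped(int rows, std::size_t row_elems, AllocStats &stats) {
    std::vector<std::uint16_t> scratch(row_elems);
    stats.allocations++;
    for (int y = 0; y < rows; y++) {
        scratch[0] = static_cast<std::uint16_t>(y);  // stand-in for real work
    }
}
```

For four rows, the first variant performs four allocations and the second performs one.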

benzwt · 2020-07-09T10:31:53Z

benzwt
Jul 9, 2020
Author

I added c_source to the -e flag to obtain a copy of the source code.
According to this source code, a stack array is indeed declared.
Is this the actual source code that gets compiled?
I'm not so familiar with how things work in the Halide workflow.

     for (int _output_mean_s0_x_xi_xi = 0; _output_mean_s0_x_xi_xi < 0 + 4; _output_mean_s0_x_xi_xi++)
     {
      {
       uint16_t _f0[2432];
       // produce f0
       int32_t _335 = _output_mean_s0_x_xi_xi * 64;
       int32_t _336 = _335 + _334;
       int32_t _337 = _305 + -3;
       ...

abadams · 2020-07-09T17:18:17Z

abadams
Jul 9, 2020
Maintainer

No, that's not the source to be compiled. Halide compiles directly to machine code, and doesn't go via C source code. The generated C source is an alternative output for situations where you need equivalentish C code.

For a 5k allocation like this, Halide will indeed just use the stack. The threshold where it actually uses heap storage with stack-lifetime is 16k.
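
As a quick check of these numbers: the array in the generated C source holds 2432 uint16_t values, which is 2432 × 2 = 4864 bytes, well under 16k. The sketch below only encodes that arithmetic. `kHeapThresholdBytes`, `allocation_bytes` and `uses_heap` are hypothetical names, and whether an allocation of exactly 16k lands on the stack or the heap is an assumption here, not something stated in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical constant: the 16k figure quoted in this thread.
constexpr std::size_t kHeapThresholdBytes = 16 * 1024;

// Size in bytes of an array of `elements` values of type T.
template <typename T>
constexpr std::size_t allocation_bytes(std::size_t elements) {
    return elements * sizeof(T);
}

// Assumption: only allocations strictly larger than the threshold use the heap.
constexpr bool uses_heap(std::size_t bytes) {
    return bytes > kHeapThresholdBytes;
}

// The _f0 array from the generated C source: 2432 elements of uint16_t.
static_assert(allocation_bytes<std::uint16_t>(2432) == 4864,
              "2432 uint16_t values occupy 4864 bytes");
static_assert(!uses_heap(allocation_bytes<std::uint16_t>(2432)),
              "4864 bytes is below the 16k threshold");
```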

If this doesn't work for you, you can always just remove the store_in(MemoryType::Stack) calls in the generated schedule. There's currently no way to tell the autoscheduler not to inject them.

benzwt · 2020-07-10T02:38:11Z

benzwt
Jul 10, 2020
Author

> For a 5k allocation like this, Halide will indeed just use the stack. The threshold where it actually uses heap storage with stack-lifetime is 16k.

Can the stack-lifetime be set by the user?

benzwt · 2020-07-10T02:49:08Z

benzwt
Jul 10, 2020
Author

Is it possible to use alloca() to allocate the stack memory?
If alloca() returns NULL, we could then return an error code such as "not enough memory" gracefully.

abadams · 2020-07-10T03:21:19Z

abadams
Jul 10, 2020
Maintainer

I don't think alloca ever returns null. It just subtracts from the stack pointer. If that's an overflow, you get a crash.

abadams · 2020-07-10T03:22:28Z

abadams
Jul 10, 2020
Maintainer

The man page on Linux says "The inlined code often consists of a single instruction adjusting the stack pointer, and does not check for stack overflow. Thus, there is no NULL error return."
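
Since alloca cannot report failure, one way to get the graceful "not enough memory" path asked about above is to take the scratch buffer from an allocator that can return null. The following is a minimal sketch in plain C++, not Halide code. `Status`, `AllocFn`, `default_alloc`, `failing_alloc` and `fill_scratch` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical error codes for the caller.
enum class Status { Ok, OutOfMemory };

// Hypothetical allocator hook so that failure can be simulated.
using AllocFn = void *(*)(std::size_t);

inline void *default_alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void *failing_alloc(std::size_t) { return nullptr; }

// Takes scratch memory from the heap, where a failed request is visible as a
// null pointer, and reports it instead of crashing. A non-null result is
// assumed to be releasable with std::free.
inline Status fill_scratch(std::size_t elements, AllocFn alloc = default_alloc) {
    auto *scratch =
        static_cast<std::uint16_t *>(alloc(elements * sizeof(std::uint16_t)));
    if (scratch == nullptr) {
        return Status::OutOfMemory;  // graceful failure path
    }
    for (std::size_t i = 0; i < elements; i++) {
        scratch[i] = 0;  // stand-in for real work on the buffer
    }
    std::free(scratch);
    return Status::Ok;
}
```

`fill_scratch(2432)` is expected to succeed on any normal system, while `fill_scratch(2432, failing_alloc)` exercises the error path. The trade-off is an allocator call per invocation rather than a stack-pointer adjustment.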
