Replies: 9 comments
-
That autoscheduler (Adams2019) doesn't actually pay attention to machine_params except for the parallelism term (the number of cores). It also uses the target to pick a good vector width. So setting the target accurately for the machine you're compiling for (i.e. not "host") and setting the number of cores accurately in machine_params should be enough. If you want to tailor the schedule to a specific machine even more, you can autotune on that machine (see apps/autoscheduler/autotune_loop.sh).
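To make those two inputs concrete, here is a plain C++ sketch (no Halide dependency) of how they could be read. The names `split`, `vector_bits_for_target`, and `parallelism_from_machine_params` are hypothetical. The width table (baseline x86-64 SSE at 128 bits, AVX/AVX2 at 256 bits, AVX-512 at 512 bits, for 32-bit floats) reflects the x86 ISAs, and the `parallelism,last_level_cache_size,balance` layout assumes the older comma-separated machine_params string (e.g. `16,16777216,40`). It is an illustration, not Halide's actual implementation, and it does not resolve "host".

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a string such as "x86-64-linux-avx2" or "16,16777216,40" on `sep`.
std::vector<std::string> split(const std::string &s, char sep) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, sep)) parts.push_back(item);
    return parts;
}

// Hypothetical: widest SIMD width (in bits) usable for 32-bit floats on an
// x86 target string. Baseline x86-64 has SSE2 (128 bits); "avx"/"avx2" give
// 256 bits; any "avx512*" feature gives 512 bits. "host" is not resolved here.
int vector_bits_for_target(const std::string &target) {
    int bits = 128;
    for (const std::string &feature : split(target, '-')) {
        if (feature.rfind("avx512", 0) == 0) {
            bits = std::max(bits, 512);
        } else if (feature == "avx" || feature == "avx2") {
            bits = std::max(bits, 256);
        }
    }
    return bits;
}

// Hypothetical: read the parallelism (first) field of an older-style
// machine_params string "parallelism,last_level_cache_size,balance".
// The other two fields are deliberately not consulted.
int parallelism_from_machine_params(const std::string &machine_params) {
    std::vector<std::string> fields = split(machine_params, ',');
    assert(!fields.empty() && "machine_params must not be empty");
    return std::stoi(fields[0]);
}
```

For example, `vector_bits_for_target("x86-64-linux-avx2") / 32` gives 8 float lanes, and `parallelism_from_machine_params("16,16777216,40")` gives 16. Skipping the cache-size and balance fields matches the point that Adams2019 only uses the parallelism term.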
-
Thanks for the quick reply. I have a few more questions.
-
The things that influence the autoscheduler's output are the target (e.g. x86-64-sse41), the machine params (just the parallelism parameter), and the estimates specified in the source. Changing any of those might change the tile size. It doesn't matter which machine you run the compiler on (unless you use the target "host"). store_in(Stack) is a bit of a misnomer: it means the storage is stack-scoped. It's allocated once at function entry and freed at function exit, rather than being allocated and freed inside the compute_at location. The actual allocation will be on the heap, though, if it's more than a few KB.
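To make the lifetime distinction concrete, here is a plain C++ sketch (no Halide involved) contrasting storage scoped to the compute_at location with storage scoped to the whole function. `AllocCounter`, `per_iteration_storage`, and `function_scoped_storage` are hypothetical names; this is an analogy for the lifetimes described above, not the code Halide generates.

```cpp
#include <cassert>
#include <vector>

// Counts how many scratch buffers each version creates.
struct AllocCounter {
    int allocations = 0;
};

// Storage scoped to the compute_at location: the producer's scratch row is
// allocated at the top of each consumer iteration and freed at the bottom.
long long per_iteration_storage(int rows, int width, AllocCounter &counter) {
    assert(rows >= 0 && width >= 0);
    long long total = 0;
    for (int y = 0; y < rows; ++y) {
        std::vector<int> scratch(width);  // allocated every iteration
        ++counter.allocations;
        for (int x = 0; x < width; ++x) scratch[x] = x + y;   // produce
        for (int x = 0; x < width; ++x) total += scratch[x];  // consume
    }                                     // freed every iteration
    return total;
}

// Stack-scoped lifetime: one buffer allocated at function entry, reused by
// every iteration, freed at function exit. std::vector keeps the bytes on the
// heap, mirroring the point that "stack" here describes lifetime, not location.
long long function_scoped_storage(int rows, int width, AllocCounter &counter) {
    assert(rows >= 0 && width >= 0);
    std::vector<int> scratch(width);
    ++counter.allocations;
    long long total = 0;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < width; ++x) scratch[x] = x + y;
        for (int x = 0; x < width; ++x) total += scratch[x];
    }
    return total;
}
```

Both versions compute the same result; with `rows = 4`, the first performs four allocations and the second performs one.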
-
I added c_source to the -e flag to get a copy of the source code.
-
No, that's not the source that gets compiled. Halide compiles directly to machine code; it doesn't go through C source code. The generated C source is an alternative output for situations where you need roughly equivalent C code. For a 5 KB allocation like this, Halide will indeed just use the stack: the threshold above which it actually uses heap storage with stack lifetime is 16 KB. If that doesn't work for you, you can always just remove the store_in(MemoryType::Stack) calls from the generated schedule. There's currently no way to tell the autoscheduler not to inject them.
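If you go the route of removing those calls, a small post-processing step can do it. This is a minimal C++ sketch; `strip_stack_store_in` is a hypothetical helper, and it assumes the call appears in the generated schedule spelled exactly as `.store_in(MemoryType::Stack)`. Any other spelling (for instance a fully qualified `Halide::MemoryType::Stack`) would need its own pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical post-processing pass: delete every ".store_in(MemoryType::Stack)"
// call from the text of a generated schedule. Assumes the call is spelled
// exactly like this; other spellings would need their own pattern.
std::string strip_stack_store_in(std::string schedule) {
    const std::string call = ".store_in(MemoryType::Stack)";
    std::size_t pos = 0;
    while ((pos = schedule.find(call, pos)) != std::string::npos) {
        schedule.erase(pos, call.size());  // rescan from the same offset
    }
    assert(schedule.find(call) == std::string::npos);
    return schedule;
}
```

Removing a call that sits on its own line inside a method chain leaves a whitespace-only line, which is still valid C++.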
-
Is it possible to use alloca() to allocate the stack memory?
-
I don't think alloca ever returns null. It just subtracts from the stack pointer; if that overflows the stack, you get a crash.
-
The man page on Linux says: "The inlined code often consists of a single instruction adjusting the stack pointer, and does not check for stack overflow. Thus, there is no NULL error return."
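Since alloca has no failure return, the only protection is to bound the size before calling it. Here is a minimal C++ sketch of that pattern with a heap fallback. `scratch_sum` and `kMaxStackBytes` are hypothetical; the 16 KB cap just echoes the threshold quoted earlier in the thread and is not read from Halide. `<alloca.h>` is non-standard (available on glibc and macOS).

```cpp
#include <alloca.h>  // non-standard; available on glibc and macOS
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical cap for this sketch, echoing the 16 KB figure quoted in the
// thread. It is an illustration, not a value read from Halide.
constexpr std::size_t kMaxStackBytes = 16 * 1024;

// Fills an n-element scratch buffer with 0..n-1 and returns the sum.
// alloca() has no failure return: it only moves the stack pointer, so an
// oversized request overflows the stack rather than yielding NULL. The only
// protection is to bound the size before calling it.
long long scratch_sum(std::size_t n, bool *used_stack) {
    assert(used_stack != nullptr);
    std::unique_ptr<int[]> heap;  // owns the buffer on the heap path only
    int *buf = nullptr;
    if (n <= kMaxStackBytes / sizeof(int)) {
        // Released automatically when scratch_sum returns; never free() it.
        buf = static_cast<int *>(alloca(n * sizeof(int)));
        *used_stack = true;
    } else {
        heap.reset(new int[n]);
        buf = heap.get();
        *used_stack = false;
    }
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<int>(i);
        total += buf[i];
    }
    return total;
}
```

Because alloca memory lives until the function returns rather than until the end of the enclosing block, calling it inside a loop would grow the stack on every iteration; the sketch calls it at most once per call.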
-
Are there any other options I can provide to the Adams2019 autoscheduler, other than machine_params?
I want to build well-scheduled functions for a specific platform that differs from my host machine.
As I understand it, the autoscheduler's result differs across machines (e.g. because of L1/L2 cache sizes, etc.).
Can I specify fine-grained machine parameters, such as L1/L2 cache sizes, for the autoscheduler?
Or is there a good way to obtain a schedule without running the autoscheduler on every machine with a different specification?