Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uploading data to D3D12 PBO from WSL 2 got very slow performance. #12489

Open
1 of 2 tasks
Raymond-Sun opened this issue Jan 22, 2025 · 7 comments
Open
1 of 2 tasks

Uploading data to D3D12 PBO from WSL 2 got very slow performance. #12489

Raymond-Sun opened this issue Jan 22, 2025 · 7 comments
Labels

Comments

@Raymond-Sun
Copy link

Raymond-Sun commented Jan 22, 2025

WslLogs-2025-01-22_16-11-24.zip

Windows Version

10.0.19044.5371

WSL Version

2.3.26.0

Are you using WSL 1 or WSL 2?

  • WSL 2
  • WSL 1

Kernel Version

5.15.167.4-1

Distro Version

OPEN EULER

Other Software

No response

Repro Steps

I uploaded data to a DirectX PBO in WSL 2, and the code is shown below. The performance is very slow; uploading 512 * 512 * 274 * 2 = 143,654,912 bytes took 470 milliseconds. I recorded a kernel trace log and, after analysis, extracted 120 functions with excessively long execution times (the log is attached). There are two scenarios:

After returning to user mode from a page fault interrupt, the execution time of the exit_to_user_mode_prepare function is exceptionally long, ranging from 1 to 10 milliseconds (normally it should be less than 1 microsecond).
During task switching in the idle process scheduler, when the CPU time slice is switched to another process, the execution time of the default_idle function is also exceptionally long, ranging from 1 to 10 milliseconds (normally it should be less than 1 microsecond).
These two scenarios combined account for most of the memcpy time, totaling 430 milliseconds, with the overall copy time being approximately 470–500 milliseconds. Moreover, these kernel activities are serialized and run on different CPU cores.

My WSL 2 environment is as follows:


wsl --version

WSL version: 2.3.26.0

Kernel version: 5.15.167.4-1

WSLg version: 1.0.65

MSRDC version: 1.2.5620

Direct3D version: 1.611.1-81528511

DXCore version: 10.0.26100.1-240331-1435.ge-release

Windows version: 10.0.19044.5371

The code is as follows:

size_t dataSize = 512 * 512 * 274 * 2;

BYTE* data = new BYTE[dataSize];

D3D12_HEAP_PROPERTIES pboHeapProperties = CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_CUSTOM);

pboHeapProperties.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_WRITE_BACK;

pboHeapProperties.MemoryPoolPreference = D3D12_MEMORY_POOL_L0;

// Create PBO (Pixel Buffer Object)

D3D12_RESOURCE_DESC pboResourceDesc = CD3DX12_RESOURCE_DESC::Buffer(dataSize);

ComPtr<ID3D12Resource> pboBuffer;

CHK(device->CreateCommittedResource(

    &pboHeapProperties,

    D3D12_HEAP_FLAG_NONE,

    &pboResourceDesc,

    D3D12_RESOURCE_STATE_COMMON,

    nullptr,

    IID_PPV_ARGS(&pboBuffer)

));

// Upload data to PBO

void* pPBOData;

CHK(pboBuffer->Map(0, nullptr, &pPBOData));

auto start = std::chrono::high_resolution_clock::now();

memcpy(pPBOData, data, dataSize);

auto end = std::chrono::high_resolution_clock::now();

auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();

std::cout << "mesa pbo memcpy: " << duration << " ms" << std::endl;

pboBuffer->Unmap(0, nullptr);

delete[] data;

Part of full trace log: exit_to_user_mode_prepare takes 7 ms:

exit_to_user_mode_prepare <-irqentry_exit_to_user_mode
main-8596 [005] ..... 59090.827577: down_read_trylock <-do_user_addr_fault
main-8596 [005] ..... 59090.827577: find_vma <-do_user_addr_fault
main-8596 [005] ..... 59090.827577: vmacache_find <-find_vma
main-8596 [005] ..... 59090.827578: handle_mm_fault <-do_user_addr_fault
main-8596 [005] ..... 59090.827578: mem_cgroup_from_task <-handle_mm_fault
main-8596 [005] d.... 59090.827578: __count_memcg_events <-handle_mm_fault
main-8596 [005] d.... 59090.827578: cgroup_rstat_updated <-__count_memcg_events
main-8596 [005] ..... 59090.827579: rcu_read_unlock_strict <-handle_mm_fault
main-8596 [005] ..... 59090.827579: __handle_mm_fault <-handle_mm_fault
main-8596 [005] ..... 59090.827580: do_huge_pmd_anonymous_page <-__handle_mm_fault
main-8596 [005] ..... 59090.827580: pte_alloc_one <-do_huge_pmd_anonymous_page
main-8596 [005] ..... 59090.827580: __alloc_pages <-pte_alloc_one
main-8596 [005] ..... 59090.827580: should_fail_alloc_page <-__alloc_pages
main-8596 [005] ..... 59090.827581: get_page_from_freelist <-__alloc_pages
main-8596 [005] ..... 59090.827581: kernel_init_free_pages.part.0 <-get_page_from_freelist
main-8596 [005] ..... 59090.827582: __memcg_kmem_charge_page <-__alloc_pages
main-8596 [005] ..... 59090.827582: rcu_read_unlock_strict <-__memcg_kmem_charge_page
main-8596 [005] d.... 59090.827582: __mod_lruvec_page_state <-pte_alloc_one
main-8596 [005] d.... 59090.827582: rcu_read_unlock_strict <-__mod_lruvec_page_state
main-8596 [005] d.... 59090.827582: __mod_node_page_state <-pte_alloc_one
main-8596 [005] ..... 59090.827582: mm_get_huge_zero_page <-do_huge_pmd_anonymous_page
main-8596 [005] ..... 59090.827582: _raw_spin_lock <-do_huge_pmd_anonymous_page
main-8596 [005] ..... 59090.827583: pgtable_trans_huge_deposit <-do_huge_pmd_anonymous_page
main-8596 [005] ..... 59090.827583: up_read <-do_user_addr_fault
main-8596 [005] d.... 59090.827583: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode
main-8596 [005] ..... 59090.834250: down_read_trylock <-do_user_addr_fault
main-8596 [005] ..... 59090.834251: find_vma <-do_user_addr_fault
main-8596 [005] ..... 59090.834251: vmacache_find <-find_vma
main-8596 [005] ..... 59090.834251: handle_mm_fault <-do_user_addr_fault
main-8596 [005] ..... 59090.834251: mem_cgroup_from_task <-handle_mm_fault
main-8596 [005] d.... 59090.834251: __count_memcg_events <-handle_mm_fault
main-8596 [005] d.... 59090.834252: cgroup_rstat_updated <-__count_memcg_events
main-8596 [005] ..... 59090.834252: rcu_read_unlock_strict <-handle_mm_fault
main-8596 [005] ..... 59090.834252: __handle_mm_fault <-handle_mm_fault
main-8596 [005] ..... 59090.834252: do_huge_pmd_anonymous_page <-__handle_mm_fault
main-8596 [005] ..... 59090.834252: pte_alloc_one <-do_huge_pmd_anonymous_page
main-8596 [005] ..... 59090.834252: __alloc_pages <-pte_alloc_one
main-8596 [005] ..... 59090.834252: should_fail_alloc_page <-__alloc_pages
main-8596 [005] ..... 59090.834253: get_page_from_freelist <-__alloc_pages
main-8596 [005] ..... 59090.834253: kernel_init_free_pages.part.0 <-get_page_from_freelist
main-8596 [005] ..... 59090.834253: __memcg_kmem_charge_page <-__alloc_pages

The log of functions with excessively long execution times is as follows:

Line 14596 - main-8596 [005] d.... 59090.531799: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 14597 - main-8596 [005] d.... 59090.536728: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 14963 - -0 [004] d.... 59090.536811: default_idle <-default_idle_call

Line 14964 - main-8596 [005] ..... 59090.538477: down_read_trylock <-do_user_addr_fault

Line 14988 - main-8596 [005] d.... 59090.538480: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 14989 - -0 [000] d.... 59090.541368: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 15380 - -0 [003] d.... 59090.541408: default_idle <-default_idle_call

Line 15381 - main-8596 [005] ..... 59090.545148: down_read_trylock <-do_user_addr_fault

Line 15405 - main-8596 [005] d.... 59090.545154: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 15406 - main-8596 [005] d.... 59090.546740: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 15822 - -0 [002] d.... 59090.546819: default_idle <-default_idle_call

Line 15823 - -0 [000] d.... 59090.551491: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 16239 - main-8596 [005] d.... 59090.551784: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 16240 - -0 [004] d.... 59090.556749: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 16986 - -0 [000] d.... 59090.556811: default_idle <-default_idle_call

Line 16987 - main-8596 [005] ..... 59090.558568: down_read_trylock <-do_user_addr_fault

Line 17011 - main-8596 [005] d.... 59090.558573: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 17012 - -0 [003] d.... 59090.561637: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 17403 - -0 [003] d.... 59090.561680: default_idle <-default_idle_call

Line 17404 - main-8596 [005] ..... 59090.565168: down_read_trylock <-do_user_addr_fault

Line 17429 - main-8596 [005] d.... 59090.565172: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 17430 - main-8596 [005] d.... 59090.566737: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 18087 - -0 [004] d.... 59090.566891: default_idle <-default_idle_call

Line 18088 - -0 [003] d.... 59090.571774: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 18504 - main-8596 [005] d.... 59090.571965: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 18505 - main-8596 [005] d.... 59090.576734: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 18667 - -0 [000] d.... 59090.576790: default_idle <-default_idle_call

Line 18668 - main-8596 [005] ..... 59090.578686: down_read_trylock <-do_user_addr_fault

Line 18692 - main-8596 [005] d.... 59090.578691: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 18693 - -0 [003] d.... 59090.581938: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 19084 - -0 [000] d.... 59090.581979: default_idle <-default_idle_call

Line 19085 - main-8596 [005] ..... 59090.585449: down_read_trylock <-do_user_addr_fault

Line 19109 - main-8596 [005] d.... 59090.585454: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 19110 - main-8596 [005] d.... 59090.586743: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 19544 - -0 [004] d.... 59090.586804: default_idle <-default_idle_call

Line 19545 - -0 [003] d.... 59090.592042: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 19961 - main-8596 [005] d.... 59090.592175: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 19962 - main-8596 [005] d.... 59090.596734: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 20250 - -0 [004] d.... 59090.596831: default_idle <-default_idle_call

Line 20251 - main-8596 [005] ..... 59090.598873: down_read_trylock <-do_user_addr_fault

Line 20275 - main-8596 [005] d.... 59090.598876: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 20276 - -0 [000] d.... 59090.602174: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 20786 - -0 [003] d.... 59090.602217: default_idle <-default_idle_call

Line 20787 - main-8596 [005] ..... 59090.605744: down_read_trylock <-do_user_addr_fault

Line 20964 - main-8596 [005] d.... 59090.606776: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 20965 - -0 [000] d.... 59090.612332: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 21424 - main-8596 [005] d.... 59090.612445: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 21425 - -0 [002] d.... 59090.614087: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 21611 - -0 [002] d.... 59090.614121: default_idle <-default_idle_call

Line 21612 - main-8596 [005] d.... 59090.616729: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 21905 - -0 [004] d.... 59090.616822: default_idle <-default_idle_call

Line 21906 - main-8596 [005] ..... 59090.619251: down_read_trylock <-do_user_addr_fault

Line 21930 - main-8596 [005] d.... 59090.619257: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 21931 - -0 [000] d.... 59090.622507: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 22341 - -0 [003] d.... 59090.622554: default_idle <-default_idle_call

Line 22342 - -0 [000] d.... 59090.625236: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 22936 - main-8596 [005] d.... 59090.626742: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 22937 - -0 [001] d.... 59090.632274: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 23587 - main-8596 [005] d.... 59090.632729: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 23588 - main-8596 [005] d.... 59090.636762: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 23659 - main-8596 [005] d.... 59090.636773: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 23660 - main-8596 [005] ..... 59090.639431: down_read_trylock <-do_user_addr_fault

Line 23684 - main-8596 [005] d.... 59090.639435: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 23685 - -0 [003] d.... 59090.642740: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 24185 - -0 [000] d.... 59090.642808: default_idle <-default_idle_call

Line 24186 - main-8596 [005] ..... 59090.646261: down_read_trylock <-do_user_addr_fault

Line 24282 - main-8596 [005] d.... 59090.646790: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 24283 - -0 [003] d.... 59090.652844: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 24742 - main-8596 [005] d.... 59090.652964: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 24743 - main-8596 [005] d.... 59090.656738: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 24814 - main-8596 [005] d.... 59090.656750: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 24815 - main-8596 [005] ..... 59090.659599: down_read_trylock <-do_user_addr_fault

Line 24839 - main-8596 [005] d.... 59090.659604: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 24840 - -0 [003] d.... 59090.662976: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 25274 - -0 [000] d.... 59090.663091: default_idle <-default_idle_call

Line 25275 - main-8596 [005] ..... 59090.666356: down_read_trylock <-do_user_addr_fault

Line 25371 - main-8596 [005] d.... 59090.666748: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 25372 - main-8596 [005] ..... 59090.673042: down_read_trylock <-do_user_addr_fault

Line 25831 - -0 [000] d.... 59090.673247: default_idle <-default_idle_call

Line 25832 - main-8596 [005] d.... 59090.676738: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 25903 - main-8596 [005] d.... 59090.676749: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 25904 - main-8596 [005] ..... 59090.679784: down_read_trylock <-do_user_addr_fault

Line 25928 - main-8596 [005] d.... 59090.679789: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 25929 - -0 [003] d.... 59090.683300: irq_enter_rcu <-sysvec_hyperv_stimer0

Line 26429 - -0 [000] d.... 59090.683411: default_idle <-default_idle_call

Line 26430 - main-8596 [005] ..... 59090.686380: down_read_trylock <-do_user_addr_fault

Line 26527 - main-8596 [005] d.... 59090.686742: exit_to_user_mode_prepare <-irqentry_exit_to_user_mode

Line 26528 - main-8596 [005] ..... 59090.693073: down_read_trylock <-do_user_addr_fault

Line 26987 - -0 [000] d.... 59090.693533: default_idle <-default_idle_call

Line 26988 - main-8596 [005] d....

Expected Behavior

upload takes about 50 ms

Actual Behavior

upload takes more than 450 ms

Diagnostic Logs

No response

Copy link

Logs are required for review from WSL team

If this a feature request, please reply with '/feature'. If this is a question, reply with '/question'.
Otherwise please attach logs by following the instructions below, your issue will not be reviewed unless they are added. These logs will help us understand what is going on in your machine.

How to collect WSL logs

Download and execute collect-wsl-logs.ps1 in an administrative powershell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

If this is a networking issue, please use collect-networking-logs.ps1, following the instructions here

Once completed please upload the output files to this Github issue.

Click here for more info on logging
If you choose to email these logs instead of attaching to the bug, please send them to [email protected] with the number of the github issue in the subject, and in the message a link to your comment in the github issue and reply with '/emailed-logs'.

Copy link

Failed to parse logs. Unexpected file: full_kernel_trace.txt
The log file doesn't contain any WSL traces. Please make sure that you reproduced the issue while the log collection was running.

Diagnostic information
Issue was edited and new log file was found: https://github.com/user-attachments/files/18502653/full_kernel_trace.zip
Multiple log files found, using: https://github.com/user-attachments/files/18502653/full_kernel_trace.zip
Found no WSL traces in the logs

Copy link

Failed to parse logs. Unexpected file: full_kernel_trace.txt
The log file doesn't contain any WSL traces. Please make sure that you reproduced the issue while the log collection was running.

Diagnostic information
Found no WSL traces in the logs

Copy link

Diagnostic information
Issue was edited and new log file was found: https://github.com/user-attachments/files/18502739/WslLogs-2025-01-22_16-11-24.zip
.wslconfig found
	Custom kernel command line found: 'hv_netvsc.ring_size=128'
Detected appx version: 2.3.26.0

@Raymond-Sun
Copy link
Author

This is the WSL log

WslLogs-2025-01-22_16-11-24.zip

Copy link

Diagnostic information
.wslconfig found
	Custom kernel command line found: 'hv_netvsc.ring_size=128'
Detected appx version: 2.3.26.0

@Raymond-Sun Raymond-Sun changed the title Uploading data to D3D12 PBO from WSL 2 get very slow performance. Uploading data to D3D12 PBO from WSL 2 got very slow performance. Jan 22, 2025
Copy link

View similar issues

Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it!

Open similar issues:

Closed similar issues:

Note: You can give me feedback by thumbs upping or thumbs downing this comment.

Diagnostic information
.wslconfig found
	Custom kernel command line found: 'hv_netvsc.ring_size=128'
Detected appx version: 2.3.26.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants