Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: mb_convert_encoding PHP 8.2 deprecation #52

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

shvlv
Copy link
Contributor

@shvlv shvlv commented Feb 27, 2024

Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes/features)
  • Docs have been added/updated (for bug fixes/features)

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Fix.

What is the current behavior? (You can also link to an open issue here)
I added minor improvements on the top of @nlemoine's PR #49. It was inspired by very good SO answer - https://stackoverflow.com/a/8218649/13962131 and we tested same change for several projects already where we need HTML processing.

So the problem is (https://www.php.net/manual/en/function.mb-convert-encoding.php):

from 8.2.0 mb_convert_encoding() will no longer return the following non text encodings: "Base64", "QPrint", "UUencode", "HTML entities", "7 bit" and "8 bit".

What is the new behavior (if this is a feature change)?

The right way to do things is mb_encode_numericentity.

DOMDocument::loadHTML loads ISO-8859-1 by default. So for map argument, we choose:

  • start_code - 128 (0x80). Anything above ISO-8859-1 that supports 0-127.
  • end_code - I think using the maximum Unicode value U+10FFFF is better.
  • offset - 0 as we don't need to change code
  • mask - we don't need to change anything so that we can repeat the end_code value. But I like the idea of ~0 that produces the maximal byte for your system. i.e. 1111111111111111111111111111111111111111111111111111111111111111 for a 64bit system. It guarantees the mask never changes its actual value.

Regarding html_entity_decode, I think it's an important addition because we can leave HTML entities for HTML content, and the browser handles them as expected. But we don't want to break Unicode characters inside JS. So if the user has var title = 'Lösungen', changing to var = 'Lösungen' is not good.

Also, tests are added, including the character from the Unicode top plate - https://util.unicode.org/UnicodeJsps/character.jsp?a=2000B. I.e. 𠀋

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No.

Other information:

@shvlv shvlv requested a review from Chrico February 27, 2024 15:05
@nlemoine
Copy link
Contributor

Thanks you @shvlv, this solves PHP 8.2 deprecations.

However, I stumbled upon another bug today. Here's the context.

I'm using Vite to build my assets (with a custom Vite loader). In dev mode, I have to include vite client to handle hot reload, etc.

The default URL for vite server is http://[::1]:5173/path/to/build/@vite/client. When this goes through DomDocument, the URL ends up encoded like so:

"http://%5B::1%5D:5173/path/to/build/@vite/client

html_entity_decode doesn't take effect here. I couldn't find a solution to this issue.

I was wondering if instead of DomDocument (which has a lot of quirks and can be heavy on memory) and since the only manipulations achieved here targets attributes, you could leverage the new Tag processor API introduced in 6.2?

Of course, that would mean a WP 6.2 requirement on the other hand.

6.3 also introduced native support for defer and async.

Copy link

codecov bot commented Mar 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.93%. Comparing base (391e140) to head (268f66f).
Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master      #52      +/-   ##
============================================
- Coverage     91.02%   90.93%   -0.10%     
+ Complexity      282      281       -1     
============================================
  Files            21       21              
  Lines           836      827       -9     
============================================
- Hits            761      752       -9     
  Misses           75       75              
Flag Coverage Δ
unittests 90.93% <100.00%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shvlv
Copy link
Contributor Author

shvlv commented Mar 18, 2024

After internal discussion, the new approach with WordPress HTML API was implemented. The following points should be discussed additionally:

  1. Do we consider the current change as a breaking change, or will triggering deprecation be enough?
  2. We need integration tests for this change. I added database-less tests as PoC. Maybe we can consider a different approach here.

@nlemoine
Copy link
Contributor

nlemoine commented Apr 8, 2024

Quick note: from 6.5, there is now a wp_enqueue_script_module function.

@Chrico Chrico linked an issue May 10, 2024 that may be closed by this pull request
1 task
@nlemoine
Copy link
Contributor

@shvlv Friendly bump, any progress on this?

@shvlv
Copy link
Contributor Author

shvlv commented Aug 8, 2024

@nlemoine Sorry for the delay, we have a vacation time 😄 will proceed as soon as possible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: remove deprecation warning
2 participants