Fix split when a nbsp character is present #1073

Merged
simonbengtsson merged 3 commits into simonbengtsson:master from FredTreg:fix-nbsp-split
Oct 15, 2024

@FredTreg
Contributor

@FredTreg FredTreg commented Oct 14, 2024

When computing the minReadableWidth of a cell, the code does not account for non-breaking space characters (also known as nbsp or \u00A0).

nbsp characters should be treated as non-space characters when calculating the string length, as this is the intended function of such characters.

Failing to do so results in suboptimal output, particularly for languages like French, where punctuation marks such as colons (:) are always preceded by a space and should remain on the same line as the preceding word.

This PR fixes that issue.
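
To illustrate the idea described above, here is a minimal sketch of splitting text only at breakable whitespace. The names `BREAKABLE_WHITESPACE`, `splitIntoWords`, and `longestWordLength` are hypothetical and are not taken from the library; character count is used as a stand-in for the rendered width that would normally be measured.

```javascript
// Hypothetical names: a sketch of the approach, not the library's actual code.
// Matches runs of whitespace, except the non-breaking space (\u00A0).
const BREAKABLE_WHITESPACE = /[^\S\u00A0]+/

// Splits text only at positions where a line break is allowed.
function splitIntoWords(text) {
  return text.split(BREAKABLE_WHITESPACE).filter((word) => word.length > 0)
}

// Character count stands in for the rendered width of the longest word.
function longestWordLength(text) {
  return Math.max(0, ...splitIntoWords(text).map((word) => word.length))
}

const question = "d'où viens-tu\u00A0?"
const price = '12\u00A0€'

console.log(splitIntoWords(question).length) // 2
console.log(longestWordLength(price)) // 4
```

With this split, `viens-tu\u00A0?` and `12\u00A0€` each count as one unbreakable word, so a minimum width derived from them would be wide enough to keep them on a single line.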

@umaganesan

umaganesan commented Oct 14, 2024 via email

@simonbengtsson
Owner

Interesting! Based on my very limited understanding, shouldn't a cell with the content "d'où viens-tu ?" be considered two words then? I.e., wouldn't it be better to calculate the "longestWordWidth" based on the entire "viens-tu ?" instead of "viens-tu" and "?" separately?

@FredTreg
Contributor Author

FredTreg commented Oct 14, 2024

This is exactly the goal of this fix: the French text would be pre-processed as viens-tu\u00A0?, so with my fix it would be treated as a single word.
Another example is currencies (still in French), where we would write 12 €, preprocessed as 12\u00A0€. Without the fix it would appear on two lines; with the fix it stays on one line.

@simonbengtsson
Owner

Got it! I thought this would already work, since as far as I understand \s does not match non-breaking spaces. But I'll try it tomorrow, considering you have experienced issues with it.

@FredTreg
Contributor Author

since \s does not match non breaking spaces

I do not know about that. I ran the following in the Chrome console:

> const words = 'two\u00A0words'
  words.split(/\s+/)
  > (2) ['two', 'words']

So it does split on nbsp and the character is indeed listed as whitespace on MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes
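
The console check above can be extended into a small side-by-side comparison. This is only a sketch: the negated class `[^\S\u00A0]` is one possible way to exclude the nbsp from the split and is assumed here for illustration, not quoted from the PR diff.

```javascript
const NBSP = '\u00A0'
const sample = `two${NBSP}words`

// \s includes the non-breaking space, so a plain whitespace split breaks on it.
const nbspIsWhitespace = /\s/.test(NBSP)
const splitOnAllWhitespace = sample.split(/\s+/)

// Excluding \u00A0 from the class keeps the two halves together.
const splitKeepingNbsp = sample.split(/[^\S\u00A0]+/)

console.log(nbspIsWhitespace) // true
console.log(splitOnAllWhitespace.length) // 2
console.log(splitKeepingNbsp.length) // 1
```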

@simonbengtsson
Owner

simonbengtsson commented Oct 15, 2024

You are right. I tried with a regex tester yesterday, but apparently incorrectly. Can you add a comment or a named variable for the regex? Then I'll merge promptly.

@FredTreg
Contributor Author

FredTreg commented Oct 15, 2024

Done!

@simonbengtsson simonbengtsson merged commit 70b66de into simonbengtsson:master Oct 15, 2024
@simonbengtsson
Owner

Thanks! Merged and released in v3.8.4

@FredTreg
Contributor Author

Thank you, and thanks for making it easy to contribute.
