Perl's PERL_UNICODE and failing tests

Increasingly I'm seeing reports of modules failing when PERL_UNICODE is set (e.g., "t/test_autoencoding_conversion.t fails", "Gofer test failures with PERL_UNICODE=AS" and "PERL_UNICODE and smokes"). Others are saying much the same thing, and apparently even Perl itself might not build with PERL_UNICODE set (see "Installing Latest Perl on OS X").

I could have this wrong, but it seems that ever since Tom Christiansen posted a reply on Stack Overflow to "Why does modern Perl avoid UTF-8 by default?", lots of people have read it and are suddenly setting PERL_UNICODE=AS. As Tom says of "AS":

"This makes all Perl scripts decode @ARGV as UTF‑8 strings, and sets the encoding of all three of stdin, stdout, and stderr to UTF‑8. Both these are global effects, not lexical ones."

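As a rough sketch of what those two letters amount to, the script below does by hand, and only for itself, approximately what "AS" does globally. The flag values (1, 2 and 4 for stdin, stdout and stderr; 32 for @ARGV) are the ones documented for -C in perlrun. The name decode_args is a hypothetical helper, and :encoding(UTF-8) is used here as a stricter stand-in for the UTF-8 marking that I believe PERL_UNICODE applies to the handles.

```perl
use strict;
use warnings;
use Encode qw(decode);

# ${^UNICODE} holds the numeric flags set by -C or PERL_UNICODE (0 if unset).
my $flags = ${^UNICODE};

# Hypothetical helper: turn UTF-8 byte strings into character strings.
sub decode_args {
    my @bytes = @_;    # copy, so the caller's values are never touched
    return map { decode('UTF-8', $_) } @bytes;
}

# The "S" part: stdin (1), stdout (2) and stderr (4) are treated as UTF-8.
# Handles that PERL_UNICODE has already marked are skipped.
for my $pair ([1, \*STDIN], [2, \*STDOUT], [4, \*STDERR]) {
    my ($bit, $handle) = @$pair;
    next if $flags & $bit;
    binmode($handle, ':encoding(UTF-8)')
        or warn "could not set layer: $!";
}

# The "A" part (32): @ARGV elements are decoded as UTF-8.
# Skipped if perl has already decoded them for us.
@ARGV = decode_args(@ARGV) unless $flags & 32;

print "arguments: @ARGV\n";
```

The difference that matters for this post is scope: the script above changes only its own handles, whereas PERL_UNICODE=AS makes the same change for every Perl program started from that environment, including test suites whose authors never expected it.
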
So did we have tons of code which did not work before and for which setting PERL_UNICODE is a magic bullet? No, of course not. In fact, setting PERL_UNICODE in certain ways (like "AS") can stop a module working when that module has not explicitly set its encoding layers, e.g., the DBI link above, where the error is "Error thawing: Frozen string corrupt - contains characters outside 0-255". So:

  • Was Tom's posting so long, detailed and thorough that everyone who saw it thought "this must be something I should do too"? The one thing it did not mention is that if you do that without thinking, you could actually break code which currently works.
  • Were people not taking it in the context of the Stack Overflow question?
  • Is it something else?
  • Is this an opportunity to fix up code which does not explicitly set the encoding and hence can break when PERL_UNICODE is set, or is it just creating RT tickets for modules which work quite happily now when PERL_UNICODE is not set?
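
For the "fix up the code" option in the last bullet, a minimal sketch of what setting the layer explicitly could look like follows. It is not taken from DBI or any other module mentioned above. The helpers write_frozen and read_frozen are hypothetical names, and the handles are opened with :utf8 only to simulate a handle that arrives with a UTF-8 layer the code never asked for.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Storable qw(freeze thaw);

# Hypothetical helper: write a frozen structure as raw bytes, whatever
# layers the handle was given before it reached us.
sub write_frozen {
    my ($fh, $data) = @_;
    binmode($fh, ':raw') or die "binmode failed: $!";
    print {$fh} freeze($data) or die "write failed: $!";
    return;
}

# Hypothetical helper: read the whole handle as raw bytes and thaw it.
sub read_frozen {
    my ($fh) = @_;
    binmode($fh, ':raw') or die "binmode failed: $!";
    local $/;                      # slurp mode
    my $bytes = <$fh>;
    return thaw($bytes);
}

my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "close failed: $!";

# Simulate handles that already carry a UTF-8 layer.
open my $out, '>:utf8', $path or die "open failed: $!";
write_frozen($out, { blob => "\xFF\xFE" });
close $out or die "close failed: $!";

open my $in, '<:utf8', $path or die "open failed: $!";
my $copy = read_frozen($in);
close $in or die "close failed: $!";

print "round trip ", ($copy->{blob} eq "\xFF\xFE" ? "ok" : "FAILED"), "\n";
```

The design choice is simply that code moving binary data states its own layer instead of inheriting one, so the intent is that it behaves the same whether or not PERL_UNICODE is set in the caller's environment.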

Comments

the road to RT#71341

I've done it because it has annoyed me for years that STDOUT doesn't autodetect the encoding of my shell. All major operating systems (even Windows!) have had UTF-8 shells for almost (or over?) a decade! This sounded to me like a fix for the IMHO stone-age default.