perl5240delta - what is new for perl v5.24.0



NAME

perl5240delta - what is new for perl v5.24.0


DESCRIPTION

This document describes the differences between the 5.22.0 release and the 5.24.0 release.


Core Enhancements

Postfix dereferencing is no longer experimental

Using the postderef and postderef_qq features no longer emits a warning. Existing code that disables the experimental::postderef warning category that they previously used will continue to work. The postderef feature has no effect; all Perl code can use postfix dereferencing, regardless of what feature declarations are in scope. The 5.24 feature bundle now includes the postderef_qq feature.
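As a minimal sketch of what this allows, the following compares postfix dereferencing with the older circumfix forms. The variable names are illustrative only; use v5.24 is assumed here so that the feature bundle (and with it postderef_qq for interpolation) is in scope.

```perl
use v5.24;        # enables strict and the 5.24 feature bundle (incl. postderef_qq)
use warnings;

my $aref = [ 1, 2, 3 ];
my $href = { a => 1, b => 2 };

my @items = $aref->@*;              # same as @{$aref}
my @keys  = sort keys $href->%*;    # same as keys %{$href}
my $last  = $aref->$#*;             # same as $#{$aref}
my @slice = $aref->@[0, 1];         # same as @{$aref}[0, 1]

# Interpolation in strings relies on postderef_qq, which the bundle provides.
my $text = "items: $aref->@*";

print "$text\n";
```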

Unicode 8.0 is now supported

For details on what is in this release, see http://www.unicode.org/versions/Unicode8.0.0/.

perl will now croak when closing an in-place output file fails

Until now, failure to close the output file for an in-place edit was not detected, meaning that the input file could be clobbered without the edit being successfully completed. Now, when the output file cannot be closed successfully, an exception is raised.

New \b{lb} boundary in regular expressions

lb stands for Line Break. It is a Unicode property that determines where a line of text is suitable to break (typically so that it can be output without overflowing the available horizontal space). This capability has long been furnished by the Unicode::LineBreak module, but now a light-weight, non-customizable version that is suitable for many purposes is in core Perl.
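A small sketch of the new boundary, assuming plain ASCII input: splitting on \b{lb} yields the chunks between permitted line break positions, which a wrapping routine could then pack into lines. The sample text and variable names are made up for illustration.

```perl
use v5.24;
use warnings;

my $text = "The quick brown fox";

# Each chunk ends at a position where the Unicode rules permit a line break.
# For this input that is after each run of spaces.
my @chunks = split /\b{lb}/, $text;

print "[$_]\n" for @chunks;
```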

qr/(?[ ])/ now works in UTF-8 locales

Extended Bracketed Character Classes now will successfully compile when use locale is in effect. The compiled pattern will use standard Unicode rules. If the runtime locale is not a UTF-8 one, a warning is raised and standard Unicode rules are used anyway. No tainting is done since the outcome does not actually depend on the locale.

Integer shift (<< and >>) now more explicitly defined

Negative shifts are reverse shifts: left shift becomes right shift, and right shift becomes left shift.

Shifting by the number of bits in a native integer (or more) is zero, except when the ``overshift'' is right shifting a negative value under use integer, in which case the result is -1 (arithmetic shift).

Until now negative shifting and overshifting have been undefined because they have relied on whatever the C implementation happens to do. For example, for the overshift a common C behavior is ``modulo shift'':

  1 >> 64 == 1 >> (64 % 64) == 1 >> 0 == 1  # Common C behavior.
  # And the same for <<, while Perl now produces 0 for both.

Now these behaviors are well-defined under Perl, regardless of what the underlying C implementation does. Note, however, that you are still constrained by the native integer width: you need to know how far left you can go. You can use for example:

  use Config;
  my $wordbits = $Config{uvsize} * 8;  # Or $Config{uvsize} << 3.

If you need more bits on the left shift, you can use for example the bigint pragma, or the Bit::Vector module from CPAN.
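The rules above can be sketched as follows. This is only an illustration of the behavior described in this section, run on whatever native integer width the local perl was built with; the variable names are arbitrary.

```perl
use strict;
use warnings;
use Config;

my $wordbits = $Config{uvsize} * 8;     # native integer width in bits

# Negative shift counts reverse the direction.
my $neg_left  = 8 << -1;                # behaves like 8 >> 1
my $neg_right = 8 >> -1;                # behaves like 8 << 1

# Shifting by the full width (or more) gives zero ...
my $over_left  = 1 << $wordbits;
my $over_right = 1 >> $wordbits;

# ... except right-shifting a negative value under "use integer".
my $arith;
{
    use integer;
    $arith = -1 >> $wordbits;           # arithmetic shift keeps the sign
}

print "$neg_left $neg_right $over_left $over_right $arith\n";
```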

printf and sprintf now allow reordered precision arguments

That is, sprintf '|%.*2$d|', 2, 3 now returns |002|. This extends the existing reordering mechanism (which allows reordering for arguments that are used as format fields, widths, and vector separators).
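For comparison, here is a short sketch placing the reordered form next to the traditional one, where the precision argument must come immediately before the value. Both calls format the value 2 with a precision of 3; the variable names are illustrative.

```perl
use strict;
use warnings;

# Reordered: the precision is taken from argument 2, the value from argument 1.
my $reordered = sprintf '|%.*2$d|', 2, 3;

# Traditional: "*" consumes the next argument (3) as precision, then the value (2).
my $classic = sprintf '|%.*d|', 3, 2;

print "$reordered $classic\n";
```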

More fields provided to sigaction callback with SA_SIGINFO

When passing the SA_SIGINFO flag to sigaction, the errno, status, uid, pid, addr and band fields are now included in the hash passed to the handler, if supported by the platform.

Hashbang redirection to Perl 6

Previously perl would redirect to another interpreter if it found a hashbang path, unless the path contained ``perl'' (see the perlrun manpage). To improve compatibility with Perl 6, this behavior has been extended to also redirect if ``perl'' is followed by ``6''.


Security

Set proper umask before calling mkstemp(3)

In 5.22 perl started setting umask to 0600 before calling mkstemp(3) and restoring it afterwards. This wrongfully tells open(2) to strip the owner read and write bits from the given mode before applying it, rather than the intended negation of leaving only those bits in place.

Systems that use mode 0666 in mkstemp(3) (like old versions of glibc) create a file with permissions 0066, leaving world read and write permissions regardless of current umask.

This has been fixed by using umask 0177 instead. [perl #127322]
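The arithmetic behind this can be checked with a few lines of Perl. This sketch only models how a umask is applied to a requested mode (mode & ~umask); it does not call mkstemp(3), and the variable names are illustrative.

```perl
use strict;
use warnings;

my $requested = 0666;                   # mode some mkstemp(3) implementations pass to open(2)

# A umask lists the bits to remove, so 0600 strips the owner bits
# and leaves group/other access in place.
my $with_0600 = $requested & ~0600;

# 0177 strips owner-execute plus all group/other bits, leaving owner read/write only.
my $with_0177 = $requested & ~0177;

printf "umask 0600 -> %04o\n", $with_0600;
printf "umask 0177 -> %04o\n", $with_0177;
```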

Fix out of boundary access in Win32 path handling

This is CVE-2015-8608. For more information see [perl #126755]

Fix loss of taint in canonpath

This is CVE-2015-8607. For more information see [perl #126862]

Avoid accessing uninitialized memory in win32 crypt()

Added validation that will detect both a short salt and invalid characters in the salt. [perl #126922]

Remove duplicate environment variables from environ

Previously, if an environment variable appeared more than once in environ[], %ENV would contain the last entry for that name, while a typical getenv() would return the first entry. We now make sure %ENV contains the same as what getenv returns.

Second, we remove duplicates from environ[], so if a setting with that name is set in %ENV, we won't pass an unsafe value to a child process.

[CVE-2016-2381]


Incompatible Changes

The autoderef feature has been removed

The experimental autoderef feature (which allowed calling push, pop, shift, unshift, splice, keys, values, and each on a scalar argument) has been deemed unsuccessful. It has now been removed; trying to use the feature (or to disable the experimental::autoderef warning it previously triggered) now yields an exception.
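Code that relied on autoderef can usually be migrated by dereferencing explicitly. The following is a sketch of such a migration with made-up data; the commented-out lines show the removed syntax.

```perl
use strict;
use warnings;

my $list = [ 1, 2 ];
my $map  = { a => 1, b => 2 };

# push $list, 3;                # autoderef form, no longer accepted
push @{$list}, 3;               # explicit dereference

# my @names = keys $map;        # autoderef form, no longer accepted
my @names = sort keys %{$map};

my $first = shift @{$list};

print "@{$list} | @names | $first\n";
```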

Lexical $_ has been removed

my $_ was introduced in Perl 5.10, and subsequently caused much confusion with no obvious solution. In Perl 5.18.0, it was made experimental on the theory that it would either be removed or redesigned in a less confusing (but backward-incompatible) way. Over the following years, no alternatives were proposed. The feature has now been removed and will fail to compile.

qr/\b{wb}/ is now tailored to Perl expectations

This is now more suited to be a drop-in replacement for plain \b, while giving better results for parsing natural language. Previously it strictly followed the current Unicode rules, which call for it to match between each white space character. Now it doesn't generally match within spans of white space, behaving like \b does. See \b{wb} in the perlrebackslash manpage.

Regular expression compilation errors

Some regular expression patterns that had runtime errors now don't compile at all.

Almost all Unicode properties using the \p{} and \P{} regular expression pattern constructs are now checked for validity at pattern compilation time, and invalid ones will cause the program to not compile. In earlier releases, this check was often deferred until run time. Whenever an error check is moved from run- to compile time, erroneous code is caught 100% of the time, whereas before it would only get caught if and when the offending portion actually gets executed, which for unreachable code might be never.
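To illustrate, the sketch below compiles a snippet whose only use of an invalid property sits in a subroutine that is never called. NoSuchProperty and never_called are made-up names. The string eval is used only so that the compilation failure can be observed without aborting the surrounding program.

```perl
use strict;
use warnings;

# The pattern lives in a sub that is never invoked, so a run-time-only check
# would never fire. With compile-time checking the eval itself fails.
my $compiled = eval q{
    sub never_called { return $_[0] =~ /\p{NoSuchProperty}/ }
    1;
};
my $error = $@;

print $compiled ? "compiled\n" : "compile error: $error";
```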

qr/\N{}/ now disallowed under use re "strict"

An empty \N{} makes no sense, but for backwards compatibility is accepted as doing nothing, though a deprecation warning is raised by default. But now this is a fatal error under the experimental feature 'strict' mode in the re manpage.

Nested declarations are now disallowed

A my, our, or state declaration is no longer allowed inside of another my, our, or state declaration.

For example, these are now fatal:

   my ($x, my($y));
   our (my $x);

[perl #125587]

[perl #121058]

The /\C/ character class has been removed.

This regular expression character class was deprecated in v5.20.0 and has produced a deprecation warning since v5.22.0. It is now a compile-time error. If you need to examine the individual bytes that make up a UTF8-encoded character, then use utf8::encode() on the string (or a copy) first.
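A minimal sketch of the suggested replacement: encode a copy of the string and then look at its bytes. The sample character (U+263A) and variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my $char = "\x{263A}";          # a single character

# Work on a copy so the original character string is left untouched.
my $bytes = $char;
utf8::encode($bytes);           # $bytes now holds the UTF-8 encoded octets

my @octets = map { sprintf '%02X', ord } split //, $bytes;

print "@octets\n";
```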

chdir('') no longer chdirs home

Using chdir('') or chdir(undef) to chdir home has been deprecated since perl v5.8, and will now fail. Use chdir() instead.

ASCII characters in variable names must now be all visible

It was legal until now on ASCII platforms for variable names to contain non-graphical ASCII control characters (ordinals 0 through 31, and 127, which are the C0 controls and DELETE). This usage has been deprecated since v5.20, and as of now causes a syntax error. The variables these names referred to are special, reserved by Perl for whatever use it may choose, now, or in the future. Each such variable has an alternative way of spelling it. Instead of the single non-graphic control character, a two-character sequence beginning with a caret is used, like $^] and ${^GLOBAL_PHASE}. Details are at the perlvar manpage. It remains legal, though unwise and deprecated (raising a deprecation warning), to use certain non-graphic non-ASCII characters in variable names when not under use utf8. No code should do this, as all such variables are reserved by Perl, and Perl doesn't currently define any of them (but could at any time, without notice).

An off by one issue in $Carp::MaxArgNums has been fixed

$Carp::MaxArgNums is supposed to be the number of arguments to display. Prior to this version, it was instead showing $Carp::MaxArgNums + 1 arguments, contrary to the documentation.

Only blanks and tabs are now allowed within [...] within (?[...]).

The experimental Extended Bracketed Character Classes can contain regular bracketed character classes within them. These differ from regular ones in that white space is generally ignored, unless escaped by preceding it with a backslash. The white space that is ignored is now limited to just tab \t and SPACE characters. Previously, it was any white space. See Extended Bracketed Character Classes in the perlrecharclass manpage.


Deprecations

Using code points above the platform's IV_MAX is now deprecated

Unicode defines code points in the range 0..0x10FFFF. Some standards at one time defined them up to 2**31 - 1, but Perl has allowed them to be as high as anything that will fit in a word on the platform being used. However, use of those above the platform's IV_MAX is broken in some constructs, notably tr///, regular expression patterns involving quantifiers, and in some arithmetic and comparison operations, such as being the upper limit of a loop. Now the use of such code points raises a deprecation warning, unless that warning category is turned off. IV_MAX is typically 2**31 - 1 on 32-bit platforms, and 2**63 - 1 on 64-bit ones.

Doing bitwise operations on strings containing code points above 0xFF is deprecated

The string bitwise operators treat their operands as strings of bytes, and values beyond 0xFF are nonsensical in this context. To operate on encoded bytes, first encode the strings. To operate on code points' numeric values, use split and map ord. In the future, this warning will be replaced by an exception.
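Both suggested alternatives can be sketched as follows. The sample string and masks are arbitrary. The first variant works on the numeric code point values, the second on the UTF-8 encoded bytes of a copy.

```perl
use strict;
use warnings;

my $str = "\x{100}\x{17F}";     # both code points are above 0xFF

# Variant 1: operate on code point values with split and map ord.
my @low_bits = map { ord($_) & 0xFF } split //, $str;

# Variant 2: encode first, then apply the string bitwise operator to bytes.
my $bytes = $str;
utf8::encode($bytes);
my $masked = $bytes & ("\x7F" x length $bytes);

printf "%s | %s\n", join(',', @low_bits), unpack('H*', $masked);
```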

sysread(), syswrite(), recv() and send() are deprecated on :utf8 handles

The sysread(), recv(), syswrite() and send() operators are deprecated on handles that have the :utf8 layer, either explicitly, or implicitly, e.g., with the :encoding(UTF-16LE) layer.

Both sysread() and recv() currently use only the :utf8 flag for the stream, ignoring the actual layers. Since sysread() and recv() do no UTF-8 validation they can end up creating invalidly encoded scalars.

Similarly, syswrite() and send() use only the :utf8 flag, otherwise ignoring any layers. If the flag is set, both write the value UTF-8 encoded, even if the layer is some different encoding, such as the example above.

Ideally, all of these operators would completely ignore the :utf8 state, working only with bytes, but this would result in silently breaking existing code. To avoid this, a future version of perl will throw an exception when any of sysread(), recv(), syswrite() or send() are called on a handle with the :utf8 layer.
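One way to stay clear of the deprecated combination is to keep the handle in :raw mode and do the encoding and decoding explicitly. The following is a sketch of that approach, not the only possible one; it uses the core Encode and File::Temp modules, and the sample text and variable names are illustrative.

```perl
use strict;
use warnings;
use Encode     qw(decode encode);
use File::Temp qw(tempfile);

my $original = "caf\x{e9} \x{263A}";

# Write: encode to bytes ourselves, then syswrite on a byte-oriented handle.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
my $payload = encode('UTF-8', $original);
my $written = syswrite($out, $payload);
die "short write" unless defined $written && $written == length $payload;
close $out or die "close failed: $!";

# Read: sysread raw bytes until end of file, then decode once at the end.
open my $in, '<:raw', $path or die "open failed: $!";
my $buffer = '';
while (1) {
    my $n = sysread($in, $buffer, 4096, length $buffer);
    die "sysread failed: $!" unless defined $n;
    last if $n == 0;
}
close $in;

my $decoded = decode('UTF-8', $buffer);
print $decoded eq $original ? "round trip ok\n" : "mismatch\n";
```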


Performance Enhancements


Modules and Pragmata

Updated Modules and Pragmata


Documentation

Changes to Existing Documentation

the perlapi manpage

the perlcall manpage

the perlfunc manpage

the perlguts manpage

the perllocale manpage

the perlmodlib manpage

the perlop manpage

the perlpolicy manpage

the perlreftut manpage

the perlrebackslash manpage

the perlsub manpage

the perlsyn manpage

the perltie manpage

the perlunicode manpage

the perlvar manpage

the perlxs manpage


Diagnostics

The following additions or changes have been made to diagnostic output, including warnings and fatal error messages. For the complete list of diagnostic messages, see the perldiag manpage.

New Diagnostics

New Errors

New Warnings

Changes to Existing Diagnostics


Configuration and Compilation


Testing


Platform Support

Platform-Specific Notes

AmigaOS
Cygwin
EBCDIC
UTF-EBCDIC extended
UTF-EBCDIC is like UTF-8, but for EBCDIC platforms. It now has been extended so that it can represent code points up to 2 ** 64 - 1 on platforms with 64-bit words. This brings it into parity with UTF-8. This enhancement requires an incompatible change to the representation of code points in the range 2 ** 30 to 2 ** 31 -1 (the latter was the previous maximum representable code point). This means that a file that contains one of these code points, written out with previous versions of perl cannot be read in, without conversion, by a perl containing this change. We do not believe any such files are in existence, but if you do have one, submit a ticket at perlbug@perl.org, and we will write a conversion script for you.

EBCDIC cmp() and sort() fixed for UTF-EBCDIC strings
Comparing two strings that were both encoded in UTF-8 (or more precisely, UTF-EBCDIC) did not work properly until now. Since sort() uses cmp(), this fixes that as well.

EBCDIC tr/// and y/// fixed for \N{}, and use utf8 ranges
Perl v5.22 introduced the concept of portable ranges to regular expression patterns. A portable range matches the same set of characters no matter what platform is being run on. This concept is now extended to tr///. See tr/SEARCHLIST/REPLACEMENTLIST/cdsr in the perlop manpage.

There were also some problems with these operations under use utf8, which are now fixed.

FreeBSD
IRIX
MacOS X
Solaris
Tru64
VMS
Win32
ppc64el
floating point
The floating point format of ppc64el (Debian naming for little-endian PowerPC) is now detected correctly.


Internal Changes


Selected Bug Fixes


Acknowledgements

Perl 5.24.0 represents approximately 11 months of development since Perl 5.22.0 and contains approximately 360,000 lines of changes across 1,800 files from 75 authors.

Excluding auto-generated files, documentation and release tools, there were approximately 250,000 lines of changes to 1,200 .pm, .t, .c and .h files.

Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.24.0:

Aaron Crane, Aaron Priven, Abigail, Achim Gratz, Alexander D'Archangel, Alex Vandiver, Andreas König, Andy Broad, Andy Dougherty, Aristotle Pagaltzis, Chase Whitener, Chas. Owens, Chris 'BinGOs' Williams, Craig A. Berry, Dagfinn Ilmari Mannsåker, Dan Collins, Daniel Dragan, David Golden, David Mitchell, Doug Bell, Dr.Ruud, Ed Avis, Ed J, Father Chrysostomos, Herbert Breunung, H.Merijn Brand, Hugo van der Sanden, Ivan Pozdeev, James E Keenan, Jan Dubois, Jarkko Hietaniemi, Jerry D. Hedden, Jim Cromie, John Peacock, John SJ Anderson, Karen Etheridge, Karl Williamson, kmx, Leon Timmermans, Ludovic E. R. Tolhurst-Cleaver, Lukas Mai, Martijn Lievaart, Matthew Horsfall, Mattia Barbon, Max Maischein, Mohammed El-Afifi, Nicholas Clark, Nicolas R., Niko Tyni, Peter John Acklam, Peter Martini, Peter Rabbitson, Pip Cet, Rafael Garcia-Suarez, Reini Urban, Ricardo Signes, Sawyer X, Shlomi Fish, Sisyphus, Stanislaw Pusep, Steffen Müller, Stevan Little, Steve Hay, Sullivan Beck, Thomas Sibley, Todd Rinaldo, Tom Hukins, Tony Cook, Unicode Consortium, Victor Adam, Vincent Pit, Vladimir Timofeev, Yves Orton, Zachary Storer, Zefram.

The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.

Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.

For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.


Reporting Bugs

If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at https://rt.perl.org/ . There may also be information at http://www.perl.org/ , the Perl Home Page.

If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.

If the bug you are reporting has security implications which make it inappropriate to send to a publicly archived mailing list, then see SECURITY VULNERABILITY CONTACT INFORMATION in the perlsec manpage for details of how to report the issue.


SEE ALSO

The Changes file for an explanation of how to view exhaustive details on what changed.

The INSTALL file for how to build Perl.

The README file for general stuff.

The Artistic and Copying files for copyright information.
