ECMAScript proposal: `s` (`dotAll`) flag for regular expressions
Status
This proposal is at stage 4 of the TC39 process.
Motivation
In regular expression patterns, the dot `.` matches a single character, regardless of what character it is. In ECMAScript, there are two exceptions to this:
- `.` doesn’t match astral characters. Setting the `u` (`unicode`) flag fixes that.
- `.` doesn’t match line terminator characters.
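The first exception can be seen in a short sketch. The sample character U+1F4A9 and the variable names are chosen here only for illustration:

```javascript
// U+1F4A9 is an astral (non-BMP) code point, stored as two UTF-16 code units.
const astral = '\u{1F4A9}';

// Without `u`, `.` consumes a single code unit, so `$` cannot match after it.
const withoutU = /^.$/.test(astral);
// With `u`, `.` consumes the whole code point.
const withU = /^.$/u.test(astral);

console.log(astral.length, withoutU, withU);
// → 2 false true
```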
ECMAScript recognizes the following line terminator characters:
- U+000A LINE FEED (LF) (`\n`)
- U+000D CARRIAGE RETURN (CR) (`\r`)
- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR
However, there are more characters that, depending on the use case, could be considered as newline characters:
- U+000B VERTICAL TAB (`\v`)
- U+000C FORM FEED (`\f`)
- U+0085 NEXT LINE
This makes the current behavior of `.` problematic:
- By design, it excludes some newline characters, but not all of them, which often does not match the developer’s use case.
- It’s commonly used to match any character, which it doesn’t do.
The proposal you’re looking at right now addresses the latter issue.
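As a quick check of the two character lists above, the following sketch tests each character against a lone `.` (the array and helper names are illustrative, not part of the proposal):

```javascript
// The four characters ECMAScript treats as line terminators.
const lineTerminators = ['\n', '\r', '\u2028', '\u2029'];
// Characters that some use cases also consider newlines.
const otherNewlineLike = ['\v', '\f', '\u0085'];

// True when the single character `ch` is matched by `.` without any flags.
const matchesDot = (ch) => /^.$/.test(ch);

console.log(lineTerminators.some(matchesDot));
// → false
console.log(otherNewlineLike.every(matchesDot));
// → true
```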
Developers wishing to truly match any character, including these line terminator characters, cannot use `.`:
/foo.bar/.test('foo\nbar');
// → false
Instead, developers have to resort to cryptic workarounds like `[\s\S]` or `[^]`:
/foo[^]bar/.test('foo\nbar');
// → true
Since the need to match any character is quite common, other regular expression engines support a mode in which `.` matches any character, including line terminators.
- Engines that support constants to enable regular expression flags implement `DOTALL` or `SINGLELINE`/`s` modifiers.
- Engines that support embedded flag expressions implement `(?s)`.
- Engines that support regular expression flags implement the flag `s`.
Note the established tradition of naming these modifiers `s` (short for `singleline`) and `dotAll`.
One exception is Ruby, where the `m` flag (`Regexp::MULTILINE`) also enables `dotAll` mode. Unfortunately, we cannot do the same thing for the `m` flag in JavaScript without breaking backwards compatibility.
Proposed solution
We propose the addition of a new `s` flag for ECMAScript regular expressions that makes `.` match any character, including line terminators.
/foo.bar/s.test('foo\nbar');
// → true
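A slightly more practical sketch, assuming a made-up input string containing a block comment that spans two lines, compares the `[\s\S]` workaround with the proposed flag:

```javascript
const source = 'a /* first\nsecond */ b';

// Workaround that works without the flag.
const viaWorkaround = source.match(/\/\*([\s\S]*?)\*\//);
// The same pattern written with `.` and the `s` flag.
const viaDotAll = source.match(/\/\*(.*?)\*\//s);
// Without the flag, `.` stops at the line feed, so nothing matches.
const withoutFlag = source.match(/\/\*(.*?)\*\//);

console.log(viaWorkaround[1] === viaDotAll[1]);
// → true
console.log(withoutFlag);
// → null
```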
High-level API
const re = /foo.bar/s; // Or, `const re = new RegExp('foo.bar', 's');`.
re.test('foo\nbar');
// → true
re.dotAll
// → true
re.flags
// → 's'
FAQ
What about backwards compatibility?
The meaning of existing regular expression patterns isn’t affected by this proposal since the new `s` flag is required to opt in to the new behavior.
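Engines that predate the flag reject it as an unknown flag, so code that must also run there may want to detect support first. The following is a minimal sketch; `supportsDotAll` is a hypothetical helper name, not something the proposal defines:

```javascript
// Hypothetical helper: returns true when the engine accepts the `s` flag.
function supportsDotAll() {
  try {
    // The constructor throws a SyntaxError for unknown flags at run time,
    // whereas a regex literal with an unknown flag would fail at parse time.
    return new RegExp('.', 's').dotAll === true;
  } catch (e) {
    return false;
  }
}

// Fall back to the `[^]` workaround when the flag is unavailable.
const anyChar = supportsDotAll() ? new RegExp('foo.bar', 's') : /foo[^]bar/;
console.log(anyChar.test('foo\nbar'));
// → true
```

The detection uses the `RegExp` constructor rather than a literal so that the check itself does not prevent the script from parsing in older engines.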
How does `dotAll` mode affect `multiline` mode?
This question might come up since the `s` flag stands for `singleline`, which seems to contradict `m` / `multiline`, except it doesn’t. This is a bit unfortunate, but we’re just following the established naming tradition in other regular expression engines. Picking any other flag name would only cause more confusion. The accessor name `dotAll` gives a much better description of the flag’s effect. For this reason, we recommend referring to this mode as `dotAll` mode rather than `singleline` mode.
Both modes are independent and can be combined. `multiline` mode only affects anchors, and `dotAll` mode only affects `.`.
When both the `s` (`dotAll`) and `m` (`multiline`) flags are set, `.` matches any character while still allowing `^` and `$` to match, respectively, just after and just before line terminators within the string.
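The independence of the two flags can be illustrated with a small sketch over a three-line sample string (the string and variable names are illustrative):

```javascript
const text = 'one\ntwo\nthree';

// `s` alone: `.` crosses line feeds, but `^` and `$` only match at the
// very start and end of the input.
const sOnly = /^two$/s.test(text);
// `m` alone: anchors match around line terminators, `.` still stops at them.
const mOnlyAnchors = /^two$/m.test(text);
const mOnlyDot = /^one.two$/m.test(text);
// Both flags: `.` matches the line feed and `$` matches before the next one.
const both = /^one.two$/sm.test(text);

console.log(sOnly, mOnlyAnchors, mOnlyDot, both);
// → false true false true

// The `flags` accessor reports flags in a fixed order, regardless of how
// they were written in the literal.
console.log(/^one.two$/sm.flags);
// → 'ms'
```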
Specification
Implementations
- V8, shipping in Chrome 62
- JavaScriptCore, shipping in Safari Technology Preview 39a
- XS, shipping in Moddable as of the January 17, 2018 update
- regexpu (transpiler) with the `{ dotAllFlag: true }` option enabled
- Compat-transpiler of RegExp Tree
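To give a rough idea of what a transpiler has to do for this flag, here is a deliberately simplified sketch. `emulateDotAll` is a hypothetical helper written for this illustration; it is not the algorithm used by regexpu or RegExp Tree, and it assumes a pattern source without the `s` flag's other interactions in mind:

```javascript
// Hypothetical helper: rewrites every unescaped `.` outside a character
// class to `[^]`, which matches any character without needing the `s` flag.
function emulateDotAll(source) {
  let out = '';
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {
      // Copy the backslash and the character it escapes unchanged.
      out += source.slice(i, i + 2);
      i++;
    } else if (inClass) {
      // Inside `[...]` a dot is already literal; only watch for the close.
      if (ch === ']') inClass = false;
      out += ch;
    } else if (ch === '[') {
      inClass = true;
      out += ch;
    } else {
      out += ch === '.' ? '[^]' : ch;
    }
  }
  return out;
}

console.log(emulateDotAll('foo.bar'));
// → 'foo[^]bar'
console.log(new RegExp(emulateDotAll('foo.bar')).test('foo\nbar'));
// → true
```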