Regular Expression Behaviours

The .NET Framework regular expression engine is a backtracking regular expression matcher that incorporates a traditional Nondeterministic Finite Automaton (NFA) engine such as that used by Perl, Python, Emacs, and Tcl. This distinguishes it from faster, but more limited, pure regular expression Deterministic Finite Automaton (DFA) engines such as those found in awk, egrep, or lex. This also distinguishes it from standardized, but slower, POSIX NFAs. The following section describes the three types of regular expression engines, and explains why regular expressions in the .NET Framework are implemented by using a traditional NFA engine.

Benefits of the NFA Engine

When DFA engines perform pattern matching, their processing order is driven by the input string. The engine begins at the beginning of the input string and proceeds sequentially to determine whether the next character matches the regular expression pattern. They can guarantee to match the longest string possible. Because they never test the same character twice, DFA engines do not support backtracking. However, because a DFA engine contains only finite state, it cannot match a pattern with backreferences, and because it does not construct an explicit expansion, it cannot capture subexpressions.

Unlike DFA engines, when traditional NFA engines perform pattern matching, their processing order is driven by the regular expression pattern. As it processes a particular language element, the engine uses greedy matching; that is, it matches as much of the input string as it possibly can. But it also saves its state after successfully matching a subexpression. If a match eventually fails, the engine can return to a saved state so it can try additional matches. This process of abandoning a successful subexpression match so that later language elements in the regular expression can also match is known as backtracking. NFA engines use backtracking to test all possible expansions of a regular expression in a specific order and accept the first match. Because a traditional NFA engine constructs a specific expansion of the regular expression for a successful match, it can capture subexpression matches and matching backreferences. However, because a traditional NFA backtracks, it can visit the same state multiple times if it arrives at the state over different paths. As a result, it can run exponentially slowly in the worst case. Because a traditional NFA engine accepts the first match it finds, it can also leave other (possibly longer) matches undiscovered.

POSIX NFA engines are like traditional NFA engines, except that they continue to backtrack until they can guarantee that they have found the longest match possible. As a result, a POSIX NFA engine is slower than a traditional NFA engine, and when you use a POSIX NFA engine, you cannot favor a shorter match over a longer one by changing the order of the backtracking search.

Traditional NFA engines are favored by programmers because they offer greater control over string matching than either DFA or POSIX NFA engines. Although, in the worst case, they can run slowly, you can steer them to find matches in linear or polynomial time by using patterns that reduce ambiguities and limit backtracking. In other words, although NFA engines trade performance for power and flexibility, in most cases they offer good to acceptable performance if a regular expression is well-written and avoids cases in which backtracking degrades performance exponentially.

.NET Framework Engine Capabilities

To take advantage of the benefits of a traditional NFA engine, the .NET Framework regular expression engine includes a complete set of constructs to enable programmers to steer the backtracking engine. These constructs can be used to find matches faster or to favor specific expansions over others.

Other features of the .NET Framework regular expression engine include the following:

Lazy quantifiers: ??, *?, +?, {n,m}?. These constructs tell the backtracking engine to search the minimum number of repetitions first. In contrast, ordinary greedy quantifiers try to match the maximum number of repetitions first.
Positive lookahead: (?=subexpression). This feature allows the backtracking engine to return to the same spot in the text after matching a subexpression. It is useful for searching throughout the text by verifying multiple patterns that start from the same position. It also allows the engine to verify that a substring exists at the end of the match without including the substring in the matched text.
Negative lookahead: (?!subexpression). This feature adds the ability to match an expression only if a subexpression fails to match. This is particularly powerful for pruning a search, because it is often simpler to provide an expression for a case that should be eliminated than an expression for cases that must be included. For example, it is difficult to write an expression for words that do not begin with "non".
Conditional evaluation: (?(expression)yes|no) and (?(name)yes|no), where expression is a subexpression to match, name is the name of a capturing group, yes is the string to match if expression is matched or name is a valid, non-empty captured group, and no is the subexpression to match if expression is not matched or name is not a valid, non-empty captured group. This feature allows the engine to search by using more than one alternate pattern, depending on the result of a previous subexpression match or the result of a zero-width assertion. This allows a more powerful form of backreference that permits, for example, matching a subexpression based on whether a previous subexpression was matched.
Balancing group definitions: (?<name1-name2> subexpression). This feature allows the regular expression engine to keep track of nested constructs such as parentheses or opening and closing brackets.
Nonbacktracking subexpressions (also known as greedy subexpressions): (?>subexpression). This feature allows the backtracking engine to guarantee that a subexpression matches only the first match found for that subexpression, as if the expression were running independent of its containing expression. If you do not use this construct, backtracking searches from the larger expression can change the behavior of a subexpression. For example, the regular expression (a+)\w matches one or more "a" characters, along with a word character that follows the sequence of "a" characters, and assigns the sequence of "a" characters to the first capturing group, However, if the final character of the input string is also an "a", it is matched by the \w language element and is not included in the captured group.
Right-to-left matching, which is specified by supplying the RegexOptions.RightToLeft option to a Regex class constructor or static instance matching method. This feature is useful when searching from right to left instead of from left to right, or in cases where it is more efficient to begin a match at the right part of the pattern instead of the left. As the following example illustrates, using right-to-left matching can change the behavior of greedy quantifiers. The example conducts two searches for a sentence that ends in a number. The left-to-right search that uses the greedy quantifier + matches one of the six digits in the sentence, whereas the right-to-left search matches all six digits.
Positive and negative lookbehind: (?<=subexpression) for positive lookbehind, and (?<!subexpression) for negative lookbehind. This feature is similar to lookahead, which is discussed earlier in this topic. Because the regular expression engine allows complete right-to-left matching, regular expressions allow unrestricted lookbehinds. Positive and negative lookbehind can also be used to avoid nesting quantifiers when the nested subexpression is a superset of an outer expression. Regular expressions with such nested quantifiers often offer poor performance.

Scanning of Valid Email Format

Examples of Scanning of Valid Email Format.

ASP.NET Code Input:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head>
       <title>Sample Page</title>
       <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
       <script runat="server">
           Sub Page_Load()
               Dim xstring As String = "dahhhvid.johgnes@prghjjosehware.com"
               Dim xmatchstr As String = ""
               lbl01.Text = xmatchstr
           End Sub
           Function showresult(xstring,xpattern)
               Dim xmatch As Match
               Dim xcaptures As CaptureCollection
               Dim ycaptures As CaptureCollection
               Dim xgroups As GroupCollection
               Dim xmatchstr As String = ""
               Dim xint As Integer
               Dim yint As Integer
               Dim zint As Integer
               xmatchstr = xmatchstr & "<br />Result of Regex.Match(string,""" & Replace(xpattern,"<","&lt;") & """): """
               xmatch = Regex.Match(xstring,xpattern)
               xmatchstr = xmatchstr & xmatch.value & """<br />"
               xcaptures = xmatch.Captures
               xmatchstr = xmatchstr & "->Result of CaptureCollection.Count: """
               xmatchstr = xmatchstr & xcaptures.Count & """<br />"
               For xint = 0 to xcaptures.Count - 1
                   xmatchstr = xmatchstr & "->->Result of CaptureCollection("& xint & ").Value, Index, Length: """
                   xmatchstr = xmatchstr & xcaptures(xint).Value & ", " & xcaptures(xint).Index & ", " & xcaptures(xint).Length & """<br />"
                   xgroups = xmatch.Groups
                   xmatchstr = xmatchstr & "->Result of GroupCollection.Count: """
                   xmatchstr = xmatchstr & xgroups.Count & """<br />"
                   For yint = 0 to xgroups.Count - 1
                       xmatchstr = xmatchstr & "->->Result of GroupCollection("& yint & ").Value, Index, Length: """
                       xmatchstr = xmatchstr & xgroups(yint).Value & ", " & xgroups(yint).Index & ", " & xgroups(yint).Length & """<br />"
                       ycaptures = xgroups(yint).Captures
                       xmatchstr = xmatchstr & "->->->Result of CaptureCollection.Count: """
                       xmatchstr = xmatchstr & ycaptures.Count & """<br />"
                       For zint = 0 to ycaptures.Count - 1
                           xmatchstr = xmatchstr & "->->->->Result of CaptureCollection("& zint & ").Value, Index, Length: """
                           xmatchstr = xmatchstr & ycaptures(zint).Value & ", " & ycaptures(zint).Index & ", " & ycaptures(zint).Length & """<br />"
                       Next
                   Next
               Next
               Return xmatchstr
           End Function
       </script>
    </head>
    <body>
       <% Response.Write ("<h1>This is a Sample Page of Scanning of of Valid Email Format</h1>") %>
       <p>
           <%-- Set on Page_Load --%>
           <asp:Label id="lbl01" runat="server" />
       </p>
    </body>
</html>

HTML Web Page Embedded Output:

Source/Referencee

https://docs.microsoft.com/en-us/dotnet/standard/base-types/details-of-regular-expression-behavior

Content

Regular Expression Behaviours

Benefits of the NFA Engine

.NET Framework Engine Capabilities

Scanning of Valid Email Format

See also

Source/Referencee