Java 正则表达式|极客教程

Java 正则表达式，正则表达式用于定义可用于搜索，操作和编辑文本的字符串模式。这些表达式也称为 Regex （正则表达式的简写形式）。

让我们举个例子来更好地理解它

在下面的示例中，正则表达式.*book.*用于搜索文本中字符串“book”的出现。

import java.util.regex.*;  
class RegexExample1{  
   public static void main(String args[]){  
      String content = "This is Chaitanya " +
        "from Beginnersbook.com.";

      String pattern = ".*book.*";

      boolean isMatch = Pattern.matches(pattern, content);
      System.out.println("The text contains 'book'? " + isMatch);
   }
}

输出：

The text contains 'book'? true

在本教程中，我们将学习如何定义模式以及如何使用它们。java.util.regex API（我们在处理 Regex 时需要导入的包）有两个主要类：

1）java.util.regex.Pattern – 用于定义模式

2）java.util.regex.Matcher – 用于使用模式对文本执行匹配操作

`java.util.regex.Pattern`类：

`Pattern.matches()`

我们已经在上面的例子中看到过这种方法的用法，我们在给定的文本中搜索了字符串"book"。这是使用 Regex 在文本中搜索String的最简单和最简单的方法之一。

String content = "This is a tutorial Website!";
String patternString = ".*tutorial.*";
boolean isMatch = Pattern.matches(patternString, content);
System.out.println("The text contains 'tutorial'? " + isMatch);

如您所见，我们使用Pattern类的matches()方法来搜索给定文本中的模式。模式.*tutorial.*在字符串"tutorial"的开头和结尾允许零个或多个字符（表达式.*用于零个或多个字符）。

限制：通过这种方式，我们可以搜索文本中单个出现的模式。要匹配多次出现，您应该使用Pattern.compile()方法（在下一节中讨论）。

`Pattern.compile()`

在上面的例子中，我们在文本中搜索了一个字符串"tutorial"，这是一个区分大小写的搜索，但是如果你想进行大小写不敏感搜索或者想要多次搜索，那么你可能需要在用文本搜索之前，先使用Pattern.compile()。这就是这种方法可以用于这种情况的方法。

String content = "This is a tutorial Website!";
String patternString = ".*tuToRiAl.";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);

这里我们使用了一个标志Pattern.CASE_INSENSITIVE来进行不区分大小写的搜索，还有其他几个标志可以用于不同的目的。要阅读有关此类标志的更多信息，请参阅此文档。

现在：我们已经获得了一个Pattern实例，但是如何匹配呢？为此，我们需要一个Matcher实例，我们可以使用Pattern.matcher()方法。让我们讨论一下。

`Pattern.matcher()`方法

在上一节中，我们学习了如何使用compile()方法获取Pattern实例。在这里，我们将学习如何使用matcher()方法从Pattern实例获取Matcher实例。

String content = "This is a tutorial Website!";
String patternString = ".*tuToRiAl.*";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(content);
boolean isMatched = matcher.matches();
System.out.println("Is it a Match?" + isMatched);

输出：

Is it a Match?true

`Pattern.split()`

要根据分隔符将文本拆分为多个字符串（这里使用正则表达式指定分隔符），我们可以使用Pattern.split()方法。这是如何做到的。

import java.util.regex.*;  
class RegexExample2{  
public static void main(String args[]){  
    String text = "ThisIsChaitanya.ItISMyWebsite";
    // Pattern for delimiter
    String patternString = "is";
    Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
    String[] myStrings = pattern.split(text);
    for(String temp: myStrings){
        System.out.println(temp);
    }
    System.out.println("Number of split strings: "+myStrings.length);
}}

输出：

Th

Chaitanya.It
MyWebsite
Number of split strings: 4

第二个拆分字符串在输出中为null。

`java.util.regex.Matcher`类

我们已经讨论过上面的Matcher类了。让我们回忆一下：

创建`Matcher`实例

String content = "Some text";
String patternString = ".*somestring.*";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(content);

主要方法

matches()：它在创建 Matcher 实例时将正则表达式与传递给Pattern.matcher()方法的整个文本进行匹配。

...
Matcher matcher = pattern.matcher(content);
boolean isMatch = matcher.matches();

lookingAt()：类似于matches()方法，只是它匹配正则表达式只对着文本的开头，而matches()搜索整个文本。

find()：搜索文本中正则表达式的出现次数。主要用于搜索多次出现的情况。

start()和end()：这两种方法通常与find()方法一起使用。它们用于获取使用find()方法找到的匹配项的开始和结束索引。

让我们举个例子使用`Matcher`方法来找出多次出现：

package beginnersbook.com;
import java.util.regex.*;  
class RegexExampleMatcher{  
public static void main(String args[]){  
  String content = "ZZZ AA PP AA QQQ AAA ZZ";

  String string = "AA";
  Pattern pattern = Pattern.compile(string);
  Matcher matcher = pattern.matcher(content);

  while(matcher.find()) {
     System.out.println("Found at: "+ matcher.start()
            + 
            " - " + matcher.end());
  }
}
}

输出：

Found at: 4 - 6
Found at: 10 - 12
Found at: 17 - 19

现在我们熟悉Pattern和Matcher类以及将正则表达式与文本匹配的过程。让我们看一下我们定义正则表达式的各种选项：

字符串字面值

假设您只想在文本中搜索特定字符串，例如“abc”然后我们可以简单地编写这样的代码：这里的文本和正则表达式都是相同的。
Pattern.matches("abc", "abc")

字符类

字符类将输入文本中的单个字符与字符类中的多个允许字符进行匹配。例如[Cc]haitanya将匹配所有出现的字符串"chaitanya"，带有小写或大写C。更多例子：
Pattern.matches("[pqr]", "abcd");它会给出错误，因为文中没有p，q或r
Pattern.matches("[pqr]", "r");当找到r时返回true
Pattern.matches("[pqr]", "pq");返回false，因为其中任何一个都可以在文本中不是两个。

以下是各种字符类构造的完整列表：
[abc]：如果文本中包含其中一个（a，b或c）且只有一次，它将与文本匹配。
[^abc]：除a，b或c以外的任何单个字符（^表示否定）
[a-zA-Z]：a到z，或A到Z，包括（范围）
[a-d[m-p]]：a到d，或m到p：[a-dm-p]（联合）
[a-z&&[def]]：它们中的任何一个（d，e或f）
[a-z&&[^bc]]：a到z，除了b和c：[ad-z]（减法）
[a-z&&[^m-p]]：a到z，除了m到p：[a-lq-z]（减法）

预定义字符类 – 元字符

这些就像在编写正则表达式时可以使用的短代码。

Construct   Description
.   ->  Any character (may or may not match line terminators)
\d  ->  A digit: [0-9]
\D  ->  A non-digit: [^0-9]
\s  ->  A whitespace character: [ \t\n\x0B\f\r]
\S  ->  A non-whitespace character: [^\s]
\w  ->  A word character: [a-zA-Z_0-9]
\W  ->  A non-word character: [^\w]

对于例如
Pattern.matches("\\d", "1");将返回true
Pattern.matches("\\D", "z");返回true
Pattern.matches(".p", "qp");返回true，点（.）代表任何字符

边界匹配器

^   Matches the beginning of a line.
$   Matches then end of a line.
\b  Matches a word boundary.
\B  Matches a non-word boundary.
\A  Matches the beginning of the input text.
\G  Matches the end of the previous match
\Z  Matches the end of the input text except the final terminator if any.
\z  Matches the end of the input text.

例如
Pattern.matches("^Hello$", "Hello")：返回true，以Hello开始和结束
Pattern.matches("^Hello$", "Namaste! Hello")：返回false，不以Hello开始
Pattern.matches("^Hello$", "Hello Namaste!")：返回false，不以Hello结束

量词

Greedy  Reluctant   Possessive  Matches
X?  X?? X?+ Matches X once, or not at all (0 or 1 time).
X*  X*? X*+ Matches X zero or more times.
X+  X+? X++ Matches X one or more times.
X{n}    X{n}?   X{n}+   Matches X exactly n times.
X{n,}   X{n,}?  X{n,}+  Matches X at least n times.
X{n, m) X{n, m)? X{n, m)+   Matches X at least n time, but at most m times.

几个例子

import java.util.regex.*;  
class RegexExample{  
public static void main(String args[]){  
   // It would return true if string matches exactly "tom"
   System.out.println(
     Pattern.matches("tom", "Tom")); //False

   /* returns true if the string matches exactly 
    * "tom" or "Tom"
    */
   System.out.println(
     Pattern.matches("[Tt]om", "Tom")); //True
   System.out.println(
     Pattern.matches("[Tt]om", "Tom")); //True

   /* Returns true if the string matches exactly "tim" 
    * or "Tim" or "jin" or "Jin"
    */
   System.out.println(
     Pattern.matches("[tT]im|[jJ]in", "Tim"));//True
   System.out.println(
     Pattern.matches("[tT]im|[jJ]in", "jin"));//True

   /* returns true if the string contains "abc" at 
    * any place
    */
   System.out.println(
     Pattern.matches(".*abc.*", "deabcpq"));//True

   /* returns true if the string does not have a 
    * number at the beginning
    */
   System.out.println(
     Pattern.matches("^[^\\d].*", "123abc")); //False
   System.out.println(
     Pattern.matches("^[^\\d].*", "abc123")); //True

   // returns true if the string contains of three letters
   System.out.println(
     Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "aPz"));//True
   System.out.println(
     Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "aAA"));//True
   System.out.println(
     Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "apZx"));//False

   // returns true if the string contains 0 or more non-digits
   System.out.println(
     Pattern.matches("\\D*", "abcde")); //True
   System.out.println(
     Pattern.matches("\\D*", "abcde123")); //False

   /* Boundary Matchers example
    * ^ denotes start of the line
    *  $denotes end of the line */ System.out.println( Pattern.matches("^This$ ", "This is Chaitanya")); //False
   System.out.println(
     Pattern.matches("^This $", "This")); //True System.out.println( Pattern.matches("^This$ ", "Is This Chaitanya")); //False
}
}