Java regex



In this tutorial, we discuss what a Java regular expression is and how to use it for pattern matching with the Pattern and Matcher classes, along with several examples. We also cover the special characters that Java regex patterns support.

What is a Regular expression (Java regex)?

A regular expression (regex) is a sequence of characters that defines a search pattern for text. The pattern can be a single character or a more complex sequence of characters. We can use Java regular expressions to search strings and to perform replace operations on them.

To use Java regular expressions, we import the java.util.regex package.

java.util.regex package

The java.util.regex package contains 1 interface and 3 classes as listed below:

  • MatchResult interface
  • Matcher class
  • Pattern class
  • PatternSyntaxException class
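
To see how these four types relate, here is a minimal sketch. The class name RegexPackageTour and the helpers isValidRegex and firstMatch are hypothetical names chosen for illustration; only the java.util.regex types come from the package itself.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexPackageTour {

  // Returns true if the regex compiles, false if Pattern.compile rejects it.
  static boolean isValidRegex(String regex) {
    try {
      Pattern.compile(regex);
      return true;
    } catch (PatternSyntaxException e) {
      return false;
    }
  }

  // Returns a snapshot of the first match as a MatchResult, or null if there is none.
  static MatchResult firstMatch(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.toMatchResult() : null;
  }

  public static void main(String[] args) {
    System.out.println(isValidRegex("[a-z]+")); // true
    System.out.println(isValidRegex("[a-z"));   // false: the character class is not closed

    MatchResult r = firstMatch("\\d+", "order 42 shipped");
    System.out.println(r.group() + " at " + r.start() + "-" + r.end()); // 42 at 6-8
  }
}
```

Pattern compiles the expression, Matcher runs it against an input, MatchResult exposes the details of a match, and PatternSyntaxException signals an invalid expression.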

Java Regular expression or RegEx Pattern

Pattern class

The Pattern class represents a compiled regular expression. Its static compile() method accepts the regular expression as an argument and returns a Pattern object that we can use to perform pattern matches.

Below are the commonly used methods of the Pattern class:

Method                                                      Description
Matcher matcher(CharSequence input)                         Creates a matcher that matches the input against this pattern
String pattern()                                            Returns the regular expression from which this pattern was compiled
String[] split(CharSequence input)                          Splits the input sequence around matches of this pattern
static Pattern compile(String regex)                        Compiles the regular expression into a pattern
static boolean matches(String regex, CharSequence input)    Compiles the regular expression and checks whether the entire input matches it
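
The methods in the table above can be tried out with a short sketch like the following. The class name PatternMethodsDemo, the helpers splitCsv and isAllDigits, and the sample inputs are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternMethodsDemo {

  // Splits a comma-separated line, ignoring whitespace around each comma.
  static String[] splitCsv(String line) {
    Pattern comma = Pattern.compile("\\s*,\\s*");
    return comma.split(line);
  }

  // Pattern.matches compiles the regex and requires the whole input to match.
  static boolean isAllDigits(String input) {
    return Pattern.matches("\\d+", input);
  }

  public static void main(String[] args) {
    Pattern p = Pattern.compile("\\s*,\\s*");
    System.out.println(p.pattern());                                  // \s*,\s*
    System.out.println(Arrays.toString(splitCsv("red , green,blue"))); // [red, green, blue]
    System.out.println(isAllDigits("2024"));                           // true
    System.out.println(isAllDigits("year 2024"));                      // false
  }
}
```

Pattern.matches is convenient for a one-off check. When the same expression is applied many times, compiling it once and reusing the Pattern object avoids recompiling it on every call.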

The compile method also has an overload that takes a flags parameter, which controls how the pattern match is performed:

  • Pattern.CASE_INSENSITIVE: Ignores the case of letters during the pattern search. By default, this applies only to ASCII letters
  • Pattern.LITERAL: Treats the special characters as ordinary characters during the pattern search
  • Pattern.UNICODE_CASE: Used along with CASE_INSENSITIVE to ignore the case of letters outside the ASCII range as well.
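
The effect of each flag can be sketched as follows. The class name PatternFlagsDemo and the helper containsWithFlags are hypothetical; the non-ASCII letters are written as Unicode escapes so that the source file compiles regardless of its encoding.

```java
import java.util.regex.Pattern;

public class PatternFlagsDemo {

  // Compiles the regex with the given flags and searches the input for a match.
  static boolean containsWithFlags(String regex, int flags, String input) {
    return Pattern.compile(regex, flags).matcher(input).find();
  }

  public static void main(String[] args) {
    // Without flags the search is case-sensitive.
    System.out.println(containsWithFlags("java", 0, "Welcome to Java"));                        // false
    System.out.println(containsWithFlags("java", Pattern.CASE_INSENSITIVE, "Welcome to Java")); // true

    // LITERAL turns the dot into an ordinary character.
    System.out.println(containsWithFlags("a.b", 0, "axb"));               // true
    System.out.println(containsWithFlags("a.b", Pattern.LITERAL, "axb")); // false
    System.out.println(containsWithFlags("a.b", Pattern.LITERAL, "a.b")); // true

    // Lowercase and uppercase e with an acute accent, which are outside ASCII.
    String lower = "\u00e9";
    String upper = "\u00c9";
    System.out.println(containsWithFlags(lower, Pattern.CASE_INSENSITIVE, upper)); // false
    // Flags are combined with the bitwise OR operator.
    System.out.println(containsWithFlags(lower,
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upper));                  // true
  }
}
```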

Matcher class

The Matcher class implements the MatchResult interface and performs pattern matches on a sequence of characters. We can create a Matcher object using the matcher method on the Pattern object.

Below are some commonly used methods of the Matcher class:

Method                                   Description
int start()                              Returns the start index of the previous match
int end()                                Returns the offset after the last character matched
boolean find()                           Finds the next subsequence of the input that matches the pattern
boolean find(int start)                  Resets the matcher and finds the next matching subsequence, starting at the specified index
String group()                           Returns the input subsequence matched by the previous match
int groupCount()                         Returns the number of capturing groups in the matcher's pattern
boolean matches()                        Attempts to match the entire region against the pattern
Pattern pattern()                        Returns the pattern that this matcher interprets
Matcher region(int start, int end)       Sets the limits of the region of the input that the matcher searches
String replaceAll(String replacement)    Replaces every subsequence that matches the pattern with the given replacement string
Matcher reset()                          Resets the matcher
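
A short sketch of several Matcher methods working together is shown below. The class name MatcherMethodsDemo, the helpers describePairs and maskDigits, and the key=value input format are assumptions made for illustration. The sketch also uses group(int), which returns the text captured by a numbered group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsDemo {

  private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\d+)");

  // Describes every key=value pair: the two captured groups plus the match offsets.
  static List<String> describePairs(String input) {
    Matcher m = PAIR.matcher(input);
    List<String> found = new ArrayList<>();
    while (m.find()) {
      // start() is inclusive, end() is exclusive.
      found.add(m.group(1) + " -> " + m.group(2) + " [" + m.start() + "," + m.end() + ")");
    }
    return found;
  }

  // Replaces every digit in the input with '#'.
  static String maskDigits(String input) {
    return Pattern.compile("\\d").matcher(input).replaceAll("#");
  }

  public static void main(String[] args) {
    System.out.println(PAIR.matcher("").groupCount());  // 2
    System.out.println(describePairs("x=10, y=250"));   // [x -> 10 [0,4), y -> 250 [6,11)]
    System.out.println(maskDigits("call 555-0199"));    // call ###-####
  }
}
```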

Regular Expression Patterns

A pattern can match letters, digits, or any other characters in an input string. The compile method of the Pattern class accepts the regular expression as its first parameter. The common character classes are listed below:

Pattern          Description
[abc]            Matches any one of the characters listed in the brackets
[^abc]           Matches any character except those listed in the brackets
[0-9]            Matches any character in the range 0 to 9
[a-zA-Z]         Matches any letter from a to z, in either case
[a-g[k-r]]       Matches any character from a to g or from k to r (union)
[a-z&&[lmn]]     Matches l, m, or n, the characters common to both sets (intersection)
[a-z&&[^de]]     Matches any character from a to z except d and e (subtraction)
[a-z&&[^h-k]]    Matches any character from a to z except h through k (subtraction)
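
The union, intersection, and subtraction forms are easiest to understand by testing single characters against them. The following is a minimal sketch; the class name CharacterClassDemo and the helper matchesClass are hypothetical.

```java
import java.util.regex.Pattern;

public class CharacterClassDemo {

  // Returns true if the single character c belongs to the given character class.
  static boolean matchesClass(String charClass, char c) {
    return Pattern.matches(charClass, String.valueOf(c));
  }

  public static void main(String[] args) {
    // Negation: anything except a, b, or c.
    System.out.println(matchesClass("[^abc]", 'z'));        // true
    System.out.println(matchesClass("[^abc]", 'a'));        // false

    // Union: a to g, or k to r.
    System.out.println(matchesClass("[a-g[k-r]]", 'm'));    // true
    System.out.println(matchesClass("[a-g[k-r]]", 'i'));    // false

    // Intersection: only l, m, and n remain.
    System.out.println(matchesClass("[a-z&&[lmn]]", 'm'));  // true
    System.out.println(matchesClass("[a-z&&[lmn]]", 'a'));  // false

    // Subtraction: a to z without d and e, then without h through k.
    System.out.println(matchesClass("[a-z&&[^de]]", 'd'));  // false
    System.out.println(matchesClass("[a-z&&[^de]]", 'f'));  // true
    System.out.println(matchesClass("[a-z&&[^h-k]]", 'j')); // false
    System.out.println(matchesClass("[a-z&&[^h-k]]", 'l')); // true
  }
}
```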

Metacharacters

We can also use metacharacters, which have a special meaning, as part of a regular expression pattern.

Metacharacter    Description
|                Matches any one of the alternatives separated by |
.                Matches any single character (by default, except line terminators)
^                Matches at the beginning of the input
$                Matches at the end of the input
\d               Matches a digit, equivalent to [0-9]
\s               Matches a whitespace character
\b               Matches at a word boundary, i.e. the beginning or end of a word
\uxxxx           Matches the Unicode character with the hexadecimal value xxxx
\D               Matches any non-digit, equivalent to [^0-9]
\S               Matches any non-whitespace character, equivalent to [^\s]
\w               Matches any word character, equivalent to [a-zA-Z_0-9]
\W               Matches any non-word character, equivalent to [^\w]
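
The sketch below exercises several of these metacharacters. The class name MetacharacterDemo and the helpers contains and wholeMatch are hypothetical. Note that inside a Java string literal each backslash of the regular expression must be doubled.

```java
import java.util.regex.Pattern;

public class MetacharacterDemo {

  // True if the regex matches somewhere in the input.
  static boolean contains(String regex, String input) {
    return Pattern.compile(regex).matcher(input).find();
  }

  // True if the regex matches the entire input.
  static boolean wholeMatch(String regex, String input) {
    return Pattern.matches(regex, input);
  }

  public static void main(String[] args) {
    System.out.println(contains("cat|dog", "hot dog"));       // true: second alternative
    System.out.println(contains("c.t", "cut"));               // true: dot matches 'u'
    System.out.println(wholeMatch("\\d\\d", "42"));           // true: two digits
    System.out.println(wholeMatch("\\D+", "abc"));            // true: no digits at all
    System.out.println(wholeMatch("\\w+", "snake_case9"));    // true: letters, '_' and digits
    System.out.println(wholeMatch("\\w+", "two words"));      // false: space is not a word character
    System.out.println(contains("\\s", "two words"));         // true: contains a space
    System.out.println(wholeMatch("\\S+", "two words"));      // false: contains whitespace
    System.out.println(contains("\\W", "snake_case9"));       // false: only word characters
    System.out.println(contains("\\u0041", "A"));             // true: hexadecimal 0041 is 'A'
  }
}
```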

Quantifiers

We can use quantifiers to define the quantity or number of occurrences of the specified character in the regular expression pattern.

Quantifier    Description
a+            a occurs one or more times
a*            a occurs zero or more times
a?            a occurs zero times or once
a{n}          a occurs exactly n times
a{n,}         a occurs n or more times
a{m,n}        a occurs at least m times but not more than n times
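
The counted forms in the table can be checked with a small sketch. The class name QuantifierDemo and the helper fullMatch are hypothetical. Note that both bounds of {m,n} are inclusive.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {

  // True if the entire input matches the regex.
  static boolean fullMatch(String regex, String input) {
    return Pattern.matches(regex, input);
  }

  public static void main(String[] args) {
    System.out.println(fullMatch("a{3}", "aaa"));      // true: exactly three
    System.out.println(fullMatch("a{3}", "aa"));       // false: only two
    System.out.println(fullMatch("a{2,}", "aaaa"));    // true: two or more
    System.out.println(fullMatch("a{2,}", "a"));       // false: fewer than two
    System.out.println(fullMatch("a{2,4}", "aa"));     // true: lower bound is inclusive
    System.out.println(fullMatch("a{2,4}", "aaaa"));   // true: upper bound is inclusive
    System.out.println(fullMatch("a{2,4}", "aaaaa"));  // false: more than four
    System.out.println(fullMatch("a?", ""));           // true: zero occurrences allowed
    System.out.println(fullMatch("a*", ""));           // true: zero occurrences allowed
    System.out.println(fullMatch("a+", ""));           // false: at least one required
  }
}
```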

Java Regular expressions examples

Now, let’s look at several examples that demonstrate different Java regex patterns.

Example: Find a string

Below is a simple example that searches the input text for the string “java”. It compiles the pattern with the Pattern.CASE_INSENSITIVE flag, creates a Matcher with the matcher method, and calls find(), which returns true if the pattern occurs anywhere in the input and false otherwise.

import java.util.regex.*;

public class RegExDemo {

  public static void main(String[] args) {
    Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher("Welcome to Java tutorial");
    
    boolean bfound = m.find();
    if(bfound)
      System.out.println("Pattern found");
    else
      System.out.println("Pattern not found");
  }

}

Output:
Pattern found

Example: Different ways of writing a regular expression

There are different ways of applying a regular expression in Java. The first approach creates the Pattern and the Matcher in separate statements and then calls the matches method. The second approach chains the same calls in a single statement, while the third uses only the static Pattern.matches method.

In this example, the pattern matches any four-character string whose second character is ‘a’; each dot stands for any single character.

import java.util.regex.*;
public class RegExDemo2 {

  public static void main(String[] args) {
    Pattern p = Pattern.compile(".a..");
    Matcher m = p.matcher("java");
    System.out.println(m.matches());
    
    boolean b = Pattern.compile(".a..").matcher("java").matches();
    System.out.println(b);
    
    boolean bm = Pattern.matches(".a..", "java");
    System.out.println(bm);

  }

}

Output:
true
true
true

Example: Regular expression pattern using . (dot)

The example below demonstrates the . (dot) character, which stands for any single character. The 1st output is true since “hi” has two characters and the 2nd one is ‘i’. The 2nd output is false since the 2nd character of “at” is not ‘i’. The 3rd output is false since “java” has four characters while the pattern matches exactly three. The last two outputs are true since “hi” is two characters long and starts with ‘h’, and “bye” is three characters long and ends with ‘e’.

import java.util.regex.*;
public class RegExDemo3 {

  public static void main(String[] args) {
    System.out.println(Pattern.matches(".i", "hi"));
    System.out.println(Pattern.matches(".i", "at"));
    System.out.println(Pattern.matches(".a.", "java"));
    System.out.println(Pattern.matches("h.", "hi"));
    System.out.println(Pattern.matches("..e", "bye"));

  }

}

Output:
true
false
false
true
true

Example: Regular expression character class

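In this example, we use character classes in the pattern. Pattern.matches returns true only if the entire input string matches the pattern. The 1st output is false since [abc] matches exactly one character while “bag” has three. The 2nd and 3rd outputs are true since every character falls in the corresponding class. The 4th output is true since the .* on both sides allows any text around “come”. The last output is false since matching is case-sensitive by default.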

import java.util.regex.*;
public class RegExDemo4 {

  public static void main(String[] args) {
    System.out.println(Pattern.matches("[abc]", "bag"));
    System.out.println(Pattern.matches("[abc]", "a"));
    System.out.println(Pattern.matches("[a-c][p-u]", "ar"));
    System.out.println(Pattern.matches(".*come.*", "welcome"));
    System.out.println(Pattern.matches("java", "Java"));
  }

}

Output:
false
true
true
true
false
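
Since Pattern.matches requires the whole input to match, it helps to compare it with the other matching modes of the Matcher class. The following sketch contrasts matches(), lookingAt(), and find(); the class name MatchModesDemo and its helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class MatchModesDemo {

  // The entire input must match the regex.
  static boolean wholeMatch(String regex, String input) {
    return Pattern.compile(regex).matcher(input).matches();
  }

  // The regex must match at the start of the input, but may leave a remainder.
  static boolean prefixMatch(String regex, String input) {
    return Pattern.compile(regex).matcher(input).lookingAt();
  }

  // The regex may match anywhere in the input.
  static boolean anywhere(String regex, String input) {
    return Pattern.compile(regex).matcher(input).find();
  }

  public static void main(String[] args) {
    // "bag" starts with a character from [abc] but is longer than one character.
    System.out.println(wholeMatch("[abc]", "bag"));  // false
    System.out.println(prefixMatch("[abc]", "bag")); // true
    System.out.println(anywhere("[abc]", "bag"));    // true

    // "xa" contains 'a', but not at the start.
    System.out.println(prefixMatch("[abc]", "xa"));  // false
    System.out.println(anywhere("[abc]", "xa"));     // true

    // "dog" contains none of a, b, c.
    System.out.println(anywhere("[abc]", "dog"));    // false
  }
}
```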

Example: Regular expression quantifier

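In the example below, we use several quantifiers: ‘?’ matches the preceding element zero times or once, ‘+’ matches it one or more times, and ‘*’ matches it zero or more times. The 2nd and 4th outputs are false since “hello” and “java” contain characters outside [lmn], and Pattern.matches requires the entire input to match.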

import java.util.regex.*;
public class RegExDemo5 {

  public static void main(String[] args) {
    System.out.println(Pattern.matches("[lmn]?", "l"));
    System.out.println(Pattern.matches("[lmn]?", "hello"));
    System.out.println(Pattern.matches("[lmn]+", "llmmn"));
    System.out.println(Pattern.matches("[lmn]*", "java"));
    System.out.println(Pattern.matches("[lmn]*", "lln"));
  }

}

Output:
true
false
true
false
true

Example: Find multiple occurrences using the matcher method

The example below finds multiple occurrences of a pattern in the input string by calling find() in a loop on the Matcher returned by the matcher method. It prints the start index (inclusive) and the end index (exclusive) of each occurrence of the character ‘a’.

import java.util.regex.*;
public class RegExDemo6 {

  public static void main(String[] args) {
    Pattern p = Pattern.compile("a");
    Matcher m = p.matcher("Welcome to java tutorial");
    
    while(m.find()) {
      System.out.println("Occurs at: " + m.start() + " - " + m.end());
    }

  }

}

Output:
Occurs at: 12 - 13
Occurs at: 14 - 15
Occurs at: 22 - 23

Example: Boundary matches

This example checks for boundary matches using the anchors ^ (beginning of the input) and $ (end of the input). The 1st output is true since the whole input is “Java”. The 2nd output is false since the input does not begin with “Java”. Note that Pattern.matches already requires the entire input to match, so the anchors are redundant here; they matter when searching with find().

import java.util.regex.*;
public class RegExDemo7 {

  public static void main(String[] args) {
    System.out.println(Pattern.matches("^Java$","Java"));
    System.out.println(Pattern.matches("^Java$","Welcome to java"));
    
  }

}

Output:
true
false
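
Anchors and word boundaries become more useful with find(), which searches anywhere in the input. Here is a minimal sketch; the class name BoundaryDemo and its helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {

  // True if the input begins with "Java".
  static boolean startsWithJava(String input) {
    return Pattern.compile("^Java").matcher(input).find();
  }

  // True if the input ends with "Java".
  static boolean endsWithJava(String input) {
    return Pattern.compile("Java$").matcher(input).find();
  }

  // True if "Java" occurs as a whole word, not as part of a longer word.
  static boolean hasWordJava(String input) {
    return Pattern.compile("\\bJava\\b").matcher(input).find();
  }

  public static void main(String[] args) {
    System.out.println(startsWithJava("Java tutorial"));         // true
    System.out.println(startsWithJava("Welcome to Java"));       // false
    System.out.println(endsWithJava("Welcome to Java"));         // true
    System.out.println(endsWithJava("Java tutorial"));           // false
    System.out.println(hasWordJava("Welcome to Java tutorial")); // true
    System.out.println(hasWordJava("JavaScript tutorial"));      // false: no boundary after "Java"
  }
}
```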

Example: Regular expression with digits

This example uses the \d metacharacter in the regular expression; inside a Java string literal the backslash is doubled. The pattern matches the word “Java” followed by exactly one digit. Hence the first two outputs are true, while the last output is false since “JavaScript” has no digit after “Java”.

import java.util.regex.*;
public class RegExDemo8 {

  public static void main(String[] args) {
    String regex = "Java\\d";
    System.out.println(Pattern.matches(regex, "Java5"));
    System.out.println(Pattern.matches(regex, "Java8"));
    System.out.println(Pattern.matches(regex, "JavaScript"));
    
  }

}

Output:
true
true
false

Example: Using logical operators in regular expression pattern

We can also combine patterns in ways that behave like the logical operators AND and OR. Concatenating elements in a pattern acts like AND, since each element must match in sequence. For example, in the code below, the output is true if the first character is ‘C’ or ‘c’ and the second character is ‘h’, followed by any characters. Hence the first two outputs are true and the last output is false.

import java.util.regex.*;
public class RegExDemo9 {

  public static void main(String[] args) {
    String regex = "[Cc][h].*";
    String s = "cheque";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(s);
    
    System.out.println(m.matches());
    
    s = "Chart";
    m = p.matcher(s);
    System.out.println(m.matches());

    s = "color";
    m = p.matcher(s);
    System.out.println(m.matches());
  }

}

Output:
true
true
false

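We can express OR with the ‘|’ symbol, which matches if any one of the alternatives matches. In this example, the output is true if the input string contains the text “Java” or “JavaScript”. Since any string containing “JavaScript” also contains “Java”, the second alternative is redundant here and is included only to show the syntax.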

import java.util.regex.*;
public class RegExDemo10 {

  public static void main(String[] args) {
    
    String regex = ".*Java.*|.*JavaScript.*";
    String s = "Welcome to Java tutorial";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(s);
    System.out.println(m.matches());
    
    s = "JavaScript tutorial";
    m = p.matcher(s);
    System.out.println(m.matches());
    
    s = "C tutorial";
    m = p.matcher(s);
    System.out.println(m.matches());
  }

}

Output:
true
true
false

The above two examples also illustrate how to check for a prefix or a substring with matches(): placing .* before or after the text allows any characters on that side of it.
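
An alternative way to search for a substring is to call find(), which does not need the surrounding .* and can also report which alternative matched. The following is a minimal sketch; the class name KeywordFinder and the helper firstKeyword are hypothetical. The longer alternative is listed first because alternatives are tried from left to right.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordFinder {

  // "JavaScript" comes first so that it is not cut short to "Java".
  private static final Pattern KEYWORD = Pattern.compile("JavaScript|Java");

  // Returns the first keyword found in the input, or null if there is none.
  static String firstKeyword(String input) {
    Matcher m = KEYWORD.matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    System.out.println(firstKeyword("Welcome to Java tutorial")); // Java
    System.out.println(firstKeyword("JavaScript tutorial"));      // JavaScript
    System.out.println(firstKeyword("C tutorial"));               // null
  }
}
```

With the alternatives in the opposite order (Java|JavaScript), the second input would report “Java”, since the first alternative already matches at that position.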

Conclusion

In this tutorial, we learned how to perform pattern matching with Java regular expressions using the Pattern and Matcher classes, along with examples of special characters, quantifiers, and substring searches.
