How to match repeated patterns?

I would like to match:

some.name.separated.by.dots

But I don't have any idea how.

I can match a single part like this:

 \w+\.

How can I say "repeat that"?

4 Answers

Try the following:

\w+(?:\.\w+)+

The + after (?: ... ) tells it to match what is inside the parentheses one or more times.

Note that in Java (and some other regex flavors) \w matches only ASCII characters by default, so a word like café wouldn't be matched in full by \w+, let alone text in non-Latin scripts.
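To illustrate the ASCII limitation, here is a minimal sketch, assuming Java 7 or later, where the Pattern.UNICODE_CHARACTER_CLASS flag makes \w follow the Unicode definitions. The class and method names (UnicodeWordDemo, isAsciiWord, isUnicodeWord) are made up for this example.

```java
import java.util.regex.Pattern;

class UnicodeWordDemo {
    // Default Java behaviour: \w is limited to [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // With UNICODE_CHARACTER_CLASS (Java 7+), \w also covers non-ASCII letters.
    static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    // True only if the whole string consists of ASCII word characters.
    static boolean isAsciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    // True if the whole string consists of Unicode word characters.
    static boolean isUnicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "café", written with an escape to sidestep source-encoding issues
        String word = "caf\u00e9";
        System.out.println(isAsciiWord(word));   // false
        System.out.println(isUnicodeWord(word)); // true
    }
}
```

The same flag could be passed when compiling \w+(?:\.\w+)+ if the dotted names may contain non-ASCII letters.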

EDIT

The difference between [...] and (?:...) is that [...] always matches a single character. It is called a "character set" or "character class". So, [abc] does not match the string "abc", but matches one of the characters a, b or c.

The fact that \w+[\.\w+]* also matches your string is because [\.\w+] matches a ., a character from \w, or a literal + (inside a character class, + is not a quantifier), and that single character is then repeated zero or more times by the * after it. But \w+[\.\w+]* will therefore also match strings like aaaaa or aaa............

The (?:...) is, as I already mentioned, simply used to group characters (and possibly repeat those groups).
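The difference can be checked directly. The following is a small sketch (ClassVersusGroupDemo and its helper names are invented for illustration) that runs both patterns against a few inputs using matches(), which requires the whole string to match:

```java
import java.util.regex.Pattern;

class ClassVersusGroupDemo {
    // Character class: each repetition consumes ONE char that is '.', a word char, or '+'.
    static final Pattern WITH_CLASS = Pattern.compile("\\w+[\\.\\w+]*");

    // Non-capturing group: each repetition consumes a '.' followed by at least one word char.
    static final Pattern WITH_GROUP = Pattern.compile("\\w+(?:\\.\\w+)+");

    static boolean classMatches(String s) {
        return WITH_CLASS.matcher(s).matches();
    }

    static boolean groupMatches(String s) {
        return WITH_GROUP.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "some.name.separated.by.dots", // class: true,  group: true
            "aaaaa",                       // class: true,  group: false
            "aaa............",             // class: true,  group: false
            "a+b"                          // class: true,  group: false
        };
        for (String s : inputs) {
            System.out.println(s + " -> class=" + classMatches(s)
                    + ", group=" + groupMatches(s));
        }
    }
}
```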

More info on character sets:

More info on groups:

EDIT II

Here's an example in Java (seeing you post mostly Java answers):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "some.text.here only but not Some other "
                + "there some.name.separated.by.dots and.we are done!";
        Pattern p = Pattern.compile("\\w+(?:\\.\\w+)+");
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

which will produce:

some.text.here
some.name.separated.by.dots
and.we

Note that m.group(0) and m.group() are equivalent: both mean "the entire match".

This will also work:

(\w+(\.|$))+
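One thing worth checking is that this pattern appears to be more permissive than \w+(?:\.\w+)+, because every part may end with either a period or the end of the input. A rough sketch in Java (the class name DotOrEndDemo and the method accepts are made up for this example):

```java
import java.util.regex.Pattern;

class DotOrEndDemo {
    // Each repetition: one or more word chars, then either a '.' or the end of input.
    static final Pattern DOT_OR_END = Pattern.compile("(\\w+(\\.|$))+");

    // True if the whole string matches the pattern.
    static boolean accepts(String s) {
        return DOT_OR_END.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("some.name.separated.by.dots")); // true
        System.out.println(accepts("single"));                      // true: no dot needed
        System.out.println(accepts("trailing.dot."));               // true: trailing dot allowed
        System.out.println(accepts("two..dots"));                   // false: empty part
    }
}
```

If single words or a trailing period should be rejected, a pattern that requires a word after every period is probably the safer choice.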

You can use ? to match 0 or 1 of the preceding part, * to match 0 or more of the preceding part, and + to match at least one of the preceding part.

So (\w\.)? will match w. and the empty string, (\w\.)* will match r.2.5.3.1.s.r.g.s. and the empty string, and (\w\.)+ will match any of the above but not the empty string.

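If you want to match something like your example, you'll need to do (\w+\.)+, which means 'match at least one word character, then a period, and match at least one of these'. Note that every part must end in a period, so on some.name.separated.by.dots this stops after "by." and leaves the final "dots" unmatched.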
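The quantifier behaviour described above can be tried out with a short Java sketch. QuantifierDemo, fullMatch and firstMatch are hypothetical names chosen for this example; Pattern.matches and Matcher.find are the standard java.util.regex calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // True if the regex matches the ENTIRE input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Returns the first substring matched by the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("(\\w\\.)?", "w."));                 // true
        System.out.println(fullMatch("(\\w\\.)?", ""));                   // true
        System.out.println(fullMatch("(\\w\\.)*", "r.2.5.3.1.s.r.g.s.")); // true
        System.out.println(fullMatch("(\\w\\.)*", ""));                   // true
        System.out.println(fullMatch("(\\w\\.)+", ""));                   // false

        // (\w+\.)+ needs a period after every part, so the last part is dropped.
        System.out.println(firstMatch("(\\w+\\.)+", "some.name.separated.by.dots"));
        // prints: some.name.separated.by.
    }
}
```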

(\w+\.)+

Apparently, the body has to be at least 30 characters. I hope this is enough.
