
 

 

 

 

Python - Regular Expression 

 

A regular expression is a standard notation for describing a pattern in a string. It is most often used to find substrings that fit a pattern. You can easily google a lot of information on regular expressions (I think Wikipedia : Regular Expression is a pretty good start), but it is almost impossible to understand the details without practicing on your own.

I will just keep posting a bunch of examples as my own cheatsheet and practice. I hope this helps you as well.

 

 

 

Importing Regular Expression Package

 

import re

 

 

 

Finding Patterns

 

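There are several ways to find a pattern with a regular expression: search(), match() and findall(). search() returns the first match anywhere in the string, match() only tries at the beginning of the string, and findall() returns all non-overlapping matches as a list of strings.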

 

>>> re.search(r'\d','1 little 10 little 1000 little indians')

 

<re.Match object; span=(0, 1), match='1'>

 

 

>>> re.search(r'\d','1 little 10 little 1000 little indians').group()

 

'1'

 

 

>>> p = re.compile(r'\d')

>>> p.findall('1 little 10 little 1000 little indians')

 

['1', '1', '0', '1', '0', '0', '0']

 

 

>>> re.findall(r'\d','1 little 10 little 1000 little indians')

 

['1', '1', '0', '1', '0', '0', '0']
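The three functions differ in where they look and in what they return. Here is a minimal sketch of that difference, using the same sentence as above (the variable names are only for illustration):

```python
import re

text = '1 little 10 little 1000 little indians'

# search(): scans the whole string and returns the first match (or None)
first = re.search(r'\d+', text)

# match(): only tries at the very beginning of the string
at_start = re.match(r'\d+', text)
not_at_start = re.match(r'little', text)   # 'little' is not at position 0

# findall(): returns every non-overlapping match as a list of strings
all_numbers = re.findall(r'\d+', text)

print(first.group())      # 1
print(at_start.group())   # 1
print(not_at_start)       # None
print(all_numbers)        # ['1', '10', '1000']
```

search() and match() return a match object (or None), so .group() is needed to get the text. findall() returns plain strings directly.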

 

 

 

Find All Integers

 

>>> re.findall(r'\d','1 little 10 little 1.234 little indians')

 

['1', '1', '0', '1', '2', '3', '4']

 

 

>>> re.findall(r'\d.','1 little 10 little 1.234 little indians')

 

['1 ', '10', '1.', '23', '4 ']

 

 

>>> re.findall(r'\d?','1 little 10 little 1.234 little indians')

['1', '', '', '', '', '', '', '', '', '1', '0', '', '', '', '', '', '', '', '', '1', '', '2', '3', '4', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

 

 

>>> re.findall(r'\d??','1 little 10 little 1.234 little indians')

 

['', '1', '', '', '', '', '', '', '', '', '', '1', '', '0', '', '', '', '', '', '', '', '', '', '1', '', '', '2', '', '3', '', '4', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
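The difference between the last two outputs comes from greedy versus lazy matching. \d? is greedy: it takes a digit whenever one is available and falls back to an empty match otherwise. \d?? is lazy: it tries the empty match first, which is why an extra '' shows up before each digit. The same idea is easier to see with quantifiers that can match more than one character. A small sketch (the test strings are my own):

```python
import re

digits = '12345'

# Greedy: take as many digits as the quantifier allows
greedy = re.search(r'\d{2,4}', digits).group()

# Lazy (extra ?): take as few digits as the quantifier allows
lazy = re.search(r'\d{2,4}?', digits).group()

print(greedy)   # 1234
print(lazy)     # 12

# The same idea with + : one long match versus many short ones
print(re.findall(r'\d+', '1000'))    # ['1000']
print(re.findall(r'\d+?', '1000'))   # ['1', '0', '0', '0']
```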

 

 

>>> re.findall(r'\d{2}','1 little 10 little 1.234 little indians')

 

['10', '23']

 

 

>>> re.findall(r'\d{3}','1 little 10 little 1.234 little indians')

 

['234']

 

 

>>> re.findall(r'\d+','1 little 10 little 1000 little indians')

 

['1', '10', '1000']
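One thing to watch: on the sentence containing 1.234, \d+ also returns the pieces on both sides of the dot. If only standalone integers are wanted, lookarounds are one possible way to exclude them. The sketch below assumes a rule of my own (a run of digits counts as an integer only when it does not touch another digit or a dot), so treat it as an illustration rather than a general solution:

```python
import re

text = '1 little 10 little 1.234 little indians'

# \d+ alone also picks up the pieces of 1.234
print(re.findall(r'\d+', text))   # ['1', '10', '1', '234']

# Assumed rule for this sketch: a digit run is an integer only if it is
# not preceded or followed by another digit or a dot.
integer_pattern = re.compile(r'(?<![\d.])\d+(?![\d.])')

# findall() returns strings, so convert them with int()
integers = [int(s) for s in integer_pattern.findall(text)]
print(integers)                   # [1, 10]
```

With this rule an integer directly followed by a sentence-ending period (as in 'I have 10.') would be skipped too, so the lookahead would need adjusting for that kind of text.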

 

 

 

Find All Floating Numbers

 

>>> re.findall(r'\d?\.[0-9]*','1 little 10 little 1.234 little indians')

 

['1.234']

 

 

>>> re.findall(r'\d?\.[0-9]*','1 little -1.234 little 1.234 little indians')

 

['1.234', '1.234']

 

 

>>> re.findall(r'[-+]?[0-9]+.?[0-9]*','1 little -1.234 little 1.234 little indians')

 

['1 ', '-1.234', '1.234']

 

 

>>> re.findall(r'[-+]?[0-9]+\.?[0-9]*','1 little -1.234 little 1.234 little indians')

 

['1', '-1.234', '1.234']

 

 

>>> re.findall(r'[-+]?[0-9]+[\.]?[0-9]*','1 little -1.234 little 1.234 little indians')

 

['1', '-1.234', '1.234']
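The patterns above require at least one digit before the dot, so a number written like .5 is not matched completely. Below is a slightly more tolerant sketch. The pattern is my own assumption of what should count as a number (optional sign, optional exponent), not a standard definition:

```python
import re

# Hypothetical pattern for this sketch: optional sign, then one of
#   digits '.' optional digits   (1.234, 3.)
#   '.' digits                   (.5)
#   digits                       (10)
# optionally followed by an exponent (e3, E-2).
number_pattern = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?')

print(number_pattern.findall('1 little -1.234 little 1.234 little indians'))
# ['1', '-1.234', '1.234']

print(number_pattern.findall('x = .5, y = 2e3, z = -3.'))
# ['.5', '2e3', '-3.']
```

The (?: ... ) groups are non-capturing. With ordinary capturing groups, findall() would return the group contents instead of the whole match.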

 

 

 

Find All Numbers (Integer and Floating numbers)

 

 

>>> re.findall(r'[-+]?[0-9]+\.?[0-9]*','1 little -1.234 little 1.234 little indians')

 

['1', '-1.234', '1.234']
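findall() always returns strings. If the actual numeric values are needed, the strings can be converted afterwards. A minimal sketch, where parse_numbers is a hypothetical helper name of my own and the pattern is the one with the escaped dot (\.):

```python
import re

def parse_numbers(text):
    """Return the numbers found in text as int or float values."""
    tokens = re.findall(r'[-+]?[0-9]+\.?[0-9]*', text)
    # Tokens containing a dot become float, the rest become int
    return [float(t) if '.' in t else int(t) for t in tokens]

values = parse_numbers('1 little -1.234 little 1.234 little indians')
print(values)   # [1, -1.234, 1.234]
```

The dot has to be escaped here. With an unescaped dot the first token would be '1 ' (with a trailing space), because . matches any character.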

 

 

 

Finding a Time/Date Pattern

 

 

>>> re.findall(r'\d{2}\:\d{2}\:\d{2}\.\d{3}','now time is 11:23:45.123 or 11.23.45:123 or 11-23-45.123')

 

['11:23:45.123']

 

 

>>> re.findall(r'\d{2}\:\d{2}\:\d{2}\.\d{3}','now time is 11:23:45.1 or 11:23:45.12 or 11:23:45.123')

 

['11:23:45.123']

 

 

>>> re.findall(r'\d{2}\:\d{2}\:\d{2}\.\d+','now time is 11:23:45.1 or 11:23:45.12 or 11:23:45.123')

 

['11:23:45.1', '11:23:45.12', '11:23:45.123']
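Once the pattern finds a timestamp, the individual fields are usually needed as well. Named groups are one way to pull them out. In this sketch the group names (hour, minute, second, fraction) are my own choice:

```python
import re

# Named groups (?P<name>...) label each part of the timestamp
time_pattern = re.compile(
    r'(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})\.(?P<fraction>\d+)'
)

m = time_pattern.search('now time is 11:23:45.123')
if m:
    print(m.group())          # 11:23:45.123  (the whole match)
    print(m.group('hour'))    # 11
    print(m.groupdict())      # all named fields as a dict of strings
```

The colon does not need a backslash in a regular expression, so : and \: behave the same here. The pattern only checks the shape of the text, so something like 99:99:99.1 would match as well.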

 

 

 

Match the pattern

 

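# ^ : start of the string

# $ : end of the string

# [...] : character set (matches any one of the listed characters)

# * : zero or more repetitions of the preceding item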

 

import re

 

p = re.compile('^[a-z]*$')

print("re.compile('^[a-z]*$');p.match('helloworld')", "\n\t" , p.match('helloworld'))

 

p = re.compile('^[a-z]*$')

print("re.compile('^[a-z]*$');p.match('hello world')", "\n\t"  , p.match('hello world'))

 

p = re.compile(r'^[a-z\s]*$')

print(r"re.compile('^[a-z\s]*$');p.match('hello world')", "\n\t" , p.match('hello world'))

 

p = re.compile('^[a-z]*$')

print("re.compile('^[a-z]*$');p.match('HelloWorld')", "\n\t"  , p.match('HelloWorld'))

 

p = re.compile('^[a-zA-Z]*$')

print("re.compile('^[a-zA-Z]*$');p.match('HelloWorld')", "\n\t"  , p.match('HelloWorld'))

 

p = re.compile(r'^[a-zA-Z\s]*$')

print(r"re.compile('^[a-zA-Z\s]*$');p.match('Hello World')", "\n\t"  , p.match('Hello World'))


p = re.compile(r'^[a-zA-Z\s]*$')

print(r"re.compile('^[a-zA-Z\s]*$');p.match('Hello1234 World')", "\n\t"  , p.match('Hello1234 World'))


p = re.compile(r'^[a-zA-Z0-9\s]*$')

print(r"re.compile('^[a-zA-Z0-9\s]*$');p.match('Hello1234 World')", "\n\t"  , p.match('Hello1234 World'))


p = re.compile(r'^[a-zA-Z0-9\s]*$')

print(r"re.compile('^[a-zA-Z0-9\s]*$');p.match('###Hello1234 World')", "\n\t"  , p.match('###Hello1234 World'))


p = re.compile(r'[a-zA-Z0-9#\s]*$')

print(r"re.compile('[a-zA-Z0-9#\s]*$');p.match('###Hello1234 World')", "\n\t"  , p.match('###Hello1234 World'))


p = re.compile(r'.*[a-zA-Z0-9\s]*$')

print(r"re.compile('.*[a-zA-Z0-9\s]*$');p.match('###Hello1234 World')", "\n\t"  , p.match('###Hello1234 World'))

 

 

Result :----------------------------------------------------------------------

 

// ^[a-z]*$ <-- lowercase letters only, any number of them, no spaces

// 'helloworld' matches this pattern

re.compile('^[a-z]*$');p.match('helloworld')

     <_sre.SRE_Match object; span=(0, 10), match='helloworld'>

 

// ^[a-z]*$ <-- lowercase letters only, any number of them, no spaces

// 'hello world' doesn't match this pattern because it contains a space

re.compile('^[a-z]*$');p.match('hello world')

     None

 

// ^[a-z\s]*$ <-- lowercase letters and whitespace only, any number of them

// 'hello world' matches this pattern

re.compile('^[a-z\s]*$');p.match('hello world')

     <_sre.SRE_Match object; span=(0, 11), match='hello world'>

 

// ^[a-z]*$ <-- lowercase letters only, any number of them, no spaces

// 'HelloWorld' doesn't match this pattern because it contains capital letters

re.compile('^[a-z]*$');p.match('HelloWorld')

     None

 

// ^[a-zA-Z]*$ <-- letters only (lower or upper case), any number of them, no spaces

// 'HelloWorld' matches this pattern

re.compile('^[a-zA-Z]*$');p.match('HelloWorld')

     <_sre.SRE_Match object; span=(0, 10), match='HelloWorld'>

 

// ^[a-zA-Z\s]*$ <-- letters (lower or upper case) and whitespace only, any number of them

// 'Hello World' matches this pattern

re.compile('^[a-zA-Z\s]*$');p.match('Hello World')

     <_sre.SRE_Match object; span=(0, 11), match='Hello World'>

 

// ^[a-zA-Z\s]*$ <-- letters (lower or upper case) and whitespace only, any number of them

// 'Hello1234 World' doesn't match this pattern because it contains digits

re.compile('^[a-zA-Z\s]*$');p.match('Hello1234 World')

     None

 

// ^[a-zA-Z0-9\s]*$ <-- letters (lower or upper case), digits and whitespace only,

// any number of them

// 'Hello1234 World' matches this pattern

re.compile('^[a-zA-Z0-9\s]*$');p.match('Hello1234 World')

     <_sre.SRE_Match object; span=(0, 15), match='Hello1234 World'>

 

// ^[a-zA-Z0-9\s]*$ <-- letters (lower or upper case), digits and whitespace only,

// any number of them

// '###Hello1234 World' doesn't match this pattern because it contains non-alphanumeric characters (#)

re.compile('^[a-zA-Z0-9\s]*$');p.match('###Hello1234 World')

     None

 

// [a-zA-Z0-9#\s]*$ <-- letters (lower or upper case), digits, # and whitespace only,

// any number of them. There is no ^ here, but match() always starts at the beginning of the string

// '###Hello1234 World' matches this pattern

re.compile('[a-zA-Z0-9#\s]*$');p.match('###Hello1234 World')

     <_sre.SRE_Match object; span=(0, 18), match='###Hello1234 World'>

 

// .*[a-zA-Z0-9\s]*$ <-- .* accepts any characters (except a newline), including #,

// so the character set that follows no longer restricts what the string may contain

// '###Hello1234 World' matches this pattern

re.compile('.*[a-zA-Z0-9\s]*$');p.match('###Hello1234 World')

     <_sre.SRE_Match object; span=(0, 18), match='###Hello1234 World'>
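All of the patterns above rely on ^ and $ to force the whole string to fit. Two details are worth knowing here: match() by itself only anchors at the start, and $ also matches just before a trailing newline. re.fullmatch() (available since Python 3.4) is an alternative when the entire string must fit. A small sketch with test strings of my own:

```python
import re

p = re.compile(r'[a-z]*')

# match() anchors only at the start, so without $ a prefix is enough
print(p.match('hello world').group())      # hello

# fullmatch() requires the whole string to fit the pattern
print(p.fullmatch('hello world'))          # None
print(p.fullmatch('helloworld').group())   # helloworld

# $ also matches just before a trailing newline; fullmatch() does not allow it
print(re.match(r'^[a-z]*$', 'hello\n') is not None)   # True
print(re.fullmatch(r'[a-z]*', 'hello\n'))             # None
```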

 

 

 
