Common Use Cases of Regular Expressions (regex)
Extracting Information from Structured Emails using Regex
On this page we use a sample email that has structured data (fields) in its body. We then look at regular expressions that extract the required pieces of data from the email body.
The sample email has the following body:
Name: John Smith
Description: This is a multi-line
description field. This can have
as many lines as needed.
Company: XYZ Inc
Email: sample@domain.com
Extract "Last Name" and/or "First Name"
To extract the First Name and Last Name from the sample email, this regex can be used: Name:\s*(\S+)\s(\S+)
In the screenshot below (taken from www.regex101.com) you can see that it extracts:
- First Name in Capturing Group 1 (Green)
- Last Name in Capturing Group 2 (Red)
Please note that the above regex works only if both First Name and Last Name are present, separated by a single whitespace character.
Sometimes an email may contain only one name rather than both a First Name and a Last Name (a single name is usually treated as the Last Name). To handle that situation, another regex can be added (in addition to the one above) to capture the single name in the Last Name field. In that case this regex can be used: (?m)Name:\s*(\S+)$
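As a rough illustration outside regex101, the same pattern can be tried in Python with the standard re module. This is only a sketch: the names EMAIL_BODY and NAME_PATTERN are our own choices for the example and are not part of any product.

```python
import re

# Sample email body from this page.
EMAIL_BODY = """Name: John Smith
Description: This is a multi-line
description field. This can have
as many lines as needed.
Company: XYZ Inc
Email: sample@domain.com
"""

# Group 1 = First Name, Group 2 = Last Name.
NAME_PATTERN = re.compile(r"Name:\s*(\S+)\s(\S+)")

match = NAME_PATTERN.search(EMAIL_BODY)
first_name, last_name = match.groups() if match else (None, None)
print(first_name, last_name)
```

Checking for a failed match (None) before reading the groups avoids an error when the Name line is missing from an email.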
In the screenshot below (taken from www.regex101.com) you can see that it extracts:
- Last Name in Capturing Group 1 (Green)
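The single-name regex can be sketched in Python as follows. The email body with only one name is a hypothetical variant of the sample, and the helper name extract_single_name is our own; it is shown only to illustrate how (?m) and $ restrict the match to a Name line that holds exactly one word.

```python
import re

# Hypothetical variant of the sample email with a single name.
SINGLE_NAME_BODY = "Name: Smith\nCompany: XYZ Inc\nEmail: sample@domain.com\n"

# (?m) makes $ match at the end of each line, so the name must be
# the only word after "Name:" on its line.
SINGLE_NAME_PATTERN = re.compile(r"(?m)Name:\s*(\S+)$")


def extract_single_name(body):
    """Return the lone name on the Name line, or None if there is no such line."""
    match = SINGLE_NAME_PATTERN.search(body)
    return match.group(1) if match else None


last_name = extract_single_name(SINGLE_NAME_BODY)
two_word = extract_single_name("Name: John Smith\nCompany: XYZ Inc\n")
print(last_name, two_word)
```

With a two-word name the pattern does not match at all, which is why it is meant to be used alongside the First Name / Last Name regex rather than instead of it.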
Extract "Description"
To extract a field that has a multiline value such as Description in our sample email, this regex can be used: (?s)Description:\s*(.*?)Company:
In the screenshot below (taken from www.regex101.com) you can see that it extracts:
- Description in Capturing Group 1 (Green)
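Please note that here we use the label of the next field (i.e. Company:) as a delimiter to stop capturing, because Description is a multi-line value. The (?s) flag lets . match line breaks, and the lazy .*? stops at the first occurrence of Company:.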
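A minimal Python sketch of the multi-line extraction is shown below, assuming the sample email body from this page. The variable names are our own. Because the capture runs right up to the Company: label, it includes the final line break, so the sketch trims it with strip().

```python
import re

# Sample email body from this page.
EMAIL_BODY = """Name: John Smith
Description: This is a multi-line
description field. This can have
as many lines as needed.
Company: XYZ Inc
Email: sample@domain.com
"""

# (?s) lets "." match line breaks; ".*?" is lazy and stops at the
# first "Company:" label that follows.
DESCRIPTION_PATTERN = re.compile(r"(?s)Description:\s*(.*?)Company:")

match = DESCRIPTION_PATTERN.search(EMAIL_BODY)
# Trim the trailing line break captured just before "Company:".
description = match.group(1).strip() if match else None
print(description)
```

If the field order in the email can change, the delimiter label would need to change with it; this sketch assumes Company always follows Description.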
Extract "Company"
To extract a field that has a single line value such as Company in our sample email, this regex can be used: Company:\s*(.*)
In the screenshot below (taken from www.regex101.com) you can see that it extracts:
- Company in Capturing Group 1 (Green)
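The same single-line pattern can be generalised to any label. The sketch below is an assumption of ours rather than something the page prescribes: the helper extract_single_line_field is hypothetical and simply builds the pattern Label:\s*(.*) for a given label.

```python
import re

# Sample email body from this page.
EMAIL_BODY = """Name: John Smith
Description: This is a multi-line
description field. This can have
as many lines as needed.
Company: XYZ Inc
Email: sample@domain.com
"""


def extract_single_line_field(body, label):
    """Return the rest of the line after 'label:', or None if the label is absent."""
    # Without (?s), "." does not match a line break, so the capture
    # stops at the end of the line.
    pattern = re.escape(label) + r":\s*(.*)"
    match = re.search(pattern, body)
    return match.group(1).strip() if match else None


company = extract_single_line_field(EMAIL_BODY, "Company")
missing = extract_single_line_field(EMAIL_BODY, "Phone")
print(company, missing)
```

Using re.escape keeps labels that contain special characters (for example a dot or parentheses) from being interpreted as regex syntax.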
Extract "Email"
To extract a field that has an email address such as Email in our sample email, this regex can be used: Email:\s*([a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5})
In the screenshot below (taken from www.regex101.com) you can see that it extracts:
- Email in Capturing Group 1 (Green)
Sometimes you might want to extract the user name, domain and TLD of the email address separately and store them in different fields. In that case this regex can be used: Email:\s*([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})
In the screenshot below (taken from www.regex101.com) you can see that it extracts:
- Email user name in Capturing Group 1 (Green)
- Email domain in Capturing Group 2 (Red)
- Email TLD in Capturing Group 3 (Orange)
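The three-group email regex can be tried in Python as sketched below. The second address (jane.doe@mail.example.co.uk) is a made-up example, added only to show how the greedy domain group behaves when the address has several dots.

```python
import re

# Group 1 = user name, Group 2 = domain, Group 3 = TLD.
EMAIL_PARTS_PATTERN = re.compile(
    r"Email:\s*([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})"
)


def split_email(line):
    """Return (user, domain, tld) from an 'Email: ...' line, or None."""
    match = EMAIL_PARTS_PATTERN.search(line)
    return match.groups() if match else None


# Line from the sample email on this page.
sample_parts = split_email("Email: sample@domain.com")

# Hypothetical address with a multi-part domain: the domain group is
# greedy, so only the last label ends up in the TLD group.
multi_parts = split_email("Email: jane.doe@mail.example.co.uk")

print(sample_parts)
print(multi_parts)
```

For the multi-part address the TLD group holds only the final label (uk), and everything before the last dot goes to the domain group.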
Extract "Phone"
To extract a field that has a single-line phone number value, such as a Phone field (the sample email above does not include one), this regex can be used: Phone:\s([0-9\+\ \(\)]+)
In the screenshot below (taken from www.regex101.com) you can see that it extracts:
- Phone in Capturing Group 1 (Green)
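A short Python sketch of the Phone regex follows. The sample email body shown on this page has no Phone line, so the phone number used here is a hypothetical value invented for the example.

```python
import re

# Hypothetical lines; the phone number is made up for illustration.
BODY_WITH_PHONE = "Phone: +1 (555) 010 9999\nEmail: sample@domain.com\n"

# The character class allows digits, "+", spaces and parentheses only.
PHONE_PATTERN = re.compile(r"Phone:\s([0-9\+\ \(\)]+)")

match = PHONE_PATTERN.search(BODY_WITH_PHONE)
# The line break is not in the character class, so the capture ends
# at the end of the Phone line.
phone = match.group(1).strip() if match else None
print(phone)
```

Note that the character class contains no hyphen, so a number written as 555-0100 would be captured only up to the hyphen; the class would need a hyphen added if such numbers are expected.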
Please contact us at support@ortooapps.com for any questions.