Building my First RegEx!

Julia Zhou
4 min read · Jul 6, 2020

As a new programming student learning Ruby, I was constantly getting stumped when manipulating string values until a fellow classmate of mine at Flatiron School suggested I use RegEx — short for Regular Expressions — to solve the problems. Through Google searching, I quickly fell into a rabbit hole and discovered how complex the world of RegEx was. I was absolutely baffled by this cryptic use of syntax! A language inside of another programming language?

What is RegEx?

After diving further down the rabbit hole, I began to realize how incredibly useful (and intimidating) this tool is. A regular expression extracts data from a string of text by searching for matches to a specific search pattern. RegEx is commonly used for validating, cleaning, and parsing data. An interesting feature is that the core syntax is largely the same across most programming languages, although each language's flavor differs in some details!
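To make those two uses concrete, here is a minimal sketch in Ruby. The sample strings are made up for illustration; `String#scan` and `String#gsub` are standard Ruby methods.

```ruby
# Hypothetical sample text, made up for this illustration.
text = "Order 66 shipped on 2020-07-06"

# Parsing: pull every run of digits out of the string.
numbers = text.scan(/\d+/)
p numbers   # => ["66", "2020", "07", "06"]

# Cleaning: collapse any run of whitespace into a single space.
messy = "too    many   spaces"
clean = messy.gsub(/\s+/, " ")
p clean     # => "too many spaces"
```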

I can always Google a pattern that fits the task at hand, but the pattern often needs to be customized or updated down the line when it no longer works, which brings me back to the drawing board to search for a new one. This cycle made me realize I should get at least a basic grasp of how expressions are structured so I can construct a few simple ones on my own.

Where do I begin?

I started off with a simple task: validating a (fake) Social Security Number within a string, using Regexr.com as an expression editor and RexEgg.com’s Cheat Sheet as a reference. (Don’t worry, I’ll mask the number with the same expression!) I found Regexr’s expression editor to be the most user-friendly sandbox. It lets me test different expressions and also includes a search function for quick syntax look-up in the left-hand panel.

  • Here is my string of data — consisting of a phone number and a “Social Security Number”. Based on the format, we will need to build an expression that will validate a string of nine digits separated by dashes after the third digit and after the fifth digit.
  • First, I created a variable and assigned it a pattern wrapped in forward slashes, which mark the start and end of a RegEx literal in Ruby. I looked up the character class that covers all digits from 0–9, which is ‘\d’, and separated the digits with ‘[-]’, which searches specifically for a dash. Note that the brackets are optional here: outside of brackets a dash is just a literal dash, and it only signifies a range when it sits between two characters inside brackets (i.e. ‘[a-z]’ for characters from a to z).
  • But wait! I found a way to make this expression more concise by replacing the repeated ‘\d’ characters with quantifiers: a number in curly braces after ‘\d’ states how many digits to match, so ‘\d{3}’ stands in for ‘\d\d\d’.
  • Now I can use a ternary operator, with my RegEx stored as a variable, to validate the “Social Security Number” within the student variable.
Return value from IRB
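The steps so far can be sketched roughly as follows. The original code was shown as screenshots, so the sample string and the variable names `student`, `ssn_long`, and `ssn_regex` are assumptions made for illustration.

```ruby
# Hypothetical sample data: a phone number and a fake SSN in one string.
student = "Jane Doe, phone: (555) 123-4567, SSN: 123-45-6789"

# Long form: one \d per digit, with [-] matching a literal dash.
ssn_long = /\d\d\d[-]\d\d[-]\d\d\d\d/

# Shorter form: quantifiers in braces say how many digits to expect.
ssn_regex = /\d{3}-\d{2}-\d{4}/

# String#match? returns true or false, which suits a ternary.
result = student.match?(ssn_regex) ? "Valid" : "Invalid"
puts result   # prints "Valid"
```

Both forms match the same text. The phone number is not picked up by mistake because it has no dash after the fifth digit.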
  • Perfect! No errors were raised, the condition evaluated to true, and the ternary returned “Valid”. Now, I want to mask this data using the #sub method and the same RegEx.
Return value from IRB
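A minimal sketch of the masking step, again using a made-up `student` string and an assumed mask of `XXX-XX-XXXX` (the original mask was only visible in the screenshot):

```ruby
# Hypothetical sample data, as before.
student = "Jane Doe, phone: (555) 123-4567, SSN: 123-45-6789"
ssn_regex = /\d{3}-\d{2}-\d{4}/

# String#sub replaces the first match only; String#gsub would replace every match.
masked = student.sub(ssn_regex, "XXX-XX-XXXX")
puts masked   # prints "Jane Doe, phone: (555) 123-4567, SSN: XXX-XX-XXXX"
```

Note that `sub` returns a new string and leaves `student` unchanged; `sub!` would modify it in place.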
  • From there, I can update the same expression to create a template for our phone number. Note that I added ‘\(’ to the pattern, which matches a literal opening parenthesis around the area code (the backslash escapes it, since parentheses otherwise create a group). It is followed by a ‘?’, which makes it optional so that phone numbers written in different formats can be picked up.
Masking phone number with parentheses
Masking phone number with periods as separators
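One way the phone number pattern might look is sketched below. This is an assumption based on the description above, not the exact expression from the screenshots; the name `phone_regex`, the sample numbers, and the mask are made up for illustration.

```ruby
# \(? and \)?  optional literal parentheses around the area code
# [-. ]?       optional dash, period, or space after the area code
# [-.]         dash or period before the last four digits
phone_regex = /\(?\d{3}\)?[-. ]?\d{3}[-.]\d{4}/

# Hypothetical phone numbers in three common formats.
samples = ["(555) 123-4567", "555.123.4567", "555-123-4567"]

masked = samples.map { |number| number.sub(phone_regex, "XXX-XXX-XXXX") }
puts masked   # prints "XXX-XXX-XXXX" three times, one per line
```

Inside the brackets the dash comes first, so it is read as a literal dash rather than a range, and the period loses its usual "any character" meaning.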

Mission accomplished!

Did You Know?💡 If you want to search for RegEx patterns within a text editor such as VS Code, hit CTRL+F and click on the ‘.*’ option, and all text matching the pattern will be highlighted. (Exclude the beginning and ending forward slashes.)

Although this was a basic expression, I hope this walkthrough helps you see that understanding RegEx syntax doesn’t have to be so daunting. It will take time and practice to build your way up to complex expressions, but it’s worth the investment upfront. After all, it’s a handy mini-language to keep in your holster regardless of what language you’re using.

Further Readings:

Mastering Ruby Regular Expressions
