How to Get All Uppercase Words from a String in JavaScript

And all lowercase words, too

On rare occasions, Google surprises you by returning no useful information for your search. Extracting numbers and retrieving all uppercase words from a given string in JavaScript were two such problems. In the latter case, there were all kinds of regex examples for finding capitalized words or uppercase letters, but none for finding fully uppercase words.

In this post we discuss why getting uppercase (or lowercase) words from a string is not that straightforward, and how we can attempt it.

If you’re not interested in the details and just want this functionality in one of your JavaScript apps, consider using my npm package case-study.

Some Background

While searching for a solution, I came across a Python-related Stack Overflow answer that addressed this matter:

This has nothing to do with splitting and punctuation; you just care about the letters (and numbers), and just want a regular expression …

It further revealed:

If you don’t care about numbers, replace \w with [A-Za-z] for just letters, or [A-Za-z'] to include contractions

Admittedly, I’ve never been good with regular expressions, but if finding words in a string is as simple as [A-Za-z], then surely finding only uppercase words should be just as easy?

Unfortunately, it entirely depends on what you consider a valid uppercase word, as discussed below.

Why the complexity?

1 - Numbers - Are numbers uppercase or lowercase? Would you consider 1, 2, 3, 77, B2B, B2C, HBD2U, PS2, H2O, 3G, 4G, 5G, and PS4 uppercase? (Check this thread for a scholarly and detailed debate, most of which went straight over my head.) Personally, I’d say it’s highly subjective.

2 - Special characters and punctuation marks - Is F.B.I as a whole uppercase? Similarly, FILET-O-FISH or OK! ?

3 - Contractions and double contractions - What about YOU’VE, HAVEN’T, SHOULDN’T’VE, D’Y’ALL? (I guess everyone would agree they should be considered uppercase or lowercase depending on the case of their letters.)

What’s the solution?

Again, there’s no right or wrong solution. It simply comes down to what qualifies as uppercase for you.

For me, a word, whether it is a contraction, a double contraction, or alphanumeric, qualifies as uppercase if ALL of its letters are uppercase, and as lowercase if all of them are lowercase. Numbers on their own, I believe, are neither uppercase nor lowercase.
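
The rule above can also be written down without a regex. The following is only a minimal sketch of that rule, not the approach used in the rest of this post: classifyCase is a hypothetical helper name, and it assumes plain ASCII letters.

```javascript
// Hypothetical helper: classify one word by the case of its letters only.
// Digits, apostrophes and other symbols are ignored when deciding.
function classifyCase(word) {
  const letters = word.match(/[A-Za-z]/g); // null when the word has no letters
  if (letters === null) return "neither"; // e.g. "99"
  if (letters.every((ch) => ch >= "A" && ch <= "Z")) return "upper";
  if (letters.every((ch) => ch >= "a" && ch <= "z")) return "lower";
  return "mixed";
}

console.log(classifyCase("D'Y'ALL")); // upper
console.log(classifyCase("H2O")); // upper
console.log(classifyCase("5g's")); // lower
console.log(classifyCase("99")); // neither
console.log(classifyCase("What")); // mixed
```

Unlike the regexes below, this works on one word at a time, so the string would first have to be split into words.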

The Regexes

Below are the regexes for different criteria:

1- Alphabet-only uppercase

const str = "HERE'S AN UPPERCASE PART of the string";
const upperCaseWords = str.match(/(\b[A-Z][A-Z]+|\b[A-Z]\b)/g);

console.log(upperCaseWords);

// => [ 'HERE', 'S', 'AN', 'UPPERCASE', 'PART' ]

Regex explained:

  • \b - a word boundary (here, the start of the word)
  • [A-Z] - any single character between capital A and Z
  • [A-Z]+ - any character between capital A and Z, one or more times and without interruption
  • | - OR
  • \b - a word boundary (here, the start of the word)
  • [A-Z] - any single character between capital A and Z
  • \b - a word boundary (here, the end of the word)
  • g - global flag (find every match in the string, not just the first one)

Essentially, the right side of the OR handles single uppercase letters that are words in themselves, like “I” and “A”. The left side handles runs of two or more uppercase letters.
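
Two details are worth keeping in mind. The left side has no closing \b, so it can match an uppercase run at the start of a mixed-case word, and match() returns null (not an empty array) when nothing matches. The sketch below compares the regex above with a possible stricter variant, /\b[A-Z]+\b/g. The stricter variant and the sample strings are assumptions made for illustration, not part of the original solution.

```javascript
// Made-up sample: "USAid" starts with an uppercase run but is mixed case.
const sample = "The USAid office is in NYC, I think";

const original = /(\b[A-Z][A-Z]+|\b[A-Z]\b)/g; // regex from this section
const strict = /\b[A-Z]+\b/g; // assumed variant: whole word must be uppercase

// `|| []` guards against match() returning null when there are no matches.
const looseWords = sample.match(original) || [];
const strictWords = sample.match(strict) || [];
const noMatches = "no capitals here".match(strict) || [];

console.log(looseWords); // [ 'USA', 'NYC', 'I' ]
console.log(strictWords); // [ 'NYC', 'I' ]
console.log(noMatches); // []
```

Whether 'USA' should be picked out of "USAid" again depends on what you count as an uppercase word.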

2- Alphabet-only uppercase with contraction and double contraction

To include contractions and double contractions, add ' to the second character class of the original regex:

const str = "HERE'S AN UPPERCASE PART of the string, What D'Y'ALL think?";
const upperCaseWords = str.match(/(\b[A-Z]['A-Z]+|\b[A-Z]\b)/g);

console.log(upperCaseWords);

// => [ 'HERE\'S', 'AN', 'UPPERCASE', 'PART', 'D\'Y\'ALL' ]
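
Text copied from web pages or word processors often uses the typographic apostrophe (’, U+2019) rather than the straight one ('). The regex above only knows about the straight one, so such contractions get split. A minimal sketch of one way to handle both, assuming you want them treated alike (the sample string is made up):

```javascript
// Sample using the typographic apostrophe U+2019 instead of '.
const curly = "YOU\u2019VE GOT MAIL, haven\u2019t you?";

// Regex from this section: only the straight apostrophe is allowed.
const straightOnly = curly.match(/(\b[A-Z]['A-Z]+|\b[A-Z]\b)/g) || [];

// Assumed variant: also allow U+2019 inside the word.
const bothKinds = curly.match(/(\b[A-Z]['\u2019A-Z]+|\b[A-Z]\b)/g) || [];

console.log(straightOnly); // [ 'YOU', 'VE', 'GOT', 'MAIL' ]
console.log(bothKinds); // [ 'YOU’VE', 'GOT', 'MAIL' ]
```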

3- Alphabet-only lowercase

The lowercase version is:

const str = "here's a lowercase string for you.";
const lowerCaseWords = str.match(/(\b[a-z][a-z]+|\b[a-z]\b)/g);

console.log(lowerCaseWords);

// => [ 'here', 's', 'a', 'lowercase', 'string', 'for', 'you' ]

4- Alphabet-only lowercase with contraction and double contraction

const str = "here's a lowercase string for you. What d'y'all think?";
const lowerCaseWords = str.match(/(\b[a-z]['a-z]+|\b[a-z]\b)/g);

console.log(lowerCaseWords);

// => [ 'here\'s', 'a', 'lowercase', 'string', 'for', 'you', 'd\'y\'all', 'think' ]

5- Alphanumeric-only uppercase

const str = "K2, H2O, B2B, B2C, AK47, 3G, G8, 7UP, gr8, 9, 99";
const upperCaseWords = str.match(/(\b[A-Z0-9][A-Z0-9]+|\b[A-Z]\b)/g);

console.log(upperCaseWords);

// => [ 'K2', 'H2O', 'B2B', 'B2C', 'AK47', '3G', 'G8', '7UP', '99' ]

The only addition we have made is 0-9 in two places. Notice that we don’t add 0-9 on the right side of | (OR), because that would count single-digit numbers as uppercase. Unfortunately, the regex still counts numbers with two or more digits as uppercase (‘99’). I haven’t been able to solve this yet, but a non-regex workaround is to loop through the resulting array and remove any entry that consists only of digits.
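
The workaround can be sketched as a filter that keeps only matches containing at least one uppercase letter. A lookahead-based regex is shown alongside as one possible alternative. It is an assumption on my part, not a tested replacement: it also requires a closing word boundary, so it behaves differently on mixed-case words.

```javascript
const str = "K2, H2O, B2B, B2C, AK47, 3G, G8, 7UP, gr8, 9, 99";

// Workaround: run the regex above, then drop matches without any letter.
const matches = str.match(/(\b[A-Z0-9][A-Z0-9]+|\b[A-Z]\b)/g) || [];
const filtered = matches.filter((word) => /[A-Z]/.test(word));

// Possible alternative: the lookahead (?=[0-9]*[A-Z]) only lets a match
// start where an uppercase letter follows zero or more leading digits.
const viaLookahead = str.match(/\b(?=[0-9]*[A-Z])[A-Z0-9]+\b/g) || [];

console.log(filtered);
// [ 'K2', 'H2O', 'B2B', 'B2C', 'AK47', '3G', 'G8', '7UP' ]
console.log(viaLookahead);
// [ 'K2', 'H2O', 'B2B', 'B2C', 'AK47', '3G', 'G8', '7UP' ]
```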

6- Alphanumeric-only lowercase

const str = "gr8, 1to1, one2one, 8pm";
const lowerCaseWords = str.match(/(\b[a-z0-9][a-z0-9]+|\b[a-z]\b)/g);

console.log(lowerCaseWords);

// => [ 'gr8', '1to1', 'one2one', '8pm' ]

7- Alphanumeric-only uppercase with contraction and double contraction

const str = "K2'S PEAK CLIMBED!";
const upperCaseWords = str.match(/(\b[A-Z0-9]['A-Z0-9]+|\b[A-Z]\b)/g);

console.log(upperCaseWords);

// => [ 'K2\'S', 'PEAK', 'CLIMBED' ]

8- Alphanumeric-only lowercase with contraction and double contraction

const str = "5g's the latest tech.";
const lowerCaseWords = str.match(/(\b[a-z0-9]['a-z0-9]+|\b[a-z]\b)/g);

console.log(lowerCaseWords);

// => [ '5g\'s', 'the', 'latest', 'tech' ]
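
Since the eight regexes differ only in letter case, digits and the apostrophe, they can be generated from three options. The sketch below is one way to do that. buildWordRegex and its option names are hypothetical and are not taken from the case-study package.

```javascript
// Hypothetical helper: build one of the eight regexes above from options.
function buildWordRegex({ upper = true, digits = false, contractions = false } = {}) {
  const letters = upper ? "A-Z" : "a-z";
  const first = letters + (digits ? "0-9" : ""); // allowed first character
  const rest = (contractions ? "'" : "") + first; // allowed later characters
  // Same shape as the regexes above: a longer word, OR a single letter.
  return new RegExp(`(\\b[${first}][${rest}]+|\\b[${letters}]\\b)`, "g");
}

const upperRegex = buildWordRegex({ upper: true, digits: true, contractions: true });
const lowerRegex = buildWordRegex({ upper: false, digits: true, contractions: true });

console.log(upperRegex.source); // (\b[A-Z0-9]['A-Z0-9]+|\b[A-Z]\b)
console.log("K2'S PEAK CLIMBED!".match(upperRegex)); // [ "K2'S", 'PEAK', 'CLIMBED' ]
console.log("5g's the latest tech.".match(lowerRegex)); // [ "5g's", 'the', 'latest', 'tech' ]
```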

And … we’re done!
