How to Get All Uppercase Words from a String in JavaScript

And all lowercase words, too

On rare occasions Google surprises you by returning no useful information for your query. One such case was finding a way to extract all uppercase words from a given string in JavaScript. Try as I might, I kept hitting a dead end. There were all kinds of regex examples for finding capitalized words or uppercase letters, but not uppercase words.

If you’re not interested in the details and just want this functionality in one of your JavaScript apps, consider using my npm package case-study.

Finally, with different search words, I came across a Python-related Stack Overflow answer about extracting words from a string, excluding punctuation marks. The opening sentence ended my exasperation:

This has nothing to do with splitting and punctuation; you just care about the letters (and numbers), and just want a regular expression …

It further revealed:

If you don’t care about numbers, replace \w with [A-Za-z] for just letters, or [A-Za-z’] to include contractions

Admittedly, I’ve never been good with regular expressions, but if finding words from a string is as simple as [A-Za-z], then surely finding only uppercase words should be similarly easy, too?

Unfortunately, it entirely depends on what you consider a valid uppercase word, as discussed below.

Why the complexity?

1 - Numbers - Are numbers uppercase or lowercase? Would you consider 1, 2, 3, 77, B2B, B2C, HBD2U, PS2, H2O, 3G, 4G, 5G, and PS4 uppercase? (Check this thread for scholarly and detailed debate, most of which went straight over my head). Personally, I’d say it’s highly subjective.

2 - Special characters and punctuation marks - Is F.B.I as a whole uppercase? Similarly, FILET-O-FISH or OK! ?

3 - Contractions and double contractions - What about YOU’VE, HAVEN’T, SHOULDN’T’VE, D’Y’ALL? (I guess everyone would agree they should be considered uppercase or lowercase depending on the case of their letters.)

What’s the solution?

Again, there’s no right or wrong solution for that. It simply comes down to what qualifies as uppercase for you.

For me, a word containing a contraction, a double contraction, or digits qualifies as uppercase if ALL of its letters are uppercase, and as lowercase if all of them are lowercase. Numbers on their own, I believe, are neither uppercase nor lowercase.
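To make that rule concrete, here is a minimal sketch that classifies a single word by looking only at its letters. The function name classifyWord and the "neither" and "mixed" labels are my own assumptions for illustration; they are not taken from the case-study package.

```javascript
// Hypothetical helper: classifies one word by the case of its letters only.
function classifyWord(word) {
  // Drop digits, apostrophes, and anything else that is not an ASCII letter.
  const letters = word.replace(/[^A-Za-z]/g, "");
  if (letters === "") return "neither"; // e.g. "99" has no letters at all
  if (letters === letters.toUpperCase()) return "upper";
  if (letters === letters.toLowerCase()) return "lower";
  return "mixed";
}

console.log(["HAVEN'T", "B2B", "gr8", "99", "Hello"].map(classifyWord));
// => [ 'upper', 'upper', 'lower', 'neither', 'mixed' ]
```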

The Regexes

Below are the regexes for different criteria:

1- Alphabet-only uppercase

const str = "HERE'S AN UPPERCASE PART of the string";
const upperCaseWords = str.match(/(\b[A-Z][A-Z]+|\b[A-Z]\b)/g);

console.log(upperCaseWords);

// => [ 'HERE', 'S', 'AN', 'UPPERCASE', 'PART' ]

Regex explained:

  • \b - a word boundary; here, the start of the word
  • [A-Z] - any one character from capital A to Z
  • [A-Z]+ - one or more characters from capital A to Z, without interruption
  • | - OR
  • \b - a word boundary; here, the start of the word
  • [A-Z] - any one character from capital A to Z
  • \b - a word boundary; here, the end of the word
  • g - global flag (find every match in the string, not just the first)

Essentially, the right side of the OR handles single uppercase letters that are words in themselves, like “I” and “A”. The left side handles runs of two or more uppercase letters.
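One thing to watch for: the left side has no closing \b, so a word that merely starts with two capitals, such as "HEllo", yields "HE". Also, match() returns null rather than an empty array when nothing matches. A possible stricter variant, wrapped in a hypothetical helper (getUpperCaseWords is my own name, not a library function), could look like this:

```javascript
// Hypothetical helper, not part of any library.
function getUpperCaseWords(text) {
  // The trailing \b requires the whole word to be uppercase, so "HEllo" is
  // skipped instead of yielding "HE". With a boundary on both sides,
  // [A-Z]+ covers single letters and longer runs alike.
  // match() returns null when nothing matches, hence the fallback to [].
  return text.match(/\b[A-Z]+\b/g) || [];
}

console.log(getUpperCaseWords("I saw A BIG HEllo sign"));
// => [ 'I', 'A', 'BIG' ]
console.log(getUpperCaseWords("nothing here"));
// => []
```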

2- Alphabet-only uppercase with contraction and double contraction

To include contractions and double contractions, add an apostrophe (') to the second character class of the original regex:

const str = "HERE'S AN UPPERCASE PART of the string, What D'Y'ALL think?";
const upperCaseWords = str.match(/(\b[A-Z]['A-Z]+|\b[A-Z]\b)/g);

console.log(upperCaseWords);

// => [ 'HERE\'S', 'AN', 'UPPERCASE', 'PART', 'D\'Y\'ALL' ]
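Because the character class ['A-Z] accepts an apostrophe anywhere after the first letter, a word wrapped in single quotes, such as 'HELLO', comes back with the closing quote attached (HELLO'). If that matters for your input, one possible refinement is to allow an apostrophe only between letters. This is a sketch under that assumption, and the name strictUpper is purely illustrative:

```javascript
// Assumption: an apostrophe only counts when it sits between two letters.
const strictUpper = /\b[A-Z]+(?:'[A-Z]+)*\b/g;

console.log("I SAID 'HELLO' AND D'Y'ALL LAUGHED".match(strictUpper));
// => [ 'I', 'SAID', 'HELLO', 'AND', "D'Y'ALL", 'LAUGHED' ]
```

The (?:'[A-Z]+)* group repeats "apostrophe followed by letters" zero or more times, which covers both single and double contractions.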

3- Alphabet-only lowercase

The all-lowercase version is:

const str = "here's a lowercase string for you.";
const lowerCaseWords = str.match(/(\b[a-z][a-z]+|\b[a-z]\b)/g);

console.log(lowerCaseWords);

// => [ 'here', 's', 'a', 'lowercase', 'string', 'for', 'you' ]

4- Alphabet-only lowercase with contraction and double contraction

const str = "here's a lowercase string for you. What d'y'all think?";
const lowerCaseWords = str.match(/(\b[a-z]['a-z]+|\b[a-z]\b)/g);

console.log(lowerCaseWords);

// => [ 'here\'s', 'a', 'lowercase', 'string', 'for', 'you', 'd\'y\'all', 'think' ]

5- Alphanumeric-only uppercase

const str = "K2, H2O, B2B, B2C, AK47, 3G, G8, 7UP, gr8, 9, 99";
const upperCaseWords = str.match(/(\b[A-Z0-9][A-Z0-9]+|\b[A-Z]\b)/g);

console.log(upperCaseWords);

// => [ 'K2', 'H2O', 'B2B', 'B2C', 'AK47', '3G', 'G8', '7UP', '99' ]

The only addition we have made is 0-9 in two places. Notice that we don’t add 0-9 on the right side of | (OR), because that would count a single-digit number as uppercase. Unfortunately, the regex still counts numbers of two or more digits as uppercase (‘99’). I haven’t been able to solve this yet, but a non-regex workaround is to loop through the array and remove any entry that consists only of digits.
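Here is a minimal sketch of that workaround, followed by one possible regex-only alternative that uses a lookahead to require at least one letter. The variable names are illustrative, and the lookahead pattern is my own suggestion rather than something the original regex was built around:

```javascript
const str = "K2, H2O, B2B, B2C, AK47, 3G, G8, 7UP, gr8, 9, 99";

// Workaround from the text: match first, then drop entries made only of digits.
const matched = str.match(/(\b[A-Z0-9][A-Z0-9]+|\b[A-Z]\b)/g) || [];
const filtered = matched.filter((word) => !/^[0-9]+$/.test(word));

// Alternative sketch: the lookahead (?=[A-Z0-9]*[A-Z]) only lets a word
// through if an uppercase letter appears somewhere in it.
const withLookahead = str.match(/\b(?=[A-Z0-9]*[A-Z])[A-Z0-9]+\b/g) || [];

console.log(filtered);
console.log(withLookahead);
// both => [ 'K2', 'H2O', 'B2B', 'B2C', 'AK47', '3G', 'G8', '7UP' ]
```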

6- Alphanumeric-only lowercase

const str = "gr8, 1to1, one2one, 8pm";
const lowerCaseWords = str.match(/(\b[a-z0-9][a-z0-9]+|\b[a-z]\b)/g);

console.log(lowerCaseWords);

// => [ 'gr8', '1to1', 'one2one', '8pm' ]

7- Alphanumeric-only uppercase with contraction and double contraction

const str = "K2'S PEAK CLIMBED!";
const upperCaseWords = str.match(/(\b[A-Z0-9]['A-Z0-9]+|\b[A-Z]\b)/g);

console.log(upperCaseWords);

// => [ 'K2\'S', 'PEAK', 'CLIMBED' ]

8- Alphanumeric-only lowercase with contraction and double contraction

const str = "5g's the latest tech.";
const lowerCaseWords = str.match(/(\b[a-z0-9]['a-z0-9]+|\b[a-z]\b)/g);

console.log(lowerCaseWords);

// => [ '5g\'s', 'the', 'latest', 'tech' ]
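Since all eight patterns share the same shape, they can be generated from a few switches. The following is only a sketch: buildCaseRegex and its option names (upper, digits, apostrophes) are hypothetical and are not the API of the case-study package.

```javascript
// Hypothetical helper; the option names are illustrative only.
function buildCaseRegex({ upper = true, digits = false, apostrophes = false } = {}) {
  const letters = upper ? "A-Z" : "a-z";
  const nums = digits ? "0-9" : "";
  const first = `[${letters}${nums}]`;
  const rest = `[${apostrophes ? "'" : ""}${letters}${nums}]`;
  // Same shape as the patterns above: two or more characters, or a lone letter.
  return new RegExp(`\\b${first}${rest}+|\\b[${letters}]\\b`, "g");
}

console.log("K2'S PEAK CLIMBED!".match(buildCaseRegex({ digits: true, apostrophes: true })));
// => [ "K2'S", 'PEAK', 'CLIMBED' ]
console.log("5g's the latest tech.".match(buildCaseRegex({ upper: false, digits: true, apostrophes: true })));
// => [ "5g's", 'the', 'latest', 'tech' ]
```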

And … we’re done!