Remove Whitespace Between Markdown Markers

A small script to fix bold, italic, and strikethrough markers.

Markdown Whitespace Inconsistencies

Markdown defines markers such as the double asterisk (**), underscore (_), and double tilde (~~) for bold, italic, and strikethrough (also known as strong, emphasis, and delete). These markers must wrap the text directly, with no whitespace between a marker and the text it encloses. A space on the inner side of either marker makes the markers appear as is instead of being converted to HTML tags during parsing.

For example, the Markdown below:

**bold**
_italic_
~~strikethrough~~

Translates correctly into valid HTML:

<strong>bold</strong>
<em>italic</em>
<del>strikethrough</del>

Whereas

** bold**
_italic _
~~ strikethrough ~~

Does not, and appears as is:

** bold**
_italic _
~~ strikethrough ~~

It usually happens when someone unfamiliar with Markdown syntax makes mistakes, or when you copy rich text from somewhere (like Google Docs) and paste it into another system that does the translation. I have encountered this problem on Airtable and found a similar discussion on the Discourse support forum.

The following is a small, imperfect script that helps fix these markers.

Code

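// Markdown markers handled by the script.
const MD_STRONG = "**";
const MD_EMPHASIS = "_";
const MD_DELETE = "~~";

const isEven = (num) => num % 2 === 0;

// Trims whitespace just inside each pair of the given marker on one line.
const removeLineInconsistencies = (line, marker) => {
  let splitOnMarker = line.split(marker);
  // n markers produce n + 1 parts, so an even part count means an odd
  // number of markers. Such lines are ambiguous and are returned unchanged.
  if (isEven(splitOnMarker.length)) {
    return line;
  }
  // Odd indexes hold the text enclosed by a marker pair; trim only those.
  splitOnMarker = splitOnMarker.map((markerEnclosedText, index) => {
    if (!isEven(index)) {
      return markerEnclosedText.trim();
    }
    return markerEnclosedText;
  });

  return splitOnMarker.join(marker);
};

// Applies the fix for every marker, one line at a time.
const removeMarkdownInconsistencies = (text) => {
  let markdownLines = text.split("\n");

  markdownLines =
    markdownLines
      .map(line => removeLineInconsistencies(line, MD_STRONG))
      .map(line => removeLineInconsistencies(line, MD_EMPHASIS))
      .map(line => removeLineInconsistencies(line, MD_DELETE));

  return markdownLines.join("\n");
};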

Code Explained

  • Split the text on the newline character (\n) and process each line separately, because the script treats each marker pair as opening and closing on the same line.
  • If a marker appears an odd number of times on a line, leave that line as is for that marker. The reason is that we cannot tell which spaces to remove in cases like ** Hello** World**.
  • Otherwise, split the line on the marker and trim every element at an odd index using JavaScript's trim(), which removes whitespace from both ends of a string. The marker-enclosed text is found at the odd indexes, starting from 1. (For example, splitting ** Hello World** ** !!!** on ** gives us [ '', ' Hello World', ' ', ' !!!', '' ], in which we need to trim only indexes 1 and 3.)
  • Join the pieces on the same marker and return the line.
  • Repeat that for all the markers (**, _, and ~~).
  • Join the final lines on \n and return.
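
The steps above can be traced by hand on the example line from the list. The snippet below is only a minimal sketch of the split, trim, and join idea; the variable names (parts, markerCount, fixedLine, unbalancedParts) are illustrative and not part of the script above.

```javascript
const marker = "**";
const line = "** Hello World** ** !!!**";

// Splitting on the marker leaves the enclosed text at the odd indexes.
const parts = line.split(marker);
// parts is ['', ' Hello World', ' ', ' !!!', '']

// n markers always produce n + 1 parts, so an odd part count
// means an even (balanced) number of markers.
const markerCount = parts.length - 1;

// Trim only the odd indexes; text outside the markers keeps its spacing.
const fixedLine = parts
  .map((part, index) => (index % 2 === 1 ? part.trim() : part))
  .join(marker);

console.log(markerCount); // 4
console.log(fixedLine); // **Hello World** **!!!**

// A line with three markers yields an even part count,
// which is the signal to leave it untouched.
const unbalancedParts = "** Hello** World**".split(marker);
console.log(unbalancedParts.length); // 4
```

Note that the single space between the two bold spans sits at an even index, so it survives the trimming.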

Sample Text And Its Result After Processing

Now, let’s test our function removeMarkdownInconsistencies().

Pass the below dummy markdown text to it:

const text = `

Hello World!

** strong**

**bold   **

** bold2 **

_   emphasis_

_italic _

_ italic2 _

~~ delete~~

~~strikethrough ~~

~~ strikethrough2 ~~

** _ strong emphasis _ **

_** italic bold **_

** ~~ bold strikethrough ~~ **

~~ ** delete strong ** ~~

~~ _ strikethrough italic _   ~~

_ ~~ italic delete ~~_

** ignore ** odd number bold **

_ignore odd number italic _ _ 

ignore odd number strikethroughs ~~ ~~ ~~

Combination of ~~bold~~, ~~italic~~ and ~~strikethrough~~

Finally, an unordered list:
- a
- b
- c

and an ordered list:
1. First
2. Second
3. Third


`;


console.log(
  removeMarkdownInconsistencies(text)
);

After processing, the result will be:


Hello World!

**strong**

**bold**

**bold2**

_emphasis_

_italic_

_italic2_

~~delete~~

~~strikethrough~~

~~strikethrough2~~

**_strong emphasis_**

_**italic bold**_

**~~bold strikethrough~~**

~~**delete strong**~~

~~_strikethrough italic_~~

_~~italic delete~~_

** ignore ** odd number bold **

_ignore odd number italic _ _ 

ignore odd number strikethroughs ~~ ~~ ~~

Combination of ~~bold~~, ~~italic~~ and ~~strikethrough~~

Finally, an unordered list:
- a
- b
- c

and an ordered list:
1. First
2. Second
3. Third

When we feed the fixed Markdown above to a Markdown parser, we get valid HTML tags:

<p>Hello World!</p>
<p><strong>strong</strong></p>
<p><strong>bold</strong></p>
<p><strong>bold2</strong></p>
<p><em>emphasis</em></p>
<p><em>italic</em></p>
<p><em>italic2</em></p>
<p><del>delete</del></p>
<p><del>strikethrough</del></p>
<p><del>strikethrough2</del></p>
<p><strong><em>strong emphasis</em></strong></p>
<p><em><strong>italic bold</strong></em></p>
<p><strong><del>bold strikethrough</del></strong></p>
<p><del><strong>delete strong</strong></del></p>
<p><del><em>strikethrough italic</em></del></p>
<p><em><del>italic delete</del></em></p>
<p>** ignore ** odd number bold **</p>
<p>_ignore odd number italic _ _ </p>
<p>ignore odd number strikethroughs ~~ ~~ ~~</p>
<p>Combination of <del>bold</del>, <del>italic</del> and <del>strikethrough</del></p>
<p>Finally, an unordered list:</p>
<ul>
<li>a</li>
<li>b</li>
<li>c</li>
</ul>
<p>and an ordered list:</p>
<ol>
<li>First</li>
<li>Second</li>
<li>Third</li>
</ol>

Shortcomings

In addition to ignoring lines with an odd number of markers, this code cannot fix italics written with a single asterisk (*) on either side, which is also valid Markdown. We can tweak the code to handle a single asterisk as well, but that might mess up bullet points, which use the same * followed by a space. Markdown does offer an alternative dash (-) syntax for bullet points, just as it offers the underscore for emphasis. If your text uses dashes for bullets, you should be able to handle single asterisks for emphasis without much trouble by adding another map operation to the same code:

// ...
const MD_EMPHASIS_ASTERISK = "*";

// ...
// ...
// ...

  markdownLines =
    markdownLines
      .map(line => removeLineInconsistencies(line, MD_STRONG))
      .map(line => removeLineInconsistencies(line, MD_EMPHASIS_ASTERISK))
      .map(line => removeLineInconsistencies(line, MD_EMPHASIS))
      .map(line => removeLineInconsistencies(line, MD_DELETE));
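
If the text may still contain asterisk bullets, one possible workaround is to set the bullet prefix aside before processing the rest of the line. The sketch below assumes a bullet is optional indentation, a single * and at least one space at the start of a line. The names trimInsideMarkers, ASTERISK_BULLET, and fixAsteriskEmphasis are hypothetical, and the trimming logic is restated only so the snippet runs on its own.

```javascript
// Same split/trim/join idea as removeLineInconsistencies above.
const trimInsideMarkers = (line, marker) => {
  const parts = line.split(marker);
  if (parts.length % 2 === 0) {
    return line; // odd number of markers: ambiguous, leave as is
  }
  return parts
    .map((part, index) => (index % 2 === 1 ? part.trim() : part))
    .join(marker);
};

// Assumed bullet shape: optional indentation, one "*", then whitespace.
const ASTERISK_BULLET = /^(\s*\*\s+)(.*)$/;

const fixAsteriskEmphasis = (line) => {
  const match = line.match(ASTERISK_BULLET);
  if (!match) {
    return trimInsideMarkers(line, "*");
  }
  // Keep the bullet prefix untouched and fix only the rest of the line.
  const [, bulletPrefix, rest] = match;
  return bulletPrefix + trimInsideMarkers(rest, "*");
};

console.log(fixAsteriskEmphasis("some * italic * text")); // some *italic* text
console.log(fixAsteriskEmphasis("* item with * emphasis *")); // * item with *emphasis*
```

The trade-off is that a line starting with padded emphasis, such as "* italic * text", is indistinguishable from a bullet under this assumption and is therefore left unchanged. To try it, the MD_EMPHASIS_ASTERISK step could be swapped for .map(fixAsteriskEmphasis), still placed after the ** step.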


