Character Replacement

The character replacement module sanitizes messages passing through a scheme pack using configurable character set rules. It supports both simple one-to-one character replacements (e.g., ä > a, as commonly used in English) and more complex mappings (e.g., ä > ae or ß > ss, as preferred in German). These rules can be customized to match language-specific conventions or project requirements.

Example usage

The module uses Spring configuration to define its rules. The CharacterReplacer interface has a default implementation provided through Spring, so to use it, add the csm-character-replacement module as a Maven dependency; dependency injection will then supply the implementation.

Then pass the message (as a string) to the characterReplacer, which returns a sanitized string.

System.out.println(characterReplacer.replaceCharacters("ÀÁÂ"));
// prints AAA
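
To make the wiring concrete, the following is a minimal sketch of a class that receives a CharacterReplacer through its constructor, the way a Spring-managed bean would. Only the replaceCharacters(String) method comes from the text above; the local CharacterReplacer interface, MessageSanitizer, SanitizerDemo and the toy replacer are hypothetical stand-ins so that the sketch runs without Spring or the module on the classpath.

```java
import java.util.Map;

// Hypothetical stand-in for the module's CharacterReplacer interface; only the
// replaceCharacters(String) method is taken from the documentation.
interface CharacterReplacer {
    String replaceCharacters(String message);
}

// Hypothetical collaborator that receives the replacer through its constructor,
// as a Spring-managed bean would receive the default implementation.
class MessageSanitizer {
    private final CharacterReplacer characterReplacer;

    MessageSanitizer(CharacterReplacer characterReplacer) {
        this.characterReplacer = characterReplacer;
    }

    String sanitize(String message) {
        return characterReplacer.replaceCharacters(message);
    }
}

class SanitizerDemo {
    // Toy replacer used only so the sketch runs on its own; it maps
    // A-grave, A-acute and A-circumflex to a plain 'A'.
    static CharacterReplacer toyReplacer() {
        Map<Character, String> rules = Map.of('\u00C0', "A", '\u00C1', "A", '\u00C2', "A");
        return message -> {
            StringBuilder out = new StringBuilder();
            for (char c : message.toCharArray()) {
                out.append(rules.getOrDefault(c, String.valueOf(c)));
            }
            return out.toString();
        };
    }

    public static void main(String[] args) {
        MessageSanitizer sanitizer = new MessageSanitizer(toyReplacer());
        System.out.println(sanitizer.sanitize("\u00C0\u00C1\u00C2")); // prints AAA
    }
}
```

In a real application the toy replacer would not exist; the constructor argument would be satisfied by the implementation that the module contributes.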

Configuration

custom-replacer

There are three types of configuration available when specifying a custom-replacer:

  1. char-to-char - replace a single character with another

  2. list-to-char - replace any character in a list with a defined character

  3. regex-to-char - replace any character matching a regular expression with a defined character

You can specify one or more. Any combination of these three is possible.
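
As an illustration of what the three rule types mean, here is a small sketch in plain Java. It is not the module's implementation: the class and method names are hypothetical, and the order in which the rules are applied in main is an assumption made for the example.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CustomReplacerSketch {
    // char-to-char: one character is swapped for another.
    static String charToChar(String input, char character, char replaceWith) {
        return input.replace(character, replaceWith);
    }

    // list-to-char: every character in the list is swapped for the same character.
    static String listToChar(String input, List<Character> list, char replaceWith) {
        String result = input;
        for (char c : list) {
            result = result.replace(c, replaceWith);
        }
        return result;
    }

    // regex-to-char: every match of the regular expression is swapped for the character.
    static String regexToChar(String input, String regex, char replaceWith) {
        // quoteReplacement keeps '$' and '\' literal if they are used as the replacement.
        String replacement = Matcher.quoteReplacement(String.valueOf(replaceWith));
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Input is A-grave, E-acute, n-tilde, written as escapes to keep the source ASCII.
        String message = "\u00C0\u00C9\u00F1";
        message = charToChar(message, '\u00C0', 'A');
        message = listToChar(message, List.of('\u00C8', '\u00C9', '\u00CA', '\u00CB'), 'E');
        message = regexToChar(message, "[\\p{InLatin-1Supplement}]", '.');
        System.out.println(message); // prints AE.
    }
}
```

The regex rule acts as a catch-all here: the specific rules handle the characters they name, and anything left in the Latin-1 Supplement block becomes a period.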

Sample Config

character-replacements {
  custom-replacer {
    enabled = true

    char-to-char-replacements = [
      {character = À, replaceWith = A},
      {character = ï, replaceWith = i, replaceInDomOnly = true},
    ]
    list-to-char-replacements = [
      {list = [È, É, Ê, Ë], replaceWith = E}
    ]
    regex-to-char-replacements = [
      {regex = "[\\p{InLatin-1Supplement}]", replaceWith = ., replaceInDomOnly = true}
    ]
  }
}

Config Fields

Config | Type | Default | Description
character-replacements.custom-replacer.enabled | Boolean | false | Flag to enable config
character-replacements.custom-replacer.char-to-char-replacements.character | Character | | Character to be replaced
character-replacements.custom-replacer.char-to-char-replacements.replaceWith | Character | | Character replacement
character-replacements.custom-replacer.char-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM
character-replacements.custom-replacer.list-to-char-replacements.list | List<Character> | | List of characters to be replaced
character-replacements.custom-replacer.list-to-char-replacements.replaceWith | Character | | Character replacement for list
character-replacements.custom-replacer.list-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM
character-replacements.custom-replacer.regex-to-char-replacements.regex | String | | Regular expression for replaced characters
character-replacements.custom-replacer.regex-to-char-replacements.replaceWith | Character | | Character replacement for regex
character-replacements.custom-replacer.regex-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM
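
The replaceInDomOnly flag can be pictured with the following sketch, which parses a message as XML and applies a char-to-char replacement to text nodes only. It rests on assumptions: that the message is XML, and that "text nodes of the DOM" excludes attribute values and element names. The class and method names are hypothetical, and the sketch uses the JDK's built-in XML APIs rather than whatever the module uses internally.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomOnlyReplacementSketch {
    // Parses the XML, replaces the character in text nodes only, and serializes it again.
    static String replaceInTextNodes(String xml, char character, char replaceWith) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replace(doc.getDocumentElement(), character, replaceWith);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process message as XML", e);
        }
    }

    // Walks child nodes recursively; attributes are not child nodes, so they stay untouched.
    private static void replace(Node node, char character, char replaceWith) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(character, replaceWith));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replace(children.item(i), character, replaceWith);
        }
    }

    public static void main(String[] args) {
        // The i-diaeresis appears in both an attribute and the element text.
        String xml = "<Nm id=\"na\u00EFve\">na\u00EFve</Nm>";
        System.out.println(replaceInTextNodes(xml, '\u00EF', 'i'));
    }
}
```

With the flag left at false, a rule would presumably apply to the whole message string instead, which is the behaviour the earlier string-based example shows.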

lookup-table-replacer

lookup-table-replacer uses a lookup table specified in a CSV to look up what string each character should be replaced with. Keeping the replacements in the CSV decreases the clutter in the configuration files and is an easy format to produce and maintain with other tools.

CSV Format

The following is an example of the CSV:

character codepoint,replacement
\\u00c6,A
\\u00c7,C
\\u00c8,E
\\u00c9,some string

  • Both columns are required

  • character codepoint — The Unicode codepoint of a character to replace. The string must start with \\u followed by the hex value of the codepoint

  • replacement — any string that you wish to replace the character with

The column binding also tolerates unknown columns, which can be useful for making the CSV more human-readable. Here’s an example with some extra columns that are purely informational:

character,description,character codepoint,replacement
Æ,LATIN CAPITAL LETTER AE,\\u00c6,A
Ç,LATIN CAPITAL LETTER C WITH CEDILLA,\\u00c7,C
È,LATIN CAPITAL LETTER E WITH GRAVE,\\u00c8,E
É,LATIN CAPITAL LETTER E WITH ACUTE,\\u00c9,some string
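
The format described above can be read with a sketch like the following, which binds the two required columns by header name and ignores any others. This is an illustration under assumptions, not the module's parser: the class and method names are hypothetical, cells are split on plain commas with no quoting support, and the codepoint cell is assumed to be one or more backslashes, then u, then 4 to 6 hex digits.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookupTableCsvSketch {
    // Assumed cell format: one or more backslashes, 'u', then 4 to 6 hex digits.
    private static final Pattern CODEPOINT = Pattern.compile("^\\\\+u([0-9a-fA-F]{4,6})$");

    // Converts a codepoint cell into its numeric codepoint.
    static int parseCodepoint(String cell) {
        Matcher matcher = CODEPOINT.matcher(cell.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Invalid codepoint format: " + cell);
        }
        return Integer.parseInt(matcher.group(1), 16);
    }

    // Builds a codepoint-to-replacement table; the first line must be the header.
    static Map<Integer, String> parse(List<String> lines) {
        List<String> header = Arrays.asList(lines.get(0).split(",", -1));
        int codepointColumn = header.indexOf("character codepoint");
        int replacementColumn = header.indexOf("replacement");
        if (codepointColumn < 0 || replacementColumn < 0) {
            throw new IllegalArgumentException("Both columns are required");
        }
        Map<Integer, String> table = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            // Naive split: this sketch does not handle quoted cells containing commas.
            String[] cells = line.split(",", -1);
            table.put(parseCodepoint(cells[codepointColumn]), cells[replacementColumn]);
        }
        return table;
    }

    public static void main(String[] args) {
        // Extra informational columns are present but never read.
        List<String> csv = List.of(
                "description,character codepoint,replacement",
                "LATIN CAPITAL LETTER AE,\\u00c6,A",
                "LATIN CAPITAL LETTER E WITH ACUTE,\\u00c9,some string");
        parse(csv).forEach((codepoint, replacement) ->
                System.out.println(Integer.toHexString(codepoint) + " -> " + replacement));
    }
}
```

Looking columns up by header name is what lets informational columns appear anywhere in the file without affecting the result.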

Overrides

Optional overrides to the CSV replacements can be specified directly in the config.

Fallback Replacement

An optional fallback replacement string can be provided. This string is used as the replacement when a matching character cannot be found in either the CSV source or the overrides.

Sample Base Config

character-replacements {
  lookup-table-replacer {
    enabled = true
    csv-source = "file:/filesystem/path/sample-config.csv"
    fallback-replacement = "."
    overrides += {character-codepoint = "\\u00c6", replacement = "some override"}
    overrides += {character-codepoint = "\\u00c9", replacement = "another override"}
  }
}
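
How the CSV entries, overrides and fallback relate can be sketched as a lookup with a fixed precedence: an override wins over a CSV entry for the same codepoint, and the fallback applies only when neither has an entry. The class below is a hypothetical illustration of that order, not the module's code. The text does not say which characters in a message are candidates for lookup, so the sketch resolves a single codepoint only.

```java
import java.util.HashMap;
import java.util.Map;

class LookupResolutionSketch {
    private final Map<Integer, String> table = new HashMap<>();
    private final String fallbackReplacement; // null means "leave the character as is"

    LookupResolutionSketch(Map<Integer, String> csvEntries,
                           Map<Integer, String> overrides,
                           String fallbackReplacement) {
        table.putAll(csvEntries);
        table.putAll(overrides); // overrides replace CSV entries for the same codepoint
        this.fallbackReplacement = fallbackReplacement;
    }

    // Returns the replacement for one codepoint using the precedence described above.
    String resolve(int codepoint) {
        String replacement = table.get(codepoint);
        if (replacement != null) {
            return replacement;
        }
        if (fallbackReplacement != null) {
            return fallbackReplacement;
        }
        return new String(Character.toChars(codepoint));
    }

    public static void main(String[] args) {
        LookupResolutionSketch replacer = new LookupResolutionSketch(
                Map.of(0xC6, "A", 0xC7, "C"),
                Map.of(0xC6, "some override"),
                ".");
        System.out.println(replacer.resolve(0xC6)); // prints some override
        System.out.println(replacer.resolve(0xC7)); // prints C
        System.out.println(replacer.resolve(0xD1)); // prints .
    }
}
```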

Sample Downstream Config (Merging with the base config)

character-replacements {
  lookup-table-replacer {
    overrides += {character-codepoint = "\\u00ca", replacement = "a third override"}
  }
}

By specifying only the overrides, the downstream config builds on the existing base config and appends a third override, which is applied to the CSV source in addition to the previous two.

Sample Downstream Config (Wiping out the base config)

character-replacements {
  lookup-table-replacer = null # (1)
  lookup-table-replacer {
    enabled = true
    csv-source = "classpath:completely_different_source.csv"
    overrides += {character-codepoint = "\\u0062", replacement = "ONLY OVERRIDE"} # (2)
  }
}

  1. This wipes out the base config, meaning every field in lookup-table-replacer is removed

  2. The resulting overrides list has a size of one, as this is the only override that survives the config merge

Config Fields

Config | Type | Default | Description
character-replacements.lookup-table-replacer.enabled | Boolean | false | Flag to enable config
character-replacements.lookup-table-replacer.csv-source | String | | The path to the CSV source. Must be specified in the Spring Resource syntax. Only classpath: and file: are supported
character-replacements.lookup-table-replacer.fallback-replacement | String | null | Replacement used when the character encountered by the replacer does not have an entry in the lookup table. When not specified, the character is not replaced.
character-replacements.lookup-table-replacer.overrides | List | empty list | A list of overrides to apply onto the CSV source
character-replacements.lookup-table-replacer.overrides.character-codepoint | String | | Same as the character codepoint specified in the CSV source
character-replacements.lookup-table-replacer.overrides.replacement | String | | Same as the replacement specified in the CSV source

Deprecated Configuration

The following configuration is deprecated and exists only for backward compatibility. It will be unsupported in the 2026.3.0 release.
If both deprecated and new configurations are specified, the deprecated one will be loaded to ensure backward compatibility.

character-replacements {
  char-to-char-replacements = [
    {character = À, replaceWith = A},
    {character = ï, replaceWith = i, replaceInDomOnly = true},
  ]
  list-to-char-replacements = [
    {list = [È, É, Ê, Ë], replaceWith = E}
  ]
  regex-to-char-replacements = [
    {regex = "[\\p{InLatin-1Supplement}]", replaceWith = ., replaceInDomOnly = true}
  ]
}

Config | Type | Default | Description
character-replacements.char-to-char-replacements.character | Character | | Character to be replaced
character-replacements.char-to-char-replacements.replaceWith | Character | | Character replacement
character-replacements.char-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM
character-replacements.list-to-char-replacements.list | List<Character> | | List of characters to be replaced
character-replacements.list-to-char-replacements.replaceWith | Character | | Character replacement for list
character-replacements.list-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM
character-replacements.regex-to-char-replacements.regex | String | | Regular expression for replaced characters
character-replacements.regex-to-char-replacements.replaceWith | Character | | Character replacement for regex
character-replacements.regex-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM

Error Handling

Startup Errors

The configuration is loaded on startup, and application startup fails if it contains misconfigurations such as:

  • Setting the enabled flag without specifying configuration

  • Setting only one of the required fields in a replacement (e.g., setting regex, but not replaceWith)

  • Invalid type (e.g., specifying a string of two or more characters for Character types)

  • Invalid format (e.g., not conforming to the Unicode codepoint format specified)

  • Failing to load a file from the classpath or file system
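
The first three checks in the list above could look roughly like the following fail-fast validation. This is a sketch under assumptions, not the module's actual validation: RegexRule, validate and failOnErrors are hypothetical names, the error messages are invented for the example, and only regex-to-char entries are covered.

```java
import java.util.ArrayList;
import java.util.List;

class StartupValidationSketch {
    // Hypothetical holder for one regex-to-char entry; null means "not set".
    static final class RegexRule {
        final String regex;
        final String replaceWith;

        RegexRule(String regex, String replaceWith) {
            this.regex = regex;
            this.replaceWith = replaceWith;
        }
    }

    // Collects every problem found so that all of them can be reported at once.
    static List<String> validate(boolean enabled, List<RegexRule> rules) {
        List<String> errors = new ArrayList<>();
        if (enabled && rules.isEmpty()) {
            errors.add("enabled is set but no replacements are configured");
        }
        for (RegexRule rule : rules) {
            if (rule.regex == null || rule.replaceWith == null) {
                errors.add("regex and replaceWith are both required");
                continue;
            }
            // A Character-typed field must hold exactly one character.
            int length = rule.replaceWith.codePointCount(0, rule.replaceWith.length());
            if (length != 1) {
                errors.add("replaceWith must be a single character: " + rule.replaceWith);
            }
        }
        return errors;
    }

    // Aborts startup by throwing if any problem was found.
    static void failOnErrors(List<String> errors) {
        if (!errors.isEmpty()) {
            throw new IllegalStateException("Invalid character-replacements config: " + errors);
        }
    }

    public static void main(String[] args) {
        List<String> errors = validate(true, List.of(new RegexRule("[a-z]", null)));
        System.out.println(errors); // prints [regex and replaceWith are both required]
        failOnErrors(validate(true, List.of(new RegexRule("[a-z]", "."))));
    }
}
```

Collecting all errors before throwing is a design choice of the sketch; it lets a single failed startup report every misconfiguration rather than one at a time.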