
Character Replacement

The character replacement module sanitizes messages passing through a scheme pack by applying configurable character-set rules. It supports both simple one-to-one character replacements (e.g., ä > a, as commonly used in English) and more complex mappings (e.g., ä > ae or ß > ss, as preferred in German). The rules can be customized to match language-specific conventions or project requirements.

Example usage

The module uses Spring configuration to define its rules. The CharacterReplacer interface has a default implementation provided through Spring, so to use it, add the csm-character-replacement module as a Maven dependency and let dependency injection supply the implementation.

Then pass the message (as a string) to the characterReplacer, which returns a sanitized string.

System.out.println(characterReplacer.replaceCharacters("ÀÁÂ"));
// prints AAA

To replace only part of the message, pass the optional startFrom and endAfter strings. The replacer finds the first instance of each string within the message and replaces characters only between those boundaries, including the startFrom and endAfter strings themselves.

System.out.println(characterReplacer.replaceCharacters("ÀÁÂ íîï éèê", Optional.of("íîï"), Optional.empty()));
// prints ÀÁÂ iii eee
(everything from íîï onward, including íîï itself, is replaced)

System.out.println(characterReplacer.replaceCharacters("ÀÁÂ íîï éèê", Optional.empty(), Optional.of("íîï")));
// prints AAA iii éèê
(everything up to and including íîï is replaced)

System.out.println(characterReplacer.replaceCharacters("ÀÁÂ íîï éèê", Optional.of("íîï"), Optional.of("íîï")));
// prints ÀÁÂ iii éèê
(only íîï is replaced)
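
The boundary behaviour described above can be sketched in plain Java. This is an illustration only, not the module's implementation: the class name BoundedReplacementSketch, the hard-coded RULES map, and the handling of a boundary string that is not found (treated here as if it had not been supplied) are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only; the real CharacterReplacer is supplied by the module.
final class BoundedReplacementSketch {

    // Hypothetical one-to-one rules, just enough for the examples in this section.
    private static final Map<Character, Character> RULES = Map.of(
            'À', 'A', 'Á', 'A', 'Â', 'A',
            'í', 'i', 'î', 'i', 'ï', 'i',
            'é', 'e', 'è', 'e', 'ê', 'e');

    // Applies the rules to every character of the given text.
    static String replaceAll(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            char mapped = RULES.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    // Replaces only between the first occurrence of startFrom (inclusive)
    // and the end of the first occurrence of endAfter (inclusive).
    static String replaceCharacters(String message,
                                    Optional<String> startFrom,
                                    Optional<String> endAfter) {
        int start = 0;
        if (startFrom.isPresent()) {
            int i = message.indexOf(startFrom.get());
            if (i >= 0) {
                start = i; // assumption: a missing boundary is ignored
            }
        }
        int end = message.length();
        if (endAfter.isPresent()) {
            int i = message.indexOf(endAfter.get());
            if (i >= 0) {
                end = i + endAfter.get().length();
            }
        }
        if (start >= end) {
            return message; // assumption: an empty or inverted window changes nothing
        }
        return message.substring(0, start)
                + replaceAll(message.substring(start, end))
                + message.substring(end);
    }

    public static void main(String[] args) {
        String message = "ÀÁÂ íîï éèê";
        // prints ÀÁÂ iii eee
        System.out.println(replaceCharacters(message, Optional.of("íîï"), Optional.empty()));
        // prints AAA iii éèê
        System.out.println(replaceCharacters(message, Optional.empty(), Optional.of("íîï")));
        // prints ÀÁÂ iii éèê
        System.out.println(replaceCharacters(message, Optional.of("íîï"), Optional.of("íîï")));
    }
}
```

The sketch treats the two boundaries as a window: text outside the window is copied through unchanged, and only the window is passed to the rules.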

Configuration

Three types of configuration are available:

  1. char-to-char - replace a single character with another character

  2. list-to-char - replace any character in a list with a defined character

  3. regex-to-char - replace any character matching a regular expression with a defined character

These three types can be combined as required.
character-replacements {
    char-to-char-replacements = [
        {character = À, replaceWith = A, replaceInDomOnly = true/false},
        {character = ï, replaceWith = i, replaceInDomOnly = true/false},
    ]
    list-to-char-replacements = [
        {list = [È,É,Ê,Ë], replaceWith = E, replaceInDomOnly = true/false}
    ]
    regex-to-char-replacements = [
        {regex = "[\\p{InLatin-1Supplement}]", replaceWith = ., replaceInDomOnly = true/false}
    ]
}
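
To make the three rule types concrete, here is a rough Java sketch of how rules like those in the configuration above might be applied to a string. It is not the module's implementation: the class name RuleTypesSketch is hypothetical, the rules are hard-coded rather than read from configuration, replaceInDomOnly is ignored, and the order of application (char-to-char, then list-to-char, then regex-to-char) is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only; rule values mirror the sample configuration.
final class RuleTypesSketch {

    // char-to-char: each key is replaced by its mapped character.
    static final Map<Character, Character> CHAR_TO_CHAR = Map.of('À', 'A', 'ï', 'i');

    // list-to-char: any character in the list is replaced by one character.
    static final List<Character> LIST = List.of('È', 'É', 'Ê', 'Ë');
    static final char LIST_REPLACEMENT = 'E';

    // regex-to-char: any character matching the pattern is replaced by one character.
    static final Pattern REGEX = Pattern.compile("[\\p{InLatin-1Supplement}]");
    static final String REGEX_REPLACEMENT = ".";

    static String sanitize(String message) {
        StringBuilder out = new StringBuilder(message.length());
        for (char c : message.toCharArray()) {
            if (CHAR_TO_CHAR.containsKey(c)) {
                out.append(CHAR_TO_CHAR.get(c).charValue());
            } else if (LIST.contains(c)) {
                out.append(LIST_REPLACEMENT);
            } else {
                out.append(c);
            }
        }
        // Assumption: the regex rule runs last, so it only catches characters
        // that the more specific rules did not already replace.
        return REGEX.matcher(out).replaceAll(REGEX_REPLACEMENT);
    }

    public static void main(String[] args) {
        // prints Ai EE . ok
        System.out.println(sanitize("Àï ÈÉ ñ ok"));
    }
}
```

In this sketch ñ is not covered by the first two rules, so it falls through to the regex rule, which matches the whole Latin-1 Supplement block (U+0080 to U+00FF) and replaces it with a full stop.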
| Config | Type | Description |
|---|---|---|
| character-replacements.char-to-char-replacements.character | Character | Character to be replaced |
| character-replacements.char-to-char-replacements.replaceWith | Character | Replacement character |
| character-replacements.char-to-char-replacements.replaceInDomOnly | Boolean | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
| character-replacements.list-to-char-replacements.list | List<Character> | List of characters to be replaced |
| character-replacements.list-to-char-replacements.replaceWith | Character | Replacement character for every character in the list |
| character-replacements.list-to-char-replacements.replaceInDomOnly | Boolean | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
| character-replacements.regex-to-char-replacements.regex | String | Regular expression matching the characters to be replaced |
| character-replacements.regex-to-char-replacements.replaceWith | Character | Replacement character for every character matching the regex |
| character-replacements.regex-to-char-replacements.replaceInDomOnly | Boolean | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
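
As a rough illustration of what replacing "only in the text nodes of the DOM" can mean, the following Java sketch parses an XML string with the JDK's built-in DOM parser and applies a replacement to text nodes while leaving attribute values and element names alone. It assumes the message is XML; the class name DomOnlySketch and the sample element are hypothetical, and the module's own handling may differ.

```java
import java.io.StringReader;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch only; not the module's implementation of replaceInDomOnly.
final class DomOnlySketch {

    // Walks the tree and rewrites the value of text nodes only.
    // Attributes are not child nodes, so they are never visited.
    static void replaceInTextNodes(Node node, UnaryOperator<String> replacer) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(replacer.apply(node.getNodeValue()));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInTextNodes(children.item(i), replacer);
        }
    }

    // Parses an XML string, wrapping checked parser exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Nm id=\"À\">À la carte</Nm>");
        replaceInTextNodes(doc, s -> s.replace('À', 'A'));
        // prints A la carte
        System.out.println(doc.getDocumentElement().getTextContent());
        // prints À (the attribute value is untouched)
        System.out.println(doc.getDocumentElement().getAttribute("id"));
    }
}
```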