
Character Replacement

The character replacement module applies configurable character-set rules to sanitize messages that pass through a scheme pack.

Example usage

The module uses Spring configuration to define its rules. The CharacterReplacer interface has a default implementation provided through Spring, so to use it, add the csm-character-replacement module as a Maven dependency and dependency injection will supply the implementation.

Then pass the message (as a string) to the characterReplacer, which returns a sanitized string.

System.out.println(characterReplacer.replaceCharacters("ÀÁÂ"));

// prints AAA
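
The wiring described above can be sketched without the real module. This is a minimal illustration only: the CharacterReplacer interface here is a hypothetical stand-in reduced to one method, OutboundMessageSanitizer and toyReplacer are invented names, and the three-character rule set is made up for the example. In a Spring application the container would pass the default implementation into the constructor; here it is passed by hand.

```java
import java.util.Map;

// Hypothetical stand-in for the module's CharacterReplacer interface,
// reduced to the single-argument method used in this section.
interface CharacterReplacer {
    String replaceCharacters(String message);
}

// A consumer that receives the replacer through its constructor.
// With Spring, the container would supply the implementation here.
final class OutboundMessageSanitizer {
    private final CharacterReplacer characterReplacer;

    OutboundMessageSanitizer(CharacterReplacer characterReplacer) {
        this.characterReplacer = characterReplacer;
    }

    String sanitize(String message) {
        return characterReplacer.replaceCharacters(message);
    }
}

final class InjectionSketch {
    // Toy implementation so the sketch runs without the module (assumption, not the real one).
    static CharacterReplacer toyReplacer() {
        Map<Character, Character> rules = Map.of('À', 'A', 'Á', 'A', 'Â', 'A');
        return message -> {
            StringBuilder out = new StringBuilder(message.length());
            for (char c : message.toCharArray()) {
                char mapped = rules.getOrDefault(c, c);
                out.append(mapped);
            }
            return out.toString();
        };
    }

    public static void main(String[] args) {
        OutboundMessageSanitizer sanitizer = new OutboundMessageSanitizer(toyReplacer());
        System.out.println(sanitizer.sanitize("ÀÁÂ")); // prints AAA
    }
}
```

Constructor injection keeps the consumer independent of which implementation is active, so a custom replacer could be substituted in tests.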

To replace only part of the message, pass the optional startFrom and endAfter strings. The replacer finds the first instance of each string within the message and replaces characters only within those boundaries, including the startFrom and endAfter strings themselves.

System.out.println(characterReplacer.replaceCharacters("ÀÁÂ íîï éèê", Optional.of("íîï"), Optional.empty()));

// prints ÀÁÂ iii eee
(everything after and including íîï is replaced)

System.out.println(characterReplacer.replaceCharacters("ÀÁÂ íîï éèê", Optional.empty(), Optional.of("íîï")));

// prints AAA iii éèê
(everything before and including íîï is replaced)

System.out.println(characterReplacer.replaceCharacters("ÀÁÂ íîï éèê", Optional.of("íîï"), Optional.of("íîï")));

// prints ÀÁÂ iii éèê
(only íîï is replaced)
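
The boundary behaviour in the three examples above can be reproduced with a small stand-alone sketch. It is not the module's implementation: BoundarySketch and its rule map are invented for illustration, and the handling of a marker that does not occur in the message (the message is returned unchanged) is an assumption, since this section does not say what the module does in that case.

```java
import java.util.Map;
import java.util.Optional;

final class BoundarySketch {
    // Toy rule set covering only the characters used in the examples above.
    private static final Map<Character, Character> RULES = Map.of(
            'À', 'A', 'Á', 'A', 'Â', 'A',
            'í', 'i', 'î', 'i', 'ï', 'i',
            'é', 'e', 'è', 'e', 'ê', 'e');

    private static String replaceAll(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            char mapped = RULES.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    static String replaceCharacters(String message,
                                    Optional<String> startFrom,
                                    Optional<String> endAfter) {
        // Window starts at the first occurrence of startFrom (inclusive),
        // or at the beginning of the message when startFrom is absent.
        int start = 0;
        if (startFrom.isPresent()) {
            start = message.indexOf(startFrom.get());
        }
        // Window ends just past the first occurrence of endAfter (inclusive),
        // or at the end of the message when endAfter is absent.
        int end = message.length();
        if (endAfter.isPresent()) {
            int idx = message.indexOf(endAfter.get());
            end = idx < 0 ? -1 : idx + endAfter.get().length();
        }
        // Assumption: a missing marker or an inverted window leaves the message untouched.
        if (start < 0 || end < start) {
            return message;
        }
        return message.substring(0, start)
                + replaceAll(message.substring(start, end))
                + message.substring(end);
    }

    public static void main(String[] args) {
        String msg = "ÀÁÂ íîï éèê";
        System.out.println(replaceCharacters(msg, Optional.of("íîï"), Optional.empty()));
        System.out.println(replaceCharacters(msg, Optional.empty(), Optional.of("íîï")));
        System.out.println(replaceCharacters(msg, Optional.of("íîï"), Optional.of("íîï")));
    }
}
```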

Configuration

There are 3 types of configuration available:

  1. char-to-char - replace a single character with another

  2. list-to-char - replace any character in a list with a defined character

  3. regex-to-char - replace any character matching a regular expression with a defined character

Any combination of these 3 types can be used together. In the example, true/false stands for either boolean value.
character-replacements {
    char-to-char-replacements = [
        {character = À, replaceWith = A, replaceInDomOnly = true/false},
        {character = ï, replaceWith = i, replaceInDomOnly = true/false},
    ]
    list-to-char-replacements = [
        {list = [È,É,Ê,Ë], replaceWith = E, replaceInDomOnly = true/false}
    ]
    regex-to-char-replacements = [
        {regex = "[\\p{InLatin-1Supplement}]", replaceWith = ., replaceInDomOnly = true/false}
    ]
}
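
To make the three rule types concrete, the sketch below mirrors the example configuration in plain Java. It is illustrative only: RuleTypesSketch and its methods are invented names, not part of the module, and the order in which rules run (char-to-char, then list-to-char, then regex-to-char) is an assumption, since this section does not state how the module orders them. Order matters here because À, ï and È all fall inside the Latin-1 Supplement block that the regex rule matches.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RuleTypesSketch {
    // char-to-char: one character mapped to one replacement.
    static String charToChar(String text, char character, char replaceWith) {
        return text.replace(character, replaceWith);
    }

    // list-to-char: every character in the list mapped to the same replacement.
    static String listToChar(String text, List<Character> list, char replaceWith) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            out.append(list.contains(c) ? replaceWith : c);
        }
        return out.toString();
    }

    // regex-to-char: every match of the expression mapped to the same replacement.
    static String regexToChar(String text, String regex, char replaceWith) {
        String replacement = Matcher.quoteReplacement(String.valueOf(replaceWith));
        return Pattern.compile(regex).matcher(text).replaceAll(replacement);
    }

    // Applies the rules from the example configuration in an assumed order.
    static String applyAll(String text) {
        String result = charToChar(text, 'À', 'A');
        result = charToChar(result, 'ï', 'i');
        result = listToChar(result, List.of('È', 'É', 'Ê', 'Ë'), 'E');
        return regexToChar(result, "[\\p{InLatin-1Supplement}]", '.');
    }

    public static void main(String[] args) {
        // § is not covered by the first two rule types, so the regex rule catches it.
        System.out.println(applyAll("ÀÈï§")); // prints AEi.
    }
}
```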
| Config | Type | Description |
|---|---|---|
| character-replacements.char-to-char-replacements.character | Character | Character to be replaced |
| character-replacements.char-to-char-replacements.replaceWith | Character | Character replacement |
| character-replacements.char-to-char-replacements.replaceInDomOnly | Boolean | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
| character-replacements.list-to-char-replacements.list | List<Character> | List of characters to be replaced |
| character-replacements.list-to-char-replacements.replaceWith | Character | Character replacement for list |
| character-replacements.list-to-char-replacements.replaceInDomOnly | Boolean | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
| character-replacements.regex-to-char-replacements.regex | String | Regular expression for replaced characters |
| character-replacements.regex-to-char-replacements.replaceWith | Character | Character replacement for regex |
| character-replacements.regex-to-char-replacements.replaceInDomOnly | Boolean | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
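
The replaceInDomOnly flag restricts a rule to the text nodes of the DOM. One way to picture that, assuming the message is XML, is the sketch below, which uses the JDK's DOM API. DomOnlySketch, its methods and the sample element are invented for illustration, and how the module itself parses and rewrites the message is not described in this section.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomOnlySketch {
    // Replaces 'from' with 'to' in text nodes only. Attributes are not child
    // nodes, so the walk never reaches them, and element names are untouched.
    static void replaceInTextNodes(Node node, char from, char to) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(from, to));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInTextNodes(children.item(i), from, to);
        }
    }

    // Parses an XML string, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Nm ref=\"À1\">À la carte</Nm>");
        replaceInTextNodes(doc, 'À', 'A');
        System.out.println(doc.getDocumentElement().getTextContent());    // prints A la carte
        System.out.println(doc.getDocumentElement().getAttribute("ref")); // prints À1
    }
}
```

With the flag set to false, the equivalent behaviour would presumably be a replacement over the whole message string, which would also change the attribute value in this sample.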