Character Replacement
The character replacement module sanitizes messages passing through a scheme pack by applying configurable character set rules. It supports both simple one-to-one character replacements (e.g., ä > a, as commonly used in English) and more complex mappings (e.g., ä > ae or ß > ss, as preferred in German). The rules are customizable to match language-specific conventions or project requirements.
Example usage
The module uses Spring configuration to define its rules. The CharacterReplacer interface has a default implementation that is provided through Spring, so to use it, add the csm-character-replacement module as a Maven dependency and dependency injection will pull the implementation in.
Then pass the message (as a string) to characterReplacer, which returns a sanitized string.
System.out.println(characterReplacer.replaceCharacters("ÀÁÂ"));
// prints AAA
To replace only part of the message, pass the optional startFrom and endAfter strings. The module finds the first instance of each string within the message and replaces characters only within those boundaries, including the startFrom and endAfter strings themselves.
System.out.println(characterReplacer.replaceCharacters("ÀÁÂ íîï éèê", Optional.of("íîï"), Optional.empty()));
// prints ÀÁÂ iii eee
(everything after and including íîï is replaced)
System.out.println(characterReplacer.replaceCharacters("ÀÁÂ íîï éèê", Optional.empty(), Optional.of("íîï")));
// prints AAA iii éèê
(everything before and including íîï is replaced)
System.out.println(characterReplacer.replaceCharacters("ÀÁÂ íîï éèê", Optional.of("íîï"), Optional.of("íîï")));
// prints ÀÁÂ iii éèê
(only íîï is replaced)
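The boundary behaviour shown in these examples can be sketched in plain Java. This is not the module's implementation: the class BoundedReplacementSketch, its fixed rule table, and the handling of a marker that is not found (the boundary falls back to the start or end of the message) are assumptions made purely for illustration.

```java
import java.util.Optional;

public class BoundedReplacementSketch {

    // Hypothetical fixed rules for this sketch: FROM.charAt(i) is replaced by TO.charAt(i).
    // The real module reads its rules from configuration instead.
    private static final String FROM = "\u00C0\u00C1\u00C2\u00ED\u00EE\u00EF\u00E9\u00E8\u00EA";
    private static final String TO = "AAAiiieee";

    /** Applies the rule table to every character of the given text. */
    static String replaceAll(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            int rule = FROM.indexOf(c);
            out.append(rule >= 0 ? TO.charAt(rule) : c);
        }
        return out.toString();
    }

    /** Replaces characters in the whole message. */
    static String replaceCharacters(String message) {
        return replaceCharacters(message, Optional.empty(), Optional.empty());
    }

    /**
     * Replaces characters only between the first occurrence of startFrom and the
     * end of the first occurrence of endAfter, both markers included.
     */
    static String replaceCharacters(String message,
                                    Optional<String> startFrom,
                                    Optional<String> endAfter) {
        int start = 0;
        if (startFrom.isPresent()) {
            int index = message.indexOf(startFrom.get());
            // Assumption: a missing marker leaves the boundary at the start of the message.
            if (index >= 0) {
                start = index;
            }
        }
        int end = message.length();
        if (endAfter.isPresent()) {
            int index = message.indexOf(endAfter.get());
            // Assumption: a missing marker leaves the boundary at the end of the message.
            if (index >= 0) {
                end = index + endAfter.get().length();
            }
        }
        // Assumption: if the end boundary precedes the start boundary, nothing is replaced.
        if (end <= start) {
            return message;
        }
        return message.substring(0, start)
                + replaceAll(message.substring(start, end))
                + message.substring(end);
    }
}
```

The accented characters are written as Unicode escapes so that the sketch compiles regardless of the source file encoding. Calling replaceCharacters with the same marker for both boundaries narrows the replaced region to that marker alone, which matches the last example above.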
Configuration
There are 3 types of configuration available:
- char-to-char - replace a single character with another
- list-to-char - replace any character in a list with a defined character
- regex-to-char - replace any character matching a regular expression with a defined character
Any combination of these 3 types is possible if required.
character-replacements {
  char-to-char-replacements = [
    {character = À, replaceWith = A, replaceInDomOnly = true/false},
    {character = ï, replaceWith = i, replaceInDomOnly = true/false},
  ]
  list-to-char-replacements = [
    {list = [È,É,Ê,Ë], replaceWith = E, replaceInDomOnly = true/false}
  ]
  regex-to-char-replacements = [
    {regex = "[\\p{InLatin-1Supplement}]", replaceWith = ., replaceInDomOnly = true/false}
  ]
}
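To make the three rule types concrete, here is a minimal Java sketch of how such rules could be evaluated against a message. It is an illustration only, not the module's code: the class ReplacementRulesSketch, the Rule type, and the factory methods are hypothetical names, and the choice that the first matching rule wins (in the order the rules are listed) is an assumption, since the evaluation order is not stated here.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class ReplacementRulesSketch {

    /** Hypothetical rule: a test on a single character plus its replacement. */
    static final class Rule {
        final Predicate<Character> matches;
        final char replaceWith;

        Rule(Predicate<Character> matches, char replaceWith) {
            this.matches = matches;
            this.replaceWith = replaceWith;
        }
    }

    /** char-to-char: matches exactly one character. */
    static Rule charToChar(char character, char replaceWith) {
        return new Rule(c -> c == character, replaceWith);
    }

    /** list-to-char: matches any character contained in the list. */
    static Rule listToChar(List<Character> list, char replaceWith) {
        return new Rule(list::contains, replaceWith);
    }

    /** regex-to-char: matches any single character accepted by the regular expression. */
    static Rule regexToChar(String regex, char replaceWith) {
        Pattern pattern = Pattern.compile(regex);
        return new Rule(c -> pattern.matcher(String.valueOf(c)).matches(), replaceWith);
    }

    /** Replaces each character using the first rule that matches it (assumed ordering). */
    static String apply(List<Rule> rules, String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            char result = c;
            for (Rule rule : rules) {
                if (rule.matches.test(c)) {
                    result = rule.replaceWith;
                    break;
                }
            }
            out.append(result);
        }
        return out.toString();
    }
}
```

With rules built from the configuration above, a character such as À would be handled by the char-to-char rule, É by the list-to-char rule, and any other character in the Latin-1 Supplement block by the regex-to-char rule.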
| Config | Type | Description |
|---|---|---|
| character-replacements.char-to-char-replacements.character | Character | Character to be replaced |
| character-replacements.char-to-char-replacements.replaceWith | Character | Replacement character |
| character-replacements.char-to-char-replacements.replaceInDomOnly | Boolean | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
| character-replacements.list-to-char-replacements.list | List<Character> | List of characters to be replaced |
| character-replacements.list-to-char-replacements.replaceWith | Character | Replacement character for the list |
| character-replacements.list-to-char-replacements.replaceInDomOnly | Boolean | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
| character-replacements.regex-to-char-replacements.regex | String | Regular expression matching the characters to be replaced |
| character-replacements.regex-to-char-replacements.replaceWith | Character | Replacement character for the regex |
| character-replacements.regex-to-char-replacements.replaceInDomOnly | Boolean | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
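
As an illustration of what restricting a replacement to the text nodes of the DOM could mean for an XML message, the following Java sketch parses the message and rewrites only text node values. It is a guess at the idea behind replaceInDomOnly, not the module's implementation: the class and method names are hypothetical, and leaving element names, attribute values, and CDATA sections untouched is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomOnlyReplacementSketch {

    /** Parses the XML message and replaces the character in text nodes only. */
    static Document replaceInTextNodes(String xml, char character, char replaceWith) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replaceIn(document.getDocumentElement(), character, replaceWith);
            return document;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Message is not well-formed XML", e);
        }
    }

    /** Walks the tree depth-first; only TEXT_NODE values are rewritten. */
    private static void replaceIn(Node node, char character, char replaceWith) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(character, replaceWith));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceIn(children.item(i), character, replaceWith);
        }
    }
}
```

In this sketch, an attribute value containing the character keeps it, because attributes are not reached through getChildNodes(), while the same character in element text is replaced.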