Character Replacement
The character replacement module uses configurable character-set rules to sanitize messages passing through a scheme pack. It supports both simple one-to-one character replacements (e.g., ä → a, as commonly used in English) and more complex mappings (e.g., ä → ae or ß → ss, as preferred in German). The rules can be customized to match language-specific conventions or project requirements.
Example usage
The module uses Spring configuration to define its rules. The CharacterReplacer interface has a default implementation provided through Spring, so to use it, add the csm-character-replacement module as a Maven dependency; dependency injection then supplies the implementation.
Then pass the message (as a string) to the injected characterReplacer, which returns the sanitized string.
System.out.println(characterReplacer.replaceCharacters("ÀÁÂ"));
// prints AAA
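The default implementation is supplied through Spring and its internals are not documented here. To make the contract concrete, the following plain-Java sketch assumes that `CharacterReplacer` has a single `replaceCharacters(String)` method (the only one used above) and backs it with a hypothetical `MapCharacterReplacer` that maps each code point to a replacement string. A string value covers both the one-to-one case (À → A) and the one-to-many case (ß → ss).

```java
import java.util.Map;

// Assumed shape of the interface: only replaceCharacters(String) appears in the docs.
interface CharacterReplacer {
    String replaceCharacters(String message);
}

// Hypothetical implementation for illustration; not the module's default Spring bean.
class MapCharacterReplacer implements CharacterReplacer {
    private final Map<Integer, String> rules; // code point -> replacement string

    MapCharacterReplacer(Map<Integer, String> rules) {
        this.rules = Map.copyOf(rules);
    }

    @Override
    public String replaceCharacters(String message) {
        StringBuilder out = new StringBuilder(message.length());
        // Iterate by code point so characters outside the BMP are handled as one unit.
        message.codePoints().forEach(cp -> {
            String replacement = rules.get(cp);
            if (replacement != null) {
                out.append(replacement);   // one-to-one or one-to-many replacement
            } else {
                out.appendCodePoint(cp);   // no rule: keep the character as-is
            }
        });
        return out.toString();
    }
}
```

For example, a map from U+00C0, U+00C1 and U+00C2 to "A" turns "ÀÁÂ" into "AAA", and a mapping from U+00DF to "ss" turns "Straße" into "Strasse".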
Configuration
custom-replacer
There are three types of configuration available when specifying a custom-replacer:
- char-to-char: replace a single character with another
- list-to-char: replace any character in a list with a defined character
- regex-to-char: replace any character matched by a regular expression with a defined character
Note: You can specify one or more of these types, in any combination.
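The three rule types can be modelled as small, composable functions. The sketch below is a hypothetical plain-Java model, not the module's implementation: the `ReplacementRule`, `CharToChar`, `ListToChar`, `RegexToChar` and `RuleChain` names are invented for illustration, `replaceInDomOnly` is ignored, and applying the rules in list order is an assumption.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the three custom-replacer rule types.
interface ReplacementRule {
    String apply(String input);
}

// char-to-char: replace one specific character.
record CharToChar(char character, char replaceWith) implements ReplacementRule {
    public String apply(String input) {
        return input.replace(character, replaceWith);
    }
}

// list-to-char: replace any character that appears in the list.
record ListToChar(Set<Character> list, char replaceWith) implements ReplacementRule {
    public String apply(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            out.append(list.contains(c) ? replaceWith : c);
        }
        return out.toString();
    }
}

// regex-to-char: replace every match of the regular expression.
record RegexToChar(Pattern regex, char replaceWith) implements ReplacementRule {
    public String apply(String input) {
        // quoteReplacement stops '$' or '\' in the replacement from being treated specially.
        return regex.matcher(input).replaceAll(Matcher.quoteReplacement(String.valueOf(replaceWith)));
    }
}

// Applies rules in list order (an assumption; the module's ordering is not documented).
class RuleChain {
    static String applyAll(List<ReplacementRule> rules, String input) {
        String result = input;
        for (ReplacementRule rule : rules) {
            result = rule.apply(result);
        }
        return result;
    }
}
```

With the values from the sample config (À → A, the list È, É, Ê, Ë → E, and the Latin-1 Supplement regex → .), and ignoring the DOM-only flags, the input "ÀÉïx" becomes "AE.x". Order matters in this model: running the regex rule first would turn À into "." before the char-to-char rule could map it to A.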
Sample Config
character-replacements {
custom-replacer {
enabled = true
char-to-char-replacements = [
{character = À, replaceWith = A},
{character = ï, replaceWith = i, replaceInDomOnly = true},
]
list-to-char-replacements = [
{list = [È, É, Ê, Ë], replaceWith = E}
]
regex-to-char-replacements = [
{regex = "[\\p{InLatin-1Supplement}]", replaceWith = ., replaceInDomOnly = true}
]
}
}
Config Fields
| Config | Type | Default | Description |
|---|---|---|---|
| character-replacements.custom-replacer.enabled | Boolean | false | Flag to enable the custom replacer |
| character-replacements.custom-replacer.char-to-char-replacements.character | Character | | Character to be replaced |
| character-replacements.custom-replacer.char-to-char-replacements.replaceWith | Character | | Replacement character |
| character-replacements.custom-replacer.char-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
| character-replacements.custom-replacer.list-to-char-replacements.list | List<Character> | | List of characters to be replaced |
| character-replacements.custom-replacer.list-to-char-replacements.replaceWith | Character | | Replacement character for the list |
| character-replacements.custom-replacer.list-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
| character-replacements.custom-replacer.regex-to-char-replacements.regex | String | | Regular expression matching the characters to be replaced |
| character-replacements.custom-replacer.regex-to-char-replacements.replaceWith | Character | | Replacement character for regex matches |
| character-replacements.custom-replacer.regex-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
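The replaceInDomOnly flags in the table above restrict a rule to the text nodes of the DOM. How the module walks the DOM is not documented here; the following sketch shows one way such a restriction could work with the JDK's built-in `org.w3c.dom` API. `DomTextReplacer` and its methods are hypothetical names, and only plain text nodes are visited (CDATA sections are skipped).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper illustrating replaceInDomOnly = true: only text node content
// is rewritten; element names and attribute values are left as they are.
class DomTextReplacer {

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to avoid XML external entity (XXE) issues.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Recursively visits the tree. Attributes are not child nodes in the DOM,
    // so they never reach the replacer; CDATA sections are not handled here.
    static void replaceInTextNodes(Node node, UnaryOperator<String> replacer) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(replacer.apply(node.getNodeValue()));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInTextNodes(children.item(i), replacer);
        }
    }
}
```

For a message such as `<msg note="naïve"><b>naïve</b></msg>`, replacing ï with i this way yields the text "naive" while the note attribute still reads "naïve".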
lookup-table-replacer
The lookup-table-replacer uses a lookup table, supplied as a CSV file, to determine which string each character should be replaced with. Keeping the replacements in a CSV file reduces clutter in the configuration files, and CSV is an easy format to produce and maintain with other tools.
CSV Format
The following is an example of the CSV:
character codepoint,replacement
\\u00c6,A
\\u00c7,C
\\u00c8,E
\\u00c9,some string
Both columns are required:
- character codepoint: the Unicode codepoint of the character to replace. The string must start with \\u followed by the hex value of the codepoint.
- replacement: any string that you wish to replace the character with.
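A value in the character codepoint column has to be turned into an actual code point before it can be used for lookups. The following hypothetical `CodepointParser` sketches that step, assuming the value, once read, is a single backslash, the letter u and one or more hex digits; whether the module also limits the number of digits is not documented here. Invalid values raise an exception, in line with the startup checks described under Error Handling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the "character codepoint" column.
// Assumption: the value is a backslash, the letter u, then one or more hex digits.
class CodepointParser {
    private static final Pattern FORMAT = Pattern.compile("\\\\u([0-9A-Fa-f]+)");

    static int parse(String value) {
        Matcher m = FORMAT.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a Unicode codepoint: " + value);
        }
        int codePoint;
        try {
            codePoint = Integer.parseInt(m.group(1), 16);
        } catch (NumberFormatException e) {
            // Too many hex digits to fit in an int.
            throw new IllegalArgumentException("Codepoint out of range: " + value, e);
        }
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Codepoint out of range: " + value);
        }
        return codePoint;
    }
}
```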
The column binding is flexible enough to accommodate unknown columns as well, which can make the CSV more human-readable. Here’s an example with some extra columns that are purely informational:
character,description,character codepoint,replacement
Æ,LATIN CAPITAL LETTER AE,\\u00c6,A
Ç,LATIN CAPITAL LETTER C WITH CEDILLA,\\u00c7,C
È,LATIN CAPITAL LETTER E WITH GRAVE,\\u00c8,E
É,LATIN CAPITAL LETTER E WITH ACUTE,\\u00c9,some string
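Binding columns by header name rather than by position is what lets informational columns such as character and description sit alongside the required ones. The hypothetical `LookupTableCsv` loader below sketches this, assuming plain comma-separated values without quoting or embedded commas. It keeps the codepoint strings as keys; parsing them into code points is a separate step.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader: columns are bound by header name, so extra columns are ignored.
// Assumption: plain comma-separated values with no quoting or embedded commas.
class LookupTableCsv {

    static Map<String, String> load(String csv) {
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            String header = reader.readLine();
            if (header == null) {
                throw new IllegalArgumentException("CSV is empty");
            }
            List<String> columns = Arrays.asList(header.split(",", -1));
            int codepointCol = columns.indexOf("character codepoint");
            int replacementCol = columns.indexOf("replacement");
            if (codepointCol < 0 || replacementCol < 0) {
                throw new IllegalArgumentException("Both required columns must be present: " + columns);
            }
            Map<String, String> table = new LinkedHashMap<>(); // codepoint string -> replacement
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue; // tolerate blank lines
                }
                String[] cells = line.split(",", -1);
                if (cells.length <= Math.max(codepointCol, replacementCol)) {
                    throw new IllegalArgumentException("Row is missing a required column: " + line);
                }
                // The replacement is not trimmed, since spaces may be intentional.
                table.put(cells[codepointCol].trim(), cells[replacementCol]);
            }
            return table;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```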
Fallback Replacement
An optional fallback replacement string can be provided. This string is used as the replacement when a character cannot be found in either the CSV source or the overrides.
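The resulting lookup order can be summarized as: override, then CSV entry, then fallback, and otherwise the character is left unchanged (the last point comes from the fallback-replacement description in the config table). The hypothetical `LookupTableReplacer` below sketches that order using already-parsed code points. Treating overrides as taking precedence over CSV entries for the same codepoint is an assumption based on overrides being applied onto the CSV source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup-table replacer showing the resolution order:
// overrides, then CSV entries, then the optional fallback; otherwise unchanged.
class LookupTableReplacer {
    private final Map<Integer, String> table = new HashMap<>();
    private final String fallback; // null when fallback-replacement is not configured

    LookupTableReplacer(Map<Integer, String> csvEntries, Map<Integer, String> overrides, String fallback) {
        table.putAll(csvEntries);
        table.putAll(overrides); // assumed: overrides win over CSV entries for the same codepoint
        this.fallback = fallback;
    }

    String replaceCharacters(String message) {
        StringBuilder out = new StringBuilder();
        message.codePoints().forEach(cp -> {
            String replacement = table.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else if (fallback != null) {
                out.append(fallback);       // no entry anywhere: use the fallback
            } else {
                out.appendCodePoint(cp);    // no entry and no fallback: keep the character
            }
        });
        return out.toString();
    }
}
```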
Sample Base Config
character-replacements {
lookup-table-replacer {
enabled = true
csv-source = "file:/filesystem/path/sample-config.csv"
fallback-replacement = "."
overrides += {character-codepoint = "\\u00c6", replacement = "some override"}
overrides += {character-codepoint = "\\u00c9", replacement = "another override"}
}
}
Sample Downstream Config (Merging with the base config)
character-replacements {
lookup-table-replacer {
overrides += {character-codepoint = "\\u00ca", replacement = "a third override"}
}
}
By specifying only the overrides, the downstream config builds on the base config: it appends a third override, which is applied to the CSV source in addition to the previous two.
Sample Downstream Config (Wiping out the base config)
character-replacements {
lookup-table-replacer = null  # (1)
lookup-table-replacer {
enabled = true
csv-source = "classpath:completely_different_source.csv"
overrides += {character-codepoint = "\\u0062", replacement = "ONLY OVERRIDE"}  # (2)
}
}
(1) This wipes out the base config, meaning every field in lookup-table-replacer is removed.
(2) The resulting overrides list has a size of one, as this is the only override that survives the config merge.
Config Fields
| Config | Type | Default | Description |
|---|---|---|---|
| character-replacements.lookup-table-replacer.enabled | Boolean | false | Flag to enable the lookup-table replacer |
| character-replacements.lookup-table-replacer.csv-source | String | | The path to the CSV source. Must be specified in the Spring |
| character-replacements.lookup-table-replacer.fallback-replacement | String | | Replacement used when the character encountered by the replacer does not have an entry in the lookup table. When not specified, the character is not replaced. |
| character-replacements.lookup-table-replacer.overrides | List | | A list of overrides to apply onto the CSV source |
| character-replacements.lookup-table-replacer.overrides.character-codepoint | String | | Same as the |
| character-replacements.lookup-table-replacer.overrides.replacement | String | | Same as the |
Deprecated Configuration
Note: The following configuration is deprecated and exists only for backward compatibility. It will no longer be supported as of the 2026.3.0 release.
Note: If both the deprecated and the new configuration are specified, the deprecated one is loaded to preserve backward compatibility.
character-replacements {
char-to-char-replacements = [
{character = À, replaceWith = A},
{character = ï, replaceWith = i, replaceInDomOnly = true},
]
list-to-char-replacements = [
{list = [È, É, Ê, Ë], replaceWith = E}
]
regex-to-char-replacements = [
{regex = "[\\p{InLatin-1Supplement}]", replaceWith = ., replaceInDomOnly = true}
]
}
| Config | Type | Default | Description |
|---|---|---|---|
| character-replacements.char-to-char-replacements.character | Character | | Character to be replaced |
| character-replacements.char-to-char-replacements.replaceWith | Character | | Replacement character |
| character-replacements.char-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
| character-replacements.list-to-char-replacements.list | List<Character> | | List of characters to be replaced |
| character-replacements.list-to-char-replacements.replaceWith | Character | | Replacement character for the list |
| character-replacements.list-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
| character-replacements.regex-to-char-replacements.regex | String | | Regular expression matching the characters to be replaced |
| character-replacements.regex-to-char-replacements.replaceWith | Character | | Replacement character for regex matches |
| character-replacements.regex-to-char-replacements.replaceInDomOnly | Boolean | false | Flag indicating whether the replacement should happen only in the text nodes of the DOM |
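The precedence rule noted at the start of this section (the deprecated configuration wins when both are present) amounts to a simple selection. The sketch below is hypothetical: `ReplacementConfigSelector` and `RuleSet` are invented stand-ins for however the module represents parsed rules.

```java
import java.util.Optional;

// Hypothetical sketch of the documented precedence: when both the deprecated
// top-level rules and the custom-replacer block are present, the deprecated ones win.
class ReplacementConfigSelector {

    // Stand-in for a parsed rule set; "source" just records where it came from.
    record RuleSet(String source) {}

    static Optional<RuleSet> select(Optional<RuleSet> deprecated, Optional<RuleSet> customReplacer) {
        // Optional.or only falls back to the second value when the first is empty.
        return deprecated.or(() -> customReplacer);
    }
}
```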
Error Handling
Startup Errors
The configuration is loaded on startup, and application startup fails if there are misconfigurations such as:
- Setting the enabled flag without specifying any configuration
- Setting only one of the required fields in a replacement (e.g., setting regex but not replaceWith)
- Invalid type (e.g., specifying a string of two or more characters for Character types)
- Invalid format (e.g., a codepoint that does not conform to the specified Unicode codepoint format)
- Failing to load a file from the classpath or file system
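Several of these checks can be expressed as plain validation functions that throw before the application finishes starting. The hypothetical `ReplacementConfigValidator` below covers the first three cases for char-to-char entries, assuming a Character field must hold exactly one UTF-16 char (as `java.lang.Character` does); the module's actual checks and messages are not documented here. In Spring, an exception thrown while a bean is being created stops the application context from starting, which matches the fail-fast behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fail-fast checks modelled on the startup errors listed above.
// Assumption: a "Character" field must hold exactly one UTF-16 char, like java.lang.Character.
class ReplacementConfigValidator {

    // An enabled replacer with no rules is treated as a misconfiguration.
    static List<String> validateEnabled(boolean enabled, int ruleCount) {
        List<String> errors = new ArrayList<>();
        if (enabled && ruleCount == 0) {
            errors.add("Replacer is enabled but no replacements are configured");
        }
        return errors;
    }

    // Checks one char-to-char entry; null means the field was not set.
    static List<String> validateCharToChar(String character, String replaceWith) {
        List<String> errors = new ArrayList<>();
        if (character == null || replaceWith == null) {
            errors.add("Both 'character' and 'replaceWith' must be set");
            return errors;
        }
        if (character.length() != 1) {
            errors.add("'character' must be a single character: " + character);
        }
        if (replaceWith.length() != 1) {
            errors.add("'replaceWith' must be a single character: " + replaceWith);
        }
        return errors;
    }

    // Throwing during startup (e.g. from a Spring bean constructor) aborts application startup.
    static void requireValid(List<String> errors) {
        if (!errors.isEmpty()) {
            throw new IllegalStateException(
                    "Invalid character-replacements config: " + String.join("; ", errors));
        }
    }
}
```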