Automated Retries

Recovery of the 'client implementation' can take one of two forms:

Action retries - retry an action if the transaction's state has not changed in X seconds.
Action revivals - retry actions on a newly started cluster shard. Any transactions in a non-terminal state will be retried, using an exponential backoff that starts with an initial duration of X seconds.

Action Retries

Action Retries and Action Timeouts are only applicable to External Domains and not Domain Functions. Transient errors that may occur in domain functions should be handled in the adapter code.

Action retries prevent transactions from remaining stuck in a state, by issuing retries if an action does not cause a change of state within an acceptable (configurable) duration.

Retries are only cancelled when a request/response completes. For more information, see the Requests section of Concepts.

Interaction with Action timeouts

For action timeouts see Scheduling.

A timeout usually results in a new (terminal) state, which is therefore not subject to retrying.

When a timeout results in a new non-terminal state, a retry is attempted on the ActionTimeout if the transaction remains stuck in that state.

Action Retry Configuration

Action retries use the configuration policy of the ipf-scheduler (see Scheduling for configuration and action timeouts).

Three configuration items are needed for action retries:

initial-retry-interval - the initial duration between retries. Each subsequent interval is multiplied by a backoff factor of 2, e.g. an initial duration of 1 gives intervals of 1, 2, 4, 8.
max-retries - the maximum number of retries to attempt, so the total number of attempts is [initial] + [max-retries]. A value of 0 effectively turns this functionality off.
jitter-factor - the percentage of randomness to apply when retrying actions. The default is 0.2.
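
The interval arithmetic described above can be sketched as follows. This is a minimal illustration, not the scheduler's actual implementation: the function name `retry_intervals` is hypothetical, and the way jitter is applied (lengthening each interval by a random fraction of up to `jitter_factor`) is an assumption, since the text only says that jitter is a percentage of randomness.

```python
import random
from typing import List, Optional


def retry_intervals(initial_retry_interval: float,
                    max_retries: int,
                    jitter_factor: float = 0.0,
                    rng: Optional[random.Random] = None) -> List[float]:
    """Return the wait (in seconds) before each retry.

    Each interval is double the previous one (backoff factor of 2).
    ASSUMPTION: jitter lengthens an interval by a random fraction in
    [0, jitter_factor]; the real scheduler may apply it differently.
    """
    rng = rng or random.Random()
    intervals = []
    for attempt in range(max_retries):
        base = initial_retry_interval * (2 ** attempt)
        jitter = base * jitter_factor * rng.random()
        intervals.append(base + jitter)
    return intervals


# Without jitter, an initial interval of 1s and 4 retries gives 1, 2, 4, 8.
print(retry_intervals(1, 4))
# max-retries=0 yields no intervals, i.e. retries are switched off.
print(retry_intervals(3, 0))
```

Passing a seeded `random.Random` makes the jittered values reproducible, which can help when reasoning about a particular schedule.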

For all Actions

application.conf

Any.Any.Any.timeout-duration=10s
Any.Any.Any.initial-retry-interval=3s
Any.Any.Any.max-retries=2
Any.Any.Any.jitter-factor=0.2

For Specific Action

application.conf

Any.Any.Any.timeout-duration=10s
Flow1.State1.Action1.initial-retry-interval=3s
Flow1.State1.Action1.max-retries=2

The above configuration has the following effect for Action1:

The following assumes ActionTimeout will lead to a terminal state, or at least a change of state.

Time (t+seconds)	State	Action
0	State1	Action1
3	State1	ActionRetry (Action1)
6	State1	ActionRetry (Action1)
10	Timeout (or whatever state ActionTimeout causes)	ActionTimeout

Action Revival

Action revival is designed to recover transactions on a failed node. It differs from action retries in that it only fires when the cluster is started or restarted, a scenario not covered by action retries.

Revival uses action retries and continues from any retry attempt history, e.g. if a behaviour has already made 1 of its 2 configured retry attempts, then only 1 more retry will be attempted.
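
The retry budget left to a revived behaviour is simple subtraction. A minimal sketch, assuming hypothetical names `remaining_retries` and `attempts_made` (neither comes from the product):

```python
def remaining_retries(max_retries: int, attempts_made: int) -> int:
    """Retries still available after revival; never negative."""
    return max(max_retries - attempts_made, 0)


# 2 retries configured, 1 already attempted before the node failed.
print(remaining_retries(max_retries=2, attempts_made=1))
```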

The revival process will not attempt to recover a transaction in the INITIAL state or in any terminal state. This protects the system from attempting recovery on all newly started shards.

The revival process will not attempt revival if the state has changed before the actionRevivalOffset (see configuration) has elapsed, as the transaction is then no longer deemed stuck.
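
The eligibility rules above can be expressed as a single predicate. This is a sketch under stated assumptions rather than the product's logic: the `Snapshot` type and the `should_revive` function are hypothetical, and the two snapshots stand for the transaction's state at recovery time and again once the offset has elapsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical view of a transaction at one point in time."""
    state: str
    is_terminal: bool


def should_revive(at_recovery: Snapshot, after_offset: Snapshot) -> bool:
    """Decide whether a recovered transaction counts as stuck."""
    # INITIAL and terminal states are never revived.
    if at_recovery.state == "INITIAL" or at_recovery.is_terminal:
        return False
    # A state change during the offset means the transaction moved on.
    return after_offset.state == at_recovery.state


stuck = Snapshot("State1", is_terminal=False)
moved = Snapshot("State2", is_terminal=False)
print(should_revive(stuck, stuck))  # state unchanged during the offset
print(should_revive(stuck, moved))  # state changed during the offset
```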

Action revival is based upon Akka recovery signals. This means that a recovery of state will occur when any of the following happens:

An Event Sourced Behaviour (ESB) is initialised for the first time
An ESB is revived after having been passivated (happens automatically after 120s by default)
An ESB was killed by an exception
An ESB is rebalanced and therefore restarted on another node

Action Revival Configuration

Two configuration items are required to activate revival:

remember-entities - an Akka configuration item which causes Akka to automatically restart the entities of a shard when that shard is restarted. See the Akka docs.
action-recovery-delay - an offset, configured as a duration, which is imposed upon the system to allow any actions to change state before additional requests are sent. An offset of 0 turns this functionality off.

application.conf

akka.cluster.sharding.remember-entities = on

application.properties

ipf.behaviour.config.action-recovery-delay=3s