Proposals to achieve better reactivity using different CP programming styles
The execution of the CP can be conceptually divided in three logical phases (not necessarily consecutive):
- Cache Update: before running on a node, the CP collects the node state and updates its global view of the system
- Decision Algorithm Execution: the CP runs the decision algorithm, according to the current values in the node state cache.
- Triggering Actions: the outcome of the decision algorithm execution is a set of orders issued to each node in the coordination problem.
Due to its sequential nature, the decision algorithm is deaf to any change in the node state (e.g. a node has changed its position in the meanwhile). This means that the computation is based upon the last global view available at the beginning of the decision algorithm phase. Any subsequent update is known to the PIM only at the next cache update, but they are of no use to the CP before the next execution of the decision algorithm. We are confident that, some coordination problems may be implemented so that the decision algorithm is constantly fed up by global view updates without impairing the semantics of the algorithm. Yet we are not going to tread this path because of its peculiarity. What we want to do here is collecting some new CP programming guidelines to better leverage the potentialities of the PIM architecture.
An outstanding strength of the PIM is assumed to be its 'global reactivity. Following the current CP programming approach, whereas the time spent executing the decision algorithm is longer than the round-trip time, the reactivity of the system is not limited by the PIM architecture, but by the CP decision algorithm (don't forget that the CP is deaf during the decision algorithm execution phase!). This bottleneck is particularly bad because it could lead to poor reactivity times, no matter how much we improve our underlying architecture.
Starting from the above considerations, we came up with three architectural proposals:
- the Local Model
- the Interrupt Model
- the Reactive Model (very experimental)
This model was born from the following observation:
As stressed before, the decision process is influenced by any update to the state of nodes only at the beginning of the CP algorithm. It is therefore deaf to any update occurred during the decision algorithm. The logical outcome is that the sole purpose of keeping the CP running from one node to another is to prepare the next global view (to be used during the next decision phase) and to give the last computed orders to the nodes, exploiting the output cache.
Starting from this assumption, we can conclude that we can reverse the model, i.e. run the decision algorithm on each node in parallel and let the CP cycle between the nodes just for the purpose of updating the local views and choosing which local results must be chosen. Be aware that each local result isn’t applied blindly to the local node, but has to be validated by the CP and thus extended to the whole system or ignored. This proposal should be seen as a sort of pipeline mechanism. In the traditional approach only the node running the CP is active, while the others are idle. In this approach we can reverse the situation, making all the nodes active and using the CP to coordinate the local results. The CP can decide which solution deserves to be extended in multiple ways. For example, it can look at the differences between its updated global view and the outdated global view of the node, or looking at the difference between the proposed decision and the last confirmed orders. The underlying principle is that if a proposed solution doesn’t differ greatly from the last validated one, the CP should refuse the proposal. In this way the CP might react very fast to any critical event, while react more slowly to ordinary events that wouldn’t remarkably change the overall behavior.
This model is similar to the current one, but it introduce the idea of critical event. When a critical event occurs, the decision algorithm is forced to restart from the beginning. Such events are supposed to be rare and particularly dangerous to the system (e.g. node near to collision) and for this reason they should be chosen very carefully. The idea is to invalidate the current ongoing "decision algorithm computation" because that computation has been triggered in an overly outdated context (one that is dramatically different from the current context). In this particular situation, the reactivity of the PIM would be slowed down by a computation that that will surely be too outdated when it will be completed. These conditions should be easily checked in the
arriveOnNode(int) method, without much effort or computational overhead. The decision algorithm can be stopped in two ways:
- wrapping around the code in a
try/catchblock and throwing an exception when a critical condition happens
- disrupting the past CP execution state (i.e. call stack), thus forcing the decision algorithm to restart from the beginning (
run()method) with refreshed data.
The following pictures are an example of the reactiveness improvement that this model can bring:
This model requires the CP programmer to adopt a new programming style, similar to real time programming. The idea is similar to a finite state machine (FSM) or to a sort of reactive model. The decision algorithm phase will have to be designed in a piecemeal fashion, i.e.
- arrive on node k
- update the global view, reading the state of node k
- run the decision algorithm on top of the current state view
- issue the computed orders without any caching mechanism, directly to the node
- leave the node when the job is done (proactive migration)
This means also abandoning the reactive migration by getting rid of the timeslice.
This way, the CP would be able to execute instantly any taken decision, in the fastest possible way. This approach is feasible only if we can write a very fast decision algorithm capable of taking decisions only for a single node, based on the view of the system and on the node state. This proposal follows the idea of bringing the decision close to the data. Two kinds of criticisms can be done to this model:
- from our standpoint, it makes strong migration lose its strategic role, because we don’t have to move the computational state of the CP (its frames), but only the global view of the system (weak mobility).
- from the programmer's standpoint, writing a distributed algorithm like a FSM is considerably tougher than writing a simple sequential CP thread.
Please, note that this approach is still work in progress and must be motivated in the future.