The current approach establishes a static binding between each node and its number, as declared in the configuration file. The CP code currently uses the node number to decide where to go and what to do. For example, the CaptureTheFlag algorithm uses the node number to determine which node the CP is currently running on, and to move the CP to a particular node at a given point.
The current approach cannot handle node failures or new nodes joining the PIM, because in those cases the number of nodes (N) and the index of each node (k) would change dynamically during the execution of the coordination algorithm.
We claim that in the future PIM programming model the role played by both N and k will have to change:
- N will still be the current overall number of nodes. However, it can change over time, so the algorithm should always check its last known value before using it. Moreover, it would be advisable to specify a minimum and a maximum value for N. The minimum value (Nmin) implies that, whenever N drops below Nmin, the algorithm should stop temporarily and wait until N is at least Nmin again. The maximum value (Nmax) implies that, once N has reached Nmax, a newly arrived node will be either ignored by the PIM or cached in a "pending nodes" list.
- the node number k will instead be dropped, simply because it violates the key feature of the PIM (see the AAAI-08 article): To attain robustness to component failure, PIM programs must be written in terms of component capabilities, rather than specific components. This way, if a component is disabled, then another component with similar capabilities can take its place. To realize this idea, we suggest a new approach that relies on the current state of the node rather than on its associated node number. Each node is then characterized by what it can do rather than by who it is.
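The Nmin/Nmax rule from the first item can be sketched as a small membership gate. This is a minimal illustration under stated assumptions, not part of the PIM runtime: the class name MembershipGate, its methods, the use of java.util.UUID as node identity, and the policy of promoting the oldest pending node when an active node leaves are all hypothetical choices made for this sketch. N is never cached; it is always read from the current membership, and awaitQuorum() implements the "stop temporarily while N < Nmin" behaviour.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: names and the promote-on-leave policy are assumptions.
class MembershipGate {
    private final int nMin;
    private final int nMax;
    private final Set<UUID> active = new LinkedHashSet<>();   // nodes counted in N
    private final Deque<UUID> pending = new ArrayDeque<>();   // "pending nodes" list

    MembershipGate(int nMin, int nMax) {
        if (nMin < 1 || nMax < nMin) {
            throw new IllegalArgumentException("require 1 <= nMin <= nMax");
        }
        this.nMin = nMin;
        this.nMax = nMax;
    }

    // Last known value of N; callers re-read it instead of caching it.
    synchronized int currentN() {
        return active.size();
    }

    // The coordination algorithm may proceed only while N >= Nmin.
    synchronized boolean canRun() {
        return active.size() >= nMin;
    }

    // Admits the node if N < Nmax; otherwise caches it as pending and returns false.
    synchronized boolean join(UUID node) {
        if (active.contains(node)) {
            return true;
        }
        if (active.size() < nMax) {
            pending.remove(node);
            active.add(node);
            notifyAll(); // wake any thread waiting in awaitQuorum()
            return true;
        }
        if (!pending.contains(node)) {
            pending.addLast(node);
        }
        return false;
    }

    // Removes a failed or departing node; promotes the oldest pending node, if any.
    synchronized void leave(UUID node) {
        if (active.remove(node) && !pending.isEmpty()) {
            active.add(pending.pollFirst());
        }
        pending.remove(node);
        notifyAll();
    }

    // Blocks the caller while N < Nmin (the "stop temporarily" rule).
    synchronized void awaitQuorum() throws InterruptedException {
        while (active.size() < nMin) {
            wait();
        }
    }
}
```

The text allows a node beyond Nmax to be either ignored or cached; this sketch caches it. The "ignore" variant would simply return false from join() without touching the pending list.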
If we follow the two rules above, the model can realize the idea that the physical connection among nodes has no computational significance.
The idea can be implemented as follows:
- the CP has to extend an abstract CoordinatingProcess class, which declares a "boolean shouldBeSkipped()" method. During its cyclic jumping among the N nodes (N being dynamic!), the CP is deserialized by the PIM runtime on the target node, which has to decide whether the CP should be run locally or sent directly to the next node. This decision depends on the return value of the shouldBeSkipped() method, which is supposed to check the current node state to figure out whether or not this is the desired node. Please note that, in light of the above, waitToArrivedOnNode(int index) may be dropped or simply replaced by a more generic waitToArriveOnNextNode() that does not take a specific index.
- the concept of node number is replaced by the more flexible UUID assigned by the GroupManager component (GM). Using these UUIDs, each node in the PIM can maintain a dynamic peers list containing the UUIDs of all the nodes it knows about. This list is of course a partial view of the system and may not always be up to date. With this approach, we no longer need the configuration file to identify each node in the PIM.
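The two points above can be illustrated with an in-process simulation. This is a hedged sketch, not the actual PIM runtime or GroupManager API: NodeState, PimRuntime, circulate(), runLocally() and CapabilityCP are hypothetical names, nodes are identified by java.util.UUID, and capabilities are modelled as plain strings. One deliberate deviation: shouldBeSkipped() receives the node state as a parameter here for testability, whereas in the design above it takes no arguments and would query the local node state itself. A real runtime would also serialize the CP and deserialize it on the target node; this simulation just passes the object along.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical snapshot of a node: its UUID and what it can do.
final class NodeState {
    final UUID id;
    final Set<String> capabilities;

    NodeState(UUID id, Set<String> capabilities) {
        this.id = id;
        this.capabilities = Set.copyOf(capabilities);
    }
}

// The CP decides from the node's state, not from a node number, whether to run here.
abstract class CoordinatingProcess implements Serializable {
    // True if this node is not the one the CP is looking for.
    abstract boolean shouldBeSkipped(NodeState here);

    // The work the CP performs on a node it has accepted.
    abstract void runLocally(NodeState here);
}

// Hypothetical runtime holding a peers list keyed by UUID (a partial, possibly stale view).
final class PimRuntime {
    private final List<NodeState> peers = new ArrayList<>();

    void addPeer(NodeState node) { peers.add(node); }

    void removePeer(UUID id) { peers.removeIf(n -> n.id.equals(id)); }

    // Moves the CP around the ring for at most maxHops arrivals, running it only
    // where shouldBeSkipped() returns false. Returns the UUIDs where it ran.
    List<UUID> circulate(CoordinatingProcess cp, int maxHops) {
        List<UUID> ranOn = new ArrayList<>();
        int pos = 0;
        for (int hop = 0; hop < maxHops && !peers.isEmpty(); hop++) {
            // N is re-read on every hop, so joins and failures are tolerated.
            NodeState here = peers.get(pos % peers.size());
            if (!cp.shouldBeSkipped(here)) {
                cp.runLocally(here);
                ranOn.add(here.id);
            }
            pos = (pos + 1) % peers.size(); // the "waitToArriveOnNextNode()" step
        }
        return ranOn;
    }
}

// Example CP that runs only on nodes advertising a given capability.
final class CapabilityCP extends CoordinatingProcess {
    private final String required;
    int runs = 0;

    CapabilityCP(String required) { this.required = required; }

    @Override
    boolean shouldBeSkipped(NodeState here) {
        return !here.capabilities.contains(required);
    }

    @Override
    void runLocally(NodeState here) { runs++; }
}
```

For example, with peers advertising {"gps"}, {"gps", "camera"} and {} and a CapabilityCP("camera"), six hops visit each peer twice and the CP runs only on the second node. If that node fails and another "camera" node joins, the same CP keeps working without any change, because it never refers to a node index.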