Reducing network traffic and, as a result, power consumption is one of our primary goals. To do so, we have to work in two directions:
- reducing the size of the CP execution state (e.g. with a new caching mechanism);
- avoiding the use of ping or keep-alive packets as much as we can.
As for the latter point, we are planning to get rid of the GroupManager and use a more economical approach to detect the arrival and departure of nodes. Since these events should be infrequent, we prefer to reduce network traffic during normal execution, even at the risk of increasing it during critical (and rare) events. In particular, the new NodeMonitor will behave as follows:
- a ping packet is broadcast if the CP doesn't arrive at the local node within a given time
- a ping packet is broadcast every time a new node is detected
- a ping packet is broadcast if some other module requests it
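The three triggers above can be sketched roughly as follows. This is only an illustration, not the actual NodeMonitor: the method names (`on_cp_arrival`, `on_node_seen`, `request_ping`, `tick`), the `broadcast_ping` callback, the `cp_timeout` parameter and the injectable `clock` are all hypothetical. Restarting the timer after a timeout ping is also an assumption, made so that the monitor does not ping on every tick.

```python
import time
from typing import Callable, Set


class NodeMonitor:
    """Illustrative sketch: pings are sent only on the three triggers listed above."""

    def __init__(self, broadcast_ping: Callable[[], None], cp_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._broadcast_ping = broadcast_ping  # hypothetical network hook
        self._cp_timeout = cp_timeout          # seconds without the CP before pinging
        self._clock = clock                    # injectable so the timer can be tested
        self._last_cp_arrival = clock()
        self._known_nodes: Set[str] = set()

    def on_cp_arrival(self) -> None:
        """The CP reached the local node: restart the timeout."""
        self._last_cp_arrival = self._clock()

    def on_node_seen(self, node_id: str) -> None:
        """Trigger 2: ping the first time a node is detected."""
        if node_id not in self._known_nodes:
            self._known_nodes.add(node_id)
            self._broadcast_ping()

    def request_ping(self) -> None:
        """Trigger 3: another module explicitly asks for a ping."""
        self._broadcast_ping()

    def tick(self) -> None:
        """Trigger 1: called periodically; ping if the CP is overdue."""
        now = self._clock()
        if now - self._last_cp_arrival >= self._cp_timeout:
            self._broadcast_ping()
            # Assumption: restart the timer so we do not ping on every tick.
            self._last_cp_arrival = now
```

Note that no traffic is generated while the CP keeps arriving on time and no new node shows up, which is the normal case the design tries to optimize.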
Moreover, during the sending phase the MigrationManager will behave as follows:
- broadcast the UUID of the chosen next node
- broadcast the serialized CP execution state
- wait for communication from the chosen next node
- retransmit lost fragments
- consider the next node dead if it doesn't answer at all (and choose another next node)
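A minimal sketch of this sending phase is shown below, under several assumptions that are not part of the design above: the message tuples (`"NEXT"`, `"FRAG"`), the `broadcast` and `wait_for_reply` callbacks, the fixed `fragment_size` and the `max_rounds` cap are all hypothetical. In particular, `wait_for_reply` is assumed to return `None` when the node does not answer in time, and otherwise the set of fragment indices the node is still missing (empty when the transfer is complete).

```python
from typing import Callable, List, Optional, Sequence, Set


def fragment(payload: bytes, size: int) -> List[bytes]:
    """Split the serialized CP execution state into fixed-size fragments."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


def send_cp(candidates: Sequence[str],
            state: bytes,
            broadcast: Callable[[tuple], None],
            wait_for_reply: Callable[[str], Optional[Set[int]]],
            fragment_size: int = 512,
            max_rounds: int = 3) -> Optional[str]:
    """Try each candidate next node in order; return the one that received the CP."""
    fragments = fragment(state, fragment_size)
    for node_id in candidates:
        # Announce the UUID of the chosen next node, then the serialized state.
        broadcast(("NEXT", node_id))
        for index, chunk in enumerate(fragments):
            broadcast(("FRAG", index, len(fragments), chunk))
        for _ in range(max_rounds):
            missing = wait_for_reply(node_id)
            if missing is None:
                break  # no answer at all: consider the node dead
            if not missing:
                return node_id  # the next node has the complete state
            # Retransmit only the fragments reported as lost.
            for index in sorted(missing):
                broadcast(("FRAG", index, len(fragments), fragments[index]))
        # Assumption: a node still missing fragments after max_rounds is
        # treated like a dead node, and the next candidate is tried.
    return None  # no candidate answered
```

In the failure-free case this costs one announcement, one pass over the fragments and a single reply from the next node; the extra rounds only appear when fragments are lost or a node is unreachable.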
This way, the system will recover slowly in case of multiple node failures or sudden network partitioning (see Tradeoffs), but it will perform better in the normal use case.