From Agentgroup
Revision as of 00:29, 10 June 2008 by Hanza (Talk | contribs)

Reducing network traffic, and consequently power consumption, is one of our primary objectives. To do so, we have to work in two directions:

  • reducing the size of the CP execution state (e.g. with a new caching mechanism)
  • avoiding the use of ping or keep-alive packets as much as we can

Regarding the second point, we are considering removing the GroupManager in favor of a more economical approach to detecting the arrival or departure of nodes. Since these events should be infrequent, we prefer to reduce network traffic during normal execution, even at the risk of increasing it during critical (and rare) events. In particular, the new NodeMonitor will behave as follows:

  • a ping packet is broadcast if the CP doesn't arrive at the local node within a given time
  • a ping packet is broadcast every time a new node is detected
  • a ping packet is broadcast if another module requests it
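
The three triggers above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the class layout, the `broadcast_ping` callback, the `cp_timeout` parameter and the injectable `clock` are all hypothetical names. Restarting the timer after a timeout ping is also an assumption, made so that one overdue CP does not cause a ping on every poll.

```python
import time
from typing import Callable, Set


class NodeMonitor:
    """Hypothetical sketch: pings are sent only on the three triggers."""

    def __init__(self, broadcast_ping: Callable[[str], None],
                 cp_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._broadcast_ping = broadcast_ping  # assumed transport hook
        self._cp_timeout = cp_timeout          # assumed to be in seconds
        self._clock = clock
        self._last_cp_arrival = clock()
        self._known_nodes: Set[str] = set()

    def on_cp_arrival(self) -> None:
        """Record that the CP reached the local node."""
        self._last_cp_arrival = self._clock()

    def check_timeout(self) -> bool:
        """Trigger 1: ping if the CP is overdue. Call this periodically."""
        now = self._clock()
        if now - self._last_cp_arrival >= self._cp_timeout:
            self._broadcast_ping("cp-timeout")
            # Assumption: restart the timer so we do not ping on every poll.
            self._last_cp_arrival = now
            return True
        return False

    def on_node_seen(self, node_id: str) -> bool:
        """Trigger 2: ping the first time a node is detected."""
        if node_id in self._known_nodes:
            return False
        self._known_nodes.add(node_id)
        self._broadcast_ping("new-node")
        return True

    def request_ping(self) -> None:
        """Trigger 3: another module explicitly asks for a ping."""
        self._broadcast_ping("requested")
```

No ping is sent while the CP keeps arriving on time and the set of known nodes is stable, which is the normal case this design is meant to make cheap.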

Moreover, during the sending phase the MigrationManager will behave as follows:

  • broadcast the UUID of the chosen next node
  • broadcast the serialized CP execution state
  • wait for communication from the chosen next node
  • retransmit lost fragments
  • consider the next node dead if it doesn't answer at all (and choose another next node)
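
The sending phase above could look roughly like the following Python sketch. It is written under several assumptions that are not part of the design: the `Transport` interface, the message kinds `"next-node"` and `"fragment"`, the reply format (a set of missing fragment indexes, empty meaning "all received", `None` meaning "no answer"), and the `fragment_size` and `max_retries` parameters are all hypothetical. Giving up on a node that keeps reporting losses after `max_retries` retransmissions is likewise an assumption.

```python
from typing import List, Optional, Protocol, Sequence, Set


class Transport(Protocol):
    """Hypothetical network layer used by the sketch."""

    def broadcast(self, kind: str, payload: object) -> None: ...

    def wait_for_reply(self, node_id: str) -> Optional[Set[int]]:
        """Return missing fragment indexes, or None if the node is silent."""
        ...


def fragment(state: bytes, size: int) -> List[bytes]:
    """Split the serialized CP state into fixed-size fragments."""
    return [state[i:i + size] for i in range(0, len(state), size)]


def send_cp(candidates: Sequence[str], state: bytes, transport: Transport,
            fragment_size: int = 512, max_retries: int = 3) -> Optional[str]:
    """Try each candidate in order; return the node that took the CP."""
    fragments = fragment(state, fragment_size)
    for node_id in candidates:
        # Announce the chosen next node, then broadcast the state.
        transport.broadcast("next-node", node_id)
        for index, chunk in enumerate(fragments):
            transport.broadcast("fragment", (index, chunk))
        for _ in range(max_retries + 1):
            missing = transport.wait_for_reply(node_id)
            if missing is None:
                break  # no answer at all: consider the node dead
            if not missing:
                return node_id  # every fragment arrived
            # Retransmit only the fragments reported as lost.
            for index in sorted(missing):
                transport.broadcast("fragment", (index, fragments[index]))
        # Fall through to the next candidate.
    return None
```

In the normal case this costs one announcement, one pass over the fragments and a single reply. The extra traffic appears only when fragments are lost or the chosen node is silent.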

This way, the system will recover more slowly from multiple node failures or sudden network partitioning (see Tradeoffs), but it will perform better in the normal use case.