Understanding the evolvable POF objects in Coherence

Serialization and deserialization are cornerstones of a cache cluster, because cached objects travel over the wire within the cluster as serialized data. Coherence supports the Portable Object Format (POF), a serialization format that is more efficient, more compact, and language agnostic compared with the standard Java serialization format.

Basically, POF uses type identifiers to denote classes, and indexes to denote fields in classes. Each user-defined class is associated with a type identifier, a serializer and a deserializer. POF also supports evolvable objects, which is the topic we will discuss in this post.

Evolvable objects allow the hot replacement of their classes while the objects are still alive in a Coherence cluster. Say we have a cluster running several nodes, and there are evolvable Person objects in the cluster. Now suppose we need to modify the Person class, which is analogous to an ALTER TABLE in a relational database. What we do is this: we shut down one of the nodes in the cluster, replace the Person class with the modified one, and then bring the node back up to rejoin the cluster. The same procedure is repeated for each node, including the application nodes (they are part of the cluster, but usually do not provide storage). Coherence allows this, and the cluster keeps functioning throughout the whole process: the cluster is never shut down, and no data are lost.

The trick is that when a node is shut down, all the objects on that node are transferred to other nodes. When the node comes back, the objects in the cluster are redistributed to include it, and then another node can be shut down. POF helps here by providing backward compatibility when the node comes back to the cluster with the new Person class, and forward compatibility when other nodes receive objects from the node with the new Person class during a redistribution.

Before we go into the details of backward and forward compatibility, let us define the implementation version (ImplVersion), which is the version of the Person class. We assume that the ImplVersion goes from lower to higher as changes are made to the Person class. Let us also define the data version (DataVersion), which is the version of the serialized data of a Person object. A deserialized Person object has the same DataVersion as the serialized data it was deserialized from, regardless of the ImplVersion.

If no object were evolvable, ImplVersion and DataVersion would always be the same single value. Because objects are evolvable, however, multiple ImplVersions and DataVersions can coexist in a cluster. Backward compatibility means that implementations with a higher ImplVersion should 1) provide reasonable defaults for object attributes that are not available in serialized data with a lower DataVersion, and 2) preserve the serialized data in full, even the parts they do not use, because those parts may still be needed by implementations with a lower ImplVersion. Forward compatibility means that implementations with a lower ImplVersion should preserve the serialized data in full, even the parts they do not recognize, because those parts may come from implementations with a higher ImplVersion. Both can be achieved by simply passing through the unused and unrecognized serialized data, which POF calls interchangeably the Remainder and the FutureData. That is why we see object.setFutureData(reader.readRemainder()) in deserializers and writer.writeRemainder(object.getFutureData()) in serializers.
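To make the pass-through idea concrete, here is a minimal sketch in plain Java. It is a toy model, not the Coherence API: serialized data is assumed to be a simple map from field index to value, and the names OldPerson and RemainderDemo are made up for illustration. The older implementation knows only indexes 0 and 1, yet the value at index 2 survives a deserialize/serialize cycle because it is carried along as future data.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for serialized data: field index -> value.
// This is NOT the Coherence API; it only models the pass-through idea.
final class OldPerson {
    // Indexes this (older) implementation understands.
    static final int NAME = 0;
    static final int AGE = 1;

    private String name;
    private int age;
    // Entries whose indexes this implementation does not recognize.
    private Map<Integer, Object> futureData = new TreeMap<>();

    static OldPerson deserialize(Map<Integer, Object> data) {
        OldPerson p = new OldPerson();
        p.name = (String) data.get(NAME);
        p.age = (Integer) data.get(AGE);
        // Plays the role of object.setFutureData(reader.readRemainder()).
        Map<Integer, Object> rest = new TreeMap<>(data);
        rest.remove(NAME);
        rest.remove(AGE);
        p.futureData = rest;
        return p;
    }

    Map<Integer, Object> serialize() {
        Map<Integer, Object> data = new TreeMap<>();
        data.put(NAME, name);
        data.put(AGE, age);
        // Plays the role of writer.writeRemainder(object.getFutureData()).
        data.putAll(futureData);
        return data;
    }

    void setAge(int age) { this.age = age; }
}

final class RemainderDemo {
    // Data as a newer implementation might write it; index 2 is unknown to OldPerson.
    static Map<Integer, Object> newerData() {
        Map<Integer, Object> data = new TreeMap<>();
        data.put(0, "Alice");
        data.put(1, 30);
        data.put(2, "alice@example.com");
        return data;
    }

    static Map<Integer, Object> roundTripThroughOld(Map<Integer, Object> data) {
        OldPerson p = OldPerson.deserialize(data);
        p.setAge(31);          // the older node updates a field it knows
        return p.serialize();  // index 2 is written back untouched
    }

    public static void main(String[] args) {
        System.out.println(roundTripThroughOld(newerData()));
    }
}
```

In real POF the remainder is an opaque binary rather than a map, but the principle is the same: whatever the implementation does not consume on read, it writes back unchanged.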

It may look as if backward and forward compatibility require that no serialized data ever be removed, so that the only change we can make to a class is to add to its serialized data. That is true, but the implementation of the class still has the flexibility to choose which fields are exposed as attributes through getters and setters.

Allowing only additions to the serialized data implies that a serialization index can represent only one field of a class across its evolution: the index cannot be reassigned to another field, and neither the type nor the meaning of the field may change. Otherwise, implementations with a lower ImplVersion cannot know what the same indexes represent in serialized data with a higher DataVersion, which can break both backward and forward compatibility.

Allowing only additions to the serialized data also implies that the DataVersion of the serialized data produced by serializing an object (NewDataVersion) is at least the DataVersion of that object: data can only be added, so versions never go back. On the other hand, it is the implementation that produces the serialized data, so the NewDataVersion should also be at least the ImplVersion. Combining the two, the NewDataVersion should be set to the higher of the DataVersion and the ImplVersion. That is why we see writer.setVersionId(Math.max(object.getImplVersion(), object.getDataVersion())) in serializers.
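The version rule is small enough to check by hand. The following sketch isolates it in plain Java; the class and method names (VersionRule, newDataVersion) are hypothetical and exist only for this illustration.

```java
final class VersionRule {
    // Version to stamp on newly serialized data: never lower than the data
    // the object was read from, and never lower than the class that writes it.
    static int newDataVersion(int implVersion, int dataVersion) {
        return Math.max(implVersion, dataVersion);
    }

    public static void main(String[] args) {
        // Older class (ImplVersion 1) re-serializing data written by a newer class (DataVersion 2).
        System.out.println(newDataVersion(1, 2));
        // Newer class (ImplVersion 2) re-serializing data written by an older class (DataVersion 1).
        System.out.println(newDataVersion(2, 1));
    }
}
```

In both cases the result is 2: the older class keeps the higher version of the data it passes through, and the newer class raises the version of the older data it rewrites.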

Now that the DataVersion is maintained across evolution, applications can detect differences between the DataVersion and the ImplVersion and act accordingly. Note that the ImplVersion changes when we replace classes in a cluster.

The table below summarizes how to modify an evolvable class without breaking backward and forward compatibility:

Table 1.  Summary of modifications without breaking the backward and the forward compatibility.
Modification Steps
Applications use getImplVersion()/getDataVersion() to be aware of different versions.
Add an attribute
  1. Increment the ImplVersion.
  2. Add a private field.
  3. Create the getter and the setter for that field.
  4. Assign the next increasing index for that field.
  5. Append that field in the serializer and the deserializer.
Remove an attribute
  1. Increment the ImplVersion.
  2. Remove only the getter and the setter for the attribute.
Change an attribute
  1. Increment the ImplVersion.
  2. If only the name of the attribute (getter/setter) is changed, stop here.
  3. Follow "Remove an attribute" without incrementing the ImplVersion.
  4. Follow "Add an attribute" without incrementing the ImplVersion.
Add a sub-class
  1. If the ImplVersion is not global, implement getImplVersion() in the sub-class as, e.g., return this.ImplVersion + super.getImplVersion().
  2. Assign a fixed index idx in the sub-class for the super-class.
  3. Serialize/deserialize the super-class using PofWriter.createNestedPofWriter(idx) / PofReader.createNestedPofReader(idx) in the serializer/deserializer of the sub-class to let them evolve independently.
  4. Assign the next increasing index to each field of the sub-class.
  5. Append those fields in the serializer and the deserializer of the sub-class.
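
The "Add a sub-class" steps can be pictured with another toy model in plain Java. Again, this is not the Coherence API: serialized data is assumed to be a map from index to value, and NestedDemo, Employee, and the index constants are made up for illustration. The point is that the super-class state lives in its own nested index space under one fixed index, so Person can later claim its own index 1 without colliding with index 1 of the sub-class.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a nested map stands in for a nested POF stream.
final class NestedDemo {
    static final int SUPER_IDX = 0;    // fixed index reserved for the super-class state
    static final int EMPLOYEE_ID = 1;  // first index owned by the sub-class

    // Person state as written by version 1 of Person: only a name at index 0.
    static Map<Integer, Object> personV1(String name) {
        Map<Integer, Object> m = new TreeMap<>();
        m.put(0, name);
        return m;
    }

    // Version 2 of Person adds an email at the next Person index, 1.
    static Map<Integer, Object> personV2(String name, String email) {
        Map<Integer, Object> m = personV1(name);
        m.put(1, email);
        return m;
    }

    // Employee nests the Person state under SUPER_IDX instead of sharing one index space.
    static Map<Integer, Object> employee(Map<Integer, Object> personState, int employeeId) {
        Map<Integer, Object> m = new TreeMap<>();
        m.put(SUPER_IDX, personState);
        m.put(EMPLOYEE_ID, employeeId);
        return m;
    }

    public static void main(String[] args) {
        // Person's index 1 (email) and Employee's index 1 (id) do not collide.
        System.out.println(employee(personV2("Alice", "alice@example.com"), 42));
    }
}
```

Had both classes shared a single index space, adding the email to Person would have required an index already taken by the sub-class, which the add-only rule forbids.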

To test an evolvable class, we can set up two projects containing the older and the newer implementation, respectively, and then exchange serialized data between the projects. To simulate a node with the newer implementation being put back into a cluster with the older implementation, the older implementation instantiates an object and passes its serialized data to the newer implementation. The newer implementation deserializes the data, reserializes the object, and passes the new serialized data back. The older implementation deserializes the new data into a resulting object and checks that it equals the original object. To simulate an object instantiated by the newer implementation being transferred in a cluster with the older implementation, simply swap the older and the newer implementation in the previous simulation.
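
Both simulations can be rehearsed in a single file before setting up real projects. The sketch below is a simplified model in plain Java, not a Coherence test: serialized data is assumed to be a map from index to value, and PersonV1, PersonV2, and EvolutionRoundTrip are hypothetical names. PersonV2 adds an email attribute at index 1 with an empty string as its default.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: serialized data is a map of field index -> value.
class PersonV1 {
    String name;                                    // index 0
    Map<Integer, Object> future = new TreeMap<>();  // unrecognized indexes

    static PersonV1 read(Map<Integer, Object> data) {
        PersonV1 p = new PersonV1();
        p.future = new TreeMap<>(data);
        p.name = (String) p.future.remove(0);       // consume index 0, keep the rest
        return p;
    }

    Map<Integer, Object> write() {
        Map<Integer, Object> data = new TreeMap<>(future);
        data.put(0, name);
        return data;
    }
}

class PersonV2 {
    String name;                                    // index 0
    String email = "";                              // index 1, default for older data
    Map<Integer, Object> future = new TreeMap<>();

    static PersonV2 read(Map<Integer, Object> data) {
        PersonV2 p = new PersonV2();
        p.future = new TreeMap<>(data);
        p.name = (String) p.future.remove(0);
        Object email = p.future.remove(1);
        if (email != null) {
            p.email = (String) email;               // otherwise keep the default
        }
        return p;
    }

    Map<Integer, Object> write() {
        Map<Integer, Object> data = new TreeMap<>(future);
        data.put(0, name);
        data.put(1, email);
        return data;
    }
}

final class EvolutionRoundTrip {
    // A newer node rejoins an older cluster: V1 data -> V2 -> back to V1.
    static boolean olderSurvivesNewer() {
        PersonV1 original = new PersonV1();
        original.name = "Alice";
        Map<Integer, Object> back = PersonV2.read(original.write()).write();
        return PersonV1.read(back).name.equals(original.name);
    }

    // An object created by the newer implementation passes through an older node.
    static boolean newerSurvivesOlder() {
        PersonV2 original = new PersonV2();
        original.name = "Bob";
        original.email = "bob@example.com";
        Map<Integer, Object> back = PersonV1.read(original.write()).write();
        PersonV2 result = PersonV2.read(back);
        return result.name.equals(original.name) && result.email.equals(original.email);
    }

    public static void main(String[] args) {
        System.out.println(olderSurvivesNewer());
        System.out.println(newerSurvivesOlder());
    }
}
```

A real test would exchange actual POF binaries between the two projects, but the comparison at the end of each simulation is the same.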

This post discusses only the support for evolvable objects in POF; please read the manual for more information.
