Google protocol buffers vs java serialization
I am working on a distributed system at work and I was trying to decide on a remoting and data transfer scheme for it.
Our system had to be very performance intensive, so XML/RPC over XML (conventional SOAP based web services or RESTful web services) were out of the question. The choice was between vanilla java serialization over HTTP using a remoting framework (spring or jboss) or Google protocol buffers. Protocol buffers is a cross-language serialization framework (it has bindings in C++, java and Python) that provides a very efficient object representation and outperforms XML binding by orders of magnitude. So which of the two was better?
The factors I considered in choosing between them were:
- Serialization performance and efficiency
- Payload
- Programming model
- Testability and maturity
- Flexibility
We are a 100% java shop so the cross-language serialization did not appeal to me. As far as maturity is concerned spring/RMI etc. are much more mature and provide good error handling (serialization of exception traces) etc. So it came down to performance and programming model. I looked around on the Internet and found nothing so I decided to write a test from scratch to find out what is going on.
The plan was to have a similar object created, serialized and de-serialized many times to figure out what the numbers look like. Serialization in java is notoriously slow so I expected protobuf to blow me away.
Here is the sample java object:
import java.io.Serializable;

// Plain bean serialized with standard java serialization.
public class Host implements Serializable {
    int id;
    String os;
    String name;
    String vendor;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getOs() {
        return os;
    }

    public void setOs(String os) {
        this.os = os;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getVendor() {
        return vendor;
    }

    public void setVendor(String vendor) {
        this.vendor = vendor;
    }
}
The equivalent proto file:
message Host {
    required int32 id = 1;
    required string name = 2;
    required string os = 3;
    required string vendor = 4;
}
After compiling this proto file, I quickly wrote this test:
// Assumption: random() in the test is Math.random(), pulled in by a static import.
import static java.lang.Math.random;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Assumption: StopWatch is Spring's, judging by the output format below.
import org.springframework.util.StopWatch;

public class Main {

    public static void main(String[] args) throws Exception {
        File tempDir = new File(System.getProperty("java.io.tmpdir"));

        // Protocol buffers: build, write and read back 1000 messages.
        StopWatch sw = new StopWatch("Proto Buffer");
        List<Test.Host> protoHosts = new ArrayList<Test.Host>();
        sw.start("Constructing 1000 objects");
        for (int x = 0; x < 1000; x++) {
            protoHosts.add(getRandomTestHost());
        }
        sw.stop();

        sw.start("Serialization");
        int x = 1;
        for (Test.Host host : protoHosts) {
            File file = new File(tempDir, "tempFile" + x++);
            FileOutputStream str = new FileOutputStream(file);
            // serialize
            host.writeTo(str);
            str.flush();
            str.close();
        }
        sw.stop();

        // Measure the files now, before the second run overwrites them.
        long payLoadProto = getPayload();

        sw.start("Deserialization");
        x = 1;
        for (Test.Host host : protoHosts) {
            File file = new File(tempDir, "tempFile" + x++);
            InputStream is = new FileInputStream(file);
            Test.Host prHost = Test.Host.parseFrom(is);
            is.close();
            assert prHost.getVendor().equals(host.getVendor());
        }
        sw.stop();

        // Conventional java serialization: same steps, same file names.
        StopWatch sw2 = new StopWatch("Conventional Serialization");
        List<Host> hosts = new ArrayList<Host>();
        sw2.start("Constructing 1000 objects");
        for (x = 0; x < 1000; x++) {
            hosts.add(getRandomHost());
        }
        sw2.stop();

        sw2.start("Serialization");
        x = 1;
        for (Host host : hosts) {
            File file = new File(tempDir, "tempFile" + x++);
            ObjectOutputStream os = new ObjectOutputStream(
                    new BufferedOutputStream(new FileOutputStream(file)));
            os.writeObject(host);
            os.flush();
            os.close();
        }
        sw2.stop();

        long payloadConventional = getPayload();

        sw2.start("Deserialization");
        x = 1;
        for (Host host : hosts) {
            File file = new File(tempDir, "tempFile" + x++);
            ObjectInputStream is = new ObjectInputStream(new FileInputStream(file));
            Host prHost = (Host) is.readObject();
            is.close();
            assert prHost.getVendor().equals(host.getVendor());
        }
        sw2.stop();

        System.out.println(sw.prettyPrint());
        System.out.println(sw2.prettyPrint());
        System.out.println("payload proto buffer : " + payLoadProto);
        System.out.println("payload conventional : " + payloadConventional);
    }

    // Sums the sizes, in bytes, of tempFile1 .. tempFile1000.
    public static long getPayload() {
        File tempDir = new File(System.getProperty("java.io.tmpdir"));
        long payload = 0;
        for (int x = 1; x <= 1000; x++) {
            // The loop header already increments x; a second x++ here
            // would skip every other file.
            File file = new File(tempDir, "tempFile" + x);
            payload += file.length();
        }
        return payload;
    }

    public static Test.Host getRandomTestHost() {
        int someInt = 3423;
        Test.Host host = Test.Host.newBuilder().setId(someInt).setName(random() + "")
                .setOs(random() + "").setVendor(random() + "").build();
        return host;
    }

    public static Host getRandomHost() {
        Host host = new Host();
        host.setId(3423);
        host.setName(random() + "");
        host.setOs(random() + "");
        host.setVendor(random() + "");
        return host;
    }
}
And here is the result:
StopWatch 'Proto Buffer': running time (millis) = 1531
-----------------------------------------
ms     %     Task name
-----------------------------------------
00140  009%  Constructing 1000 objects
01110  073%  Serialization
00281  018%  Deserialization

StopWatch 'Conventional Serialization': running time (millis) = 1578
-----------------------------------------
ms     %     Task name
-----------------------------------------
00015  001%  Constructing 1000 objects
01188  075%  Serialization
00375  024%  Deserialization

payload proto buffer : 31917
payload conventional : 74917
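The timings above include the cost of creating and writing 1000 small files. As a cross-check, here is a minimal sketch, using only the JDK, of the same serialize/de-serialize round trip done in memory. SimpleHost and InMemoryRoundTrip are hypothetical names made up for this illustration; they are not part of the test above, and the sketch covers java serialization only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the Host bean, reduced to its four fields.
class SimpleHost implements Serializable {
    private static final long serialVersionUID = 1L;
    int id;
    String os;
    String name;
    String vendor;
}

class InMemoryRoundTrip {

    // Serialize one object into a byte array instead of a file.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(obj);
        out.close();
        return bytes.toByteArray();
    }

    // Read a single object back from the byte array.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        SimpleHost host = new SimpleHost();
        host.id = 3423;
        host.os = Math.random() + "";
        host.name = Math.random() + "";
        host.vendor = Math.random() + "";

        int rounds = 1000;
        long totalBytes = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            byte[] data = serialize(host);
            totalBytes += data.length;
            SimpleHost copy = (SimpleHost) deserialize(data);
            if (!copy.vendor.equals(host.vendor)) {
                throw new IllegalStateException("round trip mismatch");
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println(rounds + " round trips: " + elapsedMs + " ms, "
                + totalBytes + " bytes in total");
    }
}
```

Running it a few times and ignoring the first run gives a rough idea of how much of the measured time is serialization itself rather than disk access.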
So here are the conclusions:
- Serialization and de-serialization times are not that different. Protocol buffers are slightly faster (1110 ms vs 1188 ms to serialize, 281 ms vs 375 ms to de-serialize)
- Object creation in protocol buffers is very slow (140 ms vs 15 ms for 1000 objects)
- The payload generated by protocol buffers is excellent - serialized java objects are more than twice the size (74917 vs 31917 bytes) due to associated metadata
- The programming model of protocol buffers is extremely clumsy - I don’t understand what the builder business is all about.
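To make the payload point concrete, here is a rough sketch that compares the two sizes for a single object. It does not call the protobuf library: the protobuf size is computed by hand from the wire format (one tag byte per field for field numbers below 16, varint-encoded integers and lengths, UTF-8 string bytes). PayloadHost and PayloadSketch are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the Host bean.
class PayloadHost implements Serializable {
    private static final long serialVersionUID = 1L;
    int id;
    String os;
    String name;
    String vendor;
}

class PayloadSketch {

    // Number of bytes a value occupies as a base-128 varint.
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    // One tag byte + length varint + UTF-8 bytes.
    static int stringFieldSize(String s) {
        int len = s.getBytes(StandardCharsets.UTF_8).length;
        return 1 + varintSize(len) + len;
    }

    // Hand-computed wire size of the Host message from the .proto file.
    static int protoSize(int id, String name, String os, String vendor) {
        // int32 values are sign-extended to 64 bits before varint encoding.
        return 1 + varintSize((long) id)
                + stringFieldSize(name)
                + stringFieldSize(os)
                + stringFieldSize(vendor);
    }

    // Size of the same data written with standard java serialization.
    static int javaSize(PayloadHost host) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(host);
        out.close();
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        PayloadHost host = new PayloadHost();
        host.id = 3423;
        host.name = Math.random() + "";
        host.os = Math.random() + "";
        host.vendor = Math.random() + "";

        System.out.println("protobuf (hand-computed): "
                + protoSize(host.id, host.name, host.os, host.vendor) + " bytes");
        System.out.println("java serialization      : " + javaSize(host) + " bytes");
    }
}
```

The java stream carries the class name, field names and type descriptors along with the values, while the protobuf encoding carries only field numbers and values, which is where the size difference comes from.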
In the end I have decided to stick to java serialization mostly because we have an established object model and replicating that information in .proto files and generating a parallel object model does not seem right. I hope this helps any person trying to evaluate protocol buffers for their own project.

Jasper Siepkes:
Nice comparison!
A quick note, however, about the programming model of protocol buffers, which you call “extremely clumsy - I don’t understand what the builder business is all about.” The builder pattern is not specific to Protocol Buffers. The pattern itself is described in “Effective Java” by Joshua Bloch. I won’t go into details, but it’s a bit like the composite pattern and has some advantages over using constructors and/or setter methods.
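As a rough illustration of the pattern, here is a minimal hand-written sketch. It is not the code that protoc generates; HostInfo and BuilderDemo are hypothetical names, and the required-field check is only in the same spirit as what a generated builder does. The object is immutable once built, and the builder collects and validates the values first.

```java
// Immutable value object: no setters, all fields final.
final class HostInfo {
    private final int id;
    private final String name;
    private final String os;
    private final String vendor;

    private HostInfo(Builder b) {
        this.id = b.id;
        this.name = b.name;
        this.os = b.os;
        this.vendor = b.vendor;
    }

    static Builder newBuilder() {
        return new Builder();
    }

    int getId() { return id; }
    String getName() { return name; }
    String getOs() { return os; }
    String getVendor() { return vendor; }

    // Mutable helper that gathers values and hands them over in build().
    static final class Builder {
        private int id;
        private String name;
        private String os;
        private String vendor;

        Builder setId(int id) { this.id = id; return this; }
        Builder setName(String name) { this.name = name; return this; }
        Builder setOs(String os) { this.os = os; return this; }
        Builder setVendor(String vendor) { this.vendor = vendor; return this; }

        HostInfo build() {
            // Validate once, so no half-initialized HostInfo can exist.
            if (name == null || os == null || vendor == null) {
                throw new IllegalStateException("missing required field");
            }
            return new HostInfo(this);
        }
    }
}

class BuilderDemo {
    public static void main(String[] args) {
        HostInfo host = HostInfo.newBuilder()
                .setId(3423)
                .setName("web01")
                .setOs("linux")
                .setVendor("acme")
                .build();
        System.out.println(host.getName() + " runs " + host.getOs());
    }
}
```

Compared with a bean full of setters, the built object can be shared between threads without copying, and a missing field is caught at build() rather than later at the point of use.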
May 27, 2009, 5:02 am