A format for Graph Streaming
Posted: 08 Jun 2010 08:55
Hi all,
André (GSoC student for Graph Streaming) and I have started a discussion about how to format graphs for streaming. Help is more than welcome, as this is a difficult question.
The Graph Streaming project aims to stream data in and out of Gephi, the ideal use case being two Gephi instances synchronizing over the network. This raises many questions, and we think it also concerns the future of the GEXF format, since one of its goals is to meet the needs of dynamic networks.
The question is simple: what format should we use to stream graphs over a network? Graph streaming is not only about pushing data, but also about updates and deletions. We therefore face both a synchronization problem and a serialization problem.
Some of the global aims we identified for such a format:
- The format should support graph topology and attributes
- It must have an event model, with ADD, DELETE and UPDATE as event types
- It could have additional events, like CLEAR, to avoid sending millions of individual deletes
- The format should be compact and minimize network transfer
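To make the event model concrete, here is a minimal sketch of how a receiving instance might replay such events. It is only an illustration: the class names (`EventType`, `GraphEvent`, `StreamedGraph`) and fields are hypothetical, not an existing Gephi API. It also assumes that deleting a node cascades to its incident edges, and that CLEAR empties the whole graph in a single event.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional, Tuple


class EventType(Enum):
    ADD = "add"
    DELETE = "delete"
    UPDATE = "update"
    CLEAR = "clear"  # hypothetical extra event: wipes the whole graph at once


@dataclass
class GraphEvent:
    type: EventType
    kind: Optional[str] = None        # "node" or "edge"; unused for CLEAR
    element_id: Optional[str] = None
    attributes: Dict[str, Any] = field(default_factory=dict)
    source: Optional[str] = None      # edges only
    target: Optional[str] = None      # edges only


class StreamedGraph:
    """Receiving side (hypothetical): replays events in arrival order."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: Dict[str, Dict[str, Any]] = {}
        self.endpoints: Dict[str, Tuple[str, str]] = {}

    def apply(self, ev: GraphEvent) -> None:
        if ev.type is EventType.CLEAR:
            # One event instead of millions of individual deletes.
            self.nodes.clear()
            self.edges.clear()
            self.endpoints.clear()
            return
        store = self.nodes if ev.kind == "node" else self.edges
        if ev.type is EventType.ADD:
            store[ev.element_id] = dict(ev.attributes)
            if ev.kind == "edge":
                self.endpoints[ev.element_id] = (ev.source, ev.target)
        elif ev.type is EventType.UPDATE:
            # Merge changed attributes; updates for unknown elements are ignored.
            if ev.element_id in store:
                store[ev.element_id].update(ev.attributes)
        elif ev.type is EventType.DELETE:
            store.pop(ev.element_id, None)
            if ev.kind == "edge":
                self.endpoints.pop(ev.element_id, None)
            else:
                # Assumption: deleting a node also deletes its incident edges.
                incident = [e for e, (s, t) in self.endpoints.items()
                            if ev.element_id in (s, t)]
                for e in incident:
                    self.edges.pop(e, None)
                    del self.endpoints[e]
```

Cascading node deletes on the receiving side keeps the sender from having to emit one DELETE per incident edge. The trade-off is that both ends must agree on that rule.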
Regarding serialization, we think we could propose a JSON-based format. The idea is not to change GEXF, but to propose a new format inspired by GEXF with different aims. JSON would considerably reduce message size and fits the "network world" better than XML. Do you agree, and how do you think this could be done? Please share your experience with JSON.
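As a rough illustration of what a compact JSON event could look like, the sketch below encodes one event per line. The key layout is purely hypothetical and nothing has been decided. It uses a two-letter key for event type plus element kind (e.g. `an` = add node, `ue` = update edge) wrapping `{id: attributes}`. The XML string is only a GEXF-like fragment for comparison, not a complete GEXF document.

```python
import json
from typing import Any, Dict, Tuple


def encode_event(event_type: str, kind: str, element_id: str,
                 attrs: Dict[str, Any]) -> str:
    """Hypothetical compact encoding: {"<t><k>": {id: attributes}}."""
    # "add"/"delete"/"update" -> a/d/u, "node"/"edge" -> n/e
    key = event_type[0] + kind[0]
    # separators without spaces keep each message as small as json allows
    return json.dumps({key: {element_id: attrs}}, separators=(",", ":"))


def decode_event(line: str) -> Tuple[str, str, Dict[str, Any]]:
    msg = json.loads(line)
    (key, payload), = msg.items()           # exactly one event per message
    (element_id, attrs), = payload.items()  # exactly one element per event
    return key, element_id, attrs


# GEXF-like XML fragment carrying the same information, for size comparison
xml = ('<node id="A" label="Streaming Node A"><attvalues>'
       '<attvalue for="size" value="2"/></attvalues></node>')
line = encode_event("add", "node", "A", {"label": "Streaming Node A", "size": 2})
print(line, len(line), len(xml))
```

Sending one JSON object per line keeps the stream easy to parse incrementally, since a receiver can decode each event as soon as its line arrives.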
Feel free to comment on synchronization issues as well. Read the wiki page and imagine possible use cases. For instance, if several Gephi instances synchronize, how do we handle versioning and keep the data consistent and up to date everywhere? Do you know of other projects or articles that could help identify the issues?
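One classic way to approach versioning between several instances is to stamp every change with a Lamport clock and resolve conflicts with "last writer wins". The sketch below shows that technique under stated assumptions. It is not something Gephi does today, and the `Replica` class is hypothetical. Each instance has a unique id used to break ties, and a deletion could be sent as a write of `None` (a tombstone) so that it also participates in conflict resolution.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

Stamp = Tuple[int, str]  # (Lamport counter, instance id); id breaks ties


@dataclass
class Replica:
    """Hypothetical synchronized Gephi instance holding element values."""
    instance_id: str
    clock: int = 0
    values: Dict[str, Any] = field(default_factory=dict)
    stamps: Dict[str, Stamp] = field(default_factory=dict)

    def local_update(self, element_id: str, value: Any):
        # Tick the clock for every local change and return the message to broadcast.
        self.clock += 1
        stamp = (self.clock, self.instance_id)
        self._store(element_id, value, stamp)
        return element_id, value, stamp

    def receive(self, msg) -> None:
        element_id, value, stamp = msg
        # Lamport rule: move past any timestamp we have seen.
        self.clock = max(self.clock, stamp[0]) + 1
        self._store(element_id, value, stamp)

    def _store(self, element_id: str, value: Any, stamp: Stamp) -> None:
        # Last writer wins: keep the write with the greatest stamp, so all
        # replicas converge regardless of the order messages arrive in.
        if element_id not in self.stamps or stamp > self.stamps[element_id]:
            self.values[element_id] = value
            self.stamps[element_id] = stamp


# Two instances edit node "A" concurrently, then exchange their messages.
a, b = Replica("a"), Replica("b")
msg_a = a.local_update("A", {"size": 1})
msg_b = b.local_update("A", {"size": 5})
a.receive(msg_b)
b.receive(msg_a)
```

Both instances end up with the same value for node "A". The price of this simple rule is that one of two concurrent edits is silently discarded, which may or may not be acceptable for collaborative graph editing.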