Dynamic attributes and statistics
-
- Gephi Community Manager
- Posts:964
- Joined:09 Dec 2009 14:41 [phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable
This is the thread for asking more details about the Dynamic Attributes and Statistics proposal.
Re: Dynamic attributes and statistics
Hi,
I would like to know what do you think about my idea concerning a data structure hosting dynamic attributes. I'll try to explain it briefly.
First of all I would like to use the modified PR-Tree data structure described here: http://moezchaabouni.com/Documents/PR%20Tree.pdf. The modified version is necessary due to a possibility of overlapping between time intervals.
I'll show you what I would like to do with this structure using a simple example:
Let's say we have got this graph:
We can identify a few time intervals here:
[03.10, 03.20], [03.01, 03.10], [03.01, 03.20] and [01.01, 03.01].
Let's transform them to:
A = [69, 79], B = [60, 69], C = [60, 79], D = [0, 60].
It's the PR-Tree for these intervals (an original one, not modified due to simplification):

Each time interval should contain a vector of "nodes". I wrote nodes, but I mean containers of "attributes" associated with concrete nodes and time intervals. Such attributes should have got only ids and values (to decrease space usage). In our example we have got:
A: (1,2,1)
B: (1,2,2)
C: (2,2,1), (3,2,1)
D: (2,2,0), (3,2,0)
in the format of X: (node_id, atrr_id, atrr_value).
Thanks to the vector structure we can access "nodes" with constant time cost.
Now we want to return attributes values array for a given node (2 for instance) and time interval (let's say 0, 78):
Step 1. We should look for 78 in our PR-Tree because we are interested in attributes values from upper bound of the time interval (tell me if I am mistaken). We can see that the only time intervals which contain 78 are A and C.
Step 2. Now we can easily get the values we are looking for. It's only (2,2,1) from C interval.
If we want to return static graph copy for given time interval, it's even easier. We can build such graph from intervals we get in the step 2.
What do you think about it?
And another question:
I would like to know what do you think about my idea concerning a data structure hosting dynamic attributes. I'll try to explain it briefly.
First of all I would like to use the modified PR-Tree data structure described here: http://moezchaabouni.com/Documents/PR%20Tree.pdf. The modified version is necessary due to a possibility of overlapping between time intervals.
I'll show you what I would like to do with this structure using a simple example:
Let's say we have got this graph:
Code: Select all
<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gephi.org/gexf/1.1draft" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.gephi.org/gexf/1.1draft http://gephi.org/gexf/1.1draft.xsd" version="1.1">
<meta lastmodifieddate="2009-03-20">
<creator>Gexf.net</creator>
<description>A Web network changing over time</description>
</meta>
<graph mode="dynamic" defaultedgetype="directed" start="2009-01-01" end="2009-03-20">
<attributes class="node" mode="static">
<attribute id="0" title="url" type="string"/>
<attribute id="1" title="frog" type="boolean">
<default>true</default>
</attribute>
<attributes class="node" mode="dynamic">
<attribute id="2" title="indegree" type="float"/>
</attributes>
<nodes>
<node id="0" label="Gephi" start="2009-03-01">
<attvalues>
<attvalue for="0" value="http://gephi.org"/>
<attvalue for="2" value="1"/>
</attvalues>
</node>
<node id="1" label="Webatlas">
<attvalues>
<attvalue for="0" value="http://webatlas.fr"/>
<attvalue for="2" value="1" end="2009-03-01"/>
<attvalue for="2" value="2" start="2009-03-01" end="2009-03-10"/>
<attvalue for="2" value="1" start="2009-03-10"/>
</attvalues>
</node>
<node id="2" label="RTGI">
<attvalues>
<attvalue for="0" value="http://rtgi.fr"/>
<attvalue for="2" value="0" end="2009-03-01"/>
<attvalue for="2" value="1" start="2009-03-01"/>
</attvalues>
<slices>
<slice end="2009-03-01">
<slice start="2009-03-05" end="2009-03-10">
</slices>
</node>
<node id="2" label="BarabasiLab">
<attvalues>
<attvalue for="0" value="http://barabasilab.com"/>
<attvalue for="1" value="false"/>
<attvalue for="2" value="0" end="2009-03-01"/>
<attvalue for="2" value="1" start="2009-03-01"/>
</attvalues>
</node>
</nodes>
<edges>
<edge id="0" source="0" target="1" start="2009-03-01"/>
<edge id="1" source="0" target="2" start="2009-03-01" end="2009-03-10"/>
<edge id="2" source="1" target="0" start="2009-03-01"/>
<edge id="3" source="2" target="1" end="2009-03-10"/>
<edge id="4" source="0" target="3" start="2009-03-01"/>
</edges>
</graph>
</gexf>
[03.10, 03.20], [03.01, 03.10], [03.01, 03.20] and [01.01, 03.01].
Let's transform them to:
A = [69, 79], B = [60, 69], C = [60, 79], D = [0, 60].
It's the PR-Tree for these intervals (an original one, not modified due to simplification):

Each time interval should contain a vector of "nodes". I wrote nodes, but I mean containers of "attributes" associated with concrete nodes and time intervals. Such attributes should have got only ids and values (to decrease space usage). In our example we have got:
A: (1,2,1)
B: (1,2,2)
C: (2,2,1), (3,2,1)
D: (2,2,0), (3,2,0)
in the format of X: (node_id, atrr_id, atrr_value).
Thanks to the vector structure we can access "nodes" with constant time cost.
Now we want to return attributes values array for a given node (2 for instance) and time interval (let's say 0, 78):
Step 1. We should look for 78 in our PR-Tree because we are interested in attributes values from upper bound of the time interval (tell me if I am mistaken). We can see that the only time intervals which contain 78 are A and C.
Step 2. Now we can easily get the values we are looking for. It's only (2,2,1) from C interval.
If we want to return static graph copy for given time interval, it's even easier. We can build such graph from intervals we get in the step 2.
What do you think about it?
And another question:
Does it mean that we should be able to query Metrics framework in order to get metrics for some time interval? Is it sufficient to add additional methods to its API?Adapt Metrics framework to use Dynamic API to propose dynamic versions of existing metrics.
- mbastian
- Gephi Architect
- Posts:728
- Joined:10 Dec 2009 10:11
- Location:San Francisco, CA [phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable
Re: Dynamic attributes and statistics
Hi,
Yes the data structure for dynamic attributes is something close from this. I also found this paper when I was doing research. The problem is that the search query is a unique point, not an interval. What about Interval tree ?
Yes the data structure for dynamic attributes is something close from this. I also found this paper when I was doing research. The problem is that the search query is a unique point, not an interval. What about Interval tree ?
The idea is to combine dynamic filtering and metrics. The aim is to compute metrics, like "Betweeness Centrality" for the time-intervals. When topology is dynamic (nodes, edges with start/end dates), one can filter the graph and iterate over all intervals. Metrics framework would be extended to allow this.Does it mean that we should be able to query Metrics framework in order to get metrics for some time interval? Is it sufficient to add additional methods to its API?
Re: Dynamic attributes and statistics
Hmm, so I was mistaken here:mbastian wrote: The problem is that the search query is a unique point, not an interval.
I just read the article http://www.cmu.edu/joss/content/article ... McFarland/ and noticed interesting sentences:Step 1. We should look for 78 in our PR-Tree because we are interested in attributes values from upper bound of the time interval (tell me if I am mistaken).
This is something that should be done. All things considered Interval Tree is necessary in fact. I can implement this instead of PR-Tree and in the step in which we get intervals that overlap the interval we look for, I should do something with "ties". I think that for numbers a potential user should have got a possibility to choose what to do with them (sum, maximum, count and average as described in the article). For non-numeric attributes he should choose between something like the first occurrence, the last occurrence etc.[...] Thick slices have start and end times that define an interval, and include all events that either occur within the interval or whose duration intersect the interval (see figure 4c).9 Thick slices query the data similar to the way a questionnaire might ask about ongoing or past relationships. For example, "show me all the loans among firms that took place during 1994.” It is not a picture of a network at a point in time but rather a period in time, a collection of data within a certain range, as in the bin of a histogram. Thick slices are a way of discretizing (pseudo-) continuous data, and thin slices are a way to sample continuous or real-valued versions of discrete-interval data.
A slice is a “bin” that contains a set of events in time, but for most network operations, the time information is not used and the network data must be collapsed to form a matrix or arc-list. Depending on the duration or "thickness" of the slice, it is possible that there is more than one arc between a given pair of nodes (or there may be multiple kinds of ties) so it is important to consider how these ties will be aggregated. For most variables, common operations like sum, maximum, count, and average can be used. (Non-numeric categorical attributes will need more sophisticated treatment in the future.)
So, this is "only" querying The Dynamic API to get graphs for time intervals and computing metrics of such graphs. What about efficiency? Sometimes it could take much time to compute metrics of a single "snapshot"...The idea is to combine dynamic filtering and metrics. The aim is to compute metrics, like "Betweenness Centrality" for the time-intervals. When topology is dynamic (nodes, edges with start/end dates), one can filter the graph and iterate over all intervals. Metrics framework would be extended to allow this.
EDIT: OK, I know what you would like to get :P It should be only possibility to compute metrics of a single snapshot, not computing them automatically while changing time intervals...
EDIT 2: I've been thinking about this:
It's a bit challenging, because it depends on abstract sense of attributes very much. Taking this into consideration I think a potential user should decide what kind of visualization should be used for each attribute. For example:Propose idea for dynamic visualization of attributes (color, size, label color, label size, edge weight).
Let's say that some graph's nodes contain an attribute like priority. A potential user could choose between several possibilities like: nothing (it means this attribute shouldn't be visualized), color (for min and max value, it could be combined with "ties" problem, i.e. in case of average this color could be gradient or something like that), stroke thickness etc.
Of course this idea is connected with UI development. Is it something that should be a part of my proposal?
- mbastian
- Gephi Architect
- Posts:728
- Joined:10 Dec 2009 10:11
- Location:San Francisco, CA [phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable
Re: Dynamic attributes and statistics
Yes, right!
Yes with intervals instead of points, several slices could match. It's nice to be able to get sum, average and these kind of functions for numerical indeed.
Yes with intervals instead of points, several slices could match. It's nice to be able to get sum, average and these kind of functions for numerical indeed.
Yes, add all your reflexions in your proposal. Reviewing what other softwares propose could be also useful.Of course this idea is connected with UI development. Is it something that should be a part of my proposal?
-
- Posts:3
- Joined:08 Apr 2010 19:19 [phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable
Re: Dynamic attributes and statistics
With regarding your comment on my proposal, (link to your proposal), I would like to describe the API by listing all the functions, their uses, and a high level view of how they would be implemented. In this way I can present all the methods that other developers can utilize my segment of code. What do you think?
- mbastian
- Gephi Architect
- Posts:728
- Joined:10 Dec 2009 10:11
- Location:San Francisco, CA [phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable
Re: Dynamic attributes and statistics
Sure, I agree with this approach
-
- Posts:3
- Joined:08 Apr 2010 19:19 [phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable
Re: Dynamic attributes and statistics
I have posted the answer on the comment section of my proposal.
Have a nice day:)
Have a nice day:)