Gephi forums

Re: Dynamic attributes and statistics

2010-04-21T12:27:32+01:00

I have posted the answer on the comment section of my proposal.

Have a nice day:)

Statistics:Posted by Cristina Grigoruta — 21 Apr 2010 12:27

Re: Dynamic attributes and statistics

2010-04-20T14:17:39+01:00

Sure, I agree with this approach

Statistics:Posted by mbastian — 20 Apr 2010 14:17

Re: Dynamic attributes and statistics

2010-04-20T01:12:00+01:00

With regarding your comment on my proposal, (link to your proposal), I would like to describe the API by listing all the functions, their uses, and a high level view of how they would be implemented. In this way I can present all the methods that other developers can utilize my segment of code. What do you think?

Statistics:Posted by Cristina Grigoruta — 20 Apr 2010 01:12

Re: Dynamic attributes and statistics

2010-04-02T16:48:55+01:00

Yes, right!

Yes with intervals instead of points, several slices could match. It's nice to be able to get sum, average and these kind of functions for numerical indeed.

Of course this idea is connected with UI development. Is it something that should be a part of my proposal?

Yes, add all your reflexions in your proposal. Reviewing what other softwares propose could be also useful.

Statistics:Posted by mbastian — 02 Apr 2010 16:48

Re: Dynamic attributes and statistics

2010-04-01T14:44:45+01:00

mbastian wrote:The problem is that the search query is a unique point, not an interval.

Hmm, so I was mistaken here:

Step 1. We should look for 78 in our PR-Tree because we are interested in attributes values from upper bound of the time interval (tell me if I am mistaken).

I just read the article http://www.cmu.edu/joss/content/article ... McFarland/ and noticed interesting sentences:

[...] Thick slices have start and end times that define an interval, and include all events that either occur within the interval or whose duration intersect the interval (see figure 4c).9 Thick slices query the data similar to the way a questionnaire might ask about ongoing or past relationships. For example, "show me all the loans among firms that took place during 1994.” It is not a picture of a network at a point in time but rather a period in time, a collection of data within a certain range, as in the bin of a histogram. Thick slices are a way of discretizing (pseudo-) continuous data, and thin slices are a way to sample continuous or real-valued versions of discrete-interval data.

A slice is a “bin” that contains a set of events in time, but for most network operations, the time information is not used and the network data must be collapsed to form a matrix or arc-list. Depending on the duration or "thickness" of the slice, it is possible that there is more than one arc between a given pair of nodes (or there may be multiple kinds of ties) so it is important to consider how these ties will be aggregated. For most variables, common operations like sum, maximum, count, and average can be used. (Non-numeric categorical attributes will need more sophisticated treatment in the future.)

This is something that should be done. All things considered Interval Tree is necessary in fact. I can implement this instead of PR-Tree and in the step in which we get intervals that overlap the interval we look for, I should do something with "ties". I think that for numbers a potential user should have got a possibility to choose what to do with them (sum, maximum, count and average as described in the article). For non-numeric attributes he should choose between something like the first occurrence, the last occurrence etc.

The idea is to combine dynamic filtering and metrics. The aim is to compute metrics, like "Betweenness Centrality" for the time-intervals. When topology is dynamic (nodes, edges with start/end dates), one can filter the graph and iterate over all intervals. Metrics framework would be extended to allow this.

So, this is "only" querying The Dynamic API to get graphs for time intervals and computing metrics of such graphs. What about efficiency? Sometimes it could take much time to compute metrics of a single "snapshot"...

EDIT: OK, I know what you would like to get :P It should be only possibility to compute metrics of a single snapshot, not computing them automatically while changing time intervals...

EDIT 2: I've been thinking about this:

Propose idea for dynamic visualization of attributes (color, size, label color, label size, edge weight).

It's a bit challenging, because it depends on abstract sense of attributes very much. Taking this into consideration I think a potential user should decide what kind of visualization should be used for each attribute. For example:

Let's say that some graph's nodes contain an attribute like priority. A potential user could choose between several possibilities like: nothing (it means this attribute shouldn't be visualized), color (for min and max value, it could be combined with "ties" problem, i.e. in case of average this color could be gradient or something like that), stroke thickness etc.

Of course this idea is connected with UI development. Is it something that should be a part of my proposal?

Statistics:Posted by cezar_1 — 01 Apr 2010 14:44

Re: Dynamic attributes and statistics

2010-04-01T13:06:12+01:00

Hi,

Yes the data structure for dynamic attributes is something close from this. I also found this paper when I was doing research. The problem is that the search query is a unique point, not an interval. What about Interval tree ?

Does it mean that we should be able to query Metrics framework in order to get metrics for some time interval? Is it sufficient to add additional methods to its API?

The idea is to combine dynamic filtering and metrics. The aim is to compute metrics, like "Betweeness Centrality" for the time-intervals. When topology is dynamic (nodes, edges with start/end dates), one can filter the graph and iterate over all intervals. Metrics framework would be extended to allow this.

Statistics:Posted by mbastian — 01 Apr 2010 13:06

Re: Dynamic attributes and statistics

2010-03-31T20:59:25+01:00

Hi,

I would like to know what do you think about my idea concerning a data structure hosting dynamic attributes. I'll try to explain it briefly.

First of all I would like to use the modified PR-Tree data structure described here: http://moezchaabouni.com/Documents/PR%20Tree.pdf. The modified version is necessary due to a possibility of overlapping between time intervals.

I'll show you what I would like to do with this structure using a simple example:

Let's say we have got this graph:

CODE:

            Gexf.net        A Web network changing over time                                                        true

We can identify a few time intervals here:
[03.10, 03.20], [03.01, 03.10], [03.01, 03.20] and [01.01, 03.01].
Let's transform them to:
A = [69, 79], B = [60, 69], C = [60, 79], D = [0, 60].

It's the PR-Tree for these intervals (an original one, not modified due to simplification):

Each time interval should contain a vector of "nodes". I wrote nodes, but I mean containers of "attributes" associated with concrete nodes and time intervals. Such attributes should have got only ids and values (to decrease space usage). In our example we have got:

A: (1,2,1)
B: (1,2,2)
C: (2,2,1), (3,2,1)
D: (2,2,0), (3,2,0)

in the format of X: (node_id, atrr_id, atrr_value).

Thanks to the vector structure we can access "nodes" with constant time cost.

Now we want to return attributes values array for a given node (2 for instance) and time interval (let's say 0, 78):

Step 1. We should look for 78 in our PR-Tree because we are interested in attributes values from upper bound of the time interval (tell me if I am mistaken). We can see that the only time intervals which contain 78 are A and C.
Step 2. Now we can easily get the values we are looking for. It's only (2,2,1) from C interval.

If we want to return static graph copy for given time interval, it's even easier. We can build such graph from intervals we get in the step 2.

What do you think about it?

And another question:

Adapt Metrics framework to use Dynamic API to propose dynamic versions of existing metrics.

Does it mean that we should be able to query Metrics framework in order to get metrics for some time interval? Is it sufficient to add additional methods to its API?

Statistics:Posted by cezar_1 — 31 Mar 2010 20:59

Dynamic attributes and statistics

2010-03-22T09:41:40+01:00

This is the thread for asking more details about the Dynamic Attributes and Statistics proposal.

Statistics:Posted by admin — 22 Mar 2010 09:41