Indexed Attributes API using Lucene
-
- Gephi Community Manager
- Posts:964
- Joined:09 Dec 2009 14:41
This is the thread for asking more details about the Indexed Attributes API using Lucene proposal.
-
- Gephi Plugin Developer
- Posts:3
- Joined:30 Mar 2011 22:32
Re: Indexed Attributes API using Lucene
Hi,
I'm interested in working on the Lucene proposal over the summer and have implemented a very simple proof of concept. Anyone interested can pull the code from https://code.launchpad.net/~eaneiros/gephi/lucene1. It is not meant to showcase design or architecture, just to give an idea of which path to follow and a feel for how the integration between Lucene and the Attributes API might work.
To test the feature, follow the steps below:
1 - Download the branch, open it in NetBeans, compile and run.
2 - Download the test case attached to this post and open it. I use this one because the node columns contain text data such as country, name, programming language, etc., which is useful for testing Lucene.
3 - In the Data Laboratory Node view, you will see an "Index" button to the left of the filter textbox. Click it and Lucene will index all the nodes.
4 - You are now ready to use Lucene in Gephi! Enter your Lucene queries in the filter textbox, hit Enter and see the results appear in the data table.
5 - To reset the data table, clear the textbox and press Enter.
Here are some interesting queries that you can try:
language:lisp - All developers that use some flavor of Lisp
location:"United States" AND language:Ruby - All developers from the United States that use Ruby
location:United AND (language:Ruby OR language:Javascript) - All developers from either the United States or the United Kingdom that use either Ruby or Javascript
Note: since this is such a primitive implementation, if you run into problems check stdout first for clues. If the problem persists, use the only true way of fixing bugs: restart Gephi and pray, or change your query.
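For anyone wondering why location:United matches both countries, here is a minimal sketch of how a field-scoped term query can be answered from an inverted index. It is not Lucene and not the code in the branch: the class ToyFieldIndex and all of its methods are made up for illustration, and the "analyzer" here simply lowercases and splits on whitespace, which is an assumption that simplifies what a real Lucene analyzer does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for a field-scoped index. NOT Lucene; every name here is hypothetical.
final class ToyFieldIndex {
    // field name -> lowercased token -> ids of the nodes containing that token
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Index one attribute value of one node, splitting the text on whitespace.
    void add(int nodeId, String field, String text) {
        Map<String, Set<Integer>> byToken = postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                byToken.computeIfAbsent(token, t -> new HashSet<>()).add(nodeId);
            }
        }
    }

    // Nodes whose field contains the given single token (the field:token case).
    Set<Integer> term(String field, String token) {
        Map<String, Set<Integer>> byToken = postings.getOrDefault(field, Map.of());
        Set<Integer> hits = byToken.getOrDefault(token.toLowerCase(Locale.ROOT), Set.of());
        return new TreeSet<>(hits);
    }

    // AND is set intersection.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new TreeSet<>(a);
        out.retainAll(b);
        return out;
    }

    // OR is set union.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.add(1, "location", "United States");
        index.add(1, "language", "Ruby");
        index.add(2, "location", "United Kingdom");
        index.add(2, "language", "Javascript");
        index.add(3, "location", "Spain");
        index.add(3, "language", "Ruby");

        // location:United AND (language:Ruby OR language:Javascript)
        Set<Integer> hits = and(index.term("location", "united"),
                or(index.term("language", "ruby"), index.term("language", "javascript")));
        System.out.println(hits);
    }
}
```

With the three made-up nodes above, running main prints [1, 2]: the token "united" is stored for both "United States" and "United Kingdom", while the node from Spain is dropped by the AND.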

- Attachments
-
- github-profiles.gdf
- Test case for Lucene proof of concept
- (2.42MiB)Downloaded 945 times
- eduramiba
- Gephi Code Manager
- Posts:1064
- Joined:22 Mar 2010 15:30
- Location:Madrid, Spain
Re: Indexed Attributes API using Lucene
Hi eaneiros,
Well, this is a nice start as a proof of concept; you have already made changes to the Gephi code.
The real implementation of indexing should be done in the Attributes API, and it should be flexible enough to work either as it does now or with indexing.
For the future implementation, some key features that I consider important are:
- Ability to choose normal/indexed attributes from the start when opening a graph file. This is important for loading graphs with very large amounts of data that can't all be stored in memory.
- An API for other modules, such as the Data Laboratory, to enable/disable the index, change some behaviour (which columns to store, for example), perform a search...
- Possible use to improve ranking/partition/filters.
- An easy way to build queries would be nice.
Also remember that the previous specification draft may be useful to you: http://wiki.gephi.org/index.php/Core_ev ... API_Future
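To make the "work like it does now or with indexing" idea a bit more concrete, here is a rough sketch of a store that gives the same search results whether the index is on or off. Everything in it is hypothetical: SwitchableAttributeStore, enableIndex, disableIndex and search are invented names, not part of the Attributes API, and the index is a plain in-memory map rather than Lucene. It also keeps all data in memory, so it says nothing about the large-graph case.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names exist in Gephi's Attributes API.
final class SwitchableAttributeStore {
    // nodeId -> column -> raw value
    private final Map<Integer, Map<String, String>> rows = new HashMap<>();
    // column -> token -> nodeIds; only filled for indexed columns
    private final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();
    private final Set<String> indexedColumns = new HashSet<>();
    private boolean indexEnabled = false;
    private boolean dirty = false;

    void put(int nodeId, String column, String value) {
        rows.computeIfAbsent(nodeId, id -> new HashMap<>()).put(column, value);
        dirty = true; // the index is rebuilt lazily on the next indexed search
    }

    // Turn indexing on for the given columns (the "which columns to store" choice).
    void enableIndex(String... columns) {
        indexedColumns.clear();
        indexedColumns.addAll(Arrays.asList(columns));
        indexEnabled = true;
        dirty = true;
    }

    void disableIndex() {
        indexEnabled = false;
        index.clear();
    }

    // Node ids whose column contains the token; same answer with or without the index.
    Set<Integer> search(String column, String token) {
        String needle = token.toLowerCase(Locale.ROOT);
        if (indexEnabled && indexedColumns.contains(column)) {
            if (dirty) {
                rebuild();
            }
            Map<String, Set<Integer>> byToken = index.getOrDefault(column, Map.of());
            Set<Integer> hits = byToken.getOrDefault(needle, Set.of());
            return new TreeSet<>(hits);
        }
        // No index for this column: fall back to scanning every row.
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, Map<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null && tokens(value).contains(needle)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Rebuild the whole index from the stored rows.
    private void rebuild() {
        index.clear();
        for (Map.Entry<Integer, Map<String, String>> row : rows.entrySet()) {
            for (String column : indexedColumns) {
                String value = row.getValue().get(column);
                if (value == null) {
                    continue;
                }
                for (String token : tokens(value)) {
                    index.computeIfAbsent(column, c -> new HashMap<>())
                         .computeIfAbsent(token, t -> new HashSet<>())
                         .add(row.getKey());
                }
            }
        }
        dirty = false;
    }

    private static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }
}
```

A caller such as the Data Laboratory would only ever call search; whether a column is indexed is a separate switch, so existing behaviour stays the default. Rebuilding the whole index after any change is only acceptable in a sketch like this one.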
-
- Gephi Plugin Developer
- Posts:3
- Joined:30 Mar 2011 22:32
Re: Indexed Attributes API using Lucene
Hi Eduardo,
Definitely, those requirements are an absolute must; I had already included some of them in my draft. I read and analyzed the previous proposal and found some really valuable ideas. I'm building mine with a different concept in mind because I think flexibility and ease of use are the two main goals the API must achieve. Performance will come later, and if the design is right it shouldn't be a problem.
I'm putting the final touches on my proposal and will submit it soon! Thanks for the fast reply and best regards,
ernesto.