2016-11-22 62 views

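Titan indexing issues with Cassandra storage backend

To test its query performance, I am populating a single Titan 1.0.0 instance with a moderately sized graph. I am using Cassandra 2.0.17 as the storage backend.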

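The problem is that I have not been able to create vertex indexes, and therefore I cannot get optimal query performance. I have read the docs and I am trying to follow them carefully, but without much success. I use the following Groovy script for schema definition, data population and index creation: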

import com.thinkaurelius.titan.core.*; 
import com.thinkaurelius.titan.core.schema.*; 
import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem; 
import java.time.temporal.ChronoUnit;
import org.apache.tinkerpop.gremlin.structure.Vertex;

graph = TitanFactory.open('conf/my-titan.properties'); 
mgmt = graph.openManagement(); 

// Build graph schema 
//  Node properties 
//  (each key is created under the same name that is checked, and make() actually creates it)
idProp = mgmt.containsPropertyKey('userId') ?
    mgmt.getPropertyKey('userId') : mgmt.makePropertyKey('userId').dataType(String.class).cardinality(Cardinality.SINGLE).make();
isPublicProp = mgmt.containsPropertyKey('isPublic') ?
    mgmt.getPropertyKey('isPublic') : mgmt.makePropertyKey('isPublic').dataType(Boolean.class).cardinality(Cardinality.SINGLE).make();
completionPercentageProp = mgmt.containsPropertyKey('completionPercentage') ?
    mgmt.getPropertyKey('completionPercentage') : mgmt.makePropertyKey('completionPercentage').dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
genderProp = mgmt.containsPropertyKey('gender') ?
    mgmt.getPropertyKey('gender') : mgmt.makePropertyKey('gender').dataType(String.class).cardinality(Cardinality.SINGLE).make();
regionProp = mgmt.containsPropertyKey('region') ?
    mgmt.getPropertyKey('region') : mgmt.makePropertyKey('region').dataType(String.class).cardinality(Cardinality.SINGLE).make();
lastLoginProp = mgmt.containsPropertyKey('lastLogin') ?
    mgmt.getPropertyKey('lastLogin') : mgmt.makePropertyKey('lastLogin').dataType(String.class).cardinality(Cardinality.SINGLE).make();
registrationProp = mgmt.containsPropertyKey('registration') ?
    mgmt.getPropertyKey('registration') : mgmt.makePropertyKey('registration').dataType(String.class).cardinality(Cardinality.SINGLE).make();
ageProp = mgmt.containsPropertyKey('age') ?
    mgmt.getPropertyKey('age') : mgmt.makePropertyKey('age').dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
mgmt.commit(); 

nUsers = 0 
println 'Starting nodes population...'; 
// Load users 
new File('/home/jarandaf/soc-pokec-profiles.txt').eachLine { 
    try {
        fields = it.split('\t').take(8);
        userId = fields[0];
        isPublic = fields[1] == '1';
        // completionPercentage is declared as Integer in the schema, so convert it
        completionPercentage = fields[2] as int;
        gender = fields[3] == '1' ? 'male' : 'female';
        region = fields[4];
        lastLogin = fields[5];
        registration = fields[6];
        age = fields[7] as int;
        graph.addVertex('userId', userId, 'isPublic', isPublic, 'completionPercentage', completionPercentage, 'gender', gender, 'region', region, 'lastLogin', lastLogin, 'registration', registration, 'age', age);
    } catch (Exception e) {
        // Silently skip lines that fail to parse or insert
    }
    nUsers += 1 
    if (nUsers % 100000 == 0) println String.valueOf(nUsers) + ' loaded...'; 
}; 
graph.tx().commit(); 
println 'Nodes population finished'; 

// Index users by userId, gender and age 
println 'Getting node properties...'; 
mgmt = graph.openManagement(); 
userId = mgmt.getPropertyKey('userId'); 
gender = mgmt.getPropertyKey('gender'); 
age = mgmt.getPropertyKey('age'); 

println 'Building byUserId index...'; 
if (mgmt.getGraphIndex('byUserId') == null) mgmt.buildIndex('byUserId', Vertex.class).addKey(userId).buildCompositeIndex(); 
println 'Building byGender index...'; 
if (mgmt.getGraphIndex('byGender') == null) mgmt.buildIndex('byGender', Vertex.class).addKey(gender).buildCompositeIndex(); 
println 'Building byAge index...'; 
if (mgmt.getGraphIndex('byAge') == null) mgmt.buildIndex('byAge', Vertex.class).addKey(age).buildCompositeIndex(); 
mgmt.commit(); 

// Wait for the indexes to become available 
println 'Awaiting byUserId graph index status...'; 
ManagementSystem.awaitGraphIndexStatus(graph, 'byUserId') 
    .status(SchemaStatus.REGISTERED) 
    .timeout(10, ChronoUnit.MINUTES) 
    .call(); 
println 'Awaiting byGender graph index status...'; 
ManagementSystem.awaitGraphIndexStatus(graph, 'byGender') 
    .status(SchemaStatus.REGISTERED) 
    .timeout(10, ChronoUnit.MINUTES) 
    .call(); 

println 'Awaiting byAge graph index status...'; 
ManagementSystem.awaitGraphIndexStatus(graph, 'byAge') 
    .status(SchemaStatus.REGISTERED) 
    .timeout(10, ChronoUnit.MINUTES) 
    .call(); 

// Reindex the existing data 
mgmt = graph.openManagement(); 
println 'Reindexing data by byUserId index...'; 
mgmt.updateIndex(mgmt.getGraphIndex('byUserId'), SchemaAction.REINDEX).get(); 
println 'Reindexing data by byGender index...'; 
mgmt.updateIndex(mgmt.getGraphIndex('byGender'), SchemaAction.REINDEX).get(); 
println 'Reindexing data by byAge index...'; 
mgmt.updateIndex(mgmt.getGraphIndex('byAge'), SchemaAction.REINDEX).get(); 
mgmt.commit(); 

// Wait until the reindexed indexes report ENABLED
// (awaitGraphIndexStatus is a static method of ManagementSystem)
println 'Enabling byUserId index...'
ManagementSystem.awaitGraphIndexStatus(graph, 'byUserId').status(SchemaStatus.ENABLED).call();
println 'Enabling byGender index...'
ManagementSystem.awaitGraphIndexStatus(graph, 'byGender').status(SchemaStatus.ENABLED).call();
println 'Enabling byAge index...'
ManagementSystem.awaitGraphIndexStatus(graph, 'byAge').status(SchemaStatus.ENABLED).call();

graph.close(); 

I get the following error, which is related to the reindexing phase:

08:24:26 ERROR com.thinkaurelius.titan.graphdb.database.management.ManagementLogger - Evicted [[email protected]] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardtitantx[0x4b8696a4], standardtitantx[0x2d39f30a], standardtitantx[0x0da9172d], standardtitantx[0x7c6c7909], standardtitantx[0x79dd0a38], standardtitantx[0x5999c49e], standardtitantx[0x5aaba4a7]] 
08:24:26 ERROR com.thinkaurelius.titan.graphdb.database.management.ManagementLogger - Evicted [[email protected]] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardtitantx[0x4b8696a4], standardtitantx[0x2d39f30a], standardtitantx[0x0da9172d], standardtitantx[0x7c6c7909], standardtitantx[0x79dd0a38], standardtitantx[0x5999c49e], standardtitantx[0x5aaba4a7]] 
08:24:26 ERROR com.thinkaurelius.titan.graphdb.database.management.ManagementLogger - Evicted [[email protected]] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardtitantx[0x4b8696a4], standardtitantx[0x2d39f30a], standardtitantx[0x0da9172d], standardtitantx[0x7c6c7909], standardtitantx[0x79dd0a38], standardtitantx[0x5999c49e], standardtitantx[0x5aaba4a7]] 

Any hints on this would be much appreciated.

Answers


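The error you are getting indicates that there are open transactions while you are trying to modify the schema. Titan has to wait for all transactions to complete before it can modify the schema. For more information, see the answer from Matthias Broecheler on the mailing list.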
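As a rough sketch of how one might clear out whatever is holding transactions open before touching the schema: the snippet below rolls back the current thread's transaction and lists the Titan instances registered with the cluster, so that stale ones (for example, left over from a crashed console session) can be identified. It assumes the same conf/my-titan.properties file as in the question. The instance id in the commented-out call is a placeholder, not a real id.

```groovy
import com.thinkaurelius.titan.core.TitanFactory

graph = TitanFactory.open('conf/my-titan.properties')

// Roll back any transaction this thread may still have open
graph.tx().rollback()

mgmt = graph.openManagement()

// Print every instance currently registered with the cluster;
// the local instance is marked with a "(current)" suffix
mgmt.getOpenInstances().each { println it }

// Assumption: replace the placeholder with a stale id printed above.
// Do not close the instance marked "(current)".
// mgmt.forceCloseInstance('<stale-instance-id>')

mgmt.commit()
graph.close()
```

Only force-close instances that you are sure are no longer running, since closing a live instance can leave it with an inconsistent view of the schema.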

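In general, you should avoid reindexing if possible, because it requires Titan to walk over all vertices to check whether they need to be added to the index being updated. The documentation contains more information about this process.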

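For your use case, you can simply create all indexes before loading any data. When you add data after all indexes are ready, each element is added to the indexes as it is written. That way, you should be able to use the indexes immediately.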

A minimal example of schema creation in Groovy (it should be basically the same in Java):

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.Cardinality;
import org.apache.tinkerpop.gremlin.structure.Vertex;

graph = TitanFactory.open('conf/my-titan.properties')

mgmt = graph.openManagement()

// property key that will be indexed; make() actually creates the key
id = mgmt.makePropertyKey('id').dataType(String.class).cardinality(Cardinality.SINGLE).make()

// some other properties that will not be indexed
mgmt.makePropertyKey('isPublic').dataType(Boolean.class).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey('completionPercentage').dataType(Integer.class).cardinality(Cardinality.SINGLE).make()

// I prefer to use vertex labels to differentiate between different 'types' of vertices but this isn't necessary
user = mgmt.makeVertexLabel('User').make()

// composite index on id, restricted to vertices labeled User
mgmt.buildIndex('UserById', Vertex.class).addKey(id).indexOnly(user).buildCompositeIndex()

mgmt.commit()

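For simplicity, I removed all the checks for schema elements that already exist, but you can of course add them again. After the schema has been created, you can add the data as before.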
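To make the ordering concrete, here is a small end-to-end sketch of "index first, data second". It is not the script from the question: it uses Titan's in-memory backend purely so that it can be tried without Cassandra (an assumption of this example), and it indexes only the question's userId property with an index named byUserId.

```groovy
import com.thinkaurelius.titan.core.TitanFactory
import com.thinkaurelius.titan.core.Cardinality
import org.apache.tinkerpop.gremlin.structure.Vertex

// Assumption: in-memory backend, used only so the sketch runs without Cassandra
graph = TitanFactory.build().set('storage.backend', 'inmemory').open()

// 1. Define the property key and its index in one management transaction
mgmt = graph.openManagement()
userId = mgmt.makePropertyKey('userId').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.buildIndex('byUserId', Vertex.class).addKey(userId).buildCompositeIndex()
mgmt.commit()

// 2. Load data afterwards; new vertices are indexed as they are written
graph.addVertex('userId', '1')
graph.addVertex('userId', '2')
graph.tx().commit()

// 3. An equality lookup on userId, which a composite index can answer
g = graph.traversal()
println g.V().has('userId', '1').count().next()

graph.close()
```

With the real Cassandra-backed graph, only the line that opens the graph should need to change, and no reindexing step is involved.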

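A final note on index management: try to always define the property keys you want to index in the same transaction in which the index is created. Otherwise, Titan cannot know whether there is already data that needs to be added to the new index, which again requires a full scan over all data. This may require choosing different names for properties. For example, when you add a new vertex label, you may want to use a new name such as postId instead of reusing the property id, to avoid a scan over all existing data.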
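One way to check whether an index needs a reindex at all is to look at its status after the schema transaction commits. The sketch below creates a new key and its index together and then prints the status. The names postId and byPostId are only illustrative, and the in-memory backend is again an assumption made so the snippet can run on its own. When key and index are created in the same management transaction, the status is expected to be ENABLED right away.

```groovy
import com.thinkaurelius.titan.core.TitanFactory
import org.apache.tinkerpop.gremlin.structure.Vertex

// Assumption: in-memory backend, used only for illustration
graph = TitanFactory.build().set('storage.backend', 'inmemory').open()

// Key and index are created in the same management transaction
mgmt = graph.openManagement()
postId = mgmt.makePropertyKey('postId').dataType(String.class).make()
mgmt.buildIndex('byPostId', Vertex.class).addKey(postId).buildCompositeIndex()
mgmt.commit()

// Inspect the index status in a fresh management transaction
mgmt = graph.openManagement()
index = mgmt.getGraphIndex('byPostId')
println index.getIndexStatus(mgmt.getPropertyKey('postId'))
mgmt.rollback()

graph.close()
```

If the same check reports INSTALLED or REGISTERED instead, the index was built on a key that already existed, and it has to go through the reindexing procedure before queries can use it.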


Hi Florian, thanks for your answer. For some reason, I cannot access the mailing list thread. – jarandaf


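Sorry, the link was broken. It should work now. By the way, questions about index management in Titan come up quite frequently, so you should find many similar questions on the mailing list. –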


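Could you provide a minimal example of schema definition and index creation before the data population phase (for the sake of completeness)? :-) – jarandaf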