您的數據集可以視爲可能斷開連接的有向圖。在我看來,你想爲每個弱連接的子圖設置節點。自己寫這個並不難:
- 我們將該圖視爲無向圖。
- 我們將邊緣存儲在散列中,以便條目
$edge{$a}{$b}
是從頂點$a
到$b
的有向邊。
- 現在我們需要的是一個迭代搜索,在我們去的時候刪除所有訪問的邊。
示例代碼:
use strict; use warnings; use feature qw/say/;
# build the graph
my %edge;
while (<>) {
my ($from, $to) = split;
$edge{$from}{$to} = $edge{$to}{$from} = undef;
}
while (my ($start) = keys %edge) {
my @seen = ($start);
my @stack = ($start);
while (@stack) {
my $vertex = pop @stack;
# delete edges from and to this vertex
# NB: any connections to seen vertices are already removed.
my @reachable = keys %{ delete($edge{$vertex}) // {} };
delete $edge{$_}{$vertex} for @reachable;
# mark new vertices as seen, and enqueue them
push @seen, @reachable;
push @stack, @reachable;
}
my $nodes = join ', ', sort @seen;
say "node set: {$nodes}";
}
輸出爲數據:
node set: {B, F, Q, R}
node set: {D, K, P}
node set: {A, C, E, I, S, Y}
該算法已相當最佳的,並運行在Ö(ñ·ķ)時間和空間(其中k是鄰居的平均數量)。
當然,已經有一個模塊實現圖算法。不言而喻地,它被稱爲Graph
。上面的代碼等同於:
use strict; use warnings; use feature qw/say/;
use Graph;
my $graph = Graph::Undirected->new;
while (<>) {
my ($from, $to) = split;
$graph->add_edge($from, $to);
}
for my $nodes_array ($graph->connected_components) {
my $nodes = join ', ', sort @$nodes_array;
say "node set: {$nodes}";
}
有可能在更短的存儲器和可能的更短的時間通過跟蹤所連接的部件的,因爲我們建立圖以進行這些計算。爲此,我們有一個將頂點映射到子圖的散列。
- 如果一個邊的兩個頂點都是未知的,他們將創建一個新的子圖。
- 如果只有一個頂點已知,則另一個節點映射到第一個節點的子圖,並在那裏作爲成員列出。
- 如果兩個頂點是已知的,那麼
- 如果它們指向同一個子圖,什麼都不會發生。
- 如果它們指向不同的子圖,則所有條目都更新爲指向相同的子圖,該子圖現在包含先前子圖的組合節點。
代碼:
use strict; use warnings; use feature qw/say/;
my %subgraph_by_id;
my %subgraph_by_vertex;
while(<>) {
my ($x, $y) = split;
# case 1:
# If an both vertices of an edge are unknown, they create a new subgraph.
if (not exists $subgraph_by_vertex{$x} and not exists $subgraph_by_vertex{$y}) {
my $new = [$x, $y];
$subgraph_by_id{0+ $new} = $new;
$subgraph_by_vertex{$_} = $new for $x, $y;
}
# case 2:
# If exactly one vertex is known, the other node maps to the subgraph of the
# first node, and is listed there as a member.
elsif (not exists $subgraph_by_vertex{$x} or not exists $subgraph_by_vertex{$y}) {
my ($known, $unknown) = (exists $subgraph_by_vertex{$x}) ? ($x, $y) : ($y, $x);
my $subgraph = $subgraph_by_vertex{$unknown} = $subgraph_by_vertex{$known};
push @$subgraph, $unknown;
}
# case 3:
# both vertices are known. If they point to different subgraphs, all entries
# are updated to point to the same subgraph which now contains the combined
# nodes of the previous subgraphs.
# Except all that copying would make for a horrible worst case.
# Instead, we just add a reference to the smaller list, flattening it later.
else {
my $prev_x = $subgraph_by_vertex{$x};
my $prev_y = $subgraph_by_vertex{$y};
# don't test for inequality directly to allow subgraph nesting
if ($subgraph_by_id{0+ $prev_x} != $subgraph_by_id{0+ $prev_y}) {
my ($new, $old) = (@$prev_x > @$prev_y) ? ($prev_x, $prev_y) : ($prev_y, $prev_x);
push @$new, $old;
# $old not needed on top level any longer – associate it with $new by id
$subgraph_by_id{0+ $old} = 0+ $new;
}
}
}
# skip symbolic IDs
for my $nodes_array (grep ref, values %subgraph_by_id) {
my $nodes = join ', ', flatten($nodes_array);
say "node set: {$nodes}";
}
sub flatten {
return map { ref $_ ? flatten($_) : $_ } @{ shift() };
}
這僅使用Ø(ñ)的空間和時間,採用了很多尷尬的招數。在構建子圖的過程中,我不合並兩個相關的子圖,而是推遲到最後。否則,一個邊緣情況(自下而上建立的平衡樹 - 對於每個非葉節點,樹的一半將被複制)可能需要指數時間 - 我沒有做完整的分析。 0+
「venus」僞操作符對其參數進行了編號,用於此處獲取數組引用的ID。
您能否使用您提到的邏輯來解釋您是如何到達集合'(ACEISY),(BFQR)和(DKP)'的? – Anand
對於第一組 - 在行1中,A與C相關聯。在行2中,A與S相關聯。在行6中,C與I相關聯。在行11中,I與Y相關聯。在行8中, E與Y相關聯,因此該集合是(ACEISY)。 – user2674514
無法在'AC AS CI IY EY => ACEISY'中看到關係 –