如何在Erlang中有效地讀取STDIN中的數千行代碼？

從STDIN讀取數千行時，我偶然發現了一個問題。這將是一個假想的邊緣情況，直到我發現this problem的一些測試需要讀取STDIN中的數千行。起初我認爲我的算法不是最優的，只是偶然發現只有沒有任何計算的讀線才能使測試的一半時間結束。如何在Erlang中有效地讀取STDIN中的數千行代碼？

下面是部分代碼超時：

process_queries(0, _) -> ok; 
process_queries(N, A) -> 
    case io:fread("", "~s~d~d") of 
     {ok, _} -> process_queries(N - 1, A) 
     %{ok, ["Q", L, R]} -> process_queries(N - 1, apply_q(L, R, A)); 
     %{ok, ["U", Idx, Val]} -> process_queries(N - 1, apply_u(Idx, Val, A)) 
    end 
.

我故意留下評論表明，所有的計算被禁用。所以這個代碼超時給出N=7984。

有沒有更好的方式來讀取和處理Erlang中的數千行STDIN？

io:get_line一次只得到一行。
io:get_chars要求你知道要獲得多少個字符。

來源

2017-09-29 m3nthal

我懷疑在Erlang問題的實際世界中不存在的hackerrank相關問題會激發很多回應。如果您在這裏沒有找到答案，請嘗試IRC或真實世界的問題。 – zxq9

我建議將stdio切換爲二進制，然後使用io:get_line。通過分割空白並將兩個值轉換爲整數，您的數據格式非常易於解析。下面的代碼在簡單的基準測試中比我的代碼運行速度快了10倍。我使用escript進行基準測試，這意味着它很可能差不多是10次以上，因爲escript在運行時解析和編譯代碼。

process_queries_2(0, _) -> ok; 
process_queries_2(N, A) -> 
    Line = io:get_line(""), 
    [X, Y0, Z0, _] = binary:split(Line, [<<$\s>>, <<$\n>>], [global]), 
    Y = binary_to_integer(Y0), 
    Z = binary_to_integer(Z0), 
    % use X, Y, Z 
    process_queries_2(N - 1, A).

下面是我用的基準代碼：

main(["1"]) -> 
    ok = io:setopts(standard_io, [binary]), 
    process_queries(10000, {}); 
main(["2"]) -> 
    ok = io:setopts(standard_io, [binary]), 
    process_queries_2(10000, {}).%

$ time yes 'Q 100 200' | escript a.erl 1 
yes 'Q 100 200' 4.64s user 0.11s system 93% cpu 5.063 total 
escript a.erl 1 4.67s user 1.44s system 120% cpu 5.062 total 
$ time yes 'Q 100 200' | escript a.erl 2 
yes 'Q 100 200' 0.36s user 0.01s system 77% cpu 0.466 total 
escript a.erl 2 0.40s user 0.10s system 106% cpu 0.464 total

的原因加速是Erlang的字符串是鏈表，這是非常低效既爲CPU時間和內存與二進制文件相比，這是一個連續的內存塊。

來源

2017-09-29 20:00:55 Dogbert

從我的解決方案摘錄。有幾個技巧如何做到真正有效。

read_command(CP) -> 
    {ok, Line} = file:read_line(standard_io), 
    [C, A, B] = binary:split(Line, CP, [global, trim_all]), 
    {case C of <<"Q">> -> 'Q'; <<"U">> -> 'U' end, 
    binary_to_integer(A), 
    binary_to_integer(B)}. 

read_commands(N, CP) -> 
    [ read_command(CP) || _ <- lists:seq(1, N) ]. 

execute(Array, L) -> 
    lists:foldl(fun({'Q', F, T}, A) -> 
         {Val, A2} = query(A, F, T), 
         file:write(standard_io, [integer_to_binary(Val), $\n]), 
         A2; 
        ({'U', I, V}, A) -> 
         update(A, I, V) 
       end, Array, L). 

read_int_line(CP) -> 
    {ok, Line} = file:read_line(standard_io), 
    [binary_to_integer(X) || X <- binary:split(Line, CP, [global, trim_all])]. 

main() -> 
    ok = io:setopts([binary]), 
    CP = binary:compile_pattern([<<" ">>, <<$\n>>]), 
    [N] = read_int_line(CP), 
    L = read_int_line(CP), 
    N = length(L), 
    [K] = read_int_line(CP), 
    execute(init(L), read_commands(K, CP)).

你必須寫自己init/1，update/3當然query/3。

來源

2017-09-29 20:13:54

謝謝，我有幾個問題。爲什麼需要'init/1'功能？它看起來像整個數組已經在'L'中。 – m3nthal

@ m3nthal：因爲我使用不同的內部表示。 'L'是一個列表，而不是一個數組。它具有不良的訪問特徵。而且我還使用數字的不同表示法以及更快的最小公倍數計算。 –

我做了一些研究，發現訪問和隨機更新時，字典和地圖會更快。你如何表示數字？在二進制格式？也許你也使用二進制gcd算法？ – m3nthal

如何在Erlang中有效地讀取STDIN中的數千行代碼？

回答

相關問題