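Taking the free-form input file as a TSV file, and without knowing the semantics of all the columns, here is one way to write your query. Please note that I made the assumptions stated in the comments.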
@d =
EXTRACT path string,
user string,
num1 int,
num2 int,
start_date string,
end_date string,
flag string,
year int,
s string,
another_date string
FROM @"\users\temp\citypaths.txt"
USING Extractors.Tsv(encoding: Encoding.Unicode);
// I assume that you have only one DateTime format culture in your file.
// If it becomes dependent on the region or city as expressed in the path, you need to add a lookup.
@d =
SELECT new SqlArray<string>(path.Split('\\')) AS steps,
DateTime.Parse(end_date, new CultureInfo("fr-FR", false)).Date.ToString("yyyy-MM-dd") AS end_date
FROM @d;
// This assumes your paths have a fixed formatting/mapping into the city
@d =
SELECT steps[4].ToLowerInvariant() AS city,
end_date
FROM @d;
@res =
SELECT city,
end_date,
COUNT(*) AS count
FROM @d
GROUP BY city,
end_date;
OUTPUT @res
TO "/output/result.csv"
USING Outputters.Csv();
// Now let's pivot the date and count.
@res2 =
SELECT city, MAP_AGG(end_date, count) AS date_count
FROM @res
GROUP BY city;
// This assumes you know exactly which dates you are looking for. Otherwise keep it in the first file representation.
@res2 =
SELECT city,
date_count["2016-11-21"] AS [2016-11-21],
date_count["2016-11-22"] AS [2016-11-22]
FROM @res2;
OUTPUT @res2
TO "/output/res2.csv"
USING Outputters.Csv();
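If the date format does depend on the city, the lookup mentioned in the comment above could look roughly like the following. This is only a sketch: the rowset name @extracted (standing for the result of the EXTRACT statement), the @cultures mapping, and the culture names assigned to each city are assumptions made for illustration, not something taken from your data.

```usql
// ASSUMPTION: @extracted is the rowset produced by the EXTRACT statement,
// with path and end_date still as raw strings.
// ASSUMPTION: hypothetical mapping from city to the culture its dates are written in.
@cultures =
    SELECT *
    FROM (VALUES
            ("istanbul", "tr-TR"),
            ("belfast", "en-GB"),
            ("amsterdam", "nl-NL")
         ) AS T(city, culture);

// Derive the city first, so the culture can be looked up before the date is parsed.
@raw =
    SELECT path.Split('\\')[4].ToLowerInvariant() AS city,
           end_date
    FROM @extracted;

// Parse each date with the culture registered for its city.
// The INNER JOIN drops rows whose city has no entry in @cultures.
@parsed =
    SELECT r.city,
           DateTime.Parse(r.end_date, new CultureInfo(c.culture, false)).Date.ToString("yyyy-MM-dd") AS end_date
    FROM @raw AS r
         INNER JOIN @cultures AS c
         ON r.city == c.city;
```

The grouping and pivoting steps would then read from @parsed instead of @d.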
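UPDATE after receiving some example data in private email: Based on the data you sent me (after the extraction and counting of the cities, which you can do either with the join outlined in Bob's answer, where you need to know your cities in advance, or by taking the string from the city position in the path as in my example, where you do not need to know the cities in advance), you want to pivot the rowset city, count, date into the rowset date, city1, city2, ... where each row contains the date and the counts for each city.
You can easily adjust my example above by changing the calculation of @res2 in the following way: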
// Now let's pivot the city and count.
@res2 = SELECT end_date, MAP_AGG(city, count) AS city_count
FROM @res
GROUP BY end_date;
// This assumes you know exactly which cities you are looking for. Otherwise keep it in the first file representation or use a script generation (see below).
@res2 =
SELECT end_date,
city_count["istanbul"] AS istanbul,
city_count["midlands"] AS midlands,
city_count["belfast"] AS belfast,
city_count["acoustics"] AS acoustics,
city_count["amsterdam"] AS amsterdam
FROM @res2;
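The pivoted rowset can then be written out in the same way as the first result. A minimal sketch, assuming a hypothetical output path; outputHeader: true writes the column names as the first row, so that each count column can be matched to its city.

```usql
// ASSUMPTION: the output path is illustrative only.
OUTPUT @res2
TO "/output/pivot_by_date.csv"
ORDER BY end_date
USING Outputters.Csv(outputHeader: true);
```

Because end_date was formatted as yyyy-MM-dd, ordering by the string also orders the rows chronologically.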
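Note that, as in my example, you need to enumerate all the cities in the pivot statement by looking them up in the SQL.MAP column. If the cities are not known in advance, you will first have to submit a script that creates the script for you. For example, assuming your city, count, date rowset is in a file (or you could duplicate the statements that produce the rowset in both the generating script and the generated script), you could write it as the following script. Then submit its result as the actual processing script.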
// Get the rowset (could also be the actual calculation from the original file)
@in = EXTRACT city string, count int?, date string
FROM "/users/temp/Revit_Last2Months_Results.tsv"
USING Extractors.Tsv();
// Generate the statements for the preparation of the data before the pivot
@stmts = SELECT * FROM (VALUES
("@s1", "EXTRACT city string, count int?, date string FROM \"/users/temp/Revit_Last2Months_Results.tsv\" USING Extractors.Tsv();"),
("@s2", "SELECT date, MAP_AGG(city, count) AS city_count FROM @s1 GROUP BY date;")
) AS T(stmt_name, stmt);
// Now generate the statement doing the pivot
@cities = SELECT DISTINCT city FROM @in;
@pivots =
SELECT "@s3" AS stmt_name, "SELECT date, "+String.Join(", ", ARRAY_AGG("city_count[\""+city+"\"] AS ["+city+"]"))+ " FROM @s2;" AS stmt
FROM @cities;
// Now generate the OUTPUT statement after the pivot. Note that the OUTPUT does not have a statement name.
@output =
SELECT "OUTPUT @s3 TO \"/output/pivot_gen.tsv\" USING Outputters.Tsv();" AS stmt
FROM (VALUES(1)) AS T(x);
// Now put the statements into one rowset. Note that nulls sort high in U-SQL.
@result =
SELECT stmt_name, "=" AS assign, stmt FROM @stmts
UNION ALL SELECT stmt_name, "=" AS assign, stmt FROM @pivots
UNION ALL SELECT (string) null AS stmt_name, (string) null AS assign, stmt FROM @output;
// Now output the statements in order of the stmt_name
OUTPUT @result
TO "/pivot.usql"
ORDER BY stmt_name
USING Outputters.Text(delimiter:' ', quoting:false);
Now download the generated script and submit it.
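For illustration, if the input contained only the cities belfast and istanbul (a made-up case), the pivot and output lines of the generated /pivot.usql should look roughly like this. The order of the cities inside the statement is not guaranteed, since ARRAY_AGG does not promise an ordering, and the exact leading whitespace on the OUTPUT line depends on how the outputter writes the two null columns.

```usql
@s3 = SELECT date, city_count["belfast"] AS [belfast], city_count["istanbul"] AS [istanbul] FROM @s2;
  OUTPUT @s3 TO "/output/pivot_gen.tsv" USING Outputters.Tsv();
```

The @s1 and @s2 lines come before these, because ORDER BY stmt_name places the row with the null statement name last.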
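Thank you so much wBob, you really made my work easy. I was just googling to find some way to do this. Bob, one more thing: if you have looked at my output link, you must have seen 2 fields, "Location" and "Date", which means the number of files by date and location. How can that also be added to the solution you provided above? Please advise. Once again, thank you so much for replying to my post so quickly :-) –
Great, you should consider posting that as an answer! – wBob
Where is the date? It is not clear from your sample data. Is it in the file name, or do you need to collect it from the file itself? – wBob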