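Matching forecast values to actuals

I receive daily files that forecast values for given categories. Where FileDate = FcstDate, the value FcstVal is in fact the real actual value. Right now I'm using Excel Power Query (XL'16: Get & Transform) to easily pull a few dozen files together into a table resembling the one below (400k+ rows, 18 levels in reality).
I need to be able to say that on 1-1 and 1-2, category AA | AC | null was forecast at 70 and 44 respectively for 1-3, but the actual value was 43, and likewise for all other rows. Most, but not all, unique row combinations are common between files. Eventually I'll have to worry about handling renamed levels...
The Table.Partition, Table.FillUp and Table.FromPartitions Power Query functions express the logic perfectly, but Power Query is too slow because it seems to read each of the very large .xlsx files multiple times (+1 per row?!), since I need an index table containing all the distinct category levels & forecast dates to partition on.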
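Right now I'm reduced to using an Excel table with a formula like this: =SUMIFS([ActualVal], [Lvl1],[@[Lvl1]], [Lvl2],[@[Lvl2]], [Lvl3],[@[Lvl3]], [FileDt],[@[FcstDt]], [Eq],"Y")
However, this requires setting all blanks to "null", changing values that start with "=" or ">", etc., and it takes hours to calculate.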
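I've been trying to learn PowerPivot/DAX because I understand it can filter & calculate large datasets efficiently. I'm hoping for a solution that sets the "context" of a DAX calculation to reference the same row, the way an old-fashioned Excel formula would, & moves the value into the column I want - but I haven't figured it out.
I would greatly prefer a PowerPivot solution if possible, but if not, I can sometimes make sense of python/pandas. However, we're stuck with Excel inputs from a third-party vendor.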
Lvl1 | Lvl2 | Lvl3   | FileDt | FcstDt | Eq | FcstVal | ActualVal | Wanted!

1-1 file:
________________________________________________________________________
AA   | AB   | AD     | 1-1    | 1-1    | Y  | 100     | 100       | 100
AA   | AC   | AE     | 1-1    | 1-1    | Y  | 50      | 50        | 50
AA   | AB   | (null) | 1-1    | 1-2    |    | 110     |           | 105
AA   | AC   | (null) | 1-1    | 1-2    |    | (null)  |           | 45
AA   | AB   | (null) | 1-1    | 1-3    |    | 120     |           | 105
AA   | AC   | (null) | 1-1    | 1-3    |    | 70      |           | 43

1-2 file:
________________________________________________________________________
AA   | AB   | (null) | 1-2    | 1-2    | Y  | 105     | 105       | 105
AA   | AC   | (null) | 1-2    | 1-2    | Y  | 45      | 45        | 45
AA   | AB   | (null) | 1-2    | 1-3    |    | 113     |           | (null)
AA   | AC   | (null) | 1-2    | 1-3    |    | 44      |           | 43

1-3 file:
________________________________________________________________________
(missing row AA|AB!) | 1-3    | 1-3    | Y  | (null)  | (null)    | (null)
AA   | AC   | (null) | 1-3    | 1-3    | Y  | 43      | 43        | 43
AA   | AB   | (null) | 1-3    | 1-4    |    | 108     |           | (null)
AA   | AC   | (null) | 1-3    | 1-4    |    | 42      |           | (null)
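To make the goal concrete, here is a minimal Python sketch (standard library only) of the lookup I'm after. It is not my actual solution, just the logic: the rows are a hand-copied subset of the sample table, and the tuple layout and function names are assumptions made for this illustration.

```python
# Tuple layout (an assumption for this sketch):
# (Lvl1, Lvl2, Lvl3, FileDt, FcstDt, FcstVal)
rows = [
    ("AA", "AB", None, "1-1", "1-2", 110),
    ("AA", "AC", None, "1-1", "1-3", 70),
    ("AA", "AB", None, "1-2", "1-2", 105),
    ("AA", "AC", None, "1-2", "1-3", 44),
    ("AA", "AC", None, "1-3", "1-3", 43),
    ("AA", "AC", None, "1-3", "1-4", 42),
]


def build_actuals(rows):
    """Map (Lvl1, Lvl2, Lvl3, date) -> actual, using rows where FileDt == FcstDt."""
    actuals = {}
    for lvl1, lvl2, lvl3, file_dt, fcst_dt, fcst_val in rows:
        if file_dt == fcst_dt:
            actuals[(lvl1, lvl2, lvl3, fcst_dt)] = fcst_val
    return actuals


def add_wanted(rows):
    """Append the matched actual (None if no file reports it) to every row."""
    actuals = build_actuals(rows)
    return [row + (actuals.get((row[0], row[1], row[2], row[4])),) for row in rows]


wanted = add_wanted(rows)
```

The point of the sketch is that one pass builds the lookup and a second pass reads from it, so nothing is scanned once per row. A null level is simply part of the key here, which would avoid the "set all blanks to null" workaround.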
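Edit:
I'll share my code, since some parts of it may be useful to others, and my problem may lie elsewhere.
My strategy is to load a set of workbooks based on a table in the open Excel workbook. I apply a simple function to extract the table I want from each workbook's contents, then apply another function that does as much processing as possible on the tables while they are still separate, thinking that multi-threading might be put to better use while they are still independent (is that right?).
This ends the first query: Workbooks. I'd rather stop here and do the rest in PowerPivot, if that is possible (with a final Table.Combine, if needed).
In Power Query, I have to combine the tables - twice. The first combination includes all fields; the second contains the distinct grouping fields across all tables (without the value or As-of Date fields). A single (i.e. the first) table can't be used, because grouping combinations may exist in later tables that aren't in the first, and vice versa. This distinct table gets an index.
I join the second to the first via Table.NestedJoin & extract only the index from the joined column. This lets me split the data into partitions that each hold only the same forecast date & group. Here I can fill down, because the Prep_Data_Table function pre-sorts the table in descending date order, so the actual value (if any) flows down to the rest of the same group and no further.
After that, I just recombine the tables.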
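The index, partition and fill-down steps described above can be sketched roughly as follows in plain Python. This is an illustration of the idea rather than a port of my M code: the row layout, the zero-padded month-day strings and the name `fill_actuals` are all assumptions.

```python
from collections import defaultdict


def fill_actuals(rows):
    """rows: (category tuple, period, as_of_date, actual, forecast).

    Dates are zero-padded strings so plain string comparison orders them.
    """
    # Step 1: index the distinct category combinations across all files.
    category_index = {}
    for category, *_ in rows:
        category_index.setdefault(category, len(category_index))

    # Step 2: partition by (category index, forecast period).
    partitions = defaultdict(list)
    for row in rows:
        partitions[(category_index[row[0]], row[1])].append(row)

    # Step 3: sort each partition by as-of date descending, then fill down,
    # so the actual (reported on the latest as-of date) reaches older rows.
    result = []
    for key in sorted(partitions):
        group = sorted(partitions[key], key=lambda r: r[2], reverse=True)
        last_actual = None
        for category, period, as_of, actual, forecast in group:
            if actual is not None:
                last_actual = actual
            result.append((category, period, as_of, last_actual, forecast))
    return result


sample = [
    (("AA", "AC", None), "01-03", "01-01", None, 70),
    (("AA", "AC", None), "01-03", "01-02", None, 44),
    (("AA", "AC", None), "01-03", "01-03", 43, 43),
    (("AA", "AB", None), "01-02", "01-01", None, 110),
    (("AA", "AB", None), "01-02", "01-02", 105, 105),
]
filled = fill_actuals(sample)
```

Because `last_actual` is reset for every partition, a value never leaks from one group into the next, which is the property I rely on the partitioning for.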
CODE:
FieldMetadata holds the data types & ordering information for the fields. Sources holds the path names & whether to load each specified file.
ImportParameters:
[
    ThisWB = Excel.CurrentWorkbook(),
    Sources = ThisWB{[Name="Sources"]}[Content],
    FieldMetadata = ThisWB{[Name="FieldMetadata"]}[Content],
    // {Field, Type} pairs for every field that has a type
    FieldTypes = Table.ToRows(GetCfg({"Type"})),
    // fields whose type is text are treated as the category fields
    CategoryFields = List.Buffer(List.Transform(List.Select(List.Transform(FieldTypes, each {List.First(_), TypeFromString(List.Last(_))}), each List.Last(_) = type text), each List.First(_))),
    CategoryFieldTypes = List.Buffer(List.Transform(FieldTypes, (_) => {List.First(_), TypeFromString(List.Last(_))}))
]
GetCfg:
let
    Cfg = (Columns as list) as table =>
        let
            // Builds filter text such as "[Type] <> null and [Grouping] <> null"
            FilterList = List.Transform(Columns, each "[" & _ & "] <> null"),
            ExpressionText = Text.Combine(FilterList, " and "),
            Source = Excel.CurrentWorkbook(){[Name="FieldMetadata"]}[Content],
            #"Changed Type" = Table.TransformColumnTypes(Source, {{"Field", type text}, {"Type", type text}, {"Grouping", Int32.Type}, {"Presentation", Int32.Type}}),
            Custom1 = Table.SelectColumns(#"Changed Type", List.Combine({{"Field"}, Columns})),
            #"Filtered Rows" = Table.SelectRows(Custom1, each Expression.Evaluate(ExpressionText, [_=_]))
            /* The above line is a bit of a mind bender. It lets me apply filters without hard-coding column names. Very useful.
               Credit to http://www.thebiccountant.com/2016/03/08/select-rows-that-have-no-empty-fields-using-expression-evaluate-in-power-bi-and-power-query/
            */
        in
            #"Filtered Rows"
in
    Cfg
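For readers who don't know M, the effect of GetCfg's filter is roughly the following Python sketch: keep only the rows in which every requested column is non-null, with the column names passed in as data. The sample metadata rows and the function name are made up for illustration.

```python
def select_rows_without_nulls(rows, columns):
    """Keep only the rows (dicts) in which every named column is not None."""
    return [
        row for row in rows
        if all(row.get(col) is not None for col in columns)
    ]


# Hypothetical stand-in for the FieldMetadata table.
field_metadata = [
    {"Field": "Lvl1", "Type": "Text", "Grouping": 1, "Presentation": 1},
    {"Field": "Forecast", "Type": "Currency", "Grouping": None, "Presentation": 5},
    {"Field": "Notes", "Type": None, "Grouping": None, "Presentation": None},
]

typed = select_rows_without_nulls(field_metadata, ["Type"])
grouped = select_rows_without_nulls(field_metadata, ["Type", "Grouping"])
```

In Python the column names can be looped over directly, so no expression has to be built as text and evaluated; the M version needs Expression.Evaluate only because the field references are otherwise written literally in the `each` expression.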
FieldSortOrder:
let
    SortOn = (SortOn as text) as list =>
        let
            Source = ImportParameters[FieldMetadata],
            #"Changed Type" = Table.TransformColumnTypes(Source, {{"Field", type text}, {"Grouping", type number}}),
            // Keep only the field name and the column to sort on
            SelectedSort = Table.SelectColumns(#"Changed Type", {"Field", SortOn}),
            RenamedSortColumn = Table.RenameColumns(SelectedSort, {{SortOn, "Sort"}}),
            NoNulls = Table.SelectRows(RenamedSortColumn, each ([Sort] <> null)),
            SortedFields = Table.Sort(NoNulls, {{"Sort", Order.Ascending}})[Field]
        in
            SortedFields
in
    SortOn
TypeFromString:
let
    Type = (TypeName as text) as type =>
        let
            TypeNameFix = if TypeName = "Table" then "_Table" else TypeName, // because Table is a reserved word
            TypR = [Any=Any.Type,
                    Binary=Binary.Type, // The whole list of types I could find.
                    ...
                    _Table=Table.Type,
                    ...
                    WebMethod=WebMethod.Type],
            TheType = try Record.Field(TypR, TypeNameFix) otherwise error [Reason="TypeName not found", Message="Parameter was not found among the list of types defined within the TypeFromString function."]
        in
            TheType
in
    Type
Extract_Data_Table:
let
    Source = (Table as table) as table =>
        let
            #"Filtered Rows" = Table.SelectRows(Table, each ([Kind] = "Table" and ([Item] = "Report Data" or [Item] = "Report_Data"))),
            #"Select Columns" = Table.SelectColumns(#"Filtered Rows", "Data"),
            DataTable = #"Select Columns"[Data]{0}
        in
            DataTable
in
    Source
Prep_Data_Table:
let
    PrepParams = (HorizonEnd as date, CategoryFieldTypes as list) as function =>
        let
            // Given a new name: a let variable defined in terms of itself is a cyclic reference
            BufferedFieldTypes = List.Buffer(CategoryFieldTypes),
            Source = (InputTable as table, FileDate as date) as table =>
                let
                    EndFields = {"As-of Date", "PERIOD", "Actual", "Forecast"} as list,
                    PeriodsAsDates = Table.TransformColumnTypes(InputTable, {{"PERIOD", type date}}),
                    #"Remove Errors" = Table.RemoveRowsWithErrors(PeriodsAsDates, {"PERIOD"}),
                    WithinHorizon = Table.SelectRows(#"Remove Errors", each ([PERIOD] <= HorizonEnd)),
                    RenamedVAL = Table.RenameColumns(WithinHorizon, {"VAL", "Forecast"}), // Forecast was originally named VAL
                    // On the file's own date the forecast value is really the actual
                    MovedActual = Table.AddColumn(RenamedVAL, "Actual", each if [PERIOD]=FileDate then (if [Forecast] is null then 0 else [Forecast]) else null),
                    IncludeAsOfDate = Table.AddColumn(MovedActual, "As-of Date", each FileDate, Date.Type),
                    AppliedCategoryFieldTypes = Table.TransformColumnTypes(IncludeAsOfDate, BufferedFieldTypes),
                    TransformedColumns = Table.TransformColumns(AppliedCategoryFieldTypes, {{"{Values}", Text.Trim, type text}, {"Actual", Number.Abs, Currency.Type}, {"Forecast", Number.Abs, Currency.Type}}),
                    Sorted = Table.Sort(TransformedColumns, {{"Actual", Order.Descending}}), // Descending order is important because Table.FillDown is more efficient than Table.FillUp
                    OutputTable = Table.SelectColumns(Sorted, List.Distinct(List.Combine({List.Transform(BufferedFieldTypes, each List.First(_)), EndFields}))),
                    Output = OutputTable
                in
                    Output
        in
            Source
in
    PrepParams
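The per-file preparation above boils down to three things: drop periods past the horizon, treat the value as an actual when the period equals the file date, and stamp every row with the file's as-of date. A minimal Python sketch of just those steps follows; it leaves out the typing, trimming, absolute-value and sorting steps, and the dates and names in it are placeholders.

```python
from datetime import date


def prep_rows(rows, file_date, horizon_end):
    """rows: list of dicts with 'PERIOD' (date) and 'VAL' (number or None)."""
    prepared = []
    for row in rows:
        period = row["PERIOD"]
        if period > horizon_end:
            continue  # beyond the horizon, dropped like the WithinHorizon step
        forecast = row["VAL"]
        # On the file's own date the value is really the actual;
        # a null there is counted as 0, as in the MovedActual step.
        if period == file_date:
            actual = 0 if forecast is None else forecast
        else:
            actual = None
        new_row = {k: v for k, v in row.items() if k != "VAL"}
        new_row.update({"Forecast": forecast, "Actual": actual, "As-of Date": file_date})
        prepared.append(new_row)
    return prepared


# Placeholder dates; the year is arbitrary.
raw = [
    {"Lvl1": "AA", "PERIOD": date(2020, 1, 1), "VAL": None},
    {"Lvl1": "AA", "PERIOD": date(2020, 1, 2), "VAL": 110},
    {"Lvl1": "AA", "PERIOD": date(2020, 1, 9), "VAL": 5},
]
prepared = prep_rows(raw, file_date=date(2020, 1, 1), horizon_end=date(2020, 1, 3))
```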
Workbooks:
let
    // Import Data
    Source = ImportParameters[Sources],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"As-of Date", type date}, {"Folder Path", type text}, {"Tab", type text}, {"Load", type logical}}),
    #"Filtered Rows"= Table.SelectRows(#"Changed Type", each ([Load] = true)),
    WorkbookPaths = Table.AddColumn(#"Filtered Rows", "File Path", each [Folder Path] & [File], type text),
    LoadWorkbooks = Table.AddColumn(WorkbookPaths, "Data", each Excel.Workbook(File.Contents([File Path])) meta [#"As-of Date" = [#"As-of Date"]]),
    LoadDataTables = Table.TransformColumns(LoadWorkbooks, {"Data", each Extract_Data_Table(_) meta [#"As-of Date" = Value.Metadata(_)[#"As-of Date"]]}),
    PrepFunc = Prep_Data_Table(List.Max(LoadDataTables[#"As-of Date"]), ImportParameters[CategoryFieldTypes]),
    // This TransformColumns step references the column's list, not the table, so the As-of Date field of the column is out of scope. Use metadata to bring the As-of Date value into the same scope
    PrepDataTables = Table.TransformColumns(LoadDataTables, {"Data", each Table.Buffer(PrepFunc(_, Value.Metadata(_)[#"As-of Date"]))}),
    Output = Table.SelectColumns(PrepDataTables, {"Data", "As-of Date"})
in
    Output
MakeComparison:
let
    CategoryFields = ImportParameters[CategoryFields],
    DataTableList = Workbooks[Data],
    // Distinct category combinations across all tables, each given an index
    CategoryIndex = Table.AddIndexColumn(Table.Distinct(Table.Combine(List.Transform(DataTableList, each Table.SelectColumns(_, CategoryFields)))), "Index"),
    ListOfDataTablesWithNestedIndexTable = List.Transform(DataTableList, each Table.NestedJoin(_, CategoryFields, CategoryIndex, CategoryFields, "Index", JoinKind.Inner)),
    ListOfIndexedDataTables = List.Transform(ListOfDataTablesWithNestedIndexTable, each Table.TransformColumns(_, {"Index", each List.Single(Table.Column(_, "Index")) as number, type number})),
    Appended = Table.Combine(ListOfIndexedDataTables),
    Merged = Table.Join(CategoryIndex, "Index", Table.SelectColumns(Appended, {"As-of Date", "Actual", "Forecast", "Index"}), "Index"),
    Partitioned = Table.Partition(Merged, "Index", Table.RowCount(CategoryIndex), each _),
    CopiedActuals = List.Transform(Partitioned, each Table.FillDown(_, {"Actual"})),
    ToUnpartition = List.Transform(CopiedActuals, each {List.First(_[Index]), Table.RemoveColumns(_, {"Index"})}),
    UnPartitioned = Table.FromPartitions("Index", ToUnpartition, type number),
    Output = UnPartitioned
in
    Output
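Q: Does this qualify as a closure?
Q: Does it matter whether I recombine the tables with Table.FromPartitions or just with Table.Combine? What is the difference?
Q: What exactly does Fast Data Load do? When does it have an effect, and when doesn't it?
Q: Is there any performance benefit to specifying all types (x as table, y as table, z as number, etc.)?
Q: I read in some documentation that let..in is just syntactic sugar for records. I've started to prefer records because all the intermediate values remain available. Are there any performance implications?
Q: What is the difference between the number types? Int32.Type vs. Int64.Type?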
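In case it's unclear what I mean by a closure, the pattern looks like this in Python terms: an outer function takes the shared parameters and returns an inner function that still remembers them. This is only an analogy for the shape of Prep_Data_Table; the names and the integer "dates" are invented for the illustration.

```python
def make_prep(horizon_end, category_fields):
    """Return a function that remembers horizon_end and category_fields."""
    fields = tuple(category_fields)  # captured once by the inner function

    def prep(table, file_date):
        # horizon_end and fields come from the enclosing call,
        # not from prep's own arguments.
        return [
            {**{f: row.get(f) for f in fields},
             "PERIOD": row["PERIOD"],
             "As-of Date": file_date}
            for row in table
            if row["PERIOD"] <= horizon_end
        ]

    return prep


# Integers stand in for dates to keep the illustration short.
prep = make_prep(3, ["Lvl1"])
out = prep([{"Lvl1": "AA", "PERIOD": 1, "X": 9}, {"Lvl1": "AA", "PERIOD": 5}], 1)
```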
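Hello. If the question is still relevant, could you share some of the files (perhaps the largest) and the code? You could also edit your post and describe the logic you implemented in the code. Some of the operations seem excessive to me. Please also state clearly (even step by step) the data transformation you want to perform, like this: 1. filter out empty rows and invalid data; 2. apply another (which?) filter; 3. do something else; and so on. Your question looks pretty challenging! :) – Eugene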