If you supply
'folderA/folderB/folderC/mainfolder/*/*'
as the input and want to filter out specific paths, you can write a custom PathFilter.
FileInputFormat provides a method for this:
static void setInputPathFilter(JobConf conf, Class<? extends PathFilter> filter)
Info: Set a PathFilter to be applied to the input paths for the map-reduce job.
For example:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public static class CustomPathFilter implements PathFilter {
    @Override
    public boolean accept(Path path) {
        // Implement your logic for finding the valid range of paths here.
        // The valid range of dates and days for directories and files can be
        // passed in as arguments to the job.
        // Return true if the path matches, false otherwise.
        return false; // placeholder: rejects every path until the logic is filled in
    }
}
Register the PathFilter like this:
FileInputFormat.setInputPathFilter(job, CustomPathFilter.class);
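To make the "valid range" idea concrete, here is a minimal sketch of the check you might put inside accept(). It is plain Java with no Hadoop dependency so it can be tried on its own. The class name DateRangeCheck and the assumption that one path component is a date written as yyyy-MM-dd are mine, not something from your question, so adjust the pattern to your actual directory layout.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop): decides whether a path string
// contains a date component that falls inside an inclusive range.
final class DateRangeCheck {
    // Assumption: some path component is exactly a date in yyyy-MM-dd form.
    private static final Pattern DATE_SEGMENT =
            Pattern.compile("(?:^|/)(\\d{4}-\\d{2}-\\d{2})(?:/|$)");

    private final LocalDate start;
    private final LocalDate end;

    DateRangeCheck(LocalDate start, LocalDate end) {
        this.start = start;
        this.end = end;
    }

    // Returns true only if the first date component found lies in [start, end].
    boolean accept(String path) {
        Matcher m = DATE_SEGMENT.matcher(path);
        if (!m.find()) {
            return false; // no date component at all
        }
        try {
            LocalDate date = LocalDate.parse(m.group(1));
            return !date.isBefore(start) && !date.isAfter(end);
        } catch (DateTimeParseException e) {
            return false; // looks like a date but is not valid, e.g. 2013-13-40
        }
    }

    public static void main(String[] args) {
        DateRangeCheck june = new DateRangeCheck(
                LocalDate.of(2013, 6, 1), LocalDate.of(2013, 6, 30));
        System.out.println(june.accept(
                "folderA/folderB/folderC/mainfolder/2013-06-15/part-00000"));
        System.out.println(june.accept(
                "folderA/folderB/folderC/mainfolder/2013-07-01/part-00000"));
    }
}
```

Inside CustomPathFilter.accept(Path path) you could then return the result of calling this check with path.toString(). Matching against the whole path string rather than only the last component means a file under an accepted date directory is accepted as well, which matters if the filter is also applied to the files listed inside a matched directory.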
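Can I use a regular expression in the input file path to specify a range? – user2441441
@user2441441: For example, if you use 'folderA/folderB/folderC/mainfolder/*/*', every file matching /*/* is treated as input, and once you filter by your custom date range in the PathFilter, only the files in that range are processed. If you are looking for something else, please reply. Thanks!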