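Find all duplicate documents in SharePoint 2010
When we run searches on our SharePoint instance, several documents in the search results show a "View duplicates" link.
Is there a way to report on all of these duplicates?
I've seen this SQL, which finds duplicates based on their MD5 hash: http://social.technet.microsoft.com/forums/en-US/sharepointsearch/thread/8a8b25d9-a3ac-45df-86de-2a3a7838a534 and I've corrected it for SharePoint 2010 compatibility here: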
-- Step 1: get all files with short names, MD5 signatures, and size
SELECT  md5 ,
        RIGHT(DisplayURL, CHARINDEX('/', REVERSE(DisplayURL)) - 1) AS ShortFileName ,
        DisplayURL AS Url ,
        llVal / 1024 AS FileSizeKb
INTO    #listingFilesMd5Size
FROM    SearchServiceApplication_CrawlStore.dbo.MSSCrawlURL y
        INNER JOIN SearchServiceApplication_PropertyStore.dbo.MSSDocProps dp ON ( y.DocID = dp.DocID )
WHERE   dp.pid = 58 -- File size
        AND llVal > 1024 * 10 -- 10 KB minimum in size
        AND md5 <> 0
        AND CHARINDEX('/', REVERSE(DisplayURL)) > 1

-- Step 2: filter duplicated items
SELECT  COUNT(*) AS NbDuplicates ,
        md5 ,
        ShortFileName ,
        FileSizeKb
INTO    #duplicates
FROM    #listingFilesMd5Size
GROUP BY md5 ,
        ShortFileName ,
        FileSizeKb
HAVING  COUNT(*) > 1
ORDER BY COUNT(*) DESC

DROP TABLE #listingFilesMd5Size

-- Step 3: show the report with search URLs
SELECT  * ,
        NbDuplicates * FileSizeKb AS TotalSpaceKb ,
        'http://srv-moss/SearchCenter/Pages/results.aspx?k=' + ShortFileName AS SearchUrl
FROM    #duplicates
--ORDER BY NbDuplicates * FileSizeKb DESC

DROP TABLE #duplicates
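But this only matches exact duplicates, whereas I'm interested in the documents that SharePoint itself considers duplicates, i.e. the ones behind the "View duplicates" link in the search results.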
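To make that limitation concrete, here is a minimal Python sketch of the same grouping logic as the SQL above. It is not how SharePoint works internally: `hashlib.md5` over the file bytes is only a stand-in for the `md5` column in the crawl store, and the names `short_file_name`, `find_exact_duplicates`, and `sample_files` (including the sample URLs and contents) are made up for illustration.

```python
import hashlib
from collections import defaultdict


def short_file_name(url):
    """Return the text after the last '/', like the RIGHT/CHARINDEX/REVERSE expression."""
    return url.rsplit("/", 1)[-1]


def find_exact_duplicates(files, min_size_bytes=10 * 1024):
    """Group (url, content) pairs by (MD5 digest, short name, size in KB).

    Only groups with more than one member are returned, mirroring
    the HAVING COUNT(*) > 1 clause in step 2 of the SQL.
    """
    groups = defaultdict(list)
    for url, content in files:
        if len(content) <= min_size_bytes:
            continue  # same idea as the "10 KB minimum" filter in step 1
        key = (
            hashlib.md5(content).hexdigest(),  # stand-in for the md5 column
            short_file_name(url),
            len(content) // 1024,
        )
        groups[key].append(url)
    return {key: urls for key, urls in groups.items() if len(urls) > 1}


# Hypothetical sample data: two identical files and one that differs by a single byte.
sample_files = [
    ("http://srv-moss/sites/a/report.docx", b"x" * 20480),
    ("http://srv-moss/sites/b/report.docx", b"x" * 20480),
    ("http://srv-moss/sites/c/report.docx", b"x" * 20479 + b"y"),
]

for (digest, name, size_kb), urls in find_exact_duplicates(sample_files).items():
    print(name, size_kb, "KB:", urls)
```

The third file has the same name and size as the other two, but a single changed byte gives it a different digest, so it never joins the group. A near-duplicate of that kind is exactly what a hash-based report like this would miss.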
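I've seen that there is a managed property called "DuplicateHash", but it isn't documented anywhere and I can't find a way to access it through the object model.
Thanks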
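Yeah, I realise that querying the database directly can put us in an unsupported state, so I'm not confident we should use it. I've only run it against our development database, so it's fine for now. Thanks for your reply and for clarifying the DuplicateHash property. – soniiic 2011-05-20 10:41:48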