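Speed up converting RTF to plain text
2011-07-14

I have to convert a large amount of text stored in a database as RTF to plain text. I am using the method described in this MSDN article, but I think I have hit a roadblock (I do not think it is in my code, but in the .NET Framework itself).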

I have the following function:

//convert RTF text to plain text
public static string RtfTextToPlainText(string FormatObject)
{
    System.Windows.Forms.RichTextBox rtfBox = new System.Windows.Forms.RichTextBox();
    rtfBox.Rtf = FormatObject;
    FormatObject = rtfBox.Text; //This is line 494 for later reference for the stack traces.
    rtfBox.Dispose();

    return FormatObject;
}

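It should be completely self-contained and should not block on anything. The project I am working on has millions of records to process, so I am splitting the work into batches and using Tasks to process them in parallel. It was still slow, so I broke into the code in the debugger and found this.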

[Screenshot from the debugger; image not preserved]

Here is the call stack of the waiting task:

[In a sleep, wait, or join] 
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.CreateHandle(System.Windows.Forms.CreateParams cp) + 0x242 bytes 
System.Windows.Forms.dll!System.Windows.Forms.Control.CreateHandle() + 0x2b2 bytes 
System.Windows.Forms.dll!System.Windows.Forms.TextBoxBase.CreateHandle() + 0x54 bytes 
System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.Rtf.set(string value) + 0x68 bytes  
>CvtCore.dll!CvtCore.StandardFunctions.Str.RtfTextToPlainText(object Expression) Line 494 C# 

And here is the call stack of thread 816:

[Managed to Native Transition] 
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DefWndProc(ref System.Windows.Forms.Message m) + 0x9e bytes 
System.Windows.Forms.dll!System.Windows.Forms.Control.WmWindowPosChanged(ref System.Windows.Forms.Message m) + 0x39 bytes 
System.Windows.Forms.dll!System.Windows.Forms.Control.WndProc(ref System.Windows.Forms.Message m) + 0x51b bytes 
System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.WndProc(ref System.Windows.Forms.Message m) + 0x5c bytes 
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DebuggableCallback(System.IntPtr hWnd, int msg, System.IntPtr wparam, System.IntPtr lparam) + 0x15e bytes  
[Native to Managed Transition] 
[Managed to Native Transition] 
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DefWndProc(ref System.Windows.Forms.Message m) + 0x9e bytes 
System.Windows.Forms.dll!System.Windows.Forms.Control.WmCreate(ref System.Windows.Forms.Message m) + 0x1c bytes 
System.Windows.Forms.dll!System.Windows.Forms.Control.WndProc(ref System.Windows.Forms.Message m) + 0x50b bytes 
System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.WndProc(ref System.Windows.Forms.Message m) + 0x5c bytes 
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DebuggableCallback(System.IntPtr hWnd, int msg, System.IntPtr wparam, System.IntPtr lparam) + 0x15e bytes  
[Native to Managed Transition] 
[Managed to Native Transition] 
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.CreateHandle(System.Windows.Forms.CreateParams cp) + 0x44c bytes 
System.Windows.Forms.dll!System.Windows.Forms.Control.CreateHandle() + 0x2b2 bytes 
System.Windows.Forms.dll!System.Windows.Forms.TextBoxBase.CreateHandle() + 0x54 bytes 
System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.Rtf.set(string value) + 0x68 bytes  
>CvtCore.dll!CvtCore.StandardFunctions.Str.RtfTextToPlainText(object Expression) Line 494 C# 

Why is task 2 blocking task 4 at line 494? Shouldn't they be completely independent of each other?


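Note

I captured these stack traces and the screenshot in release mode; I cannot seem to hit pause at the right moment to catch the same thing in debug mode. Could this also be the cause of my slowness? The profiler says my program spends 83.2% of its time in System.Windows.Forms.RichTextBox.set_Rtf(string) (a child function called by line 494).

Any suggestions on how to speed up stripping the formatting out of the RTF would be greatly appreciated.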


P.S.

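I am currently rewriting it so that each thread has its own text box instead of creating a new one on every call to the function. I hope that speeds it up, and I will update with details on how I did it.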


UPDATE

I solved my own problem (see my answer below), but here is how I start the tasks:

//create/start consumer tasks
for (int i = 0; i < ThreadsPreProducer; i++)
{
    //create worker and task
    WorkerObject NewWorkerObject = new WorkerObject(colSource, FormatObjectEvent, UpdateModule);
    Task WorkerTask = new Task(NewWorkerObject.DoWork);
    WorkerTasks.Add(WorkerTask);
    WorkerTask.Start();
}

//create/start producer task
ProducerObject NewProducerObject = new ProducerObject(colSource, SourceQuery, ConnectionString, PreProcessor, UpdateModule, RowNameIndex);
Task ProducerTask = new Task(NewProducerObject.DoWork);
WorkerTasks.Add(ProducerTask);
ProducerTask.Start();

//block while producer runs
ProducerTask.Wait();

//create/start post-producer consumer tasks
for (int i = 0; i < ThreadsPostProducer; i++)
{
    //create worker and task
    WorkerObject NewWorkerObject = new WorkerObject(colSource, FormatObjectEvent, UpdateModule);
    Task WorkerTask = new Task(NewWorkerObject.DoWork);
    WorkerTasks.Add(WorkerTask);
    WorkerTask.Start();
}

//block until all tasks are done
Task.WaitAll(WorkerTasks.ToArray());

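It uses the producer/consumer pattern; in my case, 1 producer and 4 consumers (2 start at the beginning, and 2 more start after the producer finishes, to speed up the work once the system resources used by the producer have been freed).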
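For readers who want to see the shape of such a pipeline, here is a minimal sketch of one producer feeding several consumers through a bounded queue. It is not the code from the question: `PipelineSketch`, `Run`, the `convert` delegate, and the use of `BlockingCollection<string>` as the shared collection are all assumptions made for illustration (the question does not show what `colSource`, `WorkerObject`, or `ProducerObject` look like), and for simplicity it starts every consumer up front rather than adding more after the producer finishes.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PipelineSketch
{
    // Hypothetical helper: one producer, `consumerCount` consumers, bounded queue in between.
    public static List<string> Run(IEnumerable<string> source, int consumerCount, Func<string, string> convert)
    {
        using (var queue = new BlockingCollection<string>(1000))
        {
            var results = new ConcurrentBag<string>();

            // Consumers: each one drains the queue until the producer marks it complete.
            Task[] consumers = Enumerable.Range(0, consumerCount)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    foreach (string item in queue.GetConsumingEnumerable())
                        results.Add(convert(item));
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            // Producer: stands in for the database reader in the question.
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (string item in source)
                        queue.Add(item); // blocks while the queue is full
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consumers' loops end
                }
            }, TaskCreationOptions.LongRunning);

            producer.Wait();
            Task.WaitAll(consumers);
            return results.ToList();
        }
    }

    public static void Main()
    {
        // A trivial conversion stands in for RtfTextToPlainText.
        List<string> converted = Run(new[] { "a", "b", "c" }, 2, s => s.ToUpperInvariant());
        Console.WriteLine(converted.Count); // prints 3
    }
}
```

The order of the results is not guaranteed, since the consumers run concurrently. The bounded capacity (1000 here, an arbitrary choice) keeps the producer from reading far ahead of the consumers.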

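To get a full picture of your code, could you post how you create the tasks?

@Ramhound I have updated the question to show how I start the tasks, but I solved my problem by making the RTF box thread-local.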

Answers

Changing the function to

static ThreadLocal<RichTextBox> rtfBox = new ThreadLocal<RichTextBox>(() => new RichTextBox()); 
//convert RTF text to plain text 
public static string RtfTextToPlainText(string FormatObject) 
{ 
    rtfBox.Value.Rtf = FormatObject; 
    FormatObject = rtfBox.Value.Text; 
    rtfBox.Value.Clear(); 

    return FormatObject; 
} 

changed my running time from minutes to seconds.

I do not dispose of these objects, because they are used for the entire lifetime of the program.
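To illustrate why this helps, the sketch below counts how often a `ThreadLocal<T>` factory actually runs. It is only an illustration under stated assumptions: `ThreadLocalSketch`, `Reverse`, and the `Created` counter are hypothetical names, and a `StringBuilder` stands in for the `RichTextBox` so the example runs without Windows Forms.

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadLocalSketch
{
    // Counts how many times the factory below has run.
    public static int Created;

    // StringBuilder is a stand-in for an expensive object such as RichTextBox.
    static readonly ThreadLocal<StringBuilder> buffer = new ThreadLocal<StringBuilder>(() =>
    {
        Interlocked.Increment(ref Created);
        return new StringBuilder();
    });

    public static string Reverse(string input)
    {
        StringBuilder sb = buffer.Value; // created on a thread's first access, reused afterwards
        sb.Clear();                      // reset state left over from the previous call
        for (int i = input.Length - 1; i >= 0; i--)
            sb.Append(input[i]);
        return sb.ToString();
    }

    public static void Main()
    {
        Parallel.For(0, 10000, i => Reverse(i.ToString()));
        // The factory runs once per participating thread, not once per call,
        // so the instance count depends on how many threads took part.
        Console.WriteLine("calls: 10000, instances created: " + Created);
    }
}
```

As in the function above, each call has to clear whatever state the previous call left in the shared per-thread instance.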

Good for you! Glad you got it working. +1 for posting the answer to your own question so that others can benefit.