2014-10-01

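I'm having some trouble parsing XML data in C#: "There is an error in XML document (155, 23)". The XML itself seems to contain no error, and the failure always happens on page 13.

Method summary:

The method takes a keyword and searches for it on www.clinicaltrials.gov using the site's URI. For example: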

http://www.clinicaltrials.gov/ct2/results?term=ALL&Search=Search&displayxml=true

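This URI returns the matching clinical studies as XML. Because there is a large amount of clinical data, only 20 studies are returned per page. To get to the next page you have to append &pg=2, which takes you to the second page. My code parses all pages and converts each page into a C# object.

Problem:

The problem is that when it reaches page 13 it crashes with the following error: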
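The paging arithmetic described above can be sketched as follows. This is only an illustration: the `Paging` class and its members are hypothetical names, the page size of 20 is taken from the description above, and escaping the keyword with `Uri.EscapeDataString` is an added precaution that the original code does not take.

```csharp
using System;

public static class Paging
{
    // Assumption: the service returns 20 studies per page, as described above.
    public const int StudiesPerPage = 20;

    // Number of pages needed for `count` studies (integer ceiling division).
    public static int TotalPages(long count)
    {
        return (int)((count + StudiesPerPage - 1) / StudiesPerPage);
    }

    // Builds the URI for one result page. The keyword is escaped so that
    // spaces or '&' inside it cannot corrupt the query string.
    public static string PageUri(string baseUri, string keyword, int page)
    {
        return baseUri + "ct2/results?term=" + Uri.EscapeDataString(keyword)
             + "&Search=Search&displayxml=true&pg=" + page;
    }
}
```

With 20 studies per page, the first 12 pages cover studies 1 to 240, so a 241st study is the first one that needs page 13.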

InvalidOperationException was unhandled: There is an error in XML document (155, 23)

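When I copy the XML for page 13, page 12, or any other page close to page 13 into an XML validator, it says the XML is fine. When I look through the XML myself, I can't find any errors. I was thinking that maybe memory was full, but after only 240 objects? If I search for a keyword that returns fewer than 13 pages of results, everything works.

The code I have written to retrieve and parse the XML is here: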

public List<search_resultsClinical_study> SearchStudyByKeyword(string keyword)
{
    int currentPage = 1;
    double numberOfStudiesOnAPage = 20;
    double totalPages = 1; //if not it will crash anyways
    List<search_results> searchResult = new List<search_results>();

    try
    {
        while (totalPages >= currentPage)
        {
            //crashes if search is larger than 13 pages... have to figure out why....
            string newUri = URI + "ct2/results?term=" + keyword + "&Search=Search&displayxml=true&pg=" + currentPage;
            System.Xml.Serialization.XmlSerializer reader = new System.Xml.Serialization.XmlSerializer(typeof(search_results));
            XmlReader xmlReader = XmlReader.Create(newUri);
            search_results studies = new search_results();
            studies = (search_results)reader.Deserialize(xmlReader);
            searchResult.Add(studies);
            totalPages = Math.Ceiling((double)studies.count / numberOfStudiesOnAPage);
            currentPage += 1;
        }
        //return searchResult;
        //Append all studies to one list, easier to handle for user
        List<search_resultsClinical_study> result = new List<search_resultsClinical_study>();
        foreach (search_results sr in searchResult)
        {
            foreach (search_resultsClinical_study cs in sr.clinical_study)
            {
                result.Add(cs);
            }
        }
        return result;
    }
    catch (WebException)
    {
        Debug.Write("404 - Might be a invalid search term ");
        return null;
    }
}

The error occurs on the following line:

studies = (search_results)reader.Deserialize(xmlReader); 

The search_results class:

/// <remarks/> 
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)] 
[System.Xml.Serialization.XmlRootAttribute(Namespace = "", IsNullable = false)] 
public partial class search_results 
{ 

    private string queryField; 

    private search_resultsClinical_study[] clinical_studyField; 

    private uint countField; 

    /// <remarks/> 
    public string query 
    { 
     get 
     { 
      return this.queryField; 
     } 
     set 
     { 
      this.queryField = value; 
     } 
    } 

    /// <remarks/> 
    [System.Xml.Serialization.XmlElementAttribute("clinical_study")] 
    public search_resultsClinical_study[] clinical_study 
    { 
     get 
     { 
      return this.clinical_studyField; 
     } 
     set 
     { 
      this.clinical_studyField = value; 
     } 
    } 

    /// <remarks/> 
    [System.Xml.Serialization.XmlAttributeAttribute()] 
    public uint count 
    { 
     get 
     { 
      return this.countField; 
     } 
     set 
     { 
      this.countField = value; 
     } 
    } 
} 

/// <remarks/> 
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)] 
public partial class search_resultsClinical_study 
{ 

    private byte orderField; 

    private decimal scoreField; 

    private string nct_idField; 

    private string urlField; 

    private string titleField; 

    private search_resultsClinical_studyStatus statusField; 

    private string condition_summaryField; 

    private string last_changedField; 

    /// <remarks/> 
    public byte order 
    { 
     get 
     { 
      return this.orderField; 
     } 
     set 
     { 
      this.orderField = value; 
     } 
    } 

    /// <remarks/> 
    public decimal score 
    { 
     get 
     { 
      return this.scoreField; 
     } 
     set 
     { 
      this.scoreField = value; 
     } 
    } 

    /// <remarks/> 
    public string nct_id 
    { 
     get 
     { 
      return this.nct_idField; 
     } 
     set 
     { 
      this.nct_idField = value; 
     } 
    } 

    /// <remarks/> 
    public string url 
    { 
     get 
     { 
      return this.urlField; 
     } 
     set 
     { 
      this.urlField = value; 
     } 
    } 

    /// <remarks/> 
    public string title 
    { 
     get 
     { 
      return this.titleField; 
     } 
     set 
     { 
      this.titleField = value; 
     } 
    } 

    /// <remarks/> 
    public search_resultsClinical_studyStatus status 
    { 
     get 
     { 
      return this.statusField; 
     } 
     set 
     { 
      this.statusField = value; 
     } 
    } 

    /// <remarks/> 
    public string condition_summary 
    { 
     get 
     { 
      return this.condition_summaryField; 
     } 
     set 
     { 
      this.condition_summaryField = value; 
     } 
    } 

    /// <remarks/> 
    public string last_changed 
    { 
     get 
     { 
      return this.last_changedField; 
     } 
     set 
     { 
      this.last_changedField = value; 
     } 
    } 
} 

/// <remarks/> 
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)] 
public partial class search_resultsClinical_studyStatus 
{ 

    private string openField; 

    private string valueField; 

    /// <remarks/> 
    [System.Xml.Serialization.XmlAttributeAttribute()] 
    public string open 
    { 
     get 
     { 
      return this.openField; 
     } 
     set 
     { 
      this.openField = value; 
     } 
    } 

    /// <remarks/> 
    [System.Xml.Serialization.XmlTextAttribute()] 
    public string Value 
    { 
     get 
     { 
      return this.valueField; 
     } 
     set 
     { 
      this.valueField = value; 
     } 
    } 
} 

The XML that fails:

http://www.clinicaltrials.gov/ct2/results?term=ALL&Search=Search&displayxml=true&pg=13

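Does anyone have a clue why this error occurs? I have also added an XmlSchema and tried generating the C# classes from that XmlSchema!

Thanks for your help!

Do this simple test: before trying to deserialize, dump each page to disk. You can do it like this: http://stackoverflow.com/questions/3988832/how-to-create-an-xml-file-from-a-xmlreader. After that, try to deserialize the files from disk. – 2014-10-01 09:00:28

Hey, thanks for the response! Even when I dump each page to disk before trying to deserialize it, I still get the same error. – 2014-10-01 09:33:56

Attach the specific XML you are having problems with and add the structure of search_results. – 2014-10-01 10:02:41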

Answer

private byte orderField;

Type    Range       Size                      .NET Framework type
byte    0 to 255    Unsigned 8-bit integer    System.Byte

As soon as it reaches this record, it will probably crash.

<clinical_study> 
    <order>256</order> 
    <score>1.00</score> 
    <nct_id>NCT00006461</nct_id> 
    <url>http://ClinicalTrials.gov/show/NCT00006461</url> 
    <title> 
     Combination Chemotherapy Followed by Second-Look Surgery and ... 
    </title> 
    <status open="N">Completed</status> 
    <condition_summary> 
     Untreated Childhood Medulloblastoma; Untreated Childhood.. 
    </condition_summary> 
    <last_changed>August 7, 2013</last_changed> 
</clinical_study> 
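The failure can be reproduced in isolation with a cut-down version of this record. The sketch below is an illustration under stated assumptions: `StudyWithByteOrder`, `StudyWithIntOrder` and `OrderOverflowDemo` are hypothetical types that model only the `order` element, not the full generated classes from the question.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical minimal type: <order> mapped to byte, as in the question.
[XmlRoot("clinical_study")]
public class StudyWithByteOrder
{
    public byte order { get; set; }
}

// Same shape, but <order> mapped to int, which can hold 256 and beyond.
[XmlRoot("clinical_study")]
public class StudyWithIntOrder
{
    public int order { get; set; }
}

public static class OrderOverflowDemo
{
    public const string Xml =
        "<clinical_study><order>256</order></clinical_study>";

    // Deserializes the given XML string into T with XmlSerializer.
    public static T Parse<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }

    // Returns true when deserializing into the byte-based type fails.
    // XmlSerializer wraps the conversion failure in an
    // InvalidOperationException, the exception type seen in the question.
    public static bool FailsAsByte()
    {
        try
        {
            Parse<StudyWithByteOrder>(Xml);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

Changing the field and property type of `order` from `byte` to `int` in the generated class is the corresponding fix, since `Parse<StudyWithIntOrder>(OrderOverflowDemo.Xml)` reads the same record without an exception.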

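As you can see, a byte cannot hold the value 256. With 20 studies per page, page 13 is the page that contains orders 241 to 260, which is why the failure always shows up there. The usual way to detect this kind of problem is to always validate everything against the schema(s) before deserializing.

P.S. The schema you provided seems to be 3 years old. It does not contain elements such as "condition_summary" and so on. You would probably be better off creating your own from scratch, or generating one from the existing XML.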
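A minimal sketch of such a validation step is shown below. The schema in `Xsd` is a hypothetical, heavily reduced stand-in that declares only `order` as `xs:unsignedByte`; it is not the real ClinicalTrials.gov schema, and `SchemaCheck` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaCheck
{
    // Hypothetical reduced schema: only <order> is declared, as unsignedByte.
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='clinical_study'>" +
        "<xs:complexType><xs:sequence>" +
        "<xs:element name='order' type='xs:unsignedByte'/>" +
        "</xs:sequence></xs:complexType>" +
        "</xs:element></xs:schema>";

    // Reads the whole document with schema validation switched on and
    // returns one message per validation problem (empty list = valid).
    public static List<string> Validate(string xml, string xsd)
    {
        var problems = new List<string>();
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;

        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            // null = use the target namespace declared in the schema itself.
            settings.Schemas.Add(null, xsdReader);
        }

        // Collect problems instead of letting the reader throw on the first one.
        settings.ValidationEventHandler += (sender, e) =>
            problems.Add(e.Exception.LineNumber + "," +
                         e.Exception.LinePosition + ": " + e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return problems;
    }
}
```

Run against a record with `<order>256</order>`, this reports a datatype problem together with its line and position, which points at the offending value more directly than the serializer's exception does.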
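One way to generate a schema from existing XML is the `XmlSchemaInference` class in `System.Xml.Schema`. The sketch below is only an illustration, and `SchemaFromSample` is a hypothetical helper name. Inference can only see the values in the sample it is given, so numeric types inferred from a small sample (for example a single page with low `order` values) may be too narrow and are worth reviewing by hand.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaFromSample
{
    // Infers an XSD from a sample document and returns it as text.
    public static string Infer(string sampleXml)
    {
        var inference = new XmlSchemaInference();
        XmlSchemaSet schemas;
        using (var reader = XmlReader.Create(new StringReader(sampleXml)))
        {
            schemas = inference.InferSchema(reader);
        }

        // Write every inferred schema (one per namespace) to a single string.
        var output = new StringWriter();
        foreach (XmlSchema schema in schemas.Schemas())
        {
            schema.Write(output);
        }
        return output.ToString();
    }
}
```

Feeding it a later page of results, such as the one containing the record above, gives the inference larger `order` values to work from than the first page would.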

Thanks! I'm marking this as solved, because it makes sense now! My reputation is too low to vote, but I will do so later! Thanks again! – 2014-10-01 11:30:27
