2014-01-10 105 views

Answers

45

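I did a little research on this.

Here is the ASCII table (asciitable), which has 128 symbols. Below is some small test code that substitutes each symbol of the ASCII table into a document and tries to load the result as an XML document.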

using System;
using System.IO;
using System.Xml;

static public void RegexTry()
{
    // test.xml is assumed to contain the placeholder character 'П'
    // at the position where each ASCII character should be tried.
    string xmlfile = File.ReadAllText(@"test.xml");

    for (int i = 0; i < 128; i++)
    {
        char t = (char)i;

        // Put the current ASCII character where the placeholder was.
        string text = xmlfile.Replace('П', t);

        XmlDocument xml = new XmlDocument();
        try
        {
            xml.LoadXml(text);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Char(" + i.ToString() + "): " + t + " => error! " + ex.Message);
            continue;
        }

        Console.WriteLine("Char(" + i.ToString() + "): " + t + " => fine!");
    }

    Console.ReadKey();
}

It produces this output:

Char(0): => error! '.', hexadecimal value 0x00, is an invalid character. Line 5, position 7. 
Char(1): => error! '', hexadecimal value 0x01, is an invalid character. Line 5, position 7. 
Char(2): => error! '', hexadecimal value 0x02, is an invalid character. Line 5, position 7. 
Char(3): => error! '', hexadecimal value 0x03, is an invalid character. Line 5, position 7. 
Char(4): => error! '', hexadecimal value 0x04, is an invalid character. Line 5, position 7. 
Char(5): => error! '', hexadecimal value 0x05, is an invalid character. Line 5, position 7. 
Char(6): => error! '', hexadecimal value 0x06, is an invalid character. Line 5, position 7. 
Char(7): => error! '', hexadecimal value 0x07, is an invalid character. Line 5, position 7. 
Char(8): => error! '', hexadecimal value 0x08, is an invalid character. Line 5, position 7. 
Char(9):  => fine! 
Char(10): 
=> fine! 
Char(11): => error! '', hexadecimal value 0x0B, is an invalid character. Line 5, position 7. 
Char(12): => error! '', hexadecimal value 0x0C, is an invalid character. Line 5, position 7. 
Char(13): 
=> fine! 
Char(14): => error! '', hexadecimal value 0x0E, is an invalid character. Line 5, position 7. 
Char(15): => error! '', hexadecimal value 0x0F, is an invalid character. Line 5, position 7. 
Char(16): => error! '', hexadecimal value 0x10, is an invalid character. Line 5, position 7. 
Char(17): => error! '', hexadecimal value 0x11, is an invalid character. Line 5, position 7. 
Char(18): => error! '', hexadecimal value 0x12, is an invalid character. Line 5, position 7. 
Char(19): => error! '', hexadecimal value 0x13, is an invalid character. Line 5, position 7. 
Char(20): => error! '', hexadecimal value 0x14, is an invalid character. Line 5, position 7. 
Char(21): => error! '', hexadecimal value 0x15, is an invalid character. Line 5, position 7. 
Char(22): => error! '', hexadecimal value 0x16, is an invalid character. Line 5, position 7. 
Char(23): => error! '', hexadecimal value 0x17, is an invalid character. Line 5, position 7. 
Char(24): => error! '', hexadecimal value 0x18, is an invalid character. Line 5, position 7. 
Char(25): => error! '', hexadecimal value 0x19, is an invalid character. Line 5, position 7. 
Char(26): => error! '', hexadecimal value 0x1A, is an invalid character. Line 5, position 7. 
Char(27): => error! '', hexadecimal value 0x1B, is an invalid character. Line 5, position 7. 
Char(28): => error! '', hexadecimal value 0x1C, is an invalid character. Line 5, position 7. 
Char(29): => error! '', hexadecimal value 0x1D, is an invalid character. Line 5, position 7. 
Char(30): => error! '', hexadecimal value 0x1E, is an invalid character. Line 5, position 7. 
Char(31): => error! '', hexadecimal value 0x1F, is an invalid character. Line 5, position 7. 
Char(32): => fine! 
Char(33): ! => fine! 
Char(34): " => fine! 
Char(35): # => fine! 
Char(36): $ => fine! 
Char(37): % => fine! 
Char(38): & => error! An error occurred while parsing EntityName. Line 5, position 8.
Char(39): ' => fine! 
Char(40): ( => fine!
Char(41): ) => fine!
Char(42): * => fine! 
Char(43): + => fine! 
Char(44): , => fine! 
Char(45): - => fine! 
Char(46): . => fine! 
Char(47): / => fine!
Char(48): 0 => fine! 
Char(49): 1 => fine! 
Char(50): 2 => fine! 
Char(51): 3 => fine! 
Char(52): 4 => fine! 
Char(53): 5 => fine! 
Char(54): 6 => fine! 
Char(55): 7 => fine! 
Char(56): 8 => fine! 
Char(57): 9 => fine! 
Char(58): : => fine! 
Char(59): ; => fine! 
Char(60): < => error! The '<' character, hexadecimal value 0x3C, cannot be included in a name. Line 5, position 13.
Char(61): = => fine! 
Char(62): > => fine! 
Char(63): ? => fine! 
Char(64): @ => fine! 
Char(65): A => fine! 
Char(66): B => fine! 
Char(67): C => fine! 
Char(68): D => fine! 
Char(69): E => fine! 
Char(70): F => fine! 
Char(71): G => fine! 
Char(72): H => fine! 
Char(73): I => fine! 
Char(74): J => fine! 
Char(75): K => fine! 
Char(76): L => fine! 
Char(77): M => fine! 
Char(78): N => fine! 
Char(79): O => fine! 
Char(80): P => fine! 
Char(81): Q => fine! 
Char(82): R => fine! 
Char(83): S => fine! 
Char(84): T => fine! 
Char(85): U => fine! 
Char(86): V => fine! 
Char(87): W => fine! 
Char(88): X => fine! 
Char(89): Y => fine! 
Char(90): Z => fine! 
Char(91): [ => fine! 
Char(92): \ => fine! 
Char(93): ] => fine! 
Char(94): ^ => fine!
Char(95): _ => fine! 
Char(96): ` => fine! 
Char(97): a => fine! 
Char(98): b => fine! 
Char(99): c => fine! 
Char(100): d => fine! 
Char(101): e => fine! 
Char(102): f => fine! 
Char(103): g => fine! 
Char(104): h => fine! 
Char(105): i => fine! 
Char(106): j => fine! 
Char(107): k => fine! 
Char(108): l => fine! 
Char(109): m => fine! 
Char(110): n => fine! 
Char(111): o => fine! 
Char(112): p => fine! 
Char(113): q => fine! 
Char(114): r => fine! 
Char(115): s => fine! 
Char(116): t => fine! 
Char(117): u => fine! 
Char(118): v => fine! 
Char(119): w => fine! 
Char(120): x => fine! 
Char(121): y => fine! 
Char(122): z => fine! 
Char(123): { => fine! 
Char(124): | => fine! 
Char(125): } => fine! 
Char(126): ~ => fine! 
Char(127): => fine! 

As you can see, there are many symbols that cannot appear in XML. To remove them, we can use Regex.Replace:

static string ReplaceHexadecimalSymbols(string txt) 
{ 
    string r = "[\x00-\x08\x0B\x0C\x0E-\x1F\x26]"; 
    return Regex.Replace(txt, r,"",RegexOptions.Compiled); 
} 

P.S. Sorry if everyone already knows this.

+3

The bigger question is probably how they got into the XML document in the first place. – PMF

+5

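You should not use trial and error here. Consult the standard. My answer contains the relevant part. Trial and error is what led you to write a regular expression that strips every "&" character from an XML document. That will not end well! –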
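To make the point of that comment concrete, here is a small sketch of what happens when "&" is part of the character class. The class name `AmpersandDemo`, the `Strip` helper and the sample input are made up for illustration; the two patterns are the ones that appear in this thread, written as verbatim strings so the regex engine interprets the `\xNN` escapes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmpersandDemo
{
    // Pattern from the first answer: control characters plus \x26 ('&').
    public const string WithAmpersand = @"[\x00-\x08\x0B\x0C\x0E-\x1F\x26]";

    // Same pattern without \x26, so entity references are left alone.
    public const string ControlOnly = @"[\x00-\x08\x0B\x0C\x0E-\x1F]";

    // Hypothetical helper: removes every character matched by the pattern.
    public static string Strip(string text, string pattern)
    {
        return Regex.Replace(text, pattern, "");
    }

    public static void Main()
    {
        string xml = "<a>1 &lt; 2</a>";

        // The entity reference loses its '&' and the meaning changes.
        Console.WriteLine(Strip(xml, WithAmpersand)); // <a>1 lt; 2</a>

        // Without \x26 in the class the text is unchanged.
        Console.WriteLine(Strip(xml, ControlOnly));   // <a>1 &lt; 2</a>
    }
}
```

The document still parses after the first call, but it now says "1 lt; 2" instead of "1 < 2", which is the silent corruption the comment warns about.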

10

The XML specification defines the valid characters like this:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] 

As you can see, #x12 is not a valid character in an XML document.
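If you want to check input against that production yourself, something like the following would do it. This is only a sketch: `XmlCharFilter`, `IsXmlChar` and `IndexOfInvalidChar` are names I made up, and the ranges are copied directly from the Char rule quoted above. The framework's `XmlConvert.IsXmlChar` covers similar ground if you would rather not hand-roll it.

```csharp
using System;

public static class XmlCharFilter
{
    // True if the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsXmlChar(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first character that is not allowed,
    // or -1 if the whole string is acceptable.
    public static int IndexOfInvalidChar(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            // A well-formed surrogate pair always encodes a code point
            // in #x10000-#x10FFFF, so skip both halves.
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length
                && char.IsLowSurrogate(s[i + 1]))
            {
                i++;
                continue;
            }

            // Lone surrogates fall in #xD800-#xDFFF and are rejected here.
            if (!IsXmlChar(s[i]))
                return i;
        }
        return -1;
    }

    public static void Main()
    {
        Console.WriteLine(IndexOfInvalidChar("abc\u0012def")); // 3
        Console.WriteLine(IndexOfInvalidChar("tab\there"));    // -1
    }
}
```

Note that this only tells you where the offending character is so the document can be rejected with a useful message. It does not try to repair anything.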

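You ask how to remove them, but I don't think that is the question you should be asking. They simply should not be there. You should reject any such document as not well-formed. Simply removing the invalid characters only hides the real problem.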

If you are the one creating the document in question, then you need to fix the code that generates it so that it produces valid XML.
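One common way to do that, sketched here under the assumption that the document was being assembled by string concatenation, is to let an XML API write the text for you. `ValidXmlOutput` and `BuildNote` are illustrative names, and the `note` element is just an example.

```csharp
using System;
using System.Xml.Linq;

public static class ValidXmlOutput
{
    // Builds a one-element document; XElement escapes markup
    // characters in the text when it is serialized.
    public static string BuildNote(string text)
    {
        return new XElement("note", text).ToString();
    }

    public static void Main()
    {
        // The '&' is written as an entity reference, not dropped.
        Console.WriteLine(BuildNote("Tom & Jerry")); // <note>Tom &amp; Jerry</note>
    }
}
```

Reading the result back with `XElement.Parse` gives the original text, ampersand included. As far as I know, the writer refuses to serialize text containing characters such as #x12 by default, which is the behaviour you want: the bad data shows up at the point where it is produced.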

0

This is essentially a special case of this question. I suggest you use one of the answers there.

0

Just update these functions with the fix provided by jhon; they are the ones you need to change in your code. It will work for you, I have tested it.

private static void WriteDataTableToExcelWorksheet(DataTable dt, WorksheetPart worksheetPart) 
    { 
     var worksheet = worksheetPart.Worksheet; 
     var sheetData = worksheet.GetFirstChild<SheetData>(); 

     string cellValue = ""; 

     // Create a Header Row in our Excel file, containing one header for each Column of data in our DataTable. 
     // 
     // We'll also create an array, showing which type each column of data is (Text or Numeric), so when we come to write the actual 
     // cells of data, we'll know if to write Text values or Numeric cell values. 
     int numberOfColumns = dt.Columns.Count; 
     bool[] IsNumericColumn = new bool[numberOfColumns]; 

     string[] excelColumnNames = new string[numberOfColumns]; 
     for (int n = 0; n < numberOfColumns; n++) 
      excelColumnNames[n] = GetExcelColumnName(n); 

     // 
     // Create the Header row in our Excel Worksheet 
     // 
     uint rowIndex = 1; 

     var headerRow = new Row { RowIndex = rowIndex }; // add a row at the top of spreadsheet 
     sheetData.Append(headerRow); 

     for (int colInx = 0; colInx < numberOfColumns; colInx++) 
     { 
      DataColumn col = dt.Columns[colInx]; 
      AppendTextCell(excelColumnNames[colInx] + "1", col.ColumnName, headerRow); 
      IsNumericColumn[colInx] = (col.DataType.FullName == "System.Decimal") || (col.DataType.FullName == "System.Int32"); 
     } 

     // 
     // Now, step through each row of data in our DataTable... 
     // 
     double cellNumericValue = 0; 
     foreach (DataRow dr in dt.Rows) 
     { 
      // ...create a new row, and append a set of this row's data to it. 
      ++rowIndex; 
      var newExcelRow = new Row { RowIndex = rowIndex }; // append a new row for this DataRow
      sheetData.Append(newExcelRow); 

      for (int colInx = 0; colInx < numberOfColumns; colInx++) 
      { 
       cellValue = dr.ItemArray[colInx].ToString(); 

       // Create cell with data 
       if (IsNumericColumn[colInx]) 
       { 
        // For numeric cells, make sure our input data IS a number, then write it out to the Excel file. 
        // If this numeric value is NULL, then don't write anything to the Excel file. 
        cellNumericValue = 0; 
        if (double.TryParse(cellValue, out cellNumericValue)) 
        { 
         cellValue = ReplaceHexadecimalSymbols(cellNumericValue.ToString()); 
         AppendNumericCell(excelColumnNames[colInx] + rowIndex.ToString(), cellValue, newExcelRow); 
        } 
       } 
       else 
       { 
        // For text cells, just write the input data straight out to the Excel file. 
        AppendTextCell(excelColumnNames[colInx] + rowIndex.ToString(), cellValue, newExcelRow); 
       } 
      } 
     } 
    } 
    static string ReplaceHexadecimalSymbols(string txt) 
    { 
     string r = "[\x00-\x08\x0B\x0C\x0E-\x1F\x26]"; 
     return Regex.Replace(txt, r, "", RegexOptions.Compiled); 
    } 

    private static void AppendTextCell(string cellReference, string cellStringValue, Row excelRow) 
    { 
     // Add a new Excel Cell to our Row 
     Cell cell = new Cell() { CellReference = cellReference, DataType = CellValues.String }; 
     CellValue cellValue = new CellValue(); 
     cellValue.Text = ReplaceHexadecimalSymbols(cellStringValue); 
     cell.Append(cellValue); 
     excelRow.Append(cell); 
    } 

    private static void AppendNumericCell(string cellReference, string cellStringValue, Row excelRow) 
    { 
     // Add a new Excel Cell to our Row 
     Cell cell = new Cell() { CellReference = cellReference }; 
     CellValue cellValue = new CellValue(); 
     cellValue.Text = ReplaceHexadecimalSymbols(cellStringValue); 
     cell.Append(cellValue); 
     excelRow.Append(cell); 
    } 

Thanks, let me know if you need any further help.

6

I think x26 "&" is a valid character, and it can be deserialized by XML.

So to replace the illegal characters, we should use:

// Replace illegal characters in XML documents with an empty string
// See here for reference http://www.w3.org/TR/xml/#charsets
var r = "[\x00-\x08\x0B\x0C\x0E-\x1F]";
xml = Regex.Replace(xml, r, String.Empty, RegexOptions.Compiled);

0

The regular expression solution works quite fast, even on 100 MB XML documents.

The following expression string does the job.

"[\x00-\x08\x0B\x0C\x0E-\x1F]" 