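C#: hexadecimal value 0x12, is an invalid character
I am loading a lot of XML documents, and some of them fail with errors such as "hexadecimal value 0x12, is an invalid character". The offending character differs from file to file. How can I remove these characters?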
18
Answers
45
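I did a little research on this.
Take the ASCII table, which has 128 symbols. Below is a small test program that substitutes each ASCII character in turn for a placeholder in a document and tries to load the result as XML.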
// Requires: using System; using System.IO; using System.Xml;
// Substitutes each of the 128 ASCII characters for the placeholder 'П'
// in test.xml and reports whether the result still parses as XML.
public static void RegexTry()
{
    string xmlfile = File.ReadAllText(@"test.xml");
    for (int i = 0; i < 128; i++)
    {
        char t = (char)i;
        string text = xmlfile.Replace('П', t);
        XmlDocument xml = new XmlDocument();
        try
        {
            xml.LoadXml(text);
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Char(" + i + "): " + t + " => error! " + ex.Message);
            continue;
        }
        Console.WriteLine("Char(" + i + "): " + t + " => fine!");
    }
    Console.ReadKey();
}
It produces the following output:
Char(0): => error! '.', hexadecimal value 0x00, is an invalid character. Line 5, position 7.
Char(1): => error! '', hexadecimal value 0x01, is an invalid character. Line 5, position 7.
Char(2): => error! '', hexadecimal value 0x02, is an invalid character. Line 5, position 7.
Char(3): => error! '', hexadecimal value 0x03, is an invalid character. Line 5, position 7.
Char(4): => error! '', hexadecimal value 0x04, is an invalid character. Line 5, position 7.
Char(5): => error! '', hexadecimal value 0x05, is an invalid character. Line 5, position 7.
Char(6): => error! '', hexadecimal value 0x06, is an invalid character. Line 5, position 7.
Char(7): => error! '', hexadecimal value 0x07, is an invalid character. Line 5, position 7.
Char(8): => error! '', hexadecimal value 0x08, is an invalid character. Line 5, position 7.
Char(9): => fine!
Char(10):
=> fine!
Char(11): => error! '', hexadecimal value 0x0B, is an invalid character. Line 5, position 7.
Char(12): => error! '', hexadecimal value 0x0C, is an invalid character. Line 5, position 7.
Char(13):
=> fine!
Char(14): => error! '', hexadecimal value 0x0E, is an invalid character. Line 5, position 7.
Char(15): => error! '', hexadecimal value 0x0F, is an invalid character. Line 5, position 7.
Char(16): => error! '', hexadecimal value 0x10, is an invalid character. Line 5, position 7.
Char(17): => error! '', hexadecimal value 0x11, is an invalid character. Line 5, position 7.
Char(18): => error! '', hexadecimal value 0x12, is an invalid character. Line 5, position 7.
Char(19): => error! '', hexadecimal value 0x13, is an invalid character. Line 5, position 7.
Char(20): => error! '', hexadecimal value 0x14, is an invalid character. Line 5, position 7.
Char(21): => error! '', hexadecimal value 0x15, is an invalid character. Line 5, position 7.
Char(22): => error! '', hexadecimal value 0x16, is an invalid character. Line 5, position 7.
Char(23): => error! '', hexadecimal value 0x17, is an invalid character. Line 5, position 7.
Char(24): => error! '', hexadecimal value 0x18, is an invalid character. Line 5, position 7.
Char(25): => error! '', hexadecimal value 0x19, is an invalid character. Line 5, position 7.
Char(26): => error! '', hexadecimal value 0x1A, is an invalid character. Line 5, position 7.
Char(27): => error! '', hexadecimal value 0x1B, is an invalid character. Line 5, position 7.
Char(28): => error! '', hexadecimal value 0x1C, is an invalid character. Line 5, position 7.
Char(29): => error! '', hexadecimal value 0x1D, is an invalid character. Line 5, position 7.
Char(30): => error! '', hexadecimal value 0x1E, is an invalid character. Line 5, position 7.
Char(31): => error! '', hexadecimal value 0x1F, is an invalid character. Line 5, position 7.
Char(32): => fine!
Char(33): ! => fine!
Char(34): " => fine!
Char(35): # => fine!
Char(36): $ => fine!
Char(37): % => fine!
Char(38): & => error! An error occurred while parsing EntityName. Line 5, position 8.
Char(39): ' => fine!
Char(40): ( => fine!
Char(41): ) => fine!
Char(42): * => fine!
Char(43): + => fine!
Char(44): , => fine!
Char(45): - => fine!
Char(46): . => fine!
Char(47): / => fine!
Char(48): 0 => fine!
Char(49): 1 => fine!
Char(50): 2 => fine!
Char(51): 3 => fine!
Char(52): 4 => fine!
Char(53): 5 => fine!
Char(54): 6 => fine!
Char(55): 7 => fine!
Char(56): 8 => fine!
Char(57): 9 => fine!
Char(58): : => fine!
Char(59): ; => fine!
Char(60): < => error! The '<' character, hexadecimal value 0x3C, cannot be included in a name. Line 5, position 13.
Char(61): = => fine!
Char(62): > => fine!
Char(63): ? => fine!
Char(64): @ => fine!
Char(65): A => fine!
Char(66): B => fine!
Char(67): C => fine!
Char(68): D => fine!
Char(69): E => fine!
Char(70): F => fine!
Char(71): G => fine!
Char(72): H => fine!
Char(73): I => fine!
Char(74): J => fine!
Char(75): K => fine!
Char(76): L => fine!
Char(77): M => fine!
Char(78): N => fine!
Char(79): O => fine!
Char(80): P => fine!
Char(81): Q => fine!
Char(82): R => fine!
Char(83): S => fine!
Char(84): T => fine!
Char(85): U => fine!
Char(86): V => fine!
Char(87): W => fine!
Char(88): X => fine!
Char(89): Y => fine!
Char(90): Z => fine!
Char(91): [ => fine!
Char(92): \ => fine!
Char(93): ] => fine!
Char(94): ^ => fine!
Char(95): _ => fine!
Char(96): ` => fine!
Char(97): a => fine!
Char(98): b => fine!
Char(99): c => fine!
Char(100): d => fine!
Char(101): e => fine!
Char(102): f => fine!
Char(103): g => fine!
Char(104): h => fine!
Char(105): i => fine!
Char(106): j => fine!
Char(107): k => fine!
Char(108): l => fine!
Char(109): m => fine!
Char(110): n => fine!
Char(111): o => fine!
Char(112): p => fine!
Char(113): q => fine!
Char(114): r => fine!
Char(115): s => fine!
Char(116): t => fine!
Char(117): u => fine!
Char(118): v => fine!
Char(119): w => fine!
Char(120): x => fine!
Char(121): y => fine!
Char(122): z => fine!
Char(123): { => fine!
Char(124): | => fine!
Char(125): } => fine!
Char(126): ~ => fine!
Char(127): => fine!
As you can see, many characters cannot appear in XML text. To strip them out, we can use Regex.Replace:
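// Requires: using System.Text.RegularExpressions;
static string ReplaceHexadecimalSymbols(string txt)
{
    // Verbatim string, so the regex engine (not the C# compiler) reads the \x escapes.
    // Note: \x26 is '&', so this also strips the ampersand from valid
    // entity references such as &amp;.
    string r = @"[\x00-\x08\x0B\x0C\x0E-\x1F\x26]";
    return Regex.Replace(txt, r, "", RegexOptions.Compiled);
}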
P.S. Sorry if everyone already knows this.
10
The XML specification defines the valid characters like this:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
As you can see, #x12 is not a valid character in an XML document.
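The production quoted above can be transcribed into code almost literally. The following is a minimal sketch, not part of the original answer; the class and method names (XmlCharCheck, IsValidXmlCodePoint, IndexOfFirstInvalidChar) are hypothetical and chosen only for illustration.

```csharp
using System;

public static class XmlCharCheck
{
    // Direct transcription of the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsValidXmlCodePoint(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first character that the production
    // does not allow, or -1 if every character is allowed.
    public static int IndexOfFirstInvalidChar(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            int cp = text[i];
            bool isPair = char.IsHighSurrogate(text[i])
                && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]);
            if (isPair)
            {
                // Combine the UTF-16 pair into one code point above #xFFFF.
                cp = char.ConvertToUtf32(text[i], text[i + 1]);
            }
            if (!IsValidXmlCodePoint(cp))
            {
                return i; // lone surrogates and control characters end up here
            }
            if (isPair)
            {
                i++; // skip the low surrogate that was just consumed
            }
        }
        return -1;
    }
}
```

This only checks individual characters. It says nothing about well-formedness, so a bare "&" or "<" passes the check even though the parser would still reject it in the wrong place.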
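You ask how to remove them, but I don't think that is the question you should be asking. They should not be there at all. You should reject any such file as malformed. Simply removing the invalid characters only hides the real problem.
If you are the one creating the documents in question, then you need to fix the code that generates them so that it produces valid XML.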
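As an illustration of fixing the generating side: if the documents are currently assembled by string concatenation, building them through an XML API lets the library do the escaping. This is a minimal sketch under that assumption; the NoteBuilder class and the "note" element name are hypothetical.

```csharp
using System;
using System.Xml.Linq;

public static class NoteBuilder
{
    // Builds a single element through LINQ to XML instead of concatenating
    // strings, so '&' and '<' in the text are escaped by the library.
    public static string BuildNote(string text)
    {
        var element = new XElement("note", text);
        return element.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, NoteBuilder.BuildNote("Tom & Jerry <3") returns `<note>Tom &amp; Jerry &lt;3</note>`, which loads without the EntityName error.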
0
This is essentially a special case of this question. I suggest you use one of the answers there.
0
Just update the functions below with the fix provided by jhon; they have to be updated in your own code. It will work for you, I have tested it.
private static void WriteDataTableToExcelWorksheet(DataTable dt, WorksheetPart worksheetPart)
{
    var worksheet = worksheetPart.Worksheet;
    var sheetData = worksheet.GetFirstChild<SheetData>();
    string cellValue = "";

    // Create a Header Row in our Excel file, containing one header for each Column of data in our DataTable.
    //
    // We'll also create an array, showing which type each column of data is (Text or Numeric), so when we come to write the actual
    // cells of data, we'll know if to write Text values or Numeric cell values.
    int numberOfColumns = dt.Columns.Count;
    bool[] IsNumericColumn = new bool[numberOfColumns];
    string[] excelColumnNames = new string[numberOfColumns];
    for (int n = 0; n < numberOfColumns; n++)
        excelColumnNames[n] = GetExcelColumnName(n);

    //
    // Create the Header row in our Excel Worksheet
    //
    uint rowIndex = 1;
    var headerRow = new Row { RowIndex = rowIndex }; // add a row at the top of spreadsheet
    sheetData.Append(headerRow);
    for (int colInx = 0; colInx < numberOfColumns; colInx++)
    {
        DataColumn col = dt.Columns[colInx];
        AppendTextCell(excelColumnNames[colInx] + "1", col.ColumnName, headerRow);
        IsNumericColumn[colInx] = (col.DataType.FullName == "System.Decimal") || (col.DataType.FullName == "System.Int32");
    }

    //
    // Now, step through each row of data in our DataTable...
    //
    double cellNumericValue = 0;
    foreach (DataRow dr in dt.Rows)
    {
        // ...create a new row, and append a set of this row's data to it.
        ++rowIndex;
        var newExcelRow = new Row { RowIndex = rowIndex }; // add the next data row below the previous one
        sheetData.Append(newExcelRow);
        for (int colInx = 0; colInx < numberOfColumns; colInx++)
        {
            cellValue = dr.ItemArray[colInx].ToString();
            // Create cell with data
            if (IsNumericColumn[colInx])
            {
                // For numeric cells, make sure our input data IS a number, then write it out to the Excel file.
                // If this numeric value is NULL, then don't write anything to the Excel file.
                cellNumericValue = 0;
                if (double.TryParse(cellValue, out cellNumericValue))
                {
                    cellValue = ReplaceHexadecimalSymbols(cellNumericValue.ToString());
                    AppendNumericCell(excelColumnNames[colInx] + rowIndex.ToString(), cellValue, newExcelRow);
                }
            }
            else
            {
                // For text cells, write the input data out to the Excel file (AppendTextCell strips the invalid characters).
                AppendTextCell(excelColumnNames[colInx] + rowIndex.ToString(), cellValue, newExcelRow);
            }
        }
    }
}
static string ReplaceHexadecimalSymbols(string txt)
{
    // Verbatim string, so the regex engine (not the C# compiler) reads the \x escapes.
    string r = @"[\x00-\x08\x0B\x0C\x0E-\x1F\x26]";
    return Regex.Replace(txt, r, "", RegexOptions.Compiled);
}
private static void AppendTextCell(string cellReference, string cellStringValue, Row excelRow)
{
    // Add a new Excel Cell to our Row
    Cell cell = new Cell() { CellReference = cellReference, DataType = CellValues.String };
    CellValue cellValue = new CellValue();
    cellValue.Text = ReplaceHexadecimalSymbols(cellStringValue);
    cell.Append(cellValue);
    excelRow.Append(cell);
}
private static void AppendNumericCell(string cellReference, string cellStringValue, Row excelRow)
{
    // Add a new Excel Cell to our Row
    Cell cell = new Cell() { CellReference = cellReference };
    CellValue cellValue = new CellValue();
    cellValue.Text = ReplaceHexadecimalSymbols(cellStringValue);
    cell.Append(cellValue);
    excelRow.Append(cell);
}
Thanks, and let me know if you need further help.
6
I think x26 ("&") is a valid character, and XML that contains it can be deserialized.
So to replace the illegal characters, we should use:
// Replace illegal characters in XML documents with nothing
// See here for reference http://www.w3.org/TR/xml/#charsets
var regex = @"[\x00-\x08\x0B\x0C\x0E-\x1F]";
xml = Regex.Replace(xml, regex, String.Empty, RegexOptions.Compiled);
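To show the snippet above in context, here is a small self-contained sketch that strips the control characters and then loads the result. The XmlCleaner class and LoadLenient method are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlCleaner
{
    // Same character class as above: C0 control characters other than
    // tab (0x09), line feed (0x0A) and carriage return (0x0D).
    private static readonly Regex InvalidControlChars =
        new Regex(@"[\x00-\x08\x0B\x0C\x0E-\x1F]", RegexOptions.Compiled);

    // Removes the invalid control characters, then parses the remainder.
    public static XmlDocument LoadLenient(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(InvalidControlChars.Replace(xml, string.Empty));
        return doc;
    }
}
```

With this, XmlCleaner.LoadLenient("<a>x\u0012y &amp; z</a>").DocumentElement.InnerText yields "xy & z": the 0x12 character is dropped while the entity reference is kept intact.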
0
The regex solution works quite fast, even on 100 MB XML documents.
The following expression string does the job.
"[\x00-\x08\x0B\x0C\x0E-\x1F]"
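If holding a document of that size in memory as one string is a concern, the same character class can be applied while streaming instead. This is only a sketch of that idea and is not from the thread; XmlControlCharFilterReader is a hypothetical wrapper, and it has not been measured against the regex approach.

```csharp
using System;
using System.IO;

// Wraps another TextReader and drops the characters matched by
// [\x00-\x08\x0B\x0C\x0E-\x1F] as they are read.
public sealed class XmlControlCharFilterReader : TextReader
{
    private readonly TextReader _inner;

    public XmlControlCharFilterReader(TextReader inner)
    {
        _inner = inner;
    }

    private static bool IsStripped(char c)
    {
        return c < 0x20 && c != '\t' && c != '\n' && c != '\r';
    }

    public override int Read(char[] buffer, int index, int count)
    {
        while (true)
        {
            int read = _inner.Read(buffer, index, count);
            if (read <= 0) return 0; // end of input
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[index + i];
                if (!IsStripped(c)) buffer[index + kept++] = c;
            }
            // If the whole chunk was filtered out, read again rather than
            // returning 0, which callers would treat as end of input.
            if (kept > 0) return kept;
        }
    }

    public override int Read()
    {
        int c;
        while ((c = _inner.Read()) != -1 && IsStripped((char)c)) { }
        return c;
    }

    public override int Peek()
    {
        int c;
        while ((c = _inner.Peek()) != -1 && IsStripped((char)c)) _inner.Read();
        return c;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

A reader built this way can be passed to XmlReader.Create in place of the original StreamReader, so the parser never sees the stripped characters.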
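The biggest question is probably how they got into the XML documents in the first place. – PMF
You shouldn't use trial and error here. Refer to the standard. My answer contains the relevant part. Trial and error is what led you to write a regex that removes every "&" character from the XML document. That will not end well! –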