2015-01-15

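Invalid token in XML string, cannot create element tree (Python)

I'm stuck on a problem that I haven't been able to solve using answers to other questions. I want to create an element tree from an XML string received over a socket, rather than from a file.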

Approach:

The Python script below is a socket client that receives a Python string (which happens to be XML) created by a C++ server using TinyXML.

Program steps:
1) Create the socket
2) Receive the XML string
3) Parse the XML into an element tree that can be used elsewhere
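A minimal Python 3 sketch of these three steps, assuming the framing that the Python 2 client below appears to use: the server sends "New" followed by the payload size in decimal, padded with NUL bytes; the client answers b"b"; the server then sends that many bytes of XML. The fixed 1024-byte header, the helper names recv_exactly and read_frame, and the value attribute are assumptions for illustration only. The main design point is that TCP is a byte stream, so a single recv(n) may return fewer than n bytes; looping until the full payload has arrived avoids handing a truncated document to the parser.

```python
import socket
import xml.etree.ElementTree as ET

HEADER_SIZE = 1024  # assumption: the header is a fixed 1024-byte, NUL-padded block


def recv_exactly(sock, n):
    """Read exactly n bytes; a single recv() call may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(sock):
    """Receive one 'New<size>' header plus its XML payload and parse it into a tree."""
    header = recv_exactly(sock, HEADER_SIZE)
    if not header.startswith(b"New"):
        raise ValueError("TCP synch has failed: unexpected header")
    size = int(header[3:].rstrip(b"\0").decode("ascii"))
    sock.sendall(b"b")  # tell the server we are ready for the payload
    payload = recv_exactly(sock, size)
    return ET.ElementTree(ET.fromstring(payload))


if __name__ == "__main__":
    # Simulate the C++ server with a connected socket pair instead of localhost:50008.
    client, server = socket.socketpair()
    body = b'<Frame><FrameNumber value="1509677" /></Frame>'
    server.sendall(("New%d" % len(body)).encode("ascii").ljust(HEADER_SIZE, b"\0"))
    server.sendall(body)
    tree = read_frame(client)
    print(tree.getroot().find("FrameNumber").get("value"))
    client.close()
    server.close()
```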

Problem:

The function fromstring() can't seem to handle it. Here is my code:

import socket 
import sys 
import struct 
import binascii 
import io 
import re 
from xml.etree import ElementTree 

#illegal characters to remove from string later before going to xml 
RE_XML_ILLEGAL = u'([\u0000-\u0008\u000b-\u000c\u000e-\u001f\ufffe-\uffff])' + \ 
      u'|' + \ 
      u'([%s-%s][^%s-%s])|([^%s-%s][%s-%s])|([%s-%s]$)|(^[%s-%s])' % \ 
       (unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff), 
       unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff), 
       unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff)) 

HOST = 'localhost' 
PORT = 50008 

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 
print 'Socket created' 
print 'Socket now connecting' 
s.connect((HOST,PORT)) 
s.send('1')#as long as we are not sending "0" cpp server will return information.   

#declare global xml object "root" 
global root 

while 1: 
    data = s.recv(1024)#receive the initial message 
    data3 = data[:3]#get first 3 letters 
    if (data3 == "New"): 
     #get ready for new packet 
     nextsizestring = data[3:] 
     nextsizestring2 = nextsizestring.rstrip('\0') 
     nextsize = int(nextsizestring2,10) 
     s.send('b')#tell cpp we are ready for the packet 

     databuf = s.recv(nextsize)#data buffer as a python string 
     databuf2 = re.sub(RE_XML_ILLEGAL, "?", databuf)#remove illegal xml characters 
     print(databuf2) 
     root = ElementTree.ElementTree(ElementTree.fromstring(databuf2))#convert to element tree 
     print(root) 

    elif (data3 != "New"): 
     print("WARNING! TCP SYNCH HAS FAILED") 
    if not data: break#if not data then stop listening for more 

    s.send('b')#keep sending anything but zero to get more stuff 
s.close() 
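For reference, a rough Python 3 equivalent of the RE_XML_ILLEGAL filter above. It is simplified (it omits the lone-surrogate alternatives), and the helper name clean_xml_text is hypothetical. The second half shows why a character filter like this could not have helped here: it replaces forbidden characters, but it cannot repair a syntax error such as an attribute with no name.

```python
import re
import xml.etree.ElementTree as ET

# Characters XML 1.0 forbids outright: C0 controls except tab, LF and CR,
# plus U+FFFE and U+FFFF. Lone surrogates are left out for brevity.
RE_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def clean_xml_text(text):
    """Replace each forbidden character with '?', like the original re.sub call."""
    return RE_XML_ILLEGAL.sub("?", text)


# A stray control character is repaired...
ok = ET.fromstring(clean_xml_text('<Frame>\x01<Time value="27427839" /></Frame>'))
print(repr(ok.text), ok.find("Time").get("value"))

# ...but a missing attribute name is a syntax error, not a bad character.
try:
    ET.fromstring(clean_xml_text('<Frame><Time ="27427839" /></Frame>'))
except ET.ParseError as err:
    print("still malformed:", err)
```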

And here is the output:

Socket created 
Socket now connecting 
<Frame> 
    <FrameNumber ="1509677" /> 
    <Time ="27427839" /> 
    <Forceplatedata> 
      <Forceplate_0> 
       <Subframe#_0> 
        <F_x ="0" /> 
        <F_y ="0" /> 
        <F_z ="0" /> 
       </Subframe#_0> 
. 
. 
. 
</Frame> 

Traceback (most recent call last): 
    File "<string>", line 11, in <module> 
    File "C:\Users\Gelsey Torres- Oviedo\Desktop\VizardFolderVRServer\Python2CPP_Client_rev1.py", line 50, in <module> 
    root = ElementTree.ElementTree(ElementTree.fromstring(databuf2)) 
    File "C:\Program Files (x86)\WorldViz\Vizard4\bin\lib\xml\etree\ElementTree.py", line  1282, in XML 
    parser.feed(text) 
    File "C:\Program Files (x86)\WorldViz\Vizard4\bin\lib\xml\etree\ElementTree.py", line 1624, in feed 
    self._raiseerror(v) 
    File "C:\Program Files (x86)\WorldViz\Vizard4\bin\lib\xml\etree\ElementTree.py", line 1488, in _raiseerror 
    raise err 
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 2, column 18 
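When the payload is long, it helps to print the exact spot the parser is complaining about. A minimal Python 3 sketch (the helper name show_parse_error is hypothetical) using the position attribute that ElementTree's ParseError carries, a (line, column) tuple; the column is treated here as expat's 0-based offset within the line.

```python
import xml.etree.ElementTree as ET


def show_parse_error(text):
    """Return None if text parses, else the offending line with a caret under the error."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        line_no, column = err.position  # 1-based line, expat's 0-based column offset
        bad_line = text.splitlines()[line_no - 1]
        return "line %d, column %d:\n%s\n%s^" % (line_no, column, bad_line, " " * column)


sample = '<Frame>\n    <FrameNumber ="1509677" />\n</Frame>'
print(show_parse_error(sample))
```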

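I took the liberty of truncating the output above, since it is a fairly long XML string. As you can see from the error, there seems to be a problem at line 2, column 18, which I think is a space character. I don't understand why this is happening.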
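The whitespace itself is legal: XML allows spaces between an element name and its attributes, and around the "=". What it does not allow is an attribute with no name. A small Python 3 check makes the difference visible (the attribute name value is only a placeholder, and parses is a hypothetical helper):

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """True if ElementTree accepts xml_text as a well-formed document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


cases = [
    '<FrameNumber ="1509677" />',             # attribute with no name
    '<FrameNumber value="1509677" />',        # named attribute
    '<FrameNumber   value = "1509677"   />',  # same, with extra whitespace
]
for xml_text in cases:
    print(parses(xml_text), xml_text)
```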

Solutions that failed:

1) Parsing through StringIO (passing in the string)
2) Encoding and decoding as UTF-8
3) Several variations of similar approaches using minidom
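These attempts fail for the same reason: they all hand the same bytes to an XML parser (expat, in CPython's standard library), so changing the entry point or the encoding cannot fix a document that is not well-formed. A Python 3 sketch comparing the routes on a malformed and a corrected snippet (try_all is a hypothetical helper, and value is a placeholder attribute name):

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.parsers.expat import ExpatError

BAD = b'<Frame><FrameNumber ="1509677" /></Frame>'
GOOD = b'<Frame><FrameNumber value="1509677" /></Frame>'


def try_all(data):
    """Feed the same bytes through several parsing routes and record the outcome of each."""
    routes = [
        ("fromstring", lambda d: ET.fromstring(d)),
        ("parse(BytesIO)", lambda d: ET.parse(io.BytesIO(d))),
        ("utf-8 round trip", lambda d: ET.fromstring(d.decode("utf-8").encode("utf-8"))),
        ("minidom.parseString", lambda d: minidom.parseString(d)),
    ]
    results = {}
    for name, parse in routes:
        try:
            parse(data)
            results[name] = "ok"
        except (ET.ParseError, ExpatError) as err:
            results[name] = "error: %s" % err
    return results


for name, outcome in try_all(BAD).items():
    print(name, "->", outcome)
```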

I'm guessing this is a syntax problem? I'm probably doing something really silly...

That... isn't how XML works, is it? Don't you need an attribute name in front of the '=', rather than just '<FrameNumber ="1509677" />'? – senshin

Good point, I hadn't thought of that. I used TinyXML's setattribute() to insert those numbers, but didn't notice that the syntax might be wrong. Let me check... – willpower2727
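The thread does not show the C++ code, so this is only a guess: output like <FrameNumber ="1509677" /> is what a serializer typically produces when an attribute is written with an empty name. Python's own ElementTree serializer does the same, since it does not validate attribute names, as this Python 3 sketch shows. The guard set_checked is hypothetical, and its name check is deliberately simplified rather than the full XML Name rule.

```python
import xml.etree.ElementTree as ET


def set_checked(elem, name, value):
    """Set an attribute, rejecting obviously invalid names (simplified, not the full XML Name rule)."""
    if not name or not (name[0].isalpha() or name[0] in "_:"):
        raise ValueError("invalid XML attribute name: %r" % name)
    elem.set(name, str(value))


# An empty attribute name is serialized as-is, producing XML that cannot be parsed back.
broken = ET.Element("FrameNumber")
broken.set("", "1509677")
print(ET.tostring(broken))

# With a real name, the output round-trips.
fixed = ET.Element("FrameNumber")
set_checked(fixed, "attribute", 1381949)
print(ET.tostring(fixed))
```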

Answer

What senshin said was the key issue: I was creating malformed XML.

By changing every place that looked like

<FrameNumber ="1381949" />

to

<FrameNumber attribute="1381949" />

the program can now create the element tree.
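With well-formed attributes, the values can be read back by name. A minimal Python 3 sketch, assuming the same attribute name was used on the other elements as well (the answer only shows FrameNumber, so the Time line here is illustrative). One more thing worth checking in the output from the question: "#" is not a legal character in XML names, so a tag such as <Subframe#_0> is also rejected by a conforming parser.

```python
import xml.etree.ElementTree as ET

# Corrected output, assuming the same fix was applied to every element
# (only FrameNumber is confirmed in the answer; Time is illustrative).
FIXED = """<Frame>
    <FrameNumber attribute="1381949" />
    <Time attribute="27427839" />
</Frame>"""

root = ET.fromstring(FIXED)
frame_number = int(root.find("FrameNumber").get("attribute"))
time_value = int(root.find("Time").get("attribute"))
print(frame_number, time_value)

# '#' is not allowed in XML names, so this tag is rejected as well.
try:
    ET.fromstring("<Subframe#_0></Subframe#_0>")
except ET.ParseError as err:
    print("invalid tag name:", err)
```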

I knew it was something simple, thanks!