Extract from a list

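I am working on a linguistics project (the language is Malayalam), and I need to extract the integer and the Unicode part from each item of a list.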

My list is:

x= [u'1\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d', u'5\u0d05\u0d35\u0d28\u0d4d\u200d']

I want to extract the integer and the Unicode part from each item in the list.

The expected output is:

1 \u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d
5 \u0d05\u0d35\u0d28\u0d4d\u200d

First I tried converting the first item, x[0], to ASCII:

import unicodedata
print unicodedata.normalize('NFKD', x[0]).encode('ascii', 'ignore')

The output is 1.

I think this happens because the Unicode text in the list is Malayalam.
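That guess is close. A more precise explanation, sketched here in Python 3 (the question itself uses Python 2): NFKD normalization only helps when a character decomposes into an ASCII base letter plus combining marks, as `é` does. None of the Malayalam code points in this string, nor the zero-width joiner U+200D, has such a decomposition, so `encode('ascii', 'ignore')` silently drops all of them and only the digit survives.

```python
import unicodedata

word = '1\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d'

# NFKD leaves the Malayalam letters and U+200D unchanged, because none of
# them decomposes into ASCII characters.
normalized = unicodedata.normalize('NFKD', word)
print(normalized == word)  # True

# 'ignore' then discards every non-ASCII character, leaving only the digit.
print(normalized.encode('ascii', 'ignore'))  # b'1'

# For comparison: an accented Latin letter decomposes to 'e' + U+0301,
# so the ASCII base letter survives.
print(unicodedata.normalize('NFKD', '\u00e9').encode('ascii', 'ignore'))  # b'e'
```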

Then I tried to find where "\u" occurs, like this:

x[0].index("\u")

I get an error for the first index.
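The `\u` sequences are not actually characters in the string; they are only how the interpreter displays non-ASCII characters, so searching for a backslash fails with `ValueError: substring not found`. A minimal Python 3 sketch of this (in Python 3 the literal `"\u"` is even a syntax error, so a literal backslash-u has to be written `"\\u"`):

```python
word = '1\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d'

# Seven characters: one digit plus six code points, and no backslashes.
print(len(word))        # 7
print('\\u' in word)    # False

try:
    word.index('\\u')
except ValueError as exc:
    print('ValueError:', exc)   # ValueError: substring not found

# Backslash escapes only exist after converting the string to an escaped
# text form, for example with the unicode_escape codec.
escaped = word.encode('unicode_escape').decode('ascii')
print(escaped)               # 1\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d
print(escaped.index('\\u'))  # 1
```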


2014-02-25 user3251664

See here for more information on Python's repr function: http://stackoverflow.com/questions/7784148/understanding-repr-function-in-python – jayelm

\uXXXX represents a single Unicode character, not a sequence of characters in the string.
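To illustrate (a Python 3 sketch using the standard `unicodedata` module), each escape in the second list item is one character with one code point and a Unicode name:

```python
import unicodedata

word = '5\u0d05\u0d35\u0d28\u0d4d\u200d'

# Each \uXXXX escape in the source code is a single character in the string.
print(len(word))  # 6
for ch in word[1:]:
    # Code point in U+XXXX form, followed by the character's Unicode name.
    print(f'U+{ord(ch):04X}', unicodedata.name(ch))
```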

You can get the desired output as follows:

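for i in x:
    # i[0] is the digit; in Python 2, repr(i[1:]) gives u'\u0d30...', and [2:-1] strips the u' prefix and closing quote
    print int(i[0]), repr(i[1:])[2:-1]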

(This assumes the integer has only one digit.)
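For readers on Python 3, a minimal sketch of the same single-digit approach. It assumes Python 3 semantics: there `repr()` has no `u''` prefix and leaves printable Malayalam letters unescaped, so the built-in `ascii()` is used instead to get the `\uXXXX` form.

```python
x = ['1\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d', '5\u0d05\u0d35\u0d28\u0d4d\u200d']

for item in x:
    # Assumes the number is exactly one leading digit, as in the answer above.
    number = int(item[0])
    # ascii() escapes every non-ASCII character as \uXXXX; [1:-1] strips the
    # surrounding quotes (Python 3 has no u'' prefix to remove).
    print(number, ascii(item[1:])[1:-1])
```

This prints `1 \u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d` and `5 \u0d05\u0d35\u0d28\u0d4d\u200d`, matching the expected output in the question.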

For the more general case, one solution is to use a regular expression to extract the integer:

import re 
for i in x: 
    s = re.match('([0-9]+)', i).group(1) 
    print int(s), repr(i[len(s):])[2:-1]
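A Python 3 sketch of the same regular-expression idea (the thread's code is Python 2). The helper name `split_number` and the multi-digit sample `'12\u0d05...'` are hypothetical, added only for illustration. The pattern keeps `[0-9]` as the answer does; this matters in Python 3, where `\d` on a `str` pattern also matches non-ASCII digits such as the Malayalam digits U+0D66 to U+0D6F.

```python
import re

def split_number(item):
    """Hypothetical helper: split a leading ASCII integer from the rest."""
    m = re.match(r'[0-9]+', item)
    if m is None:
        raise ValueError(f'no leading integer in {item!r}')
    # m.end() is the index just past the digits.
    return int(m.group()), item[m.end():]

# The second item uses a made-up two-digit number to show the general case.
x = ['1\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d', '12\u0d05\u0d35\u0d28\u0d4d\u200d']
for item in x:
    number, rest = split_number(item)
    # ascii() gives the \uXXXX form; [1:-1] drops the quotes.
    print(number, ascii(rest)[1:-1])
```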


2014-02-25 06:29:44 isedev

>>> x= [u'1\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d', u'5\u0d05\u0d35\u0d28\u0d4d\u200d'] 
>>> res = [ (i[:1], i[1:]) for i in x ] 
>>> res 
[(u'1', u'\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d'), (u'5', u'\u0d05\u0d35\u0d28\u0d4d\u200d')] 

>>> for i in res: 
...  print i[0], repr(i[1]) 
... 
1 u'\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d' 
5 u'\u0d05\u0d35\u0d28\u0d4d\u200d'


2014-02-25 06:47:51

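The interpreter's display of res is the output the OP wants, but it's not what you get when you print it. You need the repr function to get the object's representation. – jayelm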
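The same distinction exists in Python 3, with one difference worth knowing: Python 3's `repr()` only escapes non-printable characters, so the Malayalam letters come back unescaped and only U+200D (a format character) is shown as `\u200d`. The built-in `ascii()` escapes everything outside ASCII. A small Python 3 sketch:

```python
word = '\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d'

# repr() keeps printable characters as they are: the Malayalam letters and
# the virama are printable, so only the zero-width joiner is escaped.
r = repr(word)
print(r == "'" + word[:-1] + "\\u200d'")  # True

# ascii() escapes every non-ASCII character, giving the \uXXXX form.
print(ascii(word))  # '\u0d30\u0d3e\u0d2e\u0d28\u0d4d\u200d'
```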

Yes, I got it: if we use print, it prints the actual Unicode object, so we need the repr function for this. Thanks :) –

The code above only works for single-digit numbers. – user3251664
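One way to lift that single-digit restriction while keeping the slicing approach, as an alternative to the regular-expression answer and not code from this thread, is to count the leading ASCII digits with `str.lstrip`. A Python 3 sketch; the helper name `split_leading_int` and the two-digit sample `'25\u0d05\u0d35'` are hypothetical:

```python
DIGITS = '0123456789'  # ASCII digits only

def split_leading_int(item):
    """Hypothetical helper: return (int, rest) for a string starting with digits."""
    rest = item.lstrip(DIGITS)                # strip only the leading digits
    digits = item[:len(item) - len(rest)]     # the part that was stripped
    if not digits:
        raise ValueError(f'no leading integer in {item!r}')
    return int(digits), rest

res = [split_leading_int(i) for i in ['1\u0d30\u0d3e', '25\u0d05\u0d35']]
for number, rest in res:
    print(number, ascii(rest)[1:-1])
```

`lstrip` only removes characters from the left, so digits later in the string are untouched.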

