What is the difference between unicodedata.digit and unicodedata.numeric?

From the unicodedata docs:

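unicodedata.digit(chr[, default]) Returns the digit value assigned to the character chr as integer. If no such value is defined, default is returned, or, if not given, ValueError is raised.

unicodedata.numeric(chr[, default]) Returns the numeric value assigned to the character chr as float. If no such value is defined, default is returned, or, if not given, ValueError is raised.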
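To make the "default is returned, or ValueError is raised" wording concrete, here is a minimal sketch of both call forms. The variable names are only illustrative, and '¼' is used because it has a numeric value but no digit value.

```python
import unicodedata

quarter = '¼'  # VULGAR FRACTION ONE QUARTER

# numeric() has a value for this character.
numeric_value = unicodedata.numeric(quarter)
print(numeric_value)  # 0.25

# digit() has no value for it, so the default (passed positionally) comes back.
digit_or_none = unicodedata.digit(quarter, None)
digit_or_minus_one = unicodedata.digit(quarter, -1)
print(digit_or_none, digit_or_minus_one)

# Without a default, the same lookup raises ValueError.
raised = False
try:
    unicodedata.digit(quarter)
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```

Passing None as the default is convenient when you want to test for "no value" without a try/except.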

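Could someone explain the difference between these two functions?

One can read the implementation of both functions, but the difference is not obvious to me from a quick look, because I am not familiar with the CPython implementation.

EDIT 1:

It would be nice to see an example that shows the difference.

EDIT 2:

Useful examples, taken from the comments and from the spectacular answer by @user2357112: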

import unicodedata

print(unicodedata.digit('1'))  # DIGIT ONE
print(unicodedata.digit('١'))  # ARABIC-INDIC DIGIT ONE
try:
    print(unicodedata.digit('¼'))  # Not a digit, so this raises ValueError.
except ValueError as exc:
    print('ValueError:', exc)

print(unicodedata.numeric('Ⅱ'))  # ROMAN NUMERAL TWO
print(unicodedata.numeric('¼'))  # VULGAR FRACTION ONE QUARTER


2017-08-28 gsi-frank

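I believe digit also applies to digit characters other than the Arabic numerals, such as DEVANAGARI DIGIT ONE and so on.

@cᴏʟᴅsᴘᴇᴇᴅ Could you give an example to illustrate that?

Judging by the return types and the descriptions, digit is for actual digits, while numeric can also handle vulgar fractions (e.g. ¾). That is inferred from the docs. – weirdan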

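Short answer:

If a character represents a decimal digit, so things like 1, ¹ (SUPERSCRIPT ONE), ① (CIRCLED DIGIT ONE), or ١ (ARABIC-INDIC DIGIT ONE), unicodedata.digit will return the digit the character represents as an int (so 1 for all of these examples).

If the character represents any numeric value, so things like ⅐ (VULGAR FRACTION ONE SEVENTH) and all of the decimal digit examples, unicodedata.numeric will return that character's numeric value as a float.

For technical reasons, more recent digit characters such as 🄌 (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO) may raise a ValueError from unicodedata.digit.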
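As a quick check of the short answer, the sketch below runs both functions over the characters mentioned. The helper name describe is made up for this example; it passes None as the default so that "no value" shows up as None instead of an exception.

```python
import unicodedata

def describe(ch):
    """Return (digit, numeric) for ch, using None where no value is defined."""
    return unicodedata.digit(ch, None), unicodedata.numeric(ch, None)

# Four "digit one" characters, one fraction, and one non-numeric letter.
samples = ['1', '¹', '①', '١', '⅐', 'a']

for ch in samples:
    name = unicodedata.name(ch, '<unnamed>')
    print(f'{name}: {describe(ch)}')
```

The four digit characters should give a digit of 1 and a numeric value of 1.0, the fraction only a numeric value, and the letter neither.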

Long answer:

Unicode characters all have a Numeric_Type property. This property can have 4 possible values: Numeric_Type=Decimal, Numeric_Type=Digit, Numeric_Type=Numeric, or Numeric_Type=None.

Quoting the Unicode standard, version 10.0.0, section 4.6,

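"The Numeric_Type=Decimal property value (which is correlated with the General_Category=Nd property value) is limited to those numeric characters that are used in decimal-radix numbers and for which a full set of digits has been encoded in a contiguous range, with ascending order of Numeric_Value, and with the digit zero as the first code point in the range."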

Numeric_Type=Decimal characters are thus decimal digits that also meet a few other specific technical requirements.

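"Decimal digits, as defined in the Unicode Standard by these property assignments, exclude some characters, such as the CJK ideographic digits (see the first ten entries in Table 4-5), which are not encoded in a contiguous sequence. Decimal digits also exclude the compatibility subscript and superscript digits to prevent simplistic parsers from misinterpreting their values in context. (For more information on superscripts and subscripts, see Section 22.4, Superscript and Subscript Symbols.) Traditionally, the Unicode Character Database has given these sets of noncontiguous or compatibility digits the value Numeric_Type=Digit, to recognize the fact that they consist of digit values but do not necessarily meet all the criteria for Numeric_Type=Decimal. However, the distinction between Numeric_Type=Digit and the more generic Numeric_Type=Numeric has proven not to be useful in implementations. As a result, future sets of digits which may be added to the standard and which do not meet the criteria for Numeric_Type=Decimal will simply be assigned the value Numeric_Type=Numeric."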
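The exclusions described in the quote can be observed from Python. This is only a sketch: it also calls unicodedata.decimal, which is not discussed above but is the standard-library lookup for the decimal value, and the helper name values is made up for the example.

```python
import unicodedata

def values(ch):
    """Return (decimal, digit, numeric) for ch, with None for undefined values."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

print(values('2'))   # DIGIT TWO: all three lookups succeed
print(values('²'))   # SUPERSCRIPT TWO: no decimal value, but a digit value
print(values('三'))  # CJK ideograph for three: only a numeric value
```

Note that in CPython's database the CJK ideograph only has a numeric value, so unicodedata.digit does not accept it even though it is informally a digit.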

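So Numeric_Type=Digit was historically used for other digits that do not meet the technical requirements of Numeric_Type=Decimal, but the Unicode maintainers decided the distinction was not useful, and since Unicode 6.3.0, digit characters that do not meet the Numeric_Type=Decimal requirements have simply been assigned Numeric_Type=Numeric. For example, 🄌 (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO), introduced in Unicode 7.0, has Numeric_Type=Numeric.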
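To see how your own interpreter treats this character, something like the following should work. The result depends on the Unicode database version bundled with Python (reported by unicodedata.unidata_version), so no particular output is claimed here; the code point U+1F10C is written as an escape to avoid font problems.

```python
import unicodedata

ch = '\U0001F10C'  # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO

# Version of the Unicode Character Database this interpreter ships with.
print('Unicode database version:', unicodedata.unidata_version)

# Older databases may not know the character at all, hence the fallback name.
print('name:', unicodedata.name(ch, '<not in this database>'))

# With a default of None, neither call raises; None means "no such value".
dingbat_digit = unicodedata.digit(ch, None)
dingbat_numeric = unicodedata.numeric(ch, None)
print('digit:', dingbat_digit)
print('numeric:', dingbat_numeric)
```

If the character is Numeric_Type=Numeric as described, digit should print None while numeric prints a float.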

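Numeric_Type=Numeric is for all characters that represent numbers and do not fit in the other categories, and Numeric_Type=None is for characters that do not represent numbers (or at least, do not under normal usage).

All characters with a non-None Numeric_Type property have a Numeric_Value property representing their numeric value. unicodedata.digit will return that value as an int for characters with Numeric_Type=Decimal or Numeric_Type=Digit, and unicodedata.numeric will return that value as a float for characters with any non-None Numeric_Type.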
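Following that mapping, a character's Numeric_Type can be approximated from which lookups succeed. This is a rough sketch, not an official API: the function name numeric_type is invented here, and it assumes that unicodedata.decimal succeeds only for Numeric_Type=Decimal characters.

```python
import unicodedata

def numeric_type(ch):
    """Guess a character's Numeric_Type from which unicodedata lookups succeed."""
    if unicodedata.decimal(ch, None) is not None:
        return 'Decimal'
    if unicodedata.digit(ch, None) is not None:
        return 'Digit'
    if unicodedata.numeric(ch, None) is not None:
        return 'Numeric'
    return 'None'

for ch in ['1', '¹', '¼', 'Ⅱ', 'a']:
    print(unicodedata.name(ch), '->', numeric_type(ch))
```

The checks run from the most specific lookup to the most general, since a character with a decimal value also has a digit value and a numeric value.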


2017-08-28 17:11:18 user2357112

Perfect explanation and examples!
