2015-10-23 31 views
5

對於自項目,我想這樣做:試圖複製Python的字符串實習的功能對於非字符串

class Species(object): # immutable. 
    def __init__(self, id): 
     # ... (using id to obtain height and other data from file) 
    def height(self): 
     # ... 

class Animal(object): # mutable. 

    def __init__(self, nickname, species_id): 
     self.nickname = nickname 
     self.species = Species(id) 
    def height(self): 
     return self.species.height() 

正如你所看到的,我並不真正需要的多個實例物種(ID)每個ID,但我會創建一個每次我創建一個動物對象與該ID,我可能需要多個呼叫,如Animal(somename, 3)

爲了解決這個問題,我想要做的就是讓一個類,以便爲它的兩個實例,讓我們假設A和B,以下是總是正確的:

(a == b) == (a is b) 

這事Python使用字符串文字,並且被稱爲實習。例如:

a = "hello" 
b = "hello" 
print(a is b) 

打印將產生真實(只要字符串足夠短,如果我們直接使用python shell)。

我只能猜測CPython如何做到這一點(它可能涉及一些C魔術),所以我正在做我自己的版本。到目前爲止,我已經有了:

class MyClass(object): 

    myHash = {} # This replicates the intern pool. 

    def __new__(cls, n): # The default new method returns a new instance 
     if n in MyClass.myHash: 
      return MyClass.myHash[n] 

     self = super(MyClass, cls).__new__(cls) 
     self.__init(n) 
     MyClass.myHash[n] = self 

     return self 

    # as pointed out on an answer, it's better to avoid initializating the instance 
    # with __init__, as that one's called even when returning an old instance. 
    def __init(self, n): 
     self.n = n 

a = MyClass(2) 
b = MyClass(2) 

print a is b # <<< True 

我的問題是:

一)是我的問題,甚至值得解決?因爲我的目標物種物體應該是相當輕的重量和動物可以稱爲最大次數,而非常有限(想象一個口袋妖怪遊戲:不超過1000個實例,上衣)

b)如果是,這是一個有效的方法來解決我的問題? c)如果無效,請您詳細說明一個更簡單/更清潔/更Pythonic的方式來解決這個問題嗎?

回答

1

是的,實現返回緩存對象的__new__方法是創建有限數量實例的適當方式。如果您不希望創建大量實例,則可以實施__eq__並按值而不是身份進行比較,但這樣做並不會傷害。

請注意,不可變對象通常應該在__new__而不是__init__中進行所有初始化,因爲後者是在創建對象後調用的。此外,將從__new__返回的類的任何實例上調用__init__,因此在進行緩存時,每次返回緩存對象時都會再次調用它。

此外,第一個參數是__new__類對象不是一個實例,所以你應該命名cls而不是self(你可以用self代替instance後面的方法,如果你想,雖然!)。

0

爲了使這個儘可能通用,我要推薦一些東西。如果你想要「真正的」不變性(通常人們對此非常不滿意,但是當你正在進行實習時,打破不可變的不變性會導致更大的問題),一種方法繼承自namedtuple。其次,使用鎖定來允許線程安全行爲。

因爲這是相當複雜的,我要提供的Species代碼修改後的副本與註釋解釋它:

import collections 
import operator 
import threading 

# Inheriting from a namedtuple is a convenient way to get immutability 
class Species(collections.namedtuple('SpeciesBase', 'species_id height ...')): 
    __slots__ =() # Prevent creation of arbitrary values on instances; true immutability of declared values from namedtuple makes true immutable instances 

    # Lock and cache, with underscore prefixes to indicate they're internal details 
    _cache_lock = threading.Lock() 
    _cache = {} 

    def __new__(cls, species_id): # Switching to canonical name cls for class type 
     # Do quick fail fast check that ID is in fact an int/long 
     # If it's int-like, this will force conversion to true int/long 
     # and minimize risk of incompatible hash/equality checks in dict 
     # lookup 
     # I suspect that in CPython, this would actually remove the need 
     # for the _cache_lock due to the GIL protecting you at the 
     # critical stages (because no byte code is executing comparing 
     # or hashing built-in int/long types), but the lock is a good idea 
     # for correctness (avoiding reliance on implementation details) 
     # and should cost little 
     species_id = operator.index(species_id) 

     # Lock when checking/mutating cache to make it thread safe 
     try: 
      with cls._cache_lock: 
       return cls._cache[species_id] 
     except KeyError: 
      pass 

     # Read in data here; not done under lock on assumption this might 
     # be expensive and other Species (that already exist) might be 
     # created/retrieved from cache during this time 
     species_id = ... 
     height = ... 
     # Pass all the values read to the superclass (the namedtuple base) 
     # constructor (which will set them and leave them immutable thereafter) 
     self = super(Species, cls).__new__(cls, species_id, height, ...) 

     with cls._cache_lock: 
      # If someone tried to create the same species and raced 
      # ahead of us, use their version, not ours to ensure uniqueness 
      # If no one raced us, this will put our new object in the cache 
      self = cls._cache.setdefault(species_id, self) 
     return self 

如果你想要做的實習一般圖書館(其中用戶可能有螺紋,你不能相信他們不會破壞不變性),像上面這樣的東西是一個基本的結構。它速度很快,儘量減少攤位的機會,即使建築重量較重(如果許多線程試圖一次構建第一次,可能會不止一次地重建物體並丟棄除一個副本以外的所有副本)等。

當然,如果建設是便宜和實例都較小,那麼只寫一個__eq__(也可能是__hash__如果它在邏輯上是不變的),並用它做,

+0

你可以看到一個非常類似的代碼排序在Python實現的各種['functools.lru_cache'的封裝](https://hg.python.org/cpython/file/3.5/Lib/functools.py#l453)。碰巧,當完成的工作是原子的時候,他們不會鎖定「原子」操作,這支持了我上面的評論,但他們使用相同的「嘗試從鎖定緩存中獲取,如果失敗,釋放鎖定,執行昂貴工作,重新獲取鎖定並更新緩存(如果沒有競爭)「我上面用於緩存大小的模式是有限的(並且操作組必須以原子方式執行)。 – ShadowRanger