將字符串中的每個字母加倍

Python中將字符串中的每個字母加倍（或重複n次）的最有效方法是什麼？將字符串中的每個字母加倍

"abcd" -> "aabbccdd"

或

"abcd" -> "aaaabbbbccccdddd"

我有一個很長的字符串，需要以這種方式進行突變和當前解決方案涉及n級聯的每一封信，我想可以更有效的循環。

來源

2013-06-03 Tim

你是否真的需要最高效的？如果一種方法需要309ns而另一種需要316ns，那麼即使它的可讀性較差，靈活性較差等，你是否會選擇第一種方法？ – abarnert

不，我不需要它是最高效的，但以前的實現是超詳細的，我知道必須有更「Python」的方式來做到這一點，這也會更快。 – Tim

這就是我的觀點。通常，簡潔/可讀/靈活/ pythonic比效率要重要得多。從你的評論來看，你的情況聽起來確實如此。如果這不是你想要的，不要求「最有效率」。 – abarnert

既然你特別問了一下效率：

# drewk's answer, optimized by using from_iterable instead of * 
def double_chain(s): 
    return ''.join(chain.from_iterable(zip(s, s))) 

# Ashwini Chaudhary's answer 
def double_mult(s): 
    return ''.join([x*2 for x in s]) 

# Jon Clements' answer, but optimized to take the re.compile and *2 out of the loop. 
r = re.compile('(.)') 
def double_re(s): 
    return r.sub(r'\1\1', s)

現在：

In [499]: %timeit double_chain('abcd') 
1000000 loops, best of 3: 1.99 us per loop 
In [500]: %timeit double_mult('abcd') 
1000000 loops, best of 3: 1.25 us per loop 
In [501]: %timeit double_re('abcd') 
10000 loops, best of 3: 22.2 us per loop

所以，itertools方法比簡單的方法要慢60％左右，並使用正則表達式是不是一個數量級以上數量級仍然較慢。

但是這樣一個微小的字符串可能不適用於長字符串有代表性，所以：

In [504]: %timeit double_chain('abcd' * 10000) 
100 loops, best of 3: 4.92 ms per loop 
In [505]: %timeit double_mult('abcd' * 10000) 
100 loops, best of 3: 5.57 ms per loop 
In [506]: %timeit double_re('abcd' * 10000) 
10 loops, best of 3: 91.5 ms per loop

正如所料，itertools方法得到更好的（現在拍的簡單方法），和正則表達式變得更糟隨着字符串變長。

因此，沒有一個「最有效」的方式。如果你將數十億個小絃線加倍，Ashwini的答案是最好的。如果你將數百萬個大字符串或數千個大字符串翻倍，那麼drewk就是最好的選擇。如果你不這樣做......沒有理由首先優化它。

此外，通常的警告：此測試是在我的Mac上64位CPython 3.3.0空載;對於您的Python實現，版本和平臺，在您的應用程序中使用您的真實數據，不會有任何保證。使用32位2.6進行的快速測試顯示了類似的結果，但如果它很重要，則需要自己進行更真實和相關的測試。

來源

2013-06-04 00:18:47 abarnert

很好的答案。 +1 – dawg

使用str.join：

>>> strs = "abcd" 
>>> "".join([x*2 for x in strs]) 
'aabbccdd' 
>>> "".join([x*4 for x in strs]) 
'aaaabbbbccccdddd'

從docs：

s = "" 
for substring in list: 
    s += substring

使用s = "".join(list)代替。前者在建造大型絃樂時是一個非常普遍的災難性錯誤。

來源

2013-06-03 23:46:35

爲什麼要創建一箇中間列表？ – arshajii

@arshajii http://stackoverflow.com/questions/9060653/list-comprehension-without-python/9061024#9061024 –

哇，這是我的新聞 - 很高興知道。 – arshajii

我會去爲str.join，所以我會提供了一個替代的re選項：

>>> s = "abcd" 
>>> import re 
>>> re.sub('(.)', r'\1' * 2, s) 
'aabbccdd'

來源

2013-06-03 23:49:25

您可以使用加入，izip和鏈條：

>>> st='abcd' 
>>> from itertools import chain,izip 
>>> ''.join(chain(*izip(st,st))) 
'aabbccdd'

雖然比列表理解更不可讀，優點是沒有中間列表; izip和chain產生迭代器。

來源

2013-06-04 00:07:47 dawg

還有一箇中間列表，因爲如果你通過'join'這個不是序列的東西，它會列出一個列表。請參閱Ashwini Chaudhary對他自己的回答的評論。 – abarnert

另外，'chain.from_iterable（it）'通常被認爲比'chain（* it）'更可讀（否則它不會被添加到'itertools'中），而且它也稍快一些。 – abarnert

但+1，因爲這是考慮問題的好方法（特別是如果將它擴展到處理n> 2），並且它也恰好是非平凡字符串中最快的方式。 – abarnert

每當問題是：「將字符串的每個字符映射到其他東西的最有效方法是什麼」原來是str.translate是最好的選擇......足夠大的字符串：

def double_translate(s): 
    return s.translate({ord(x):2*x for x in set(s)})

時序對其他答案：

In [5]: %timeit double_chain('abcd') 
The slowest run took 11.03 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 992 ns per loop 

In [6]: %timeit double_chain('mult') 
The slowest run took 13.61 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1 µs per loop 

In [7]: %timeit double_mult('abcd') 
The slowest run took 7.59 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 869 ns per loop 

In [8]: %timeit double_re('abcd') 
The slowest run took 8.63 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 9.4 µs per loop 

In [9]: %timeit double_translate('abcd') 
The slowest run took 5.80 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.78 µs per loop 

In [10]: %%timeit t='abcd'*5000 
    ...: double_chain(t) 
    ...: 
1000 loops, best of 3: 1.66 ms per loop 

In [11]: %%timeit t='abcd'*5000 
    ...: double_mult(t) 
    ...: 
100 loops, best of 3: 2.35 ms per loop 

In [12]: %%timeit t='abcd'*5000 
    ...: double_re(t) 
    ...: 
10 loops, best of 3: 30 ms per loop 

In [13]: %%timeit t='abcd'*5000 
    ...: double_translate(t) 
    ...: 
1000 loops, best of 3: 1.03 ms per loop

不過請注意，此解決方案具有額外的好處，在某些情況下，你可能會避免重新構建表要傳遞給translate ，例如：

def double_translate_opt(s, table=None): 
    if table is None: 
     table = {ord(x):2*x for x in set(s)} 
    return s.translate(table)

這將避免一些開銷，使得它更快：

In [19]: %%timeit t='abcd'; table={ord(x):2*x for x in t} 
    ...: double_translate_opt(t, table) 
    ...: 
The slowest run took 17.59 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 452 ns per loop

正如你可以看到的小字符串，它是當前答案的兩倍，只要你避免每次構建表。對於長文本，建立表格的成本以翻譯速度進行償還（並且在這些情況下使用set是值得的，以避免多次撥打ord）。

來源

2015-09-28 20:11:29 Bakuriu

將字符串中的每個字母加倍

回答

相關問題