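Hi, I have a compression task to code in Python: assign a number to each word in a string. If the input is
'hello its me, hello can you hear me, hello are you listening'
then the output should be
1,2,3,1,4,5,6,3,1,7,5,8
Basically, each word is assigned a numeric value, and if a word repeats, it gets the same value again. The code has to be in Python. Please help, thanks.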
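A simple approach is to use a dictionary: when you find a new word, add a key/value pair using an incrementing variable; when you see a word you have met before, just yield its value from the dictionary: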
s = 'hello its me, hello can you hear me, hello are you listening'

def cyc(s):
    # set i to 1
    i = 1
    # split into words on whitespace
    it = s.split()
    # create first key/value pair
    seen = {it[0]: i}
    # yield 1 for first word
    yield i
    # for all but the first word
    for word in it[1:]:
        # if we have seen this word already, use its value from our dict
        if word in seen:
            yield seen[word]
        # else first time seeing it so increment count
        # and create new k/v pairing
        else:
            i += 1
            yield i
            seen[word] = i

print(list(cyc(s)))
Output:
[1, 2, 3, 1, 4, 5, 6, 3, 1, 7, 5, 8]
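You can also avoid the slicing by using iter and calling next to pop the first word. Also, if you want foo == foo! to hold, you need to remove any punctuation from the words, which can be done with str.rstrip: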
from string import punctuation

def cyc(s):
    i = 1
    it = iter(s.split())
    seen = {next(it).rstrip(punctuation): i}
    yield i
    for word in it:
        word = word.rstrip(punctuation)
        if word in seen:
            yield seen[word]
        else:
            i += 1
            yield i
            seen[word] = i
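A possible variation, offered only as a sketch: letting the size of the dictionary supply the next number removes the special case for the first word, so an empty string simply yields nothing instead of failing on the first lookup. The name `encode` and the sample string are made up for illustration.

```python
from string import punctuation

def encode(s):
    """Yield a 1-based number per word; repeated words reuse their number."""
    seen = {}
    for word in s.split():
        # strip trailing punctuation so 'foo!' and 'foo' count as the same word
        word = word.rstrip(punctuation)
        if word not in seen:
            # the next unused number is one more than the words seen so far
            seen[word] = len(seen) + 1
        yield seen[word]

print(list(encode('foo foo! bar, bar')))  # [1, 1, 2, 2]
print(list(encode('')))                   # []
```

Note that a token made only of punctuation (for example a lone '!') is reduced to an empty string and still gets its own number.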
How about building a dict with an item: index mapping:
>>> s
'hello its me, hello can you hear me, hello are you listening'
>>>
>>> l = s.split()
>>> d = {}
>>> i = 1
>>> for x in l:
...     if x not in d:
...         d[x] = i
...         i += 1
...
>>> d
{'its': 2, 'listening': 8, 'hear': 6, 'hello': 1, 'are': 7, 'you': 5, 'me,': 3, 'can': 4}
>>> for x in l:
...     print(x, d[x])
...
hello 1
its 2
me, 3
hello 1
can 4
you 5
hear 6
me, 3
hello 1
are 7
you 5
listening 8
>>>
If you don't want any punctuation in your split list, then you can do:
>>> import re
>>> l = re.split(r'(?:,|\s)\s*', s)
>>> l
['hello', 'its', 'me', 'hello', 'can', 'you', 'hear', 'me', 'hello', 'are', 'you', 'listening']
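Putting the two pieces together, here is a minimal sketch that feeds the punctuation-free list into the same kind of dict and prints the comma-separated form the question asks for. The names `ids` and `codes` are just illustrative, and the use of `dict.setdefault` is a compact stand-in for the explicit `if x not in d` loop.

```python
import re

s = 'hello its me, hello can you hear me, hello are you listening'

# split on a comma or whitespace, swallowing any whitespace that follows
words = re.split(r'(?:,|\s)\s*', s)

ids = {}
# setdefault returns the stored number for a known word,
# otherwise it stores and returns the next unused number
codes = [ids.setdefault(w, len(ids) + 1) for w in words]

print(','.join(map(str, codes)))  # 1,2,3,1,4,5,6,3,1,7,5,8
```

One caveat: if the string begins or ends with a separator, `re.split` returns an empty string at that end, which would be numbered like any other word.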
import re
from collections import OrderedDict
text = 'hello its me, hello can you hear me, hello are you listening'
words = re.sub(r"[^\w]", " ", text).split()
uniq_words = list(OrderedDict.fromkeys(words))
res = [uniq_words.index(w) + 1 for w in words]
print(res) # [1, 2, 3, 1, 4, 5, 6, 3, 1, 7, 5, 8]
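`uniq_words.index(w)` rescans the list for every word, which can get slow on long texts. A sketch of the same idea with a lookup table instead, assuming Python 3.7 or later, where a plain dict keeps insertion order (so `OrderedDict` is not required); the name `index` is illustrative.

```python
import re

text = 'hello its me, hello can you hear me, hello are you listening'

# pull out runs of word characters, which drops the punctuation
words = re.findall(r"\w+", text)

# dict.fromkeys keeps the first occurrence of each word, in order;
# enumerate numbers those unique words starting from 1
index = {w: i for i, w in enumerate(dict.fromkeys(words), start=1)}

res = [index[w] for w in words]
print(res)  # [1, 2, 3, 1, 4, 5, 6, 3, 1, 7, 5, 8]
```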
Have you tried anything? Stack Overflow is not a code-writing service.