Confusion about string find?

I have a list of data that I want to search. Each line of this new data list is structured like this:

Name, Address, DOB, Family Members, Age, Height, etc.

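I want to go through the lines of data and stop searching each one once I am past the name (at the first ","), to optimize the search. I believe I want to use this command: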

str.find(sub[, start[, end]])

However, I'm having trouble writing the code. Any tips on how to get str.find to work for me?
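To make the optional start and end arguments concrete, here is a small sketch using a line in the same format as the sample data. The variable names are only illustrative. It shows that end is exclusive, and that find returns -1 (not False) when nothing is found.

```python
# A line in the "Last, First, ..." format from the question.
line = 'Bennet, John, 17054099","5","156323558","-","0", 714'

comma = line.find(',')                # index of the first comma
print(comma)                          # 6

# find(sub, start, end) only looks inside line[start:end]; end is exclusive.
print(line.find('Bennet', 0, comma))  # 0  -> found at the very start
print(line.find('John', 0, comma))    # -1 -> 'John' only appears after the first comma

# 0 is a valid position, so compare against -1 rather than testing truthiness.
found = line.find('Bennet', 0, comma) != -1
print(found)                          # True
```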

Here is some sample data:

Bennet, John, 17054099","5","156323558","-","0", 714 // 
Menendez, Juan,7730126","5","158662525" 11844 // 
Brown, Jamal,"9","22966592","+","0",,"4432 //

The idea is that I want my program to search only up to the first "," and not search through the rest of the long line.

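Edit: So here is my code.

I want to search the lines in completedataset only up to the first comma. I'm still confused about how to work these suggestions into my existing code.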

counter = 1
for line in completedataset:
    print(counter)
    counter += 1
    for t in matchedLines:
        # This checks the whole line, not just the part before the first comma.
        if t in line:
            smallerdataset.write(line)
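Searching the whole line is not only slower, it can also match the wrong lines. A minimal sketch with hypothetical data (a second line where "Bennet" is a first name) compares the loop above with a first-field-only check. It uses str.partition to take the text before the first comma, which is a swapped-in alternative to find with an end bound.

```python
# Hypothetical data: on the second line "Bennet" is a first name, not a surname.
completedataset = [
    'Bennet, John, 17054099","5"',
    'Brown, Bennet,"9","22966592"',
]
matchedLines = ['Bennet']  # names to look for (hypothetical)

def first_field(line):
    # partition returns (head, ',', tail); head is the whole line if there is no comma.
    return line.partition(',')[0]

# What the loop in the question does: search anywhere in the line.
whole_line_hits = [line for line in completedataset
                   if any(t in line for t in matchedLines)]

# Restricting the search to the text before the first comma.
first_field_hits = [line for line in completedataset
                    if any(t in first_field(line) for t in matchedLines)]

print(len(whole_line_hits))   # 2
print(len(first_field_hits))  # 1
```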


2010-08-04 Robert A. Fettikowski

Can you give an example of your data list? – 2010-08-04 15:09:17

What are these '//'? Newlines? – kennytm 2010-08-04 15:16:41

If I understand your spec correctly:

for thestring in listdata:
    firstcomma = thestring.find(',')
    # Search for `name` only in thestring[0:firstcomma].
    havename = thestring.find(name, 0, firstcomma)
    if havename >= 0:
        print("found name:", thestring[:firstcomma])
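The loop above assumes listdata and name already exist. Here is a self-contained sketch that wraps the same idea in a helper (name_in_first_field is a hypothetical name) and runs it on the sample lines from the question. It also guards against a line with no comma: find would return -1 there, and passing -1 as end would silently drop the last character from the search.

```python
def name_in_first_field(thestring, name):
    """Return True if `name` occurs before the first comma of `thestring`."""
    firstcomma = thestring.find(',')
    if firstcomma == -1:
        # No comma at all: treat the whole line as the first field.
        firstcomma = len(thestring)
    return thestring.find(name, 0, firstcomma) >= 0

listdata = [
    'Bennet, John, 17054099","5","156323558","-","0", 714',
    'Menendez, Juan,7730126","5","158662525" 11844',
    'Brown, Jamal,"9","22966592","+","0",,"4432',
]

for thestring in listdata:
    if name_in_first_field(thestring, 'Menendez'):
        print("found name:", thestring.partition(',')[0])  # found name: Menendez
```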

Edit: given the OP's edit to the question, this would become something like:

counter = 1
for line in completedataset:
    print(counter)
    counter += 1
    firstcomma = line.find(',')
    for t in matchedLines:
        # Only look at line[0:firstcomma], i.e. the name field.
        havename = line.find(t, 0, firstcomma)
        if havename >= 0:
            smallerdataset.write(line)

Of course, maintaining counter by hand is unpythonically low-level; a better equivalent is

for counter, line in enumerate(completedataset, start=1):
    print(counter)
    firstcomma = line.find(',')
    for t in matchedLines:
        havename = line.find(t, 0, firstcomma)
        if havename >= 0:
            smallerdataset.write(line)

but that doesn't affect the question.
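For anyone who wants to try the enumerate version end to end, here is a sketch that uses io.StringIO as a stand-in for the real input and output files (in practice these would come from open(...)). matchedLines holds hypothetical names. The break is an added choice, not part of the answer above: it writes each line at most once even when several names match it.

```python
import io

# Stand-ins for the real files.
completedataset = io.StringIO(
    'Bennet, John, 17054099","5","156323558","-","0", 714\n'
    'Menendez, Juan,7730126","5","158662525" 11844\n'
    'Brown, Jamal,"9","22966592","+","0",,"4432\n'
)
smallerdataset = io.StringIO()
matchedLines = ['Bennet', 'Brown']  # hypothetical names to keep

for counter, line in enumerate(completedataset, start=1):
    print(counter)
    firstcomma = line.find(',')
    if firstcomma == -1:
        firstcomma = len(line)  # no comma: search the whole line
    for t in matchedLines:
        if line.find(t, 0, firstcomma) >= 0:
            smallerdataset.write(line)
            break  # write each line at most once

result = smallerdataset.getvalue()
print(result)
```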


2010-08-04 15:21:13

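So how can I integrate this code into the code I already have... see my edited post. Thanks. – 2010-08-04 15:53:31

You'll probably be searching every line, so you can split each one on "," and then do the search on the first element: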

for line in file:
    name = line.split(',', 1)[0]  # first field only
    if 'smth' in name:  # 'smth' stands for the name you are looking for
        break
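The original version of this answer tested `if name.find('smth'):`, which is a common trap: find returns 0 for a match at the start (falsy) and -1 for no match (truthy). A short demonstration of why the membership test is used instead:

```python
name = 'Bennet'

# find() returns an index, or -1 when the substring is absent.
print(name.find('Bennet'))        # 0  (found, but 0 is falsy)
print(name.find('Smith'))         # -1 (not found, but -1 is truthy)

# So `if name.find(...)` gets both cases backwards here.
print(bool(name.find('Bennet')))  # False
print(bool(name.find('Smith')))   # True

# A membership test says what is meant.
print('Bennet' in name)           # True
print('Smith' in name)            # False
```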


2010-08-04 15:18:46

Why smth? I have multiple lines and multiple names I want to search for. – 2010-08-04 15:39:25

You can do this quite directly:

s = 'Bennet, John, 17054099","5","156323558","-","0", 714 //'
print(s.find('John', 0, s.index(',')))  # find the index of ',' and stop there
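One difference worth knowing in the one-liner above: it uses str.index to locate the comma, and index raises ValueError when the substring is missing, whereas find returns -1. A small sketch of both outcomes, using the same sample line:

```python
s = 'Bennet, John, 17054099","5","156323558","-","0", 714 //'

print(s.find('Bennet', 0, s.index(',')))  # 0: the surname is in the first field
print(s.find('John', 0, s.index(',')))    # -1: 'John' is past the first comma

no_comma = 'Bennet'
print(no_comma.find(','))                 # -1: find signals "missing" with -1
try:
    no_comma.index(',')
except ValueError:
    print('index raised ValueError')      # index raises instead
```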


2010-08-04 15:19:40

Any reason you have to use find? Why not do this:

if line.split(",", 1)[0] == search_string:
    ...

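Edit: Thought I'd point out that I just tested this, and the split method seems just as fast as find, if not faster. Use the timeit module to test the performance of both methods and see what results you get.

Try: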

python -m timeit -n 10000 -s "a='''Bennet, John, 17054099','5','156323558','-','0', 714'''" "a.find('Bennet', 0, a.find(','))"

Make the name longer (e.g. "BennetBennetBennetBennetBennetBennet") and you'll see find suffer more than split.

Then compare that with using split with the maxsplit option:

python -m timeit -n 10000 -s "a='''Bennet, John, 17054099','5','156323558','-','0', 714'''" "a.split(',',1)[0] == 'Bennet'"
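The same comparison can be run from inside Python with timeit.timeit, which accepts a callable. This is a sketch; the function names are illustrative, and timings depend on the machine and Python version, so no numbers are shown. Note that the two checks are not strictly equivalent: the find version matches a substring of the first field, while the split version requires an exact match.

```python
import timeit

a = 'Bennet, John, 17054099","5","156323558","-","0", 714'
name = 'Bennet'

def with_find():
    # Search only the slice before the first comma.
    return a.find(name, 0, a.find(',')) != -1

def with_split():
    # Split once (maxsplit=1) and compare the first field exactly.
    return a.split(',', 1)[0] == name

# Both must agree on this input before their speed is worth comparing.
assert with_find() == with_split()

for func in (with_find, with_split):
    seconds = timeit.timeit(func, number=10000)
    print(f"{func.__name__}: {seconds:.4f} s for 10000 runs")
```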


2010-08-04 15:22:01 domino

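The OP said he doesn't want to process the whole lengthy line; using split examines the entire line and builds a list of all the fields, when he explicitly wants to optimize by not processing the whole line. – 2010-08-04 15:24:09

Yes, that's exactly what I don't want to do. – 2010-08-04 15:29:15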
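For reference, a plain split(',') does build every field, but with maxsplit=1 the result has at most two items: the first field and the rest of the line. The rest is still copied into a new string (as it is with str.partition), so neither is entirely free; only find with an end bound avoids creating that tail. A quick check on the sample line:

```python
line = 'Bennet, John, 17054099","5","156323558","-","0", 714'

all_fields = line.split(',')           # splits at every comma
first_two = line.split(',', 1)         # stops after the first comma
head, sep, tail = line.partition(',')  # always a 3-tuple

print(len(all_fields))
print(len(first_two))  # 2
print(head)            # Bennet
```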

If you're checking many names against each line, the biggest optimization may simply be to locate the comma only once per line!

for line in completedataset:
    i = line.index(',')  # raises ValueError if the line has no comma
    first_field = line[:i]
    for name in matchedNames:
        if name in first_field:
            smalldataset.append(line)
            break  # keep each matching line once
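If the names being searched for are exact surnames rather than fragments, the inner loop can be replaced by a single set lookup. This sketch assumes exact matching (a swapped-in technique, not what the answer above does), uses hypothetical names, and strips surrounding whitespace from the first field.

```python
# Hypothetical inputs in the question's format.
completedataset = [
    'Bennet, John, 17054099","5","156323558","-","0", 714',
    'Menendez, Juan,7730126","5","158662525" 11844',
    'Brown, Jamal,"9","22966592","+","0",,"4432',
]
matchedNames = {'Bennet', 'Brown'}  # a set gives O(1) average membership tests

smalldataset = []
for line in completedataset:
    first_field = line.partition(',')[0].strip()
    if first_field in matchedNames:  # exact match, not substring
        smalldataset.append(line)

print([l.partition(',')[0] for l in smalldataset])  # ['Bennet', 'Brown']
```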


2010-08-05 23:59:22 fholo
