How do I step through a line character by character?

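I'm processing a file made up of pairs of lines: the second line of each pair is a sequence, and a variable called tokenizer gives me the old position value. I'm trying to find the new position. For example, for this line tokenizer gives me 12, which is the E when only the letters are counted, so I need to find the new position by counting the dashes... How do I step through the line character by character?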

---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------

This is what I have so far, and it still doesn't work.

with open(filename) as f: 
    countletter = 0 
    countdash = 0 
    for line, line2 in itertools.izip_longest(f, f, fillvalue=''): 
     tokenizer=line.split()[4] 
     print tokenizer 

     for i,character in enumerate(line2): 

      for countletter <= tokenizer: 

       if character != '-': 
        countletter += 1 
       if character == '-': 
        countdash +=1

For this example, my new position should be 32.
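A minimal sketch of the mapping described above, written for Python 3 (the single-argument print call also runs under Python 2). It assumes tokenizer is a 1-indexed count of letters, as in the example where 12 maps to 32; the function name new_position is hypothetical. Instead of counting dashes, it records the index of every letter once and looks up the requested one.

```python
SEQ = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'

def new_position(seq, position):
    """Map a 1-indexed position counted over letters only to the
    1-indexed position of the same letter in the gapped string."""
    # 0-based indexes of every character that is not a dash
    letter_indexes = [i for i, c in enumerate(seq) if c != '-']
    if not 1 <= position <= len(letter_indexes):
        raise ValueError('position %d is out of range' % position)
    # shift the lookup and the result between 1-based and 0-based
    return letter_indexes[position - 1] + 1

print(new_position(SEQ, 12))  # 32
```

Building the index list costs one pass over the line, after which any number of positions can be looked up without rescanning.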


2012-07-30 Chad D

What is 'for countletter <= tokenizer:' supposed to mean? – 2012-07-30 20:41:52

Is there any reason you can't just iterate over the string? 'for c in line2' – Wug 2012-07-30 20:44:25

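@GregHewgill The indentation is wrong. As for countletter <= tokenizer: I'm trying to get the program to count dashes, and once it has counted the letters up to 12 it should stop and tell me how many dashes it saw. But right now I get a syntax error at <= and I don't know why. – 2012-07-30 20:46:02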
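The stop-once-12-letters-are-counted idea can be expressed with an ordinary if inside the loop; 'for countletter <= tokenizer:' is not valid Python because for always takes the form 'for name in iterable'. A sketch, assuming 1-indexed letter counts; count_dashes_before is a hypothetical name. The new position is then the letter count plus the dashes skipped.

```python
def count_dashes_before(seq, letters_wanted):
    """Count the dashes seen before the letters_wanted-th letter (1-indexed)."""
    letters = 0
    dashes = 0
    for character in seq:
        if character == '-':
            dashes += 1
        else:
            letters += 1
            if letters == letters_wanted:
                # stop as soon as the target letter is reached
                return dashes
    raise ValueError('sequence has fewer than %d letters' % letters_wanted)

seq = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
dashes = count_dashes_before(seq, 12)
print(dashes)       # 20
print(12 + dashes)  # 32
```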

The first answer, edited by Chad D to make it 1-indexed (though not correctly):

def get_new_index(string, char_index): 
    chars = 0 
    for i, char in enumerate(string): 
     if char != '-': 
      chars += 1 
     if char_index == chars: 
      return i+1

Rewritten version:

import re 

def get(st, char_index): 
    # 0-indexed: return the index in st of the char_index-th letter, skipping dashes
    chars = -1 
    for i, char in enumerate(st): 
        if char != '-': 
            chars += 1 
        if char_index == chars: 
            return i 

def test(): 
    st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------' 
    initial = re.sub('-', '', st) 
    for i, char in enumerate(initial): 
        # i is 0-indexed here, so use the 0-indexed get()
        print i, char, st[get(st, i)] 

def get_1_indexed(st, char_index): 
    # 1-indexed wrapper: 1-indexed letter count in, 1-indexed position out
    return 1 + get(st, char_index - 1) 

def test_1_indexed(): 
    st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------' 
    initial = re.sub('-', '', st) 
    for i, char in enumerate(initial): 
        print i+1, char, st[get_1_indexed(st, i + 1) - 1]


2012-07-30 21:16:52

Thanks a lot :) – 2012-07-30 22:04:01

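With the string the OP provided, this function returns the index of a dash character for inputs 0, 2, 4, 13, 15 and 21. Use 'print(string[get_new_index(string, char_index)])' to verify. Also, you shouldn't use string as a variable name, because it is the name of a standard-library module. – Wug 2012-07-31 14:50:21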

Also, this function fails if you ask for the zeroth item of a string that starts with a letter. – Wug 2012-07-31 15:30:21
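One possible fix for both problems, sketched under the assumption that char_index stays 1-indexed and the result is a 1-indexed position (so the letter itself is seq[result - 1]): compare only right after a letter has been counted, so a dash can never be returned, and raise ValueError instead of silently returning None. get_new_index_fixed is a hypothetical name, not part of the answer above.

```python
def get_new_index_fixed(seq, char_index):
    """Return the 1-indexed position in seq of the char_index-th letter
    (also 1-indexed), skipping dashes."""
    if char_index < 1:
        raise ValueError('char_index is 1-indexed and must be >= 1')
    chars = 0
    for i, char in enumerate(seq):
        if char == '-':
            continue
        chars += 1
        # only compare right after counting a letter
        if chars == char_index:
            return i + 1
    raise ValueError('sequence has fewer than %d letters' % char_index)

seq = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
letters = seq.replace('-', '')
for n in range(1, len(letters) + 1):
    # every result points back at the n-th letter, never at a dash
    assert seq[get_new_index_fixed(seq, n) - 1] == letters[n - 1]
print(get_new_index_fixed(seq, 12))  # 32
```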

My original text looks like this, and the position I'm interested in is 12, which is 'E'.

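Actually, it's K, assuming you use zero-indexed strings. Python uses zero indexing, so unless you go out of your way to one-index things (and you don't), it will give you K. If that is causing your problem, try adjusting for it.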
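A quick way to see the difference is to strip the dashes and index the remaining letters both ways (a small sketch using the sequence from the question):

```python
seq = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
letters = seq.replace('-', '')  # 'LLNEHVKTHTEEKPFICTVCRKS'

print(letters[12])      # K  (zero-indexed: the 13th letter)
print(letters[12 - 1])  # E  (treating 12 as one-indexed)
```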

Below is some code that does what you need (though with 0-indexing rather than 1-indexing). It can be found online here:

def get_new_index(oldindex, str): 
    newindex = 0 

    for c in str: 
     if c != '-': 
      if oldindex == 0: 
       return newindex 
      oldindex -= 1 
     newindex += 1 

    return 1/0 # throw a shitfit if we don't find the index


2012-07-30 21:20:50 Wug

Can you explain the idea behind your code? It works, but I get lost from 'return newindex' down to 'return 1/0'. – 2012-07-30 21:47:48

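Instead of keeping a count of the letters we've found and counting up until that count matches the number we're looking for, we count the number we're looking for down until it reaches zero, at which point we've passed the right number of letters. The newindex variable keeps track of the total number of characters we've seen (dashes included). – Wug 2012-07-30 22:13:03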
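The countdown idea, written out with comments as a sketch. It keeps Wug's 0-indexed convention and argument order, but raises ValueError in place of the deliberate 1/0; the name get_new_index_countdown and the choice of exception are made here for illustration and are not part of the answer.

```python
def get_new_index_countdown(oldindex, seq):
    """Return the 0-based index in seq of the letter at 0-based
    letter position oldindex, skipping dashes."""
    newindex = 0              # position in seq, dashes included
    for c in seq:
        if c != '-':
            if oldindex == 0:
                # no letters left to skip: this is the one we want
                return newindex
            oldindex -= 1     # one fewer letter left to skip
        newindex += 1
    # ran off the end: the sequence has too few letters
    raise ValueError('letter index out of range')

seq = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
i = get_new_index_countdown(12, seq)
print(i)       # 32
print(seq[i])  # K
```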

I found a bug in the code... it doesn't work for some indexes, where the index falls on a particular letter. – 2012-07-31 05:25:40

This is a clumsy way to get the second line; it would be clearer to use islice, or next(f):

for line, line2 in itertools.izip_longest(f, f, fillvalue=''):
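A sketch of reading the file as header/sequence pairs with next(), written for Python 3. The layout (each header line followed by exactly one sequence line) and the use of io.StringIO in place of a real file are assumptions for illustration; read_pairs is a hypothetical name.

```python
import io

def read_pairs(f):
    """Yield (header, sequence) pairs from a file-like object whose
    lines alternate header, sequence, header, sequence, ..."""
    for header in f:
        # the for loop consumed the header; next() pulls the line after it
        sequence = next(f, '')  # '' if the file ends on a header line
        yield header.rstrip('\n'), sequence.rstrip('\n')

f = io.StringIO('header one\nAB-C\nheader two\n--DE\n')
for header, sequence in read_pairs(f):
    print(header, sequence)
```

Passing a default to next() avoids a bare StopIteration escaping the generator when the file has an odd number of lines.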

Here countletter seems to be an int while tokenizer is a str. That's probably not what you expect.

for countletter <= tokenizer:

This is also a syntax error, so I don't think it's the code you're actually running.

Perhaps your code should have

tokenizer = int(line.split()[4])

to make tokenizer an int.

print tokenizer can be misleading, because an int and a str look the same when printed, so you see what you expect to see. When you're debugging, try print repr(tokenizer) instead.
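A small illustration of why repr helps, written for Python 3. The header line and the idea that its fifth field holds the position are made up for the example; only the split()[4] indexing comes from the question.

```python
header = 'seq1 chainA 100 200 12'  # hypothetical header; field 5 is the position
raw = header.split()[4]
tokenizer = int(raw)

print(raw, tokenizer)              # 12 12
print(repr(raw), repr(tokenizer))  # '12' 12
```

The plain print shows the same text for both values, while repr puts quotes around the string, which makes the type mismatch visible.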

Once you're sure tokenizer is an int, you can change this line to

for i,character in enumerate(line2[:tokenizer]):


2012-07-30 21:46:04

About 'for line, line2 in itertools.izip_longest(f, f, fillvalue=''):', thanks for the suggestion. – 2012-07-30 21:53:42

@ChadD, 'line = next(f)' and 'line2 = next(f)' on two separate lines. – 2012-07-30 21:55:18
