2017-08-24

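Finding the start and end points of the longest run of consecutive digits in a large binary set in Python

I'm using Python 3 to find the start and end points of the longest run of consecutive identical digits in a large set of binary digits. I have already found the length of the longest run of 1s and of 0s; now I need to find where each of those runs starts and ends. My code so far is: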

For 1s:

def getMaxSegmentLength(readable):
    current_length = 0   # length of the streak being counted
    max_length = 0       # longest streak seen so far

    for x in readable:
        if x == '1':
            current_length += 1
        else:
            # the streak ended; remember it if it is the longest so far
            max_length = max(max_length, current_length)
            current_length = 0

    # the input may end in the middle of a streak, so compare once more
    return max(max_length, current_length)


def main():
    with open('C:/01.txt', 'r') as inputf:
        s = inputf.read()
        n = getMaxSegmentLength(s)
    print("The longest streak of 1's = " + str(n))


if __name__ == '__main__':
    main()

For 0s:

def getMaxSegmentLength(readable):
    current_length = 0   # length of the streak being counted
    max_length = 0       # longest streak seen so far

    for x in readable:
        if x == '0':
            current_length += 1
        else:
            # the streak ended; remember it if it is the longest so far
            max_length = max(max_length, current_length)
            current_length = 0

    # the input may end in the middle of a streak, so compare once more
    return max(max_length, current_length)


def main():
    with open('C:/01.txt', 'r') as inputf:
        s = inputf.read()
        m = getMaxSegmentLength(s)
    print("The longest streak of 0's = " + str(m))


if __name__ == '__main__':
    main()

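This code finds the longest run of consecutive identical digits in a very large binary set stored in a separate file. I also know the total number of 0s and 1s, but I haven't started on the next step: finding the start and end points. Any help is much appreciated, as I'm new to Python 3.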
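A minimal alternative sketch, using a different technique from the loop above: `itertools.groupby` from the standard library groups consecutive equal characters, so each group is exactly one run. Keeping a running position gives each run's start index, and its inclusive end is start + length - 1. The helper name `longest_runs` is hypothetical, and like the code above it assumes the whole file fits in memory as one string.

```python
from itertools import groupby


def longest_runs(bits):
    """Return {digit: (start, end)} for the longest run of each digit in `bits`.

    Hypothetical helper. Indices are 0-based and inclusive; ties keep the
    earliest run. Characters other than '0' and '1' (e.g. a trailing
    newline) end a run and are otherwise skipped.
    """
    best = {}   # digit -> (start, end) of the longest run found so far
    pos = 0     # index of the first character of the current group
    for digit, group in groupby(bits):
        length = sum(1 for _ in group)   # size of this run
        if digit in "01":
            start, end = best.get(digit, (0, -1))   # (0, -1) stands for length 0
            if length > end - start + 1:
                best[digit] = (pos, pos + length - 1)
        pos += length
    return best


print(longest_runs("00111101100010100010101111011011011"))
# {'0': (9, 11), '1': (2, 5)}
```

Because the comparison is strict, the earliest of several equally long runs wins; a digit that never occurs is simply absent from the result.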


I think you need [enumerate](https://docs.python.org/2.3/whatsnew/section-enumerate.html).
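To illustrate the suggestion: the built-in `enumerate()` yields `(index, item)` pairs, which is what is needed to remember where a run begins. A small sketch on a hypothetical sample string that records the index at which each new run of identical digits starts:

```python
bits = "0011100"  # hypothetical sample input

run_starts = []   # index at which each run of identical digits begins
prev = None
for i, ch in enumerate(bits):
    if ch != prev:            # a different digit starts a new run at index i
        run_starts.append(i)
    prev = ch

print(run_starts)  # [0, 2, 5]
```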

Answers


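Simple: use one variable (starts_at) to track where the current streak begins, and another (max_starts_at) to keep the start of the longest streak seen so far. Every time you find a longer streak, update max_starts_at together with max_length.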

def getMaxSegmentLength(readable, digit):
    '''Find the longest streak of digit in the readable string.'''
    current_length = 0
    max_length = 0

    starts_at = -1       # index where the current streak began
    max_starts_at = -1   # index where the longest streak so far began

    for i, x in enumerate(readable):
        if x == digit:
            current_length += 1
            if current_length == 1:
                starts_at = i
        else:
            # the streak just ended; keep it if it is the longest so far
            if max_length < current_length:
                max_length = current_length
                max_starts_at = starts_at
            current_length = 0   # always reset, even for shorter streaks

    # the input may end in the middle of a streak
    if max_length < current_length:
        max_length = current_length
        max_starts_at = starts_at

    if max_length == 0:
        return -1, -1   # digit does not occur at all

    max_ends_at = max_starts_at + max_length - 1

    # return a tuple of start point and end point index
    return max_starts_at, max_ends_at


def main():
    with open('F:/input.txt', 'r') as inputf:
        s = inputf.read()

        # check for 1's
        n = getMaxSegmentLength(s, '1')
        print("The longest streak of 1's = " + str(n))

        # check for 0's
        n = getMaxSegmentLength(s, '0')
        print("The longest streak of 0's = " + str(n))

if __name__ == '__main__':
    main()
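Because the input file is described as very large, here is a hedged variation of the same counting idea that reads the file in fixed-size chunks instead of calling `read()` once. It assumes the file contains only the digits (any other character, such as a newline, ends a run); the name `longest_run_in_stream` and the `chunk_size` default are illustrative choices, not part of the answer above.

```python
import io


def longest_run_in_stream(f, digit, chunk_size=1 << 20):
    """Return (start, end) of the longest run of `digit` read from file object `f`.

    Hypothetical helper: reads `chunk_size` characters at a time so the whole
    file never has to be held in memory. Indices are 0-based and inclusive;
    ties keep the earliest run; (-1, -1) means `digit` never occurs.
    """
    offset = 0                     # absolute index of the first char in the chunk
    cur_start, cur_len = -1, 0     # current streak
    best_start, best_len = -1, 0   # longest streak seen so far
    while True:
        chunk = f.read(chunk_size)
        if not chunk:              # end of file
            break
        for i, ch in enumerate(chunk, start=offset):
            if ch == digit:
                if cur_len == 0:
                    cur_start = i  # a new streak begins here
                cur_len += 1
                if cur_len > best_len:
                    best_start, best_len = cur_start, cur_len
            else:
                cur_len = 0        # any other character ends the streak
        offset += len(chunk)
    if best_len == 0:
        return -1, -1
    return best_start, best_start + best_len - 1


# Usage on an in-memory sample; a small chunk_size forces runs across chunks.
sample = "00111101100010100010101111011011011"
print(longest_run_in_stream(io.StringIO(sample), "1", chunk_size=4))  # (2, 5)
print(longest_run_in_stream(io.StringIO(sample), "0", chunk_size=4))  # (9, 11)

# For the real file (path taken from the answer above):
# with open('F:/input.txt', 'r') as f:
#     print(longest_run_in_stream(f, '1'))
```

The streak counters carry over from one chunk to the next, so a streak that straddles a chunk boundary is still measured as a single streak.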

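You can use a regular expression to match each run of identical digits, then update a dictionary entry for the corresponding digit: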

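import re

# example input string (named bits so it does not shadow the built-in input())
bits = "00111101100010100010101111011011011"

# best run found so far for each digit: start index and length
best = {
    "0": {"start": 0, "len": 0},
    "1": {"start": 0, "len": 0}
}
# (.)\1* matches one character followed by any repeats of it, i.e. one run
for m in re.compile(r"(.)\1*").finditer(bits):
    if best[m.group()[0]]["len"] < len(m.group()):
        best[m.group()[0]] = {"start": m.start(), "len": len(m.group())}

print(best)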

Output:

{'1': {'start': 2, 'len': 4}, '0': {'start': 9, 'len': 3}} 
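The answer stores each run's start and length; the end point the question asks for follows from the match object's `end()` method, which returns the index just past the match, so the inclusive end is `m.end() - 1`. A hedged variation of the same loop (the names `bits` and `best` are illustrative) that records `(start, end)` pairs directly and only matches `0`/`1`, so a trailing newline read from a file is ignored:

```python
import re

bits = "00111101100010100010101111011011011"  # same example input as above

best = {}  # digit -> (start, end), both 0-based and inclusive
for m in re.finditer(r"([01])\1*", bits):
    digit = m.group(1)
    length = m.end() - m.start()
    # keep the first run of maximal length (strict comparison)
    if digit not in best or length > best[digit][1] - best[digit][0] + 1:
        best[digit] = (m.start(), m.end() - 1)

print(best)  # {'0': (9, 11), '1': (2, 5)}
```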