2014-07-23 69 views

Getting a certain amount of data from a txt file

For example, I want to use Python to take the first 30% of the data from a text file.

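Here is some code in which I tried to create two new files, but I don't know how to take a given percentage of the data and write it to them.

Here it is: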

import sys

def NewFile(FileName):
    # The input file is opened here but never read.
    open(FileName, 'r')
    print('Creating new text file')
    # A for 70% training data
    AFileName = 'A' + FileName
    # B for 30% testing data
    BFileName = 'B' + FileName

    try:
        # Both output files are created, but nothing is written to them yet.
        afile = open(AFileName, 'a')
        bfile = open(BFileName, 'a')
        afile.close()
        bfile.close()
    except OSError:
        print('~~~~~~')
        sys.exit(1)

By percentage, do you mean that if there are 100 lines you want 30 of them?

Answers


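It sounds like you want something along these lines, where filename is the file you are reading from and proportion is the fraction of its lines (for example 0.3) that you want in the first output file: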

def split_file(filename, tofile, othertofile, proportion):
    # Read every line; each one keeps its trailing newline.
    with open(filename) as source:
        content = source.readlines()

    # Slice indices must be integers, so truncate the split point.
    split_index = int(len(content) * proportion)

    # Write each portion to its own file; "with" closes the files afterwards.
    with open(tofile, "w") as first_file:
        first_file.writelines(content[:split_index])
    with open(othertofile, "w") as second_file:
        second_file.writelines(content[split_index:])

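The lines already end with a newline character. The split index should be an 'int' so that it can be used in a slice. The files should be closed when you are done with them; otherwise there is no guarantee when that will happen. – BlackJack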


Thank you very much, it works well! I also found that I needed to add int before the calculation :) – user3843433
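
The two points raised in these comments can be checked in isolation. The following is a minimal sketch, assuming a small in-memory list of lines; the sample data and the helper name split_index are illustrative and do not come from the answer itself:

```python
def split_index(line_count, proportion):
    """Return an integer slice index for the given proportion of lines."""
    # line_count * proportion is a float, and floats cannot be slice indices.
    return int(line_count * proportion)


# Lines as readlines() returns them: each keeps its trailing newline.
lines = ["line %d\n" % i for i in range(10)]

index = split_index(len(lines), 0.3)
first_portion = lines[:index]
second_portion = lines[index:]

# The lines already end in "\n", so they are concatenated with an
# empty separator. Joining with "\n" would add a blank line between them.
first_text = "".join(first_portion)
doubled_text = "\n".join(first_portion)
```

Here first_text holds the three lines unchanged, while doubled_text has an extra empty line between each pair, which is the effect the first comment warns about.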


This will work:

import sys

def NewFile(FileName):
    # Read all lines up front so they can be counted.
    with open(FileName, 'r') as source:
        lines = source.readlines()
    print('Creating new text file')
    num_lines = len(lines)
    num_lines_a = 0.7 * num_lines
    # A for 70% training data
    AFileName = 'A' + FileName
    # B for 30% testing data
    BFileName = 'B' + FileName

    try:
        with open(AFileName, 'a') as afile, open(BFileName, 'a') as bfile:
            # a is the 1-based position of the current line.
            a = 1
            for line in lines:
                if a <= num_lines_a:
                    afile.write(line)
                    a += 1
                else:
                    bfile.write(line)
    except OSError:
        print('~~~~~~')
        sys.exit(1)

Could enumerate be used here?
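
As this comment suggests, enumerate could take the place of the manual counter. A minimal sketch, assuming the same 70/30 split; the function name split_with_enumerate and its parameters are illustrative and are not part of the answer above:

```python
def split_with_enumerate(source_name, first_name, second_name, proportion=0.7):
    """Write the first `proportion` of the lines to one file, the rest to another."""
    with open(source_name, 'r') as source:
        lines = source.readlines()

    # Number of lines that belong in the first file, truncated to an integer.
    first_count = int(len(lines) * proportion)

    with open(first_name, 'w') as first_file, open(second_name, 'w') as second_file:
        # enumerate yields (index, line) pairs, so no separate counter is needed.
        for index, line in enumerate(lines):
            if index < first_count:
                first_file.write(line)
            else:
                second_file.write(line)
    return first_count
```

Unlike the answer's version, this sketch opens the output files in 'w' mode, so running it twice overwrites the files instead of appending to them.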


Another way to do it:

from itertools import islice


def new_file(old_filename):
    with open(old_filename, 'r') as old_file:
        lines = list(old_file)
    # 70% of the lines go to the training file, truncated to an integer.
    training_file_line_count = int(len(lines) * 0.7)
    # A single iterator is shared by both writes below.
    lines_iter = iter(lines)
    with open('A' + old_filename, 'w') as training_file:
        training_file.writelines(islice(lines_iter, training_file_line_count))
    # Whatever islice left in the iterator is the testing data.
    with open('B' + old_filename, 'w') as testing_data_file:
        testing_data_file.writelines(lines_iter)
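
This approach relies on islice consuming the first part of a shared iterator, so that the second writelines call only sees what remains. A minimal sketch of that behaviour on an in-memory list, with illustrative sample data:

```python
from itertools import islice

lines = ["row %d\n" % i for i in range(10)]
training_count = int(len(lines) * 0.7)

# One iterator is shared by both steps.
lines_iter = iter(lines)

# islice pulls exactly training_count items from the shared iterator...
training_lines = list(islice(lines_iter, training_count))
# ...so whatever is still in it afterwards is the testing portion.
testing_lines = list(lines_iter)
```

Passing the list itself to islice, instead of the shared iterator, would start again from the first line each time, which is why iter(lines) is created once and reused.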