Reading ASCII data with non-uniform rows - Python

I am trying to read ASCII data whose rows are not uniform, for example:

4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061 
    73540.4807 10 
4 0.0245689377 -0.805261448 -0.152373497 0.573006386 -39801.696 49084.2418 
    16665.3857 10 
4 0.0427767979 -0.0185129676 -0.143135691 -0.989529911 38770.6518 
-70784.7024 32640.6307 10 
4 0.0262684678 0.137741 -0.820259709 -0.555158921 25293.3918 -51148.4003 
-126522.859 10 
4 0.145932295 0.466618154 -0.00805648931 -0.88442218 90951.8483 19221.4234 
-40205.3438 10 
4 0.0907820906 0.584060054 -0.671576188 0.455915866 -78193.2124 -31269.5848 
    47260.338 10 
4 0.0794897928 0.654042761 0.537625452 0.532153117 24643.9195 39614.3788 
    97184.4856 10 
4 0.0896920622 -0.517384933 -0.609729743 -0.600451889 -17455.9074 -17601.0439 
-13991.5163 10 
4 0.0295554749 -0.53757783 -0.3710939 0.757165368 20106.124 -171013.738 
-14052.1145 10 
4 0.0189505245 -0.773354757 -0.0747623556 -0.629549847 -71468.2726 
-53145.1259 36948.4058 10

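The problem is that I need to read every two lines as a single row. I have tried pandas.read_csv and numpy.genfromtxt, but they read the lines and split them into separate rows. I tried to merge every two lines without success because, as you can see, a record is sometimes split into lines of 7 and 2 columns and sometimes into lines of 6 and 3 columns. There are 9 columns to read in total.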

Source

2017-07-24 nandhos

Something like this should work.

Put your data in a string or a file and process it with Python. Then, once the data is in the shape you want, you can use pandas.

string1 = '''4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061 
    73540.4807 10 
4 0.0245689377 -0.805261448 -0.152373497 0.573006386 -39801.696 49084.2418 
    16665.3857 10 
4 0.0427767979 -0.0185129676 -0.143135691 -0.989529911 38770.6518 
-70784.7024 32640.6307 10 
4 0.0262684678 0.137741 -0.820259709 -0.555158921 25293.3918 -51148.4003 
-126522.859 10 
4 0.145932295 0.466618154 -0.00805648931 -0.88442218 90951.8483 19221.4234 
-40205.3438 10 
4 0.0907820906 0.584060054 -0.671576188 0.455915866 -78193.2124 -31269.5848 
    47260.338 10 
4 0.0794897928 0.654042761 0.537625452 0.532153117 24643.9195 39614.3788 
    97184.4856 10 
4 0.0896920622 -0.517384933 -0.609729743 -0.600451889 -17455.9074 -17601.0439 
-13991.5163 10 
4 0.0295554749 -0.53757783 -0.3710939 0.757165368 20106.124 -171013.738 
-14052.1145 10 
4 0.0189505245 -0.773354757 -0.0747623556 -0.629549847 -71468.2726 
-53145.1259 36948.4058 10''' 

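splitted = string1.splitlines()
result = ""
for index, item in enumerate(splitted):
    if index % 2 != 0:
        # odd index: continuation line, so the record ends here
        result += item.strip() + "\n"
    else:
        # even index: first part of a record; keep one separating space
        result += item.strip() + " "
print(result)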

4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061 73540.4807 10 
4 0.0245689377 -0.805261448 -0.152373497 0.573006386 -39801.696 49084.2418 16665.3857 10 
4 0.0427767979 -0.0185129676 -0.143135691 -0.989529911 38770.6518 -70784.7024 32640.6307 10 
4 0.0262684678 0.137741 -0.820259709 -0.555158921 25293.3918 -51148.4003 -126522.859 10 
4 0.145932295 0.466618154 -0.00805648931 -0.88442218 90951.8483 19221.4234 -40205.3438 10 
4 0.0907820906 0.584060054 -0.671576188 0.455915866 -78193.2124 -31269.5848 47260.338 10 
4 0.0794897928 0.654042761 0.537625452 0.532153117 24643.9195 39614.3788 97184.4856 10 
4 0.0896920622 -0.517384933 -0.609729743 -0.600451889 -17455.9074 -17601.0439 -13991.5163 10 
4 0.0295554749 -0.53757783 -0.3710939 0.757165368 20106.124 -171013.738 -14052.1145 10
4 0.0189505245 -0.773354757 -0.0747623556 -0.629549847 -71468.2726 -53145.1259 36948.4058 10
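The same pairing can be written without tracking the index, by slicing the list of lines into even and odd positions and zipping the two halves. This is only a sketch under the assumption that every record spans exactly two physical lines; the name `sample` is illustrative and holds two records copied from the question.

```python
# Two records from the question, each split across two physical lines.
sample = (
    "4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061 \n"
    "    73540.4807 10 \n"
    "4 0.0427767979 -0.0185129676 -0.143135691 -0.989529911 38770.6518 \n"
    "-70784.7024 32640.6307 10 \n"
)

lines = sample.splitlines()

# lines[0::2] are the first parts, lines[1::2] the continuation lines.
# strip() removes the stray padding so the halves join with exactly one space.
merged = [a.strip() + " " + b.strip() for a, b in zip(lines[0::2], lines[1::2])]

print("\n".join(merged))
```

Note that `zip` stops at the shorter sequence, so a trailing unpaired line would be dropped silently; checking `len(lines) % 2 == 0` first is a reasonable safeguard.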

Or, if you are reading the data from a file:

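# read the whole file; the with-block closes it afterwards
with open('/path/original.txt', 'r') as data:
    string1 = data.read()
splitted = string1.splitlines()
result = ""
for index, item in enumerate(splitted):
    if index % 2 != 0:
        result += item.strip() + "\n"
    else:
        result += item.strip() + " "
# write the merged rows to a new file
with open('/path/new_data.txt', 'w') as new_data:
    new_data.write(result)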
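If the intermediate file is not needed, the merged string can probably be handed to pandas directly through an in-memory buffer. The following is a minimal sketch, assuming `result` holds the merged text built above; here it is filled with two of the merged rows so the snippet runs on its own.

```python
import io

import pandas as pd

# Two merged rows from the data above; in practice use the full `result` string.
result = (
    "4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061 73540.4807 10\n"
    "4 0.0245689377 -0.805261448 -0.152373497 0.573006386 -39801.696 49084.2418 16665.3857 10\n"
)

# io.StringIO wraps the string in a file-like object that read_csv accepts.
# header=None: the data has no header row, so the columns are numbered 0..8.
df = pd.read_csv(io.StringIO(result), sep=r"\s+", header=None)

print(df.shape)
```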

Source

2017-07-24 16:23:57

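Thanks, I only added the following to the code. To read the string I used data = open('/path/original.txt', 'r') and then string1 = data.read(). After running all the code I needed to save the reformatted string, so I wrote a new file with new_data = open('/path/new_data.txt', 'w') and then new_data.write(result). After that, I read it with pandas! Maybe you could add more detail to your answer. Thanks again. – nandhos

Done! I just added the quotes that were missing in the previous version. – nandhos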

If it were me, I would do it this way:

import re 
with open('data.txt') as f: 
    s = f.read().strip() 
L = [float(i) for i in re.split(r'\s+', s)] 
LL = [L[i:i+9] for i in range(0, len(L), 9)] 
print(LL)

[[4.0, 0.0790926412, -0.199457773, 0.325952223, 0.924105917, 48915.3072, -2086.17061, 73540.4807, 10.0],
 [4.0, 0.0245689377, -0.805261448, -0.152373497, 0.573006386, -39801.696, 49084.2418, 16665.3857, 10.0],
 [4.0, 0.0427767979, -0.0185129676, -0.143135691, -0.989529911, 38770.6518, -70784.7024, 32640.6307, 10.0],
 [4.0, 0.0262684678, 0.137741, -0.820259709, -0.555158921, 25293.3918, -51148.4003, -126522.859, 10.0],
 [4.0, 0.145932295, 0.466618154, -0.00805648931, -0.88442218, 90951.8483, 19221.4234, -40205.3438, 10.0],
 [4.0, 0.0907820906, 0.584060054, -0.671576188, 0.455915866, -78193.2124, -31269.5848, 47260.338, 10.0],
 [4.0, 0.0794897928, 0.654042761, 0.537625452, 0.532153117, 24643.9195, 39614.3788, 97184.4856, 10.0],
 [4.0, 0.0896920622, -0.517384933, -0.609729743, -0.600451889, -17455.9074, -17601.0439, -13991.5163, 10.0],
 [4.0, 0.0295554749, -0.53757783, -0.3710939, 0.757165368, 20106.124, -171013.738, -14052.1145, 10.0],
 [4.0, 0.0189505245, -0.773354757, -0.0747623556, -0.629549847, -71468.2726, -53145.1259, 36948.4058, 10.0]]
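Since the question mentions numpy, the same idea (ignore the line breaks, split on any whitespace, then regroup into 9 columns) can also be sketched with a numpy array and `reshape`. This is an illustration rather than part of the answer above: `sample` and `N_COLS` are names chosen here, and the text is two records copied from the question.

```python
import numpy as np

# Two records from the question, with the line breaks in different places.
sample = (
    "4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061 \n"
    "    73540.4807 10 \n"
    "4 0.0427767979 -0.0185129676 -0.143135691 -0.989529911 38770.6518 \n"
    "-70784.7024 32640.6307 10 \n"
)

N_COLS = 9  # number of columns per record, as stated in the question

# str.split() with no argument splits on any run of whitespace, newlines included.
values = np.array(sample.split(), dtype=float)

# Guard against a truncated file before regrouping the flat array.
if values.size % N_COLS != 0:
    raise ValueError("number of values is not a multiple of %d" % N_COLS)

table = values.reshape(-1, N_COLS)  # one row per record
print(table.shape)
```

The size check matters because `reshape` would otherwise raise a less descriptive error when a value is missing.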

Source

2017-07-24 16:43:24 williezh

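Or like this, since you know that each record spans two lines.

Read two lines of input on each pass through the loop. When the first line is empty, there are no more lines available in the input file. Each time a pair of lines is read, first discard the line ending from the first line.

pandas can read 'csv' files that use whitespace instead of commas as the separator.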

>>> import pandas as pd
>>> with open('temp.txt') as infile, open('temp.csv', 'w') as the_csv:
...     while True:
...         first = infile.readline()
...         if not first:
...             break
...         second = infile.readline()
...         # join both halves with one space; r only hides write()'s return value
...         r = the_csv.write(first.strip() + ' ' + second.lstrip())
...
>>> df = pd.read_csv('temp.csv', sep=r'\s+')
>>> df 
   4  0.0790926412  -0.199457773  0.325952223  0.924105917   48915.3072  \
0  4      0.024569     -0.805261    -0.152373     0.573006  -39801.6960
1  4      0.042777     -0.018513    -0.143136    -0.989530   38770.6518
2  4      0.026268      0.137741    -0.820260    -0.555159   25293.3918
3  4      0.145932      0.466618    -0.008056    -0.884422   90951.8483
4  4      0.090782      0.584060    -0.671576     0.455916  -78193.2124
5  4      0.079490      0.654043     0.537625     0.532153   24643.9195
6  4      0.089692     -0.517385    -0.609730    -0.600452  -17455.9074
7  4      0.029555     -0.537578    -0.371094     0.757165   20106.1240
8  4      0.018951     -0.773355    -0.074762    -0.629550  -71468.2726

    -2086.17061    73540.4807  10
0    49084.2418    16665.3857  10
1   -70784.7024    32640.6307  10
2   -51148.4003  -126522.8590  10
3    19221.4234   -40205.3438  10
4   -31269.5848    47260.3380  10
5    39614.3788    97184.4856  10
6   -17601.0439   -13991.5163  10
7  -171013.7380   -14052.1145  10
8   -53145.1259    36948.4058  10
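In the frame printed above, the first record has become the column labels and only nine rows (0 to 8) remain. That is consistent with `read_csv` treating the first line as a header by default. A small sketch of the difference, using two merged rows from the question and an in-memory buffer in place of temp.csv (both choices are assumptions made to keep the snippet self-contained):

```python
import io

import pandas as pd

text = (
    "4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061 73540.4807 10\n"
    "4 0.0245689377 -0.805261448 -0.152373497 0.573006386 -39801.696 49084.2418 16665.3857 10\n"
)

# Default behaviour: the first line is consumed as column names.
with_default = pd.read_csv(io.StringIO(text), sep=r"\s+")

# header=None keeps every line as data and numbers the columns 0..8.
without_header = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

print(len(with_default), len(without_header))
```

Passing `header=None` to the `read_csv` call on temp.csv should therefore keep all ten records as data rows.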

Source

2017-07-24 17:04:52
