Convert a list of strings to a NumPy array (Python)

I'm trying to extract some data from a text file. At the moment I can pick out the correct lines that contain the data, which makes my output look like this:

[ 0.2  0.148 100. ] 
[ 0.3  0.222 100. ] 
[ 0.4  0.296 100. ] 
[ 0.5  0.37 100. ] 
[ 0.6  0.444 100. ]

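So basically I have 5 separate lists, each holding one string. As you can imagine, I would like to combine all of these into a single numpy array, with each string split into its 3 values. Like this: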

[[0.2, 0.148, 100], 
[0.3, 0.222, 100], 
[0.4, 0.296, 100], 
[0.5, 0.37, 100], 
[0.6, 0.444, 100]]

But since the separator in the output is random, i.e. I don't know whether it is 3 spaces, 5 spaces or a tab, I'm kind of lost on how to do this.

UPDATE：

The data looks something like this:

data_file = 

Equiv. Sphere Diam. [cm]: 6.9 
Conformity Index: N/A 
Gradient Measure [cm]: N/A 

Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%] 
       0     0      100 
       0.1    0.074      100 
       0.2    0.148      100 
       0.3    0.222      100 
       0.4    0.296      100 
       0.5    0.37      100 
       0.6    0.444      100 
       0.7    0.518      100 
       0.8    0.592      100 

Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1) 
Dose Cover.[%]: 100.0 
Sampling Cover.[%]: 100.0 

Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%] 
       0     0      100 
       0.1    0.074      100 
       0.2    0.148      100 
       0.3    0.222      100 
       0.4    0.296      100 
       0.5    0.37      100 
       0.6    0.444      100

And the code to get the lines is:

import numpy as np

with open(data_file) as input_data:
    # Skip the text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Read text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        text_line = np.fromstring(line, sep='\t')
        print(text_line)

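The text before the data also varies, so I can't just say "skip the first 5 lines". However, the header line is always the same, and so is the line that ends the block (just before the next data set starts). So I just need a way to get the raw data and put it into an array, and then I can work with it from there.

Hopefully it makes more sense now.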

Source

2017-03-13 Denver Dang

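Use a regular expression to split on '\s+' – BlackBear

Is the input supposed to be a string, given the missing quotes? – languitar

It doesn't have quotes, that's for sure. If it's not a string, what is the correct term? –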

When you print text_line, you see each array formatted as a string. The arrays are formatted individually, so the columns don't line up.

[ 0.2  0.148 100. ] 
[ 0.3  0.222 100. ] 
[ 0.4  0.296 100. ] 
[ 0.5  0.37 100. ] 
[ 0.6  0.444 100. ]

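Instead of printing, you can collect the values in a list and concatenate them at the end.

I haven't actually tested it, but I think this will work: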

import numpy as np

data = []
with open(data_file) as input_data:
    # Skip the text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Read text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        arr_line = np.fromstring(line, sep='\t')
        # Blank lines parse to an empty array; skip them so vstack gets equal-length rows
        if len(arr_line) > 0:
            data.append(arr_line)
data = np.vstack(data)  # stack the 1-D rows into one 2-D array
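
As a self-contained check of the collect-then-stack idea, here is a minimal sketch. It makes a few assumptions that are not in the original post: the file is replaced by an in-memory `SAMPLE` string, the helper name `read_block` is made up for illustration, and `line.split()` is swapped in for `np.fromstring` because it splits on any run of spaces or tabs.

```python
import io
import numpy as np

HEADER = 'Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]'
FOOTER = 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)'

# Hypothetical sample mirroring the layout in the question (mixed spaces and tabs).
SAMPLE = """Equiv. Sphere Diam. [cm]: 6.9
Conformity Index: N/A

Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]
       0     0      100
       0.1\t0.074\t100
       0.2    0.148      100

Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)
Dose Cover.[%]: 100.0
"""


def read_block(lines):
    """Collect the numeric rows between HEADER and FOOTER into a 2-D array."""
    lines = iter(lines)
    for line in lines:  # skip everything up to and including the header
        if line.strip() == HEADER:
            break
    rows = []
    for line in lines:  # same iterator, so this continues after the header
        if line.strip() == FOOTER:
            break
        fields = line.split()  # splits on any run of spaces or tabs
        if fields:  # ignore blank lines
            rows.append(np.array(fields, dtype=float))
    return np.vstack(rows)


block = read_block(io.StringIO(SAMPLE))
print(block.shape)
```

With a real file, the same function should work on the open file object, e.g. `read_block(open(data_file))`, since it only needs something that yields lines.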

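Another option is to collect the lines without parsing them and pass them to np.genfromtxt. In other words, use your code as a filter that feeds the right lines to the numpy function. genfromtxt accepts input from anything that supplies it with lines: a file, a list, or a generator.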

import numpy as np

def filter_block(input_data):  # renamed from filter to avoid shadowing the builtin
    # Skip the text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Read text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        yield line

with open(data_file) as f:
    data = np.genfromtxt(filter_block(f))  # default delimiter: any whitespace
print(data)
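
To try the generator approach without the original file, the sketch below feeds `np.genfromtxt` from an in-memory string. The `SAMPLE` text and the name `block_lines` are assumptions made for illustration; the header and end-of-block strings are the ones from the question.

```python
import io
import numpy as np

HEADER = 'Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]'
FOOTER = 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)'

# Hypothetical stand-in for the real file contents.
SAMPLE = """Conformity Index: N/A
Gradient Measure [cm]: N/A

Relative dose [%]   Dose [Gy] Ratio of Total Structure Volume [%]
       0.2    0.148      100
       0.3\t0.222\t100
       0.4    0.296      100

Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)
Sampling Cover.[%]: 100.0
"""


def block_lines(lines):
    """Yield only the lines between HEADER and FOOTER, unparsed."""
    lines = iter(lines)
    for line in lines:  # consume lines up to and including the header
        if line.strip() == HEADER:
            break
    for line in lines:  # continue from the line after the header
        if line.strip() == FOOTER:
            break
        yield line


# With no delimiter argument, genfromtxt splits on any whitespace.
data = np.genfromtxt(block_lines(io.StringIO(SAMPLE)))
print(data.shape)
```

One thing to keep in mind: if the block contains only a single data row, genfromtxt returns a 1-D array, so `np.atleast_2d(data)` may be useful when a 2-D shape is required.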

Source

2017-03-13 16:44:59 hpaulj

Given a text file called tmp.txt like this:

0.2  0.148 100. 
    0.3  0.222 100. 
    0.4  0.296 100. 
    0.5  0.37 100. 
    0.6  0.444 100.

this snippet:

with open('tmp.txt', 'r') as in_file:
    # split() with no arguments splits on any run of spaces or tabs
    print([[float(x) for x in line.split()] for line in in_file])

will output:

[[0.2, 0.148, 100.0], [0.3, 0.222, 100.0], [0.4, 0.296, 100.0], [0.5, 0.37, 100.0], [0.6, 0.444, 100.0]]

Which is hopefully what you want.
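
The snippet above yields a list of lists rather than a NumPy array. A minimal sketch of the last step follows, assuming the file only contains numeric rows; `TEXT` is an in-memory stand-in for tmp.txt, and `np.loadtxt` is shown as an alternative that was not part of the original answer.

```python
import io
import numpy as np

# In-memory stand-in for tmp.txt (assumed to contain only numeric rows).
TEXT = """0.2  0.148 100.
    0.3  0.222 100.
    0.4  0.296 100.
    0.5  0.37 100.
    0.6  0.444 100.
"""

# Parse by hand, then convert the nested list into a 2-D array.
with io.StringIO(TEXT) as in_file:
    rows = [[float(x) for x in line.split()] for line in in_file]
arr = np.array(rows)

# Alternative: let NumPy do the whitespace splitting itself.
arr_loadtxt = np.loadtxt(io.StringIO(TEXT))

print(arr.shape)
```

Both routes only work when every line has the same number of values, which is why the surrounding header text has to be filtered out first for the full file in the question.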

Source

2017-03-13 13:23:09 Szabolcs

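The problem (I think) is that I am parsing a whole .txt file that contains a lot more than just the values shown. So I'm not sure this procedure would work? (I updated my question, so it may make more sense now) –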

1) Add before the with open:

import re 
d_input = []

2) Replace

text_line = np.fromstring(line, sep='\t')
print(text_line)

with

if line.strip():  # skip blank lines, which float() cannot parse
    d_input.append([float(x) for x in re.sub(r'\s+', ',', line.strip()).split(',')])

3) Add at the bottom:

d_array = np.array(d_input)
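
Put together, the three steps amount to something like the sketch below. It is only an illustration: `raw_lines` is made-up input standing in for the lines read between the header and the end of the block, and `re.split` is used directly instead of `re.sub` followed by `split(',')`, which should give the same fields.

```python
import re
import numpy as np

# Hypothetical lines as they might come out of the file, with mixed separators.
raw_lines = [
    "       0.2    0.148      100 \n",
    "\t0.3\t0.222\t100\n",
    "   \n",  # blank line, skipped below
    "0.4 \t 0.296   100\n",
]

d_input = []
for line in raw_lines:
    stripped = line.strip()
    if not stripped:  # nothing to parse on a blank line
        continue
    # r'\s+' matches any run of spaces and/or tabs
    d_input.append([float(x) for x in re.split(r'\s+', stripped)])

d_array = np.array(d_input)
print(d_array.shape)
```

For plain whitespace, `stripped.split()` with no arguments behaves the same way, so the regular expression is optional here.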

Source

2017-03-13 13:46:29
