Why is this truth value ambiguous?

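I'm completely baffled as to why this code gives me a ValueError; any help is appreciated.

I have a DataFrame named global_output with two columns: a column of words and a column of corresponding values. I want to perform a median split on the values and assign the words to two lists, high and low, depending on whether they are above or below the median.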

         Word  Ranking
0     shuttle   0.9075
1      flying   0.7750
2      flight   0.7250
3        trip   0.6775
4   transport   0.6250
5      escape   0.5850
6  trajectory   0.5250
7   departure   0.5175
8     arrival   0.5175

My code for doing this is as follows:

split = global_output['Abstraction'].quantile([0.5]) 

high = [] 
low = [] 


for j in range(len(global_output)): 
    if global_output['Ranking'][j] > split: 
     low_clt.append(global_output['Word'][j]) 
    else: 
     high.append(global_output['Word'][j])

However, I keep getting this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

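Now, I understand what the error means: it is saying that I am trying to evaluate a Series with multiple values as if it were a single value. However, I really can't see how

global_output['Ranking'][j]

can be ambiguous in any way when j takes integer values from the loop. When I type it into the console, it returns a float every time. What am I missing here?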

2017-02-21 Lodore66

You are working with arrays, so it is better to use boolean indexing with a mask, and loc to select the column:

# if you need the Abstraction column instead, change the column name here
split = global_output['Ranking'].quantile([0.5]).item() 
print (split) 
0.625 

mask = global_output['Ranking'] <= split 
print (mask) 
0 False 
1 False 
2 False 
3 False 
4  True 
5  True 
6  True 
7  True 
8  True 
Name: Ranking, dtype: bool 

high = global_output.loc[~mask, 'Word'].tolist() 
low = global_output.loc[mask, 'Word'].tolist() 

print (high) 
['shuttle', 'flying', 'flight', 'trip'] 

print (low) 
['transport', 'escape', 'trajectory', 'departure', 'arrival']

Your solution also works; you only need to convert the one-item Series to a scalar with item(), and it seems the > has to be <:

split = global_output['Ranking'].quantile([0.5]) 
print (split) 
0.5 0.625 
Name: Ranking, dtype: float64 

split = global_output['Ranking'].quantile([0.5]).item() 
print (split) 
0.625
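
For reference, here is a minimal sketch of the loop with both fixes applied. It is an illustration rather than the asker's exact code: the DataFrame is rebuilt from the sample data in the question, the split is taken from the Ranking column, and quantile(0.5) is called with a float instead of a list so that it returns a scalar directly (an alternative to calling item()). The comparison direction is chosen so that values above the median go into high.

```python
import pandas as pd

# Sample data copied from the question.
global_output = pd.DataFrame({
    "Word": ["shuttle", "flying", "flight", "trip", "transport",
             "escape", "trajectory", "departure", "arrival"],
    "Ranking": [0.9075, 0.7750, 0.7250, 0.6775, 0.6250,
                0.5850, 0.5250, 0.5175, 0.5175],
})

# A float argument (not a list) makes quantile() return a scalar.
split = global_output["Ranking"].quantile(0.5)

high = []
low = []

# Walk the two columns in parallel; every comparison is scalar vs. scalar.
for word, ranking in zip(global_output["Word"], global_output["Ranking"]):
    if ranking > split:
        high.append(word)
    else:
        low.append(word)

print(high)
print(low)
```

With the sample data this should produce the same two lists as the mask version above, since a value equal to the median falls into low in both.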

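And you get the error because you are comparing a scalar with a one-item Series, so the result of the comparison is itself a Series rather than a single True or False.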
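
A small sketch that reproduces the problem in isolation may make this clearer. The three-value Series below is made up for illustration; the point is only that quantile([0.5]) returns a Series, that comparing a float with it returns a boolean Series, and that if cannot reduce that Series to a single truth value.

```python
import pandas as pd

# Hypothetical three-value Series, just to reproduce the behaviour.
ranking = pd.Series([0.9075, 0.6250, 0.5175], name="Ranking")

split_series = ranking.quantile([0.5])  # list argument -> one-item Series
split_scalar = ranking.quantile(0.5)    # float argument -> scalar

# float > Series is evaluated element-wise and returns a boolean Series.
comparison = ranking[0] > split_series
print(type(comparison).__name__)

# `if` asks the Series for a single truth value, which pandas refuses to give.
message = ""
try:
    if comparison:
        pass
except ValueError as exc:
    message = str(exc)
print(message)

# With a scalar split the comparison is an ordinary boolean.
print(bool(ranking[0] > split_scalar))
```

So global_output['Ranking'][j] really is a float; the ambiguity comes from the other side of the comparison.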

2017-02-21 13:19:27 jezrael

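Ah, I see! The ambiguity comes from global_output['Ranking'].quantile([0.5]). That solves the problem nicely. Thanks for the really useful suggestion about masking! – Lodore66

Exactly, you are right: two scalars need to be compared. – jezrael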
