2014-05-12 42 views
0

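Identify price swings/trends in a pandas DataFrame with stock quotes

I have a pandas DataFrame with a DatetimeIndex and OHLCV stock quote columns. I want to extract the price swings/trends that meet a specific threshold: upward swings (up legs) of more than 0.3 USD and downward swings (down legs) beyond -0.3 USD.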

df[:10] 
          close high low open volume 
2014-05-09 09:30:00-04:00 187.5600 187.73 187.54 187.700 1922600 
2014-05-09 09:31:00-04:00 187.4900 187.56 187.42 187.550 534400 
2014-05-09 09:32:00-04:00 187.4200 187.51 187.35 187.490 224800 
2014-05-09 09:33:00-04:00 187.5500 187.58 187.39 187.400 303700 
2014-05-09 09:34:00-04:00 187.6700 187.67 187.53 187.560 438200 
2014-05-09 09:35:00-04:00 187.6000 187.71 187.56 187.680 296400 
2014-05-09 09:36:00-04:00 187.4100 187.67 187.38 187.600 329900 
2014-05-09 09:37:00-04:00 187.3100 187.44 187.28 187.400 404000 
2014-05-09 09:38:00-04:00 187.2600 187.37 187.26 187.300 912800 
2014-05-09 09:39:00-04:00 187.2200 187.28 187.12 187.250 607700 

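After studying the pandas documentation, it looks like DataFrame.apply() would be the way to go, but I am stuck on building the function(s). Since my coding skills are limited, I would appreciate some help.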

global row_nr 
row_nr = 1 
def extract_swings() 
    if row_nr == 1 : pivot = row.open ; row_nr += 1 
    else : if (row.high-pivot) >= 0.3 : ???? 
    ... ???? 

df['swings'] = df.apply(extract_swings, axis=1) 

The result should look like this:

df['swings'][:10] 
2014-05-09 09:30:00-04:00 NaN 
2014-05-09 09:31:00-04:00 NaN 
2014-05-09 09:32:00-04:00 -0.35 
2014-05-09 09:33:00-04:00 NaN 
2014-05-09 09:34:00-04:00 NaN 
2014-05-09 09:35:00-04:00 0.36 
2014-05-09 09:36:00-04:00 NaN 
2014-05-09 09:37:00-04:00 NaN 
2014-05-09 09:38:00-04:00 NaN 
2014-05-09 09:39:00-04:00 -0.59 

UPDATE: To avoid any confusion, here is how the requested function should walk through the DataFrame:

      close high low open volume 
2014-05-09 09:30:00-04:00 187.5600 187.73 187.54 187.700 1922600 
# this is the first line, first minute and we will take row.open 187.70 as \ 
# the starting point or first pivot 
2014-05-09 09:31:00-04:00 187.4900 187.56 187.42 187.550 534400 
# next minute we check if either (row.high - pivot) >= 0.3 or \ 
# (row.low-pivot) <= -0.3. Neither is true so nothing to do here. 
2014-05-09 09:32:00-04:00 187.4200 187.51 187.35 187.490 224800 
# next minute same check ... we see that row.low-pivot = -0.35. \ 
# We consider 187.35 a second pivot and the diff value -0.35 a first trend down 
2014-05-09 09:33:00-04:00 187.5500 187.58 187.39 187.400 303700 
# next minute we check if the identified trend/swing down goes further \ 
# down by having a row.low lower than previous row.low. If we would \ 
# have found here a new lower row.low that would be the second pivot \ 
# and we would forget about 187.35 as being a pivot ... and so on. \ 
# We don't see that on this row, instead we see prices are higher than \ 
# previous row, so we start checking the diff for a potential up trend \ 
# starting from second pivot 187.35. As long as we do not encounter a \ 
# higher high with over 0.3 above last pivot we are still within the identified down trend. 
2014-05-09 09:34:00-04:00 187.6700 187.67 187.53 187.560 438200 
# we don't see a lower low to reconsider the second pivot neither \ 
# a (row.high- second_pivot) >= 0.3 
2014-05-09 09:35:00-04:00 187.6000 187.71 187.56 187.680 296400 
# here we see (row.high- second_pivot) = 0.36. We consider 187.71 as \ 
# a third_pivot and the diff value 0.36 as an up trend (from second pivot to here) 
2014-05-09 09:36:00-04:00 187.4100 187.67 187.38 187.600 329900 
# next minute we check if the identified trend/swing up goes further up \ 
# by having a row.high higher than third pivot. If we would have found here \ 
# a new higher row.high that would be the third pivot and we would forget \ 
# about 187.71 as being a pivot ... and so on. We don't see that on this row,\ 
# instead we see prices are lower than previous row, so we start \ 
# checking the diff for a potential down trend starting from third \ 
# pivot 187.71. As long as we do not encounter a lower low with \ 
# over 0.3 below last pivot we are still within the identified up trend. 
2014-05-09 09:37:00-04:00 187.3100 187.44 187.28 187.400 404000 
# we find here a (row.low - third_pivot) = -0.43 so we have identified \ 
# a new down trend starting from third pivot and now we have a potential\ 
# fourth pivot 187.28 
2014-05-09 09:38:00-04:00 187.2600 187.37 187.26 187.300 912800 
# we find here a lower low so we don't consider 187.28 the fourth \ 
# pivot anymore but this lower low 187.26 
2014-05-09 09:39:00-04:00 187.2200 187.28 187.12 187.250 607700 
# we find here a lower low so we don't consider 187.26 the fourth pivot anymore \ 
# but this lower low 187.12. Being this the lowest low we consider this one \ 
# to be the fourth pivot and the diff 187.12-187.71=-0.59 as a downtrend with that value 
+0

I need something very similar to the solution in this ZigZag library: [link](http://nbviewer.ipython.org/github/jbn/ZigZag/blob/master/zigzag_demo.ipynb)

Answers

1

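This is a bit tricky, because you cannot mark a point as a pivot until you have found the next potential pivot (that is, if you are in an uptrend, you cannot say it is finished until you find a low that is low enough).

This code does the trick. I have put your data in a tmpData.txt file for ease of use, and it produces the desired result. Please check: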

import numpy as np
import pandas as pd

def get_pivots():
    # Load the sample quotes; the first column holds the timestamps
    data = pd.read_csv('tmpData.txt', index_col=0, parse_dates=True)
    data['swings'] = np.nan
    swings = data.columns.get_loc('swings')  # column position for iloc

    pivot = data['open'].iloc[0]
    last_pivot_id = 0
    up_down = 0

    diff = .3

    for i in range(len(data)):
        row = data.iloc[i]

        # We don't have a trend yet
        if up_down == 0:
            if row.low < pivot - diff:
                data.iloc[i, swings] = row.low - pivot
                pivot, last_pivot_id = row.low, i
                up_down = -1
            elif row.high > pivot + diff:
                data.iloc[i, swings] = row.high - pivot
                pivot, last_pivot_id = row.high, i
                up_down = 1

        # Current trend is up
        elif up_down == 1:
            # If got higher than last pivot, update the swing
            if row.high > pivot:
                # Remove the last pivot, as it wasn't a real one
                data.iloc[i, swings] = data.iloc[last_pivot_id, swings] + (row.high - data['high'].iloc[last_pivot_id])
                data.iloc[last_pivot_id, swings] = np.nan
                pivot, last_pivot_id = row.high, i
            elif row.low < pivot - diff:
                data.iloc[i, swings] = row.low - pivot
                pivot, last_pivot_id = row.low, i
                # Change the trend indicator
                up_down = -1

        # Current trend is down
        elif up_down == -1:
            # If got lower than last pivot, update the swing
            if row.low < pivot:
                # Remove the last pivot, as it wasn't a real one
                data.iloc[i, swings] = data.iloc[last_pivot_id, swings] + (row.low - data['low'].iloc[last_pivot_id])
                data.iloc[last_pivot_id, swings] = np.nan
                pivot, last_pivot_id = row.low, i
            # A reversal up needs a high more than diff ABOVE the pivot
            elif row.high > pivot + diff:
                data.iloc[i, swings] = row.high - pivot
                pivot, last_pivot_id = row.high, i
                # Change the trend indicator
                up_down = 1

    print(data)

Output:

date     close high low  open volume swings            
2014-05-09 13:30:00 187.56 187.73 187.54 187.70 1922600  NaN 
2014-05-09 13:31:00 187.49 187.56 187.42 187.55 534400  NaN 
2014-05-09 13:32:00 187.42 187.51 187.35 187.49 224800 -0.35 
2014-05-09 13:33:00 187.55 187.58 187.39 187.40 303700  NaN 
2014-05-09 13:34:00 187.67 187.67 187.53 187.56 438200  NaN 
2014-05-09 13:35:00 187.60 187.71 187.56 187.68 296400 0.36 
2014-05-09 13:36:00 187.41 187.67 187.38 187.60 329900  NaN 
2014-05-09 13:37:00 187.31 187.44 187.28 187.40 404000  NaN 
2014-05-09 13:38:00 187.26 187.37 187.26 187.30 912800  NaN 
2014-05-09 13:39:00 187.22 187.28 187.12 187.25 607700 -0.59 
+0

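Yes @Pewel-Kozela ... thank you for your effort. At first check it seems to give the desired result ... but looping over more than 1 million rows in plain Python takes a very long time. If possible, I would prefer a vectorized solution ...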

+0

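Hey, I am not sure it can be done with an apply() operation, because evaluating the current row requires the history of the stock up to that row. What kind of speed-up factor would suit you? I have checked my solution: using iterrows() cuts the execution time by a factor of 4, but for 1.5 million rows it still takes around 1 min 30 s.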

+0

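It looks like Pawel's solution is the only one that works ... after all. I made a few small adjustments: 1. replaced "irow(0)" with "iloc[0]", since irow is going to be deprecated; 2. used iterrows(), which made it a bit faster (2m51s, down from 5+ min, for 1.5 million rows); 3. replaced the "row.low < pivot - diff" and "row.high > pivot + diff" comparisons with ratio tests of the form "((row.high - pivot) / diff) > 1", because I was getting swings below the threshold ... I can't explain why!?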
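One possible explanation for the threshold oddity in point 3 (an assumption; nothing in the thread confirms it) is binary floating-point rounding: prices such as 187.70 cannot be stored exactly, so a move that is 0.30 on paper may compare as slightly more or less than 0.3. A minimal sketch of one workaround, comparing in integer cents instead of dollars. The helper name `to_cents` and the 30-cent threshold are illustrative, not part of the original answer.

```python
def to_cents(price):
    """Convert a dollar price to integer cents (hypothetical helper)."""
    return int(round(price * 100))


pivot, low = 187.70, 187.40
threshold_cents = 30  # 0.3 USD expressed in cents (assumed threshold)

# Integer arithmetic: the move is exactly -30 cents, so the test is exact.
move_cents = to_cents(low) - to_cents(pivot)
hit = move_cents <= -threshold_cents
```

Whether a move of exactly 0.30 should count as a swing is a separate design decision; the question's pseudocode uses `>=`, while the loop in the answer uses strict comparisons.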

0
import numpy as np 

df['diff'] = df['high'] - df['low'] 
df['result'] = df['diff'].apply(lambda x: x if abs(x)>=0.3 else np.nan) 

print(df['result'][:10])
+1

If you can vectorize, you should avoid using ``apply``; for example, ``df['diff'].where(df['diff'].abs() >= 0.3)`` will be much faster – Jeff
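A minimal sketch of the point made in this comment, comparing the element-wise `apply` from the answer with `Series.where`. The three rows of prices are made-up illustrative values, not taken from the question's data.

```python
import numpy as np
import pandas as pd

# Illustrative values only (not from the question's data set).
df = pd.DataFrame({"high": [10.5, 10.25, 11.0], "low": [10.0, 10.0, 10.5]})
df["diff"] = df["high"] - df["low"]

# Element-wise: the lambda is called once per row.
slow = df["diff"].apply(lambda x: x if abs(x) >= 0.3 else np.nan)

# Vectorized: keep values where the condition holds, NaN elsewhere.
fast = df["diff"].where(df["diff"].abs() >= 0.3)
```

Both produce the same Series here; the vectorized form avoids a Python-level function call per row. As the asker notes, neither computes the pivot-based swings the question asks for.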

+0

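This is something else: simpler, but different from what I need. The result should match the requested output exactly. That output is in fact a real result.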

+0

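I need to compare each row with the rows before it. We start from the open price of the first minute and call it the first pivot. Then we check each row for a high or a low that is at least 0.3 away from the first pivot. If there is one, we have a higher high or a lower low. We then go further to find where the swing ends, that is, where the price moves at least 0.3 the other way into another leg ... and so on. It is similar to peak_valley_pivots() found here: [link](https://github.com/jbn/ZigZag/blob/master/zigzag/__init__.py), but I would like it without nanda, using only pandas, and fast.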

0

How about this, assuming you only care about the highs:

startPx = df.open.iloc[0] 
level = ((df.high - startPx)/.3).astype(int) 
df['swings'] = level - level.shift(1) 

Now, to find out what the differences are, you just do this:

changes = df[df.swings != 0] 
diffs = changes.high - changes.open.shift(1) 
+0

Not a solution. Please see my row-by-row explanation above.

+0

Yeah, I was thinking about it this morning and realized it won't work, sorry about that. – acushner

0

So, I haven't tested this, but something like this should get you what you want. What happens if low < pivot - diff and high > pivot + diff in the same minute?

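import numpy as np

def f(df):
    pivot = df.open.iloc[0]
    diff = .3
    def proc(ser):
        # pivot lives in the enclosing scope and is reassigned below,
        # so it must be declared nonlocal (Python 3)
        nonlocal pivot
        res = np.nan
        if ser.low < pivot - diff:
            res, pivot = ser.low - pivot, ser.low
        elif ser.high > pivot + diff:
            res, pivot = ser.high - pivot, ser.high
        return res

    df['swings'] = df.apply(proc, axis=1)
    return df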
+0

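It doesn't work ... because the trick, as @Pewel-Kozela already identified, is that you cannot mark a point as a pivot until you have found the next potential pivot (e.g. if you are in an uptrend, you cannot say it is over until you find a low that is 0.3 lower). His solution works, but it is slow because it is a pure Python loop (~5 min for 1.5 million rows).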
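Because the state (current pivot and trend direction) depends on all earlier rows, the loop is hard to vectorize directly. One way to reduce the per-row overhead, offered as a sketch and not as a benchmarked result, is to run the same state machine over plain NumPy arrays instead of accessing pandas rows one at a time. The function name `swings_from_arrays` and the `start` variable are hypothetical; the branching order mirrors the loop-based answer above, and the sample values are the ten rows from the question.

```python
import numpy as np
import pandas as pd


def swings_from_arrays(open_, high, low, diff=0.3):
    """Return swing sizes per row (NaN where no pivot is recorded)."""
    n = len(high)
    swings = np.full(n, np.nan)
    if n == 0:
        return swings
    pivot = open_[0]   # first pivot: the first bar's open
    start = pivot      # pivot from which the current swing is measured
    last = -1          # index of the current (tentative) pivot
    trend = 0          # 0 = no trend yet, 1 = up, -1 = down
    for i in range(n):
        hi, lo = high[i], low[i]
        if trend == 1 and hi > pivot:            # uptrend: higher high
            swings[last] = np.nan                # earlier pivot was not final
            pivot, last = hi, i
            swings[i] = pivot - start
        elif trend == -1 and lo < pivot:         # downtrend: lower low
            swings[last] = np.nan
            pivot, last = lo, i
            swings[i] = pivot - start
        elif trend != -1 and lo < pivot - diff:  # first move or reversal down
            start, pivot, last, trend = pivot, lo, i, -1
            swings[i] = pivot - start
        elif trend != 1 and hi > pivot + diff:   # first move or reversal up
            start, pivot, last, trend = pivot, hi, i, 1
            swings[i] = pivot - start
    return swings


# The ten sample rows from the question (open, high, low only).
df = pd.DataFrame({
    "open": [187.70, 187.55, 187.49, 187.40, 187.56,
             187.68, 187.60, 187.40, 187.30, 187.25],
    "high": [187.73, 187.56, 187.51, 187.58, 187.67,
             187.71, 187.67, 187.44, 187.37, 187.28],
    "low":  [187.54, 187.42, 187.35, 187.39, 187.53,
             187.56, 187.38, 187.28, 187.26, 187.12],
})
df["swings"] = swings_from_arrays(
    df["open"].to_numpy(), df["high"].to_numpy(), df["low"].to_numpy()
)
```

On the sample rows this records swings of about -0.35, 0.36 and -0.59 at 09:32, 09:35 and 09:39, matching the expected output in the question. It is still a Python loop, so any speed-up on 1.5 million rows would need to be measured.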