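Identifying price swings/trends in a pandas DataFrame with stock quotes

I have a pandas DataFrame with a DatetimeIndex and OHLCV stock quote columns. I want to extract the price swings/trends that meet a specific threshold: up swings/trends greater than $0.30 and down swings/trends beyond -$0.30.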

df[:10] 
          close high low open volume 
2014-05-09 09:30:00-04:00 187.5600 187.73 187.54 187.700 1922600 
2014-05-09 09:31:00-04:00 187.4900 187.56 187.42 187.550 534400 
2014-05-09 09:32:00-04:00 187.4200 187.51 187.35 187.490 224800 
2014-05-09 09:33:00-04:00 187.5500 187.58 187.39 187.400 303700 
2014-05-09 09:34:00-04:00 187.6700 187.67 187.53 187.560 438200 
2014-05-09 09:35:00-04:00 187.6000 187.71 187.56 187.680 296400 
2014-05-09 09:36:00-04:00 187.4100 187.67 187.38 187.600 329900 
2014-05-09 09:37:00-04:00 187.3100 187.44 187.28 187.400 404000 
2014-05-09 09:38:00-04:00 187.2600 187.37 187.26 187.300 912800 
2014-05-09 09:39:00-04:00 187.2200 187.28 187.12 187.250 607700

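After studying the pandas documentation, it looks like DataFrame.apply() would be the way to go, but I am stuck on building the function(s). Since my coding skills are limited, I need some help.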

global row_nr 
row_nr = 1 
def extract_swings() 
    if row_nr == 1 : pivot = row.open ; row_nr += 1 
    else : if (row.high-pivot) >= 0.3 : ???? 
    ... ???? 

df['swings'] = df.apply(extract_swings, axis=1)

The result should look like this:

df['swings'][:10] 
2014-05-09 09:30:00-04:00 NaN 
2014-05-09 09:31:00-04:00 NaN 
2014-05-09 09:32:00-04:00 -0.35 
2014-05-09 09:33:00-04:00 NaN 
2014-05-09 09:34:00-04:00 NaN 
2014-05-09 09:35:00-04:00 0.36 
2014-05-09 09:36:00-04:00 NaN 
2014-05-09 09:37:00-04:00 NaN 
2014-05-09 09:38:00-04:00 NaN 
2014-05-09 09:39:00-04:00 -0.59

UPDATE: To avoid any confusion, here is how the requested function should walk through the DataFrame:

      close high low open volume 
2014-05-09 09:30:00-04:00 187.5600 187.73 187.54 187.700 1922600 
# this is the first line, first minute and we will take row.open 187.70 as \ 
# the starting point or first pivot 
2014-05-09 09:31:00-04:00 187.4900 187.56 187.42 187.550 534400 
# next minute we check if either (row.high - pivot) >= 0.3 or \ 
# (row.low-pivot) <= -0.3. Neither is true so nothing to do here. 
2014-05-09 09:32:00-04:00 187.4200 187.51 187.35 187.490 224800 
# next minute same check ... we see that row.low-pivot = -0.35. \ 
# We consider 187.35 a second pivot and the diff value -0.35 a first trend down 
2014-05-09 09:33:00-04:00 187.5500 187.58 187.39 187.400 303700 
# next minute we check if the identified trend/swing down goes further \ 
# down by having a row.low lower than previous row.low. If we would \ 
# have found here a new lower row.low that would be the second pivot \ 
# and we would forget about 187.35 as being a pivot ... and so on. \ 
# We don't see that on this row, instead we see prices are higher than \ 
# previous row, so we start checking the diff for a potential up trend \ 
# starting from second pivot 187.35. As long as we do not encounter a \ 
# higher high with over 0.3 above last pivot we are still within the identified down trend. 
2014-05-09 09:34:00-04:00 187.6700 187.67 187.53 187.560 438200 
# we don't see a lower low to reconsider the second pivot neither \ 
# a (row.high- second_pivot) >= 0.3 
2014-05-09 09:35:00-04:00 187.6000 187.71 187.56 187.680 296400 
# here we see (row.high- second_pivot) = 0.36. We consider 187.71 as \ 
# a third_pivot and the diff value 0.36 as an up trend (from second pivot to here) 
2014-05-09 09:36:00-04:00 187.4100 187.67 187.38 187.600 329900 
# next minute we check if the identified trend/swing up goes further up \ 
# by having a row.high higher than third pivot. If we would have found here \ 
# a new higher row.high that would be the third pivot and we would forget \ 
# about 187.71 as being a pivot ... and so on. We don't see that on this row,\ 
# instead we see prices are lower than previous row, so we start \ 
# checking the diff for a potential down trend starting from third \ 
# pivot 187.71. As long as we do not encounter a lower low with \ 
# over 0.3 below last pivot we are still within the identified up trend. 
2014-05-09 09:37:00-04:00 187.3100 187.44 187.28 187.400 404000 
# we find here a (row.low - third_pivot) = -0.43 so we have identified \ 
# a new down trend starting from third pivot and now we have a potential\ 
# fourth pivot 187.28 
2014-05-09 09:38:00-04:00 187.2600 187.37 187.26 187.300 912800 
# we find here a lower low so we don't consider 187.28 the fourth \ 
# pivot anymore but this lower low 187.26 
2014-05-09 09:39:00-04:00 187.2200 187.28 187.12 187.250 607700 
# we find here a lower low so we don't consider 187.26 the fourth pivot anymore \ 
# but this lower low 187.12. Being this the lowest low we consider this one \ 
# to be the fourth pivot and the diff 187.12-187.71=-0.59 as a downtrend with that value

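Source

2014-05-12 Ovidiu Susan

I need something very similar to the solution in this ZigZag library: [link](http://nbviewer.ipython.org/github/jbn/ZigZag/blob/master/zigzag_demo.ipynb) –

This is a bit tricky, because you cannot mark a point as a pivot until you have found the next potential pivot (that is, if you are in an up trend, you cannot say it is finished until you find a low that is low enough).

This code does the trick. I have put your data in a tmpData.txt file for convenience, and it produces the desired result. Please check.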

import numpy as np
import pandas as pd


def get_pivots():
    # DataFrame.from_csv(), irow() and .ix no longer exist in current pandas;
    # read_csv() with these arguments and .iloc are the replacements.
    data = pd.read_csv('tmpData.txt', index_col=0, parse_dates=True)
    data['swings'] = np.nan
    swings_col = data.columns.get_loc('swings')

    pivot = data['open'].iloc[0]
    last_pivot_id = 0
    up_down = 0  # 0: no trend yet, 1: up trend, -1: down trend

    diff = .3

    for i in range(len(data)):
        row = data.iloc[i]

        # We don't have a trend yet
        if up_down == 0:
            if row.low < pivot - diff:
                data.iloc[i, swings_col] = row.low - pivot
                pivot, last_pivot_id = row.low, i
                up_down = -1
            elif row.high > pivot + diff:
                data.iloc[i, swings_col] = row.high - pivot
                pivot, last_pivot_id = row.high, i
                up_down = 1

        # Current trend is up
        elif up_down == 1:
            # If got higher than last pivot, update the swing
            if row.high > pivot:
                # Remove the last pivot, as it wasn't a real one
                data.iloc[i, swings_col] = data.iloc[last_pivot_id, swings_col] + (row.high - data['high'].iloc[last_pivot_id])
                data.iloc[last_pivot_id, swings_col] = np.nan
                pivot, last_pivot_id = row.high, i
            elif row.low < pivot - diff:
                data.iloc[i, swings_col] = row.low - pivot
                pivot, last_pivot_id = row.low, i
                # Change the trend indicator
                up_down = -1

        # Current trend is down
        elif up_down == -1:
            # If got lower than last pivot, update the swing
            if row.low < pivot:
                # Remove the last pivot, as it wasn't a real one
                data.iloc[i, swings_col] = data.iloc[last_pivot_id, swings_col] + (row.low - data['low'].iloc[last_pivot_id])
                data.iloc[last_pivot_id, swings_col] = np.nan
                pivot, last_pivot_id = row.low, i
            # A reversal up needs a high more than diff ABOVE the pivot
            elif row.high > pivot + diff:
                data.iloc[i, swings_col] = row.high - pivot
                pivot, last_pivot_id = row.high, i
                # Change the trend indicator
                up_down = 1

    print(data)

Output:

date     close high low  open volume swings            
2014-05-09 13:30:00 187.56 187.73 187.54 187.70 1922600  NaN 
2014-05-09 13:31:00 187.49 187.56 187.42 187.55 534400  NaN 
2014-05-09 13:32:00 187.42 187.51 187.35 187.49 224800 -0.35 
2014-05-09 13:33:00 187.55 187.58 187.39 187.40 303700  NaN 
2014-05-09 13:34:00 187.67 187.67 187.53 187.56 438200  NaN 
2014-05-09 13:35:00 187.60 187.71 187.56 187.68 296400 0.36 
2014-05-09 13:36:00 187.41 187.67 187.38 187.60 329900  NaN 
2014-05-09 13:37:00 187.31 187.44 187.28 187.40 404000  NaN 
2014-05-09 13:38:00 187.26 187.37 187.26 187.30 912800  NaN 
2014-05-09 13:39:00 187.22 187.28 187.12 187.25 607700 -0.59

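Source

2014-05-13 16:17:08

Yes @Pewel-Kozela ... thanks for your effort. It seems to give the desired result at first check ... but looping over more than 1 million rows in plain Python takes a very long time. If possible, I would prefer a vectorized solution ... –

Hey, I am not sure it can be done with an apply() operation, because evaluating the current row needs the history of the quotes before it. What speed-up factor would suit you? I have checked my solution: using iterrows() cuts the execution time by a factor of 4, but for 1.5 million rows it would still take around 1m30s. –

It looks like Pawel's solution is the only one that works ... after all. I made some small adjustments: 1. replaced "irow(0)" with "iloc[0]", since irow will be deprecated; 2. used iterrows(), which made it a bit faster (2m51s versus 5+ min for 1.5 million rows); 3. replaced the threshold comparisons (the "row.low" one and "row.high > pivot + diff") with ratio tests of the form "((row.high-pivot)/diff) > 1", because I was getting swings below the threshold ... can't explain why!? –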
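On the speed concern raised in these comments, one option that keeps a plain loop is to pull the high and low columns out of the DataFrame once and iterate over ordinary Python lists, so that no per-row Series is built and no per-cell DataFrame write happens inside the loop. The following is only a sketch under assumptions, not the answer's code: the name find_swings and its parameters are made up for illustration, and it applies the pivot rules from the question's walkthrough (0.3 threshold, a pivot stays provisional until the opposite threshold is broken). No timing is claimed for it.

```python
import math


def find_swings(first_open, highs, lows, diff=0.3):
    """Return one swing value per bar; NaN where no pivot ends up."""
    swings = [math.nan] * len(highs)
    leg_start = first_open  # price the current leg is measured from
    pivot = first_open      # current extreme of the leg (provisional pivot)
    pivot_idx = None        # bar index holding the provisional pivot
    trend = 0               # 0: no trend yet, 1: up, -1: down

    for i, (high, low) in enumerate(zip(highs, lows)):
        extends_up = trend == 1 and high > pivot
        extends_down = trend == -1 and low < pivot
        if extends_up or extends_down:
            # Same leg, new extreme: the earlier pivot was not a real one.
            swings[pivot_idx] = math.nan
            pivot = high if extends_up else low
        elif trend != -1 and low < pivot - diff:
            # First leg down, or reversal of an up trend.
            leg_start, pivot, trend = pivot, low, -1
        elif trend != 1 and high > pivot + diff:
            # First leg up, or reversal of a down trend.
            leg_start, pivot, trend = pivot, high, 1
        else:
            continue
        pivot_idx = i
        swings[i] = pivot - leg_start
    return swings


# The ten sample bars from the question.
highs = [187.73, 187.56, 187.51, 187.58, 187.67,
         187.71, 187.67, 187.44, 187.37, 187.28]
lows = [187.54, 187.42, 187.35, 187.39, 187.53,
        187.56, 187.38, 187.28, 187.26, 187.12]

swings = find_swings(187.70, highs, lows)
print({i: round(v, 2) for i, v in enumerate(swings) if not math.isnan(v)})
# → {2: -0.35, 5: 0.36, 9: -0.59}
```

With the question's DataFrame this could be called as df['swings'] = find_swings(df['open'].iloc[0], df['high'].tolist(), df['low'].tolist()).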

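import numpy as np 

# High-low range of each one-minute bar
df['diff'] = df['high'] - df['low'] 
# Keep the range only where it reaches the 0.3 threshold
df['result'] = df['diff'].apply(lambda x: x if abs(x) >= 0.3 else np.nan) 

print(df['result'][:10])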

Source

2014-05-12 18:58:28 maxbellec

You should avoid using ``apply`` when you can vectorize; for example, ``df['diff'].where(df['diff'].abs() >= 0.3)`` will be much faster – Jeff
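Jeff's suggestion can be checked on a small frame. The sketch below assumes a hypothetical five-row sample (the high/low values of the first five quotes in the question) and a hypothetical threshold of 0.15, chosen only so that some rows pass and some do not. Series.where keeps the values where the condition holds and puts NaN elsewhere, which is what the lambda does one element at a time.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: high/low of the first five quotes in the question.
df = pd.DataFrame({
    "high": [187.73, 187.56, 187.51, 187.58, 187.67],
    "low": [187.54, 187.42, 187.35, 187.39, 187.53],
})
df["diff"] = df["high"] - df["low"]

threshold = 0.15  # assumption for this small sample; the thread uses 0.3

# Element-by-element version from the answer.
via_apply = df["diff"].apply(lambda x: x if abs(x) >= threshold else np.nan)

# Vectorized version from the comment: one boolean mask, no Python lambda.
via_where = df["diff"].where(df["diff"].abs() >= threshold)

# Both give the same Series, NaN positions included.
print(via_apply.equals(via_where))
```

Both forms only filter each bar's own high-low range; neither tracks pivots across rows.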

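This is something else: simpler, but different from what I need. The result should be exactly as requested. Those are actually the real results. –

I need to compare each row with the rows before it. We start from the open price of the first minute and call this the first pivot. Then we check each row for a high or a low that is at least 0.3 away from the first pivot. If there is one, we have a higher high or a lower low. We then look further to find where the swing ends, that is, where price moves at least 0.3 the other way to start another leg ... and so on. It is similar to peak_valley_pivots() found here: [link](https://github.com/jbn/ZigZag/blob/master/zigzag/__init__.py), but I would like it to be fast using only pandas, without numba. –

How about this, assuming for the moment that you only care about the highs: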

startPx = df.open.iloc[0] 
level = ((df.high - startPx)/.3).astype(int) 
df['swings'] = level - level.shift(1)

Now, to find out what the differences are, you just do this:

changes = df[df.swings != 0] 
diffs = changes.high - changes.open.shift(1)

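Source

2014-05-12 19:52:29 acushner

Not a solution. Please see my row-by-row explanation above. –

Yeah, I was thinking about it this morning and realized it wouldn't work, sorry about that. – acushner

So I haven't tested this, but something like this should get you what you want. What happens if, within the same minute, low < pivot - diff and high > pivot + diff?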

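import numpy as np


def f(df):
    pivot = df.open.iloc[0]
    diff = .3

    def proc(ser):
        # pivot is rebound below, so it must be declared nonlocal;
        # without this the function raises UnboundLocalError
        nonlocal pivot
        res = np.nan
        if ser.low < pivot - diff:
            res, pivot = ser.low - pivot, ser.low
        elif ser.high > pivot + diff:
            res, pivot = ser.high - pivot, ser.high
        return res

    df['swings'] = df.apply(proc, axis=1)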
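The open question above, a bar that crosses both thresholds in the same minute, can be made concrete with a small sketch. The helper name breakouts and the wide bar below are hypothetical. One-minute OHLC data cannot tell whether the low or the high came first, so in an if/elif chain like the one above the branch tested first (the low) simply wins.

```python
def breakouts(pivot, high, low, diff=0.3):
    """List which thresholds a single bar crosses relative to pivot."""
    hits = []
    if low < pivot - diff:
        hits.append("down")
    if high > pivot + diff:
        hits.append("up")
    return hits


# Hypothetical wide bar around a pivot of 187.70: both thresholds are crossed.
print(breakouts(187.70, high=188.10, low=187.30))  # → ['down', 'up']

# First bar of the question's sample: neither threshold is crossed.
print(breakouts(187.70, high=187.73, low=187.54))  # → []
```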

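Source

2014-05-13 13:39:11 acushner

It doesn't work ... because the trick, as @Pewel-Kozela already identified, is that you cannot mark a point as a pivot until you have found the next potential pivot (for example, if you are in an up trend, you cannot say it is over until you find a low 0.3 below the pivot). His solution works, but it is slow because it is a pure Python loop (about 5 minutes for 1.5 million rows) –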
