2016-10-06 121 views
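Round-robin scheduling with a pandas DataFrame

I have been working on a piece of code that reads a tab-delimited CSV file representing a series of processes with their start times and durations, and uses pandas to build a data frame from it. I then need to apply a simplified form of round-robin scheduling to find each process's turnaround time, with the time slice taken from user input.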

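So far I can read in the CSV file, label the columns, and sort it correctly. However, I get stuck when trying to build a loop that iterates over the rows to find the completion time of each process.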
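For anyone who wants to reproduce the setup without the original file, the read-label-sort step described above might look like the sketch below. The sample text and the helper name load_processes are assumptions made for illustration; they are not taken from the original file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the tab-delimited file: name, start time, duration.
sample = "p1\t5\t140\np2\t14\t75\np3\t36\t320\np4\t0\t280\np5\t67\t125\np6\t40\t0\n"


def load_processes(source):
    """Read name/start/duration rows and return them sorted by start time."""
    df = pd.read_csv(
        source,
        sep="\t",
        header=None,
        names=["Process", "Start", "Duration"],
        dtype={"Process": str, "Start": "int64", "Duration": "int64"},
    )
    # sort_values keeps the original row labels, so the index shows the file order.
    return df.sort_values(by="Start")


d1 = load_processes(io.StringIO(sample))
print(d1)
```

Passing names= to read_csv labels the columns at load time, so the dtype mapping can refer to them by name.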

The code so far looks like this:

import re
import sys

import numpy as np
import pandas as pd

# round robin
def rr():
    docname = (sys.argv[1]) 
    method = (sys.argv[2]) 
    # creates a variable from the user input to define timeslice 
    timeslice = int(re.search(r'\d+', method).group()) 
    # use pandas to create a 2-d data frame from tab delimited file, set column 0 (process names) to string, set column 
    # 1 & 2 (start time and duration, respectively) to integers 
    d = pd.read_csv(docname, delimiter="\t", header=None, dtype={'0': str, '1': np.int32, '2': np.int32}) 
    # sort d into d1 by values of start times[1], ascending 
    d1 = d.sort_values(by=1) 
    # Create a 4th column, set to 0, for the Completion time 
    d1[3] = 0 
    # change column names 
    d1.columns = ['Process', 'Start', 'Duration', 'Completion'] 
    # initialize counter
    counter = 0 
    # if any values in column 'Duration' are above 0, continue the loop 
    while (d1['Duration']).any() > 0:
        for index, row in d1.iterrows():
            # if value in column 'Duration' > the timeslice, add the value of the timeslice to the current counter,
            # subtract it from the current value in column 'Duration'
            if row.Duration > timeslice:
                counter += timeslice
                row.Duration -= timeslice
                print(index, row.Duration)
            # if value in column "Duration" <= the timeslice, add the current value of the row:Duration to the counter
            # subtract the Duration from itself, to make it 0
            # set row:Completion to the current counter, which is the completion time for the process
            elif row.Duration <= timeslice and row.Duration != 0:
                counter += row.Duration
                row.Duration -= row.Duration
                row.Completion = counter
                print(index, row.Duration)
            # otherwise, if the value in Duration is already 0, print that index, with the "Done" indicator
            else:
                print(index, "Done")

Given the sample CSV file, d1 looks like:

  Process  Start  Duration  Completion
3      p4      0       280           0
0      p1      5       140           0
1      p2     14        75           0
2      p3     36       320           0
5      p6     40         0           0
4      p5     67       125           0

When I run my code with timeslice = 70, I get an infinite loop:

3 210 
0 70 
1 5 
2 250 
5 Done 
4 55 
3 210 
0 70 
1 5 
2 250 
5 Done 
4 55 

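It appears to iterate through the loop correctly once and then repeat endlessly. In addition, print(d1['Completion']) gives all zeros, which means the counter value is never being assigned to d1['Completion'].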

Ideally, the Completion column would be filled in with the corresponding times. Given timeslice=70, that would be:

  Process  Start  Duration  Completion
3      p4      0       280         830
0      p1      5       140         490
1      p2     14        75         495
2      p3     36       320         940
5      p6     40         0         280
4      p5     67       125         620

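I could then use these values to find the average turnaround time. However, for some reason my loop seems to iterate once and then repeat endlessly. When I tried swapping the order of the while and for statements, it iterated over each row repeatedly until that row reached 0, which also gives incorrect completion times.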
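As a rough illustration of that last step, the average could be computed as in the sketch below. It assumes that turnaround time means completion time minus start time, which the post does not spell out, and it copies the Completion values from the expected table above rather than computing them.

```python
import pandas as pd

# Values copied from the expected table in the question (timeslice = 70).
expected = pd.DataFrame(
    {
        "Process": ["p4", "p1", "p2", "p3", "p6", "p5"],
        "Start": [0, 5, 14, 36, 40, 67],
        "Completion": [830, 490, 495, 940, 280, 620],
    }
)

# Assumption: turnaround time = completion time - start (arrival) time.
expected["Turnaround"] = expected["Completion"] - expected["Start"]
average_turnaround = expected["Turnaround"].mean()
print(expected[["Process", "Turnaround"]])
print(average_turnaround)
```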

Thanks in advance.

Actually, you are not modifying the DataFrame. Try parsing each row's values into a list and then modifying them in the list. – Acepcs

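Is there a way to keep the parsed data in order and associated with the process names? I thought about what you are suggesting, but I could not tell which completion time belonged to which process. What I ended up with was ordered by completion time rather than alphabetically by process. Sorry, I am very new to Python, and the documentation does not explain this in a way I can follow. –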
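One possible way to keep each result tied to its process name is to carry the name along with the remaining duration and store completion times in a dict keyed by name. The sketch below is only an illustration: the function name round_robin is made up, start times are ignored just as in the question's loop, and a process with zero duration is left with a completion time of 0.

```python
from collections import deque


def round_robin(processes, timeslice):
    """processes: list of (name, duration). Returns {name: completion_time}."""
    # Assumption: zero-duration processes keep a completion time of 0.
    completion = {name: 0 for name, _ in processes}
    queue = deque(processes)
    clock = 0
    while queue:
        name, remaining = queue.popleft()
        if remaining == 0:
            continue  # nothing to run for this process
        run = min(remaining, timeslice)
        clock += run
        if remaining > run:
            # Not finished: go to the back of the queue with the rest.
            queue.append((name, remaining - run))
        else:
            completion[name] = clock
    return completion


# Processes in start-time order, as in the sorted table from the question.
jobs = [("p4", 280), ("p1", 140), ("p2", 75), ("p3", 320), ("p6", 0), ("p5", 125)]
result = round_robin(jobs, 70)
print(result)
```

Because the dict is filled in input order, the results stay in the same order as the sorted table, and each value can be looked up by process name.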

Answer

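I modified your code a bit and it now works. The values you change on row never overwrite the originals in the DataFrame, so the loop never ends: iterrows() hands back each row as a separate Series, and assigning to that Series leaves d1 untouched.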

while (d1['Duration'] > 0).any():
    for index, row in d1.iterrows():
        # if value in column 'Duration' > the timeslice, add the value of the timeslice to the current counter,
        # subtract it from the current value in column 'Duration'
        if row.Duration > timeslice:
            counter += timeslice
            #row.Duration -= timeslice
            # !!!LOOK HERE!!! write to the DataFrame itself, not to the row copy
            d1.loc[index, 'Duration'] -= timeslice
            print(index, d1.loc[index, 'Duration'])
        # if value in column "Duration" <= the timeslice, add the current value of the row:Duration to the counter
        # subtract the Duration from itself, to make it 0
        # set row:Completion to the current counter, which is the completion time for the process
        elif row.Duration <= timeslice and row.Duration != 0:
            counter += row.Duration
            #row.Duration -= row.Duration
            #row.Completion = counter
            # !!!LOOK HERE!!! .loc avoids chained assignment such as d1['Duration'][index]
            d1.loc[index, 'Duration'] = 0
            d1.loc[index, 'Completion'] = counter
            print(index, d1.loc[index, 'Duration'])
        # otherwise, if the value in Duration is already 0, print that index, with the "Done" indicator
        else:
            print(index, "Done")
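The difference between editing the row and editing the DataFrame can be seen in isolation. The small frame below is made up for the demonstration; it mixes a string column with an integer column, as the question's data does.

```python
import pandas as pd

df = pd.DataFrame({"Process": ["p1", "p2"], "Duration": [140, 75]})

# Editing the Series yielded by iterrows() changes only that temporary object.
for index, row in df.iterrows():
    row["Duration"] -= 70

after_row_edit = df["Duration"].tolist()

# Writing through .loc with the row label changes the DataFrame itself.
for index, row in df.iterrows():
    df.loc[index, "Duration"] -= 70

after_loc_edit = df["Duration"].tolist()

print(after_row_edit, after_loc_edit)
```

The first list still holds the original durations, while the second reflects the subtraction, which is why the question's while condition never became false.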

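By the way, I am guessing you want to simulate a process scheduling algorithm. In that case you have to take 'Start' into account, because not every process starts at the same time.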
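A minimal sketch of what taking 'Start' into account could look like is given below. It is one common textbook convention, not necessarily the one your assignment expects: a process joins the ready queue when it arrives, processes that arrive during a slice are queued ahead of the process that was just preempted, the clock jumps forward when nothing is ready, and a zero-duration process is treated as finished on arrival. The function name is hypothetical.

```python
from collections import deque


def round_robin_with_arrivals(processes, timeslice):
    """processes: list of (name, start, duration). Returns {name: completion_time}."""
    pending = deque(sorted(processes, key=lambda p: p[1]))  # not yet arrived
    ready = deque()  # (name, remaining) pairs waiting for the CPU
    completion = {}
    clock = 0

    def admit(now):
        # Move every process that has arrived by `now` into the ready queue.
        while pending and pending[0][1] <= now:
            name, start, duration = pending.popleft()
            if duration == 0:
                completion[name] = start  # assumption: done on arrival
            else:
                ready.append((name, duration))

    while pending or ready:
        admit(clock)
        if not ready:
            if pending:
                clock = pending[0][1]  # CPU idle until the next arrival
            continue
        name, remaining = ready.popleft()
        run = min(remaining, timeslice)
        clock += run
        admit(clock)  # arrivals during this slice queue up first
        if remaining > run:
            ready.append((name, remaining - run))
        else:
            completion[name] = clock
    return completion


jobs = [("p4", 0, 280), ("p1", 5, 140), ("p2", 14, 75),
        ("p3", 36, 320), ("p6", 40, 0), ("p5", 67, 125)]
result = round_robin_with_arrivals(jobs, 70)
print(result)
```

With the sample data every process arrives during the first 70-unit slice, so the non-zero completion times come out the same as in the simpler loop. Start times change the schedule only when a process arrives after the queue has already cycled or while the CPU is idle.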

(Your ideal table is somewhat wrong.)