Python: extracting substrings by field name and delimiter position

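I have been trying to clean up a field in a CSV file. The field contains a mix of digits and characters; I read it into a pandas DataFrame and convert it to a string.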

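The goal is to extract the following variables from the long string: stopId, stopCode (there may be several per record), rte, and routeId. My attempts so far are below.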

Once the variables listed above are extracted, I need to merge them with the location data for each stop/route/rte, which is held in another file.

Sample records from FIELD:

'Web Log: Page generated for query [cid=SM&rte=50183&dir=S&day=5761&dayid=5761&fst=0%2c&tst=0%2c]'
'Web Log: Page generated for query: [_=1407744540393&agencyId=SM&stopCode=361096&rte=7878%7eBus%7e251&dir=W]'
'Web Log: Page generated for query: [_=1407744956001&agencyId=AC&stopCode=55451&stopCode=55452&stopCode=55489&rte=43783%7eBus%7e88&dir=S]'

The solutions I have tried are below, but I am stuck! Comments and suggestions are appreciated.

import re

# Idea 1: split each record on '&' into a list. This is useful, but I would
# still have to write additional code to pull out the relevant variables.
data['STOPS'] = [
    [x for x in t.split('&') if "Web Log" not in x]
    for t in data['EVENT_DESCRIPTION']
]
# Idea 1, next step: how do I pull the necessary variables out of the lists in data['STOPS']?

# Idea 2: loop over the field and find where each variable name starts and ends.
# Each *_pl variable is a list of (start, end) tuples, one per occurrence in the string.
for i in data['EVENT_DESCRIPTION']:
    stopcode_pl = [(a.start(), a.end()) for a in re.finditer('stopCode=', i)]
    stopid_pl = [(a.start(), a.end()) for a in re.finditer('stopId=', i)]
    rte_pl = [(a.start(), a.end()) for a in re.finditer('rte=', i)]
    routeid_pl = [(a.start(), a.end()) for a in re.finditer('routeId=', i)]
# Idea 2, next step: how do I use these positions to pull out the value of each variable?
# Is there a trick to grab the characters between the end of the variable name
# (i.e. after the '=') and the next '&'?
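
A minimal sketch of one way to finish Idea 2: instead of tracking start and end positions, a capturing group can grab everything between the '=' and the next '&' (or the closing ']'). The helper name `values_for` and the hard-coded sample record are illustrative assumptions, not part of the original code.

```python
import re

def values_for(name, rec):
    """Return every value of `name` found in rec (hypothetical helper)."""
    # (?<![A-Za-z]) keeps 'rte=' from matching inside a longer variable name;
    # [^&\]]* captures everything up to the next '&' or the closing ']'.
    pattern = r'(?<![A-Za-z])' + re.escape(name) + r'=([^&\]]*)'
    return re.findall(pattern, rec)

# Sample record, copied from the third example above.
rec = ('Web Log: Page generated for query: '
       '[_=1407744956001&agencyId=AC&stopCode=55451&stopCode=55452'
       '&stopCode=55489&rte=43783%7eBus%7e88&dir=S]')

stop_codes = values_for('stopCode', rec)   # one entry per occurrence
rte = values_for('rte', rec)
stop_ids = values_for('stopId', rec)       # empty list when the name is absent
```

Because `re.findall` returns a list, a record with several stopCode entries and a record with none are handled the same way.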

Source

2014-09-04 Pow Chow

This function

def qdata(rec): 
    return [tuple(item.split('=')) for item in rec[rec.find('[')+1:rec.find(']')].split('&')]

yields, for example, for the first record:

[('cid', 'SM'), ('rte', '50183'), ('dir', 'S'), ('day', '5761'), ('dayid', '5761'), ('fst', '0%2c'), ('tst', '0%2c')]

You can then step through the list to search for the specific items you need.
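
As a rough sketch of that last step, the pairs can be grouped by key so that repeated keys such as stopCode are all kept. The helper name `grouped`, the percent-decoding with `unquote`, and the hard-coded sample record are assumptions added for illustration; `qdata` is repeated only so the block runs on its own.

```python
from collections import defaultdict
from urllib.parse import unquote

def qdata(rec):
    # Same function as above, repeated so this sketch is self-contained.
    return [tuple(item.split('=')) for item in rec[rec.find('[')+1:rec.find(']')].split('&')]

def grouped(rec):
    """Group values by key, keeping every repeat (hypothetical helper)."""
    out = defaultdict(list)
    for pair in qdata(rec):
        if len(pair) != 2:      # skip items that are not a plain key=value pair
            continue
        key, value = pair
        out[key].append(unquote(value))  # decode escapes, e.g. '%7e' becomes '~'
    return dict(out)

rec = ('Web Log: Page generated for query: '
       '[_=1407744956001&agencyId=AC&stopCode=55451&stopCode=55452'
       '&stopCode=55489&rte=43783%7eBus%7e88&dir=S]')

fields = grouped(rec)
stop_codes = fields.get('stopCode', [])  # all stop codes in the record
rte = fields.get('rte', [])
stop_ids = fields.get('stopId', [])      # empty list when the key is absent
```

Applied to each row of the DataFrame column, this gives one dict per record. The standard library's `urllib.parse.parse_qs` does similar grouping and decoding if you pass it only the text between the brackets.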

Source

2014-11-07 18:38:03
