How to read data from an Excel or CSV file depending on which columns are available?

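I have two types of files, Excel and CSV, from which I read data with two permanent columns, Question and Answer, and two temporary columns, Word and Replacement, which may or may not be present. How can I read the data from the Excel or CSV file depending on which columns are available?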

I have written separate functions for reading data from CSV and Excel files, and the appropriate one is called based on the file's extension.

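Is there a way to read data from the temporary columns (Word and Replacement) when they exist, and skip them when they don't? Please refer to the function definitions below: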

1) Function to read CSV data:

import string

import pandas

def read_csv_file(path):
    # Read only the two permanent columns. Passing names= makes pandas behave as if
    # header=None, so a header row in the file would be read as a data row.
    colnames = ['Question', 'Answer']
    data = pandas.read_csv(path, names=colnames)
    quesData = data.Question.tolist()
    ansData = data.Answer.tolist()

    # Strip every punctuation character from each question
    qWithoutPunctuation = [''.join(c for c in s if c not in string.punctuation) for s in quesData]

    # Drop non-ASCII characters (in Python 3 this produces bytes objects)
    asciiIgnoreQues = []
    for x in qWithoutPunctuation:
        asciiIgnoreQues.append(x.encode('ascii', 'ignore'))

    return asciiIgnoreQues, ansData, quesData
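For the CSV case, one way to pick up Word and Replacement only when they are present is to let pandas take the column names from the header row and then test membership in `df.columns`. This is a minimal sketch, assuming the CSV files do have a header row; `read_csv_optional`, `REQUIRED` and `OPTIONAL` are hypothetical names, not part of the question's code.

```python
import io

import pandas as pd

REQUIRED = ["Question", "Answer"]   # permanent columns
OPTIONAL = ["Word", "Replacement"]  # temporary columns that may be absent


def read_csv_optional(path_or_buf):
    """Hypothetical helper: read a CSV with a header row, picking up optional columns if present."""
    df = pd.read_csv(path_or_buf)  # no names=, so the header row supplies the column names
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Each optional column becomes a list when present, or None when the file lacks it
    optional = {c: (df[c].tolist() if c in df.columns else None) for c in OPTIONAL}
    return df["Question"].tolist(), df["Answer"].tolist(), optional


if __name__ == "__main__":
    # In-memory stand-ins for two files, one without and one with the optional columns
    short = io.StringIO("Question,Answer\nHi?,Hello\n")
    full = io.StringIO("Question,Answer,Word,Replacement\nHi?,Hello,hi,hey\n")
    print(read_csv_optional(short)[2])
    print(read_csv_optional(full)[2])
```

Returning `None` for a missing column lets the caller write `if optional["Word"] is not None:` instead of catching a `KeyError`.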

2) Function to read Excel data:

import string

from xlrd import open_workbook  # note: xlrd >= 2.0 only opens .xls files

def read_excel_file(path):
    # Open the workbook and use the first sheet
    book = open_workbook(path)
    sheet = book.sheet_by_index(0)
    quesData = []
    ansData = []

    # Row 0 is the header, so start at row 1; column 0 = Question, column 1 = Answer
    for row in range(1, sheet.nrows):
        quesData.append(sheet.cell(row, 0).value)
        ansData.append(sheet.cell(row, 1).value)

    # Strip punctuation, then drop non-ASCII characters (bytes in Python 3)
    qWithoutPunctuation = [''.join(c for c in s if c not in string.punctuation) for s in quesData]
    asciiIgnoreQues = [x.encode('ascii', 'ignore') for x in qWithoutPunctuation]

    return asciiIgnoreQues, ansData, quesData
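The question mentions choosing the reader from the file's extension. A minimal sketch of that dispatch, assuming pandas is used for both formats (reading Excel through pandas needs an engine such as openpyxl for .xlsx or xlrd for .xls to be installed); `reader_for` and `read_any` are hypothetical names:

```python
import os

import pandas as pd


def reader_for(path):
    """Hypothetical helper: choose a pandas reader from the file extension."""
    ext = os.path.splitext(path)[1].lower()  # ".CSV" and ".csv" are treated alike
    if ext == ".csv":
        return pd.read_csv
    if ext in (".xls", ".xlsx"):
        # First sheet only, mirroring sheet_by_index(0) in read_excel_file
        return lambda p: pd.read_excel(p, sheet_name=0)
    raise ValueError(f"unsupported file extension: {ext!r}")


def read_any(path):
    """Read the file with whichever reader matches its extension."""
    return reader_for(path)(path)
```

Because both branches return a DataFrame, the optional-column check can then be written once instead of separately for CSV and Excel.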


2017-04-19 Rishabh Rusia

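Have you considered 'pandas.read_csv' and 'pandas.read_excel'? They will read the columns automatically according to which ones are present. – tmrlvi

@tmrlvi, I used pandas.read_csv in my CSV-reading function, but the column headers have to be provided in colnames. What if I don't have the Word and Replacement columns? –

You don't have to provide them. If you don't, 'pandas' infers the names from the header row. Or does your data not contain headers? – tmrlvi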
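If the files really have no header row, pandas cannot infer the names, but it still infers how many columns there are. A minimal sketch, assuming the columns always appear in the order Question, Answer, Word, Replacement (that order is an assumption, not something stated in the question); `read_headerless_csv` and `ALL_COLS` are hypothetical names:

```python
import io

import pandas as pd

# Assumed positional order when the file has no header row
ALL_COLS = ["Question", "Answer", "Word", "Replacement"]


def read_headerless_csv(path_or_buf):
    """Hypothetical helper: name columns by position for a CSV without a header row."""
    df = pd.read_csv(path_or_buf, header=None)  # columns come back labelled 0, 1, 2, ...
    if df.shape[1] > len(ALL_COLS):
        raise ValueError(f"expected at most {len(ALL_COLS)} columns, got {df.shape[1]}")
    df.columns = ALL_COLS[: df.shape[1]]  # 2 columns -> Question, Answer only
    return df


if __name__ == "__main__":
    print(list(read_headerless_csv(io.StringIO("Hi?,Hello\n")).columns))
    print(list(read_headerless_csv(io.StringIO("Hi?,Hello,hi,hey\n")).columns))
```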

I'm not entirely sure what you are trying to achieve, but reading and transforming the data could look like this:

import string

import pandas as pd

def read_file(path, typ):
    if typ == "excel":
        df = pd.read_excel(path, sheet_name=0)  # Default is zero (older pandas spelled it sheetname)
    else:  # Assuming "csv". You can make it explicit
        df = pd.read_csv(path)  # column names are inferred from the header row

    # Element-wise: strip punctuation, then drop non-ASCII characters
    qWithoutPunctuation = df["Question"].apply(lambda s: ''.join(c for c in s if c not in string.punctuation))
    df["asciiIgnoreQues"] = qWithoutPunctuation.apply(lambda x: x.encode('ascii', 'ignore'))

    return df

# Call it like this:
read_file("file1.csv", "csv")
read_file("file2.xls", "excel")
read_file("file2.xlsx", "excel")

This returns a DataFrame with the columns ["Question", "Answer", "asciiIgnoreQues"] if the data doesn't include Word and Replacement, and ["Question", "Word", "Replacement", "Answer", "asciiIgnoreQues"] if it does.
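If downstream code would rather not branch on which columns came back, one option is to add the missing optional columns as empty (NaN) columns right after reading. A minimal sketch using `DataFrame.reindex`; `ensure_optional_columns` and `OPTIONAL` are hypothetical names:

```python
import pandas as pd

OPTIONAL = ["Word", "Replacement"]


def ensure_optional_columns(df):
    """Hypothetical helper: make sure Word and Replacement exist, filling missing ones with NaN."""
    extra = [c for c in OPTIONAL if c not in df.columns]
    # reindex keeps the existing columns in order and appends the missing ones as all-NaN
    return df.reindex(columns=list(df.columns) + extra)


if __name__ == "__main__":
    df = pd.DataFrame({"Question": ["Hi?"], "Answer": ["Hello"]})
    print(ensure_optional_columns(df))
```

Rows without a Word or Replacement value can then be filtered with `df["Word"].notna()`.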

Note that I used apply, which runs a function element-wise over an entire Series.
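For comparison, here is the same cleaning done element-wise with `Series.apply` and with the `Series.str` accessor. This sketch swaps the answer's `''.join(...)` for `str.translate` with a deletion table, and adds `.decode("ascii")` so the result stays `str` in Python 3 instead of `bytes`; both are deliberate changes, not the answer's code. The function names are hypothetical.

```python
import string

import pandas as pd

# str.maketrans with a third argument builds a table that deletes those characters
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def clean_with_apply(questions):
    """Element-wise with Series.apply (returns str, not bytes)."""
    no_punct = questions.apply(lambda s: s.translate(STRIP_PUNCT))
    return no_punct.apply(lambda s: s.encode("ascii", "ignore").decode("ascii"))


def clean_with_str(questions):
    """Same result using the Series.str accessor instead of apply."""
    return questions.str.translate(STRIP_PUNCT).str.encode("ascii", "ignore").str.decode("ascii")
```

For example, `pd.Series(["What's up?", "Café, ok!"])` becomes `["Whats up", "Caf ok"]` with either function.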


2017-04-19 19:34:22 tmrlvi
