Automating R code

I am doing a PCA. Below is the code:

### Read .csv file ##### 
    data<-read.csv(file.choose(),header=T,sep=",") 
    names(data) 
    data$qcountry 
#### for the country-ARGENTINA####### 
ar_data<-data[which(data$qcountry=="ar"),] 
ar_data$qcountry<-NULL 
names(ar_data) 
names(ar_data)<-c("01_insufficient_efficacy","02_safety_issues","03_inconvenient_dosage_regimen","04_price_issues" 
        ,"05_not_reimbursed","06_not_inculed_govt","07_insuficient_clinicaldata","08_previously_used","09_prescription_opted_for_some_patients","10_scientific_info_NA","12_involved_in_diff_clinical_trial" 
        ,"13_patient_inappropriate_for_TT","14_patient_inappropriate_Erb","16_patient_over_65","17_Erbitux_alternative","95_Others") 

     names(ar_data) 
     ar_data_wdt_zero_columns<-ar_data[, colSums(ar_data != 0) > 0] 
####Testing multicollinearity#### 
     vif(ar_data_wdt_zero_columns) 

#### Testing appropriatness of PCA #### 
      KMO(ar_data_wdt_zero_columns) 
      cortest.bartlett(ar_data_wdt_zero_columns) 

    #### Run PCA #### 
     pca<-prcomp(ar_data_wdt_zero_columns,center=F,scale=F) 
     summary(pca) 

#### Compute the loadings for deciding the top4 most correlated variables### 
     load<-pca$rotation 
     write.csv(load,"loadings_argentina_2015_Q4.csv")

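The code shown here is for one country, and I have done the same for 9 countries. For each country I have to run this code again. I am sure there must be an easier way to automate it. Please suggest! Thanks!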


2016-07-07 Kavya

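Yes, this is doable for every country. You can write a custom function that takes appropriate arguments, e.g. the country name and the data. Inside it you do the magic and return an appropriate object (or not). Pass this function a processed data set that you import only once. The code below is not tested, but it should get you started.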

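A few comments. Do not use file.choose(), because it will break your code within 3 days down the line. How will you know which file to pick? Why click around every time you run the script when you can make the script do it for you? Be lazy in this sense.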
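A minimal sketch of the alternative to file.choose(), assuming a hypothetical layout in which the CSV sits at data/survey_2015_Q4.csv relative to the working directory. The path and the helper name read_survey are made up for illustration.

```r
# Hypothetical location of the input file, relative to the working directory.
input_file <- file.path("data", "survey_2015_Q4.csv")

# Hypothetical helper: fail early with a readable message instead of
# opening a file dialog.
read_survey <- function(path) {
    if (!file.exists(path)) {
        stop("Input file not found: ", path)
    }
    read.csv(path, header = TRUE, sep = ",")
}
```

Calling read_survey(input_file) then gives the same data frame on every run, with no clicking.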

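You have a lot of clutter in your script. Stick to one style, and don't leave in random lines that you only tried "for shits and giggles". At the very least, use spaces in your code.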

Be more imaginative when choosing object names. If possible, avoid names that already exist as base functions, e.g. load.
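To illustrate the naming point, here is a small sketch on a toy matrix. The object names toy and pca_loadings are invented for the example.

```r
# `load` is already a function in base R, so reusing the name for a
# matrix of loadings invites confusion.
is.function(base::load)

# A name that says what the object holds, computed here on a toy matrix.
toy <- matrix(c(1, 2, 3, 4, 2, 1, 4, 3, 5, 6), ncol = 2)
pca_loadings <- prcomp(toy)$rotation
```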

# KMO() and cortest.bartlett() are in the psych package. vif() on a data
# frame is assumed here to come from usdm (the question does not say).
library(psych)
library(usdm)

myPCA <- function(my.country, my.data) {

    # Keep only the rows of the requested country, then drop the country column
    country_data <- my.data[my.data$qcountry %in% my.country, ]
    country_data$qcountry <- NULL

    # Drop columns that contain only zeros
    country_data <- country_data[, colSums(country_data != 0) > 0]

    #### Run PCA ####
    pca <- prcomp(country_data, center = FALSE, scale. = FALSE)

    #### Write the loadings, used to pick the top 4 most correlated variables ####
    write.csv(pca$rotation, paste("loadings_", my.country, ".csv", sep = "")) # may need tweaking

    return(list(pca = pca, vif = vif(country_data),
        kmo = KMO(country_data), correlation = cortest.bartlett(country_data)))
}

data <- read.csv("relative_link_to_file", header = TRUE, sep = ",")
# Rename every column except qcountry (16 names for the 16 question columns)
names(data)[names(data) != "qcountry"] <- c("01_insufficient_efficacy","02_safety_issues","03_inconvenient_dosage_regimen","04_price_issues"
        ,"05_not_reimbursed","06_not_inculed_govt","07_insuficient_clinicaldata","08_previously_used","09_prescription_opted_for_some_patients","10_scientific_info_NA","12_involved_in_diff_clinical_trial"
        ,"13_patient_inappropriate_for_TT","14_patient_inappropriate_Erb","16_patient_over_65","17_Erbitux_alternative","95_Others")

# Run once per distinct country; the result is a list named by country code
results <- sapply(unique(as.character(data$qcountry)), FUN = myPCA, my.data = data, simplify = FALSE)
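
The same per-country pattern can also be sketched with base R only, so that it runs without the survey file or the vif/KMO helpers. The toy data frame and the name run_country_pca are invented for illustration; only split(), lapply() and prcomp() are used, and the multicollinearity and KMO/Bartlett checks are left out.

```r
# Toy stand-in for the survey data: a country code plus numeric columns.
set.seed(1)
toy <- data.frame(
    qcountry = rep(c("ar", "br", "cl"), each = 20),
    q1 = rnorm(60),
    q2 = rnorm(60),
    q3 = 0    # all-zero column, dropped before the PCA
)

# Hypothetical helper: PCA on the rows of a single country.
run_country_pca <- function(country_data) {
    country_data$qcountry <- NULL
    keep <- colSums(country_data != 0) > 0
    prcomp(country_data[, keep, drop = FALSE], center = FALSE, scale. = FALSE)
}

# split() gives one data frame per country; lapply() runs the PCA on each.
pca_by_country <- lapply(split(toy, toy$qcountry), run_country_pca)
```

Each element of pca_by_country is a prcomp object named after its country code, so pca_by_country$ar$rotation holds the loadings for that country.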


2016-07-07 05:27:02

Thanks!! One more thing I need to ask: what if I want to read a folder of different files and then run the PCA on the countries in each of those files? – Kavya

@Kavya Have a look at the `list.files()` function. This question has been asked many times before. –

That's correct, but I am not able to use the above function (myPCA) for the same. – Kavya
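
One possible way to combine list.files() with a per-country function is sketched below. It assumes that every CSV in the folder has a qcountry column plus numeric columns. The helper names pca_per_country and pca_per_file are hypothetical, and the PCA step uses only prcomp() to keep the sketch self-contained.

```r
# Hypothetical helper: per-country PCA for one data frame.
pca_per_country <- function(survey) {
    lapply(split(survey, survey$qcountry), function(country_data) {
        country_data$qcountry <- NULL
        keep <- colSums(country_data != 0) > 0
        prcomp(country_data[, keep, drop = FALSE], center = FALSE, scale. = FALSE)
    })
}

# Hypothetical helper: run the per-country PCA on every CSV in a folder.
pca_per_file <- function(folder) {
    csv_files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
    results <- lapply(csv_files, function(path) {
        pca_per_country(read.csv(path, header = TRUE, sep = ","))
    })
    # Name each result after its file, without directory or extension.
    names(results) <- tools::file_path_sans_ext(basename(csv_files))
    results
}
```

The result is a nested list, first by file and then by country, so the loadings for one country in one file are reached with something like results[["some_file"]][["ar"]]$rotation, where some_file is a placeholder for an actual file name.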
