2016-07-07 49 views

Automating R code: I am running a PCA. My code is below.

    #### Read .csv file ####
    data <- read.csv(file.choose(), header = TRUE, sep = ",")
    names(data)
    data$qcountry

    #### For the country ARGENTINA ####
    ar_data <- data[which(data$qcountry == "ar"), ]
    ar_data$qcountry <- NULL
    names(ar_data)
    names(ar_data) <- c("01_insufficient_efficacy", "02_safety_issues", "03_inconvenient_dosage_regimen", "04_price_issues",
                        "05_not_reimbursed", "06_not_inculed_govt", "07_insuficient_clinicaldata", "08_previously_used",
                        "09_prescription_opted_for_some_patients", "10_scientific_info_NA", "12_involved_in_diff_clinical_trial",
                        "13_patient_inappropriate_for_TT", "14_patient_inappropriate_Erb", "16_patient_over_65",
                        "17_Erbitux_alternative", "95_Others")

    names(ar_data)
    ar_data_wdt_zero_columns <- ar_data[, colSums(ar_data != 0) > 0]

    #### Testing multicollinearity ####
    vif(ar_data_wdt_zero_columns)

    #### Testing appropriateness of PCA ####
    KMO(ar_data_wdt_zero_columns)
    cortest.bartlett(ar_data_wdt_zero_columns)

    #### Run PCA ####
    pca <- prcomp(ar_data_wdt_zero_columns, center = FALSE, scale = FALSE)
    summary(pca)

    #### Compute the loadings for deciding the top 4 most correlated variables ####
    load <- pca$rotation
    write.csv(load, "loadings_argentina_2015_Q4.csv")

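The code shown here is for one country, and I have done the same for 9 countries. I have to run this code separately for each one. I am sure there must be a simpler way to automate it. Please suggest! Thanks!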

Answer

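Yes, this is doable for every country. You can write a custom function that takes the appropriate arguments, such as the country name and the data. Inside it you do the magic and return an appropriate object (or not). Import and prepare the data only once, then pass it to the function. The code below has not been tested, but it should get you started.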

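A few comments. Do not use file.choose(), because it will break your code three days down the line. How will you know which file to choose? Why click through a dialog every time you run the script when the script can find the file for you? Be lazy in that sense.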
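
To make the point about file.choose() concrete, here is a minimal sketch of reading the file from a fixed relative path instead. The helper name `read_survey` and the example path in the comment are made up for illustration; only `file.exists()`, `file.path()` and `read.csv()` are real base R functions.

```r
# Hypothetical helper: read the survey export from a known path and fail
# with a clear message if the file is missing, instead of opening a dialog.
read_survey <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  read.csv(path, header = TRUE, sep = ",")
}

# Usage (the path is an assumption, adjust it to your project layout):
# survey <- read_survey(file.path("data", "survey_2015_Q4.csv"))
```

Because the path lives in the script, rerunning it next week reads the same file without any clicking.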

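Your script has a lot of clutter. Stick to one style, and don't leave behind stray lines that you only ran "for shits and giggles". At the very least, use spaces in your code.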

Be more imaginative when choosing object names. Where possible, avoid names that already exist as base functions, e.g. load.

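    library(psych)  # provides KMO() and cortest.bartlett()
    # vif() must come from a package whose vif() accepts a data frame
    # (assumption: e.g. usdm); the question does not say which one is loaded.

    myPCA <- function(my.country, my.data) {

        # Keep only the rows for the requested country, then drop the country column.
        country_data <- my.data[my.data$qcountry %in% my.country, ]
        country_data$qcountry <- NULL

        # Remove columns that are zero throughout.
        nonzero_data <- country_data[, colSums(country_data != 0) > 0, drop = FALSE]

        #### Run PCA ####
        pca <- prcomp(nonzero_data, center = FALSE, scale. = FALSE)

        #### Write the loadings used to pick the top 4 most correlated variables ####
        write.csv(pca$rotation, paste("loadings_", my.country, ".csv", sep = "")) # may need tweaking

        return(list(pca = pca, vif = vif(nonzero_data),
                    kmo = KMO(nonzero_data), correlation = cortest.bartlett(nonzero_data)))
    }

    data <- read.csv("relative_link_to_file", header = TRUE, sep = ",")
    # Rename every column except qcountry (16 names for the 16 answer columns).
    names(data)[names(data) != "qcountry"] <- c("01_insufficient_efficacy", "02_safety_issues", "03_inconvenient_dosage_regimen", "04_price_issues",
            "05_not_reimbursed", "06_not_inculed_govt", "07_insuficient_clinicaldata", "08_previously_used",
            "09_prescription_opted_for_some_patients", "10_scientific_info_NA", "12_involved_in_diff_clinical_trial",
            "13_patient_inappropriate_for_TT", "14_patient_inappropriate_Erb", "16_patient_over_65",
            "17_Erbitux_alternative", "95_Others")

    # Run once per distinct country; simplify = FALSE keeps a named list of results.
    countries <- unique(as.character(data$qcountry))
    results <- sapply(countries, FUN = myPCA, my.data = data, simplify = FALSE)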
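
Another way to drive a per-country function is `split()` followed by `lapply()`, which hands each country's rows to the function exactly once. The sketch below runs on made-up toy data and uses only `prcomp()` so that it works without any extra packages; `toy`, `pca_one_country` and `pca_by_country` are hypothetical names, not part of the original code.

```r
# Toy stand-in for the survey: three countries, three 0/1 answer columns.
set.seed(1)
toy <- data.frame(
  qcountry = rep(c("ar", "br", "cl"), each = 20),
  q1 = rbinom(60, 1, 0.5),
  q2 = rbinom(60, 1, 0.3),
  q3 = rbinom(60, 1, 0.7)
)

# Takes the rows of ONE country and returns its prcomp object.
pca_one_country <- function(country_data) {
  answers <- country_data[, names(country_data) != "qcountry", drop = FALSE]
  # Drop columns that are zero throughout, as in the original script.
  answers <- answers[, colSums(answers != 0) > 0, drop = FALSE]
  prcomp(answers, center = FALSE, scale. = FALSE)
}

# split() makes one data frame per country; lapply() runs the PCA on each.
pca_by_country <- lapply(split(toy, toy$qcountry), pca_one_country)

names(pca_by_country)            # one entry per country
pca_by_country[["ar"]]$rotation  # loadings for a single country
```

The result is a named list, so the loadings for any country can be pulled out by name or written to disk in a second loop.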

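Thanks!! One more thing I need to ask: what if I want to read a folder containing several files and then run the PCA for the countries in each of those files? – Kavya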

@Kavya Have a look at the `list.files()` function. This question has been asked many times before.

That is right, but I can't use the above function (myPCA) for the same. – Kavya
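
For the folder question, a minimal sketch of the `list.files()` approach could look like the following. It assumes every CSV in the folder has a `qcountry` column, and it takes the per-country analysis as a function of one country's data frame, so a two-argument function such as myPCA would need a small wrapper first. `run_folder`, `pca_only` and the demo files are hypothetical; the demo writes toy data into a temporary folder so that the block runs on its own.

```r
# Hypothetical helper: run `analyse` on every country of every CSV in `folder`.
run_folder <- function(folder, analyse) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  results <- lapply(files, function(f) {
    d <- read.csv(f, header = TRUE, sep = ",")
    # One result per country found in this file.
    lapply(split(d, d$qcountry), analyse)
  })
  names(results) <- basename(files)
  results
}

# Demo on made-up data: two small files in a temporary folder.
demo_dir <- file.path(tempdir(), "pca_demo")
dir.create(demo_dir, showWarnings = FALSE)
set.seed(1)
for (quarter in c("Q3", "Q4")) {
  toy <- data.frame(
    qcountry = rep(c("ar", "br"), each = 15),
    q1 = rbinom(30, 1, 0.5),
    q2 = rbinom(30, 1, 0.5)
  )
  write.csv(toy, file.path(demo_dir, paste0("toy_", quarter, ".csv")),
            row.names = FALSE)
}

# Stand-in analysis using base R only; swap in your own per-country function.
pca_only <- function(d) {
  prcomp(d[, names(d) != "qcountry", drop = FALSE],
         center = FALSE, scale. = FALSE)
}

all_results <- run_folder(demo_dir, pca_only)
names(all_results)  # one entry per file, each holding one PCA per country
```

The result is a list indexed first by file name and then by country, for example `all_results[["toy_Q4.csv"]][["ar"]]`.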