Converting an rqda file to an SQL file

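I am using RQDA, an R package that I run from RStudio, to code text manually. The resulting rqda file is an SQLite database. I code statements in the texts with different codes and group those codes into code categories (e.g. the code category "actor_party" with the related codes "socialist", "liberal", "conservative", etc.). I have finished coding and want to run a social network analysis on the result. To do that, I want to create an SQL table in which each code category gets its own column, with the specific code in each row. Each coding can be identified by the following attributes: catid (the code category number), fid (the file ID number) and selfirst (the start position of each coding). By selecting the specific catid, fid and selfirst for every coded statement, SQLite can identify each coding as unique (in addition, as you can see in the R script, status = 1 must also be selected for every valid coding).
I use RStudio version 0.99.879, RQDA version 0.2-7 and RSQLite version 012.0.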

To do this, I used the following R code:

library(RSQLite) # load Package RSQLite 
setwd("C:/...") 

system("ls *.rqda", show=TRUE) 
sqlite <- dbDriver("SQLite") 
# specify the file 
qdadb <- dbConnect(sqlite,"My_data.rqda") 


dbListTables(qdadb) 
dbListFields(qdadb, "coding") # that's where the codings are stored 


catid <- dbGetQuery(qdadb, "select distinct(catid) from treecode where status = 1 ORDER BY catid") 
i <- 1 
table <- dbGetQuery(qdadb, "select fid, selfirst from coding where status = 1 GROUP BY fid, selfirst") 
while(i <= max(catid)) { 
    ids <- dbGetQuery(qdadb, paste("select cid from treecode where (catid = ",i," and status = 1)", sep="")); 
    t <- dbGetQuery(qdadb, paste("select cid, fid, selfirst from coding where (cid in (", paste(as.character(ids$cid), sep="' '", collapse=","), ") and status = 1)", sep="")); 
    table <- merge(table, t, by = c("fid","selfirst"), all.x = T); 
    i <- i + 1; 
    } 
# warnings are created because of the same columns which are duplicated by the merging 

colnames(table) <- c("fid", "selfirst", dbGetQuery(qdadb, "select name from codecat where status = 1")[,1]) #each code has attributed a unique f(ile)id and selfirst (it's the unique starting point of each coding) 

# see below for an example of such a created table 

library(car) # Companion to Applied Regression package 

# years - catid = 1 
table$A00_time_frame <- recode(table$A00_time_frame, '1 = 2010; 2 = 2011; 3 = 2012; 4 = 2013; 5 = 2014; 6 = 2015') 

# Sources - catid = 2 
ids <- dbGetQuery(qdadb, "select cid from treecode where (catid = 2 and status = 1)")[,1] 
values <- dbGetQuery(qdadb, paste("select name from freecode where (id in(", paste(ids, collapse = ","), ") and status = 1)"))[,1] 
table$B00_source <- recode(table$B00_source, paste0("'", paste(ids,"'='", values, collapse = "';'", sep=""),"'", sep="")) 

# Claimant type - catid = 3 
ids <- dbGetQuery(qdadb, "select cid from treecode where (catid = 3 and status = 1)")[,1] 
values <- dbGetQuery(qdadb, paste("select name from freecode where (id in(", paste(ids, collapse = ","), ") and status = 1)"))[,1] 
table$C00_claimant_type <- recode(table$C00_claimant_type, paste0("'", 
paste(ids,"'='", values, collapse = "';'", sep=""),"'", sep="")) 

and so on, up to "catid = 20"

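This worked, and the result looks like this: example_table [the table continues like this up to row 844; only the fid increases]

Although this works, and the resulting table matches the total number of codings, some errors still occur. Some codes are not linked to the correct statement (they are linked to the correct code category, but not to the correct coded statement).

I am still a beginner with R (and RStudio) and cannot explain what went wrong.

Does anyone have an idea what the problem or mistake might be here, and how to solve it?

I am happy to share my file on request.

Any suggestion or help is very welcome!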

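Edit: Here is a link to a subset of my data so that you can reproduce the problem (the file is in rqda format because I think the conversion itself may be the problem).
In addition, here are two examples of where to look.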

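In the "table" created in R, the following rows can be identified:

1 - fid 95, selfirst 4553 is given the code value "Le Monde", followed later by "E02_European_Commission" + "G10_Cameroon".
However, if you check the codings in the original rqda file, the code 'Cameroon' is not in this file; it is in fid 70, selfirst 5082, in 'Welt' and in '2010'.

2 - fid 90, selfirst 959, in 2011, stands for the code 'CDU', and the final entry, 'special claimant', shows the name 'Martin Schulz'.
  However, if you check the codings in the original rqda file, the code 'Martin Schulz' has no codings attached to it in the subset.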

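I hope these two examples illustrate the problem and give you an idea of where to look and what the problem looks like.

Sorry that I did not provide this in the first place!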


2017-05-14 Stefan_W

This question is too long; could you please post a reproducible example? http://stackoverflow.com/q/5963269/946850 – krlmlr

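Maybe simplify the code first, to get a better view of what might be going wrong? Personally, I would lean more on SQL than on R to gather all the information: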

t <- dbGetQuery(qdadb, "SELECT codecat.name, coding.cid, coding.fid, coding.selfirst 
     FROM treecode, coding, codecat 
     WHERE treecode.cid = coding.cid 
     AND treecode.catid = codecat.catid 
     AND treecode.status = 1 
     AND coding.status = 1") 
head(reshape(t, idvar = c("fid", "selfirst"), timevar = "name", direction = "wide"))

I am not sure whether this is what you are looking for, or whether it works any better. But it seems like simpler code to evaluate.
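To see what the reshape step does without opening an rqda file, here is a minimal sketch on made-up data. The data frame `long` stands in for the result of the SQL query, and `freecode` for a lookup of code names; all cid values and names are invented for illustration, and only base R is used. It also replaces the numeric cid with the code name through a `match()` lookup, which is just one possible way to do the labelling.

```r
# Toy stand-in for the query result: one row per coding, in long format.
# All values are hypothetical and only mimic the shape of the real data.
long <- data.frame(
  name     = c("A00_time_frame", "B00_source", "A00_time_frame", "B00_source"),
  cid      = c(1, 10, 2, 11),
  fid      = c(70, 70, 95, 95),
  selfirst = c(5082, 5082, 4553, 4553),
  stringsAsFactors = FALSE
)

# Toy stand-in for the freecode table (code id -> code name).
freecode <- data.frame(
  id   = c(1, 2, 10, 11),
  name = c("2010", "2011", "Welt", "Le Monde"),
  stringsAsFactors = FALSE
)

# Replace each cid by its code name with a lookup.
long$code <- freecode$name[match(long$cid, freecode$id)]

# One row per (fid, selfirst), one column per code category.
wide <- reshape(long[, c("name", "fid", "selfirst", "code")],
                idvar = c("fid", "selfirst"),
                timevar = "name",
                direction = "wide")
print(wide)
```

Because the category name travels with every row until the reshape, the column labels come from the data itself rather than from a separate, positionally assigned vector of names.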


2017-05-15 20:17:36

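Thanks for your help. The simplification works well, and it makes it easier to rethink the "SELECT" categories. As far as I can tell now, the 'coding.selend' column has to be added to @JosElkink's code above. So: 't <- dbGetQuery(qdadb, "SELECT codecat.name, coding.cid, coding.fid, coding.selfirst, coding.selend FROM ... = 1")' (selend also has to be added in the head(...) line). This addition helped me identify small coding errors that I had not removed before!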
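A small sketch of why selend can matter, again on invented data and in base R only: if two codings of the same category start at the same position in the same file, keying on fid and selfirst alone cannot tell them apart. The cid and position values below are hypothetical.

```r
# Two hypothetical codings of the same category that start at the same
# position in the same file but end at different positions.
codings <- data.frame(
  name     = c("C00_claimant_type", "C00_claimant_type"),
  cid      = c(21, 22),
  fid      = c(90, 90),
  selfirst = c(959, 959),
  selend   = c(1000, 1200),
  stringsAsFactors = FALSE
)

# Count codings that collide when only the start position is used as key.
n_collisions <- sum(duplicated(codings[, c("name", "fid", "selfirst")]))

# Keyed on (fid, selfirst): reshape() warns that multiple rows match
# and keeps only the first one, so one coding is silently lost.
wide_by_start <- suppressWarnings(
  reshape(codings[, c("name", "cid", "fid", "selfirst")],
          idvar = c("fid", "selfirst"),
          timevar = "name",
          direction = "wide")
)

# Keyed on (fid, selfirst, selend): both codings keep their own row.
wide_by_span <- reshape(codings,
                        idvar = c("fid", "selfirst", "selend"),
                        timevar = "name",
                        direction = "wide")

print(n_collisions)
print(nrow(wide_by_start))
print(nrow(wide_by_span))
```

Running the `duplicated()` check on the real query result before reshaping is one way to find overlapping or accidental codings of the kind mentioned above.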
