2017-02-09

Comparing words against a raw file in R

I have a raw dataset in JSON format. Here is how I load it into R:

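library("rjson")
setwd("mydir")
getwd()
# read the whole file as a single string and parse it
json_data <- fromJSON(paste(readLines("N1.json"), collapse = ""))
# flatten the parsed list and keep the elements named "text"
uu <- unlist(json_data)
uutext <- uu[names(uu) == "text"]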
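
To make the last two lines concrete, here is a minimal sketch on a hand-built list standing in for the parsed JSON. The structure (the fields `id`, `text` and `reply`) is an assumption for illustration only; the real layout of N1.json is not shown in this post.

```r
# Hypothetical stand-in for the result of fromJSON(): a list of messages.
# The third message keeps its text one level deeper, under "reply".
json_data <- list(
  list(id = 1, text = "let's do test"),
  list(id = 2, text = "blah is right"),
  list(id = 3, reply = list(text = "test blah"))
)

# unlist() flattens the nesting and joins nested names with "."
uu <- unlist(json_data)
print(names(uu))

# Exact name match: only elements whose flattened name is exactly "text"
uutext <- unname(uu[names(uu) == "text"])

# Looser match: also catches nested fields such as "reply.text"
uutext_all <- unname(uu[grepl("(^|\\.)text$", names(uu))])

print(uutext)
print(uutext_all)
```

If the text fields in the real file are nested, the exact comparison `names(uu) == "text"` would skip them, so it may be worth checking `names(uu)` first.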

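I also have a second dataset, mydata2:

mydata2 <- read.csv("path to data/words")  # placeholder path

I need to find the words from mydata2, but only those that occur in the messages of the JSON file, and then write those messages to a new document, "xyz.txt". How can I do this?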

  chalk   indirect     pick         reaction   team           skip     pumpkin   surprise    bless        ignorance
1 time    patient      road         extent     decade         cemetery staircase monarch     bubble       abbey
2 service conglomerate banish       pan        friendly       position tight     highlight   rice         disappear
3 write   swear        break        tire       jam            neutral  momentum  requirement relationship matrix
4 inspire dose         jump         promote    trace          latest   absolute  adjust      joystick     habit
5 wrong   behave       claim        dedicate   threat         sell     particle  statement   teach        lamb
6 eye     tissue       prescription problem    secretion      revenge  barrel    beard       mechanism    platform
7 forest  kick         face         wisecrack  uncertainty    ratio    complain  doubt       reflection   realism
8 total   fee          debate       hall       soft           smart    sip       ritual      pill         category
9 contain headline     lump         absorption superintendent digital  increase  key         banner       second

I mean:
chalk -1 number1 indirect -2 number2

Template:

Word1-1 number1-1; Word1-2 number1-2; …; Word1-10 number1-10
Word2-1 number2-1; Word2-2 number2-2; …; Word2-10 number2-10

We are unlikely to download something just to answer you. Please post a sample of your data. – GGamba

Answer

Next time, please include real data. Here is a simplified model:

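library(data.table)
word  <- c("test", "meh", "blah")
jsonF <- c("let's do test", "blah is right", "test blah", "test test")

outp <- list()
for (i in seq_along(word)) {
  # keep the messages that contain word[i] as a literal substring;
  # ignore.case = TRUE has no effect together with fixed = TRUE,
  # so lower-case both sides with tolower() for case-insensitive matching
  outp[[i]] <- data.table(message = grep(word[i], jsonF, value = TRUE, fixed = TRUE))
}

qq <- rbindlist(outp)
qq <- unique(qq)   # a message matched by several words is kept once
print(qq)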

1:    let's do test 
2:     test blah 
3:     test test 
4:    blah is right 
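
The same selection can also be done without a loop, and the result written to a file as the question asks. This is a minimal sketch in base R, assuming the messages are already in a character vector. The names `words`, `messages` and `out_file` are illustrative, and the file is written to a temporary directory here rather than the working directory.

```r
# Illustrative data, not the real mydata2 / N1.json contents
words    <- c("test", "meh", "blah")
messages <- c("let's do test", "blah is right", "test blah", "nothing here")

# For each word, flag the messages that contain it as a literal substring,
# then combine the flags with OR so a message is kept if any word matches
keep <- Reduce(`|`, lapply(words, function(w) grepl(w, messages, fixed = TRUE)))
matched <- messages[keep]

# Write one message per line; use "xyz.txt" directly to write
# into the current working directory instead
out_file <- file.path(tempdir(), "xyz.txt")
writeLines(matched, out_file)

print(matched)
```

Unlike the loop above, this keeps the messages in their original order. Both versions match substrings, so "test" would also match "contest". If whole words are needed, a regular expression with word boundaries, such as `grepl(paste0("\\b", w, "\\b"), messages)`, is one option, provided the words contain no regex metacharacters.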

Edit: a quick and dirty paste/collapse:

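library(data.table)

x <- LETTERS[1:10]
y <- LETTERS[11:20]

df <- rbind(x, y)   # 2 x 10 character matrix, one row per output line

L <- list()
for (i in seq_len(nrow(df))) {
  # element j of row i becomes "<word>-<j> <i>-<j>"; elements are joined with "; "
  L[[i]] <- paste0(df[i, ], "-", seq(1, 10), " ", i, "-", seq(1, 10), collapse = "; ")
}
Fin <- cbind(L)
print(Fin)   # or View(Fin) in an interactive session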

This gives:

> Fin
     L
[1,] "A-1 1-1; B-2 1-2; C-3 1-3; D-4 1-4; E-5 1-5; F-6 1-6; G-7 1-7; H-8 1-8; I-9 1-9; J-10 1-10"
[2,] "K-1 2-1; L-2 2-2; M-3 2-3; N-4 2-4; O-5 2-5; P-6 2-6; Q-7 2-7; R-8 2-8; S-9 2-9; T-10 2-10"
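
The same pattern can be applied to a data frame of words and limited to its first 10 rows. This is a sketch under assumptions: `words_df` is a hypothetical stand-in for mydata2 with only three columns, and the "number" in the template is taken to be the row-column index, as in the example above.

```r
# Hypothetical stand-in for mydata2 (the real file has 10 word columns)
words_df <- data.frame(
  V1 = c("chalk", "time"),
  V2 = c("indirect", "patient"),
  V3 = c("pick", "road"),
  stringsAsFactors = FALSE
)

# Build "<word>-<j> <i>-<j>" for every word j in row i, joined with "; "
format_row <- function(i) {
  row_words <- unlist(words_df[i, ], use.names = FALSE)
  j <- seq_along(row_words)
  paste0(row_words, "-", j, " ", i, "-", j, collapse = "; ")
}

# Only the first 10 rows, or fewer if the data frame is shorter
n_rows <- min(10, nrow(words_df))
lines_out <- vapply(seq_len(n_rows), format_row, character(1))

print(lines_out)
```

`lines_out` is a plain character vector, so it can be passed to `writeLines()` to produce a text file with one formatted row per line.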

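Thank you, Alexey! Could you tell me one more thing? For mydata2 (https://www.mediafire.com/?fw6x4c0mv7r5gar), how do I write the first 10 words into a new dataset in this format: Word1-1 number1-1; Word1-2 number1-2; ...; Word1-10 number1-10 Word2-1 number2-1; Word2-2 number2-2; ...; Word2-10 number2-10? I need to arrange it this way for further analysis. Thank you – fenton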

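@fenton. Could you please post the first 10 words of that file and the first 10-15 lines of the JSON file? For security reasons, not everyone will download an unknown file from an unknown source. –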

Done, I have edited my post. – fenton