Get metadata back from a saved WebCorpus

I have a WebCorpus whose documents carry metadata like this:

lapply(inspect(gsrc), write, filename, append=TRUE, ncolumns=1000) 
meta(gsrc[[1]]) 
Available meta data pairs are: 
Author  : 
DateTimeStamp: 2013-10-23 11:46:47 
Description : BDliveShutdown Will .......................... 
Heading  : Shutdown Will Hinder True Gauge of US Economy - New York Times 
ID   :

I saved the 100 text documents of my WebCorpus to file so that I could read them back in later:

cop <- Corpus(DirSource("/home/ashish/tm_web/23", encoding = "UTF-8"),readerControl = list(language = "lat")) 
meta(cop[[1]]) 
Available meta data pairs are: 
Author  : 
DateTimeStamp: 2013-10-23 11:38:20 
Description : 
Heading  : 
ID   : ABC22.txt 
Language  : lat 
Origin  :

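Is it possible to save gsrc itself, or do I have to save meta(gsrc[[1]]), to get the metadata back for the saved corpus? Or do I have to save the 100 text files in some other way so that meta(cop) returns the same metadata as meta(gsrc)? Any help is appreciated, thanks.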

Source

2013-10-23 Aashu

Do you want to save all the meta tags, or only some of them, across the whole corpus? – agstudy

@agstudy Either would work, but I only need some tags, namely DateTimeStamp and Heading. Thanks. – Aashu

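You can do something like this. I use the crude data set from the tm package to illustrate the idea. I think you can easily adapt the code to your own corpus.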

## For each tag , for each corpus , I apply meta 
## to get a list of list (list of tags, for each tag a list of metas) 
library(tm) 
data("crude") 
tags <- c('DateTimeStamp','Heading') 
res <- lapply(tags,function(tag) 
    lapply(crude,meta,tag)) 
names(res) <- tags 
## I save the list 
save(res,file = "meta.RData")

Now I load the saved metadata and do the reverse: assign it back to the documents.

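## load the data (restores the list `res`)
load("meta.RData")
## for each tag, for each document, assign the meta
## plain for loops are used so that the assignment changes `crude` itself;
## inside an anonymous function it would only change a local copy
for (tag in names(res)) {
  meta.tag <- res[[tag]]
  for (y in seq_along(crude)) {
    meta(crude[[y]], tag) <- meta.tag[[y]]
  }
}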
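As a side note on the first part of the question (whether gsrc itself can be saved): a corpus is an ordinary R object, so base R serialization keeps its metadata along with the text. The following is only a minimal sketch using the crude data set; it assumes the same approach carries over to a WebCorpus such as gsrc, and the temporary file path is purely illustrative.

```r
library(tm)
data("crude")

## Illustrative location; any writable path would do
path <- tempfile(fileext = ".rds")

## Serialize the whole corpus, including per-document metadata
saveRDS(crude, path)

## Later (for example in a new session): read it back
restored <- readRDS(path)

## The restored corpus should match the original, metadata included
identical(restored, crude)
meta(restored[[1]])
```

Unlike writing the document text out with write(), this keeps each document and its metadata together, so nothing has to be reassigned afterwards.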

Source

2013-10-23 14:14:14 agstudy

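Is it a good idea to keep the metadata in .RData? It is only 100 texts for now, but I will be adding to res every day, which will lead to memory problems at a later date! Is it not possible to retrieve it from a file? – Aashu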

@Aashu Save it in a file. – agstudy
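One way to read "save it in a file" for metadata that grows daily is to append each day's rows to a plain CSV file instead of reloading and re-saving one large object. The sketch below is an assumption about how that could look, not something taken from the answer: append_meta is a hypothetical helper, the ID values and the second row are made up for illustration, and in practice the rows would be built from meta() calls like those in the answer above.

```r
## Hypothetical helper: append a data frame of metadata rows to a CSV file.
## The header is written only when the file does not exist yet.
append_meta <- function(df, path) {
  new_file <- !file.exists(path)
  write.table(df, file = path, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file,
              qmethod = "double")
}

path <- tempfile(fileext = ".csv")  # illustrative location

## Day 1: one row per document (ID is a made-up placeholder)
day1 <- data.frame(
  ID = "doc1.txt",
  DateTimeStamp = "2013-10-23 11:46:47",
  Heading = "Shutdown Will Hinder True Gauge of US Economy - New York Times",
  stringsAsFactors = FALSE
)
append_meta(day1, path)

## Day 2: a made-up row, appended without touching the earlier rows
day2 <- data.frame(
  ID = "doc2.txt",
  DateTimeStamp = "2013-10-24 09:00:00",
  Heading = "Example heading, with a comma",
  stringsAsFactors = FALSE
)
append_meta(day2, path)

## Read everything back when the metadata is needed
all_meta <- read.csv(path, stringsAsFactors = FALSE)
```

Appending this way means only the new rows are held in memory when writing, and the ID column can be used to match rows back to documents before assigning the values with meta().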

+1 for the reverse step. – Aashu
