2013-02-12

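I am using the tm package and want to get the Flesch-Kincaid scores of documents in R. I found that the koRpus package offers many metrics, including reading level, and started using it. However, the object it returns seems to be a very complicated S4 object, and I don't understand how to parse it. How can I extract content from a koRpus object in R?

So, I apply this to my corpus: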

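library(tm)      # Corpus, DirSource
library(koRpus)  # tokenize, flesch.kincaid
library(foreach) # foreach, %dopar% (runs sequentially with a warning unless a parallel backend is registered)

txt <- system.file("texts", "txt", package = "tm")
(d <- Corpus(DirSource(txt, encoding = "UTF-8"), readerControl = list(language = "lat")))

f <- function(x) tokenize(x, format = "obj", lang = "en")
g <- function(x) flesch.kincaid(x)
x <- foreach(i = 1:5) %dopar% g(f(d[[i]]))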

x is then a list holding the flesch.kincaid results for the Ovid texts.

> x[[1]] 

Flesch-Kincaid Grade Level 
    Parameters: default 
     Grade: 13.62 
     Age: 18.62 

Text language: en 

How can I get at the returned values Grade = 13.62 and Age = 18.62? The output of str(x) is so large that it is hard to parse, e.g.:

> str(x[[1]]) 
Formal class 'kRp.readability' [package "koRpus"] with 49 slots 
    ..@ hyphen     :Formal class 'kRp.hyphen' [package "koRpus"] with 3 slots
    .. .. ..@ lang : chr "en"
    .. .. ..@ desc :List of 5
    .. .. .. ..$ num.syll   : num 196 
    .. .. .. ..$ syll.distrib  : num [1:6, 1:4] 25 25 65 27.8 27.8 ... 
    .. .. .. .. ..- attr(*, "dimnames")=List of 2 
    .. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ... 
    .. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4" 
    .. .. .. ..$ syll.uniq.distrib: num [1:6, 1:4] 15 15 61 19.7 19.7 ... 
    .. .. .. .. ..- attr(*, "dimnames")=List of 2 
    .. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ... 
    .. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4" 
    .. .. .. ..$ avg.syll.word : num 2.18 
    .. .. .. ..$ syll.per100  : num 218 
    .. .. ..@ hyphen:'data.frame': 90 obs. of 2 variables:
    .. .. .. ..$ syll: num [1:90] 1 1 1 1 2 3 1 2 3 1 ... 
    .. .. .. ..$ word: chr [1:90] "Si" "quis" "in" "hoc" ... 
    ..@ param     :List of 1
    .. ..$ Flesch.Kincaid: Named num [1:3] 0.39 11.8 15.59
    .. .. ..- attr(*, "names")= chr [1:3] "asl" "asw" "const"
    ..@ ARI      :List of 1
    .. ..$ : logi NA
    ..@ ARI.NRI     :List of 1
    .. ..$ : logi NA
    ..@ ARI.simple    :List of 1
    .. ..$ : logi NA
    ..@ Bormuth     :List of 1
    .. ..$ : logi NA
    ..@ Coleman     :List of 1
    .. ..$ : logi NA
    ..@ Coleman.Liau    :List of 1
    .. ..$ : logi NA
    ..@ Dale.Chall    :List of 1
    .. ..$ : logi NA
    ..@ Dale.Chall.PSK   :List of 1
    .. ..$ : logi NA
    ..@ Dale.Chall.old   :List of 1
    .. ..$ : logi NA
    ..@ Danielson.Bryan   :List of 1
    .. ..$ : logi NA
    ..@ Dickes.Steiwer   :List of 1
    .. ..$ : logi NA
    ..@ DRP      :List of 1
    .. ..$ : logi NA
    ..@ ELF      :List of 1
    .. ..$ : logi NA
    ..@ Flesch     :List of 1
    .. ..$ : logi NA
    ..@ Flesch.PSK    :List of 1
    .. ..$ : logi NA
    ..@ Flesch.de    :List of 1
    .. ..$ : logi NA
    ..@ Flesch.es    :List of 1
    .. ..$ : logi NA
    ..@ Flesch.fr    :List of 1
    .. ..$ : logi NA
    ..@ Flesch.nl    :List of 1
    .. ..$ : logi NA
    ..@ Flesch.Kincaid   :List of 3
    .. ..$ flavour: chr "default"
    .. ..$ grade : num 13.6
    .. ..$ age : num 18.6
    ..@ Farr.Jenkins.Paterson :List of 1
    .. ..$ : logi NA
    ..@ Farr.Jenkins.Paterson.PSK:List of 1
    .. ..$ : logi NA
    ..@ FOG      :List of 1
    .. ..$ : logi NA
    ..@ FOG.PSK     :List of 1
    .. ..$ : logi NA
    ..@ FOG.NRI     :List of 1
    .. ..$ : logi NA
    ..@ FORCAST     :List of 1
    .. ..$ : logi NA
    ..@ FORCAST.RGL    :List of 1
    .. ..$ : logi NA
    ..@ Fucks     :List of 1
    .. ..$ : logi NA
    ..@ Harris.Jacobson   :List of 1
    .. ..$ : logi NA
    ..@ Linsear.Write   :List of 1
    .. ..$ : logi NA
    ..@ LIX      :List of 1
    .. ..$ : logi NA
    ..@ RIX      :List of 1
    .. ..$ : logi NA
    ..@ SMOG      :List of 1
    .. ..$ : logi NA
    ..@ SMOG.de     :List of 1
    .. ..$ : logi NA
    ..@ SMOG.C     :List of 1
    .. ..$ : logi NA
    ..@ SMOG.simple    :List of 1
    .. ..$ : logi NA
    ..@ Spache     :List of 1
    .. ..$ : logi NA
    ..@ Spache.old    :List of 1
    .. ..$ : logi NA
    ..@ Strain     :List of 1
    .. ..$ : logi NA
    ..@ Traenkle.Bailer   :List of 1
    .. ..$ : logi NA
    ..@ TRI      :List of 1
    .. ..$ : logi NA
    ..@ Wheeler.Smith   :List of 1
    .. ..$ : logi NA
    ..@ Wheeler.Smith.de   :List of 1
    .. ..$ : logi NA
    ..@ Wiener.STF    :List of 1
    .. ..$ : logi NA
    ..@ lang      : chr "en"
    ..@ desc      :List of 26
    .. ..$ sentences   : int 10 
    .. ..$ words    : int 90 
    .. ..$ letters   : Named num [1:12] 492 0 8 9 14 18 14 9 10 6 ... 
    .. .. ..- attr(*, "names")= chr [1:12] "all" "l1" "l2" "l3" ... 
    .. ..$ all.chars   : int 692 
    .. ..$ syllables   : Named num [1:5] 196 25 32 25 8 
    .. .. ..- attr(*, "names")= chr [1:5] "all" "s1" "s2" "s3" ... 
    .. ..$ lttr.distrib  : num [1:6, 1:11] 0 0 90 0 0 ... 
    .. .. ..- attr(*, "dimnames")=List of 2 
    .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ... 
    .. .. .. ..$ : chr [1:11] "1" "2" "3" "4" ... 
    .. ..$ syll.distrib  : num [1:6, 1:4] 25 25 65 27.8 27.8 ... 
    .. .. ..- attr(*, "dimnames")=List of 2 
    .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ... 
    .. .. .. ..$ : chr [1:4] "1" "2" "3" "4" 
    .. ..$ syll.uniq.distrib : num [1:6, 1:4] 15 15 61 19.7 19.7 ... 
    .. .. ..- attr(*, "dimnames")=List of 2 
    .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ... 
    .. .. .. ..$ : chr [1:4] "1" "2" "3" "4" 
    .. ..$ punct    : int 17 
    .. ..$ conjunctions  : int 0 
    .. ..$ prepositions  : int 0 
    .. ..$ pronouns   : int 0 
    .. ..$ foreign   : int 0 
    .. ..$ TTR    : num 0.844 
    .. ..$ avg.sentc.length : num 9 
    .. ..$ avg.word.length : num 5.47 
    .. ..$ avg.syll.word  : num 2.18 
    .. ..$ sntc.per.word  : num 0.111 
    .. ..$ sntc.per100  : num 11.1 
    .. ..$ lett.per100  : num 547 
    .. ..$ syll.per100  : num 218 
    .. ..$ FOG.hard.words  : NULL 
    .. ..$ Bormuth.NOL  : NULL 
    .. ..$ Dale.Chall.NOL  : NULL 
    .. ..$ Harris.Jacobson.NOL: NULL 
    .. ..$ Spache.NOL   : NULL 
    ..@ TT.res     :'data.frame': 107 obs. of 6 variables:
    .. ..$ token : chr [1:107] "Si" "quis" "in" "hoc" ... 
    .. ..$ tag : chr [1:107] "word.kRp" "word.kRp" "word.kRp" "word.kRp" ... 
    .. ..$ lemma : chr [1:107] "" "" "" "" ... 
    .. ..$ lttr : num [1:107] 2 4 2 3 5 6 3 5 6 1 ... 
    .. ..$ wclass: chr [1:107] "word" "word" "word" "word" ... 
    .. ..$ desc : chr [1:107] "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" ... 

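I would very much like to assign the F-K score to meta(d) back in tm.

I'd appreciate learning how to understand this returned object and pull out its values, but if there is another, better and faster way to get F-K scores, I'm all ears!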


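The strategy I chose with foreach seems to limit my ability to handle errors. If anyone has a recommendation for how to do this more directly, I'd appreciate it. – Mittenchops 2013-02-13 15:29:45

Answers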
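On the error-handling point in the comment above, one option is to wrap each per-document call in `tryCatch`, so a single failing document yields `NA` instead of aborting the whole run. The sketch below is base R only; `score_doc` is a hypothetical stand-in for `g(f(doc))`, and its "score" is a placeholder, not a readability metric. The foreach documentation also describes an `.errorhandling` argument (`"stop"`, `"remove"`, `"pass"`), which may be worth trying if you want to keep the `%dopar%` loop.

```r
# Hypothetical stand-in for g(f(doc)); it fails on empty input to simulate a bad document.
score_doc <- function(doc) {
  if (!nzchar(doc)) stop("empty document")
  nchar(doc) / 10  # placeholder value, NOT a real readability score
}

docs <- c("Si quis in hoc artem populo non novit amandi", "", "hoc legat")

# Wrap every call in tryCatch so one failure becomes NA instead of stopping the loop.
scores <- vapply(docs, function(doc) {
  tryCatch(
    score_doc(doc),
    error = function(e) {
      message("skipped: ", conditionMessage(e))
      NA_real_
    }
  )
}, numeric(1), USE.NAMES = FALSE)

scores
```

The `NA` entries then mark which documents need a second look, and the vector keeps the same length and order as the input.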

3

Similar to @Paul's answer, but as a one-liner:

sapply(lapply(x,slot,'Flesch.Kincaid'),'[',c('age','grade')) 
     [,1]  [,2]  [,3]  [,4]  [,5] 
age 18.61778 17.62351 17.77699 18.29032 18.645 
grade 13.61778 12.62351 12.77699 13.29032 13.645 
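To see what the one-liner does step by step without needing koRpus installed, here is a minimal sketch on a toy S4 class. `ToyScore`, `make_toy`, and `x_toy` are made up for illustration and only mimic the `Flesch.Kincaid` slot from the `str()` output. As far as I can tell, extracting with `'['` leaves a list-mode matrix, so this variant adds `unlist()` to get plain numbers.

```r
library(methods)

# Hypothetical toy class; it only mimics the Flesch.Kincaid slot shown in the str() output.
setClass("ToyScore", slots = c(Flesch.Kincaid = "list"))
make_toy <- function(grade) {
  new("ToyScore",
      Flesch.Kincaid = list(flavour = "default", grade = grade, age = grade + 5))
}
x_toy <- list(make_toy(13.62), make_toy(12.62))

# Step 1: pull the Flesch.Kincaid list out of every object.
fk_lists <- lapply(x_toy, slot, "Flesch.Kincaid")

# Step 2: keep age and grade; unlist() turns each pair into a named numeric vector,
# so sapply() returns a numeric matrix with rows "age" and "grade".
fk <- sapply(fk_lists, function(l) unlist(l[c("age", "grade")]))

# Optional: one row per document, which is easier to merge with other metadata.
fk_df <- as.data.frame(t(fk))
fk_df
```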

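For future readers, one update is needed to use this with the tm package. (I was only interested in age, not grade, since grade = age - 5.) I found I had to compute this first (say it is assigned to y) and then assign it back to the meta variable, i.e. 'meta(d, 'f') <- unlist(y, use.names = F)' – Mittenchops 2013-02-13 14:17:23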

3

Just use:

slot(x[[1]], "Flesch.Kincaid") 

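to get the part of the object that contains these values. To get this for each element of the list x, do something like:

list_fk = lapply(x, slot, "Flesch.Kincaid")

...and to get a vector with the grade: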

grades = sapply(list_fk, "[[", "grade")
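For getting your bearings in a large S4 object more generally, `slotNames()` lists the slots, and the `@` operator is the usual shorthand for `slot()`. The sketch below uses a hypothetical toy class (`ToyReadability`, with made-up values copied from the question) rather than a real `kRp.readability` object, so treat it as an illustration of the access pattern only.

```r
library(methods)

# Hypothetical toy class standing in for the much larger kRp.readability object.
setClass("ToyReadability", slots = c(Flesch.Kincaid = "list", lang = "character"))
obj <- new("ToyReadability",
           Flesch.Kincaid = list(flavour = "default", grade = 13.62, age = 18.62),
           lang = "en")

# List the slot names instead of reading through the full str() dump.
slotNames(obj)

# obj@name is shorthand for slot(obj, "name"); both return the same list here.
grade_at   <- obj@Flesch.Kincaid$grade
grade_slot <- slot(obj, "Flesch.Kincaid")$grade

# vapply() states the expected return type, so a malformed element raises an error
# instead of silently changing the shape of the result.
x_toy <- list(obj, obj)
ages <- vapply(x_toy, function(r) r@Flesch.Kincaid$age, numeric(1))
ages
```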