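R: natural-language corpus XML to a data frame

I am using the XML package to process XML files in R. My final goal is to create a data frame containing the following information: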
LUWPOS LUWDictionaryForm LUWLemma OrthographicTranscription PhoneticTranscription PlainOrthographicTranscription Devoiced MoraID ToneClass MoraID
動詞 ダイスル 題する 題し ダイシ 題し 1 3 accent 1
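LUWPOS, LUWDictionaryForm and LUWLemma are attributes of the LUW node. OrthographicTranscription, PhoneticTranscription and PlainOrthographicTranscription are attributes of SUW, which is a child of LUW. Devoiced is an attribute of the Phone nodes that are descendants of SUW. The first MoraID is an attribute of the Mora node that is the grandparent of that Phone. ToneClass is an attribute of the XJToBILabelTone node, which is a child of Phone. The second MoraID belongs to the nearest Mora ancestor of the XJToBILabelTone whose ToneClass is "accent".

Not every Phone node has the Devoiced attribute; when none does, I do not need the first MoraID. Likewise, when no XJToBILabelTone has ToneClass="accent", I do not need the second MoraID.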
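The relationships described above map directly onto XPath axes. A minimal sketch, assuming the XML package is installed and using a hand-trimmed copy of the first LUW from the sample (not the real file), that resolves each relationship relative to an LUW node. Note that `ancestor::Mora[1]` picks the nearest Mora because reverse axes count outward from the context node.

```r
library(XML)

# One LUW, trimmed to the nodes and attributes mentioned in the question
# (illustrative sample, not the real A01F0122.xml).
xml_text <- '<LUW LUWPOS="動詞" LUWLemma="題する">
 <SUW OrthographicTranscription="題し">
  <TransSUW>
   <Mora MoraID="1"><Phoneme><Phone>
     <XJToBILabelTone ToneClass="accent">A</XJToBILabelTone>
   </Phone></Phoneme></Mora>
   <Mora MoraID="3"><Phoneme><Phone Devoiced="1"/></Phoneme></Mora>
  </TransSUW>
 </SUW>
</LUW>'

doc <- xmlParse(xml_text, asText = TRUE, encoding = "UTF-8")
luw <- xmlRoot(doc)

# Each relationship as an XPath relative to the LUW node ("." = this LUW).
paths <- c(
  SUW           = "./SUW",                                   # child of LUW
  DevoicedPhone = ".//Phone[@Devoiced]",                     # descendant Phone with Devoiced
  DevoicedMora  = ".//Phone[@Devoiced]/../..",               # grandparent of that Phone
  AccentTone    = ".//XJToBILabelTone[@ToneClass='accent']", # child of Phone
  AccentMora    = ".//XJToBILabelTone[@ToneClass='accent']/ancestor::Mora[1]" # nearest Mora
)

# Name of the first node each path reaches.
found <- sapply(paths, function(p) xmlName(getNodeSet(luw, p)[[1]]))
print(found)

# MoraID of the two Mora nodes the question needs.
mora_ids <- sapply(paths[c("DevoicedMora", "AccentMora")],
                   function(p) xmlGetAttr(getNodeSet(luw, p)[[1]], "MoraID"))
print(mora_ids)
```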
So far, I can do the following:
library(XML)
doc <- xmlInternalTreeParse(file = "A01F0122.xml") # parses the file into an internal document
luw <- xpathSApply(doc, "//LUW", xmlAttrs)          # extracts the attributes of every LUW node
df <- data.frame(Reduce(rbind, luw))                # binds the attribute vectors row by row
This gives me the following output:
LUWID LUWPOS IsNewLine LineID LUWDictionaryForm LUWLemma LUWMiscPOSInfo1
19 2 名詞 1 002 ホンジツ 本日 2
20 3 名詞 1 003 ハッピョウシャ 発表者 3
21 4 助詞 0 003 ノ の 格助詞
22 5 名詞 1 004 ×××× ×× 固有名詞
23 6 名詞 1 005 キュウヨウ 急用 6
24 7 助詞 0 005 ニツキマシテ につきまして 格助詞
25 1 名詞 1 001 ケッセキ 欠席 1
26 2 助動詞 0 001 デゴザイマス でございます 連用形
27 3 助詞 0 001 テ て 接続助詞
28 4 名詞 1 002 カワリ 代わり 4
29 5 助詞 0 002 ニ に 格助詞
30 6 代名詞 1 003 ワタクシ 私 6
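The columns in the output above appear shifted: for nouns such as 本日, LUWMiscPOSInfo1 repeats the LUWID (2). `rbind` on plain vectors matches values by position, not by name, and recycles or truncates each vector to the matrix width, while LUW nodes carry different attribute sets (LUWConjugateType/LUWConjugateForm on the verb, LUWMiscPOSInfo1 on the particle). A minimal sketch that aligns the attributes by name instead, assuming the XML package and using a trimmed two-node sample in place of the real file:

```r
library(XML)

# Two LUW nodes with different attribute sets (trimmed from the sample file).
xml_text <- '<IPU>
  <LUW LUWID="9" LUWPOS="動詞" LUWConjugateForm="連用形"/>
  <LUW LUWID="10" LUWPOS="助詞" LUWMiscPOSInfo1="接続助詞"/>
</IPU>'

doc <- xmlParse(xml_text, asText = TRUE, encoding = "UTF-8")

attrs <- xpathApply(doc, "//LUW", xmlAttrs)  # list of named character vectors
cols <- unique(unlist(lapply(attrs, names))) # union of all attribute names

# Index each vector by name: an attribute a node lacks becomes NA
# instead of being filled in by recycling.
rows <- lapply(attrs, function(a) unname(a[cols]))
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(df) <- cols
print(df)
```

For the real file, `doc` would come from `xmlInternalTreeParse("A01F0122.xml")` as in the question.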
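It contains some of the information I want, but I don't know how to get at the descendants of LUW. Here is an excerpt of the XML file: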
<?xml version="1.0" encoding="UTF-8"?>
<Talk TalkID="A01F0122" SpeakerID="463" SpeakerBirthPlace="神奈川県" SpeakerBirthGeneration="70to74" SpeakerSex="女">
<TalkComment>
<Comment CommentStrings="講演ID:A01F0122"/>
<Comment CommentStrings=""/>
<Comment CommentStrings=""/>
</TalkComment>
<IPU IPUID="0001" IPUStartTime="00000.312" IPUEndTime="00001.973" Channel="L">
<LUW LUWID="9" LUWPOS="動詞" IsNewLine="1" LineID="006" LUWDictionaryForm="ダイスル" LUWLemma="題する" LUWConjugateType="サ行変格" LUWConjugateForm="連用形">
<SUW SUWID="1" ColumnID="001" SUWDictionaryForm="ダイスル" SUWLemma="題する" SUWConjugateForm="連用形" SUWConjugateType="サ行変格" SUWConjugateForm2="連用形" SUWConjugateType2="サ行変格" SUWPOS="動詞" OrthographicTranscription="題し" PhoneticTranscription="ダイシ" PlainOrthographicTranscription="題し" APID="7" Dep_BunsetsuUnitID="6" Dep_ModifieeBunsetsuUnitID="7">
<TransSUW TransSUWID="1">
<Mora MoraEntity="ダ" MoraID="1" PerceivedAcc="1">
<Phoneme PhonemeEntity="d" PhonemeID="1">
<Phone PhoneID="1" PhoneEntity="SclS" PhoneClass="others" PhoneStartTime="6.188682" PhoneEndTime="6.19458"/>
<Phone PhoneID="2" PhoneEntity="d" PhoneClass="consonant" PhoneStartTime="6.19458" PhoneEndTime="6.207031"/>
</Phoneme>
<Phoneme PhonemeEntity="a" PhonemeID="2">
<Phone PhoneID="1" PhoneEntity="a" PhoneClass="vowel" PhoneStartTime="6.207031" PhoneEndTime="6.317124">
<XJToBILabelTone Time="6.212447" F0="209.865" ToneClass="IBT">%L</XJToBILabelTone>
<XJToBILabelTone Time="6.275146" F0="195.496" ToneClass="accent">A</XJToBILabelTone>
</Phone>
</Phoneme>
</Mora>
<Mora MoraEntity="イ" MoraID="2">
<Phoneme PhonemeEntity="i" PhonemeID="1">
<Phone PhoneID="1" PhoneEntity="i" PhoneClass="vowel" PhoneStartTime="6.317124" PhoneEndTime="6.361029"/>
</Phoneme>
</Mora>
<Mora MoraEntity="シ" MoraID="3">
<Phoneme PhonemeEntity="sj" PhonemeID="1">
<Phone PhoneID="1" PhoneEntity="sj" PhoneClass="consonant" PhoneStartTime="6.361029" PhoneEndTime="6.406245" EndTimeUncertain="1"/>
</Phoneme>
<Phoneme PhonemeEntity="i" PhonemeID="2">
<Phone PhoneID="1" PhoneEntity="i" PhoneClass="vowel" Devoiced="1" PhoneStartTime="6.406245" PhoneEndTime="6.451461" StartTimeUncertain="1">
<XJToBILabelWord Time="6.451461" PerceivedAccPos="1">daisji</XJToBILabelWord>
<XJToBILabelBreak Time="6.451461">1</XJToBILabelBreak>
</Phone>
</Phoneme>
</Mora>
</TransSUW>
</SUW>
</LUW>
<LUW LUWID="10" LUWPOS="助詞" IsNewLine="0" LineID="006" LUWDictionaryForm="テ" LUWLemma="て" LUWMiscPOSInfo1="接続助詞">
<SUW SUWID="1" ColumnID="005" SUWDictionaryForm="テ" SUWLemma="て" SUWMiscPOSInfo1="接続助詞" SUWPOS="助詞" OrthographicTranscription="て" PhoneticTranscription="テ" PlainOrthographicTranscription="て" APID="7">
<TransSUW TransSUWID="1">
<Mora MoraEntity="テ" MoraID="1">
<Phoneme PhonemeEntity="t" PhonemeID="1">
<Phone PhoneID="1" PhoneEntity="SclS" PhoneClass="others" PhoneStartTime="6.451461" PhoneEndTime="6.484228">
<XJToBILabelTone Time="6.451887" ToneClass="LTBPM" F0Uncertain="1">L%</XJToBILabelTone>
</Phone>
<Phone PhoneID="2" PhoneEntity="t" PhoneClass="consonant" PhoneStartTime="6.484228" PhoneEndTime="6.497334"/>
</Phoneme>
<Phoneme PhonemeEntity="e" PhonemeID="2">
<Phone PhoneID="1" PhoneEntity="e" PhoneClass="vowel" PhoneStartTime="6.497334" PhoneEndTime="6.565485">
<XJToBILabelTone Time="6.536170" F0="245.046" ToneClass="Pointer">pH</XJToBILabelTone>
<XJToBILabelWord Time="6.565485" PerceivedAccPos="0">te</XJToBILabelWord>
<XJToBILabelBreak Time="6.565485">1</XJToBILabelBreak>
</Phone>
</Phoneme>
</Mora>
</TransSUW>
</SUW>
</LUW>
</IPU>
</Talk>
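One way to reach the descendants is to loop over the LUW nodes and run relative XPath queries (paths starting with `.`) from each one, so every query stays inside its own LUW; a path starting with `//` would search the whole document even when evaluated on a node. A minimal sketch, assuming the XML package, a trimmed copy of the two LUWs above in place of A01F0122.xml, only the first SUW of each LUW, and only the first devoiced Phone and first accent tone when there are several. `first_attr`, `luw_row`, and the column names `DevoicedMoraID` and `AccentMoraID` are hypothetical names chosen here, since a data frame cannot cleanly hold two columns both called MoraID:

```r
library(XML)

# Trimmed copy of the two LUWs in the sample (only the attributes used here).
xml_text <- '<IPU>
 <LUW LUWPOS="動詞" LUWDictionaryForm="ダイスル" LUWLemma="題する">
  <SUW OrthographicTranscription="題し" PhoneticTranscription="ダイシ" PlainOrthographicTranscription="題し">
   <TransSUW>
    <Mora MoraID="1"><Phoneme><Phone>
      <XJToBILabelTone ToneClass="IBT">%L</XJToBILabelTone>
      <XJToBILabelTone ToneClass="accent">A</XJToBILabelTone>
    </Phone></Phoneme></Mora>
    <Mora MoraID="2"><Phoneme><Phone/></Phoneme></Mora>
    <Mora MoraID="3"><Phoneme><Phone/></Phoneme><Phoneme><Phone Devoiced="1"/></Phoneme></Mora>
   </TransSUW>
  </SUW>
 </LUW>
 <LUW LUWPOS="助詞" LUWDictionaryForm="テ" LUWLemma="て">
  <SUW OrthographicTranscription="て" PhoneticTranscription="テ" PlainOrthographicTranscription="て">
   <TransSUW>
    <Mora MoraID="1"><Phoneme><Phone>
      <XJToBILabelTone ToneClass="Pointer">pH</XJToBILabelTone>
    </Phone></Phoneme></Mora>
   </TransSUW>
  </SUW>
 </LUW>
</IPU>'

# Hypothetical helper: value of `attr` on the first node matched by the
# relative XPath `path`, or NA when nothing matches.
first_attr <- function(node, path, attr) {
  hits <- getNodeSet(node, path)
  if (length(hits) == 0) return(NA_character_)
  xmlGetAttr(hits[[1]], attr, default = NA_character_)
}

accent <- ".//XJToBILabelTone[@ToneClass='accent']"

# Hypothetical helper: one data-frame row per LUW node.
luw_row <- function(luw) {
  data.frame(
    LUWPOS            = xmlGetAttr(luw, "LUWPOS", default = NA_character_),
    LUWDictionaryForm = xmlGetAttr(luw, "LUWDictionaryForm", default = NA_character_),
    LUWLemma          = xmlGetAttr(luw, "LUWLemma", default = NA_character_),
    OrthographicTranscription      = first_attr(luw, "./SUW", "OrthographicTranscription"),
    PhoneticTranscription          = first_attr(luw, "./SUW", "PhoneticTranscription"),
    PlainOrthographicTranscription = first_attr(luw, "./SUW", "PlainOrthographicTranscription"),
    Devoiced       = first_attr(luw, ".//Phone[@Devoiced]", "Devoiced"),
    DevoicedMoraID = first_attr(luw, ".//Phone[@Devoiced]/ancestor::Mora[1]", "MoraID"),
    ToneClass      = first_attr(luw, accent, "ToneClass"),
    AccentMoraID   = first_attr(luw, paste0(accent, "/ancestor::Mora[1]"), "MoraID"),
    stringsAsFactors = FALSE
  )
}

doc <- xmlParse(xml_text, asText = TRUE, encoding = "UTF-8")
df <- do.call(rbind, lapply(getNodeSet(doc, "//LUW"), luw_row))
print(df)
```

For the real file, replace the `xmlParse(...)` call with `xmlParse("A01F0122.xml")`. With this sample, the first row should match the target row in the question, and the particle て gets NA in the last four columns because it has neither a devoiced Phone nor an accent tone. LUWs that contain several SUWs would need one row per SUW instead of only the first.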