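Extracting the text between each pair of quotation marks from a file in Notepad++

My file contains more than 2000 abstracts with more than 18000 sentences, each starting with an opening tag and ending with a closing tag. I want to extract information from it using Notepad++. A schematic view of my file is as follows: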

<abstract> 
<sentence>Activationofthe<conslex="CD28_surface_receptor"sem="G#protein_family_or_group"><conslex="CD28"sem="G#protein_molecule">CD28</cons>surfacereceptor</cons>providesamajorcostimulatorysignalfor<conslex="T_cell_activation"sem="G#other_name">Tcellactivation</cons>resultinginenhancedproductionof<conslex="interleukin-2"sem="G#protein_molecule">interleukin-2</cons>(<conslex="IL-2"sem="G#protein_molecule">IL-2</cons>)and<conslex="cell_proliferation"sem="G#other_name">cellproliferation</cons>.</sentence> 
<sentence>In<conslex="primary_T_lymphocyte"sem="G#cell_type">primaryTlymphocytes</cons>weshowthat<conslex="CD28"sem="G#protein_molecule">CD28</cons>ligationleadstotherapidintracellularformationof<conslex="reactive_oxygen_intermediate"sem="G#inorganic">reactiveoxygenintermediates</cons>(<conslex="ROI"sem="G#inorganic">ROIs</cons>)whicharerequiredfor<conslex="CD28-mediated_activation"sem="G#other_name"><conslex="CD28"sem="G#protein_molecule">CD28</cons>-mediatedactivation</cons>ofthe<conslex="NF-kappa_B"sem="G#protein_molecule">NF-kappaB</cons>/<conslex="CD28-responsive_complex"sem="G#protein_complex"><conslex="CD28"sem="G#protein_molecule">CD28</cons>-responsivecomplex</cons>and<conslex="IL-2_expression"sem="G#other_name"><conslex="IL-2"sem="G#protein_molecule">IL-2</cons>expression</cons>.</sentence> 
<sentence>Delineationofthe<conslex="CD28_signaling_cascade"sem="G#other_name"><conslex="CD28"sem="G#protein_molecule">CD28</cons>signalingcascade</cons>wasfoundtoinvolve<conslex="protein_tyrosine_kinase_activity"sem="G#other_name"><conslex="protein_tyrosine_kinase"sem="G#protein_family_or_group">proteintyrosinekinase</cons>activity</cons>,followedbytheactivationof<conslex="phospholipase_A2"sem="G#protein_molecule">phospholipaseA2</cons>and<conslex="5-lipoxygenase"sem="G#protein_molecule">5-lipoxygenase</cons>.</sentence> 
<sentence>Ourdatasuggestthat<conslex="lipoxygenase_metabolite"sem="G#protein_family_or_group"><conslex="lipoxygenase"sem="G#protein_molecule">lipoxygenase</cons>metabolites</cons>activate<conslex="ROI_formation"sem="G#other_name"><conslex="ROI"sem="G#inorganic">ROI</cons>formation</cons>whichtheninduce<conslex="IL-2"sem="G#protein_molecule">IL-2</cons>expressionvia<conslex="NF-kappa_B_activation"sem="G#other_name"><conslex="NF-kappa_B"sem="G#protein_molecule">NF-kappaB</cons>activation</cons>.</sentence> 
<sentence>Thesefindingsshouldbeusefulfor<conslex="therapeutic_strategies"sem="G#other_name">therapeuticstrategies</cons>andthedevelopmentof<conslex="immunosuppressants"sem="G#other_name">immunosuppressants</cons>targetingthe<conslex="CD28_costimulatory_pathway"sem="G#other_name"><conslex="CD28"sem="G#protein_molecule">CD28</cons>costimulatorypathway</cons>.</sentence> 
</abstract>

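I want to extract the text between the quotation marks. In other words, I want to delete everything in the whole text except what is inside double quotes. For example, my expected output looks like this: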

CD28_surface_receptor G#protein_family_or_group CD28 G#protein_molecule 
primary_T_lymphocyte G#cell_type

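I used .*"(.*)".* in Find what and then Replace All with \1. It extracts only the last quoted string on each line, but I want to extract every quoted string on every line of the document, because my file has many more strings in double quotes.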
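To see why only the last quoted string survives, here is a minimal sketch in Python rather than Notepad++ (an assumption: for this particular pattern, Python's `re` module backtracks the same way as the Notepad++ regex engine). The sample line is a shortened piece of the data above.

```python
import re

# Shortened sample taken from the data in the question.
line = '<conslex="CD28"sem="G#protein_molecule">CD28</cons>'

# The leading .* is greedy: it swallows as much of the line as it can and
# only backs off far enough to leave one "..." pair for the group.
# That pair is therefore the LAST quoted string on the line.
result = re.sub(r'.*"(.*)".*', r'\1', line)

print(result)  # G#protein_molecule
```

The first quoted string, CD28, is consumed by the leading .* and never reaches the capture group.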

Asked 2015-04-02 by Shaheen Gul

Why are you posting a duplicate? http://stackoverflow.com/questions/29409502/extracting-text-between-quotation-marks-in-notepad – deceze 2015-04-02 12:32:24

I got logged out and can't remember my password – 2015-04-02 12:34:08

That question of mine has not been solved yet – 2015-04-02 12:35:17

You can use [^"]*"([^"]+)"[^"]* in Find what and replace with \1\r\n, which puts each quoted string on its own line.

Or, to make them tab-separated, replace with \1\t.
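For anyone who wants to run the same substitution outside Notepad++, the following is a rough Python sketch of the pattern above. The function name `extract_quoted` and the sample text are made up for illustration, and `"\n"` stands in for `\r\n`. The `re.findall` variant at the end is an alternative not mentioned in the answer.

```python
import re

# Same pattern as in the answer: any run of non-quote characters,
# one quoted string (captured), then any run of non-quote characters.
PATTERN = re.compile(r'[^"]*"([^"]+)"[^"]*')

def extract_quoted(text: str, sep: str = "\n") -> str:
    """Replace every match with the captured text followed by sep."""
    return PATTERN.sub(lambda m: m.group(1) + sep, text)

# Hypothetical sample, shortened from the data in the question.
sample = (
    '<abstract>\n'
    '<sentence>In<conslex="primary_T_lymphocyte"sem="G#cell_type">'
    'primaryTlymphocytes</cons></sentence>\n'
    '</abstract>'
)

print(extract_quoted(sample, "\t"))

# Alternative (not from the answer): collect the quoted strings directly.
quoted = re.findall(r'"([^"]+)"', sample)
print(quoted)
```

Because [^"] also matches line breaks, tags on lines without any quotes are swallowed by the neighbouring matches, so only the quoted strings and the separators remain. Note that the output ends with a trailing separator.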

Answered 2015-04-02 13:04:44

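Thanks, this worked well for me, I had a similar problem. One question: if I want the output to include the "quotation" marks, how would I change the regex? Edit: replacing with "\1"\r\n works, wow, regex is easy! ... – gakera 2015-05-13 15:14:23

Either add them to the replacement string where they should go, or move them into the (...) capture group: [^"]*("[^"]+")[^"]* – 2015-05-13 15:15:32

Thanks :D I upvoted this question, even though it is a bit Engrishian (Indian Engrish). I wanted to ask a similar question, but this poor guy already got hammered for asking a duplicate :P – gakera 2015-05-13 15:19:28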
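The two ways of keeping the quotation marks suggested in the comments can be compared with a small Python sketch. This is only an illustration under the assumption that Python's `re` treats these patterns like Notepad++ does; the sample text is a shortened fragment of the question's data, and `\n` is used in place of `\r\n`.

```python
import re

# Shortened fragment of the data in the question.
text = 'lex="CD28"sem="G#protein_molecule">CD28'

# Option 1: keep the original pattern and add the quotes to the replacement.
in_replacement = re.sub(r'[^"]*"([^"]+)"[^"]*', r'"\1"\n', text)

# Option 2: move the quotes inside the capture group instead.
in_group = re.sub(r'[^"]*("[^"]+")[^"]*', r'\1\n', text)

print(in_replacement)
print(in_replacement == in_group)  # True
```

Both variants give the same result here, so the choice is mostly a matter of taste.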
