How do I extract the text between a word and its next occurrence?

-1

mystr = r'''\documentclass[12pt]{article} 
\usepackage{amsmath} 
\title{\LaTeX} 
\begin{document} 
\section{Introduction} 
This is introduction paragraph 
\section{Non-Introduction} 
This is non-introduction paragraph 
\section{Sample section} 
This is sample section paragraph 
\begin{itemize} 
    \item Item 1 
    \item Item 2 
\end{itemize} 
\end{document}'''

What I am trying to do is create a regular expression that extracts the following from mystr:

['This is introduction paragraph','This is non-introduction paragraph',' This is sample section paragraph\n \begin{itemize}\n\item Item 1\n\item Item 2\n\end{itemize}']

Source

2016-10-28 Rebhu Johymalyo Josh

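`split()` works. Why does it have to be a regular expression? –

Your example does not illustrate the problem. The word "a" does not occur again after "quick elephant". – roarsneer

http://stackoverflow.com/questions/743806/split-string-into-a-list-in-python has a more detailed description, but the answer above is correct... –

You can use the split method of str: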

可以使用split從str方法：在

my_string = "a quick brown fox jumps over a lazy dog than a quick elephant" 
word = "a " 
my_string.split(word)

Result:

['', 'quick brown fox jumps over ', 'lazy dog than ', 'quick elephant']

Note: do not use str as a variable name, because it is a Python keyword.

Source

2016-10-28 11:54:29

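str is not a keyword in Python; it is just a built-in class. So technically there is no problem with using str as a name. –

@RebhuJohymalyoJosh While you are right that it is not a keyword, you are wrong to say there is no problem with using it. Using 'str' as a variable name shadows the built-in 'str', which can cause unexpected problems in the long run. In short, *avoid it*. –

@Jose Sanchez: your solution gives odd results if you feed it a sentence containing a word that ends in "a", e.g. "a quick brown fox jumps over a lazy lama than a quick elephant". You can split on " a " instead if you use " " + my_string – am2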
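A minimal sketch of the shadowing problem described in the comment above. The function name shadowing_demo is made up for illustration; the point is only that once a name called str is bound to a string, the built-in can no longer be called through that name in the same scope.

```python
def shadowing_demo():
    # Binding a local variable called "str" hides the built-in class
    # for the rest of this function.
    str = "a quick brown fox"
    try:
        # This now tries to call a string object, not the built-in str().
        str(42)
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


print(shadowing_demo())
```

Renaming the variable (for example to my_string, as in the answer) avoids the problem entirely.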

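If for any reason you need to use a regular expression (perhaps the string you split on is more involved than "a"), the re module has a split function too: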

import re 
str_ = "a quick brown fox jumps over a lazy dog than a quick elephant" 


print(re.split(r'\s?\ba\b\s?',str_)) 

# ['', 'quick brown fox jumps over', 'lazy dog than', 'quick elephant']
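The word boundaries (\b) in this pattern also address the edge case raised in the comments, where a word ending in "a" confuses a plain str.split. A small comparison, using the "lama" sentence from that comment (the variable names are illustrative only):

```python
import re

sentence = "a quick brown fox jumps over a lazy lama than a quick elephant"

# Plain split on "a " also cuts inside "lama", because "lama " ends in "a ".
naive = sentence.split("a ")

# \b only lets "a" match as a whole word, so "lama" is left intact.
bounded = re.split(r'\s?\ba\b\s?', sentence)

print(naive)
# ['', 'quick brown fox jumps over ', 'lazy lam', 'than ', 'quick elephant']
print(bounded)
# ['', 'quick brown fox jumps over', 'lazy lama than', 'quick elephant']
```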

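Edit: expanding the answer with the new information you provided...

After your edit, in which you explain the problem better and include text that looks like LaTeX, I think you need to extract the lines that do not start with \, since those are LaTeX commands. In other words, you need only the lines of plain text. Try the following, still using regular expressions: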

import re 

mystr = r'''\documentclass[12pt]{article} 
\usepackage{amsmath} 
\title{\LaTeX} 
\begin{document} 
\section{Introduction} 
This is introduction paragraph 
\section{Non-Introduction} 
This is non-introduction paragraph 
\section{Sample section} 
This is sample section paragraph 
\end{document}''' 

pattern = r"^[^\\]*\n" 


matches = re.findall(pattern, mystr, flags=re.M) 

print(matches) 

# ['This is introduction paragraph\n', 'This is non-introduction paragraph\n', 'This is sample section paragraph\n']
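The pattern above returns only the plain-text lines, so an itemize block like the one in the question's expected output is left out. One possible alternative, offered as a sketch and not as part of the original answer, is to split on the \section{...} commands and keep everything between them. The helper name section_bodies is hypothetical, and the pattern assumes section titles contain no nested braces.

```python
import re

mystr = r'''\documentclass[12pt]{article}
\usepackage{amsmath}
\title{\LaTeX}
\begin{document}
\section{Introduction}
This is introduction paragraph
\section{Non-Introduction}
This is non-introduction paragraph
\section{Sample section}
This is sample section paragraph
\begin{itemize}
    \item Item 1
    \item Item 2
\end{itemize}
\end{document}'''


def section_bodies(tex):
    """Return the text that follows each \\section{...} command."""
    # Ignore everything from \end{document} onward.
    body = tex.split(r'\end{document}')[0]
    # Split on every \section{title}; the first piece is the preamble.
    parts = re.split(r'\\section\{[^}]*\}', body)
    # Drop the preamble and trim surrounding whitespace from each body.
    return [part.strip() for part in parts[1:]]


for text in section_bodies(mystr):
    print(repr(text))
```

Each element is the text between one \section command and the next, with the last one running up to \end{document}, so the itemize environment stays attached to the third section.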

Source

2016-10-28 11:55:23 chapelo

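Thank you... but split does not serve my purpose. Sorry, I was not able to write the question out in full. –

@RebhuJohymalyoJosh: OK, try to explain it better, perhaps with a real example of what you are trying to do, and some code. – chapelo

I have edited the question. Please take a look at it. –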
