I have two paragraphs in a text file, and I am using Python NLTK.


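Paragraph 1:

Electronic commerce, commonly written as E-Commerce, is the trading or facilitation of trading in goods or services using computer networks, such as the Internet or online social networks. Electronic commerce draws on technologies such as mobile commerce, electronic funds transfer, supply chain management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory management systems, and automated data collection systems.

Paragraph 2:

Modern electronic commerce typically uses the World Wide Web for at least one part of the transaction's life cycle, although it may also use other technologies such as e-mail. The benefits of e-commerce include speed of access, a wider selection of goods and services, accessibility, and international reach.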

I have to find the words common to the two paragraphs and print them.


2017-08-04 BMK007

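This sounds like a homework question, so instead of providing an answer, I will give you a hint. NLTK cannot do this by itself; that is not what NLTK is for. However, the most you need to do is use NLTK's tokenizer to split the paragraphs into words, then put those words into sets and compare them (for example, as in the suggested answer).

Step 1: `nltk.word_tokenize`. Step 2: see https://stackoverflow.com/questions/15173225/how-to-calculate-cosine-similarity-given-2-sentence-strings-python or try any of the methods in http://web.stanford.edu/class/linguist236/materials/ling236-handout-05-09-vsm.pdf – alvas

@BhatiManishKumar, changing your user ID to hide your name also sets off alarm bells. Asking homework questions is not against this site's rules, as long as they are good questions. You have had enough hints; now go learn some Python. – alexis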
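The hints in the comments above (tokenize, then compare sets) can be sketched roughly as follows. This is only an illustration, not the commenters' code: it swaps in `nltk.tokenize.wordpunct_tokenize` for the `nltk.word_tokenize` named in the comment, because the former is regex-based and does not need the extra tokenizer data download. The `common_words` helper, the sample strings, and the regex fallback for when NLTK is not installed are all assumptions made for the example.

```python
import re

try:
    # wordpunct_tokenize is regex-based and needs no extra NLTK data download
    from nltk.tokenize import wordpunct_tokenize as tokenize
except ImportError:
    # Fallback (assumption): a rough stand-in when NLTK is not installed
    def tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)


def common_words(text_a, text_b):
    """Return the set of alphabetic tokens that occur in both texts."""
    tokens_a = {tok.lower() for tok in tokenize(text_a) if tok.isalpha()}
    tokens_b = {tok.lower() for tok in tokenize(text_b) if tok.isalpha()}
    return tokens_a & tokens_b


# Shortened sample sentences (hypothetical, adapted from the question)
para1 = "Electronic commerce draws on technologies such as mobile commerce."
para2 = "Modern electronic commerce may also use other technologies such as e-mail."
print(sorted(common_words(para1, para2)))
# ['as', 'commerce', 'electronic', 'such', 'technologies']
```

Filtering with `isalpha()` drops punctuation tokens, and lowercasing makes "Electronic" and "electronic" count as the same word.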

You can use set.intersection.

p1 = ''' 
Electronic commerce, commonly written as E-Commerce, is the trading or 
facilitation of trading in goods or services using computer networks, such 
as the Internet or online social networks. Electronic commerce draws on 
technologies such as mobile commerce, electronic funds transfer, supply 
chain management, Internet marketing, online transaction processing, 
electronic data interchange (EDI), inventory management systems, and 
automated data collection systems. 
'''.split() 

p2 = ''' 
Modern electronic commerce typically uses the World Wide Web for at least 
one part of the transaction's life cycle although it may also use other 
technologies such as e-mail. The benefits of e-commerce include it’s the 
speed of access, a wider selection of goods and services, accessibility, and 
international reach. 
'''.split() 

print(set(p1).intersection(p2))
# Output (set ordering may vary):
# {'and', 'the', 'technologies', 'of', 'electronic', 'such', 'commerce', 'as', 'goods'}
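One thing worth noting about the plain `split()` approach: tokens keep their punctuation and case, so "commerce," and "commerce", or "The" and "the", are treated as different words. A minimal sketch of one way to normalize first, using only the standard library, might look like this. The `normalize` helper and the two sample strings are hypothetical, and `string.punctuation` covers ASCII punctuation only.

```python
import string


def normalize(text):
    """Lowercase each whitespace-separated token and strip surrounding punctuation."""
    words = (tok.strip(string.punctuation).lower() for tok in text.split())
    return {w for w in words if w}  # drop tokens that were only punctuation


# Shortened sample sentences (hypothetical, adapted from the question)
a = "Electronic commerce, commonly written as E-Commerce, is the trading of goods or services."
b = "The benefits of e-commerce include a wider selection of goods and services, and reach."
print(sorted(normalize(a) & normalize(b)))
# ['e-commerce', 'goods', 'of', 'services', 'the']
```

Because only leading and trailing punctuation is stripped, hyphenated words such as "e-commerce" stay intact.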


2017-08-04 03:58:46 umutto

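Thank you for the response, but I want to do it in NLTK. Thanks anyway – BMK007

@BhatiManishKumar I do not know of any NLTK method that achieves the same result. If you check the [source](https://github.com/nltk/nltk/blob/8eb3803cb88a6e75d18d4f740678b218b3d8f4fd/nltk/text.py#L107), they also often use set.intersection() to get similar results. – umutto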

If you do not need to do anything specific to language processing, you do not need NLTK:

# paragraph1 and paragraph2 are the two paragraph strings
words1 = paragraph1.lower().split()
words2 = paragraph2.lower().split()

# words that appear in both paragraphs
intersection = set(words1) & set(words2)
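Since the question says the two paragraphs live in a text file, the snippet above could be wired up to a file along these lines. This is a sketch under assumptions: the file layout (paragraphs separated by a blank line), the `common_words_in_file` helper, and the sample text are all made up for illustration, and a temporary file stands in for the real one.

```python
import os
import tempfile


def common_words_in_file(path):
    """Split the file into blank-line-separated paragraphs and intersect the first two."""
    with open(path, encoding="utf-8") as fh:
        paragraphs = [p for p in fh.read().split("\n\n") if p.strip()]
    first, second = paragraphs[0], paragraphs[1]
    return set(first.lower().split()) & set(second.lower().split())


# Demo with a throwaway file so the sketch runs on its own
sample = "online trading of goods\n\nwider selection of goods and services\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
print(sorted(common_words_in_file(tmp.name)))
# ['goods', 'of']
os.remove(tmp.name)
```

In real use, the path of the actual text file would be passed instead of the temporary one.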


2017-08-04 03:59:21

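Thank you for the response, but I want to do it in NLTK – BMK007

@BhatiManishKumar The moment you start throwing in unreasonable constraints, alarm bells ring: this is most likely homework or a project, because otherwise there is no reason to install a 50 MB library to do something you can achieve with a simple intersection. You are unlikely to find help here.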
