其實我是hadoop的新手,也是python ....所以我的疑問是如何在hadoop中運行python腳本.....而且我還在寫使用python..So wordcount的程序,才能不使用地圖減少執行該腳本.... 其實我寫的代碼我可以看到輸出如下 黑暗1 天堂2 它3 燈4 年齡5 6歲以下 全部7 全部8 權威9 之前10 之前11 之前12 相信13 最好的14 比較15 度16 絕望17 直接18 直接19如何在不使用map的情況下使用Python編寫wordcount程序reduce
It is counting number of words in a list..but whati have to achieve is grouping and deleting the duplicates and also count number of times of its occurrences .....
Below is my code . can somebody please tell me where i have done the mistake
********************************************************
Wordcount.py
********************************************************
import urllib2
import random
from operator import itemgetter
current_word = {}
current_count = 0
story = 'http://sixty-north.com/c/t.txt'
request = urllib2.Request(story)
response = urllib2.urlopen(request)
each_word = []
words = None
count = 1
same_words ={}
word = []
""" looping the entire file """
for line in response:
line_words = line.split()
for word in line_words: # looping each line and extracting words
each_word.append(word)
random.shuffle(each_word)
Sort_word = sorted(each_word)
for words in Sort_word:
same_words = words.lower(),int(count)
#print same_words
#print words
if not words in current_word :
current_count = current_count +1
print '%s\t%s' % (words, current_count)
else:
current_count = 1
#if Sort_word == words.lower():
#current_count += count
current_count = count
current_word = word
#print '2. %s\t%s' % (words, current_count)
謝謝你的回覆....我已經看到上面的鏈接你給我我只是在MRtask中運行mapper.py和reducer.py ...但我實際上想編寫一個特別的代碼到python – user2732609
你是不是指獨立的python腳本? – Thejas
是的,我做!!!!!!! – user2732609