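Hello, I am using sklearn with k-means for natural language processing. I use KMeans to create clusters from comments, and then build a dictionary that has the cluster number as the key and the list of associated comments as the value, as shown below. How can I solve the following problem when writing the dictionary to CSV?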
dict_clusters = {}
for i in range(0, len(kmeans.labels_)):
    #print(kmeans.labels_[i])
    #print(listComments[i])
    if not kmeans.labels_[i] in dict_clusters:
        dict_clusters[kmeans.labels_[i]] = []
    dict_clusters[kmeans.labels_[i]].append(listComments[i])
print("dictionary constructed")
I also want to write this dictionary to a file. I tried CSV:
Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerows(dict_clusters)
Out.close()
But I do not know what is wrong. I get the error below, and I do not know whether it is related to numpy, since kmeans.labels_ contains several values:
Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 133, in <module>
    w.writerows(dict_clusters)
  File "C:\Program Files\Anaconda3\lib\csv.py", line 156, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
  File "C:\Program Files\Anaconda3\lib\csv.py", line 146, in _dict_to_list
    wrong_fields = [k for k in rowdict if k not in self.fieldnames]
TypeError: 'numpy.int32' object is not iterable
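A possible reading of this traceback: `DictWriter.writerows` expects an iterable of dictionaries, one per output row. Passing `dict_clusters` itself makes it iterate over the keys, and each key is a `numpy.int32`, which cannot be iterated as if it were a row. The sketch below shows the shape `writerows` expects. The sample data, the column names `cluster` and `comments`, the `" | "` separator and the use of `io.StringIO` instead of a real file are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical stand-in for the real dictionary: cluster label -> comments.
dict_clusters = {0: ["hello this is", "the car is red"], 1: ["we already have"]}

# io.StringIO stands in for a file opened in text mode.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["cluster", "comments"])
writer.writeheader()

# writerows wants an iterable of dicts, one dict per output row.
writer.writerows(
    {"cluster": label, "comments": " | ".join(comments)}
    for label, comments in dict_clusters.items()
)

csv_text = buffer.getvalue()
print(csv_text)
```

Each cluster becomes one row here, which is close to the key-per-line layout described in the question.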
I would appreciate help with this. I would like to get a CSV from my dictionary that looks as follows:
key1, value
key2, value
.
.
.
keyN, value
After the feedback here, I tried:
with open("dictionary.csv", mode="wb") as out_file:
    writer = csv.DictWriter(out_file, headers=dict_clusters.keys())
    writer.writerow(dict_clusters)
I got:
Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 129, in <module>
    writer = csv.DictWriter(out_file, headers=dict_clusters.keys())
TypeError: __init__() missing 1 required positional argument: 'fieldnames'
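This `TypeError` is consistent with `csv.DictWriter` having no `headers` parameter: the column names are passed as `fieldnames`. A minimal sketch of the corrected call, assuming a small hypothetical dictionary with string values and an in-memory buffer instead of a file:

```python
import csv
import io

# Hypothetical dictionary; the real values are lists of comments.
dict_clusters = {0: "first", 1: "second"}

buffer = io.StringIO()  # stands in for a file opened in text mode
# The keyword is fieldnames, not headers.
writer = csv.DictWriter(buffer, fieldnames=list(dict_clusters.keys()))
writer.writeheader()
writer.writerow(dict_clusters)

csv_text = buffer.getvalue()
print(csv_text)
```

Note that this writes the keys as column headings and all values on a single row, which is not the key-per-line layout asked for above.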
Attempt 2:
Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerows([dict_clusters])
Out.close()
Output:
Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 130, in <module>
    w.writerows([dict_clusters])
  File "C:\Program Files\Anaconda3\lib\csv.py", line 156, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
TypeError: a bytes-like object is required, not 'str'
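In Python 3 the `csv` module writes `str`, so a file opened with `"wb"` rejects what the writer hands it, which would explain this message. A minimal sketch, assuming a hypothetical dictionary and a temporary directory in place of the real output location, opens the file in text mode with `newline=""` as the `csv` documentation recommends:

```python
import csv
import os
import tempfile

# Hypothetical dictionary; the real values are lists of comments.
dict_clusters = {0: "first", 1: "second"}

# A temporary directory is used here only to keep the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "dictionary.csv")

# Text mode ("w") rather than "wb"; newline="" lets csv control line endings.
with open(path, "w", newline="", encoding="utf-8") as out_file:
    writer = csv.DictWriter(out_file, fieldnames=list(dict_clusters))
    writer.writerows([dict_clusters])

# Read the file back without newline translation to inspect what was written.
with open(path, newline="", encoding="utf-8") as in_file:
    csv_text = in_file.read()
print(repr(csv_text))
```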
Attempt 3 (this attempt takes a long time to compute the output):
Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerow(dict_clusters)
Out.close()
The version of Python I am using is as follows:
3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
3.5.2
After many attempts, I decided to use a better way to build my dictionary, as follows:
from collections import defaultdict
pairs = zip(y_pred, listComments)
dict_clusters2 = defaultdict(list)
for num, comment in pairs:
    dict_clusters2[num].append(comment)
However, it seems that some characters make the creation of the CSV file fail, as follows:
with open('dict.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in dict_clusters2.items():
        writer.writerow([key, value])
Output:
Traceback (most recent call last):
  File "C:/Users/CleanFile.py", line 146, in <module>
    writer.writerow([key, value])
  File "C:\Program Files\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f609' in position 6056: character maps to <undefined>
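The traceback goes through `cp1252.py`, which suggests the file was opened with the Windows default encoding, and `cp1252` has no mapping for emoji such as '\U0001f609'. The small check below, using a hypothetical comment string, illustrates the difference between `cp1252` and UTF-8 for that character:

```python
# Hypothetical comment containing the character named in the traceback.
emoji_comment = "nice one \U0001f609"

# cp1252 cannot represent the emoji, so encoding raises UnicodeEncodeError.
try:
    emoji_comment.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# UTF-8 can represent every Unicode character, including emoji.
utf8_bytes = emoji_comment.encode("utf-8")

print(cp1252_ok, utf8_bytes.decode("utf-8") == emoji_comment)
```

If UTF-8 output is acceptable, passing `encoding="utf-8"` to `open()` should avoid this error. If the file is meant to be opened in Excel, `encoding="utf-8-sig"` may be the more convenient choice.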
To make this clearer, I ran:
for k, v in dict_clusters2.items():
    print(k, v)
And I got this:
1 ['hello this is','the car is red',....'performing test']
2 ['we already have','another comment',...'strings strings']
.
.
19 ['we have',' comment music',...'strings strings dance']
Each key in my dictionary has a list of several comments, and I would like a CSV as follows:
1,'hello this is','the car is red',....'performing test'
2,'we already have','another comment',...'strings strings'
.
.
19,'we have',' comment music',...'strings strings dance'
But it seems that some characters are not handled well and everything fails. I hope to get some help; thank you for your support.
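One way to get the layout above is sketched here, with the assumptions stated up front: the sample `y_pred` and `listComments` values are made up, the file goes to a temporary directory, and UTF-8 is taken to be an acceptable encoding. Each row starts with the key and then spreads the comments over the following columns.

```python
import csv
import os
import tempfile
from collections import defaultdict

# Hypothetical sample data standing in for the real labels and comments.
y_pred = [1, 2, 1]
listComments = ["hello this is", "we already have \U0001f609", "the car is red"]

dict_clusters2 = defaultdict(list)
for num, comment in zip(y_pred, listComments):
    dict_clusters2[num].append(comment)

# A temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "dict.csv")

# Text mode, newline="" for csv, and an explicit encoding that can hold emoji.
with open(path, "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    for key, comments in dict_clusters2.items():
        # Key in the first column, one comment per following column.
        writer.writerow([key] + comments)

# Read the file back to check the rows.
with open(path, newline="", encoding="utf-8") as csv_file:
    rows = list(csv.reader(csv_file))
print(rows)
```

The rows have different lengths because the clusters hold different numbers of comments. The `csv` module accepts that, though some tools that read the file may expect a fixed number of columns.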
Unrelated to the question: you might want to look at [`enumerate`](https://docs.python.org/3.5/library/functions.html#enumerate) and [`dict.setdefault`](https://docs.python.org/3.5/library/stdtypes.html#dict.setdefault). The first code block could be written as something like `for i, label in enumerate(kmeans.labels_): dict_clusters.setdefault(label, []).append(listComments[i])` (although it would be better split over several lines).
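Split over several lines, the suggestion in this comment might look like the sketch below. The `labels` list is a hypothetical stand-in for `kmeans.labels_`, and the comments are made-up sample data.

```python
# Hypothetical stand-ins for kmeans.labels_ and the real comments.
labels = [1, 2, 1]
listComments = ["hello this is", "we already have", "the car is red"]

dict_clusters = {}
for i, label in enumerate(labels):
    # setdefault returns the existing list for this label,
    # or stores and returns a new empty list the first time it is seen.
    cluster_comments = dict_clusters.setdefault(label, [])
    cluster_comments.append(listComments[i])

print(dict_clusters)
```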
Even better than `enumerate`: in this case you might want to check out [zip](https://docs.python.org/3.5/library/functions.html#zip) to loop over `listComments` and `kmeans.labels_` at the same time. More on looping with indexes: http://treyhunner.com/2016/04/how-to-loop-with-indexes-in-python/
As an alternative to `dict.setdefault`, [collections.defaultdict(list)](https://docs.python.org/3.6/library/collections.html#defaultdict-examples) can be used. I usually prefer `defaultdict` over `dict.setdefault`, but they both achieve the same purpose.