Training an LSTM in TensorFlow with 100-dim pretrained word2vec embeddings: tf.constant_initializer is very slow
```python
import codecs
import time

import tensorflow as tf


@staticmethod
def load_embeddings(pre_trained_embeddings_path, word_embed_size):
    embd = []
    start_time = time.time()
    cnt = 0
    with codecs.open(pre_trained_embeddings_path, mode="r", encoding='utf-8') as f:
        for line in f:  # iterate lazily instead of f.readlines()
            values = line.strip().split(' ')
            embd.append(values[1:])  # drop the token, keep the vector components
            cnt += 1
            if cnt % 100000 == 0:
                print("word-vectors loaded: %d" % cnt)
    embedding, vocab_size, embed_dim = embd, len(embd), len(embd[0])
    load_end_time = time.time()
    print("word vectors loaded, starting initialisation, cnt: %d, time taken: %d secs"
          % (vocab_size, load_end_time - start_time))
    # This is the slow part: the full matrix ends up embedded in the graph as a constant.
    embedding_init = tf.constant_initializer(embedding, dtype=tf.float16)
    src_word_embedding = tf.get_variable(shape=[vocab_size, embed_dim],
                                         initializer=embedding_init,
                                         trainable=False,
                                         name='word_embedding',
                                         dtype=tf.float16)
    print("word-vectors loaded and initialised, cnt: %d, time taken: %d secs"
          % (vocab_size, time.time() - load_end_time))
    return src_word_embedding
```
Here is the runtime output of this method:
word vectors loaded, starting initialisation, cnt: 2419080, time taken: 74 secs
word-vectors loaded and initialised, cnt: 2419080, time taken: 1647 secs
System info: tensorflow 1.1.0, tcmalloc, python 3.6, ubuntu 14.04
Half an hour for initialization seems very slow. Is this normal behaviour? Any idea what might be causing it?
UPDATE: Using the approach @sirfz suggested of feeding the embeddings made loading them very fast: Initialization Done in 85 secs
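For reference, here is a minimal sketch of that feed-based initialisation, assuming the same `embd` list built in `load_embeddings` above; the names `embedding_ph` and `embedding_assign` are illustrative, not from @sirfz's exact code. The idea is to give the variable a cheap initializer and feed the numpy array once through a placeholder, so the weights are never serialized into the GraphDef:

```python
import numpy as np
import tensorflow as tf

# `embd` is the list of string vectors built in load_embeddings() above.
embedding = np.asarray(embd, dtype=np.float16)
vocab_size, embed_dim = embedding.shape

# Cheap initializer: the pretrained weights are not baked into the graph.
src_word_embedding = tf.get_variable(name='word_embedding',
                                     shape=[vocab_size, embed_dim],
                                     dtype=tf.float16,
                                     initializer=tf.zeros_initializer(),
                                     trainable=False)

# Feed the numpy array once through a placeholder instead of a constant.
embedding_ph = tf.placeholder(tf.float16, shape=[vocab_size, embed_dim])
embedding_assign = src_word_embedding.assign(embedding_ph)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(embedding_assign, feed_dict={embedding_ph: embedding})
```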
Is it slow in float32 too? – user1735003
Similar timings with float32 – jknair
This seems to be an open issue. See [Boolean operations on GPU are very slow](https://github.com/tensorflow/tensorflow/issues/3649). – frankyjuang