2014-06-09 71 views
0

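I parsed a web page and wrote all of its links to a CSV file. When I try to read those links back from the CSV, I get the result below. How can I remove the \t characters from the result?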

[['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'], ['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]

A \t appears after every letter. I tried the following to strip the \t characters from the result, but with no luck. Here is my code:

out=open("categories.csv","rb") 
data=csv.reader(out) 
new_data=[[row[1]] for row in data] 
new_data = new_data.strip('\t\n\r') 
print new_data 

This gives an error:

AttributeError: 'list' object has no attribute 'strip' 
+1

You have tabs between each individual *character*; how did you write this? 'csv.writerow('single url string')'? –

+0

You need to learn basic programming first: http://learnpythonthehardway.org –
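
The first comment above guesses that the URL was passed to writerow as a bare string. A minimal sketch of how that could produce tab-separated characters, assuming Python 3 and a tab-delimited writer (the asker's actual writing code is not shown, so this is only one plausible cause):

```python
import csv
import io

url = "http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011"

# writerow() expects a sequence of fields. A bare string is itself a
# sequence of characters, so every character becomes its own field.
buf_wrong = io.StringIO()
csv.writer(buf_wrong, delimiter="\t").writerow(url)

# Wrapping the string in a list writes it as a single field.
buf_right = io.StringIO()
csv.writer(buf_right, delimiter="\t").writerow([url])

wrong_line = buf_wrong.getvalue()
right_line = buf_right.getvalue()
print(repr(wrong_line[:16]))
print(repr(right_line))
```

If this is what happened, fixing the write side (passing a list to writerow) would avoid the cleanup on the read side.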

Answers

1

You can use the re.sub function to do the substitution on the string:

import re 
string = "[['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts \tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'], ['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]" 

new_string = re.sub(r'\t', '', string) 

print new_string 

======= OUTPUT:

[['http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011'], ['http://www.amazon.com/Prime-Instant-Video/b?ie=UTF8&node=2676882011']] 
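
The answer above runs re.sub on one long string. In the question, new_data is a list of lists, so the same substitution could be applied to each string inside it. A small sketch, assuming Python 3 and shortened sample values that stand in for the real URLs:

```python
import re

# Shortened, made-up stand-ins for the rows read from the CSV.
new_data = [["\th\tt\tt\tp"], ["\tw\tw\tw"]]

# Apply the substitution to every string in every row.
cleaned = [[re.sub(r"\t", "", cell) for cell in row] for row in new_data]
print(cleaned)
```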
0

Note that the strip method only removes whitespace characters from the two ends of a string. Try the following instead:

import csv

out=open("categories.csv","rb")
data=csv.reader(out)
# strip() trims the ends; split/join removes the tabs between characters
new_data=[[''.join(row[0].strip().split('\t'))] for row in data]
print new_data
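
The difference between strip and a full replacement can be seen on a single short string. This is a sketch in Python 3 with a made-up sample value:

```python
s = "\th\tt\tt\tp\t"

# strip() only trims the listed characters from the two ends.
ends_only = s.strip("\t\n\r")

# replace() removes every occurrence, including the ones in the middle.
all_removed = s.replace("\t", "")

print(repr(ends_only))
print(repr(all_removed))
```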
+0

Maybe 'csv.reader' should be configured to split on tabs... –
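
A rough sketch of what the comment above suggests, assuming Python 3 and assuming each line of the file holds one URL with a tab between every character (the sample text is made up, and the real file layout is not shown in the question):

```python
import csv
import io

# Hypothetical file contents: one URL per line, a tab before every character.
sample = "\th\tt\tt\tp\t:\t/\t/\ta\t.\tb\r\n\th\tt\tt\tp\t:\t/\t/\tc\t.\td\r\n"

# With delimiter="\t", every character arrives as its own field,
# so joining the fields of a row rebuilds the original string.
reader = csv.reader(io.StringIO(sample, newline=""), delimiter="\t")
urls = ["".join(row) for row in reader]
print(urls)
```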

0

A kludge solution:

x = [['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'],['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']] 

for y in x: 
    for z in y: 
        print("".join(z.split('\t'))) 

Returns:

> http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011 
> http://www.amazon.com/Prime-Instant-Video/b?ie=UTF8&node=2676882011 
0

You need to index into the nested list to reach each string, then do a simple replace:

string = [[...],[...]...] 

lst = [] 
for ylst in string: 
    for ln in ylst: 
        lst.append(ln.replace('\t','')) 

lst will then contain each line without the '\t' characters.