Convert an HTML table to JSON
I have a huge HTML table (about 500,000 rows) that I need to convert into a JSON file. The table looks like this:
<table>
<tr>
<th>Id</th>
<th>Timestamp</th>
<th>Artist_Name</th>
<th>Tweet_Id</th>
<th>Created_at</th>
<th>Tweet</th>
<th>User_name</th>
<th>User_Id</th>
<th>Followers</th>
</tr>
<tr>
<td>1</td>
<td>2013-06-07 16:00:17</td>
<td>Kelly Rowland</td>
<td>343034567793442816</td>
<td>Fri Jun 07 15:59:48 +0000 2013</td>
<td>So has @MissJia already discussed this Kelly Rowland Dirty Laundry song? I ain't trying to go all through her timelime...</td>
<td>Nicole Barrett</td>
<td>33831594</td>
<td>62</td>
</tr>
<tr>
<td>2</td>
<td>2013-06-07 16:00:17</td>
<td>Kelly Rowland</td>
<td>343034476395368448</td>
<td>Fri Jun 07 15:59:27 +0000 2013</td>
<td>RT @UrbanBelleMag: While everyone waits for Kelly Rowland to name her abusive ex, don't hold your breath. But she does say he's changed: ht…</td>
<td>A.J.</td>
<td>24193447</td>
<td>340</td>
</tr>
I want to create a JSON file that looks something like this:
{"data": [
{
"text": "So has @MissJia already discussed this Kelly Rowland Dirty Laundry song? I ain't trying to go all through her timelime...",
"id": 1,
"tweet_id": 343034567793442816
},
{
"text": "RT @UrbanBelleMag: While everyone waits for Kelly Rowland to name her abusive ex, don't hold your breath. But she does say he's changed: ht…",
"id": 2,
"tweet_id": 343034476395368448
}
]}
There would be a few more variables included, but it should be self-explanatory, I hope.
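I have looked at several options, but my main problem is that the HTML table is so large. I have seen many people recommend jQuery. Does that make sense given the size of my table? If there is a suitable Python option, I would much prefer it, since I have been writing most of my code in Python.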
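Wow, an HTML table with 500,000 rows. That is huge, too huge I would say... Why not use pagination? By the way, where does this data come from in the first place? I assume you did not hard-code your table, right?!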
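The data was scraped from Twitter. I have a database, but so far the export has only succeeded as HTML. With every other format, the database tool cancelled my export request. – Tom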
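Since a Python option was asked for, here is one possible approach: a minimal sketch that streams the table with the standard library's `html.parser` instead of loading all 500,000 rows into memory or a browser. It reads the file in chunks, turns each finished `<tr>` into a dict keyed by the `<th>` names, and writes one JSON record at a time. The names `TableRowParser` and `table_to_json` are made up for this example, and the mapping of columns to keys (`Tweet` to `text`, `Id` to `id`, `Tweet_Id` to `tweet_id`) is only inferred from the sample output in the question. It assumes every row has the same cells as the header row and that cells contain no nested tables; it has not been timed against a table of this size.

```python
import json
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the cells of each <tr> and passes finished rows to a callback."""

    def __init__(self, on_row):
        super().__init__(convert_charrefs=True)
        self.on_row = on_row
        self.header = None  # column names taken from the first row (<th> cells)
        self.cells = None   # cells of the row currently being read
        self.buf = None     # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.cells = []
        elif tag in ("td", "th") and self.cells is not None:
            self.buf = []

    def handle_data(self, data):
        # Text can arrive in several pieces when the input is fed in chunks.
        if self.buf is not None:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.buf is not None:
            self.cells.append("".join(self.buf).strip())
            self.buf = None
        elif tag == "tr" and self.cells is not None:
            if self.header is None:
                self.header = self.cells
            else:
                self.on_row(dict(zip(self.header, self.cells)))
            self.cells = None


def table_to_json(src, dst, chunk_size=1 << 16):
    """Stream the HTML table from text file `src` into `dst` as {"data": [...]}.

    Returns the number of records written.
    """
    count = 0

    def write_row(row):
        nonlocal count
        # Assumed mapping, inferred from the example output in the question.
        record = {
            "text": row["Tweet"],
            "id": int(row["Id"]),
            "tweet_id": int(row["Tweet_Id"]),
        }
        if count:
            dst.write(",\n")
        dst.write(json.dumps(record, ensure_ascii=False))
        count += 1

    parser = TableRowParser(write_row)
    dst.write('{"data": [\n')
    for chunk in iter(lambda: src.read(chunk_size), ""):
        parser.feed(chunk)
    parser.close()
    dst.write("\n]}\n")
    return count
```

To use it, open the exported HTML file for reading and the target file for writing (both in text mode with `encoding="utf-8"`) and call `table_to_json(src, dst)`. Only one chunk of input and one row are held in memory at a time, which is the reason for writing the JSON by hand around `json.dumps` rather than building one big list first. Note that `tweet_id` values are larger than what JavaScript can represent exactly as numbers, so if the JSON will be consumed in a browser it may be safer to write them as strings.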