
I'm trying to fetch XML data from a web service at this site

and then use some of it. Sorry for not copy-pasting it here, but it is a very long XML document. So far this is how I've tried to get the data:

from urllib.request import urlopen 
url = "http://degra.wi.pb.edu.pl/rozklady/webservices.php?" 
s = urlopen(url) 
content = s.read() 

print(content) looks fine, and now I want to pull data out of it. A single record looks like this:

<tabela_rozklad data-aktualizacji="1480583567"> 
<DZIEN>2</DZIEN> 
<GODZ>3</GODZ> 
<ILOSC>2</ILOSC> 
<TYG>0</TYG> 
<ID_NAUCZ>66</ID_NAUCZ> 
<ID_SALA>79</ID_SALA> 
<ID_PRZ>104</ID_PRZ> 
<RODZ>W</RODZ> 
<GRUPA>1</GRUPA> 
<ID_ST>13</ID_ST> 
<SEM>1</SEM> 
<ID_SPEC>0</ID_SPEC> 
</tabela_rozklad> 

How can I process this data so that it is easy to work with?


You parse the XML. Please Google how to do that. – jonrsharpe
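
For reference, a minimal sketch of what "parse the XML" can look like with the standard library's xml.etree.ElementTree, assuming the service's response is a well-formed document whose root wraps the repeated tabela_rozklad records shown above:

import xml.etree.ElementTree as ET 
from urllib.request import urlopen 

url = "http://degra.wi.pb.edu.pl/rozklady/webservices.php?" 
content = urlopen(url).read() 

# parse the raw bytes into an element tree 
root = ET.fromstring(content) 

# collect every tabela_rozklad entry as a plain dict of tag -> text 
records = [ 
    {child.tag: child.text for child in table} 
    for table in root.iter("tabela_rozklad") 
] 

print(records[0]) # e.g. {'DZIEN': '2', 'GODZ': '3', ...} 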

Answer


You can use Beautiful Soup to capture the tags you want. The code below should get you started!

import pandas as pd 
import requests 
from bs4 import BeautifulSoup 

url = "http://degra.wi.pb.edu.pl/rozklady/webservices.php?" 

# fetch the raw XML content 
response = requests.get(url).content 

# html.parser lowercases tag names, which is why the tag list below is lowercase 
soup = BeautifulSoup(response, 'html.parser') 

# find each tabela_rozklad 
tables = soup.find_all('tabela_rozklad') 

# each tabela_rozklad contains 12 nested tags 
tags = ['dzien', 'godz', 'ilosc', 'tyg', 'id_naucz', 'id_sala', 
        'id_prz', 'rodz', 'grupa', 'id_st', 'sem', 'id_spec'] 

# extract the text of every tag from every tabela_rozklad 
rows = [[table.find(tag).text for tag in tags] for table in tables] 

# build the dataframe in one pass, with the tags as columns 
df = pd.DataFrame(rows, columns=tags) 

# display the first 5 rows of the table 
df.head() 

# and the shape of the data 
df.shape # 665 rows, 12 columns 

# and now you can get to the information using ordinary pandas functionality 

# for instance, count observations by rodz 
df.groupby('rodz').count() 

# or subset only the observations where rodz == 'J' 
J = df[df.rodz == 'J'] 

Thank you, this really got me started :> – Vesspe


This is really good. Now I'm trying to do something like J = df[df.rodz == 'J' & df.id_prz == '1'] and I think I'm missing some method – Vesspe


Try df[(df.rodz == 'J') & (df.id_prz == '1')] – datawrestler
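
The parentheses matter because & binds more tightly than ==, and since every value was extracted as tag text, the columns hold strings, so the comparison is against '1' rather than 1. A short sketch of equivalent ways to write this filter, assuming the df built in the answer above and that id_prz always holds numeric text:

# each condition needs its own parentheses because & binds tighter than == 
J = df[(df.rodz == 'J') & (df.id_prz == '1')] 

# the same filter expressed with DataFrame.query 
J = df.query("rodz == 'J' and id_prz == '1'") 

# or convert the column first if you prefer numeric comparisons 
df['id_prz'] = df['id_prz'].astype(int) 
J = df[(df.rodz == 'J') & (df.id_prz == 1)] 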