2015-06-01

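Python JSON data selection

I'm trying to extract information about restaurants from a JSON dataset. Here are two samples, one that is a restaurant and one that isn't: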

{"business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "full_address": "4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018", "hours": {"Tuesday": {"close": "17:00", "open": "08:00"}, "Friday": {"close": "17:00", "open": "08:00"}, "Monday": {"close": "17:00", "open": "08:00"}, "Wednesday": {"close": "17:00", "open": "08:00"}, "Thursday": {"close": "17:00", "open": "08:00"}}, "open": true, "categories": ["Doctors", "Health & Medical"], "city": "Phoenix", "review_count": 9, "name": "Eric Goldberg, MD", "neighborhoods": [], "longitude": -111.98375799999999, "state": "AZ", "stars": 3.5, "latitude": 33.499313000000001, "attributes": {"By Appointment Only": true}, "type": "business"} 
{"business_id": "mVHrayjG3uZ_RLHkLj-AMg", "full_address": "414 Hawkins Ave\nBraddock, PA 15104", "hours": {"Tuesday": {"close": "19:00", "open": "10:00"}, "Friday": {"close": "20:00", "open": "10:00"}, "Saturday": {"close": "16:00", "open": "10:00"}, "Thursday": {"close": "19:00", "open": "10:00"}, "Wednesday": {"close": "19:00", "open": "10:00"}}, "open": true, "categories": ["Bars", "American (New)", "Nightlife", "Lounges", "Restaurants"], "city": "Braddock", "review_count": 11, "name": "Emil's Lounge", "neighborhoods": [], "longitude": -79.866350699999998, "state": "PA", "stars": 4.5, "latitude": 40.408735, "attributes": {"Alcohol": "full_bar", "Noise Level": "average", "Has TV": true, "Attire": "casual", "Ambience": {"romantic": false, "intimate": false, "classy": false, "hipster": false, "divey": false, "touristy": false, "trendy": false, "upscale": false, "casual": false}, "Good for Kids": true, "Price Range": 1, "Good For Dancing": false, "Delivery": false, "Coat Check": false, "Smoking": "no", "Accepts Credit Cards": true, "Take-out": true, "Happy Hour": false, "Outdoor Seating": false, "Takes Reservations": false, "Waiter Service": true, "Wi-Fi": "no", "Caters": true, "Good For": {"dessert": false, "latenight": false, "lunch": false, "dinner": false, "breakfast": false, "brunch": false}, "Parking": {"garage": false, "street": false, "validated": false, "lot": false, "valet": false}, "Music": {"dj": false}, "Good For Groups": true}, "type": "business"} 

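When I run this, it prints both records, even though the category "Restaurants" doesn't appear in the first one. Can anyone explain why, please?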

import json

# f is an open file containing one JSON object per line
for line in f:
    jd = json.loads(line)
    if jd['categories'] == 'Food' or 'Restaurants':
        print(jd['name'], jd['business_id'], jd['latitude'], jd['longitude'])

Here is the JSON data in a more readable format:

{ 
    "business_id": "vcNAWiLM4dR7D2nwwJ7nCA", 
    "full_address": "4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018", 
    "hours": { 
     "Thursday": { 
      "close": "17:00", 
      "open": "08:00" 
     }, 
     "Tuesday": { 
      "close": "17:00", 
      "open": "08:00" 
     }, 
     "Friday": { 
      "close": "17:00", 
      "open": "08:00" 
     }, 
     "Wednesday": { 
      "close": "17:00", 
      "open": "08:00" 
     }, 
     "Monday": { 
      "close": "17:00", 
      "open": "08:00" 
     } 
    }, 
    "open": true, 
    "categories": [ 
     "Doctors", 
     "Health & Medical" 
    ], 
    "city": "Phoenix", 
    "review_count": 9, 
    "name": "Eric Goldberg, MD", 
    "neighborhoods": [], 
    "longitude": -111.98375799999999, 
    "state": "AZ", 
    "stars": 3.5, 
    "latitude": 33.499313000000001, 
    "attributes": { 
     "By Appointment Only": true 
    }, 
    "type": "business" 
} 
{ 
    "business_id": "mVHrayjG3uZ_RLHkLj-AMg", 
    "full_address": "414 Hawkins Ave\nBraddock, PA 15104", 
    "hours": { 
     "Tuesday": { 
      "close": "19:00", 
      "open": "10:00" 
     }, 
     "Friday": { 
      "close": "20:00", 
      "open": "10:00" 
     }, 
     "Saturday": { 
      "close": "16:00", 
      "open": "10:00" 
     }, 
     "Thursday": { 
      "close": "19:00", 
      "open": "10:00" 
     }, 
     "Wednesday": { 
      "close": "19:00", 
      "open": "10:00" 
     } 
    }, 
    "open": true, 
    "categories": [ 
     "Bars", 
     "American (New)", 
     "Nightlife", 
     "Lounges", 
     "Restaurants" 
    ], 
    "city": "Braddock", 
    "review_count": 11, 
    "name": "Emil's Lounge", 
    "neighborhoods": [], 
    "longitude": -79.866350699999998, 
    "state": "PA", 
    "stars": 4.5, 
    "latitude": 40.408735, 
    "attributes": { 
     "Alcohol": "full_bar", 
     "Noise Level": "average", 
     "Music": { 
      "dj": false 
     }, 
     "Attire": "casual", 
     "Ambience": { 
      "touristy": false, 
      "hipster": false, 
      "romantic": false, 
      "divey": false, 
      "intimate": false, 
      "trendy": false, 
      "upscale": false, 
      "classy": false, 
      "casual": false 
     }, 
     "Good for Kids": true, 
     "Price Range": 1, 
     "Good For Dancing": false, 
     "Delivery": false, 
     "Coat Check": false, 
     "Smoking": "no", 
     "Accepts Credit Cards": true, 
     "Take-out": true, 
     "Happy Hour": false, 
     "Outdoor Seating": false, 
     "Takes Reservations": false, 
     "Waiter Service": true, 
     "Wi-Fi": "no", 
     "Caters": true, 
     "Good For": { 
      "dessert": false, 
      "latenight": false, 
      "lunch": false, 
      "dinner": false, 
      "brunch": false, 
      "breakfast": false 
     }, 
     "Parking": { 
      "garage": false, 
      "street": false, 
      "validated": false, 
      "lot": false, 
      "valet": false 
     }, 
     "Has TV": true, 
     "Good For Groups": true 
    }, 
    "type": "business" 
} 

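As Bruno desthuilliers and I both mentioned in our answers, this JSON data is painful to read. Next time, _please_ post your data in a more readable form, preferably with irrelevant fields removed, so that potential responders can focus on your actual problem. For the benefit of future readers, I'll add a formatted copy of the data to the question, created with 'json.dumps(jd, indent=4)', but please check it to make sure I haven't inadvertently introduced any errors.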
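The comment above mentions 'json.dumps(jd, indent=4)' for making a record readable. A minimal sketch of doing that for the first few records of a one-object-per-line file, assuming the same file layout as in the question; `pretty_print_lines` is a hypothetical helper name, and the in-memory sample is a trimmed version of the first record:

```python
import io
import json


def pretty_print_lines(lines, limit=2):
    """Return the first `limit` JSON-lines records re-serialized with indentation."""
    out = []
    for i, line in enumerate(lines):
        if i >= limit:
            break
        jd = json.loads(line)
        # indent=4 puts each key on its own line and indents nested values by 4 spaces
        out.append(json.dumps(jd, indent=4))
    return out


sample = io.StringIO('{"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]}\n')
for text in pretty_print_lines(sample):
    print(text)
```

Trimming irrelevant keys from `jd` before calling `json.dumps` makes the posted snippet even shorter.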

Answers

This:

if jd['categories'] == 'Food' or 'Restaurants': 

is parsed as:

if (jd['categories'] == 'Food') or 'Restaurants': 

Since 'Restaurants' is a non-empty string, it is always truthy in a boolean context, so your test is effectively:

if (jd['categories'] == 'Food') or True: 

which is obviously a tautology.
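A quick way to see the precedence problem is to evaluate the original condition on the first (non-restaurant) record's categories, copied from the sample data. Strictly speaking, `or` returns its second operand, the string 'Restaurants', rather than `True`, but because a non-empty string is truthy the `if` body runs either way:

```python
categories = ["Doctors", "Health & Medical"]  # categories of the first sample record

# What the question wrote: `==` binds tighter than `or`
buggy = categories == 'Food' or 'Restaurants'
# ...which Python groups as:
grouped = (categories == 'Food') or 'Restaurants'

print(repr(buggy))          # 'Restaurants'
print(buggy == grouped)     # True
print(bool('Restaurants'))  # True: any non-empty string is truthy
print(bool(buggy))          # True, so the `if` body always runs
```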

You want:

if jd['categories'] == 'Food' or jd['categories'] == 'Restaurants': 

or, more simply:

if jd['categories'] in ('Food', 'Restaurants'): 
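Now, in your case (and by the way, please take the time to clean up, simplify and format your JSON snippets before posting next time), jd['categories'] is a list, so you can't compare it with a string (well, you can, but it will always evaluate to False), nor can you use the containment test above. You have to check whether jd['categories'] contains 'Food' or 'Restaurants':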

if 'Food' in jd['categories'] or 'Restaurants' in jd['categories']: 
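If the set of wanted categories grows, repeating `'X' in jd['categories']` gets verbose. A minimal sketch of the same list-containment test written with `any()` over a tuple of wanted categories; `is_wanted` and `WANTED` are hypothetical names, and the two dicts are trimmed versions of the sample records:

```python
WANTED = ('Food', 'Restaurants')


def is_wanted(jd, wanted=WANTED):
    """True if at least one of `wanted` appears in the record's category list."""
    # .get() with a default is a defensive assumption for records lacking "categories"
    categories = jd.get('categories', [])
    return any(c in categories for c in wanted)


doctor = {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]}
lounge = {"name": "Emil's Lounge",
          "categories": ["Bars", "American (New)", "Nightlife", "Lounges", "Restaurants"]}

for jd in (doctor, lounge):
    if is_wanted(jd):
        print(jd['name'])  # prints only: Emil's Lounge
```

`any()` stops at the first match, so it does no more work than the chained `or` version.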

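If I use `if jd['categories'] == 'Food' or jd['categories'] == 'Restaurants':` I don't get any output. Could it be a problem that there are multiple categories? I need to check whether it contains the Restaurants category, but it can have others as well.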

@Ali_bean See my edited answer. Your JSON snippet wasn't properly formatted, so I hadn't noticed that "categories" is a list.

Thanks a lot, and sorry; I'll be more careful with the formatting next time.

Line 3 doesn't seem to be written correctly:

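for line in f:
    jd = json.loads(line)
    # note: this only matches when jd['categories'] is a single string;
    # for a list of categories, test each wanted value with `in` instead
    if jd['categories'] in ('Food', 'Restaurants'):
        print(jd['name'], jd['business_id'], jd['latitude'], jd['longitude'])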

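You could also consider encoding or escaping the strings returned by json.loads(), as that would make them more suitable for comparison with other strings.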

I'm processing it line by line because it's a 1.5GB dataset, but thanks!

You could also take a look at the pandas library: pandas.pydata.org

It isn't exactly easy to test with the data in the OP, but you need to change your test to something like this:

# Get the category list from the current dict
cat = jd['categories'] 
if 'Food' in cat or 'Restaurants' in cat: 
    print(jd['name'], jd['business_id'], jd['latitude'], jd['longitude'])
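The asker mentioned processing a 1.5GB file line by line; iterating over a file object decodes only one record at a time, so memory use stays flat. A minimal sketch that puts the list-containment test into that loop as a generator, assuming one JSON object per line; `iter_restaurants` is a hypothetical name, skipping blank lines is an assumption (e.g. a trailing newline), and the in-memory records are made up for illustration:

```python
import io
import json


def iter_restaurants(f, wanted=('Food', 'Restaurants')):
    """Lazily yield (name, business_id, latitude, longitude) for matching records."""
    for line in f:              # file objects yield one line at a time
        line = line.strip()
        if not line:            # assumption: tolerate blank lines
            continue
        jd = json.loads(line)
        categories = jd.get('categories', [])
        if any(c in categories for c in wanted):
            yield jd['name'], jd['business_id'], jd['latitude'], jd['longitude']


sample = io.StringIO(
    '{"business_id": "a1", "name": "Doc", "categories": ["Doctors"], "latitude": 1.0, "longitude": 2.0}\n'
    '\n'
    '{"business_id": "b2", "name": "Diner", "categories": ["Food"], "latitude": 3.0, "longitude": 4.0}\n'
)
for row in iter_restaurants(sample):
    print(row)  # ('Diner', 'b2', 3.0, 4.0)
```

With the real dataset you would pass an open file instead, e.g. `with open(path, encoding='utf-8') as f:` followed by `for row in iter_restaurants(f): print(*row)`, where `path` is a placeholder for the file's location.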