Writing a unit test for a function that uses yield

I am trying to write a unittest for a function that is a generator. Below is my code:

import re

def extract_data(body):
    for i in body:
        a = re.sub('<[^<]+?>', '', str(i))
        b = re.sub('view\xc2\xa0book\xc2\xa0info', '', str(a))
        c = re.sub('key', '', str(b))
        d = re.sub('\xc2', ' ', str(c))
        e = re.sub('\xa0', '', str(d))
        yield e

My unit test code:

def test_extract_data(self):
    sample_input = ['<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes</h5></tr>']
    expected_res = 'This Test Passes'
    res = extract_data(sample_input)

    self.assertEqual(expected_res, res)

The test passes without issue if the extract_data function uses return instead of yield. How do I write a test for the generator?

Source

2016-06-21 Lefty

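Aside from the question you asked, you really shouldn't use regular expressions to try to parse HTML (it is not possible in the general case). There is a very good library that can do this for you, and I strongly recommend using it: https://www.crummy.com/software/BeautifulSoup/bs4/doc/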

http://stackoverflow.com/questions/12775794/unit-testing-a-function-that-returns-a-generator-object – AK47

@Lefty Has your question been answered? – dantiston

I figured out what I needed to do: I needed to put res into a list. That's it. It was much simpler than I expected. This is what the test looks like now:

import unittest

class TestScrapePage(unittest.TestCase):

    def test_extract_data(self):
        sample_input = ['<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes</h5></tr>']
        expected_res = ['This Test Passes']
        # Exhaust the generator into a list so it can be compared by value.
        res = list(extract_data(sample_input))

        self.assertEqual(expected_res, res)

if __name__ == '__main__':
    unittest.main()

Source

2016-06-27 19:28:10 Lefty

Here is your code, slightly altered so that it does not require unittest:

import re

def extract_data(body):
    for i in body:
        a = re.sub('<[^<]+?>', '', str(i))
        b = re.sub('view\xc2\xa0book\xc2\xa0info', '', str(a))
        c = re.sub('key', '', str(b))
        d = re.sub('\xc2', ' ', str(c))
        e = re.sub('\xa0', '', str(d))
        yield e

def test_extract_data():
    sample_input = ['<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes</h5></tr>']
    expected_res = 'This Test Passes'
    res = extract_data(sample_input)
    # res is a generator object here, not a str, so this comparison fails.
    return expected_res == res

print(test_extract_data())

This prints False.

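The problem is that when you use return, the function, in your case, returns a str. When you use yield, however, calling the function returns a generator object, and calling next() on that object returns a str. So, for example: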

import re

def extract_data(body):
    for i in body:
        a = re.sub('<[^<]+?>', '', str(i))
        b = re.sub('view\xc2\xa0book\xc2\xa0info', '', str(a))
        c = re.sub('key', '', str(b))
        d = re.sub('\xc2', ' ', str(c))
        e = re.sub('\xa0', '', str(d))
        yield e

def test_extract_data():
    sample_input = ['<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes</h5></tr>']
    expected_res = 'This Test Passes'
    res = extract_data(sample_input)
    # next() pulls the first yielded value (a str) out of the generator.
    return expected_res == next(res)

print(test_extract_data())

This prints True.
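As a rough illustration of what next() does on repeated calls, here is a minimal sketch. The generator strip_tags is a hypothetical, simplified stand-in for extract_data (it is not the code from the question), and the inputs are made up. Each call to next() resumes the generator until its next yield, and an exhausted generator raises StopIteration unless a default is supplied.

```python
import re


def strip_tags(body):
    """Hypothetical, simplified stand-in for extract_data."""
    for item in body:
        text = re.sub('<[^<]+?>', '', str(item))
        yield text.replace('key', '')


gen = strip_tags(['<h1>keyFirst</h1>', '<h1>keySecond</h1>'])

first = next(gen)    # runs the generator body up to the first yield
second = next(gen)   # resumes after the first yield, runs to the second

try:
    next(gen)        # nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a default avoids the exception on an exhausted generator.
fallback = next(gen, None)

print(first, second, exhausted, fallback)
```

In a test that checks only next(res), any later values the generator would produce go unverified, which is one reason to prefer comparing the full list when the output is small.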

To illustrate, at the Python command prompt:

>>> type("hello") 
<class 'str'> 
>>> def gen(): 
...  yield "hello" 
... 
>>> type(gen()) 
<class 'generator'>
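The same distinction can be checked programmatically. The sketch below uses inspect.isgeneratorfunction and types.GeneratorType from the standard library; the function names gen and plain are illustrative only.

```python
import inspect
import types


def gen():
    yield "hello"


def plain():
    return "hello"


# A function containing yield is a generator function ...
is_gen_func = inspect.isgeneratorfunction(gen)
is_plain_gen_func = inspect.isgeneratorfunction(plain)

# ... and calling it returns a generator object rather than a str.
is_gen_obj = isinstance(gen(), types.GeneratorType)
is_str = isinstance(plain(), str)

print(is_gen_func, is_plain_gen_func, is_gen_obj, is_str)
```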

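Your other option (probably the better one, depending on your use case) is to test that all of the generator's results are correct by converting them into a list or tuple and then comparing for equality: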

import re

def extract_data(body):
    for i in body:
        a = re.sub('<[^<]+?>', '', str(i))
        b = re.sub('view\xc2\xa0book\xc2\xa0info', '', str(a))
        c = re.sub('key', '', str(b))
        d = re.sub('\xc2', ' ', str(c))
        e = re.sub('\xa0', '', str(d))
        yield e

def test_extract_data():
    sample_input = [
        '<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes</h5></tr>',
        '<tr><h1>keyThis</h1><h2>\xc2</h2><h3>\xa0</h3><h4>view\xc2\xa0book\xc2\xa0info</h4><h5>Test Passes Too!</h5></tr>',
    ]
    expected_res = ['This Test Passes', 'This Test Passes Too!']
    res = extract_data(sample_input)
    # list() exhausts the generator, so every yielded value is compared.
    return expected_res == list(res)

print(test_extract_data())
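One thing to watch for when converting a generator to a list inside a test: a generator can only be consumed once. The sketch below uses a made-up generator, numbers, to show that a second list() call on the same object yields an empty list, so a fresh generator is needed for each assertion.

```python
def numbers():
    """Illustrative generator, unrelated to extract_data."""
    yield 1
    yield 2


res = numbers()

first_pass = list(res)         # consumes every value
second_pass = list(res)        # the same object is now exhausted
fresh_pass = list(numbers())   # calling the function again starts over

print(first_pass, second_pass, fresh_pass)
```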

Source

2016-06-22 00:20:43 dantiston

I believe you meant 'next(expected_res) == res' in your second snippet.

@michael_j_ward Thanks for the catch! Almost. 'expected_res' is a 'str', so the next() call belongs on 'res' :-) – dantiston
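A quick sketch of why next() has to be applied to the generator and not to the expected string: a str is iterable but is not itself an iterator, so next() rejects it with a TypeError. The variable names mirror the ones used in the answer, and the generator expression here is only a stand-in for extract_data.

```python
expected_res = 'This Test Passes'
res = (s for s in ['This Test Passes'])  # stand-in generator

try:
    next(expected_res)   # a str is not an iterator
    error = None
except TypeError as exc:
    error = exc

raised_type_error = isinstance(error, TypeError)
matches = expected_res == next(res)  # next() on the generator works

print(raised_type_error, matches)
```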
