I am processing tweets and collecting the URLs they contain. The URLs are filtered and processed as follows:
- If the URL is a Twitter URL (i.e., it starts with t.co or twitter.com), skip it.
- If the URL in the tweet is a short URL, convert it to the long URL.
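The first rule can be expressed as a comparison on the URL's host instead of a substring search. This is a minimal sketch, not the asker's code. `is_twitter_url` is a hypothetical helper name, and the sketch assumes that "Twitter URL" means the host is `t.co`, `twitter.com`, or a subdomain of either.

```php
<?php
// Hypothetical helper: true when $url's host is t.co, twitter.com, or a
// subdomain of either (e.g. www.twitter.com, mobile.twitter.com).
// Comparing the parsed host avoids substring false positives such as
// "https://example.com/?next=//t.co".
function is_twitter_url(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host (relative or malformed URL)
    }
    $host = strtolower($host);
    foreach (['t.co', 'twitter.com'] as $domain) {
        $suffix = '.' . $domain;
        // Exact host match, or a subdomain ending in ".<domain>".
        if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}
```

Inside a loop over the extracted URLs, `if (is_twitter_url($full)) continue;` would cover both Twitter domains in a single check.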
Code:

```php
if (preg_match($reg_exUrl, $tweet, $url)) {
    preg_match_all($reg_exUrl, $tweet, $urls);
    foreach ($urls[0] as $url) {
        echo "Tiny url : {$url}<br>";
        $full = MyURLDecode($url);
        echo "Full url : $full<br>";
        if (strpos($full, '//t.co') === true)
            continue;
        if (strpos($full, '//twitter.com') === true)
            continue;
        else if (strpos($full, '//bit.ly') !== true)
            $full = MyURLDecode($full);
        $url_count = get_twitter_url_count($full);
        echo "Url count: $url_count";
        //echo "Numbers of tweets containing this link : ", $code['count'];
        echo "<br>";
    }
} else {
    echo "Mismatch<br>";
}

function MyURLDecode($url)
{
    $ch = @curl_init($url);
    @curl_setopt($ch, CURLOPT_HEADER, TRUE);
    @curl_setopt($ch, CURLOPT_NOBODY, TRUE);
    @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $url_resp = @curl_exec($ch);
    preg_match('/Location:\s+(.*)\n/i', $url_resp, $i);
    if (!isset($i[1]))
    {
        return $url;
    }
    return $i[1];
}

function get_twitter_url_count($url) {
    $encoded_url = urlencode($url);
    $content = @file_get_contents('http://urls.api.twitter.com/1/urls/count.json?url=' . $encoded_url);
    return $content ? json_decode($content)->count : 0;
}
```
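One likely reason chained short URLs are not fully expanded is that `MyURLDecode` sets both `CURLOPT_FOLLOWLOCATION` and `CURLOPT_HEADER`. In that case `curl_exec()` returns the header block of every response in the redirect chain, and `preg_match` picks up only the first `Location:` header, which is the first hop. Because `.` also matches `\r`, that capture keeps the trailing carriage return as well. The following minimal sketch takes the last `Location:` header instead. `last_location` is a hypothetical helper name, and the sketch assumes the `Location` values are absolute URLs.

```php
<?php
// Hypothetical helper: return the last "Location:" header in $headers, or
// null if there is none. With CURLOPT_FOLLOWLOCATION and CURLOPT_HEADER both
// enabled, curl_exec() returns one header block per hop, so the last
// Location header points at the final hop.
function last_location(string $headers): ?string
{
    // /m: ^ matches at each line start; /i: header names are case-insensitive.
    // \S+ stops before the trailing "\r" of a CRLF line ending.
    if (!preg_match_all('/^Location:\s*(\S+)/im', $headers, $m)) {
        return null;
    }
    return end($m[1]);
}
```

In `MyURLDecode`, `$loc = last_location((string) $url_resp); return $loc ?? $url;` could replace the `preg_match` call. A simpler alternative is to skip header parsing entirely and read `curl_getinfo($ch, CURLINFO_EFFECTIVE_URL)` after `curl_exec()`. That returns the last URL curl actually fetched and also copes with relative `Location` values.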
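The problems with this are:
- It does not skip Twitter URLs.
- In some cases the long URL is itself another short URL, which then needs to be converted to the long URL again, but that doesn't happen here.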
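The first problem is likely caused by how `strpos()` reports its result. It returns either the integer offset of the match (which can be `0`) or `false`, and it never returns `true`. As a result, `=== true` can never match, and the `!== true` in the `bit.ly` branch always matches. The sketch below shows the loop with `!== false` checks. For the second problem, it also re-resolves each URL until the result stops changing. `expand_and_filter` and the `$resolve` callback are hypothetical names. In the question's code, `$resolve` would be something like `MyURLDecode`. `$maxHops` is an assumed limit that guards against redirect loops.

```php
<?php
// Hypothetical sketch: expand each URL through $resolve, then drop Twitter links.
function expand_and_filter(array $urls, callable $resolve, int $maxHops = 5): array
{
    $kept = [];
    foreach ($urls as $url) {
        // Re-resolve until the URL stops changing, so a short URL that
        // expands to another short URL is expanded again (bounded by $maxHops).
        $full = $url;
        for ($i = 0; $i < $maxHops; $i++) {
            $next = $resolve($full);
            if ($next === $full) {
                break;
            }
            $full = $next;
        }
        // strpos() returns an int offset or false, so compare against false.
        if (strpos($full, '//t.co') !== false || strpos($full, '//twitter.com') !== false) {
            continue;
        }
        $kept[] = $full;
    }
    return $kept;
}
```

Passing the resolver in as a callable keeps the loop testable without network access. For example, you can pass a closure that looks up URLs in a fixed array.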