2014-01-06

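Filtering and processing URLs from tweets

I am processing tweets and collecting the URLs they contain. Each URL should be handled as follows:

  1. If the URL points to Twitter itself (i.e. it starts with t.co or twitter.com), skip it.
  2. If the URL in the tweet is a short URL, convert it to the long URL.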

CODE:

    if (preg_match($reg_exUrl, $tweet, $url)) {
        preg_match_all($reg_exUrl, $tweet, $urls);
        foreach ($urls[0] as $url) {
            echo "Tiny url : {$url}<br>";
            $full = MyURLDecode($url);
            echo "Full url : $full<br>";
            if (strpos($full, '//t.co') === true)
                continue;
            if (strpos($full, '//twitter.com') === true)
                continue;
            else if (strpos($full, '//bit.ly') !== true)
                $full = MyURLDecode($full);
            $url_count = get_twitter_url_count($full);
            echo "Url count: $url_count";
            //echo "Numbers of tweets containing this link : ", $code['count'];
            echo "<br>";
        }
    } else {
        echo "Mismatch<br>";
    }

    function MyURLDecode($url)
    {
        $ch = @curl_init($url);
        @curl_setopt($ch, CURLOPT_HEADER, TRUE);
        @curl_setopt($ch, CURLOPT_NOBODY, TRUE);
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
        @curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        $url_resp = @curl_exec($ch);
        preg_match('/Location:\s+(.*)\n/i', $url_resp, $i);
        if (!isset($i[1])) {
            return $url;
        }
        return $i[1];
    }

    function get_twitter_url_count($url)
    {
        $encoded_url = urlencode($url);
        $content = @file_get_contents('http://urls.api.twitter.com/1/urls/count.json?url=' . $encoded_url);
        return $content ? json_decode($content)->count : 0;
    }

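The problems with this are:

  1. It does not skip the Twitter URLs.
  2. In some cases the long URL is itself another short URL and needs to be expanded again, but that does not happen here.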

Answer

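#1: strpos() returns the starting position of the text it finds (an integer, possibly 0), or false when nothing is found. It never returns true, so a === true comparison can never succeed. Test against false instead, for example: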

strpos($full, '//t.co') !== false 
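
To make the check reusable, it could be wrapped in a small helper. The following is only a sketch: isTwitterUrl() is a hypothetical name that does not appear in the question, and the two needles are simply the ones the question already tests for. Note that a plain substring check is loose (for instance, '//t.co' also matches a host such as t.com), so comparing the host from parse_url() may be a stricter alternative.

```php
<?php
// Hypothetical helper (not part of the original post): reports whether
// $url contains one of the Twitter prefixes the question wants to skip.
function isTwitterUrl(string $url): bool
{
    foreach (['//t.co', '//twitter.com'] as $needle) {
        // strpos() yields an integer offset on a match and false otherwise.
        // The strict !== false comparison also handles a match at offset 0.
        if (strpos($url, $needle) !== false) {
            return true;
        }
    }
    return false;
}
```

Inside the foreach loop from the question, the two separate tests could then become a single if (isTwitterUrl($full)) continue;.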

#2: try calling MyURLDecode() in a while loop until the URL stops changing, for example:

$previous = $full; 
while (($full = MyURLDecode($full)) != $previous) { 
    $previous = $full; 
}
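
One caveat with an unbounded loop is that two shorteners redirecting to each other would keep it running forever. A possible variant, sketched here under assumptions, caps the number of hops. expandUrl(), $resolve and $maxHops are hypothetical names introduced for illustration, and the resolver is passed in as a callable so the loop can be exercised without network access. The trim() call is a defensive assumption: a header regex such as the one in MyURLDecode() can leave a trailing "\r" on the captured URL.

```php
<?php
// Hypothetical helper (not part of the original post): follows a chain of
// short URLs, stopping when the URL no longer changes or after $maxHops.
// $resolve maps a URL to its redirect target, or returns the same URL
// when there is no redirect.
function expandUrl(string $url, callable $resolve, int $maxHops = 5): string
{
    $current = $url;
    for ($hop = 0; $hop < $maxHops; $hop++) {
        // Strip stray whitespace such as the "\r" from a CRLF header line.
        $next = trim($resolve($current));
        if ($next === '' || $next === $current) {
            break; // nothing further to follow
        }
        $current = $next;
    }
    return $current;
}
```

With the function from the question, the call would look like $full = expandUrl($url, 'MyURLDecode');.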