2012-07-18 19 views
12

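PHP: How do I convert URLs, mentions, and hashtags in tweets to links using the Twitter API's data?

I'm really having trouble understanding how Twitter expects users of its API to convert the plaintext tweets it sends into properly linked HTML.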

Here's the deal: when you request detailed data for a tweet, Twitter's JSON API sends back this set of information:

{ 
    "created_at":"Wed Jul 18 01:03:31 +0000 2012", 
    "id":225395341250412544, 
    "id_str":"225395341250412544", 
    "text":"This is a test tweet. #boring @nbc http://t.co/LUfDreY6 #skronk @crux http://t.co/VpuMlaDs @twitter", 
    "source":"web", 
    "truncated":false, 
    "in_reply_to_status_id":null, 
    "in_reply_to_status_id_str":null, 
    "in_reply_to_user_id":null, 
    "in_reply_to_user_id_str":null, 
    "in_reply_to_screen_name":null, 
    "user": <REDACTED>, 
    "geo":null, 
    "coordinates":null, 
    "place":null, 
    "contributors":null, 
    "retweet_count":0, 
    "entities":{ 
     "hashtags":[ 
      { 
       "text":"boring", 
       "indices":[22,29] 
      }, 
      { 
       "text":"skronk", 
       "indices":[56,63] 
      } 
     ], 
     "urls":[ 
      { 
       "url":"http://t.co/LUfDreY6", 
       "expanded_url":"http://www.twitter.com", 
       "display_url":"twitter.com", 
       "indices":[35,55] 
      }, 
      { 
       "url":"http://t.co/VpuMlaDs", 
       "expanded_url":"http://www.example.com", 
       "display_url":"example.com", 
       "indices":[70,90] 
      } 
     ], 
     "user_mentions":[ 
      { 
       "screen_name":"nbc", 
       "name":"NBC", 
       "id":26585095, 
       "id_str":"26585095", 
       "indices":[30,34] 
      }, 
      { 
       "screen_name":"crux", 
       "name":"Z. D. Smith", 
       "id":407213, 
       "id_str":"407213", 
       "indices":[64,69] 
      }, 
      { 
       "screen_name":"twitter", 
       "name":"Twitter", 
       "id":783214, 
       "id_str":"783214", 
       "indices":[91,99] 
      } 
     ] 
    }, 
    "favorited":false, 
    "retweeted":false, 
    "possibly_sensitive":false 
} 

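The interesting parts, for this question, are the text element and the hashtags, user_mentions, and urls arrays under entities. Twitter is telling us where in the text element the hashtags, mentions, and URLs appear via the indices arrays... so here is the crux of the question:

How do you use those indices arrays?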

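You can't just loop over each link element with substr_replace, because replacing the first link element in text invalidates all the index values of the link elements that follow it. You also can't use substr_replace's array functionality, because it only works when you give it an array of strings as the first argument rather than a single string (I've tested this, and the results are... strange).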
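To make the problem concrete, here is a minimal sketch. It assumes a made-up, shortened text and hand-written indices (not the API response above), and the variable names are only illustrative. After the first replacement is applied front to back, the second pair of indices no longer points at the second hashtag:

```php
<?php
// Hypothetical, shortened example: two hashtags with Twitter-style [start, end) indices.
$text = "a #b c #d";
$entities = array(
    array("start" => 2, "end" => 4, "replacement" => "<b>#b</b>"),
    array("start" => 7, "end" => 9, "replacement" => "<b>#d</b>"),
);

// Naive front-to-back loop: the first replacement lengthens the string,
// so the second pair of indices now lands inside the first replacement.
$naive = $text;
foreach ($entities as $e) {
    $naive = substr_replace($naive, $e["replacement"], $e["start"], $e["end"] - $e["start"]);
}

echo $naive, "\n"; // Not the intended "a <b>#b</b> c <b>#d</b>".
```

The second replacement overwrites part of the first one's closing tag instead of "#d", which is the invalidation described above.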

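Is there some function that can replace multiple index-delimited substrings in a single string simultaneously, each with a different replacement string?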

Answers

15

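All you have to do to use the indices Twitter provides straight up with a simple replace is collect the replacements you want to make and then sort them backwards. You can probably find a more clever way to build $entities; I wanted each type to be optional anyway, so I kept it simple (KISS) as far as that went.

Either way, my point here was just to show that you don't need to explode the string and count characters and whatnot. However you do it, all you need to do is start at the end and work toward the beginning of the string, and the indices Twitter gives you stay valid.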

<?php 

function json_tweet_text_to_HTML($tweet, $links=true, $users=true, $hashtags=true) 
{ 
    $return = $tweet->text; 

    $entities = array(); 

    if($links && is_array($tweet->entities->urls)) 
    { 
     foreach($tweet->entities->urls as $e) 
     { 
      $temp["start"] = $e->indices[0]; 
      $temp["end"] = $e->indices[1]; 
      $temp["replacement"] = "<a href='".$e->expanded_url."' target='_blank'>".$e->display_url."</a>"; 
      $entities[] = $temp; 
     } 
    } 
    if($users && is_array($tweet->entities->user_mentions)) 
    { 
     foreach($tweet->entities->user_mentions as $e) 
     { 
      $temp["start"] = $e->indices[0]; 
      $temp["end"] = $e->indices[1]; 
      $temp["replacement"] = "<a href='https://twitter.com/".$e->screen_name."' target='_blank'>@".$e->screen_name."</a>"; 
      $entities[] = $temp; 
     } 
    } 
    if($hashtags && is_array($tweet->entities->hashtags)) 
    { 
     foreach($tweet->entities->hashtags as $e) 
     { 
      $temp["start"] = $e->indices[0]; 
      $temp["end"] = $e->indices[1]; 
      $temp["replacement"] = "<a href='https://twitter.com/hashtag/".$e->text."?src=hash' target='_blank'>#".$e->text."</a>"; 
      $entities[] = $temp; 
     } 
    } 

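    // Sort by start index, descending, so replacements run from the end of the string backwards and earlier indices stay valid.
    usort($entities, function($a,$b){return($b["start"]-$a["start"]);}); 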


    foreach($entities as $item) 
    { 
     $return = substr_replace($return, $item["replacement"], $item["start"], $item["end"] - $item["start"]); 
    } 

    return($return); 
} 


?> 
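As a usage sketch of the same back-to-front idea, here is a compact, self-contained run on a made-up, shortened tweet. The tweet JSON, the esc() helper, and the escaping with htmlspecialchars() and rawurlencode() are assumptions added for this illustration; they are not part of the answer's function.

```php
<?php
// Hypothetical, shortened tweet in the shape json_decode() returns (objects).
$tweet = json_decode('{
    "text": "Hi #boring @nbc",
    "entities": {
        "hashtags": [{"text": "boring", "indices": [3, 10]}],
        "user_mentions": [{"screen_name": "nbc", "indices": [11, 15]}]
    }
}');

// Escaping helper: an addition for this sketch, not part of the answer above.
function esc($value)
{
    return htmlspecialchars($value, ENT_QUOTES, "UTF-8");
}

// Collect every replacement together with its start and end index.
$entities = array();
foreach ($tweet->entities->hashtags as $e) {
    $entities[] = array(
        "start" => $e->indices[0],
        "end" => $e->indices[1],
        "replacement" => "<a href='https://twitter.com/hashtag/" . rawurlencode($e->text) . "'>#" . esc($e->text) . "</a>",
    );
}
foreach ($tweet->entities->user_mentions as $e) {
    $entities[] = array(
        "start" => $e->indices[0],
        "end" => $e->indices[1],
        "replacement" => "<a href='https://twitter.com/" . rawurlencode($e->screen_name) . "'>@" . esc($e->screen_name) . "</a>",
    );
}

// Highest start index first, so earlier indices stay valid while replacing.
usort($entities, function ($a, $b) {
    return $b["start"] - $a["start"];
});

$html = $tweet->text;
foreach ($entities as $item) {
    $html = substr_replace($html, $item["replacement"], $item["start"], $item["end"] - $item["start"]);
}

echo $html, "\n";
```

Because the mention at index 11 is replaced before the hashtag at index 3, neither replacement disturbs the other's offsets. Note that substr_replace() counts bytes, so this sketch is only reliable for ASCII text.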
+1

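*facepalm* ... why didn't I think of that?? – CoreDumpError 2014-08-27 00:24:31

+0

Works for me... I added some extra code at the top, because when you retrieve retweets they sometimes contain ellipses. Twitter recommends that you use retweeted_status and its entities in this case: https://dev.twitter.com/overview/api/entities-in-twitter-objects#retweets – diggersworld 2014-09-15 10:33:42

+0

Wow, great approach, I wouldn't have thought of that... – 2015-05-08 21:31:33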

13

OK, so I needed to do exactly this, and I solved it. Here is the function I wrote. https://gist.github.com/3337428

function parse_message(&$tweet) { 
    if (!empty($tweet['entities'])) { 
     $replace_index = array(); 
     $append = array(); 
     $text = $tweet['text']; 
     foreach ($tweet['entities'] as $area => $items) { 
      $prefix = false; 
      $display = false; 
      switch ($area) { 
       case 'hashtags': 
        $find = 'text'; 
        $prefix = '#'; 
        $url = 'https://twitter.com/search/?src=hash&q=%23'; 
        break; 
       case 'user_mentions': 
        $find = 'screen_name'; 
        $prefix = '@'; 
        $url = 'https://twitter.com/'; 
        break; 
       case 'media': 
        $display = 'media_url_https'; 
        $href = 'media_url_https'; 
        $size = 'small'; 
        break; 
       case 'urls': 
        $find = 'url'; 
        $display = 'display_url'; 
        $url  = "expanded_url"; 
        break; 
       default: break; 
      } 
      foreach ($items as $item) { 
       if ($area == 'media') { 
        // We can display images at the end of the tweet but sizing needs to added all the way to the top. 
        // $append[$item->$display] = "<img src=\"{$item->$href}:$size\" />"; 
       }else{ 
        $msg  = $display ? $prefix.$item->$display : $prefix.$item->$find; 
        $replace = $prefix.$item->$find; 
        $href = isset($item->$url) ? $item->$url : $url; 
        if (!(strpos($href, 'http') === 0)) $href = "http://".$href; 
        if ($prefix) $href .= $item->$find; 
        $with = "<a href=\"$href\">$msg</a>"; 
        $replace_index[$replace] = $with; 
       } 
      } 
     } 
     foreach ($replace_index as $replace => $with) $tweet['text'] = str_replace($replace,$with,$tweet['text']); 
     foreach ($append as $add) $tweet['text'] .= $add; 
    } 
} 
+1

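This is a good base implementation, but it has bugs. It uses display_url instead of the expanded URL as the href (which breaks links that contain "..."), and it does some strange replacements for hashtags and @screen_names. – Noam 2013-12-26 14:18:53

+0

I've edited this to work more consistently with the current API. – Styledev 2014-01-08 23:47:09

+0

Useful stuff.. :) – mithunsatheesh 2014-03-01 20:29:39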

7

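This is an edge case, but the use of str_replace() in Styledev's answer could cause problems if one entity is contained within another. For example, in "I'm a genius! #me #mensa", the "#me" inside "#mensa" could end up linked on its own, leaving "nsa" outside the link, if the shorter entity is substituted first.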
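A minimal sketch of that edge case, using the example text above. The replacement map and link URLs are made up for illustration; they only mimic what a str_replace()-based approach would build.

```php
<?php
$text = "I'm a genius! #me #mensa";

// Hypothetical search => replacement map, shorter entity first.
$replacements = array(
    "#me" => "<a href=\"https://twitter.com/search?q=%23me\">#me</a>",
    "#mensa" => "<a href=\"https://twitter.com/search?q=%23mensa\">#mensa</a>",
);

$html = $text;
foreach ($replacements as $search => $with) {
    // str_replace() matches by content, so "#me" also matches the start of "#mensa".
    $html = str_replace($search, $with, $html);
}

echo $html, "\n";
```

After the first pass "#mensa" no longer exists as a contiguous substring, so its own replacement never fires and the tweet ends in "#me</a>nsa".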

This solution avoids that problem:

<?php 
/** 
* Hyperlinks hashtags, twitter names, and urls within the text of a tweet 
* 
* @param object $apiResponseTweetObject A json_decoded() one of these: https://dev.twitter.com/docs/platform-objects/tweets 
* @return string The tweet's text with hyperlinks added 
*/ 
function linkEntitiesWithinText($apiResponseTweetObject) { 

    // Convert tweet text to array of one-character strings 
    // $characters = str_split($apiResponseTweetObject->text); 
    // Use -1 (no limit) rather than null: passing null for the limit is deprecated since PHP 8.1.
    $characters = preg_split('//u', $apiResponseTweetObject->text, -1, PREG_SPLIT_NO_EMPTY); 

    // Insert starting and closing link tags at indices... 

    // ... for @user_mentions 
    foreach ($apiResponseTweetObject->entities->user_mentions as $entity) { 
     $link = "https://twitter.com/" . $entity->screen_name;   
     $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]]; 
     $characters[$entity->indices[1] - 1] .= "</a>";   
    }    

    // ... for #hashtags 
    foreach ($apiResponseTweetObject->entities->hashtags as $entity) { 
     $link = "https://twitter.com/search?q=%23" . $entity->text;   
     $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]]; 
     $characters[$entity->indices[1] - 1] .= "</a>";   
    } 

    // ... for http://urls 
    foreach ($apiResponseTweetObject->entities->urls as $entity) { 
     $link = $entity->expanded_url;   
     $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]]; 
     $characters[$entity->indices[1] - 1] .= "</a>";   
    } 

    // ... for media (the media array is only present when the tweet has media attached; ?? needs PHP 7+)
    foreach ($apiResponseTweetObject->entities->media ?? array() as $entity) { 
     $link = $entity->expanded_url;   
     $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]]; 
     $characters[$entity->indices[1] - 1] .= "</a>";   
    } 

    // Convert array back to string 
    return implode('', $characters); 

} 
?> 
+0

I get this error: Notice: Undefined property: stdClass::$text in /Users/#/Sites/twitter/test.php on line 9 – intelis 2013-04-16 05:55:36

+0

@intelis: Did you json_decode() the tweet object? – 2013-04-17 22:49:45

+0

Yes, it's working, but it is leaving out parts of the tweets :( – intelis 2013-04-17 23:02:51

6

Jeff's solution works well with English text, but it breaks when the tweet contains non-ASCII characters. This solution avoids that problem:

mb_internal_encoding("UTF-8");

// Return hyperlinked tweet text from a json_decoded status (decoded as an associative array):
function MakeStatusLinks($status)
{
    $ChAr = array(); // One element per UTF-8 character of the tweet.
    $TextLength = mb_strlen($status['text']); // Number of UTF-8 characters in plain tweet.
    for ($i = 0; $i < $TextLength; $i++) {
        $ch = mb_substr($status['text'], $i, 1);
        $ChAr[] = ($ch <> "\n") ? $ch : "\n<br/>"; // Keep new lines in HTML tweet.
    }
    // Mentions and hashtags: wrap the existing characters in a link.
    if (isset($status['entities']['user_mentions'])) {
        foreach ($status['entities']['user_mentions'] as $entity) {
            $ChAr[$entity['indices'][0]] = "<a href='https://twitter.com/" . $entity['screen_name'] . "'>" . $ChAr[$entity['indices'][0]];
            $ChAr[$entity['indices'][1] - 1] .= "</a>";
        }
    }
    if (isset($status['entities']['hashtags'])) {
        foreach ($status['entities']['hashtags'] as $entity) {
            $ChAr[$entity['indices'][0]] = "<a href='https://twitter.com/search?q=%23" . $entity['text'] . "'>" . $ChAr[$entity['indices'][0]];
            $ChAr[$entity['indices'][1] - 1] .= "</a>";
        }
    }
    // URLs and media: put the whole link in the first slot and blank out the rest of the t.co URL.
    if (isset($status['entities']['urls'])) {
        foreach ($status['entities']['urls'] as $entity) {
            $ChAr[$entity['indices'][0]] = "<a href='" . $entity['expanded_url'] . "'>" . $entity['display_url'] . "</a>";
            for ($i = $entity['indices'][0] + 1; $i < $entity['indices'][1]; $i++) {
                $ChAr[$i] = '';
            }
        }
    }
    if (isset($status['entities']['media'])) {
        foreach ($status['entities']['media'] as $entity) {
            $ChAr[$entity['indices'][0]] = "<a href='" . $entity['expanded_url'] . "'>" . $entity['display_url'] . "</a>";
            for ($i = $entity['indices'][0] + 1; $i < $entity['indices'][1]; $i++) {
                $ChAr[$i] = '';
            }
        }
    }
    return implode('', $ChAr); // HTML tweet.
}
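The byte-versus-character mismatch this answer works around can be seen in a short sketch. The text and indices below are made up for illustration; the point is only that substr() counts bytes while Twitter-style indices count characters.

```php
<?php
// Hypothetical tweet text "café #php": "é" is one character but two bytes in UTF-8.
$text = "caf\xC3\xA9 #php";

// Twitter-style indices for "#php", counted in characters.
$start = 5;
$end = 9;

// Byte-based slicing is off by one because of the two-byte "é".
$byBytes = substr($text, $start, $end - $start);

// Character-based slicing lands on the hashtag.
$byChars = mb_substr($text, $start, $end - $start, "UTF-8");

var_dump($byBytes, $byChars);
```

Every multibyte character before an entity shifts the byte offset further, which is why splitting the text into an array of characters (or using the mb_* functions) keeps the indices usable.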
+0

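OK, I ran into this problem as well, with the solution I was using. It comes down to splitting the string correctly when it contains multibyte characters. – 2014-08-20 15:46:55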

0

Here is a JavaScript version (using jQuery) of vita10gy's solution:

function tweetTextToHtml(tweet, links, users, hashtags) { 

    if (typeof(links)==='undefined') { links = true; } 
    if (typeof(users)==='undefined') { users = true; } 
    if (typeof(hashtags)==='undefined') { hashtags = true; } 

    var returnStr = tweet.text; 
    var entitiesArray = []; 

    if(links && tweet.entities.urls.length > 0) { 
     jQuery.each(tweet.entities.urls, function() { 
      var temp1 = {}; 
      temp1.start = this.indices[0]; 
      temp1.end = this.indices[1]; 
      temp1.replacement = '<a href="' + this.expanded_url + '" target="_blank">' + this.display_url + '</a>'; 
      entitiesArray.push(temp1); 
     }); 
    } 

    if(users && tweet.entities.user_mentions.length > 0) { 
     jQuery.each(tweet.entities.user_mentions, function() { 
      var temp2 = {}; 
      temp2.start = this.indices[0]; 
      temp2.end = this.indices[1]; 
      temp2.replacement = '<a href="https://twitter.com/' + this.screen_name + '" target="_blank">@' + this.screen_name + '</a>'; 
      entitiesArray.push(temp2); 
     }); 
    } 

    if(hashtags && tweet.entities.hashtags.length > 0) { 
     jQuery.each(tweet.entities.hashtags, function() { 
      var temp3 = {}; 
      temp3.start = this.indices[0]; 
      temp3.end = this.indices[1]; 
      temp3.replacement = '<a href="https://twitter.com/hashtag/' + this.text + '?src=hash" target="_blank">#' + this.text + '</a>'; 
      entitiesArray.push(temp3); 
     }); 
    } 

    entitiesArray.sort(function(a, b) {return b.start - a.start;}); 

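    jQuery.each(entitiesArray, function() { 
     // substrReplace() is not a built-in JavaScript function, so splice the replacement in with slice().
     returnStr = returnStr.slice(0, this.start) + this.replacement + returnStr.slice(this.end); 
    }); 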

    return returnStr; 
} 

You can then use this JavaScript function like so:

for(var i in tweetsJsonObj) { 
    var tweet = tweetsJsonObj[i]; 
    var htmlTweetText = tweetTextToHtml(tweet); 

    // Do something with the formatted tweet here ... 
} 
0

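Regarding vita10gy's helpful json_tweet_text_to_HTML(), I found a tweet that it does not format correctly: 626125868247552000.

This tweet has non-breaking spaces in it. My solution was to replace the first line of the function with the following: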

$return = str_replace("\xC2\xA0", ' ', $tweet->text); 

Performing a str_replace() on &nbsp; is covered here.
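A short sketch of why this workaround helps, assuming a made-up text: a UTF-8 non-breaking space takes two bytes, so every one that precedes an entity pushes the byte offset used by substr_replace() one past the character index Twitter reports.

```php
<?php
$nbsp = "\xC2\xA0";             // U+00A0 encoded as UTF-8 (two bytes)
$text = "hi" . $nbsp . "#tag";  // hypothetical tweet text

// The "#" is the 4th character (index 3) but sits at byte offset 4.
$charIndex = mb_strpos($text, "#", 0, "UTF-8");
$byteOffset = strpos($text, "#");

// Swapping the non-breaking space for a plain space makes bytes and characters line up again.
$fixed = str_replace($nbsp, " ", $text);
$fixedOffset = strpos($fixed, "#");

var_dump($charIndex, $byteOffset, $fixedOffset);
```

This only realigns offsets for non-breaking spaces; other multibyte characters such as accented letters or emoji would still shift the byte offsets.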

1

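This is an updated answer that works with Twitter's new Extended Mode. It combines @vita10gy's answer with @Hugo's comment (to make it UTF-8 compatible), plus a few minor tweaks to handle the new API values.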

function utf8_substr_replace($original, $replacement, $position, $length) { 
    $startString = mb_substr($original, 0, $position, "UTF-8"); 
    $endString = mb_substr($original, $position + $length, mb_strlen($original), "UTF-8"); 
    $out = $startString . $replacement . $endString; 
    return $out; 
} 

function json_tweet_text_to_HTML($tweet, $links=true, $users=true, $hashtags=true) { 
    // Media urls can show up on the end of the full_text tweet, but twitter doesn't index that url. 
    // The display_text_range indexes show the actual tweet text length. 
    // Cut the string off at the end to get rid of this unindexed url. 
    // Start at 0: mb_substr() takes a length (not an end index), and the entity indices are relative to full_text.
    $return = mb_substr($tweet->full_text, 0, $tweet->display_text_range[1], "UTF-8"); 
    $entities = array(); 

    if($links && is_array($tweet->entities->urls)) 
    { 
     foreach($tweet->entities->urls as $e) 
     { 
      $temp["start"] = $e->indices[0]; 
      $temp["end"] = $e->indices[1]; 
      $temp["replacement"] = " <a href='".$e->expanded_url."' target='_blank'>".$e->display_url."</a>"; 
      $entities[] = $temp; 
     } 
    } 
    if($users && is_array($tweet->entities->user_mentions)) 
    { 
     foreach($tweet->entities->user_mentions as $e) 
     { 
      $temp["start"] = $e->indices[0]; 
      $temp["end"] = $e->indices[1]; 
      $temp["replacement"] = " <a href='https://twitter.com/".$e->screen_name."' target='_blank'>@".$e->screen_name."</a>"; 
      $entities[] = $temp; 
     } 
    } 
    if($hashtags && is_array($tweet->entities->hashtags)) 
    { 
     foreach($tweet->entities->hashtags as $e) 
     { 
      $temp["start"] = $e->indices[0]; 
      $temp["end"] = $e->indices[1]; 
      $temp["replacement"] = " <a href='https://twitter.com/hashtag/".$e->text."?src=hash' target='_blank'>#".$e->text."</a>"; 
      $entities[] = $temp; 
     } 
    } 

    usort($entities, function($a,$b){return($b["start"]-$a["start"]);}); 


    foreach($entities as $item) 
    { 
     $return = utf8_substr_replace($return, $item["replacement"], $item["start"], $item["end"] - $item["start"]); 
    } 

    return($return); 
} 