2012-03-09 45 views
0

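Split a string into arrays?

From the given string, i.e. $codes, I want all the languages in a languages array, all the codes in a codes array, and all the families in a families array. How can I do this in PHP? I tried using DOM, but that didn't get me anywhere. Any other approach would be appreciated. Thanks in advance.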

<?php 
$codes = '<pre> 
LANGUAGE  CODE  LANGUAGE FAMILY 

AFAR   AA  HAMITIC 
ABKHAZIAN  AB  IBERO-CAUCASIAN 
AFRIKAANS  AF  GERMANIC 
AMHARIC   AM  SEMITIC 
ARABIC   AR  SEMITIC 
ASSAMESE  AS  INDIAN 
AYMARA   AY  AMERINDIAN 
AZERBAIJANI  AZ  TURKIC/ALTAIC 
BASHKIR   BA  TURKIC/ALTAIC 
BYELORUSSIAN BE  SLAVIC 
BULGARIAN  BG  SLAVIC 
BIHARI   BH  INDIAN 
BISLAMA   BI  [not given] 
BENGALI;BANGLA BN  INDIAN 
TIBETAN   BO  ASIAN 
BRETON   BR  CELTIC 
CATALAN   CA  ROMANCE 
CORSICAN  CO  ROMANCE 
CZECH   CS  SLAVIC 
WELSH   CY  CELTIC 
DANISH   DA  GERMANIC 
GERMAN   DE  GERMANIC 
BHUTANI   DZ  ASIAN 
GREEK   EL  LATIN/GREEK 
ENGLISH   EN  GERMANIC 
ESPERANTO  EO  INTERNATIONAL AUX. 
SPANISH   ES  ROMANCE 
ESTONIAN  ET  FINNO-UGRIC 
BASQUE   EU  BASQUE 
PERSIAN (farsi) FA  IRANIAN 
FINNISH   FI  FINNO-UGRIC 
FIJI   FJ  OCEANIC/INDONESIAN 
FAROESE   FO  GERMANIC 
FRENCH   FR  ROMANCE 
FRISIAN   FY  GERMANIC 
IRISH   GA  CELTIC 
SCOTS GAELIC GD  CELTIC 
GALICIAN  GL  ROMANCE 
GUARANI   GN  AMERINDIAN 
GUJARATI  GU  INDIAN 
HAUSA   HA  NEGRO-AFRICAN 
HEBREW   HE  SEMITIC [*Changed 1989 from original ISO 639:1988, IW] 
HINDI   HI  INDIAN 
CROATIAN  HR  SLAVIC 
HUNGARIAN  HU  FINNO-UGRIC 
ARMENIAN  HY  INDO-EUROPEAN (OTHER) 
INTERLINGUA  IA  INTERNATIONAL AUX. 
INTERLINGUE  IE  INTERNATIONAL AUX. 
INUPIAK   IK  ESKIMO 
INDONESIAN  ID  OCEANIC/INDONESIAN [*Changed 1989 from original ISO 639:1988, IN] 
ICELANDIC  IS  GERMANIC 
ITALIAN   IT  ROMANCE 
INUKTITUT  IU  [  ] 
JAPANESE  JA  ASIAN 
JAVANESE  JV  OCEANIC/INDONESIAN 
GEORGIAN  KA  IBERO-CAUCASIAN 
KAZAKH   KK  TURKIC/ALTAIC 
GREENLANDIC  KL  ESKIMO 
CAMBODIAN  KM  ASIAN 
KANNADA   KN  DRAVIDIAN 
KOREAN   KO  ASIAN 
KASHMIRI  KS  INDIAN 
KURDISH   KU  IRANIAN 
KIRGHIZ   KY  TURKIC/ALTAIC 
LATIN   LA  LATIN/GREEK 
LINGALA   LN  NEGRO-AFRICAN 
LAOTHIAN  LO  ASIAN 
LITHUANIAN  LT  BALTIC 
LATVIAN;LETTISH LV  BALTIC 
MALAGASY  MG  OCEANIC/INDONESIAN 
MAORI   MI  OCEANIC/INDONESIAN 
MACEDONIAN  MK  SLAVIC 
MALAYALAM  ML  DRAVIDIAN 
MONGOLIAN  MN  [not given] 
MOLDAVIAN  MO  ROMANCE 
MARATHI   MR  INDIAN 
MALAY   MS  OCEANIC/INDONESIAN 
MALTESE   MT  SEMITIC 
BURMESE   MY  ASIAN 
NAURU   NA  [not given] 
NEPALI   NE  INDIAN 
DUTCH   NL  GERMANIC 
NORWEGIAN  NO  GERMANIC 
OCCITAN   OC  ROMANCE 
AFAN (OROMO) OM  HAMITIC 
ORIYA   OR  INDIAN 
PUNJABI   PA  INDIAN 
POLISH   PL  SLAVIC 
PASHTO;PUSHTO PS  IRANIAN 
PORTUGUESE  PT  ROMANCE 
QUECHUA   QU  AMERINDIAN 
RHAETO-ROMANCE RM  ROMANCE 
KURUNDI   RN  NEGRO-AFRICAN 
ROMANIAN  RO  ROMANCE 
RUSSIAN   RU  SLAVIC 
KINYARWANDA  RW  NEGRO-AFRICAN 
SANSKRIT  SA  INDIAN 
SINDHI   SD  INDIAN 
SANGHO   SG  NEGRO-AFRICAN 
SERBO-CROATIAN SH  SLAVIC 
SINGHALESE  SI  INDIAN 
SLOVAK   SK  SLAVIC 
SLOVENIAN  SL  SLAVIC 
SAMOAN   SM  OCEANIC/INDONESIAN 
SHONA   SN  NEGRO-AFRICAN 
SOMALI   SO  HAMITIC 
ALBANIAN  SQ  INDO-EUROPEAN (OTHER) 
SERBIAN   SR  SLAVIC 
SISWATI   SS  NEGRO-AFRICAN 
SESOTHO   ST  NEGRO-AFRICAN 
SUNDANESE  SU  OCEANIC/INDONESIAN 
SWEDISH   SV  GERMANIC 
SWAHILI   SW  NEGRO-AFRICAN 
TAMIL   TA  DRAVIDIAN 
TELUGU   TE  DRAVIDIAN 
TAJIK   TG  IRANIAN 
THAI   TH  ASIAN 
TIGRINYA  TI  SEMITIC 
TURKMEN   TK  TURKIC/ALTAIC 
TAGALOG   TL  OCEANIC/INDONESIAN 
SETSWANA  TN  NEGRO-AFRICAN 
TONGA   TO  OCEANIC/INDONESIAN 
TURKISH   TR  TURKIC/ALTAIC 
TSONGA   TS  NEGRO-AFRICAN 
TATAR   TT  TURKIC/ALTAIC 
TWI    TW  NEGRO-AFRICAN 
UIGUR   UG  [  ] 
UKRAINIAN  UK  SLAVIC 
URDU   UR  INDIAN 
UZBEK   UZ  TURKIC/ALTAIC 
VIETNAMESE  VI  ASIAN 
VOLAPUK   VO  INTERNATIONAL AUX. 
WOLOF   WO  NEGRO-AFRICAN 
XHOSA   XH  NEGRO-AFRICAN 
YIDDISH   YI  GERMANIC [*Changed 1989 from original ISO 639:1988, JI] 
YORUBA   YO  NEGRO-AFRICAN 
ZHUANG   ZA  [  ] 
CHINESE   ZH  ASIAN 
ZULU   ZU  NEGRO-AFRICAN 
</pre>'; 

$doc= new DOMDocument(); 
$doc->loadHTML($codes); 

$xmlL = simplexml_import_dom($doc); 
$pathL = $xmlL->xpath('//pre'); 
print_r($pathL); 

?> 
+0

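Wherever this code comes from, I'd suggest reworking the function that builds it. Convert a stored array to HTML rather than converting stored HTML to an array. – Joseph 2012-03-09 09:13:30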

+1

Take a look at http://www.php.net/manual/en/function.str-getcsv.php – 2012-03-09 09:14:05

+0

Possible duplicate of [irregular space and tab file split/explode columnwise](http://stackoverflow.com/q/8349551/90527), [split a string into parts in PHP](http://stackoverflow.com/q/715747/90527), [split a string based on the values in an array](http://stackoverflow.com/q/891204/90527) and many, many others. – outis 2012-03-09 09:27:48

Answers

1

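The list is obviously generated, so you'd have better luck fixing the generator. But if you're stuck with a list like this, the following should parse it the way you want: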

$langs_ar = array(); 
$codes_ar = array(); 
$families_ar = array(); 

foreach (preg_split('/[\r\n]+/', $codes) as $line)
{
    if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $matches))
    {
        $langs_ar[] = $matches[1];
        $codes_ar[] = $matches[2];
        $families_ar[] = $matches[3];
    }
}
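For anyone wondering how that pattern reads, here is a sketch of the same expression written with the x (extended) flag, which lets a pattern carry whitespace and comments. The sample line is one row from the question; the variable names are only for illustration.

```php
<?php
// Same pattern as in the loop, spread out with the x flag so each part can be commented.
$pattern = '/
    ^
    (\S+\s*\S+)   # group 1, language: one word, or two words separated by whitespace
    \s+           # column gap
    (\S{2})       # group 2, code: exactly two non-space characters
    \s+           # column gap
    (\S.*\S)      # group 3, family: from its first to its last non-space character
    \s*           # optional trailing whitespace
    $
/x';

$line = 'SCOTS GAELIC GD  CELTIC ';
if (preg_match($pattern, $line, $m)) {
    print_r(array_slice($m, 1)); // the three captured fields, without the full match
}
```

Note that group 1 needs at least two characters in total, and it allows only one whitespace gap inside the language name, which is enough for every row in the list above.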

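Oh, and rather than three arrays, I'd recommend a single array that stores a hash of the three fields per row; or create your own object with three properties lang, code and family.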
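A minimal sketch of the single-array variant, assuming the same regular expression as above. The two sample rows stand in for the full $codes string, and the key names lang, code and family are just the ones suggested in the previous paragraph.

```php
<?php
// Two sample rows only; in practice $codes would be the full string from the question.
$codes = "AFAR   AA  HAMITIC \nPERSIAN (farsi) FA  IRANIAN \n";

$languages = array();
foreach (preg_split('/[\r\n]+/', $codes) as $line) {
    if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $m)) {
        // One associative array per row keeps the three fields together.
        $languages[] = array('lang' => $m[1], 'code' => $m[2], 'family' => $m[3]);
    }
}

print_r($languages);
```

Keeping each row in one place means the fields can't drift out of step, which can happen when three parallel arrays are sorted or filtered separately.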

Edit: a shorter way to do the same thing is this:

preg_match_all('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m', $codes, $matches, PREG_SET_ORDER); 
var_dump($matches); 

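$matches is now an array with one entry per matched line, where in each entry index:

  • 0 is the full line
  • 1 is the language
  • 2 is the code
  • 3 is the family

Just iterate over it and do whatever you want with it.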
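For example, iterating could look like the sketch below, which builds a code-to-family lookup. The two-row $codes sample and the $families_by_code name are made up for illustration.

```php
<?php
// Two sample rows; the full list from the question works the same way.
$codes = "AFAR   AA  HAMITIC \nABKHAZIAN  AB  IBERO-CAUCASIAN \n";

preg_match_all('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m', $codes, $matches, PREG_SET_ORDER);

$families_by_code = array();
foreach ($matches as $row) {
    // $row[0] is the whole line; 1, 2 and 3 are language, code and family.
    $families_by_code[$row[2]] = $row[3];
}

print_r($families_by_code);
```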

+0

Yes man, it works fine – 2012-03-09 10:36:10

+0

Can you please explain what this is: /^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/ – 2012-03-09 10:37:32

+0

It's just a regular expression; see the PHP docs here: http://www.php.net/manual/en/book.pcre.php – 2012-03-09 13:58:29

1

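I think you should take a look at PHP's explode function.

That way you can first split on the "\n" character (separating the lines) to get a first array. Then, for each line, you can explode on "\t" (assuming tabs separate your data) to get an array with 3 separate entries, and push each of those entries into the array you want.

Something like: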

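$codes_array = array();
// foreach takes the array first, then the loop variable
foreach (explode("\n", $codes) as $line) {
    // one array of tab-separated fields per line
    $codes_array[] = explode("\t", $line);
}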
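To end up with the three separate arrays the question asks for, a sketch along the same lines follows. It assumes the columns really are separated by single tab characters; the sample rows are taken from the question and re-joined with tabs for that purpose, and the array names are illustrative.

```php
<?php
// Assumption: exactly one tab between columns (sample rows re-joined with tabs).
$codes = "<pre>\nAFAR\tAA\tHAMITIC\nSCOTS GAELIC\tGD\tCELTIC\n</pre>";

$langs = array();
$abbrevs = array();
$families = array();

foreach (explode("\n", $codes) as $line) {
    $fields = array_map('trim', explode("\t", $line));
    if (count($fields) !== 3) {
        continue; // skip blank lines, the <pre> tags and anything that is not a data row
    }
    $langs[] = $fields[0];
    $abbrevs[] = $fields[1];
    $families[] = $fields[2];
}
```

If the columns are padded with a varying number of spaces instead of tabs, this split won't work as is, and a regular expression is probably the safer route.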
+1

*For multi-line string definitions, use double quotes.* Why? – Yoshi 2012-03-09 09:22:07

+0

Because, even though it works now, it wasn't supported by the standard before. – kappa 2012-03-09 11:22:01

+0

What? You might want to share a link for reference, because in the past 8 years I've never heard of that. – Yoshi 2012-03-09 11:30:31
