2012-02-21 42 views
0

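Scraping text from a file for display - with a non-standard format - php

OK, I have a text file that changes regularly, and I need to display it on screen and possibly insert it into a database. The text is formatted as follows: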

"Stranglehold" 
Written by Ted Nugent 
Performed by Ted Nugent 
Courtesy of Epic Records 
By Arrangement with 
Sony Music Licensing 
"Chateau Lafltte '59 Boogie" 
Written by David Peverett 
and Rod Price 
Performed by Foghat 
Courtesy of Rhino Entertainment 
Company and Bearsville Records 
By Arrangement with 
Warner Special Products 

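I only need the song title (the text between the quotes), who wrote it, and who performed it. As you can see, the "Written by" credit may span more than one line.

Searching through existing questions, I found a similar one, Scraping a plain text file with no HTML?, and I was able to modify the solution at https://stackoverflow.com/a/8432563/827449 so that it at least finds the text between the quotes and puts it in an array. However, I don't know where or how to write the next preg_match statements so that they add the right information to the array, assuming I have the right regular expressions. Here is the modified code.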

<?php 
$in_name = 'in.txt'; 
$in = fopen($in_name, 'r') or die(); 

function dump_record($r) { 
    print_r($r); 
} 
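    $current = array();
    /* Read from the handle opened above ($in); compare against false so a line containing only "0" does not end the loop */
    while (($line = fgets($in)) !== false) {

     /* Skip empty lines (any number of whitespaces is 'empty') */
     if (preg_match('/^\s*$/', $line)) continue;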

     /* Search for 'things between quotes' stanzas */ 
     if (preg_match('/(?<=\")(.*?)(?=\")/', $line, $start)) { 
      /* If we already parsed a record, this is the time to dump it */ 
      if (!empty($current)) dump_record($current); 

     /* Let's start the new record */ 
     $current = array('id' => $start[1]); 
    } 
    else if (preg_match('/^(.*):\s+(.*)\s*/', $line, $keyval)) { 
     /* Otherwise parse a plain 'key: value' stanza */ 
     $current[ $keyval[1] ] = $keyval[2]; 
    } 
    else { 
     error_log("parsing error: '$line'"); 
    } 
} 
/* Don't forget to dump the last parsed record, situation 
* we only detect at EOF (end of file) */ 
if (!empty($current)) dump_record($current); 

fclose($in); 

Any help would be great, as I'm in over my head right now with my limited knowledge of PHP and regular expressions.


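If the file's format isn't going to change any time soon, I'd start with a solution that doesn't use any _regex_ and only dig into one when absolutely necessary. – quickshiftin 2012-02-21 08:59:17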


Is there a rule behind the line breaks? "Rhino Entertainment Company" is split across two lines. – Eric 2012-02-21 14:40:11


Also, what if a company name contains the word "Courtesy" or "Written"? – Eric 2012-02-21 14:41:26

Answers

1

How about:

$str =<<<EOD 
"Stranglehold" 
Written by Ted Nugent 
Performed by Ted Nugent 
Courtesy of Epic Records 
By Arrangement with 
Sony Music Licensing 
"Chateau Lafltte '59 Boogie" 
Written by David Peverett 
and Rod Price 
Performed by Foghat 
Courtesy of Rhino Entertainment 
Company and Bearsville Records 
By Arrangement with 
Warner Special Products 

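EOD;

// With the /s modifier "." also matches newlines, so each lazy (.*?) can span
// several lines: group 1 is the title, 2 the writer(s), 3 the performer(s).
preg_match_all('/"([^"]+)".*?Written by (.*?)Performed by (.*?)Courtesy/s', $str, $m, PREG_SET_ORDER);
print_r($m);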

Output:

Array 
(
    [0] => Array 
     (
      [0] => "Stranglehold" 
Written by Ted Nugent 
Performed by Ted Nugent 
Courtesy 
      [1] => Stranglehold 
      [2] => Ted Nugent 

      [3] => Ted Nugent 

     ) 

    [1] => Array 
     (
      [0] => "Chateau Lafltte '59 Boogie" 
Written by David Peverett 
and Rod Price 
Performed by Foghat 
Courtesy 
      [1] => Chateau Lafltte '59 Boogie 
      [2] => David Peverett 
and Rod Price 

      [3] => Foghat 

     ) 

) 
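
The captures in the output above still carry their trailing line breaks, and a multi-line credit such as "David Peverett / and Rod Price" keeps its embedded newline. A minimal sketch of one way to tidy them into a flat array before displaying or storing them follows; `normalize_credit()` and the `written`/`performed` keys are hypothetical names chosen for illustration, not part of the answer.

```php
<?php
// Hypothetical helper (not from the original answer): collapse every run of
// whitespace, including line breaks, into one space and trim both ends.
function normalize_credit(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// One record from the question's sample, with the credit wrapped over two lines.
$str = "\"Chateau Lafltte '59 Boogie\"\n"
     . "Written by David Peverett\n"
     . "and Rod Price\n"
     . "Performed by Foghat\n"
     . "Courtesy of Rhino Entertainment\n";

$songs = array();
$pattern = '/"([^"]+)".*?Written by (.*?)Performed by (.*?)Courtesy/s';
if (preg_match_all($pattern, $str, $m, PREG_SET_ORDER)) {
    foreach ($m as $match) {
        $songs[] = array(
            'title'     => normalize_credit($match[1]),
            'written'   => normalize_credit($match[2]),
            'performed' => normalize_credit($match[3]),
        );
    }
}

print_r($songs);
```

Because the pattern itself is unchanged, this still relies on every record containing "Written by", "Performed by" and "Courtesy" in that order.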

Thanks for the reply. I've edited it a little, but this seems to do what I want. – bazooka13 2012-02-24 06:30:16


@bazooka13: You're welcome. – Toto 2012-02-24 08:50:35

1

Here is a regex solution to the problem. Keep in mind that you don't actually need regex here. See the second option below.

<?php 

$string = '"Stranglehold" 
Written by Ted Nugent 
Performed by Ted Nugent 
Courtesy of Epic Records 
By Arrangement with 
Sony Music Licensing 
"Chateau Lafltte \'59 Boogie" 
Written by David Peverett 
and Rod Price 
Performed by Foghat 
Courtesy of Rhino Entertainment 
Company and Bearsville Records 
By Arrangement with 
Warner Special Products'; 

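// Strip trailing spaces and normalize line endings so the "\n" in the patterns below matches
$string = preg_replace('/\h*\R/', "\n", $string);

// Titles delimit a record
$title_pattern = '#"(?<title>[^\n]+)"\n(?<meta>.*?)(?=\n"|$)#s';
// From the meta section we need these tokens
$meta_keys = array(
    'Written by ' => 'written',
    'Performed by ' => 'performed',
    'Courtesy of ' => 'courtesy',
    "By Arrangement with\n" => 'arranged',
);
// implode() takes the separator first; the reversed argument order was removed in PHP 8.
// The value is the rest of the line ("$" inside a character class would be a literal dollar sign).
$meta_pattern = '#(?<key>' . implode('|', array_keys($meta_keys)) . ')(?<value>[^\n]+)(?:\n|$)#ims';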


$songs = array(); 
if (preg_match_all($title_pattern, $string, $matches, PREG_SET_ORDER)) { 
    foreach ($matches as $match) { 
     $t = array(
      'title' => $match['title'], 
     ); 

     if (preg_match_all($meta_pattern, $match['meta'], $_matches, PREG_SET_ORDER)) { 
      foreach ($_matches as $_match) { 
       $k = $meta_keys[$_match['key']]; 
       $t[$k] = $_match['value']; 
      } 
     } 

     $songs[] = $t; 
    } 
} 

will result in

$songs = array (
    array (
    'title'  => 'Stranglehold', 
    'written' => 'Ted Nugent', 
    'performed' => 'Ted Nugent', 
    'courtesy' => 'Epic Records', 
    'arranged' => 'Sony Music Licensing', 
), 
    array (
    'title'  => 'Chateau Lafltte \'59 Boogie', 
    'written' => 'David Peverett', 
    'performed' => 'Foghat', 
    'courtesy' => 'Rhino Entertainment', 
    'arranged' => 'Warner Special Products', 
), 
); 
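
Note that the result above keeps only the first line of each credit, so "and Rod Price" and "Company and Bearsville Records" are dropped. A possible variant of the meta pattern that lets a value run on until the next label is sketched below. It is an assumption-laden illustration rather than the answer's own code: the labels are written without trailing spaces or newlines, and wrapped lines are joined with a single space.

```php
<?php
// Sample metadata for one song, as the "meta" group would capture it.
$meta = "Written by David Peverett\nand Rod Price\nPerformed by Foghat\n"
      . "Courtesy of Rhino Entertainment\nCompany and Bearsville Records\n"
      . "By Arrangement with\nWarner Special Products";

$meta_keys = array(
    'Written by'          => 'written',
    'Performed by'        => 'performed',
    'Courtesy of'         => 'courtesy',
    'By Arrangement with' => 'arranged',
);
$labels = implode('|', array_map('preg_quote', array_keys($meta_keys)));

// "^" with /m anchors each label to the start of a line. The lazy value then
// runs across line breaks (thanks to /s) until the next label or the end.
$pattern = '#^(?<key>' . $labels . ')\s+(?<value>.*?)(?=\n(?:' . $labels . ')|\z)#ms';

$credits = array();
if (preg_match_all($pattern, $meta, $found, PREG_SET_ORDER)) {
    foreach ($found as $hit) {
        // Join wrapped lines with a single space.
        $value = trim(preg_replace('/\s+/', ' ', $hit['value']));
        $credits[$meta_keys[$hit['key']]] = $value;
    }
}

print_r($credits);
```

Anchoring the labels at the start of a line also means a company name that merely contains "Written" or "Courtesy" mid-line is not mistaken for a label.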

A solution without regex is also possible, though slightly more verbose:

<?php 

$string = '"Stranglehold" 
Written by Ted Nugent 
Performed by Ted Nugent 
Courtesy of Epic Records 
By Arrangement with 
Sony Music Licensing 
"Chateau Lafltte \'59 Boogie" 
Written by David Peverett 
and Rod Price 
Performed by Foghat 
Courtesy of Rhino Entertainment 
Company and Bearsville Records 
By Arrangement with 
Warner Special Products'; 

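$songs = array();
$current = array();
$lines = explode("\n", $string);

// Credits whose value sits on the line after the label
$next_line_keys = array(
    'By Arrangement with' => 'arranged',
);
// Credits whose value follows the label on the same line
$meta_keys = array(
    'Written by ' => 'written',
    'Performed by ' => 'performed',
    'Courtesy of ' => 'courtesy',
);

// can't use foreach if we want to extract "By Arrangement"
// cause it spans two lines
for ($i = 0, $_length = count($lines); $i < $_length; $i++) {
    // trim() drops trailing spaces and "\r", which would break the checks below
    $line = trim($lines[$i]);
    $length = strlen($line); // might want to use mb_strlen()

    // skip empty lines so $line[0] is always defined
    if ($length === 0) {
        continue;
    }

    // if line is enclosed in " it's a title
    if ($length > 1 && $line[0] == '"' && $line[$length - 1] == '"') {
        if ($current) {
            $songs[] = $current;
        }

        $current = array(
            'title' => substr($line, 1, $length - 2),
        );

        continue;
    }

    foreach ($next_line_keys as $key => $k) {
        if ($key == $line && isset($lines[$i + 1])) {
            $i++;
            $current[$k] = trim($lines[$i]);
            continue 2; // the value line is consumed, move on to the next input line
        }
    }

    foreach ($meta_keys as $key => $k) {
        if (strpos($line, $key) === 0) {
            $current[$k] = substr($line, strlen($key));
            continue 2;
        }
    }
    // any other line (e.g. "and Rod Price") matches nothing and is skipped
}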

if ($current) { 
    $songs[] = $current; 
} 
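
The loop above keeps only the first line of each credit, while the question asks for writers that may wrap onto a second line. One way to handle that without regex-heavy code is to remember which field was opened last and append any unlabelled line to it. The sketch below assumes exactly the four labels seen in the sample; `parse_credits()` and its variable names are hypothetical.

```php
<?php
// Hypothetical function name; the labels come from the question's sample text.
function parse_credits(string $text): array
{
    $prefixes = array(
        'Written by'          => 'written',
        'Performed by'        => 'performed',
        'Courtesy of'         => 'courtesy',
        'By Arrangement with' => 'arranged',
    );

    $songs = array();
    $current = null; // record being built
    $field = null;   // field that unlabelled lines are appended to

    foreach (preg_split('/\R/', $text) as $raw) {
        $line = trim($raw);
        if ($line === '') {
            continue;
        }

        // A line wrapped in double quotes starts a new record.
        if (strlen($line) >= 2 && $line[0] === '"' && substr($line, -1) === '"') {
            if ($current !== null) {
                $songs[] = $current;
            }
            $current = array('title' => substr($line, 1, -1));
            $field = null;
            continue;
        }

        if ($current === null) {
            continue; // ignore anything before the first title
        }

        // A label at the very start of the line opens a new field.
        foreach ($prefixes as $prefix => $key) {
            if (strpos($line, $prefix) === 0) {
                $current[$key] = trim((string) substr($line, strlen($prefix)));
                $field = $key;
                continue 2;
            }
        }

        // Anything else continues the most recently opened field.
        if ($field !== null) {
            $current[$field] = trim($current[$field] . ' ' . $line);
        }
    }

    if ($current !== null) {
        $songs[] = $current;
    }

    return $songs;
}

$sample = "\"Chateau Lafltte '59 Boogie\"\nWritten by David Peverett\nand Rod Price\n"
        . "Performed by Foghat\nCourtesy of Rhino Entertainment\n"
        . "Company and Bearsville Records\nBy Arrangement with\nWarner Special Products\n";
$parsed = parse_credits($sample);
print_r($parsed);
```

For the real file, something like `parse_credits(file_get_contents('in.txt'))` should work, using the `in.txt` name from the question. A wrapped line that happens to begin with one of the labels would still be misread as a new field, so this remains a heuristic.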

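Thanks for the reply. I didn't use this to solve the problem above, but it was useful for other problems I ran into in the same project. So thanks for the info. – bazooka13 2012-02-24 06:31:43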