2016-08-04 111 views
0

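SAS: How do I find the nth instance of a character or group of characters in a string?

I am trying to find a function that returns the index of the nth instance of a character.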

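For example, if I have the string ABABABBABSSSDDEE and want to find the third instance of A, how would I do that? What if I want to find the fourth instance of AB, marked here in brackets?

ABABABB[AB]SSSDDEE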

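data HAVE;
    /* Without an explicit length, STRING defaults to 8 characters and the value is truncated. */
    length STRING $50;
    input STRING $;
    datalines;
ABABABBABSSSDDEE
;
run;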
+1

What have you tried so far? Have you read up on 'regex'? 'SAS' uses the 'PERL' regex engine. – gwillie

+1

You can also use FIND in a loop for simple cases, although regex is the better approach for complex ones... – kl78

+0

@gwillie Nothing with 'regex' so far, but I will look into it. In the past I have used combinations of 'index' and 'find' with 'substr', but this next level of complexity may need regex. TY. –

Answers

0
data _null_; 
findThis = 'A'; *** substring to find; 
findIn = 'ADABAACABAAE'; **** the string to search; 
instanceOf=1; *** and the instance of the substring we want to find; 
pos = 0; 
len = 0; 
startHere = 1; 
endAt = length(findIn); 
n = 0; *** count occurrences of the pattern; 
pattern = '/' || findThis || '/'; 
rx = prxparse(pattern); 
CALL PRXNEXT(rx, startHere, endAt, findIn, pos, len); 
if pos le 0 then do; 
    put 'Could not find ' findThis ' in ' findIn; 
end; 
else do while (pos gt 0); 
    n+1; 
    if n eq instanceOf then leave; 
    CALL PRXNEXT(rx, startHere, endAt, findIn, pos, len); 
end; 
if n eq instanceOf then do; 
    put 'found ' instanceOf 'th instance of ' findThis ' at position ' pos ' in ' findIn; 
end; 
else do; 
    put 'No ' instanceOf 'th instance of ' findThis ' found'; 
end; 
run; 
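
To tie the step above back to the question, here is a minimal sketch of the same PRXNEXT loop applied to the question's string and the fourth AB. The variable names (string, wanted, rx and so on) are illustrative only, and the pattern is assumed to be plain literal text; a search term containing regex metacharacters would need escaping first. Counting by hand, AB starts at positions 1, 3, 5 and 8, so the step should report position 8.

```sas
data _null_;
    string = 'ABABABBABSSSDDEE';   /* the string from the question */
    wanted = 4;                    /* which occurrence to report */
    rx     = prxparse('/AB/');     /* literal pattern, no metacharacters */
    start  = 1;
    stop   = length(string);
    pos    = 0;
    len    = 0;
    n      = 0;                    /* matches seen so far */

    /* Each call resumes after the previous match and updates start. */
    call prxnext(rx, start, stop, string, pos, len);
    do while (pos gt 0);
        n = n + 1;
        if n eq wanted then leave;
        call prxnext(rx, start, stop, string, pos, len);
    end;

    if n eq wanted then put 'Occurrence ' wanted 'of AB starts at position ' pos;
    else put 'Fewer than ' wanted 'occurrences of AB were found';
run;
```

Because each PRXNEXT call resumes after the end of the previous match, this counts non-overlapping matches only.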
0

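Here is a solution that uses the find() function and a do loop within a data step. I then took that code and put it into a proc fcmp procedure to create my own function, find_n(). This should greatly simplify any task that needs this and allow for code reuse.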

Define the data:

data have; 
    length string $50; 
    input string $; 
    datalines; 
ABABABBABSSSDDEE 
; 
run; 

Do loop solution:

data want; 
    set have; 
    search_term = 'AB'; 
    nth_time = 4; 
    counter = 0; 
    last_find = 0; 

    start = 1; 
    pos = find(string,search_term,'',start); 
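    do while (pos gt 0 and nth_time gt counter);
        last_find = pos;
        start = pos + 1;   /* resume one character past the last match */
        counter = counter + 1;
        /* Search from start, not start+1, so adjacent matches are not skipped. */
        pos = find(string,search_term,'',start);
    end;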

    if nth_time eq counter then do;
        put "The nth occurrence was found at position " last_find;
    end;
    else do;
        put "Could not find the nth occurrence";
    end;

run; 
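
One design choice in a loop like this is how far the start position moves after each hit, because that decides whether overlapping matches are counted. The sketch below is not part of the solution above. It uses made-up values ('AA' within 'AAAA') and illustrative variable names to compare the two stepping rules. Counting by hand, stepping one character finds matches at positions 1, 2 and 3, while stepping by the length of the search term finds matches at 1 and 3 only.

```sas
data _null_;
    string      = 'AAAA';   /* hypothetical test value */
    search_term = 'AA';

    /* Overlapping: resume one character past the start of each match. */
    overlapping = 0;
    pos = find(string, search_term);
    do while (pos gt 0);
        overlapping = overlapping + 1;
        pos = find(string, search_term, pos + 1);
    end;

    /* Non-overlapping: resume just past the end of each match. */
    non_overlapping = 0;
    pos = find(string, search_term);
    do while (pos gt 0);
        non_overlapping = non_overlapping + 1;
        pos = find(string, search_term, pos + length(search_term));
    end;

    put overlapping= non_overlapping=;
run;
```

For a search term such as AB, which cannot overlap with itself, both rules give the same positions, so the distinction only matters for terms like AA or ABA.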

Define the proc fcmp function:

Note: the function returns 0 if the nth occurrence cannot be found.

options cmplib=work.temp.temp; 

proc fcmp outlib=work.temp.temp; 

    function find_n(string $, search_term $, nth_time) ;  

    counter = 0; 
    last_find = 0; 

    start = 1; 
    pos = find(string,search_term,'',start); 
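    do while (pos gt 0 and nth_time gt counter);
        last_find = pos;
        start = pos + 1;   /* resume one character past the last match */
        counter = counter + 1;
        /* Search from start, not start+1, so adjacent matches are not skipped. */
        pos = find(string,search_term,'',start);
    end;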

    result = ifn(nth_time eq counter, last_find, 0); 

    return (result); 
    endsub; 

run; 

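proc fcmp usage:

Note that this calls the function twice. The first call shows the solution to the original request. The second call shows what happens when no match is found.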

data want; 
    set have; 
    nth_position = find_n(string, "AB", 4); 
    put nth_position =; 

    nth_position = find_n(string, "AB", 5); 
    put nth_position =; 
run; 
+0

Why the downvote? –