2012-11-15 37 views
3

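Reading a text file into SAS with delimiters in the wrong places

I am reading a .txt file into SAS that uses "|" as the delimiter. The problem is that one column also uses "|" as a word separator rather than as a field delimiter, and those words need to end up in a single column.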

For example, the txt file looks like this:

apple|fruit|Healthy|choices|of|food|12|2012|chart 

I need a SAS dataset like this:

apple | fruit | Healthy choices of Food | 12 | 2012 | chart 

How can I eliminate the "|" between the words of "Healthy choices of food"?

+1

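How do you know that the delimiter between fruit and Healthy is a real one, but the one between Healthy and choices is not? Objectively, how do you know? –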

+0

I think the OP means that the first 2 and the last 3 variables cannot contain extra delimiters. My solution assumes that. – itzy

+0

That's right, itzy – user1825366

Answers

0

This isn't particularly elegant, but it will work:

data tmp; 
input tmp $50.; 
cards; 
apple|fruit|Healthy|choices|of|food|12|2012|chart 
; 
run; 

data tmp; 
set tmp; 
var1 = scan(tmp,1,'|'); 
var2 = scan(tmp,2,'|'); 
var4 = scan(tmp,-3,'|'); 
var5 = scan(tmp,-2,'|'); 
var6 = scan(tmp,-1,'|'); 

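/* Remove the leading and trailing fields together with their delimiters; */
/* TRANWRD substitutes a single blank for "", so strip the result at the end */
var3 = tranwrd(tmp,trim(var1)||"|"||trim(var2)||"|","");
var3 = tranwrd(var3,"|"||trim(var4)||"|"||trim(var5)||"|"||trim(var6),"");
var3 = strip(tranwrd(var3,"|"," "));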
run; 
2

I think this will do what you want:

data tmp1; 
    length tmp $100; 
    input tmp $; 
    cards; 
apple|fruit|Healthy|choices|of|food|12|2012|chart 
apple|fruit|Healthy|choices|of|food|and|lots|of|other|stuff|12|2012|chart 
; 
run; 

data tmp2; 
    set tmp1; 
    num_delims=length(tmp)-length(compress(tmp,"|")); 
    expected_delims=5; 
    extra_delims=num_delims-expected_delims; 
    length new_var $100; 
    i=1; 
    do while(scan(tmp,i,"|") ne ""); 
    if i<=2 or (extra_delims+2)<i<=num_delims then new_var=trim(new_var)||scan(tmp,i,"|")||"|"; 
    else new_var=trim(new_var)||scan(tmp,i,"|")||"#"; 
    i+1; 
    end; 
    new_var=left(tranwrd(new_var,"#"," ")); 
run; 
+0

+1. Nice. I think any solution will use either 'scan' and 'tranwrd' or some of the 'prx' (regular expression) functions. – itzy
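For comparison, the 'prx' route mentioned in the comment above might look roughly like the sketch below. It is not taken from any of the answers: the dataset name `want_prx`, the variable names and the lengths are illustrative assumptions, and it relies on the same assumption as the other solutions, namely that the first two and the last three fields never contain "|".

```sas
data want_prx;
    length line $200 item class $10 desc $80 count 8 year $4 somevar $10;
    retain re;
    /* Compile the pattern once: two leading fields, a greedy middle, */
    /* three trailing fields, then optional trailing blanks           */
    if _n_ = 1 then
        re = prxparse('/^([^|]*)\|([^|]*)\|(.*)\|([^|]*)\|([^|]*)\|([^|]*?)\s*$/');
    input;
    line = _infile_;
    if prxmatch(re, line) then do;
        item    = prxposn(re, 1, line);
        class   = prxposn(re, 2, line);
        /* The middle capture still holds pipes; turn them into blanks */
        desc    = translate(prxposn(re, 3, line), ' ', '|');
        count   = input(prxposn(re, 4, line), 8.);
        year    = prxposn(re, 5, line);
        somevar = prxposn(re, 6, line);
    end;
    else put 'NOTE: no match for record ' _n_;
    drop re line;
    datalines;
apple|fruit|Healthy|choices|of|food|12|2012|chart
apple|fruit|Healthy|choices|of|food|and|lots|of|other|stuff|12|2012|chart
;
run;
```

Because the last three capture groups cannot contain "|", the greedy middle group is forced to absorb every extra delimiter, however many there are. A record with fewer than five pipes does not match and is only reported in the log.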

0

Expanding a bit on Itzy's answer, here is another possible solution:

data want; 
    /* Define variables */ 
    attrib item length=$10 label='Item'; 
    attrib class length=$10 label='Family'; 
    attrib desc length=$80 label='Item Description'; 
    attrib count length=8 label='Some number'; 
    attrib year length=$4 label='Year'; 
    attrib somevar length=$10 label='Some variable'; 

    length countc $8; /* A temp variable */ 

    infile 'c:\temp\delimited_temp.txt' lrecl=1000 truncover; 
    input; 
    item = scan(_infile_,1,'|','mo'); 
    class = scan(_infile_,2,'|','mo'); 
    countc = scan(_infile_,-3,'|','mo'); /* Temp var for numeric field */ 
    count = inputn(countc,'8.');   /* Re-read the numeric field */ 
    year = scan(_infile_,-2,'|','mo'); 
    somevar = scan(_infile_,-1,'|','mo'); 

    desc = tranwrd(
      substr(_infile_ 
       ,length(item)+length(class)+3 
       ,length(_infile_) 
        - (length(item)+length(class)+length(countc) 
         +length(year)+length(somevar)+5)) 
      ,'|',' '); 
    drop countc; 
run; 

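The key here is that it reads your file directly and handles the delimiters itself. This can be tricky, and it requires your data file to be exactly as described. A better solution would be to go back to whoever provided this data and ask them to deliver it to you in a more suitable form. Good luck!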
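Since this approach depends on every record having the expected layout, it may be worth checking the field count before parsing. The step below is only a sketch under that assumption: the dataset name `checked`, the variables `n_fields` and `too_short`, and the second sample record are made up for illustration.

```sas
data checked;
    length line $200;
    infile datalines truncover;
    input line $char200.;
    /* With the 'm' modifier, consecutive delimiters count as empty fields */
    n_fields  = countw(line, '|', 'm');
    /* At least six fields are needed: 2 leading, 1 description, 3 trailing */
    too_short = (n_fields < 6);
    if too_short then put 'NOTE: record ' _n_ 'has only ' n_fields 'field(s)';
    datalines;
apple|fruit|Healthy|choices|of|food|12|2012|chart
apple|fruit|12|2012
;
run;
```

Records flagged with `too_short = 1` cannot be split into two leading fields, a description and three trailing fields, so they would need to be handled separately or sent back to the data provider.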

0

Another possible solution.

data tmp; 
infile '/path/to/textfile'; 
input tmp :$100.; 
array varlst (*) $30 v1-v6; 
a=countw(tmp,'|'); 
do i=1 to dim(varlst); 
if i<=2 then 
    varlst(i) = scan(tmp,i,'|'); 
else if i>=4 then 
    varlst(i) = scan(tmp,a-(dim(varlst)-i),'|'); 
else do j=3 to a-(dim(varlst)-i); /* middle field: words 3 through a-3, so the last word before the trailing fields is kept */
    varlst(i)=catx(' ', varlst(i),scan(tmp,j,'|')); 
    end; 
end; 
drop tmp a i j; 
run; 