2011-02-03 233 views
5

Parsing a string

So here's my problem: I have a file that looks like this:

[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1 

This would, of course, translate to

' This is an example file!' 

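I'm looking for a way to parse the original content into the final content, so that a [BACKSPACE] deletes the last character (spaces included) and several backspaces in a row delete several characters. [SHIFT] isn't that important to me. Thanks for all the help!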

+0

Are [BACKSPACE] and [SHIFT] the only tokens you need to worry about? – inspectorG4dget 2011-02-03 03:12:51

Answers

1

Here's one way, but it feels a bit hackish. There's probably a better way.

def process_backspaces(text, token='[BACKSPACE]'):
    """Delete the character before each occurrence of "token" in a string."""
    output = ''
    # The padding space is removed again after the last piece.
    for item in (text + ' ').split(token):
        output += item
        output = output[:-1]
    return output

def process_shifts(text, token='[SHIFT]'):
    """Replace the character after each occurrence of "token" with its
    uppercase equivalent. (Doesn't turn "1" into "!" or "2" into "@", however!)"""
    first, *rest = text.split(token)
    output = first  # text before the first token is left as typed
    for item in rest:
        # Slicing avoids an IndexError when nothing follows the token.
        output += item[:1].upper() + item[1:]
    return output

test_string = '[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1'
print(process_backspaces(process_shifts(test_string)))
0

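It looks like you could use a regular expression to search for (something)[BACKSPACE] and replace it with nothing...

re.sub(r'.?\[BACKSPACE\]', '', YourString.replace('[SHIFT]', ''))

I'm not sure what you mean by "multiple backspaces delete multiple characters".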

+1

-1 How would this work on "blah[BACKSPACE][BACKSPACE][BACKSPACE]arf"? – payne 2011-02-03 03:12:45

+0

It should return 'barf' – 2011-02-03 03:18:48

+0

But it needs to delete the one character before the backspace as well as the '[BACKSPACE]' itself – 2011-02-03 03:20:07
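The objection in the comments can be checked directly: a single `re.sub` pass consumes tokens left to right, so a [BACKSPACE] that directly follows another one has no preceding character left to erase. Here is a minimal sketch of one possible fix, assuming it is acceptable to re-run the substitution once per token. The helper name `apply_backspaces` is hypothetical and not part of the answer above.

```python
import re

# One optional character (newlines included) followed by the token.
_BACKSPACE = re.compile(r'.?\[BACKSPACE\]', re.DOTALL)

def apply_backspaces(text):
    """Apply [BACKSPACE] tokens one at a time, leftmost first."""
    text = text.replace('[SHIFT]', '')  # shifts are simply dropped here
    while True:
        # count=1 handles only the leftmost token, so the next pass sees
        # the character that the following token should erase.
        new_text, n = _BACKSPACE.subn('', text, count=1)
        if n == 0:
            return new_text
        text = new_text

print(apply_backspaces('blah[BACKSPACE][BACKSPACE][BACKSPACE]arf'))
```

Each pass rescans the string, so this is quadratic in the worst case. That should be fine for short inputs, but a single left-to-right scan is likely a better fit for large files.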

1

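If you don't care about the shifts, just strip them out, load

(defun apply-bspace ()
  "Apply each [BACKSPACE] token after point."
  (interactive)
  ;; The third argument makes `search-forward' return nil instead of
  ;; signalling an error once no token is left.
  (while (search-forward "[BACKSPACE]" nil t)
    ;; Delete the token plus the one character before it, if there is one.
    (delete-region (max (point-min) (1- (match-beginning 0))) (point))))

and run M-x apply-bspace while viewing the file. It's Elisp, not Python, but it meets your initial requirement of "something I can download for free to a PC".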

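EDIT: Shift is more complicated if you want to apply it to numbers as well (so that [SHIFT]2 => @, [SHIFT]3 => #, etc.). The naive way, which works on letters, is

(defun apply-shift ()
  "Remove each [SHIFT] token after point and upcase the next character."
  (interactive)
  (while (search-forward "[SHIFT]" nil t)
    (backward-delete-char 7)
    ;; Clamp the region so a trailing [SHIFT] does not run past the buffer end.
    (upcase-region (point) (min (point-max) (1+ (point))))))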
0

You need to read the input, extract the tokens, recognize them, and give them a meaning.

This is how I would do it:

# -*- coding: utf-8 -*-

import re

# Characters produced by Shift on non-letter keys; extend as needed.
upper_value = {
    '1': '!', '2': '"',
}

# A token is either a bracketed name such as [SHIFT] or a single character.
tokenizer = re.compile(r'(\[.*?\]|.)')
origin = "[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"
result = ""

shift = False

for token in tokenizer.findall(origin):
    if not token.startswith("["):
        if shift:
            shift = False
            # Use the table for known keys, otherwise fall back to upper().
            token = upper_value.get(token, token.upper())

        result = result + token
    else:
        if token == "[SHIFT]":
            shift = True
        elif token == "[BACKSPACE]":
            result = result[:-1]

print(result)

It's neither the fastest nor the most elegant solution, but I think it's a good start.

Hope it helps :-)
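Since the answer notes that this is not the fastest approach, here is a hedged sketch of the same token loop that collects characters in a list, used as a stack, and joins them once at the end instead of rebuilding the string on every token. The names `TOKEN_RE`, `SHIFTED`, and `replay_keys` are made up for illustration, and the shift table is an assumption that covers only the two keys listed above.

```python
import re

# One of the two known tokens, or any single character (newlines included).
TOKEN_RE = re.compile(r'\[SHIFT\]|\[BACKSPACE\]|.', re.DOTALL)

# Assumed shift table; only the keys from the answer above are filled in.
SHIFTED = {'1': '!', '2': '"'}

def replay_keys(text):
    """Replay a key log, applying [SHIFT] and [BACKSPACE] tokens."""
    out = []          # used as a stack of typed characters
    shift = False
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        if token == '[SHIFT]':
            shift = True
        elif token == '[BACKSPACE]':
            if out:   # a backspace on empty output is ignored
                out.pop()
        else:
            if shift:
                token = SHIFTED.get(token, token.upper())
                shift = False
            out.append(token)
    return ''.join(out)

print(replay_keys('[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1'))
```

Because the pattern names the two tokens explicitly, any other bracketed text in the input is passed through character by character rather than being silently dropped. Whether that is the desired behavior depends on what else the log can contain.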

1

This does exactly what you want:

def shift(s):
    LOWER = '`1234567890-=[];\'\\,./'
    UPPER = '~!@#$%^&*()_+{}:"|<>?'

    if s.isalpha():
        return s.upper()
    elif s in LOWER:
        return UPPER[LOWER.index(s)]
    else:
        return s  # no shifted form known (e.g. a space)

def parse(text):
    pieces = text.split("[BACKSPACE]")
    answer = pieces[0]
    for piece in pieces[1:]:
        # Every split point is one backspace: drop a character, then append.
        answer = answer[:-1] + piece

    # Text before the first [SHIFT] stays as typed; after each token,
    # only the first character is shifted.
    head, *rest = answer.split("[SHIFT]")
    return head + ''.join(shift(p[0]) + p[1:] for p in rest if p)

>>> print(parse("[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"))
This is an example file!