Q

How do I extract the words around a given word in Excel or Python?

2017-01-01 60 views 2 likes

2

I have thousands of rows of text like this:

ksjd 234first special 34-37xy kjsbn 
sde 89second special 22-23xh ewio 
647red special 55fg dsk 
uuire another special 98 
another special 107r 
green special 55-59 ewk 
blue special 31-39jkl

I need to extract the word just before "special" and the number (or number range) to its right. In other words, I want:

Converted into a table:

2017-01-01 KitKat

A

Answers

1

In addition to what @RolandSmith wrote, here is a way to use regular expressions in Excel, with VBA:

Option Explicit

' Returns submatch number Index (1 = word, 2 = "special", 3 = number or range)
' from the first match in S, or an empty string when nothing matches.
Function ExtractSpecial(S As String, Index As Long) As String
    Dim RE As Object, MC As Object
    Const sPat As String = "([a-z]+)\s+(special)\s+([^a-z]+)"

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .IgnoreCase = True
        .MultiLine = False
        .Pattern = sPat
        If .Test(S) = True Then
            Set MC = .Execute(S)
            ExtractSpecial = MC(0).SubMatches(Index - 1)
        End If
    End With
End Function

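In this UDF, the Index argument selects whether the first, second, or third submatch of the match collection is returned, so you can easily split the original string into the three components you want.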
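For readers working in Python rather than VBA, the same idea can be sketched as follows. This is an illustration, not part of the original answer: the function name `extract_special` is hypothetical, and `re.IGNORECASE` is assumed to stand in for `.IgnoreCase = True`.

```python
import re

# Same pattern as the VBA constant sPat; IGNORECASE mirrors .IgnoreCase = True.
PATTERN = re.compile(r"([a-z]+)\s+(special)\s+([^a-z]+)", re.IGNORECASE)


def extract_special(s: str, index: int) -> str:
    """Return the 1-based submatch of the first match, or "" if nothing matches.

    Hypothetical helper that imitates the UDF's Index argument.
    """
    m = PATTERN.search(s)
    return m.group(index) if m else ""


print(extract_special("ksjd 234first special 34-37xy kjsbn", 1))  # first
```

Because `[^a-z]+` also matches spaces, the third submatch can carry trailing whitespace when the number is followed by a space, so calling `.strip()` on the result may be worthwhile.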

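Since you wrote that you have "thousands of rows", you may prefer to run a macro. A macro processes the data faster, but it is not dynamic: the results do not update when the source changes. The macro below assumes that your original data is in column A of Sheet2 and puts the results in columns C:E of the same worksheet. You can easily change these parameters: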

Sub ExtractSpec()
    Dim RE As Object, MC As Object
    Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
    Dim vSrc As Variant, vRes As Variant
    Dim I As Long

    ' Source and result sheets, and the top-left cell of the results (C1)
    Set wsSrc = Worksheets("sheet2")
    Set wsRes = Worksheets("sheet2")
    Set rRes = wsRes.Cells(1, 3)

    ' Read column A into an array in one step
    With wsSrc
        vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
    End With

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .MultiLine = False
        .IgnoreCase = True
        .Pattern = "([a-z]+)\s+(special)\s+([^a-z]+)"

        ' One result row per source row; non-matching rows stay empty
        ReDim vRes(1 To UBound(vSrc), 1 To 3)
        For I = 1 To UBound(vSrc)
            If .Test(vSrc(I, 1)) = True Then
                Set MC = .Execute(vSrc(I, 1))
                vRes(I, 1) = MC(0).SubMatches(0)
                vRes(I, 2) = MC(0).SubMatches(1)
                vRes(I, 3) = MC(0).SubMatches(2)
            End If
        Next I
    End With

    ' Write the whole result array back to the sheet at once
    Set rRes = rRes.Resize(UBound(vRes, 1), UBound(vRes, 2))
    With rRes
        .EntireColumn.Clear
        .Value = vRes
        .EntireColumn.AutoFit
    End With
End Sub

2017-01-01 17:58:50

+0

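This is perfect. My only problem is that it automatically converts some of the numbers to dates. I tried the suggested fixes [such as setting the column to Text], but the problem persists. – KitKat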

+0

I removed .EntireColumn.Clear, and it works! – KitKat

+0

@KitKat Try wrapping MC(0)... in the CStr function, or prefix the value with a single quote. –

3

A quick way to do this is with a regular expression:

In [1]: import re 

In [2]: text = '''234first special 34-37xy       
    ...: 89second special 22-23xh 
    ...: 647red special 55fg 
    ...: another special 98 
    ...: another special 107r 
    ...: green special 55-59 
    ...: blue special 31-39jkl''' 

In [3]: [re.findall(r'\d*\s*(\S+)\s+(special)\s+(\d+(?:-\d+)?)', line)[0] for line in text.splitlines()] 
Out[3]: 
[('first', 'special', '34-37'), 
('second', 'special', '22-23'), 
('red', 'special', '55'), 
('another', 'special', '98'), 
('another', 'special', '107'), 
('green', 'special', '55-59'), 
('blue', 'special', '31-39')]
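The session above runs on lines that begin with the digits-plus-word token. On the question's full lines, which can start with an extra token such as `ksjd`, that pattern can begin matching at the preceding space and capture `234first` as the word, and the `[0]` index raises `IndexError` on any line without a match. Below is a minimal sketch of a variant, assuming the wanted word consists only of ASCII letters. The pattern and the name `extract_rows` are illustrative and not from the original answer.

```python
import re

# Assumed variant: letters only for the word, so leading digits are excluded.
PATTERN = re.compile(r"([A-Za-z]+)\s+(special)\s+(\d+(?:-\d+)?)")


def extract_rows(lines):
    """Return (word, 'special', number-or-range) for each matching line.

    Lines without a match are skipped instead of raising IndexError.
    """
    rows = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            rows.append(m.groups())
    return rows


sample = [
    "ksjd 234first special 34-37xy kjsbn",
    "no keyword on this line",
    "uuire another special 98",
]
print(extract_rows(sample))
# [('first', 'special', '34-37'), ('another', 'special', '98')]
```

Skipping non-matching lines is a design choice. If row alignment with the source matters, appending an empty tuple for those lines would keep the indices in step.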

2017-01-01 17:51:18

3

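In Excel, you can use a formula to extract the text between two words, as follows:

Select a blank cell, enter the formula =MID(A1,SEARCH("KTE",A1)+3,SEARCH("feature",A1)-SEARCH("KTE",A1)-4), and press Enter.
Drag the fill handle over the range where you want to apply the formula. Only the text between "KTE" and "feature" is now extracted.

Notes:

In this formula, A1 is the cell you want to extract text from.
KTE and feature are the words between which you want to extract text.
The number 3 is the character length of KTE, and the number 4 is the character length of KTE plus 1.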
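The same "text between two markers" idea can be sketched in Python, which avoids hand-computing the offsets 3 and 4. This is an illustration under assumptions: the function name `text_between` is hypothetical, the match is made case-insensitive because Excel's SEARCH ignores case, and the result is stripped of surrounding whitespace, which the formula does not do on the left side.

```python
import re


def text_between(s: str, start: str, end: str) -> str:
    """Return the text between the first `start` and the next `end`.

    Hypothetical helper; returns "" when either marker is missing.
    """
    # re.escape keeps the markers literal even if they contain regex symbols.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    m = re.search(pattern, s, flags=re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else ""


print(text_between("abc KTE red apple feature xyz", "KTE", "feature"))  # red apple
```

Unlike the formula, this sketch looks for the end marker only after the start marker, so an earlier occurrence of `feature` does not affect the result.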

2017-01-02 13:47:14
