Extract text from an array field

One of the fields, named "resources", contains the following 2 inner documents.

{ 
    "type": "AWS::S3::Object", 
    "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema" 
}, 
{ 
    "accountId": "934331768510612", 
    "type": "AWS::S3::Bucket", 
    "ARN": "arn:aws:s3:::sms_vild" 
}

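I need to split the ARN field and get its last part, i.e. "reports_201706.schema", preferably using a scripted field.

What I have tried:

1) I checked the Fields list and found only 2 entries: resources.accountId and resources.type.

2) I tried a date-time field, and it works correctly in the scripted field option (expression).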

doc['eventTime'].value

3) But for other text fields, for example,

doc['eventType'].value

I get this error:

"caused_by":{"type":"script_exception","reason":"link error","script_stack":["doc['eventType'].value","^---- HERE"],"script":"doc['eventType'].value","lang":"expression","caused_by":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [eventType] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."}}},"status":500}

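This means I need to change the mapping. Is there any other way to extract text from a nested array inside an object?

Update:

Please click here to see a sample Kibana instance ...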

https://search-accountact-phhofxr23bjev4uscghwda4y7m.us-east-1.es.amazonaws.com/_plugin/kibana/

Search for "ebs_attach.png" and then check the resources field. You will see 2 nested array entries like this ...

{ 
    "type": "AWS::S3::Object", 
    "ARN": "arn:aws:s3:::datameetgeo/ebs_attach.png" 
}, 
{ 
    "accountId": "513469704633", 
    "type": "AWS::S3::Bucket", 
    "ARN": "arn:aws:s3:::datameetgeo" 
}

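I need to split the ARN field and extract the last part, which in this case is "ebs_attach.png".

If there is a way to display it as a scripted field, then I can see the bucket name and the file name side by side on the Discover tab.

Update 2

In other words, I am trying to extract the text shown in this image as a new field on the Discover tab.

Source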

2017-07-17 shantanuo

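Although you can do this with scripting, I strongly recommend extracting this information at index time. I have provided two examples here. They are far from failsafe (you need to test with different paths, or with this field missing entirely), but they should give you a base to start from.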

PUT foo/bar/1
{
    "resources": [
        {
            "type": "AWS::S3::Object",
            "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema"
        },
        {
            "accountId": "934331768510612",
            "type": "AWS::S3::Bucket",
            "ARN": "arn:aws:s3:::sms_vild"
        }
    ]
}

# this is slow!!!
GET foo/_search
{
    "script_fields": {
        "document": {
            "script": {
                "inline": "return params._source.resources.stream().filter(r -> 'AWS::S3::Object'.equals(r.type)).map(r -> r.ARN.substring(r.ARN.lastIndexOf('/') + 1)).findFirst().orElse('NONE')"
            }
        }
    }
}

# Do this at index time by adding a pipeline
PUT _ingest/pipeline/my-pipeline-id
{
    "description" : "describe pipeline",
    "processors" : [
        {
            "script" : {
                "inline": "ctx.filename = ctx.resources.stream().filter(r -> 'AWS::S3::Object'.equals(r.type)).map(r -> r.ARN.substring(r.ARN.lastIndexOf('/') + 1)).findFirst().orElse('NONE')"
            }
        }
    ]
}

# Store the document, specify the pipeline
PUT foo/bar/1?pipeline=my-pipeline-id
{
    "resources": [
        {
            "type": "AWS::S3::Object",
            "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema"
        },
        {
            "accountId": "934331768510612",
            "type": "AWS::S3::Bucket",
            "ARN": "arn:aws:s3:::sms_vild"
        }
    ]
}

# Let's check the filename field of the indexed document by getting it
GET foo/bar/1

# We can even search for this file now
GET foo/_search
{
    "query": {
        "match": {
            "filename": "reports_201706.schema"
        }
    }
}
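
Since the scripts above are not failsafe, it may help to prototype the extraction rule outside Elasticsearch first. The following is only a minimal Python sketch of the same rule (take the first entry of type AWS::S3::Object and keep everything after the last '/'), not the Painless script itself. The function name extract_filename and the default "NONE" are illustrative assumptions. Unlike the scripts above, it skips entries with a missing ARN and tolerates a missing "resources" field instead of failing.

```python
def extract_filename(doc, default="NONE"):
    """Return the last path segment of the first AWS::S3::Object ARN in doc.

    Hypothetical helper for prototyping; it mirrors the rule used in the
    scripts above but is defensive about missing fields.
    """
    # A missing or null "resources" field is treated as an empty list.
    resources = doc.get("resources") or []
    for resource in resources:
        if not isinstance(resource, dict):
            continue
        arn = resource.get("ARN")
        if resource.get("type") == "AWS::S3::Object" and isinstance(arn, str):
            # Equivalent to substring(lastIndexOf('/') + 1): if there is
            # no '/', the whole ARN is returned.
            return arn.rsplit("/", 1)[-1]
    return default


sample = {
    "resources": [
        {
            "type": "AWS::S3::Object",
            "ARN": "arn:aws:s3:::datameetgeo/ebs_attach.png",
        },
        {
            "accountId": "513469704633",
            "type": "AWS::S3::Bucket",
            "ARN": "arn:aws:s3:::datameetgeo",
        },
    ]
}
print(extract_filename(sample))  # prints ebs_attach.png
```

Running it against a handful of real documents (including ones with no S3 object at all) should show which edge cases the pipeline script still needs to handle.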

Source

2017-07-19 12:53:05 alr

Note: this assumes that "resources" is an array.

NSArray *array_ARN_Values = [resources valueForKey:@"ARN"];

Hope it works for you!

Source

2017-07-17 05:24:59 Sandy

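This doesn't work. Please see the updated question. – shantanuo

How do I know whether resources is an array? I don't see "resources" in the fields list. However, the type, ARN and accountId parameters from resources are indexed. – shantanuo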
