2011-03-10 30 views
1

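Coldfusion web service consumed as WSDL chokes on control characters

Clients upload documents in .doc format to a server directory, and the text is extracted using POI, as described in Ray Camden's post here. The extracted text is saved in a text/memo field in a MySQL database and served by a web service that is consumed via WSDL. All of this works as expected until a consumer of the web service hits a record containing certain (I presume) control characters, at which point the web service throws a 500 error.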

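In the database, the problem rows appear to contain control characters, and odd characters also show up when the text field is displayed in Firefox. The web service simply returns a CF query, with returntype="any".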

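I believe the WSDL call cannot carry these characters, so is there a way to encode them, or do I just have to strip them out with a regular expression or something similar?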

<cfcomponent> 
    <cffunction output="false" access="remote" returntype="any" name="getPendingReferrals"> 
     <cfargument required="false" name="userName" type="string"/> 
     <cfargument required="false" name="password" type="string"/> 
     <cfargument required="false" name="maxrows" type="numeric" default="20"/>  
     <cfset var q=""> 

<cfinvoke component="cfcs.security" method="checkAuthenticated" returnvariable="checkAuth"> 
    <cfinvokeargument name="username" value="#arguments.userName#"> 
    <cfinvokeargument name="password" value="#arguments.password#"> 
</cfinvoke> 


<cfif checkAuth.authenticates is "true"> 
    <!--- log the login ---> 
    <cfset filename=#datepart("yyyy", now())#&#datepart("m", now())#&#datepart("d", now())#&"loginlog.txt"> 
    <CFSET OUTFILE = "#application.Root#"&"logs\"&"#filename#"> 
    <cfif #FileExists(OUTFILE)# is "Yes"> 
     <cffile action="append" file="#OUTFILE#" output="#checkAuth.userName#, #now()#, #remote_addr#, #Left(http_user_agent, 50)#"> 
    <cfelse> 
     <CFFILE action="write" output="#checkAuth.userName#, #now()#, #remote_addr#, #Left(http_user_agent, 50)#" file="#OUTFILE#"> 
    </cfif> 


     <cfif checkAuth.organisationID is 1> 
      <cfset toStr="toID=1"> 
     <cfelseif checkAuth.organisationID is 28> 
      <cfset toStr="(toID=28 OR toID=29)"> 
     </cfif> 

     <cfquery name="q" datasource='mySqlData' maxrows=#arguments.maxrows#> 
      SELECT messages.messageID, messages.toID, messages.fromID AS referrerID, (SELECT CONCAT(title, ' ',firstName, ' ', lastname) FROM users WHERE users.userID = messages.fromID) as referrerName,messages.threadID, messages.messageBody, messages.dateCreated, messages.dateSent, 
      messages.deleted, messages.createdByID, (SELECT CONCAT(title, ' ',firstName, ' ', lastname) FROM users WHERE users.userID = messages.createdByID) as createdByName, (SELECT organisationName FROM organisations WHERE messages.originatingOrganisationID = organisations.organisationID) as originatingOrganisationName, messages.originatingOrganisationID, messages.viewed, messages.referral, messages.actioned, messages.patientID, messages.refTypeID, messages.specialtyID, organisations.organisationName AS toOrganisationName, patients.nhsNumber AS patientNHSnumber, patients.patientTitle, patients.patientLastname, patients.patientFirstname, patients.patientDOB, patients.address1 as patientAddress1, patients.address2 AS patientAddress2, patients.address3 AS patientAddress3, patients.address4 AS patientAddress4, patients.postcode AS patientPostcode, patients.patientPhone1 
      FROM users INNER JOIN (organisations INNER JOIN (patients INNER JOIN messages ON patients.patientID = messages.patientID) ON organisations.organisationID = messages.toID) ON users.userID = messages.fromID 
      WHERE #toStr# 
      AND NOT actioned 
      AND NOT originatingOrganisationID=3 
      ORDER BY messageID 
      </cfquery> 


      <cfif isQuery(q)> 
       <cfreturn q> 
      <cfelse> 
       <cfreturn "Error : in query"> 
      </cfif> 



<cfelse> 
    <cfreturn "Error : failed to authenticate"> 
</cfif>  

</cffunction> 
</cfcomponent>
+0

It would help to see the code for the getPendingReferrals method. Most likely you will have to XmlFormat the data so that it is valid XML – 2011-03-10 20:35:19

+0

I have put getPendingReferrals.cfc up. It really just returns a query, and it is the content of the messageBody field that causes the problem. – Saul 2011-03-10 23:56:27

Answers

2

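You should use a regular expression to strip out all of the high-ASCII characters. One of the best I have found was written up by Ben Nadel, here. (It isn't perfect, though; I made some improvements to it in the comments.)

Basically, if you just want to strip out the high-ASCII characters, do this: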

<cfset result = reReplace(messageBody, "[^\x20-\x7E\x0D\x09]", "", "all") /> 

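This regular expression takes a whitelist approach, so only the following characters are kept:

  • \x20-\x7E = {space} ! " # $ % & ' ( ) * + , - . / 0-9 : ; < = > ? @ A-Z [ \ ] ^ _ ` a-z { | } ~
  • \x0D = carriage return
  • \x09 = horizontal tab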
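
One thing worth checking against the list above: line feed (\x0A) is not in the whitelist, so the pattern removes it along with the other control characters. Here is a minimal sketch of the difference, assuming `reReplace` accepts `\x0A` the same way it accepts the other `\x` escapes in the pattern above. The `sample`, `strict`, and `withLF` variable names are made up for illustration:

```cfm
<!--- Sample text containing CR, LF, a vertical tab (chr(11)) and a high-ASCII character (chr(233)) --->
<cfset sample = "Line one" & chr(13) & chr(10) & "Line" & chr(11) & " two " & chr(233)>

<!--- Pattern from the answer: keeps printable ASCII, CR and tab; LF, the vertical tab and chr(233) are removed --->
<cfset strict = reReplace(sample, "[^\x20-\x7E\x0D\x09]", "", "all")>

<!--- Variant with \x0A added to the whitelist, so LF is kept as well --->
<cfset withLF = reReplace(sample, "[^\x20-\x7E\x0D\x0A\x09]", "", "all")>

<cfoutput>
    original: #len(sample)# characters<br>
    strict: #len(strict)# characters<br>
    with LF kept: #len(withLF)# characters
</cfoutput>
```

If the line breaks in the extracted document text matter to the consumer, the second pattern may be the better starting point.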

If you like this approach to sanitizing, you can use Sean Coyne's method of updating the query with a loop:

<cfloop query="q"> 
    <cfset querySetCell(
     q, 
     "messageBody", 
     clean(q.messageBody[q.currentRow]), 
     q.currentRow 
    )/> 
</cfloop> 
<cffunction name="clean"> 
    <cfargument name="in" /> 
    <cfreturn reReplace(arguments.in, "[^\x20-\x7E\x0D\x09]", "", "all") /> 
</cffunction> 
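
The same loop can be wrapped in a helper so that several text columns are cleaned in one call. This is only a sketch: `cleanQuery` and its `columns` argument are hypothetical names, and it relies on the `clean` function defined above.

```cfm
<cffunction name="cleanQuery" output="false" returntype="query">
    <cfargument name="q" type="query" required="true">
    <!--- Comma-separated list of text columns to sanitize (hypothetical argument) --->
    <cfargument name="columns" type="string" required="true">
    <cfset var col = "">
    <cfset var row = 0>

    <cfloop list="#arguments.columns#" index="col">
        <cfloop from="1" to="#arguments.q.recordCount#" index="row">
            <!--- Overwrite each cell with its cleaned value --->
            <cfset querySetCell(arguments.q, col, clean(arguments.q[col][row]), row)>
        </cfloop>
    </cfloop>

    <cfreturn arguments.q>
</cffunction>

<!--- Example call, placed just before the query is returned --->
<cfset q = cleanQuery(q, "messageBody,referrerName,createdByName")>
```

The column names in the example call come from the query in the question. Only text columns should be listed, since `clean` returns a string.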
+0

+1 for Adam's solution. It will certainly be safer than using xmlFormat alone – 2011-03-11 15:24:12

+0

It's a love-in, +1s all round. Thanks, Adam and Sean – Saul 2011-03-11 17:42:16

1

This is not ideal, but you could try something like this:

<cfloop query="q"> 
    <cfset querySetCell(q,"messageBody",xmlFormat(q.messageBody[q.currentRow]),q.currentRow) /> 
</cfloop> 

If xmlFormat fails to remove all of the characters (it is known to miss a few), you may need to write a manual method to strip them out.
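
A manual method along those lines could remove only the C0 control characters that XML 1.0 does not allow, which leaves accented and other non-ASCII text intact. This is a rough sketch: `stripXmlControlChars` is a hypothetical name, and the pattern assumes `reReplace` handles `\x` escapes (including `\x00`) the way the other answer's pattern does. It does not deal with other code points that XML 1.0 excludes, such as \uFFFE and \uFFFF.

```cfm
<cffunction name="stripXmlControlChars" output="false" returntype="string">
    <cfargument name="in" type="string" required="true">
    <!--- Remove C0 control characters other than tab (\x09), LF (\x0A) and CR (\x0D) --->
    <cfreturn reReplace(arguments.in, "[\x00-\x08\x0B\x0C\x0E-\x1F]", "", "all")>
</cffunction>

<!--- Strip the control characters first, then escape what is left with xmlFormat --->
<cfloop query="q">
    <cfset querySetCell(q, "messageBody", xmlFormat(stripXmlControlChars(q.messageBody[q.currentRow])), q.currentRow)>
</cfloop>
```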