Announcement

Collapse
No announcement yet.

Excel VBA RTF Tags Removal/Scrubbing. RTF to Text on Active Sheet

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Excel VBA RTF Tags Removal/Scrubbing. RTF to Text on Active Sheet



    This is posted as a solution in conjunction with the work done by Nossidge at http://stackoverflow.com/questions/1...formatted-text.

    His/her work saved me a lot of work. Thanks Nossidge.

    I fixed 2 'bugs', adding underscores (_) to the whole range selected for scrubbing when run for a second time and truncating the first 2 characters on 'scrubbing'. I also added 'handsfree' operation in where the whole Used Range in the Active Sheet is 'Scrubbed'.

    For your convenience I've attached the solution, with dummy RTF Data, so no need to go an copy and paste VBA Code into Excel if you are new to VBA. The code can be adapted to work as a function or be triggered by events.

    This solution was very helpfull, especially with the use of Pivot's which are unindated with the RTF Tags. Now I can import SQL nvarchar(max) data, scrub it and Pivot as required.

    Deon Wilken

    *EDIT*
    The original upload is specific to XL2010. Modded version attached which works in 2003 & 2007 (Tested)

    Added line to replace Chr$(13)'s in the returned text to more Excel friendly Chr$(10)'s - line is commented out in Procedure CopyParseAllRange in Module1

    Dependancy on specific version of Word removed - is now a generic object.
    Attached Files
    Last edited by cytop; October 5th, 2012, 16:47.

  • #2
    Re: Excel VBA RTF Tags Removal/Scrubbing. RTF to Text on Active Sheet

    This should do the same and faster.
    Code:
    Sub test()
        Dim a, i As Long, m As Object, temp As String
        With Range("a1").CurrentRegion
            a = .Value
            With CreateObject("VBScript.RegExp")
                .Global = True
                .Pattern = "{\S+ ([^};]+(?!\;))\}"
                For i = 1 To UBound(a, 1)
                    temp = a(i, 1): a(i, 1) = Empty
                    For Each m In .Execute(temp)
                        a(i, 1) = a(i, 1) & m.submatches(0)
                    Next
                Next
            End With
            .Value = a
        End With
    End Sub

    Comment


    • #3
      Re: Excel VBA RTF Tags Removal/Scrubbing. RTF to Text on Active Sheet

      The difference is IMPRESSIVE to say the least and it works so well that it can be used as a real-time solution whenever data changes on the sheet. Attached the solution with the new code. Thanks Jindon
      Comparison
      ROWS 121 12,000
      Old Code (Execution time in seconds) 2.4 182.5
      New Code (Execution time in seconds) 0.1 0.38
      Attached Files

      Comment


      • #4
        Re: Excel VBA RTF Tags Removal/Scrubbing. RTF to Text on Active Sheet

        Here is a better way to use regex objects to clean rtf code. These patterns assume that the RTF text has newline characters and is not a continuous string of code/text.

        Code:
        Public Function RemoveRTFFromString(ByVal str As String) As String
            If str <> "" Then
                Dim resStr As String
                Dim regEx1 As Object
                
                Set regEx1 = CreateObject("VBScript.RegExp")
                With regEx1
                    .Global = True
                    .IgnoreCase = False
                    .MultiLine = True 'Only changes how script reads newlines (FALSE = treats entire string as one line.  TRUE = treats each line of text [up to newline chars] as one line)
                End With
                
                'Remove all {\code}  ,  \code  ,  and  }}  kinds of patterns
                regEx1.Pattern = "({\\)(.*)(}+)|(\\)([a-zA-Z0-9_;-]+)|(\}*)(\s*)(\}*)$"
                resStr = regEx1.Replace(str, "")
                
                'Replace all non-tab and space \s chars with a space " " (so lines of text are seperated)
                regEx1.Pattern = "\r|\n|\v|\f"
                resStr = regEx1.Replace(resStr, " ")
                
                'Trim ends
                regEx1.Pattern = "^( +)|( +)$"
                resStr = regEx1.Replace(resStr, "")
                
                Set regEx1 = Nothing
                
                RemoveRTFFromString = resStr
            Else
                RemoveRTFFromString = str
            End If
        End Function
        Last edited by nopers; December 2nd, 2014, 04:07. Reason: text posted incorrectly

        Comment


        • #5


          hi there. I'm trying to run this, but for some reason it is not working new Office releases. only adding compatibility with 64bits is not solving. if someone could help, will be great! thanks!!!

          Comment

          Working...
          X