Announcement

Collapse
No announcement yet.

Parsing HTML Table to Excel

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Parsing HTML Table to Excel



    My personal project this past week has been trying to parse a HTML table to Excel.
    I've used many resources to finally get the code I'm looking for. It has taken a week to figure it out so I feel that I should share it to help others like me.

    Code:
    Sub ParseTable()
    Dim IE As InternetExplorer
    Dim htmldoc As MSHTML.IHTMLDocument 'Document object
    Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
    Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
    Dim eleRow As MSHTML.IHTMLElement 'Row elements
    Dim eleCol As MSHTML.IHTMLElement 'Column elements
    Dim ieURL As String 'URL
    
    'Open InternetExplorer
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    
    'Navigate to webpage
    ieURL = "Add URL here"
    IE.navigate ieURL
    'Wait
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop
    
    Set htmldoc = IE.document 'Document webpage
    Set eleColtr = htmldoc.getElementsByTagName("tr") 'Find all tr tags
    
    'This section populates Excel
    i = 0 'start with first value in tr collection
    For Each eleRow In eleColtr 'for each element in the tr collection
        Set eleColtd = htmldoc.getElementsByTagName("tr")(i).getElementsByTagName("td") 'get all the td elements in that specific tr
        j = 0 'start with the first value in the td collection
        For Each eleCol In eleColtd 'for each element in the td collection
            Sheets("Sheet1").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
            j = j + 1 'move to next element in td collection
        Next eleCol 'rinse and repeat
        i = i + 1 'move to next element in td collection
    Next eleRow 'rinse and repeat
    
    End Sub
    You may have problems if your website has frames. Use this:
    Code:
    Set htmldoc = IE.document.frames.Item(x).document 'Replace x with the frame number that contains the table
    Instead of:
    Code:
    Set htmldoc = IE.document
    Goodluck,

    Paul

  • #2
    Re: Parsing HTML Table to Excel

    As this is a 'solution' rather than an issue, moved to a more appropriate forum...

    Comment


    • #3
      Re: Parsing HTML Table to Excel

      Thanks for sharing!
      Check out our new reputation system. Click on the Like button under the post!
      _______________________________________________

      There are 10 types of people in the world. Those that understand Binary and those that dont.

      Why are Halloween and Christmas the same? Because Oct 31 = Dec 25...

      The BEST Lookup function of all time

      Dynamic Named Ranges are your bestest friend

      _______________________________________________

      Comment


      • #4
        Re: Parsing HTML Table to Excel

        Thank you so much! I literally made an account to share my gratitude with this thread! You literally solved all my programming problems thanks for the frames as well! You sir are a LIFE SAVER!

        Comment


        • #5
          Nice. Going to check this out. Need a web scraping utility again, and looking for new methods other than what I've done all ready. Looks like many things different than what I was doing.

          One question is what if the web page has more than one table? Did have an issue sometime ago with the frame thing never was able to solve it at the time. Think I was trying to get to your solution just ...... never got that far ... Nice work thanks for sharing.
          Regards,
          Barry

          My Favorite New Thing:
          Dynamic Named Ranges



          The alternative for
          "Press Any Key To Continue."

          and we all have one we'd like to use it on

          1. Cross Posting Etiquette
          2. Are You Here To Learn: What Have You Tried?
          3. Your Best Resource to begin learning VBA: RECORD A MACRO . . . Then Study It!

          Comment


          • #6


            Ok ... Here's your code with a lil "iwrk...." messaging.

            Code:
            Sub ParseTable(Optional strURL As String, Optional intTableIndex As Integer)
            
            '   Local Variables
                Dim IE              As InternetExplorer
                Dim htmldoc         As MSHTML.IHTMLDocument             'Document object
                Dim eleColtr        As MSHTML.IHTMLElementCollection    'Element collection for tr tags
                    Dim eleColtd    As MSHTML.IHTMLElementCollection    'Element collection for td tags
                    Dim htmlTables  As MSHTML.IHTMLElementCollection    'Element collection for table tags
                Dim eleRow          As MSHTML.IHTMLElement              'Row elements
                    Dim eleCol      As MSHTML.IHTMLElement              'Column elements
                Dim wksOut          As Worksheet
                Dim rngOut          As Range
                Dim intRowIndex     As Integer
                    Dim intColIndex As Integer
            
            '   Open InternetExplorer
                Set IE = CreateObject("InternetExplorer.Application")
                IE.Visible = True
            
            '   Navigate to webpage
                IE.navigate strURL
                Do While IE.Busy Or IE.readyState <> 4: DoEvents: Loop
            
            '   Extract html information to objects
                Set htmldoc = IE.document
                Set htmlTables = htmldoc.getElementsByTagName("table")
                Set eleColtr = htmlTables(intTableIndex).getElementsByTagName("tr")
            
            '   Extract table to blank worksheet ( create worksheet if necessary )
                On Error Resume Next
                Set wksOut = ThisWorkbook.Worksheets("HTML Table Extract")
                If Err.Number <> 0 Then
                    Set wksOut = ThisWorkbook.Worksheets.Add(After:=Worksheets(Worksheets.Count))
                    wksOut.Name = "HTML Table Extract"
                End If
                With wksOut
                    .Cells.Clear
                    .Cells.NumberFormat = "@"
                    .Cells.ColumnWidth = 2
                End With
                On Error GoTo 0
            
            '   This section populates Excel
                intRowIndex = 0
                For Each eleRow In eleColtr
                    Set eleColtd = htmlTables(intTableIndex).getElementsByTagName("tr")(intRowIndex).getElementsByTagName("td") 'get all the td elements in that specific tr
                    Set rngOut = wksOut.Range("A1000000").End(xlUp).Offset(1, 0)
                    intColIndex = 0
                    For Each eleCol In eleColtd
                        rngOut.Offset(0, intColIndex) = eleCol.innerText
                        intColIndex = intColIndex + 1
                    Next eleCol
                    intRowIndex = intRowIndex + 1
                Next eleRow
            
                wksOut.Cells.EntireColumn.AutoFit
            
            '   Clear variables that "NEED" setting ( just something I do cause I don't know which var's need cleaning up )
                IE.Quit
                Set IE = Nothing
                Set htmldoc = Nothing
                Set htmlTables = Nothing
                Set eleColtr = Nothing
                Set eleColtd = Nothing
                Set wksOut = Nothing
                Set rngOut = Nothing
            
            End Sub
            Regards,
            Barry

            My Favorite New Thing:
            Dynamic Named Ranges



            The alternative for
            "Press Any Key To Continue."

            and we all have one we'd like to use it on

            1. Cross Posting Etiquette
            2. Are You Here To Learn: What Have You Tried?
            3. Your Best Resource to begin learning VBA: RECORD A MACRO . . . Then Study It!

            Comment

            Working...
            X