[SOLVED] - Excel VBA code reads wrong innerHTML code





    Hi Everyone!
    I am completely new to web scraping and have only some basic VBA knowledge. I am trying to build a scraper that loads a website, runs a search on it, and then scrapes the search results. The scraper runs the search with the given parameters, but once the new page (with the search results) has loaded, I read innerHTML from VBA and the result is NOT the source code of the new page. I know this because I checked the source code manually in Internet Explorer, and it does not match the innerHTML that my VBA code writes to cell B16.

    Why is that happening? What is the source code that my VBA extracts?
    Thank you very much for your help in advance!

    Please see my short and commented code below:

    Code:
    Sub test()
        Dim eRow As Long
        Dim ele As Object
        Dim sht As Object
        Dim RowCount As Long
        Dim objIE As Object
        Dim html As Object
        Dim my_classes As Object
        Dim my_class As Object
        Dim i As Long

        Application.StatusBar = "Started"

        Set sht = Sheets(2)
        RowCount = 1

        'initialize IE object
        Set objIE = CreateObject("InternetExplorer.Application")

        'navigate to the desired website
        With objIE
            .Visible = True
            .Navigate "http://www.profession.hu/"

            'wait until the website is fully loaded
            Do While .Busy Or .ReadyState <> 4
                DoEvents
            Loop

            'print innerHTML (source code) of the website
            Set html = .Document
            Range("A16") = html.DocumentElement.innerHTML

            'insert the search criteria in the input fields of the website
            .Document.getElementById("header_keyword").Value = "mérnök"
            .Document.getElementById("header_location").Value = "budapest"

            'click the search button on the website
            Set my_classes = .Document.getElementsByClassName("p2_button_inner")
            For Each my_class In my_classes
                If my_class.getAttribute("value") = "Keresés" Then
                    Range("c4") = "Clicked"
                    my_class.Click
                    i = i + 1
                End If
            Next my_class

            'wait until the new page with the search results is loaded
            Do While .Busy Or .ReadyState <> 4
                DoEvents
            Loop
            Application.StatusBar = "Loaded new site"

            'print innerHTML (source code) of the 'new' website
            Set html = .Document
            Range("B16") = html.DocumentElement.innerHTML
        End With
        Set objIE = Nothing
    End Sub
    So basically my problem is the following:
    1. The scraper goes to profession.hu, reads the innerHTML of the page, and writes it to cell A16. (I checked the result and this works properly, so no problem here.)
    2. It then fills in two input fields on the page and runs the search. (After the search, a new page is displayed with the search results.)
    3. After the new page has fully loaded, the scraper reads the innerHTML (source code) of the new page again and writes it to cell B16. (Here is my problem: this innerHTML is not correct; I checked it manually in IE.)

    Thank you for your help in advance!
    Last edited by hunsnowboarder; August 17th, 2015, 23:55. Reason: solved!

  • #2
    Re: Excel VBA code reads wrong innerHTML code

    The maximum text length in a cell is 32,767 characters, so Excel truncates the innerHTML string, which makes the two pages look the same. Use innerText instead and you'll see that the pages are different, and that the second page is indeed the results page.
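
    If the full innerHTML is needed rather than innerText, one possible workaround is to write the string to a text file instead of a cell. The following is only a minimal sketch: SaveTextToFile, DemoSaveLongText, and the output path are hypothetical names that do not appear in the original code, and a placeholder string stands in for the real innerHTML.

```vba
' Hypothetical helper (not part of the original thread): writes a string of any
' length to a text file, so it is not limited by the 32,767-character cell limit.
' Note: Print # writes in the system ANSI code page, so characters outside that
' code page may not be preserved.
Sub SaveTextToFile(ByVal content As String, ByVal filePath As String)
    Dim fileNum As Integer
    fileNum = FreeFile                      ' next free file number
    Open filePath For Output As #fileNum    ' creates or overwrites the file
    Print #fileNum, content
    Close #fileNum
End Sub

' Example use with a placeholder string standing in for innerHTML.
Sub DemoSaveLongText()
    Dim longText As String
    Dim outPath As String

    longText = String$(40000, "x")                  ' longer than one cell can hold
    outPath = Environ$("TEMP") & "\page_dump.txt"   ' assumed output location

    Debug.Print "Characters in memory: " & Len(longText)
    SaveTextToFile longText, outPath
    Debug.Print "Bytes written to file: " & FileLen(outPath)
End Sub
```

    In the code from the first post, the call would take the place of the Range("B16") line, for example: SaveTextToFile html.DocumentElement.innerHTML, outPath. Comparing Len() of the two innerHTML strings in the Immediate window is also a quick way to confirm that the pages differ.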



    • #3


      Re: Excel VBA code reads wrong innerHTML code

      Ahh... Thank you so much! Thanks!
