HTML Web Scraping Using VBA

  • HTML Web Scraping Using VBA



    Hi,

    I am new to HTML web scraping, so let me quickly summarize what I'm trying to do before I get to my problem. I need to perform the following steps on a website called marketingscents.com using VBA:
    1. Load the webpage
    2. Click the login button
    3. Enter username and password
    4. Click login button
    5. Access needed information

    I have been able to perform the first 2 steps using the following code:
    Code:
    'navigate to marketing scents
        Dim IE As New SHDocVw.InternetExplorer
        Dim HTMLDoc As MSHTML.HTMLDocument
        Dim HTMLAs As MSHTML.IHTMLElementCollection
        Dim HTMLA As MSHTML.IHTMLElement
        Dim HTMLInputs As MSHTML.IHTMLElementCollection
        Dim HTMLInput As MSHTML.IHTMLElement
    
        IE.Visible = True
        IE.navigate "http://marketingscents.com/"
       
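        'wait for the page to finish loading; DoEvents keeps the host application responsive
        Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop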
       
        'navigate to login page
        Set HTMLDoc = IE.document
       
        Set HTMLAs = HTMLDoc.getElementsByTagName("a")
       
        For Each HTMLA In HTMLAs
            'Debug.Print HTMLA.getAttribute("classname"), HTMLA.getAttribute("href")
            If HTMLA.getAttribute("classname") = "header__button button button--secondary button--large button--login" And HTMLA.getAttribute("href") = "https://www.marketingscents.com/index?page=login-new" Then
                HTMLA.Click
                Exit For
            End If
        Next HTMLA
    Where I am getting stuck is on Step 3. I have combed through the HTML provided below and determined that the username and password fields that I need are input boxes.

    HTML Code:
    <html lang="en"><head><base href="https://www.marketingscents.com/usercontent/65307188/homepage/">
    
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <title>Login - Marketing Scents</title>
        <link href="/usercontent/65307188/homepage/favicon.ico" rel="shortcut icon" type="image/x-icon">
        <link href="http://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,400,300,600,700" rel="stylesheet" type="text/css">
        <link href="media-login/css/bootstrap.min.css" rel="stylesheet">
        <link href="media-login/css/style.css" rel="stylesheet">
       
            <!-- GA Tracking Code -->
       
        <script src="https://connect.facebook.net/en_US/fbevents.js" async=""></script><script src="https://www.google-analytics.com/analytics.js" async=""></script><script>
      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
      })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
    
      ga('create', 'UA-91740248-1', 'auto');
      ga('send', 'pageview');
    
    </script>
       
        <!-- End GA Code -->
       
          <!-- Facebook Pixel Code -->
    <script>
      !function(f,b,e,v,n,t,s)
      {if(f.fbq)return;n=f.fbq=function(){n.callMethod?
      n.callMethod.apply(n,arguments):n.queue.push(arguments)};
      if(!f._fbq)f._fbq=n;n.push=n;n.loaded=!0;n.version='2.0';
      n.queue=[];t=b.createElement(e);t.async=!0;
      t.src=v;s=b.getElementsByTagName(e)[0];
      s.parentNode.insertBefore(t,s)}(window, document,'script',
      'https://connect.facebook.net/en_US/fbevents.js');
      fbq('init', '1720644064910435');
      fbq('track', 'PageView');
    </script>
    <noscript>&lt;img height="1" width="1" style="display:none"
      src="https://www.facebook.com/tr?id=1720644064910435&amp;ev=PageView&amp;noscript=1"
    /&gt;</noscript>
    <!-- End Facebook Pixel Code -->
       
       
      </head>
      <body>
        <div class="alert alert-header alert-danger" id="js-page-message">
          <div class="container">Something went wrong. Please try again, or <a href="javascript:openBrWindow('/index/lookup','_blank','height=420,width=350,status=no,menubar=no,location=no,scrollbars=yes');">reset your password</a>.</div>
        </div>
        <div class="container">
          <div class="panel">
            <div class="panel-header">
              <img class="logo" alt="" src="/usercontent/65307188/homepage/media-new/img/logo-color.png">
            </div>
              <form id="js-login-form" role="form" action="//www.mscbackoffice.com/index/login" method="post" data-csrf-added="1">
    <input name="CSRFToken" type="hidden" value="ko733ujmr1v5rdee599u4d756gbp4oj4o3g8b0ai9sah2oi1v6dhd8bg8ubv1vbp-q2DKtcs2WwFCs5kfYCTTOYi+pDQVIjumO0uEL0+thTc=">
    
      <input name="login_page" type="hidden" value="login-new">
              <div class="form-group">
                <label for="form-email">Member ID</label>
                <input name="username" class="form-control input-lg" id="form-email" type="text" placeholder="Enter Member ID">
              </div>
              <div class="form-group" style="margin-bottom: 10px;">
                <label for="form-password">Password</label>
                <input name="password" class="form-control input-lg" id="form-password" type="password" placeholder="Enter Password" value="">
              </div>
              <div class="form-group remember-me">
                       <a class="pull-right" href="javascript:openBrWindow('/index/lookup','_blank','height=420,width=350,status=no,menubar=no,location=no,scrollbars=yes');">Forgot Password?</a>
                <input name="remember_me" id="remember_me" type="checkbox" value="1">
                  <label for="remember_me">Remember Me</label>
              </div>
              <button class="btn btn-lg btn-primary btn-block" id="js-login-btn" type="submit" value="Login">Log in</button>
            </form>
          </div>
        </div>
     
     
      <script src="media-login/js/jquery-1.11.1.min.js"></script>
      <script>
       
      function openBrWindow(theURL,winName,features) {
        window.open(theURL,winName,features);
      }
     
      </script>
    </body><!-- Minify and concatenate when dev is done --></html>
    However, I can't even get my VBA to print the names of all the input fields on the page using Debug.Print. I am using the following code.

    Code:
    'Prompt user to enter in their login info and login to website
        Set HTMLInputs = HTMLDoc.getElementsByTagName("input")
        For Each HTMLInput In HTMLInputs
            Debug.Print HTMLForm.getAttribute("name")
        Next HTMLInput
    Can anyone tell me what I am doing wrong?
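
    Judging only from the snippets above, two things look likely to cause this. The loop variable is HTMLInput, but the Debug.Print line reads HTMLForm, which does not appear in the declarations shown. Also, HTMLDoc was set before HTMLA.Click, so it may still refer to the home page rather than the login page. The following is a minimal sketch, not a confirmed fix; it assumes references to "Microsoft Internet Controls" and "Microsoft HTML Object Library" are set, and the Sub name PrintInputNames is hypothetical.

```vba
Option Explicit

'Hypothetical helper: prints the name attribute of every <input> on the
'page currently loaded in the given Internet Explorer window.
Sub PrintInputNames(ByVal IE As SHDocVw.InternetExplorer)
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLInputs As MSHTML.IHTMLElementCollection
    Dim HTMLInput As MSHTML.IHTMLElement

    'Wait for the page that the click navigated to (assumed: the login page)
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    'Re-read the document so it refers to the page that is loaded now
    Set HTMLDoc = IE.document
    Set HTMLInputs = HTMLDoc.getElementsByTagName("input")

    For Each HTMLInput In HTMLInputs
        'Read the attribute from the loop variable itself
        Debug.Print HTMLInput.getAttribute("name")
    Next HTMLInput
End Sub
```

    Calling PrintInputNames IE right after the For Each loop that clicks the login link should list the field names in the Immediate window, provided the login page has finished loading in the same IE window.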

  • #2


    OK, so I've figured out how to get the username and password into the correct input boxes on the site using the following code. The problem I'm experiencing now is that the code works when I step through it using F8, and the fields are populated with the right info, but when I run the code without stepping, I get an "Object variable or With block variable not set" error. Can anyone tell me why I'm getting this error?

    Code:
     
    'Prompt user to enter in their login info and login to website
    Set HTMLUser = HTMLDoc.getElementById("form-email")
    HTMLUser.Value = "xxxxxxxxx"
    Set HTMLPWord = HTMLDoc.getElementById("form-password")
    HTMLPWord.Value = "**********"
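
    Code that works under F8 but fails at full speed usually points to a timing problem. One plausible explanation is that the login page is still loading when getElementById runs, so it returns Nothing and the next .Value line raises the error. The sketch below polls for the element with a timeout instead of assuming the page is ready. WaitForElement, FillLogin, and the 10-second default are hypothetical names and values chosen for illustration; the element ids come from the HTML posted above.

```vba
Option Explicit

'Hypothetical helper: polls the current page until an element with the
'given id exists, or until timeoutSeconds have passed. Returns Nothing on timeout.
Function WaitForElement(ByVal IE As SHDocVw.InternetExplorer, _
                        ByVal elementId As String, _
                        Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim deadline As Date
    Dim doc As MSHTML.HTMLDocument
    Dim found As MSHTML.IHTMLElement

    'Dates are stored in days, so convert seconds to a fraction of a day
    deadline = Now + timeoutSeconds / 86400#

    Do
        DoEvents
        If Not IE.Busy And IE.readyState = READYSTATE_COMPLETE Then
            'Re-read the document on every pass; it changes after navigation
            Set doc = IE.document
            Set found = doc.getElementById(elementId)
        End If
    Loop While found Is Nothing And Now < deadline

    Set WaitForElement = found
End Function

'Usage sketch: fill in the login form once both fields are available.
Sub FillLogin(ByVal IE As SHDocVw.InternetExplorer)
    Dim HTMLUser As Object
    Dim HTMLPWord As Object

    Set HTMLUser = WaitForElement(IE, "form-email")
    Set HTMLPWord = WaitForElement(IE, "form-password")

    If HTMLUser Is Nothing Or HTMLPWord Is Nothing Then
        MsgBox "Login fields not found before the timeout."
        Exit Sub
    End If

    HTMLUser.Value = "xxxxxxxxx"
    HTMLPWord.Value = "**********"
End Sub
```

    The fields are held in Object variables because the early-bound IHTMLElement interface does not expose a Value property; late binding resolves it on the underlying input element at run time.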
