Reading Very Old Data

  • Hi,


    I have been able to extract data from an ancient system (mainframe and terminal emulation programs) in PDF that runs to over 2,000 pages and over 120,000 lines of data but I get stuck with an anomaly in the output and would really appreciate some help please:

    A set of data looks like this:
    Month: Oct 2015 Nov 2015 Dec 2015 JAN 2016 FEB 2016 MAR 2016
    INVOICES: 1230 4500 1000 2230 2345 3456
    CREDITS: 123 10% 4500-100% 100- 10% 22- 1% 23- 1% 346- 10%


    The VBA code I have written looks for spaces and every time it sees a space it treats the data as a single number and transfers the number to the worksheet in columns, increasing the row number each time.


    The data uses trailing negatives and the code converts these.

    On the CREDITS line there are spaces between the trailing negative and the percentage except in one case 4500-100% the code can’t translate these instances and they don’t always have trailing zeros (for some reason there canbe positive credits).


    I should be getting:
    123
    -4500
    -100
    -22
    -23
    -346

    Instead I am getting:
    123
    -100
    -22
    -23
    -346



    Thanks!

  • Re: Reading Very Old Data


    Hi,


    I have added a file with simplified version of the data fetch function and sample data.
    The data in RED is the data that doesn't get converted because there is no space between the value, the negative and the percentage.

    Files

  • Re: Reading Very Old Data


    Here is an example approach which works for part of your example. I don't know enough about the wider dataset to be able to know how the rest of the results should be presented. Run the second macro (x).

  • Re: Reading Very Old Data


    Thanks!


    That is quite complex code, I am sure that I will be able to adapt it to my needs but I would really appreciate some guidance on how the Regex2 code works (I have not come across the object "VBScript.RegExp" before.

  • Re: Reading Very Old Data


    I was afraid you'd ask that! Here is one tutorial http://analystcave.com/excel-regex-tutorial/ and here is another https://www.experts-exchange.c…s-and-Visual-Basic-6.html and there are many others if you google 'regular expressions'. It is not an easy thing to learn (in my view) - the difficult bit is coming up with the right patten for matching text. I've simplified above slightly since my original post.


    \d+ means look for one or more digits
    [-?|\s+] means the digits should be followed by either zero or one - signs OR one or more spaces (so don't match if followed by a %)
    |\d+$ means instead of all of above match one or more digits if they are at the end of the string ($)


    0 is matched because it is digit(s) followed by a space
    34151- is matched because it is digits followed by a -
    111% is not matched because digits are followed by %


    Most of the rest is looping through the stored results and multiplying by -1 if it ends with a -.


    I'm by no means an expert so there may be easier ways to do this.