Compare 2 Sheets with 1 million rows and 50 columns and highlights the difference with color

  • No idea if this will be any quicker than the above but you may get a speed bump. I figured it was worth a shot. Haven't tested it because I don't have workbooks with a million rows! :D


  • Hi revanth,


    I tested this code comparing two sheets of data each with 41 columns and 520,000 rows, it took 12.29 seconds to complete. It does not do everything you are after yet but if it you think it is in the ballpark of what you are after let me know.


    Currently it finds the first difference in a row then highlights the entire row and moves to the next row.


  • HI Trunten,


    I tried your code on the large data set and it returned an out of memory error, I was getting the same when loading the entire sheet into an array so I cut the array into pieces. when I cut it down to 90,000 rows per sheet it ran but I cancelled after about 20 seconds. The highlight is colouring numerous cells which are the same on both sheets ( I only included a couple of differences throughout, but thousands of cells are being highlighted).


    I obtained some large data sets from Here then copied and pasted them a few times to get into the ridiculous range.


    Regards

    Justin

  • Probably should have guessed that would be too big for an array!!

    i thought I had to compare rows that had the same id in column A (and then highlight the entire row if the id was not on the first sheet) so that may explain why you’re seeing so many “reds”. Could’ve got the wrong end of the stick though

  • Maybe, I am not entirely sure what he is after yet. I was just interested in how long it would take for an array solution to work through that many cells... I think 12 seconds is not too bad, though it may increase significantly depending on the number of differences and the additional requirements.

  • Made a few tweaks to mine to try and fit in memory and up the speed a little. It still took 35 seconds but I created a spreadsheet with 2 sheets each with 50 columns and 500,000 rows which took over a minute just to open on my machine so I'm pretty happy with that :D


  • What version of excel are you running, and how old is the computer system? Maybe if it is a 32 bit version of excel? That is a big jump in time for a not particularly large jump in size (about a 25% increase considering you had 9 extra columns). so you would expect it to be closer to 20 seconds. Anyway, that is the trouble with humungous data sets. You could try your hand at R.