Announcement

Collapse
No announcement yet.

Normalize & Standardize

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Normalize & Standardize

    How to normalize a simple array (1 column) of 1000+ cells

    I tried:

    =STANDARDIZE(A2,A3,A4)

    A2 the array that I want to normalize (serie of values)
    A3 the Arithm. Mean (Average) for the target array (the correct array, same data type)
    A4 STDEVP for the target array (the correct array).

    But not sure that this formula is for normalizing (shift data set till it fits the histogram bell shape of the target data set)

    I was able to do the normalization by hand using Histogram in Excel Statistics and then (linear) shifting my data (adding a constant) till I had a good fit when overlaying the two histograms, but I like to do it with a formula.

    Below a sample of the data: the first column is my model (target) the second the data to be normalized and the third column after I subtracted 10 to the second column so it fitted in the histogram of the first column (cannot attach the Excel file as it is too large)

    102.383 115.18 105.18
    102.258 128.56 118.56
    101.427 140.58 130.58
    106.019 119.7 109.7
    105.065 123.18 113.18
    105.396 128.27 118.27
    103.949 130.46 120.46
    104.874 138.82 128.82
    104.249 121.39 111.39
    105.333 126.47 116.47
    101.205 128.47 118.47
    102.703 125.53 115.53
    104.45 126.81 116.81
    108.036 132.21 122.21
    107.87 116.57 106.57
    101.882 119.16 109.16
    100.955 123.27 113.27
    101.454 122.23 112.23
    104.486 121.46 111.46
    107.608 122.48 112.48
    104.489 112.41 102.41
    111.009 104.62 94.62
    111.839 116.94 106.94
    112.713 126.64 116.64
    103.57 128.75 118.75
    103.57 126.07 116.07
    104.693 99.15 89.15
    108.814 126.24 116.24
    112.006 124.5 114.5
    112.297 120.46 110.46
    113.92 120.91 110.91
    108.477 136.96 126.96
    Last edited by witteman; November 21st, 2006, 20:52.

  • #2
    Re: Normalize

    Well the standardize formula simply subtracts the second entry from the first and then divides the difference by the third number. Whether a histogram of the transformed data will match some other histogram depends on whether the actual distibution of the two data sets comes from the same family (often assumed to follow the Normal or Gaussian distribution). That's why the second entry to the Standarize function is called the mean, and the third the standard deviation. These would be calculated first from your set of data. All that said, I'm not sure I know what question you are really asking. Does this help any? if not, can you be more explicit in what you want?
    Please do not U2U (private message) me directly for questions that should be posted to the forum; any such U2Us will be ignored.

    Comment


    • #3
      Re: Normalize

      I have two sets of data (well logs from two oil wells) the first is a new well that I want to match ('correct') with the second set (target data set). The two wells are on top of each other (very close so the same family, same interval, a Gamma Ray log). When I make histograms of the two data sets you can see a shift in the mean (about -10) this is the second column on the example, when I subtract -10 from the second column (that is the 3rd column) I get a perfect match with the first column (histogram). How to do this with a statistics formula? I can show you the two histograms and complete sets of data that I made but the spreadsheet is 300k and won't attach.

      Appreciate your help

      Frank
      Last edited by witteman; November 22nd, 2006, 07:43.

      Comment


      • #4
        Re: Normalize

        What happens when you standarize both sets of data using the mean and standard deviation of each set respectively. Do the histograms of the resulting data sets then appear close to each other? Alternatively calculate the difference of the two means. Is it close to the 10 you observed by trial and error? If not, calculater the difference in the medians of the two sets. Is that value close to the 10? You can calculate all of the means, standard deviations, and medians with Excel functions.
        Please do not U2U (private message) me directly for questions that should be posted to the forum; any such U2Us will be ignored.

        Comment


        • #5
          Re: Normalize

          The median difference gives the best (11) result so I guess that is better than Mean (8). When I shift my data set by 10.5, I get a perfect histogarm overlay. How to correct my data set for the difference in standard deviation or standard error?

          Also ran some regressions (linear trendlines) on the two data sets and found that the intercepts=mean (difference=8) but the slope (x) is almost 5 times higher in the set that I want to correct so that may be the way to do it with slope and intercept y=bx+c

          Stupid question: How to correct my data set with the slope and intercept of the correct data set?

          I gave up on the Standardize method as that seems to filter the data set and I don't want to filter just correct the trend and scale the data.

          Thanks

          Comment


          • #6
            Re: Normalize

            Dividing a data set by its standard deviation is the usual way to scale the data.

            I don't understand your concern about the Standardize function. Its only purpose is to adjust a data set (usually for comparison with another data set) to a common mean of zero and standard deviation of 1.

            As for histograms, their shape is often highly dependent on the number and width of the bins.
            Please do not U2U (private message) me directly for questions that should be posted to the forum; any such U2Us will be ignored.

            Comment


            • #7
              Re: Normalize

              Thanks Peter,

              On Standardize: I don't want to compare or correlate the the two data sets (qualitatively) as I already know that they represent identical layers of rock. I want to correct one curve so that the statistics are closer together so I can use the corrected data quantitatively: I have one (type) dataset that I don't touch and correct the other data set as I know that there are errors in that set.

              Comment


              • #8
                Re: Normalize

                Withoiut seeing the data, but understanding you want to leave the reference set alone, I suggest you first multiply all members of the second set by the ratio of the standard deviation of the first set divided by the standard deviation of the second set (which will make the second set have the same standard deviation as the first set), and then add the diference in medians to the second set (to make the resulting medians identical). Whether this will in fact correct the errors in the second set will depend on the source of the errors.
                Please do not U2U (private message) me directly for questions that should be posted to the forum; any such U2Us will be ignored.

                Comment


                • #9
                  Re: Normalize

                  OK, I will try that, the errors are a drift in the baseline and statistical errors as the tool that produced the data is a Geiger counter measuring natural radioactivity of rocks. I will let you know the results.

                  Thanks

                  Frank

                  Comment


                  • #10
                    Re: Normalize & Standardize

                    Hi Peter,

                    The method works, I have attached an overlay of the old histogram of my data set (subtracted -10 from the bad data set) in black and your method in red. You can see that the red is sleeker with less scatter in the lower range.

                    Now the problem of a baseline drift, how do I correct the slope of a trendline in the form y=bx+c I want to bring down 'b' of the bad data set so it matches the 'b' (slope) of the good data trendline?

                    Frank
                    Attached Files

                    Comment


                    • #11
                      Re: Normalize & Standardize

                      ....and this is the overlay of your method in green with the good data set (in red). very good match indeed!
                      Attached Files

                      Comment


                      • #12
                        Re: Normalize & Standardize

                        What serves as the x values (I'm assuming your y values are the data we've been talking about)? It may be that the transformation just made with the standard deviations will already have made the adjustment to the slope coefficient. Give it a try and see. (By the way, who is Peter?)
                        Please do not U2U (private message) me directly for questions that should be posted to the forum; any such U2Us will be ignored.

                        Comment


                        • #13
                          Re: Normalize & Standardize

                          Derk, not Peter, sorry I was talking to an MSOffice support group on the same subject (but got nowhere) and that was a Peter, I believe ....

                          With the histograms I don't have an x but when I do regression I use row numbers for x, I make sure that the number of rows are the same. In the real world x = depth in meters although we geologists call depth y as logs are vertically oriented (normally) where y axis is depth and x the data.

                          The drift is still there, less than with my method but still double the slope of the good data, I will attach the to trendlines so you can see.

                          Cheers Derk

                          Frank

                          Comment


                          • #14
                            Re: Normalize & Standardize

                            Derk, the slope is less than double but still there, # samples (x axis) identical.
                            (see plots attached)

                            Frank
                            Attached Files

                            Comment


                            • #15
                              Re: Normalize & Standardize

                              Derk, if you use polygones instead of bars for the histograms the results are even more impressive (see attachment). The red line is the target set and in black (overlay same scales) bad data, normalized with your method, a duplicate!

                              All that is left is the baseline drift in the bad data that is still bothering me a bit.....
                              Attached Files

                              Comment

                              Working...
                              X