Ozgrid Excel Help & Best Practices Forums

Excel Training / Excel Dashboards Reports

# Thread: Normalize & Standardize

1. I agreed to these rules
Join Date
9th July 2006
Posts
18

## Normalize & Standardize

How to normalize a simple array (1 column) of 1000+ cells

I tried:

=STANDARDIZE(A2,A3,A4)

A2 the array that I want to normalize (serie of values)
A3 the Arithm. Mean (Average) for the target array (the correct array, same data type)
A4 STDEVP for the target array (the correct array).

But not sure that this formula is for normalizing (shift data set till it fits the histogram bell shape of the target data set)

I was able to do the normalization by hand using Histogram in Excel Statistics and then (linear) shifting my data (adding a constant) till I had a good fit when overlaying the two histograms, but I like to do it with a formula.

Below a sample of the data: the first column is my model (target) the second the data to be normalized and the third column after I subtracted 10 to the second column so it fitted in the histogram of the first column (cannot attach the Excel file as it is too large)

102.383 115.18 105.18
102.258 128.56 118.56
101.427 140.58 130.58
106.019 119.7 109.7
105.065 123.18 113.18
105.396 128.27 118.27
103.949 130.46 120.46
104.874 138.82 128.82
104.249 121.39 111.39
105.333 126.47 116.47
101.205 128.47 118.47
102.703 125.53 115.53
104.45 126.81 116.81
108.036 132.21 122.21
107.87 116.57 106.57
101.882 119.16 109.16
100.955 123.27 113.27
101.454 122.23 112.23
104.486 121.46 111.46
107.608 122.48 112.48
104.489 112.41 102.41
111.009 104.62 94.62
111.839 116.94 106.94
112.713 126.64 116.64
103.57 128.75 118.75
103.57 126.07 116.07
104.693 99.15 89.15
108.814 126.24 116.24
112.006 124.5 114.5
112.297 120.46 110.46
113.92 120.91 110.91
108.477 136.96 126.96
Last edited by witteman; November 21st, 2006 at 20:52.

Excel Video Tutorials / Excel Dashboards Reports

2. ## Re: Normalize

Well the standardize formula simply subtracts the second entry from the first and then divides the difference by the third number. Whether a histogram of the transformed data will match some other histogram depends on whether the actual distibution of the two data sets comes from the same family (often assumed to follow the Normal or Gaussian distribution). That's why the second entry to the Standarize function is called the mean, and the third the standard deviation. These would be calculated first from your set of data. All that said, I'm not sure I know what question you are really asking. Does this help any? if not, can you be more explicit in what you want?

3. I agreed to these rules
Join Date
9th July 2006
Posts
18

## Re: Normalize

I have two sets of data (well logs from two oil wells) the first is a new well that I want to match ('correct') with the second set (target data set). The two wells are on top of each other (very close so the same family, same interval, a Gamma Ray log). When I make histograms of the two data sets you can see a shift in the mean (about -10) this is the second column on the example, when I subtract -10 from the second column (that is the 3rd column) I get a perfect match with the first column (histogram). How to do this with a statistics formula? I can show you the two histograms and complete sets of data that I made but the spreadsheet is 300k and won't attach.

Frank
Last edited by witteman; November 22nd, 2006 at 07:43.

Excel Video Tutorials / Excel Dashboards Reports

4. ## Re: Normalize

What happens when you standarize both sets of data using the mean and standard deviation of each set respectively. Do the histograms of the resulting data sets then appear close to each other? Alternatively calculate the difference of the two means. Is it close to the 10 you observed by trial and error? If not, calculater the difference in the medians of the two sets. Is that value close to the 10? You can calculate all of the means, standard deviations, and medians with Excel functions.

5. I agreed to these rules
Join Date
9th July 2006
Posts
18

## Re: Normalize

The median difference gives the best (11) result so I guess that is better than Mean (8). When I shift my data set by 10.5, I get a perfect histogarm overlay. How to correct my data set for the difference in standard deviation or standard error?

Also ran some regressions (linear trendlines) on the two data sets and found that the intercepts=mean (difference=8) but the slope (x) is almost 5 times higher in the set that I want to correct so that may be the way to do it with slope and intercept y=bx+c

Stupid question: How to correct my data set with the slope and intercept of the correct data set?

I gave up on the Standardize method as that seems to filter the data set and I don't want to filter just correct the trend and scale the data.

Thanks

Excel Video Tutorials / Excel Dashboards Reports

6. ## Re: Normalize

Dividing a data set by its standard deviation is the usual way to scale the data.

I don't understand your concern about the Standardize function. Its only purpose is to adjust a data set (usually for comparison with another data set) to a common mean of zero and standard deviation of 1.

As for histograms, their shape is often highly dependent on the number and width of the bins.

7. I agreed to these rules
Join Date
9th July 2006
Posts
18

## Re: Normalize

Thanks Peter,

On Standardize: I don't want to compare or correlate the the two data sets (qualitatively) as I already know that they represent identical layers of rock. I want to correct one curve so that the statistics are closer together so I can use the corrected data quantitatively: I have one (type) dataset that I don't touch and correct the other data set as I know that there are errors in that set.

Excel Video Tutorials / Excel Dashboards Reports

8. ## Re: Normalize

Withoiut seeing the data, but understanding you want to leave the reference set alone, I suggest you first multiply all members of the second set by the ratio of the standard deviation of the first set divided by the standard deviation of the second set (which will make the second set have the same standard deviation as the first set), and then add the diference in medians to the second set (to make the resulting medians identical). Whether this will in fact correct the errors in the second set will depend on the source of the errors.

9. I agreed to these rules
Join Date
9th July 2006
Posts
18

## Re: Normalize

OK, I will try that, the errors are a drift in the baseline and statistical errors as the tool that produced the data is a Geiger counter measuring natural radioactivity of rocks. I will let you know the results.

Thanks

Frank

Excel Video Tutorials / Excel Dashboards Reports

10. I agreed to these rules
Join Date
9th July 2006
Posts
18

## Re: Normalize & Standardize

Hi Peter,

The method works, I have attached an overlay of the old histogram of my data set (subtracted -10 from the bad data set) in black and your method in red. You can see that the red is sleeker with less scatter in the lower range.

Now the problem of a baseline drift, how do I correct the slope of a trendline in the form y=bx+c I want to bring down 'b' of the bad data set so it matches the 'b' (slope) of the good data trendline?

Frank

Excel Video Tutorials / Excel Dashboards Reports

##### Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•
porno