DECISION MAKING AND EXTRACTING DATA

LESSON WORKBOOK:  Level 2 Lesson 4 2007.xlsx

This lesson will be used to look at the various methods available within Excel to extract required information from databases and tables. There are many ways in which this can be done, but some ways are far better than others. By far the most important factor in this is how the information is laid out in the first place. By following these six rules (also mentioned in Lesson 1), life will be much easier come time to extract the information.

The Functions that we will look at in this lesson are best suited for extracting information.  These are the Lookup & Reference Functions, the Database Functions and Array Formulas.

However, before we start, we must stress that we will only use array formulas as a very last resort.  Our reason for this is that over the years we have been helping others use Excel, we have consistently found that when users first learn how to use and write array formulas, they tend to fall back on them all too often. The major drawback to this is that you will end up with a workbook that becomes extremely slow in re-calculating. You must remember also, that Excel by default re-calculates when you open, save or enter data into a cell that a formula is dependent on.   All too often we have seen workbooks that have become virtually unusable due to the amount of array formulas.

The other subject we will look at is the means and methods of deciding which formula to use in which situation. As Excel has tons of Functions available, this can sometimes seem a bit daunting. Keep in mind that although Excel is very rich in Functions, you will find that most likely you will be using the same set of Functions over and over again. You must always keep in mind that Excel was designed for all people from all walks of life and businesses to use, by this I mean it contains many Functions that would be of no use at all to some people, but be the "be all and end all" to others. A couple of brief examples of this would be some of Excel's engineering and financial Functions, unless you were in that field of expertise they would be of no use at all.

We will look at the means and methods of deciding which formula to use in which situation first and then we will move on to ways and means of extracting information from tables and databases.

Decision Making

One of the many problems that can crop up when using Excel is trying to figure out which function to use in which situation. There are certainly no hard and fast rules to this as more often than not you will need to use a nested formula that will incorporate many types of Functions.  Whichever way you go do not fall into the trap of trying to make your formula as short as possible. We constantly see 'so called' experienced users replacing False with 0 (zero), True with 1 and formulas like: =IF(A1,,).  A well constructed formula is one that can be read with relative ease by someone else and even more importantly by yourself when you come back to it later on. In case you are wondering, in Excel True has a value of 1 or greater and False a value of 0. In other words you could replace the True or False argument in a function with 1 or greater or 0, eg =ADDRESS(1,1,2,2,"Sheet1") is the same as =ADDRESS(1,1,2,TRUE,"Sheet1").  Having said this, as far as we are concerned this is confusing, as it's very hard to tell what is the True part and what is the numeric part!

Anyway enough of that. Before you construct your formula you should ask yourself these five questions.

  1. Is the data text or numeric?

  2. Do you want text or a number as your result?

  3. Do you need to locate your data first?

  4. Does the result depend on the data type or value?

  5. Are you dealing with dates and/or times?

If your data is Numeric then you can possibly rule out the need for any of the Text functions.

If your data is Text and you want a Numeric result you will very likely need one of the Text functions.

If you need to locate your data first you will very likely need one of the Lookup & Reference functions.

If your result depends on the data type or value you will need one of the Logical functions.

If you are dealing with dates and/or times then you may need one of the Date & Time functions

Excels Insert Function (Shift+F3) is always a good place to start.  Under the heading Or Select a Category are the Function categories.  If you know the category you want, you can select from this list, or if you do not know which Function Category you want, once you have the Insert Function dialog on screen, click in the Search for a  Function  box and type in some key words of the result you want. Don't be too vague though and don't be to detailed. If you were wanting to find some data that is within some more data and you just type "Find" you will end up with a very long list. The same problem would arise (to a lesser degree ) if you typed "Find some data in a list of data". But simply type "Find data" and you will get a reasonable list.

Let's look at the ones you are most likely to need and when to use them.

Date & Time - A Case Study

We will use a real problem and work through it to see how we might apply the "Decision making" process. This is only one example, but I believe it will demonstrate how it can be applied.

If your data contains a period of time you should consider using one of the Date & Time functions. This is not always completely necessary because, if you only need to find the difference between two dates you can simply subtract one date from the other.

Suppose you have the date 22/12/2011 in cell A1 and 15/4/2011 in cell A2 you could use: =A1-B1 and format your result cell as General to find out the number of days between the two dates. But what if we need to find out the months between the two dates? We would use: =MONTH(A1)-MONTH(A2). Of course this is not a problem because the years in both dates are the same, but what if they are not! This is where we ask ourselves the 5 questions:

  1. Is the data text or numeric? Answer= numeric!

  2. Do you want text or a number as your result? Answer= number!

  3. Do you need to locate your data first? Answer= no!

  4. Does the result depend on the data type or value? Answer= no!

  5. Are you dealing with dates and/or times? Answer= yes!

We need to take into account the year and month portion of our dates. From this it is most likely a Date & Time function is called.  We are being totally honest here when we say we have never had to do this! You will see why soon. Let's work through this problem step by step. We will assume we have not used: : =MONTH(A1)-MONTH(A2) yet. Make sure the dates: 22/12/2011 is cell A1 and 15/4/2011 in cell A2

Before we went through this step by step we stated that we were "being totally honest here when we say we have never had to do this!" The reason we have never had to do this before is because there is a little known about Excel function that can do this for us! It's called DATEDIF, you will not find much documentation on this function as it is only supplied for compatibility with Lotus 1-2-3. The DATEDIF calculates the number of days, months, or years between two dates. It's syntax is: DATEDIF(start_date,end_date,unit).  Unit is the number of what you want returned.

Y = The number of complete years in the period

M = The number of complete months in the period.

D = The number of days in the period.

MD = The difference between the days in start_date and end_date. The months and years of the dates are ignored.

YM = The difference between the months in start_date and end_date. The days and years of the dates are ignored.

YD = The difference between the days of start_date and end_date. The years of the dates are ignored.

So if we use the formula =DATEDIF(A2,A1,"M") on our dates to check if the formula we wrote is correct we get: 32 also.

Although this is only one situation out of a possible infinite number, it should give you a good idea of a logical process that could be applied to many other situations.

Extracting Data

Extracting data from tables and databases is something that is done quite frequently from within Excel.  We haven't seen too many spreadsheets where extracting data is not needed. The information below is meant as a generic guide to extracting data from a database or table. The one thing you should always look for when faced with the problem of how to extract information is "what won't change!" In other words you must establish a constant from all the variables. Lets look at some ways this can be done!

In the Lookup & Reference example below we will work through it with you to again try and demonstrate the possible thought process we would use to arrive at a solution. This will end up being a relatively deeply nested formula and will include a number of different functions.  We are using this as an example because it will end up being a very generic solution.

Lookup and Reference

Quite possibly the most common of these type of Functions is the VLOOKUP. This because it will look within the first column of a nominated table array for a nominated value and return the value from the same row in a nominated column. We looked at the VLOOKUP in the lesson on "Specific Worksheet Formulas" so we will move on to a more difficult situation.

When we need to extract some data from within a table or database using a Lookup & Reference function there is really only one thing we need to do and that is, find it's Row and/or Column number. Once we have this we can extract the data from the table or database. Remember that Excel uses a grid method to locate data, a very simple but effective method.  If it works for Excel - it will work for us!

Let's say we have a huge table of data that consists of 40 columns and 5000 rows. We need to find 6 single cells within a table of 200,000 cells (40*5000). We have of course set up our spreadsheet by the rule book. Our table has been named AllInfo in other words AllInfo refers to the range A1:T5000. but the data we need to extract poses the following problems.

The data wanted is the names of the 3 largest and the 3 smallest tax payers and these change daily.

All we know for certain is that the column heading Tax does not change, but it's positions can! The names are always in the column immediately before the Tax column.

So in a nutshell, what we need to do is, find the 3 largest and the 3 smallest tax figures in the Tax column and then return the name immediately to the left.

The very first thing we are going to do is give the top row (headings) a named range. We'll call it TpRwAllData and it will refer to A1:T1.

The VLOOKUP requires that we know the column position and we have been told that "The column number can and does vary". It also requires we look for a known value in the first column of our table and we have no idea what that might be.  So VLOOKUP can virtually be ruled out!

Let us use some positive information and work from there. The biggest help is the column heading Tax.  If we can find that and get a reference to it we should then also be able to get a reference to the column immediately before it. 

Click through the Lookup & Reference functions and the functions that appear to be of use are ADDRESS, HLOOKUP, INDEX, MATCH, INDIRECT, and OFFSET. If we can create a reference to the Tax column we could look within it for the 3 largest and smallest tax figures.

I like the look of MATCH because it "Returns the relative position in an array...". So lets try it and see if we could use it to find the Tax column.

Now we have this we can be very confident that we are on the right track. Remember "When we need to extract some data from within a table or database using a Lookup & Reference function there is really only one thing we need to do and that is, find it's Row and/or Column number" We have the Column number, so now I'm going to use this nested with the ADDRESS function to create a reference to the Tax column.

Now we can apply the same function again but instead of 1 as the Row number we will use 5000 (the last row of our table)

=ADDRESS(5000,MATCH("Tax",TpRwAllData,0)) gives a result of $G$5000

If we now join these using the & (Ampersand) and include a : (Colon) we can create a reference to our Tax column that will always be the address of the Tax column no matter what position it is in. In other words it will be a variable range.

=ADDRESS(1,MATCH("Tax",TpRwAllData,0)) & ":" & ADDRESS(5000,MATCH("Tax",TpRwAllData,0))

This returns the result $G$1:$G$5000.

Now we can use this to find our 3 largest and the 3 smallest tax figures. There is one very important thing to remember here though and that is the result $G$1:$G$5000 will be seen by Excel as a Text string and not a range reference! To force Excel to see this as a range reference we only need to nest it within the INDIRECT function. The INDIRECT function "Returns the reference returned by a text string".

What we will show you now is a neat little trick that is very much like creating your very own Custom Function. The same can be applied to just about any nested or non-nested function and comes in very handy!

What we have now is a variable named range that will always refer to our Tax column!

We now need to find the 3 largest  and the 3 smallest  tax figures in the named range TaxCol  so let's type the word Largest in the Search for a Function  box in the Insert Function  dialog and then Smallest. This is an easy one as there are two functions that will do exactly what we need, they are the LARGE  and SMALL  functions.

What we intend to do now is to use the INDEX function to create a reference to the name of the largest tax payer.

The syntax for INDEX is =index(array,row_num,column_num)

=INDEX(AllData,MATCH(LARGE(TaxCol,1),TaxCol,0),MATCH("Tax",TpRwAllData,0)-1)

As you can see we have colour coded the formula to help you see what each nested function is being used for. The -1 is returning the column number of the column immediately before the Tax column. We can now easily find the 3 largest tax payers within our table all we need to do is substitute LARGE(TaxCol,1) with LARGE(TaxCol,2) and LARGE(TaxCol,3).

For the 3 smallest tax payers we substitute the LARGE function with the SMALL function, eg; =INDEX(AllData,MATCH(SMALL(TaxCol,1),TaxCol,0),MATCH("Tax",TpRwAllData,0)-1)

We realise that this can seem a bit daunting at first, but take your time and work through it, as the solution to this one difficult problem can be applied in many situations.

Database Functions

Excel includes twelve Database functions and they are referred to as the Dfunctions.  These functions are ideal for extracting figures and working with data that is in a database, list or table. They are also not used anywhere enough by most Excel users. We highly recommend learning as much as possible about these functions as they can be invaluable when you need to extract or work with specific data from a database list or table. The help within Excel gives a lot of information on Database Functions and is well worth reading. Although there are twelve of these functions you really only need to know how to use any one of them, as they are all very similar and all have the same syntax.

Dfunction(database,field,criteria)  All arguments are mandatory.

database: is simply the range of cells that make up your database, list or table. It must include your headings!

field: is the column within the database, list or table (database) to use in the function. Field can be given as text or the column number of the column in the database, list or table (database) .

criteria: Is a range of cells that specifies the conditions that must be met by the cells within the database to be used in the particular Dfunction. The criteria cells must contain the column heading(s) of column(s) you wish to include in the particular Dfunction.

These functions can best be seen by real examples.  We have attached a Workbook that shows various ways these can be used. Use these examples in conjunction with the help on Database Functions. Some of the examples also use the Validation feature in Excel which is found under Data Tools on the Data tab.

We have found this feature very good when needing to extract data as we can use it to create a "pick from list" in our criteria cells.

Array Formulas

Array formulas are a very powerful feature of Excel and as so they should only be used when really needed. Too many of them and your Workbook can slow down to a crawl. You will find that once you get the general gist of array formulas, the temptation will be there to use them often. We highly recommend resisting this urge for reasons already mentioned. In most cases the desired result can be obtained by one of the Dfunctions and are a far superior choice.

The Excel Help defines array formulas as:

Performs multiple calculations and then returns either a single result or multiple results. Array formulas act on one or more sets of values, known as array arguments. Each array argument must be rectangular, and each argument must have the same number of rows and/or the same number of columns as the other arguments. To return multiple results, the formula must be entered into multiple cells.

To enter an array formula, press CTRL+SHIFT+ENTER. Microsoft Excel encloses array formulas between braces ( { } ).

One other important note here is that, an Array formula CANNOT refer to an entire column.

There are a number of examples of array formulas on the www.ozgrid.com website under Array Formulas that explain simple ways these can be used. If you haven't read this yet then you really should. You will also know from using the Conditional Sum Wizard that it can aid you in writing simple Array formulas.

Let's use one of the examples from our website to explain how they work.

To get the total cost of male Spaniels sold:

=SUM(IF($A$2:$A$200="Male",IF($B$2:$B$200="Spaniel",$E$2:$E$200,0),0))

We can see by reading the formula we only want to sum the values that refer to male Spaniels sold. This is where our two IF functions come into play!  We will explain this to you in hopefully simple terms.

So what it actually does is loop through all the cells in the range "$A$2:$A$200 and $B$2:$B$200 and returns to the SUM function either 0 (zero) or the corresponding cell in $E$2:$E$200. This is why having too may Array formulas can slow down a workbook. Just imagine having 20 Array formulas each with three or more arguments and each argument is referencing 5000 cells in a range. That is (3*5000)*20 which is 300,000 cells Excel MUST loop through each cell each time the Worksheet recalculates.

The best way to apply array formulas in most cases is to get the desired result then copy and PasteSpecial as values only over the top of itself. The quickest way to do this for any cell that contains a formula is to

Summary

So while Excel is generally known for its capabilities of manipulating numbers it is also very capable of working with tables of Text and/or Numbers.  We have had a good overall look at being able to retrieve and work with data extracted from a database and/or table. Excel is not usually thought of as a database, but it is used as one in nearly every spreadsheet. However having said this, Excel is not designed to be used solely as a database and can (but not always) run into limitations if amount of data becomes very large.

Again we cannot stress enough the huge benefits that will become apparent if you take the time to learn the Dfunctions in detail!  As we have mention the help on these functions is very thorough and well worth the time to read.

 

 

Microsoft ® and Microsoft Excel ® are registered trademarks of Microsoft Corporation. OzGrid.com is in no way associated with Microsoft.

OzGrid.com accepts no responsibility for any adverse affect that may result from undertaking our training.