LESSON WORKBOOK: Level 2 Lesson 4 2007.xlsx
This lesson will be used to look at the various methods available within Excel to extract required information from databases and tables. There are many ways in which this can be done, but some ways are far better than others. By far the most important factor in this is how the information is laid out in the first place. By following these six rules (also mentioned in Lesson 1), life will be much easier come time to extract the information.
When putting in headings bold the font, this will help Excel recognise them as headings when you use one of its functions such as sorting data
When putting data into the data area of your spreadsheet try to avoid blank rows and columns if possible. This is because a lot of Excels built in features will assume a blank row or column is the end of your data. It also helps with a lot of formulas.
Have your data sorted if possible. Excel is very rich in what are known in Lookup & Reference formulas and many of these rely on your data being sorted in a logical order.
Use real dates for headings and format them appropriately. By this I mean if you want the names of the months as headings type them in as 1/1/2011, 1/2/2011, 1/3/2011 etc then format them as mmmm. This is a very simple procedure that is all too often overlooked by many. If you have real dates as headings life will be much easier further down the road when you need to use them in formulas.
Avoid merging cells if possible. Use Centre across selection instead. This can be found by going under Alignment options on the Home tab and selecting from the Horizontal box. Merging cells can cause all sorts of problems later on that that will have you scratching your head.
Don't put in one cell what could go in more than one cell. For example, say you have the names of 100 people to put in your spreadsheet. Don’t put their full name in one cell. Instead put the First name in one cell then their surname in the next cell to the right. If you need to place them into one single cell at a later stage, this can be done very easily, the same cannot always be said for the reverse.
The Functions that we will look at in this lesson are best suited for extracting information. These are the Lookup & Reference Functions, the Database Functions and Array Formulas.
However, before we start, we must stress that we will only use array formulas as a very last resort. Our reason for this is that over the years we have been helping others use Excel, we have consistently found that when users first learn how to use and write array formulas, they tend to fall back on them all too often. The major drawback to this is that you will end up with a workbook that becomes extremely slow in re-calculating. You must remember also, that Excel by default re-calculates when you open, save or enter data into a cell that a formula is dependent on. All too often we have seen workbooks that have become virtually unusable due to the amount of array formulas.
The other subject we will look at is the means and methods of deciding which formula to use in which situation. As Excel has tons of Functions available, this can sometimes seem a bit daunting. Keep in mind that although Excel is very rich in Functions, you will find that most likely you will be using the same set of Functions over and over again. You must always keep in mind that Excel was designed for all people from all walks of life and businesses to use, by this I mean it contains many Functions that would be of no use at all to some people, but be the "be all and end all" to others. A couple of brief examples of this would be some of Excel's engineering and financial Functions, unless you were in that field of expertise they would be of no use at all.
We will look at the means and methods of deciding which formula to use in which situation first and then we will move on to ways and means of extracting information from tables and databases.
One of the many problems that can crop up when using Excel is trying to figure out which function to use in which situation. There
are certainly no hard and fast rules to this as more often than not you will need to use a nested formula that will incorporate many types of Functions. Whichever way you go do not fall into the trap of trying to make your formula as short as possible.
We constantly see 'so called' experienced users replacing False with 0 (zero),
True with 1 and formulas like: =IF(A1,
Anyway enough of that. Before you construct your formula you should ask yourself these five questions.
Is the data text or numeric?
Do you want text or a number as your result?
Do you need to locate your data first?
Does the result depend on the data type or value?
Are you dealing with dates and/or times?
If your data is Numeric then you can possibly rule out the need for any of the Text functions.
If your data is Text and you want a Numeric result you will very likely need one of the Text functions.
If you need to locate your data first you will very likely need one of the Lookup & Reference functions.
If your result depends on the data type or value you will need one of the Logical functions.
If you are dealing with dates and/or times then you may need one of the Date & Time functions
Excels Insert Function (Shift+F3) is always a good place to start. Under the heading Or Select a Category are the Function categories. If you know the category you want, you can select from this list, or if you do not know which Function Category you want, once you have the Insert Function dialog on screen, click in the Search for a Function box and type in some key words of the result you want. Don't be too vague though and don't be to detailed. If you were wanting to find some data that is within some more data and you just type "Find" you will end up with a very long list. The same problem would arise (to a lesser degree ) if you typed "Find some data in a list of data". But simply type "Find data" and you will get a reasonable list.
Let's look at the ones you are most likely to need and when to use them.
We will use a real problem and work through it to see how we might apply the "Decision making" process. This is only one example, but I believe it will demonstrate how it can be applied.
If your data contains a period of time you should consider using one of the Date & Time functions. This is not always completely necessary because, if you only need to find the difference between two dates you can simply subtract one date from the other.
Suppose you have the date 22/12/2011 in cell A1 and 15/4/2011 in cell A2 you could use: =A1-B1 and format your result cell as General to find out the number of days between the two dates. But what if we need to find out the months between the two dates? We would use: =MONTH(A1)-MONTH(A2). Of course this is not a problem because the years in both dates are the same, but what if they are not! This is where we ask ourselves the 5 questions:
Is the data text or numeric? Answer= numeric!
Do you want text or a number as your result? Answer= number!
Do you need to locate your data first? Answer= no!
Does the result depend on the data type or value? Answer= no!
Are you dealing with dates and/or times? Answer= yes!
We need to take into account the year and month portion of our dates. From this it is most likely a Date & Time function is called. We are being totally honest here when we say we have never had to do this! You will see why soon. Let's work through this problem step by step. We will assume we have not used: : =MONTH(A1)-MONTH(A2) yet. Make sure the dates: 22/12/2011 is cell A1 and 15/4/2011 in cell A2
Push Shift+F3 and select Date & Time from Or Select a Category
First thing we are looking for is a function that mentions Years and/or Months
You should be able to see MONTH so click on it. The help text at the bottom of the Paste function box tells me that the argument type must be a serial number ie; MONTH(Serial_Number). That's good, as our dates are serial numbers (as are all dates)
The description below this tells me: Returns the month a number from 1 (January) to 12 (December). Sounds like a good start, let's try it. With MONTH still selected click OK
Now click the collapse dialog box and select cell A1, click it again (the collapse dialog box ) and look at the "formula result" it reads is 12. The result we expected!
Lets use the same function on our date in cell A2 and subtract the result from our MONTH(A1)
Click in the formula bar and type a - (minus sign) after MONTH(A1), ie; MONTH(A1)-
Now from the box to the left of the formula bar select "MONTH" and repeat step 5, but do it on cell A2. You will notice the Function result is 8, and therefore our formula result is 8 click OK. Our end formula should be: =MONTH(A1)-MONTH(A2)
Your result of 8 is the number of months between 12 (December) and 4 (April). Now we have to do the same for the years.
Select any cell and Push Shift+F3 and select Date & Time. This time we are only interested in the year portion of our date, so we basically repeat steps 3 to 9 substituting months for years. Our end formula is: =YEAR(A1)-YEAR(A2)
We end up with the result 2, the number of years between 2009 and 2011. Now we know that there are 12 months in each year so we add to our Year formula to make it: =YEAR(A1)-YEAR(A2)*12 Enter this and we get ###. This is because we need to change the syntax. Excel is multiplying YEAR(A2) by 12 then subtracting this from YEAR(A1). Which gives us a negative date. Excel cannot handle negative Dates and/or Times!
So we add some parenthesis to change this, eg; =(YEAR(A1)-YEAR(A2))*12 and format the cell as General. Now we get 24. The number of months between the years 2009 and 2011.
Let's now add the two formulas together, eg; =MONTH(A1)-MONTH(A2)+(YEAR(A1)-YEAR(A2))*12 We get a result of 32. This looks pretty good to me!
Before we went through this step by step we stated that we were "being totally honest here when we say we have never had to do this!" The reason we have never had to do this before is because there is a little known about Excel function that can do this for us! It's called DATEDIF, you will not find much documentation on this function as it is only supplied for compatibility with Lotus 1-2-3. The DATEDIF calculates the number of days, months, or years between two dates. It's syntax is: DATEDIF(start_date,end_date,unit). Unit is the number of what you want returned.
Y = The number of complete years in the period
M = The number of complete months in the period.
D = The number of days in the period.
MD = The difference between the days in start_date and end_date. The months and years of the dates are ignored.
YM = The difference between the months in start_date and end_date. The days and years of the dates are ignored.
YD = The difference between the days of start_date and end_date. The years of the dates are ignored.
So if we use the formula =DATEDIF(A2,A1,"M") on our dates to check if the formula we wrote is correct we get: 32 also.
Although this is only one situation out of a possible infinite number, it should give you a good idea of a logical process that could be applied to many other situations.
Extracting data from tables and databases is something that is done quite frequently from within Excel. We haven't seen too many spreadsheets where extracting data is not needed. The information below is meant as a generic guide to extracting data from a database or table. The one thing you should always look for when faced with the problem of how to extract information is "what won't change!" In other words you must establish a constant from all the variables. Lets look at some ways this can be done!
In the Lookup & Reference example below we will work through it with you to again try and demonstrate the possible thought process we would use to arrive at a solution. This will end up being a relatively deeply nested formula and will include a number of different functions. We are using this as an example because it will end up being a very generic solution.
Quite possibly the most common of these type of Functions is the VLOOKUP. This because it will look within the first column of a nominated table array for a nominated value and return the value from the same row in a nominated column. We looked at the VLOOKUP in the lesson on "Specific Worksheet Formulas" so we will move on to a more difficult situation.
When we need to extract some data from within a table or database using a Lookup & Reference function there is really only one thing we need to do and that is, find it's Row and/or Column number. Once we have this we can extract the data from the table or database. Remember that Excel uses a grid method to locate data, a very simple but effective method. If it works for Excel - it will work for us!
Let's say we have a huge table of data that consists of 40 columns and 5000 rows. We need to find 6 single cells within a table of 200,000 cells (40*5000). We have of course set up our spreadsheet by the rule book. Our table has been named AllInfo in other words AllInfo refers to the range A1:T5000. but the data we need to extract poses the following problems.
It could be in any row.
It could be in any column.
The row number can and does vary.
The column number can and does vary.
The data wanted is the names of the 3 largest and the 3 smallest tax payers and these change daily.
All we know for certain is that the column heading Tax does not change, but it's positions can! The names are always in the column immediately before the Tax column.
So in a nutshell, what we need to do is, find the 3 largest and the 3 smallest tax figures in the Tax column and then return the name immediately to the left.
The very first thing we are going to do is give the top row (headings) a named range. We'll call it TpRwAllData and it will refer to A1:T1.
The VLOOKUP requires that we know the column position and we have been told that "The column number can and does vary". It also requires we look for a known value in the first column of our table and we have no idea what that might be. So VLOOKUP can virtually be ruled out!
Let us use some positive information and work from there. The biggest help is the column heading Tax. If we can find that and get a reference to it we should then also be able to get a reference to the column immediately before it.
Click through the Lookup & Reference functions and the functions that appear to be of use are ADDRESS, HLOOKUP, INDEX, MATCH, INDIRECT, and OFFSET. If we can create a reference to the Tax column we could look within it for the 3 largest and smallest tax figures.
I like the look of MATCH because it "Returns the relative position in an array...". So lets try it and see if we could use it to find the Tax column.
=MATCH("Tax",TpRwAllData,0) gives a result of 7 (column G)
Where "Tax" is the value to find.
TpRwAllData is the first row in our table AllData
0 (zero) tells it to find an exact match.
So our result of 7 is the position of Tax in the range A1:T1.
Now we have this we can be very confident that we are on the right track. Remember "When we need to extract some data from within a table or database using a Lookup & Reference function there is really only one thing we need to do and that is, find it's Row and/or Column number" We have the Column number, so now I'm going to use this nested with the ADDRESS function to create a reference to the Tax column.
=ADDRESS(1,MATCH("Tax",TpRwAllData,0)) gives a result of $G$1
Where 1 is the Row number.
MATCH("Tax",TpRwAllData,0) is the column number.
Now we can apply the same function again but instead of 1 as the Row number we will use 5000 (the last row of our table)
=ADDRESS(5000,MATCH("Tax",TpRwAllData,0)) gives a result of $G$5000
If we now join these using the & (Ampersand) and include a : (Colon) we can create a reference to our Tax column that will always be the address of the Tax column no matter what position it is in. In other words it will be a variable range.
=ADDRESS(1,MATCH("Tax",TpRwAllData,0)) & ":" & ADDRESS(5000,MATCH("Tax",TpRwAllData,0))
This returns the result $G$1:$G$5000.
Now we can use this to find our 3 largest and the 3 smallest tax figures. There is one very important thing to remember here though and that is the result $G$1:$G$5000 will be seen by Excel as a Text string and not a range reference! To force Excel to see this as a range reference we only need to nest it within the INDIRECT function. The INDIRECT function "Returns the reference returned by a text string".
What we will show you now is a neat little trick that is very much like creating your very own Custom Function. The same can be applied to just about any nested or non-nested function and comes in very handy!
Push Ctrl + ` (circumflex) This will show all formulas on the Worksheet.
Push Ctrl + F3 This will display the Define name dialog box.
Select New
In the Names: box type the name: TaxCol
In the Refers to box type: =INDIRECT()
Now type the formula: =ADDRESS(1,MATCH("Tax",TpRwAllData,0)) & ":" & ADDRESS(5000,MATCH("Tax",TpRwAllData,0)) within the parenthesis of "=INDIRECT()" It should look like:
=INDIRECT(ADDRESS(1,MATCH("Tax",TpRwAllData,0)) & ":" & ADDRESS(5000,MATCH("Tax",TpRwAllData,0))) Use the formula showing on the Worksheet as your guide.
Click Add then click OK
Push Ctrl + ` (circumflex). This will now hide all formulas on the Worksheet.
What we have now is a variable named range that will always refer to our Tax column!
We now need to find the 3 largest and the 3 smallest tax figures in the named range TaxCol so let's type the word Largest in the Search for a Function box in the Insert Function dialog and then Smallest. This is an easy one as there are two functions that will do exactly what we need, they are the LARGE and SMALL functions.
=LARGE(TaxCol,1)
Where TaxCol is the range to look in.
1 is telling it we want the largest value.
What we intend to do now is to use the INDEX function to create a reference to the name of the largest tax payer.
The syntax for INDEX is =index(array,row_num,column_num)
=INDEX(AllData,MATCH(LARGE(TaxCol,1),TaxCol,0),MATCH("Tax",TpRwAllData,0)-1)
As you can see we have colour coded the formula to help you see what each nested function is being used for. The -1 is returning the column number of the column immediately before the Tax column. We can now easily find the 3 largest tax payers within our table all we need to do is substitute LARGE(TaxCol,1) with LARGE(TaxCol,2) and LARGE(TaxCol,3).
For the 3 smallest tax payers we substitute the LARGE function with the SMALL function, eg; =INDEX(AllData,MATCH(SMALL(TaxCol,1),TaxCol,0),MATCH("Tax",TpRwAllData,0)-1)
We realise that this can seem a bit daunting at first, but take your time and work through it, as the solution to this one difficult problem can be applied in many situations.
Excel includes twelve Database functions and they are referred to as the Dfunctions. These functions are ideal for extracting figures and working with data that is in a database, list or table. They are also not used anywhere enough by most Excel users. We highly recommend learning as much as possible about these functions as they can be invaluable when you need to extract or work with specific data from a database list or table. The help within Excel gives a lot of information on Database Functions and is well worth reading. Although there are twelve of these functions you really only need to know how to use any one of them, as they are all very similar and all have the same syntax.
Dfunction(database,field,criteria) All arguments are mandatory.
database: is simply the range of cells that make up your database, list or table. It must include your headings!
field: is the column within the database, list or table (database) to use in the function. Field can be given as text or the column number of the column in the database, list or table (database) .
criteria: Is a range of cells that specifies the conditions that must be met by the cells within the database to be used in the particular Dfunction. The criteria cells must contain the column heading(s) of column(s) you wish to include in the particular Dfunction.
These functions can best be seen by real examples. We have attached a Workbook that shows various ways these can be used. Use these examples in conjunction with the help on Database Functions. Some of the examples also use the Validation feature in Excel which is found under Data Tools on the Data tab.
We have found this feature very good when needing to extract data as we can use it to create a "pick from list" in our criteria cells.
Array formulas are a very powerful feature of Excel and as so they should only be used when really needed. Too many of them and your Workbook can slow down to a crawl. You will find that once you get the general gist of array formulas, the temptation will be there to use them often. We highly recommend resisting this urge for reasons already mentioned. In most cases the desired result can be obtained by one of the Dfunctions and are a far superior choice.
The Excel Help defines array formulas as:
Performs multiple calculations and then returns either a single result or multiple results. Array formulas act on one or more sets of values, known as array arguments. Each array argument must be rectangular, and each argument must have the same number of rows and/or the same number of columns as the other arguments. To return multiple results, the formula must be entered into multiple cells.
To enter an array formula, press CTRL+SHIFT+ENTER. Microsoft Excel encloses array formulas between braces ( { } ).
One other important note here is that, an Array formula CANNOT refer to an entire column.
There are a number of examples of array formulas on the www.ozgrid.com website under Array Formulas that explain simple ways these can be used. If you haven't read this yet then you really should. You will also know from using the Conditional Sum Wizard that it can aid you in writing simple Array formulas.
Let's use one of the examples from our website to explain how they work.
To get the total cost of male Spaniels sold:
=SUM(IF($A$2:$A$200="Male",IF($B$2:$B$200="Spaniel",$E$2:$E$200,0),0))
We can see by reading the formula we only want to sum the values that refer to male Spaniels sold. This is where our two IF functions come into play! We will explain this to you in hopefully simple terms.
When entered as an Array formula (Ctrl+Shift+Enter) IF($A$2:$A$200="Male" will return TRUE for all cells within the range "$A$2:$A$200" that contain the text "Male".
For each cell that does return TRUE ("Male") it will then look at the next TRUE part of the IF function, ie: IF($B$2:$B$200="Spaniel".
If it does not equal "Male" it returns FALSE and so returns the FALSE part of the IF function ie; 0 (zero).
Once it is in the TRUE part of the IF function, ie: IF($B$2:$B$200="Spaniel" it does the same, except this time it looks for "Spaniel".
If this is TRUE it moves in the TRUE part of this IF function ie; $E$2:$E$200 which is part of the SUM function. So it sums the cell on the same row as Male and Spaniel (TRUE and TRUE). Again if it does not equal "Spaniel" it moves onto the FALSE part of the IF function which is again 0 (zero) ie; "),0"
So what it actually does is loop through all the cells in the range "$A$2:$A$200 and $B$2:$B$200 and returns to the SUM function either 0 (zero) or the corresponding cell in $E$2:$E$200. This is why having too may Array formulas can slow down a workbook. Just imagine having 20 Array formulas each with three or more arguments and each argument is referencing 5000 cells in a range. That is (3*5000)*20 which is 300,000 cells Excel MUST loop through each cell each time the Worksheet recalculates.
The best way to apply array formulas in most cases is to get the desired result then copy and PasteSpecial as values only over the top of itself. The quickest way to do this for any cell that contains a formula is to
Select the cell containing the formula
Right click on any cell border and drag it across, up or down one cell.
Now drag back and release and select Copy Here as Values Only
So while Excel is generally known for its capabilities of manipulating numbers it is also very capable of working with tables of Text and/or Numbers. We have had a good overall look at being able to retrieve and work with data extracted from a database and/or table. Excel is not usually thought of as a database, but it is used as one in nearly every spreadsheet. However having said this, Excel is not designed to be used solely as a database and can (but not always) run into limitations if amount of data becomes very large.
Again we cannot stress enough the huge benefits that will become apparent if you take the time to learn the Dfunctions in detail! As we have mention the help on these functions is very thorough and well worth the time to read.
OzGrid.com accepts no responsibility for any adverse affect that may result from undertaking our training.