3 March 2019 | 6 minutes of reading time
One of our most frequently used functions in Klipfolio is the LOOKUP function. Unfortunately, this is not one of the easiest functions. A lot of our clients come to us with questions regarding this function, when it does not quite work as expected. Especially in the beginning I had to get used to it as well.
I would like to explain the structure of the LOOKUP function. But, before I do this, I have to explain a few different concepts: GROUP, GROUPBY, SELECT & IF. Especially the difference between IF and SELECT is interesting.
The grouping function sees to it that you can take a series of values and make unique values out of this.
So, in column A:A you have the values apple, apple, banana, lemon, lemon, lemon, pear, pear, and you want to know the unique values. In this case you can use GROUP(A:A), or, sometimes GROUP(SLICE(A:A)) if your column has a header.
Imagine, you want to calculate a specific value (measure) per group (dimension). Then, you can use GROUPBY. If the values belonging to fruit, 1, 2, 3, 4, 5, 6, 7, 8, are shown in column B:B and we want to sum those, we can use GROUPBY. This function takes the format of GROUPBY(dimension, measure, aggregation function). However, if you leave the last statement blank, GROUPBY will automatically take the sum.
So, GROUPBY(A:A,B:B) returns the values 3 (1+2), 3 (3), 15 (4+5+6), and 15 (7+8). The values return automatically in the order of which GROUP has been sorted, which is alphabetically.
A SELECT function is used when you only want to measure specific values. Suppose that in our previous example we only want the lemons. In this case what we really want to code is: select the values from column B with the requirement that column A has a specific value. Or: SELECT(B:B, A:A = “Lemon”).
A SELECT statement will throw away records. We started with 8, but after using SELECT we will be left with only 3. You can also see this when you evaluate the formula with the lightning bolt: if you hover over column B:B you are shown 8 values. Likewise, if you are selecting the part ‘A:A = “lemon”’ you are shown 8 values (true or false). But, if you select the entire formula, you are shown only 3 values. These are 4, 5, and 6.
It is important to realize that both columns of the select formula should have the same amount of records. If this is not the case, the results of the formula can be unexpected. Problems often arise at for example empty cells or NULL values, especially at the bottom of a data source.
When using the IF function, you can test whether a specific value satisfies certain criteria. Does this hold, then you will do “X”. Does this not hold, then you will do “Y”. In practice “X” or “Y” is usually the number 0. In other words: IF(the value in column A:A = “lemon”, B:B, 0). That is: If in column A there is a value “lemon”, take the value from column “B”, or else take value 0. The result is 0, 0, 0, 4, 5, 6, 0, 0.
It is important to realize that the 3 parts of the formula need to have the same amount of records. *) Likewise, the end result needs to have the same amount of results. Is this not the case, then yet again the formula can have unexpected outcomes.
A LOOKUP function has 3 components: LOOKUP(specific items, where to search, return value).
If we dissect this:
In the first argument are the values that we are searching for. This can be for example months, such as ““2018-07”, “2018-08”, “2018-09”, 2018-10” etc. In this example, let us assume that we want to show those 4 months, and not the other months. In practice these are often de values of the X-axis in a graph.
In the second argument we have the entire batch of values, in which the values from the first argument can be located. This is allowed to be in a different format. However, then we have to change these in a formula.
It can be the case that there are many more months in the source data. These values will not return in the formula. That is what you want, because in the first argument you declared what you want to look up.
It could also be the case that there aren’t that many values in the source data. For example, you have the values of July and August 2018, but (not yet) those of September and October. However, you do want to show those in your graph. In this case the first component of the lookup function will not return in the second component. Here, the lookup formula will return an empty cell.
In the third argument you explain what exactly you are going to calculate. So, for example, we have looked up those 4 months in a batch of data (1st component), we found those months (2end component), in for example column A:A), and now we want to return the value that exists in a different column, for example B:B.
The amount of values from argument 1 is in most cases different than the amount of values in argument 2. But, the amount of values in component 2 should be equal to the amount of values in argument 3. If this is not the case, something is going wrong.
In this example I assume that we have a graph, with on the X-axis months and on the Y-axis the amount of sessions to a particular website. The X-axis is labeled ‘Months’.
There are a few possibilities:
1. The data source in which we search has always 1 value per month. For example: total amount of sessions per month.
In column A:A we have months, in column B:B we have the amount of sessions in that month.
The formula becomes:
LOOKUP(&maanden, A:A, B:B)
maanden = months
2. The data source in which we search always has multiple values per month. For example: amount of sessions per day.
In column C:C we have a date, in column D:D the amount of sessions per day.
The formula becomes:
In other words: at the second argument a GROUP and at the third component a GROUPBY which on its own term has 2 components.
The first component is the exact same as the part in the GROUP of the second component.
3. The data source in which we are searching has another dimension besides months.
For example: Total amount of sessions per month per channel.
The months, channels, and specific numbers are reported in column C:C, D:D, and, E:E respectively.
The formula for the channel “organic” becomes:
LOOKUP(&maanden, select(C:C, D:D = “organic”), select(E:E, D:D = “organic”))
LOOKUP(&maanden, GROUP(C:C),GROUPBY(C:C,IF(D:D = “organic”,E:E,0))
At the SELECT formula we only select records that we want.
At the IF formula we actually select everything, but only if the criteria are satisfied than the values are returned.
4.In the data source we have another dimension besides months and the source is in days, not months.
In column C:C the days are represented, in column D:D the channels, and in column E:E the amounts.
The formula for the channel “organic” then becomes:
LOOKUP(&maanden, GROUP(SELECT(DATE_CONVERT(C:C,”yyyyMMdd”,”yyyy-mm”),D:D = “organic”),GROUPBY(SELECT(DATE_CONVERT(C:C,”yyyyMMdd”,”yyyy-mm”),D:D = “organic”,select(E:E, D:D = “organic”))))
LOOKUP(&maanden, GROUP(date_convert(C:C,”yyyyMMdd”,”yyyy-MM)), GROUPBY(date_convert(C:C,”yyyyMMdd”,”yyyy-MM),IF(D:D = “organic”,E:E,0))
At the select formula we only select the records that we want. Component 2 as well as both parts of component 3 get the SELECT function.
For the IF function we do select all records, but actually only in the last part we say that we only want to use the values that satisfy the criteria.
So, when do you use IF and when SELECT? Every situation is unique, so the silly answer is ‘it depends’. We do have a few precepts:
Are you using the LOOKUP function, but are you running into problems? Feel free to contact us. We can always have a look at it with you without any commitments!