I’m sure all of you R users have noticed by now that R sometimes talks to you. When you do something wrong, R replies with a message written in red in the console.
How many of you actually read those error messages? If you take the time to read them carefully, you’ll get a hint about what was wrong in your command. Let’s look at an example:
> sum(c('1','3','4','4'))
Error in sum(c("1", "3", "4", "4")) :
invalid 'type' (character) of argument
R first points out which call caused the error, then explains the problem. In this example, the argument has type character, which sum() does not accept: you cannot sum characters.
Notice the keyword “Error” at the beginning of the message. When R encounters an error, execution stops and no result is returned. It’s hard to ignore. You need to fix the error!
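The error can be reproduced, caught and fixed in a few lines. This is a minimal sketch: tryCatch() intercepts the error so the script keeps going, and converting the strings with as.numeric() before summing removes the cause. The variable names are illustrative.

```r
vals <- c('1', '3', '4', '4')

# sum() on character data signals an error; tryCatch() returns its
# message instead of halting the script
msg <- tryCatch(sum(vals), error = function(e) conditionMessage(e))
print(msg)    # "invalid 'type' (character) of argument"

# Converting the strings to numbers first removes the error
total <- sum(as.numeric(vals))
print(total)  # 12
```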
However, you’ll sometimes encounter “Warning” messages instead. These do not signal a critical error: the code runs and returns a result, even though something might be wrong.
> mean(c('1','3','4','4'))
[1] NA
Warning message:
In mean.default(c("1", "3", "4", "4")) :
argument is not numeric or logical: returning NA
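If you would rather not let warnings slip by, R can be told to treat every warning as an error with options(warn = 2). A minimal sketch, which restores the previous setting afterwards:

```r
vals <- c('1', '3', '4', '4')

old <- options(warn = 2)  # from now on, warnings are turned into errors
res <- tryCatch(mean(vals), error = function(e) conditionMessage(e))
options(old)              # restore the previous warning behaviour

# res now holds the message of the converted warning instead of NA
print(res)
```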
Again, how many of you actually read those warning messages? It’s tempting to just ignore them, especially since R does not always display them directly when many are generated. When more than 10 warnings arise, you don’t see them at all. Instead, R prints a message telling you how many there were and suggests using the function warnings() to display them (by default, only the first 50 are kept).
But it’s important to look at the warnings to decide whether the code that triggered them needs fixing. If you don’t, you can’t be sure that the returned value is correct.
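Rather than relying on the console’s summary message, you can collect every warning an expression raises. The sketch below uses withCallingHandlers(); collect_warnings() is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: evaluate an expression, record each warning's
# message, and silence it so nothing is lost in the console summary
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# 20 calls to mean() on character input, each raising one warning
res <- collect_warnings(lapply(1:20, function(i) mean(as.character(i))))
length(res$warnings)   # 20
unique(res$warnings)   # "argument is not numeric or logical: returning NA"
```

Every returned value is NA here, which is exactly what you would miss if you only glanced at the result.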
Let me convince you with a case I’ve run into a few times this summer. When you want to compare two groups and you don’t know whether they are normally distributed, a non-parametric test is appropriate. The Wilcoxon rank sum test (also known as the Mann-Whitney U test or Mann-Whitney-Wilcoxon test) tests whether two independent samples of observations are drawn from the same distribution. The test is based on the ranks of all the values: if the two groups come from the same population, no pattern will emerge from the arrangement of the ranks; if they come from different populations, a pattern will be discernible.
Group A observations : | 4 | 8 | 9 | 10 | 5 | 11 |
Group B observations : | 1 | 3 | 5 | 8 | 6 | 9 |
The values from both groups are pooled, sorted and ranked. Tied values each receive the average of the ranks they occupy. The ranks are then summed for each group.
Group : | B | B | A | B | A | B | A | B | B | A | A | A |
Observations : | 1 | 3 | 4 | 5 | 5 | 6 | 8 | 8 | 9 | 9 | 10 | 11 |
Rank : | 1 | 2 | 3 | 4.5 | 4.5 | 6 | 7.5 | 7.5 | 9.5 | 9.5 | 11 | 12 |
wA = 3+4.5+7.5+9.5+11+12 = 47.5
wB = 1+2+4.5+6+7.5+9.5 = 30.5
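R’s rank() function uses the average-rank rule for ties by default (ties.method = "average"), so the rank sums above can be reproduced as follows. The names a, b, wA and wB are chosen for this sketch.

```r
# Observations from the table above
a <- c(4, 8, 9, 10, 5, 11)  # group A
b <- c(1, 3, 5, 8, 6, 9)    # group B

# Rank all 12 values together; tied values get the average of their ranks
r <- rank(c(a, b))

# Sum the ranks belonging to each group
wA <- sum(r[seq_along(a)])
wB <- sum(r[-seq_along(a)])
c(wA = wA, wB = wB)  # wA = 47.5, wB = 30.5
```

As a sanity check, the two sums always add up to 12 × 13 / 2 = 78, the sum of the ranks 1 to 12.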
These rank sums are then used to compute the test statistic and to decide whether the null hypothesis (both groups come from the same distribution) can be rejected.
Applying wilcox.test(x, y) in R, with x holding the group A observations and y the group B observations, gives the following output:
Wilcoxon rank sum test with continuity correction
data: x and y
W = 26.5, p-value = 0.1978
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(x, y) : cannot compute exact p-value with ties
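Note that the W reported by R is not the raw rank sum. For the first sample (x, here group A, with $n_A = 6$ observations), wilcox.test() subtracts the smallest value that rank sum could take, $n_A(n_A+1)/2$:

$$
W = w_A - \frac{n_A(n_A+1)}{2} = 47.5 - \frac{6 \times 7}{2} = 47.5 - 21 = 26.5
$$

Equivalently, $W$ counts the pairs (A value, B value) in which the A value is larger, with tied pairs counting $1/2$: $2 + 2.5 + 4.5 + 5.5 + 6 + 6 = 26.5$.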
You get a p-value, but because of the ties R could not compute the exact one and fell back on a normal approximation. With only six values per group, that approximation may be poor. Well, you’ve been warned!!
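To see where that p-value comes from, the normal approximation can be reproduced by hand. This sketch follows the standard formulas for the null mean and tie-corrected variance of W, with the 0.5 continuity correction named in the output header; the variable names are my own. Passing exact = FALSE to wilcox.test() requests the same approximation explicitly, which silences the warning but does not make the approximation any more accurate for small samples.

```r
# Data from the example (x = group A, y = group B)
x <- c(4, 8, 9, 10, 5, 11)
y <- c(1, 3, 5, 8, 6, 9)
nx <- length(x)
ny <- length(y)
n <- nx + ny

# Rank sum of x minus its minimum possible value gives W
r <- rank(c(x, y))
W <- sum(r[seq_len(nx)]) - nx * (nx + 1) / 2

# Standard deviation of W under the null hypothesis, corrected for ties
ties <- table(r)  # how many values share each rank
sigma <- sqrt((nx * ny / 12) *
              ((n + 1) - sum(ties^3 - ties) / (n * (n - 1))))

# Centre W on its null mean and apply the continuity correction
z <- W - nx * ny / 2
z <- (z - sign(z) * 0.5) / sigma

# Two-sided p-value from the standard normal distribution
p <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))
p

# Same approximation through wilcox.test(), without the warning
wilcox.test(x, y, exact = FALSE)$p.value
```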