Why does as.character drop decimal point? - r

I'm interested to know why as.character(5.0) returns 5 but as.character(5.1) returns 5.1 in R. I tried to get an answer by reading the documentation but had no luck.

"I'm interested to know why as.character(5.0) returns 5"
The key word here is "returns." What do you mean by that? Note that typing this in the console gives you 5:
> 5.0
[1] 5
5 is the same thing as 5.0 for the purposes of calculation, so what you probably care about is how the number is printed. For that, use joran's method or a function like sprintf.

For more precise formatting of numbers as characters, you might use format:
> format(5,nsmall = 1)
[1] "5.0"
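As a rough sketch of the options mentioned in this thread (sprintf and format), plus formatC, which is not named in the answers but is part of base R: the numeric value stays the same, and only the conversion to text differs.

```r
x <- 5.0

# as.character() does not keep trailing zeros
plain <- as.character(x)

# The value itself is identical to 5; only the text form differs
same_number <- identical(x, 5)

# Ways to keep one decimal place in the resulting string
fixed_sprintf <- sprintf("%.1f", x)
fixed_format  <- format(x, nsmall = 1)
fixed_formatC <- formatC(x, format = "f", digits = 1)

print(c(plain, fixed_sprintf, fixed_format, fixed_formatC))
```

sprintf and formatC fix the number of decimals exactly, while nsmall in format only sets a minimum.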

Related

Loss of decimal places when calculating mean in R

I have a list entitled SET1Bearing1slope with nine numbers, and each number has at least 10 decimal places. When I use the mean() function on the list, I get an arithmetic mean.
Yet if I list the numbers individually and then use the mean() function, I get a different output.
I know that this is caused by rounding and that the second mean is more accurate. Is there a way to avoid this issue? What method can I use to avoid rounding errors when calculating the mean?
In R, mean() expects a single vector of values, not several separate values. It is also a generic function, so it tolerates additional arguments it doesn't understand and doesn't warn you about them. See:
mean(c(1,5,6))
# [1] 4
mean(1, 5, 6) #only "1" is used here, 5 and 6 are ignored.
# [1] 1
So in your example there are no rounding errors, you are just calling the function incorrectly.
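A minimal sketch of the difference, using made-up values in place of the nine numbers from the question (the name slopes and the values are hypothetical). It also assumes that, if the data really is stored as an R list rather than a vector, flattening it with unlist() first is acceptable.

```r
# Hypothetical values standing in for the numbers in the question
slopes <- c(0.1234567891, 0.2345678912, 0.3456789123)

# Correct: the data is passed as one vector argument
m_vector <- mean(slopes)

# Incorrect: only the first value is treated as the data
m_first <- mean(0.1234567891, 0.2345678912, 0.3456789123)

# If the data is an R list, flatten it to a numeric vector first
slopes_list <- as.list(slopes)
m_list <- mean(unlist(slopes_list))

print(c(m_vector, m_first, m_list))
```

Here m_first simply echoes the first value, which is why it can look like a "differently rounded" mean.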
Look at the difference in the way you're calling the function:
mean(c(1,2,5))
[1] 2.666667
mean(1,2,5)
[1] 1
As pointed out by MrFlick, in the first case you're passing a vector of numbers (the correct way); in the second, you're passing several separate arguments, and only the first one is used as the data.
As for the number of digits, you can specify it using options():
options(digits = 10)
x <- runif(10)
x
[1] 0.49957540398 0.71266139182 0.07266473584 0.90541790240 0.41799820261
[6] 0.59809536533 0.88133668737 0.17078919476 0.92475634208 0.48827998806
mean(x)
[1] 0.5671575214
But remember that a greater number of digits is not necessarily better. There's a reason why R and other tools limit the number of digits. Check this topic: https://en.wikipedia.org/wiki/Significance_arithmetic
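The digits setting only changes how a number is printed, not the value R stores. A small sketch with made-up input values, using the digits argument of print() and format() so that the global option is left alone:

```r
# Hypothetical input values
x <- c(0.1234567890123, 0.9876543210987)
m <- mean(x)

# Default printing shows about 7 significant digits
print(m)

# Ask for more digits in a single call, without touching options()
print(m, digits = 13)
shown <- format(m, digits = 13)

# The stored value is the same however it is displayed
same_value <- isTRUE(all.equal(m, 0.5555555550555, tolerance = 1e-12))
```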

Is there no "multiple match vector" function in R?

I was trying to find a "readily available" function to do the following:
> my_array = c(5,9,11,10,6,5,9,13)
> my_array
[1] 5 9 11 10 6 5 9 13
> my_test <- c(5, 6)
> new_match_function(my_test, my_array)
[1] 1 5 6
# or instead, maybe:
# [[1]]
# [1] 1 6
# [[2]]
# [1] 5
For my purposes, %in% is close enough, since it will return:
> my_array %in% my_test
[1] TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
and I could just do:
> seq(length(my_array))[my_array %in% my_test]
[1] 1 5 6
But it just seems that something like match should provide this capability: a means to return multiple elements from the match.
If I were to create a package simply to provide this solution, it would not be strongly adopted (for good reason... this tiny use case is not worth installing a package).
Is there a solution already available? If not, where is a good place for me to add this? As I showed, it's easy enough to solve without a new function, but for match to not allow for multiple matches seems crazy. I'd ideally like to either:
1. Find out that I'm wrong and there is a direct function to accomplish this, or
2. Be able to alter match itself so that it can return multiple occurrences.
But my impression (right or wrong) has been that any adjustments to the base code are more trouble than they are worth.
For simple cases, which(my_array %in% my_test) or lapply(my_test, function(x) which(my_array==x)) works fine, but those are not the most efficient.
For the first case (just knowing which elements match, not which test value each one corresponds to), the fastmatch package may help: its %fin% (fast-in) operator keeps a hash table of your array so that subsequent lookups are more efficient.
For the second case, there is findMatches in the Bioconductor package S4Vectors (https://bioconductor.org/packages/release/bioc/html/S4Vectors.html).
Note that this function doesn't return a list but a Hits object. To get a list, you also need the Bioconductor package IRanges (and use as.list) (https://bioconductor.org/packages/release/bioc/html/IRanges.html).
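For reference, the two base R approaches from this answer can be run against the data in the question as follows. The split() variant at the end is an additional base R idea, not something the answer proposed, and is included only as a sketch.

```r
my_array <- c(5, 9, 11, 10, 6, 5, 9, 13)
my_test  <- c(5, 6)

# All positions in my_array whose value appears in my_test
all_hits <- which(my_array %in% my_test)

# Positions grouped by each value in my_test
hits_by_value <- lapply(my_test, function(x) which(my_array == x))
names(hits_by_value) <- my_test

# Alternative sketch: group every position by value once, then look up
by_split <- split(seq_along(my_array), my_array)[as.character(my_test)]

print(all_hits)
print(hits_by_value)
```

The first result matches the flat output the question asked for, and the other two match the list form.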

rank() function in R is ranking objects with floating points rather than integers

I'm quite new to R so this may seem quite trivial to many experienced programmers, sorry in advance!
I've got a numeric vector of length 8 that looks like this:
data <- c(45, 67, 23, 24, 5, 23, 45, 23)
When I type in: rank(data), R returns: [1] 6.5 8.0 3.0 5.0 1.0 3.0 6.5 3.0
However with my (very basic) understanding of rank, I expect R to return to me only whole numbers... such as:
[1] 6 8 2 5 1 3 7 4
How can rank() tell me the first element in data has a floating point ranking rather than a whole number ranking? Is it because there are values in data that are repeated and so rank() is trying to handle ties in a way that I am not expecting? If so, please tell me how I can fix this so I can get output that looks like what I previously expected. Also, any information on how rank() deals with NA values would be much appreciated. A basic description on rank() and what bells and whistles can be used would be fantastic! I've looked for videos on youtube and searched stackoverflow to no avail! Thanks so much.
From ?rank:
With some values equal (called ‘ties’), the argument ties.method determines the result at the corresponding indices. The "first" method results in a permutation with increasing values at each index set of ties. The "random" method puts these in random order whereas the default, "average", replaces them by their mean, and "max" and "min" replaces them by their maximum and minimum respectively, the latter being the typical sports ranking.
Sounds like you're using the default setting of "average" for tie breaking, which uses the mean, which is not necessarily an integer.
The built-in documentation should always be your first stop in looking for help. In this case (and most cases), it details all the "bells and whistles"---here there aren't many: just tie-handling and NA-handling. It also has examples at the bottom.
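To make the tie handling concrete, here is a short sketch on the vector from the question. The three-element vector with an NA is a made-up example for the na.last argument.

```r
data <- c(45, 67, 23, 24, 5, 23, 45, 23)

r_average <- rank(data)                         # default: ties share their mean rank
r_first   <- rank(data, ties.method = "first")  # ties broken by position in the vector
r_min     <- rank(data, ties.method = "min")    # every tied value gets the lowest rank
r_max     <- rank(data, ties.method = "max")    # every tied value gets the highest rank

# NA handling is controlled by na.last; "keep" leaves NA in place
r_na <- rank(c(3, NA, 1), na.last = "keep")

print(r_first)
```

With ties.method = "first" the result is 6 8 2 5 1 3 7 4, the whole-number ranking the question expected.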

R: detect changing characters without loop

I'm analyzing a huge dataset of ~700000 rows.
I would like to detect where (in which rows) the value changes from the previous one, without using loops.
For instance, for the vector "dat" below, the ideal function would give c(4,6):
dat <- c("BIS84003", "BIS84003", "BIS84003", "BIS84005", "BIS84005", "BIS84006")
Does anyone have an idea?
Here are two ways of doing this:
1. Use run-length encoding
2. Directly compare vectors
Method 1: Use run length encoding with the function rle().
dat=c("BIS84003", "BIS84003", "BIS84003", "BIS84005", "BIS84005", "BIS84006")
head(cumsum(rle(dat)$lengths) + 1, -1)
[1] 4 6
Method 2: compare vectors
1 + which(dat[-1] != dat[-length(dat)])
[1] 4 6
Another option, using diff() on the factor codes:
which(!!c(0,diff(as.numeric(factor(dat)))))
#[1] 4 6

Why does rainbow() give me a # followed by 8 digits rather than 6?

I tried the function rainbow(3) in R to give me some rainbow colours. However the string that got output was
"#FF0000FF" "#00FF00FF" "#0000FFFF"
I was only expecting 6 digits after the #. Why is it giving 8 digits? Does it have to do with me using a 64-bit machine? How do I convert the above to 6 digits only?
I think you will get a better R color education by looking at:
?rgb
?col2rgb
demo('colors')
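Those help pages describe the #RRGGBBAA form, in which the final two hex digits are the alpha (opacity) channel and FF means fully opaque, so the extra digits are unrelated to a 64-bit machine. A minimal sketch, using the strings from the question as literal input and the base functions substr() and col2rgb():

```r
# The strings reported in the question
cols <- c("#FF0000FF", "#00FF00FF", "#0000FFFF")

# Drop the alpha channel by keeping "#" plus the six RGB hex digits
cols_rgb <- substr(cols, 1, 7)

# Inspect the channels; the "alpha" row shows 255 (fully opaque)
channels <- col2rgb(cols, alpha = TRUE)

print(cols_rgb)
print(channels)
```

Dropping the last two characters is only safe when the alpha is FF. For a partly transparent colour it would change how the colour is drawn.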
