modulus bug in R [duplicate] - r

Just noticed this bug in R. I'm guessing it's the way 0.6 is represented, but anyone know exactly what's going on?
According to R:
0.3 %% 0.2 = 0.1
0.4 %% 0.2 = 0
0.5 %% 0.2 = 0.1
**0.6 %% 0.2 = 0.2**
0.7 %% 0.2 = 0.1
0.8 %% 0.2 = 0
What's going on?

In addition to @Joshua Ulrich's comment,
from ?'%%'
%% and x %/% y can be used for non-integer y, e.g. 1 %/% 0.2, but the results are subject to representation error and so may be platform-dependent. Because the IEC 60559 representation of 0.2 is a binary fraction slightly larger than 0.2, the answer to 1 %/% 0.2 should be 4 but most platforms give 5.
This is also similar to why we get this:
> .1 + .1 + .1 == .3
[1] FALSE
As @Ben Bolker pointed out, you may want to use something like
> 3:8 %% 2 / 10
[1] 0.1 0.0 0.1 0.0 0.1 0.0
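A minimal sketch of the same idea, not taken from the answers above: do the modulus on scaled integers and only then convert back to decimals. The helper name mod_scaled and its digits argument are hypothetical, and the sketch assumes the inputs have at most that many decimal places.

```r
# Hypothetical helper: modulus on scaled integers to sidestep representation error.
# Assumes x and y have at most `digits` decimal places.
mod_scaled <- function(x, y, digits = 1) {
  scale <- 10^digits
  (round(x * scale) %% round(y * scale)) / scale
}

res <- mod_scaled(c(0.3, 0.4, 0.5, 0.6, 0.7, 0.8), 0.2)
print(res)

# For comparisons, a tolerance-based check is safer than ==
print(isTRUE(all.equal(0.1 + 0.1 + 0.1, 0.3)))
```

Rounding after scaling is what removes the representation error, so the helper is only appropriate when the number of decimal places is known in advance.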

Related

Normalize blocks/sub-matrices within a matrix

I want to normalize (i.e., 0-1) blocks/sub-matrices within a square matrix based on row/col names. It is important that the normalized matrix correspond to the original matrix. The below code extracts the blocks, e.g. all col/row names == "A" and normalizes it by its max value. How do I put that matrix of normalized blocks back together so it corresponds to the original matrix, such that each single value of the normalized blocks are in the same place as in the original matrix. I.e. you cannot put the blocks together and then e.g. sort the normalized matrix by the original's matrix row/col names.
#dummy code
mat <- matrix(round(runif(81, 0, 50)), 9, 9)
rownames(mat) <- rep(LETTERS[1:3],3)
colnames(mat) <- rep(LETTERS[1:3],3)
mat.n <- matrix(0,nrow(mat),ncol(mat), dimnames = list(rownames(mat),colnames(mat)))
for(i in 1:length(LETTERS[1:3])){
? <- mat[rownames(mat)==LETTERS[1:3][i],colnames(mat)==LETTERS[1:3][i]] / max(mat[rownames(mat)==LETTERS[1:3][i],colnames(mat)==LETTERS[1:3][i]])
#For example,
mat.n[rownames(mat)==LETTERS[1:3][i],colnames(mat)==LETTERS[1:3][i]] <- # doesn't work
}
UPDATE
Using ave() as @G. Grothendieck suggested works for the blocks, but I'm not sure how it's normalizing beyond that.
mat.n <- mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
Within block the normalization works, e.g.
mat[rownames(mat)=="A",colnames(mat)=="A"]
   A  A  A
A 13 18 15
A 38 33 41
A 12 18 47
mat.n[rownames(mat.n)=="A",colnames(mat.n)=="A"]
          A         A         A
A 0.2765957 0.3829787 0.3191489
A 0.8085106 0.7021277 0.8723404
A 0.2553191 0.3829787 1.0000000
But beyond that, it looks weird.
> round(mat.n,1)
    A   B   C   A   B   C   A   B   C
A 0.3 0.2 0.1 0.4 0.2 1.0 0.3 0.9 1.0
B 0.9 0.8 0.9 0.4 0.5 0.4 0.4 0.9 0.0
C 0.0 0.4 0.4 0.0 0.8 0.5 0.4 0.9 0.0
A 0.8 0.9 0.5 0.7 0.9 0.6 0.9 0.4 0.4
B 0.1 0.8 0.7 1.0 0.3 0.5 0.1 1.0 0.8
C 0.4 0.0 0.2 0.2 0.2 0.6 1.0 0.4 1.0
A 0.3 0.4 0.3 0.4 0.6 0.8 1.0 1.0 0.3
B 0.6 0.2 0.5 0.9 0.3 0.2 0.9 0.3 1.0
C 0.5 0.9 0.7 1.0 0.4 0.5 1.0 1.0 0.9
In this case, I would expect three 1s across the whole matrix, one for each block. But there are 10 1s, e.g. mat.n[3,2], mat.n[1,9]. I'm not sure how this function normalized between blocks.
UPDATE 2
#Original matrix.
#Suggested solution produces `NaN`
mat <- as.matrix(read.csv(text=",1.21,1.1,2.2,1.1,1.1,1.21,2.2,2.2,1.21,1.22,1.22,1.1,1.1,2.2,2.1,2.2,2.1,2.2,2.2,2.2,1.21,2.1,2.1,1.21,1.21,1.21,1.21,1.21,2.2,1.21,2.2,1.1,1.22,1.22,1.22,1.22,1.21,1.22,2.1,2.1,2.1,1.22
1.21,0,0,0,0,0,0,0,0,292,13,0,0,0,0,0,0,0,0,0,0,22,0,0,94,19,79,0,9,0,126,0,0,0,0,0,0,0,0,0,0,0,0
1.1,0,0,0,155,166,0,0,0,0,0,0,4,76,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,34,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.1,0,201,0,0,79,0,0,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.1,0,33,0,91,0,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.21,8,0,0,0,0,0,0,0,404,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,37,26,18,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,162,79,1,0,0,0,0,0,0,0,0,10,0,27,0,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,33,17,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0
1.21,207,0,0,0,0,1644,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,16,17,402,0,0,0,606,0,0,0,0,0,0,0,0,0,0,0,0
1.22,13,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,26,0,0,15,0,0,0,0,0
1.22,0,0,0,0,0,0,0,0,0,71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,374,6,121,6,21,0,0,0,0
1.1,0,0,0,44,0,0,0,0,0,0,0,0,103,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,33,0,0,0,0,0,0,0,0,0,0
1.1,0,0,0,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,18,0,0,0,0,353,116,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,0,5,0
2.2,0,0,0,0,0,0,0,37,0,0,0,0,0,4,0,0,0,36,46,62,0,0,0,0,0,0,0,0,0,0,73,0,0,0,0,0,0,1,0,0,0,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,61,0,0,0,0,0,0,0,38,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
2.2,17,0,23,0,0,0,444,65,0,0,0,0,0,0,0,78,0,0,42,30,15,0,0,0,0,0,0,0,4,0,18,0,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,75,8,0,0,0,0,0,0,0,87,0,74,0,85,0,0,0,0,0,0,0,0,1,0,19,0,25,0,0,0,0,0,0,0,0,0
2.2,0,0,13,0,0,0,12,20,0,0,0,0,0,0,0,118,0,29,92,0,25,0,0,0,0,0,0,0,0,0,16,0,48,0,0,0,0,0,0,0,0,0
1.21,14,0,1,0,0,0,0,0,17,0,0,0,0,0,0,0,0,0,0,14,0,0,0,0,0,0,0,0,3,0,20,0,0,0,0,0,0,0,0,0,0,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,204,0,0,0,0,0,0,0,133,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,44,0,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,67,0,0,0,0,0,0,143,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,12,15,0
1.21,79,0,0,0,0,0,0,0,34,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,38,26,6,9,0,112,0,0,0,0,0,0,0,0,0,0,0,0
1.21,11,0,0,0,0,17,0,0,49,0,0,0,0,0,0,0,0,0,0,0,0,0,0,28,0,0,0,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.21,40,0,0,0,0,0,0,0,122,0,0,0,0,0,0,0,0,0,0,0,3,0,0,24,11,0,887,20,0,389,0,0,0,0,0,0,0,0,0,0,0,0
1.21,14,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,50,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.21,34,0,0,0,0,26,0,0,56,0,0,0,0,0,0,0,0,0,0,0,0,0,0,54,9,297,13,0,0,16,0,0,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,39,0,0,0,0,0,0,0,0,25,0,17,12,20,25,0,0,0,0,0,0,0,0,0,393,0,7,0,0,0,0,0,0,0,0,0
1.21,177,0,0,0,0,8,0,0,775,0,0,0,0,0,0,0,0,0,0,0,0,0,0,113,0,227,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2.2,0,0,0,0,0,0,21,17,0,0,0,0,0,0,0,0,0,42,30,16,0,0,0,0,0,0,0,0,165,0,0,0,0,0,0,0,0,0,0,0,0,0
1.1,0,6,0,28,0,0,0,0,0,0,0,9,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.22,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,4,37,0,0,0,0,0,0,0,0,3,0,0,0,0,14,7,0,0,18,0,0,0,0
1.22,0,0,0,0,0,0,0,0,0,44,785,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,21,0,44,177,13,24,0,0,0,0
1.22,0,0,0,0,0,0,30,0,0,182,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,12,0,1231,135,17,0,0,0,0
1.22,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73,1308,0,669,16,0,0,0,8
1.21,0,0,0,0,0,0,0,0,0,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,33,197,626,0,44,0,0,0,0
1.22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,37,12,80,0,0,0,0,16
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,54,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,27,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,75,0,0,0
2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,58,0,1,0,0,0,0,28,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,61,2,0,0
1.22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,9,0,0,0,0"))
ids <- read.csv(text=",x
1,1.21
2,1.1
3,2.2
4,1.1
5,1.1
6,1.21
7,2.2
8,2.2
9,1.21
10,1.22
11,1.22
12,1.1
13,1.1
14,2.2
15,2.1
16,2.2
17,2.1
18,2.2
19,2.2
20,2.2
21,1.21
22,2.1
23,2.1
24,1.21
25,1.21
26,1.21
27,1.21
28,1.21
29,2.2
30,1.21
31,2.2
32,1.1
33,1.22
34,1.22
35,1.22
36,1.22
37,1.21
38,1.22
39,2.1
40,2.1
41,2.1
42,1.22")
mat <- mat[,-1]
rownames(mat) <- ids$x
colnames(mat) <- ids$x
ans <- mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
Any help is much appreciated, thanks.
Use ave to get the maxima:
mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
For example, there are 9 ones, as expected, and there is one 1 in each block, also as expected. (There could be more than 9 if the matrix happened to have multiple maxima in one or more blocks, but there should not be fewer than 9.)
set.seed(123)
mat <- matrix(round(runif(81, 0, 50)), 9, 9)
rownames(mat) <- rep(LETTERS[1:3],3)
colnames(mat) <- rep(LETTERS[1:3],3)
ans <- mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
sum(ans == 1)
## [1] 9
# there are no duplicates (i.e. a block showing up more than once) hence
# there is exactly one 1 in each block
w <- which(ans == 1, arr.ind = TRUE)
anyDuplicated(cbind(rownames(mat)[w[, 1]], colnames(mat)[w[, 2]]))
## [1] 0
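To see why 9 ones (rather than 3) are expected, it may help to label each cell with its block. This is only an illustrative sketch: the variable names labs and block are made up here, and the matrix is filled with zeros because only the dimnames matter for the grouping.

```r
# Same row/column labels as in the example above
labs <- rep(LETTERS[1:3], 3)
mat <- matrix(0, 9, 9, dimnames = list(labs, labs))

# Each cell gets the label "<row name><column name>", which is what ave() groups by
block <- paste0(rownames(mat)[row(mat)], colnames(mat)[col(mat)])
print(table(block))
```

There are 9 distinct row/column label pairs (AA, AB, ..., CC), each covering 9 cells, so every pair is normalized by its own maximum.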
ADDED
If some blocks are entirely zero (which is the case in UPDATE 2) then you will get NaNs for those blocks. If you want 0s instead for the all-zero blocks try this:
xmax <- function(x) if (all(x == 0)) 0 else x/max(x)
ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = xmax)
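As a small check of the all-zero case, here is a sketch on a made-up 4x4 matrix (the matrix m and the labels "a" and "b" are hypothetical, chosen so that the off-diagonal blocks are entirely zero):

```r
# Hypothetical matrix: blocks (a,b) and (b,a) contain only zeros
m <- matrix(c(2, 4, 0, 0,
              1, 8, 0, 0,
              0, 0, 3, 6,
              0, 0, 9, 3),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("a", "a", "b", "b"), c("a", "a", "b", "b")))

# Plain division by the block maximum gives NaN (0/0) in the all-zero blocks
plain <- m / ave(m, rownames(m)[row(m)], colnames(m)[col(m)], FUN = max)

# xmax as defined above returns 0 for an all-zero block instead
xmax <- function(x) if (all(x == 0)) 0 else x / max(x)
m.n <- ave(m, rownames(m)[row(m)], colnames(m)[col(m)], FUN = xmax)
print(m.n)
```

Here plain contains NaN in the two zero blocks, while m.n has 0 there and exactly one 1 in each of the two non-zero blocks.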

Round sequence of numbers to chosen numbers

I have a vector of numbers from 0 to 1. I'd like to divide them into X groups - for example if X=5, then round the numbers down to 5 groups: all numbers from 0 to 0.2 will be 0, all from 0.2 to 0.4 will be 0.2, etc.
For example, if I have x <- c(0.34,0.07,0.56) and X=5 like the above explanation, I'll get (0.2, 0, 0.4).
So far, the only way I found to do that is by looping over the entire vector. Is there a more elegant way to do that?
You can simply do:
floor(x*X)/X
# [1] 0.2 0.0 0.4
More test cases:
X = 10
floor(x*X)/X
# [1] 0.3 0.0 0.5
X = 2
floor(x*X)/X
# [1] 0.0 0.0 0.5
X = 5
floor(x*X)/X
# [1] 0.2 0.0 0.4
Data:
x <- c(0.34,0.07,0.56)
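One edge case worth noting: with floor(x*X)/X, a value of exactly 1 maps to 1, which would be an extra group beyond the X requested. A possible sketch that folds it into the top group (the helper name bin_floor is made up for illustration):

```r
# Hypothetical helper: like floor(x * X) / X, but x == 1 is clamped into the last group
bin_floor <- function(x, X) pmin(floor(x * X), X - 1) / X

binned <- bin_floor(c(0.34, 0.07, 0.56, 1), 5)
print(binned)
```

Whether 1 should belong to the top group or form its own is a design choice; drop the pmin() if the latter is wanted.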
Try:
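cut.alt <- function(x, X) {
  # X + 1 breaks (0, 1/X, ..., 1) give X left-closed bins [a, b);
  # include.lowest = TRUE keeps x == 1 in the last bin
  out <- cut(x, breaks=(0:X)/X, right=FALSE, include.lowest=TRUE)
  levels(out) <- as.character((0:(X-1))/X)
  out
}
cut with breaks set to (0:X)/X divides the vector x into X groups like OP asks; right=FALSE makes each group include its lower bound. Then changing the levels to the lower cutoff of each group gives the answer (as a factor).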
Or using plyr:
library(plyr)
round_any(x, 1/X,floor)
# [1] 0.2 0.0 0.4

Trouble transforming a data set in R; making a look up table

I would like to transform my data set, which has sample numbers, treatment days and concentrations (the variable), into a single matrix where the cells are filled with only concentration values. My output is a lookup table, where the user can look up a sample number along the 1st row and a day along the first column (header), and follow these along to get a concentration.
This is not my data set (it comes as a matrix), however I quickly made these three for the example.
Samplenb <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
Day <- c(1,5,10,15,1,5,10,15,1,5,10,15,1,5,10,15)
Concentration <- c(0.2, 0.3, 0.5, 0.9,0.2, 0.3, 0.5, 0.9,0.2, 0.3, 0.5, 0.9,0.2, 0.3, 0.5, 0.9)
Any help is much appreciated. I have been playing around with the reshape package functions. However, they do not seem suitable.
Thank you for taking the time to help me!
For variety (and since you mentioned "reshape"), here are a few options (though MrFlick's is by far the most appropriate).
The first two options assume we have grouped your vectors into a data.frame:
DF <- data.frame(Samplenb, Day, Concentration)
Option 1: reshape
reshape(DF, direction = "wide", idvar = "Day", timevar = "Samplenb")
#   Day Concentration.1 Concentration.2 Concentration.3 Concentration.4
# 1   1             0.2             0.2             0.2             0.2
# 2   5             0.3             0.3             0.3             0.3
# 3  10             0.5             0.5             0.5             0.5
# 4  15             0.9             0.9             0.9             0.9
Option 2: dcast from "reshape2"
library(reshape2)
dcast(DF, Day ~ Samplenb, value.var="Concentration")
#   Day   1   2   3   4
# 1   1 0.2 0.2 0.2 0.2
# 2   5 0.3 0.3 0.3 0.3
# 3  10 0.5 0.5 0.5 0.5
# 4  15 0.9 0.9 0.9 0.9
Option 3: A manual approach--should be fast, but unless you're a coding masochist, best left as a lesson in matrix indexing in R.
Nrow <- unique(Day)
Ncol <- unique(Samplenb)
M <- matrix(0, nrow = length(Nrow), ncol = length(Ncol),
dimnames = list(Nrow, Ncol))
M[cbind(match(Day, rownames(M)), match(Samplenb, colnames(M)))] <- Concentration
#      1   2   3   4
# 1  0.2 0.2 0.2 0.2
# 5  0.3 0.3 0.3 0.3
# 10 0.5 0.5 0.5 0.5
# 15 0.9 0.9 0.9 0.9
Good ol' xtabs can help out here
xtabs(Concentration ~ Day + Samplenb)
will produce
    Samplenb
Day    1   2   3   4
  1  0.2 0.2 0.2 0.2
  5  0.3 0.3 0.3 0.3
  10 0.5 0.5 0.5 0.5
  15 0.9 0.9 0.9 0.9
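Since the goal is a lookup table, here is a rough sketch of how a single value could be read back out of the xtabs result. The wrapper name lookup is hypothetical; the key point is that the table is indexed by name (as character), not by position, because day 10 is the third row rather than the tenth.

```r
Samplenb <- rep(1:4, each = 4)
Day <- rep(c(1, 5, 10, 15), times = 4)
Concentration <- rep(c(0.2, 0.3, 0.5, 0.9), times = 4)

tab <- xtabs(Concentration ~ Day + Samplenb)

# Hypothetical helper: look up by day and sample number using the dimnames
lookup <- function(day, sample) {
  as.numeric(tab[as.character(day), as.character(sample)])
}

value <- lookup(10, 3)
print(value)
```

Note that xtabs sums the values in each cell, so this only acts as a lookup table when every Day/Samplenb pair occurs once, as it does in the example data.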

How do you generate a regular non-integer sequence in julia?

How are regular, non-integer sequences generated in julia?
I'm trying to get 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
In MATLAB, I would use
0.1:0.1:1
And in R
seq(0.1, 1, by = 0.1)
But I can't find anything except integer sequences in julia (e.g., 1:10). Searching for "sequence" in the docs only gives me information about how strings are sequences.
Similarly to Matlab, but with the difference that 0.1:0.1:1 defines a Range:
julia> typeof(0.1:0.1:1)
Range{Float64} (constructor with 3 methods)
and thus if an Array is needed:
julia> [0.1:0.1:1]
10-element Array{Float64,1}:
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Unfortunately, this use of Range is only briefly mentioned at this point of the documentation.
Edit: As mentioned in the comments by @ivarne, it is possible to achieve a similar result using linspace:
julia> linspace(.1,1,10)
10-element Array{Float64,1}:
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
but note that the results are not exactly the same due to rounding differences:
julia> linspace(.1,1,10)==[0.1:0.1:1]
false
The original answer is now deprecated. You should use collect() to generate a sequence.
## In Julia
> collect(0.1:0.1:1)
10-element Array{Float64,1}:
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
## In R
> seq(0, 1, .1)
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
They are generated the same way as in Matlab
julia> sequence = 0:.1:1
0.0:0.1:1.0
Alternatively, you can use the range() function, which allows you to specify the length, step size, or both
julia> range(0, 1, length = 5)
0.0:0.25:1.0
julia> range(0, 1, step = .01)
0.0:0.01:1.0
julia> range(0, step = .01, length = 5)
0.0:0.01:0.04
You can still do all of the things you would normally do with a vector, e.g. indexing
julia> sequence[4]
0.3
math and stats...
julia> sum(sequence)
5.5
julia> using Statistics
julia> mean(sequence)
0.5
This will (in most cases) work the same way as a vector, but nothing is actually allocated. It can be tempting to materialize the vector, but in most cases you shouldn't (it's less performant). This works because
julia> sequence isa AbstractArray
true
If you truly need the vector, you can collect(), splat (...) or use a comprehension:
julia> v = collect(sequence)
11-element Array{Float64,1}:
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
julia> v == [sequence...] == [x for x in sequence]
true

1 %% 0.1 = 0.1 AND 1 %% 0.2 = 0.2?

Am I missing something, or is this a bug in the base package?
I am running on R-2.15.2, on Win 7-32
x %% y modulus (x mod y) 5 %% 2 is 1
from : http://www.statmethods.net/management/operators.html
> 1 %% 0.1
[1] 0.1
> 1 %% 0.2
[1] 0.2
Both of them should be 0.
The examples below work as expected.
For example:
1 %% 0.15
must be 0.1 ( 1.0 = 6 x 0.15 + 0.1)
> 1 %% 0.11 # expected result
[1] 0.01
> 1 %% 0.15
[1] 0.1
> 1 %% 0.3
[1] 0.1
> 1 %% 0.4
[1] 0.2
> 1 %% 0.5
[1] 0
First of all, I cannot reproduce this using R version 2.15.1 running on x86_64.
If that's what happens in your environment, this almost certainly has to do with the fact that neither 0.1 nor 0.2 can be represented exactly using binary floating-point arithmetic:
> sprintf("%.20f", 0.1)
[1] "0.10000000000000000555"
> sprintf("%.20f", 0.2)
[1] "0.20000000000000001110"
The documentation for %% has the following to say:
%% and x %/% y can be used for non-integer y, e.g. 1 %/% 0.2, but the results are subject to representation error and so may be platform-dependent. Because the IEC 60559 representation of 0.2 is a binary fraction slightly larger than 0.2, the answer to 1 %/% 0.2 should be 4 but most platforms give 5.
There are many other similar pitfalls having to do with the properties of floating-point arithmetic (not just in R). The classic paper on the subject is What Every Computer Scientist Should Know About Floating-Point Arithmetic.
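If the practical goal is to test whether x is a multiple of y, one possible workaround is to compare the remainder against a tolerance rather than against exact zero. This is only a sketch: the helper name near_multiple and the default tolerance are assumptions, not anything provided by base R.

```r
# Hypothetical helper: TRUE when x is a multiple of y up to a tolerance.
# The remainder may land near 0 or near y, depending on which way the
# representation error falls, so both ends are checked.
near_multiple <- function(x, y, tol = 1e-8) {
  r <- x %% y
  pmin(abs(r), abs(y - r)) < tol
}

print(near_multiple(1, c(0.1, 0.2, 0.3)))  # TRUE TRUE FALSE
```

This gives the same answer whether the platform reports 1 %% 0.2 as 0 or as 0.2.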
