Round sequence of numbers to chosen numbers - r
I have a vector of numbers between 0 and 1. I'd like to divide them into X groups. For example, if X = 5, the numbers are rounded into 5 groups: all numbers from 0 to 0.2 become 0, all from 0.2 to 0.4 become 0.2, etc.
For example, if I have x <- c(0.34,0.07,0.56) and X=5 like the above explanation, I'll get (0.2, 0, 0.4).
So far, the only way I have found to do that is by looping over the entire vector. Is there a more elegant way to do that?
You can simply do:
floor(x*X)/X
# [1] 0.2 0.0 0.4
More test cases:
X = 10
floor(x*X)/X
# [1] 0.3 0.0 0.5
X = 2
floor(x*X)/X
# [1] 0.0 0.0 0.5
X = 5
floor(x*X)/X
# [1] 0.2 0.0 0.4
Data:
x <- c(0.34,0.07,0.56)
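A minimal sketch of the same floor(x*X)/X idea wrapped in a helper. The name floor_to_groups and the eps tolerance are assumptions, not part of the answer. It handles two edge cases: x == 1 would otherwise form an extra group at 1, and floating-point error can push x * X just below an integer (in double precision, 0.29 * 100 is 28.999999999999996).

```r
# Hypothetical helper built on floor(x * X) / X.
# eps (an assumed default) absorbs tiny floating-point errors in x * X;
# use eps = 0 to reproduce floor(x * X) / X exactly.
floor_to_groups <- function(x, X, eps = 1e-9) {
  k <- floor(x * X + eps)  # 0-based group index
  k <- pmin(k, X - 1)      # keep x == 1 in the top group
  k / X
}

x <- c(0.34, 0.07, 0.56, 1)
floor_to_groups(x, 5)
# [1] 0.2 0.0 0.4 0.8

floor(0.29 * 100) / 100     # 0.28: the product is 28.999999999999996
floor_to_groups(0.29, 100)  # 0.29
```

The trade-off of eps is that a value lying less than eps / X below a group boundary is also pushed up into the next group.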
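Try:
cut.alt <- function(x, X) {
  # X + 1 breaks 0, 1/X, ..., 1 give X groups [0, 1/X), [1/X, 2/X), ..., [(X-1)/X, 1]
  out <- cut(x, breaks = (0:X)/X, right = FALSE, include.lowest = TRUE)
  # label each group with its lower cutoff
  levels(out) <- as.character((0:(X-1))/X)
  # return numbers instead of a factor
  as.numeric(as.character(out))
}
cut with breaks set to (0:X)/X divides the vector x into groups like the OP asks. right = FALSE closes each interval on the left, so a value such as 0.2 lands in the 0.2 group (matching floor), and include.lowest = TRUE keeps 1 in the last group. Then changing the levels to the value of each lower cutoff and converting back to numbers gives the answer.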
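Another base-R route is findInterval(), which returns the index of the interval each value falls into instead of building a factor. The sketch below is an assumption, not taken from the answers; group_start is a hypothetical name.

```r
# Sketch using base R's findInterval(); group_start is a hypothetical name.
group_start <- function(x, X) {
  breaks <- (0:X) / X
  # i is the index with breaks[i] <= x < breaks[i + 1];
  # rightmost.closed = TRUE puts x == 1 into the last group
  i <- findInterval(x, breaks, rightmost.closed = TRUE)
  # values outside [0, 1] give i == 0 or i == X + 1; map them to NA
  # (otherwise breaks[0] would silently drop them from the result)
  i[i < 1 | i > X] <- NA
  breaks[i]
}

group_start(c(0.34, 0.07, 0.56, 1), 5)
# [1] 0.2 0.0 0.4 0.8
group_start(c(-0.1, 1.5), 5)
# [1] NA NA
```

Returning NA for out-of-range inputs is a design choice; the plain floor(x*X)/X version would instead return values below 0 or at and above 1.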
Or using plyr:
library(plyr)
round_any(x, 1/X,floor)
# [1] 0.2 0.0 0.4
Related
How to collect outputs of multivariable vector-valued function into a dataframe?
I have a function f1 that takes a pair of real numbers (x, y) and returns a triple of real numbers. I would like to collect all outputs of this function for all x in a vector a and y in a vector b. Could you please elaborate on how to do so?
f1 <- function(x, y){
  return (c(x+y, x-y, x*y))
}
a <- seq(0, pi, 0.1)
b <- seq(0, 2 * pi, 0.1)
Update: I mean for all pairs $(x, y) \in a \times b$.
Here is a data.table option
library(data.table)
setDT(expand.grid(a, b))[, fval := do.call(Vectorize(f1, SIMPLIFY = FALSE), unname(.SD))][]
where expand.grid + do.call + Vectorize are used, giving
      Var1 Var2              fval
   1:  0.0  0.0             0,0,0
   2:  0.1  0.0       0.1,0.1,0.0
   3:  0.2  0.0       0.2,0.2,0.0
   4:  0.3  0.0       0.3,0.3,0.0
   5:  0.4  0.0       0.4,0.4,0.0
  ---
2012:  2.7  6.2  8.90,-3.50,16.74
2013:  2.8  6.2  9.00,-3.40,17.36
2014:  2.9  6.2  9.10,-3.30,17.98
2015:  3.0  6.2     9.2,-3.2,18.6
2016:  3.1  6.2  9.30,-3.10,19.22
A more compact option is using CJ(a, b) instead of setDT(expand.grid(a, b)) (thanks to @akrun's advice); note that the row order then differs from expand.grid.
We can use expand.grid to expand the data between the 'a' and 'b' values, then loop over the rows with apply (MARGIN = 1) and apply f1:
out <- as.data.frame(t(apply(expand.grid(a, b), 1, function(x) f1(x[1], x[2]))))
Or with the tidyverse, using a variant f2 that returns a tibble:
library(dplyr)
library(purrr)
library(tidyr)
f2 <- function(x, y){
  return (tibble(add = x+y, subtract = x-y, multiply = x*y))
}
out2 <- crossing(x = a, y = b) %>%
  pmap_dfr(f2)
Output:
head(out2)
# A tibble: 6 x 3
#     add subtract multiply
#   <dbl>    <dbl>    <dbl>
# 1   0        0          0
# 2   0.1     -0.1        0
# 3   0.2     -0.2        0
# 4   0.3     -0.3        0
# 5   0.4     -0.4        0
# 6   0.5     -0.5        0
It may be better to have the function return a list or tibble, since that makes it easier to combine the results into one data frame.
Create all possible combinations with expand.grid and use Map to apply f1 to every pair.
val <- expand.grid(a, b)
result <- do.call(rbind, Map(f1, val$Var1, val$Var2))
head(result)
#      [,1] [,2] [,3]
# [1,]  0.0  0.0    0
# [2,]  0.1  0.1    0
# [3,]  0.2  0.2    0
# [4,]  0.3  0.3    0
# [5,]  0.4  0.4    0
# [6,]  0.5  0.5    0
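Because this particular f1 only uses vectorised arithmetic, a single call can also handle every pair at once, with no apply/Map loop. This is a sketch that relies on that assumption; for a function that is not vectorised, use the approaches above. The column names add/subtract/multiply are borrowed from the tidyverse answer.

```r
f1 <- function(x, y) c(x + y, x - y, x * y)
a <- seq(0, pi, 0.1)
b <- seq(0, 2 * pi, 0.1)

# All (x, y) pairs; x varies fastest, as with expand.grid(a, b) above
g <- expand.grid(x = a, y = b)

# One call returns c(all sums, all differences, all products);
# matrix() fills column by column, so ncol = 3 splits it back into three columns
res <- matrix(f1(g$x, g$y), ncol = 3,
              dimnames = list(NULL, c("add", "subtract", "multiply")))
out <- cbind(g, res)
dim(out)  # 2016 rows, 5 columns
```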
Subset does not work with some numeric values but with others
I ran into a very strange problem that I don't know how to solve and have never seen before. I can subset a data.frame for some but not for other numeric values. Here is the data I use:
library(dplyr)
ws <- seq(0, 1, by=.1)
kombos <- expand.grid(weightjaw2 = ws, weightjaw3 = ws) %>% as.data.frame
kombos$kombi <- 1:nrow(kombos)
kombos$weightjaw2 <- as.numeric(kombos$weightjaw2)
kombos$weightjaw3 <- as.numeric(kombos$weightjaw3)
class(kombos$weightjaw2)
[1] "numeric"
Now, I need to subset this data.frame. This works well, for example, for the value 0.1:
kombos %>% filter(weightjaw2==0.1)
   weightjaw2 weightjaw3 kombi
1         0.1        0.0     2
2         0.1        0.1    13
3         0.1        0.2    24
4         0.1        0.3    35
5         0.1        0.4    46
6         0.1        0.5    57
7         0.1        0.6    68
8         0.1        0.7    79
9         0.1        0.8    90
10        0.1        0.9   101
11        0.1        1.0   112
Strangely enough, this does not work for the values 0.3, 0.6 and 0.7.
kombos %>% filter(weightjaw2==0.3)
[1] weightjaw2 weightjaw3 kombi
<0 rows> (or 0-length row.names)
The same holds for subset(kombos, weightjaw2==0.3). Why is that and how can I solve this?
EDIT
I solved this using dplyr::near():
kombos %>% filter(near(weightjaw2, 0.3))
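The == operator requires the left- and right-hand sides to be exactly equal. The 'weightjaw2' column is not exactly equal to 0.3 because of floating-point representation: seq(0, 1, by = .1) computes the fourth value as 3 * 0.1, which is stored as 0.30000000000000004. One option is to convert the column to character inside filter; as.character() keeps only 15 significant digits, so the value becomes "0.3" and matches:
library(dplyr)
kombos %>% filter(as.character(weightjaw2) == 0.3)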
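A small base-R sketch of what happens in the question, plus two other ways round it: a tolerance-based comparison (the same idea as dplyr::near(), whose default tolerance is .Machine$double.eps^0.5) and rounding the grid once when it is built. The name ws_rounded is illustrative only.

```r
ws <- seq(0, 1, by = 0.1)

# seq() builds the grid as 0 + (0:10) * 0.1, so ws[4] is 3 * 0.1
ws[4] == 0.3               # FALSE
print(ws[4], digits = 17)  # 0.30000000000000004
which(ws == 0.1)           # 2: this product happens to be exact
which(ws == 0.3)           # integer(0)

# Tolerance-based comparison, as dplyr::near() does
which(abs(ws - 0.3) < sqrt(.Machine$double.eps))  # 4

# Or remove the representation error once, when the grid is built
ws_rounded <- round(seq(0, 1, by = 0.1), 1)
which(ws_rounded == 0.3)   # 4
```

Rounding at creation time keeps later == filters working, at the cost of assuming every value is meant to have exactly one decimal place.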
R: Apply function on data frame A dependent on values of data frame B
I have two data frames A and B.
A = data.frame(x = c(3,-4,2), y=c(-4,7,1), z=c(-5,-1,6))
B = data.frame(x = c(0.5,0.9,0.3), y=c(0.7,0.2,0.1), z=c(0.9,0.8,0.6))
If a value in A is negative, the corresponding value in B (at the same position as in A) should be subtracted from 1. If the value in A is positive, the corresponding value in B should not change. In the end B should look like this:
    x   y   z
1 0.5 0.3 0.1
2 0.1 0.2 0.2
3 0.3 0.1 0.6
Does anyone have an idea how this problem can be solved? Thanks in advance, Christian
This seems to work:
B[A<0] <- 1 - B[A<0]
B
    x   y   z
1 0.5 0.3 0.1
2 0.1 0.2 0.2
3 0.3 0.1 0.6
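The one-liner works because A < 0 on a data frame returns a logical matrix with the same shape as B, which can index B directly. If you would rather keep B unchanged, a sketch of the same logic with ifelse() (B_new is a hypothetical name, not from the answer) returns a new data frame:

```r
A <- data.frame(x = c(3, -4, 2), y = c(-4, 7, 1), z = c(-5, -1, 6))
B <- data.frame(x = c(0.5, 0.9, 0.3), y = c(0.7, 0.2, 0.1), z = c(0.9, 0.8, 0.6))

neg <- A < 0  # logical matrix, same shape as A and B
# ifelse() works element-wise and keeps the matrix shape of 'neg'
B_new <- as.data.frame(ifelse(neg, 1 - as.matrix(B), as.matrix(B)))
B_new
#     x   y   z
# 1 0.5 0.3 0.1
# 2 0.1 0.2 0.2
# 3 0.3 0.1 0.6
```

Values such as 1 - 0.9 are not exactly 0.1 in floating point, so compare results with all.equal() rather than ==.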
Normalize blocks/sub-matrices within a matrix
I want to normalize (i.e., scale to 0-1) blocks/sub-matrices within a square matrix based on row/col names. It is important that the normalized matrix corresponds to the original matrix.
The code below extracts the blocks, e.g. all entries whose row/col names == "A", and normalizes each block by its max value. How do I put the matrix of normalized blocks back together so that it corresponds to the original matrix, i.e. each single value of the normalized blocks is in the same place as in the original matrix? In other words, you cannot put the blocks together and then, e.g., sort the normalized matrix by the original matrix's row/col names.
#dummy code
mat <- matrix(round(runif(90, 0, 50),),9,9)
rownames(mat) <- rep(LETTERS[1:3],3)
colnames(mat) <- rep(LETTERS[1:3],3)
mat.n <- matrix(0,nrow(mat),ncol(mat), dimnames = list(rownames(mat),colnames(mat)))
for(i in 1:length(LETTERS[1:3])){
  ? <- mat[rownames(mat)==LETTERS[1:3][i],colnames(mat)==LETTERS[1:3][i]] /
    max(mat[rownames(mat)==LETTERS[1:3][i],colnames(mat)==LETTERS[1:3][i]])
  #For example,
  mat.n[rownames(mat)==LETTERS[1:3][i],colnames(mat)==LETTERS[1:3][i]] <- # doesn't work
}
UPDATE
Using ave() as @G. Grothendieck suggested works for the blocks, but I'm not sure how it's normalizing beyond that.
mat.n <- mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
Within a block the normalization works, e.g.
mat[rownames(mat)=="A",colnames(mat)=="A"]
   A  A  A
A 13 18 15
A 38 33 41
A 12 18 47
mat.n[rownames(mat.n)=="A",colnames(mat.n)=="A"]
          A         A         A
A 0.2765957 0.3829787 0.3191489
A 0.8085106 0.7021277 0.8723404
A 0.2553191 0.3829787 1.0000000
But beyond that, it looks weird.
> round(mat.n,1) A B C A B C A B C A 0.3 0.2 0.1 0.4 0.2 1.0 0.3 0.9 1.0 B 0.9 0.8 0.9 0.4 0.5 0.4 0.4 0.9 0.0 C 0.0 0.4 0.4 0.0 0.8 0.5 0.4 0.9 0.0 A 0.8 0.9 0.5 0.7 0.9 0.6 0.9 0.4 0.4 B 0.1 0.8 0.7 1.0 0.3 0.5 0.1 1.0 0.8 C 0.4 0.0 0.2 0.2 0.2 0.6 1.0 0.4 1.0 A 0.3 0.4 0.3 0.4 0.6 0.8 1.0 1.0 0.3 B 0.6 0.2 0.5 0.9 0.3 0.2 0.9 0.3 1.0 C 0.5 0.9 0.7 1.0 0.4 0.5 1.0 1.0 0.9 In this case, I would expect 3 1s across the whole matrix- 1 for each block. But there're 10 1s, e.g. mat.n[3,2], mat.n[1,9]. I'm not sure how this function normalized between blocks. UPDATE 2 #Original matrix. #Suggested solution produces `NaN` mat <- as.matrix(read.csv(text=",1.21,1.1,2.2,1.1,1.1,1.21,2.2,2.2,1.21,1.22,1.22,1.1,1.1,2.2,2.1,2.2,2.1,2.2,2.2,2.2,1.21,2.1,2.1,1.21,1.21,1.21,1.21,1.21,2.2,1.21,2.2,1.1,1.22,1.22,1.22,1.22,1.21,1.22,2.1,2.1,2.1,1.22 1.21,0,0,0,0,0,0,0,0,292,13,0,0,0,0,0,0,0,0,0,0,22,0,0,94,19,79,0,9,0,126,0,0,0,0,0,0,0,0,0,0,0,0 1.1,0,0,0,155,166,0,0,0,0,0,0,4,76,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,34,0,0,0,0,0,0,0,0,0,0 2.2,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 1.1,0,201,0,0,79,0,0,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 1.1,0,33,0,91,0,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 1.21,8,0,0,0,0,0,0,0,404,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,37,26,18,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0 2.2,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,162,79,1,0,0,0,0,0,0,0,0,10,0,27,0,0,0,0,0,0,0,0,0,0,0 2.2,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,33,17,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0 1.21,207,0,0,0,0,1644,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,16,17,402,0,0,0,606,0,0,0,0,0,0,0,0,0,0,0,0 1.22,13,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,26,0,0,15,0,0,0,0,0 1.22,0,0,0,0,0,0,0,0,0,71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,374,6,121,6,21,0,0,0,0 1.1,0,0,0,44,0,0,0,0,0,0,0,0,103,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,33,0,0,0,0,0,0,0,0,0,0 
1.1,0,0,0,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0 2.2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,18,0,0,0,0,353,116,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,0,5,0 2.2,0,0,0,0,0,0,0,37,0,0,0,0,0,4,0,0,0,36,46,62,0,0,0,0,0,0,0,0,0,0,73,0,0,0,0,0,0,1,0,0,0,0 2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,61,0,0,0,0,0,0,0,38,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0 2.2,17,0,23,0,0,0,444,65,0,0,0,0,0,0,0,78,0,0,42,30,15,0,0,0,0,0,0,0,4,0,18,0,0,0,0,0,0,0,0,0,0,0 2.2,0,0,0,0,0,0,75,8,0,0,0,0,0,0,0,87,0,74,0,85,0,0,0,0,0,0,0,0,1,0,19,0,25,0,0,0,0,0,0,0,0,0 2.2,0,0,13,0,0,0,12,20,0,0,0,0,0,0,0,118,0,29,92,0,25,0,0,0,0,0,0,0,0,0,16,0,48,0,0,0,0,0,0,0,0,0 1.21,14,0,1,0,0,0,0,0,17,0,0,0,0,0,0,0,0,0,0,14,0,0,0,0,0,0,0,0,3,0,20,0,0,0,0,0,0,0,0,0,0,0 2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,204,0,0,0,0,0,0,0,133,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,44,0,0 2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,67,0,0,0,0,0,0,143,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,12,15,0 1.21,79,0,0,0,0,0,0,0,34,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,38,26,6,9,0,112,0,0,0,0,0,0,0,0,0,0,0,0 1.21,11,0,0,0,0,17,0,0,49,0,0,0,0,0,0,0,0,0,0,0,0,0,0,28,0,0,0,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0 1.21,40,0,0,0,0,0,0,0,122,0,0,0,0,0,0,0,0,0,0,0,3,0,0,24,11,0,887,20,0,389,0,0,0,0,0,0,0,0,0,0,0,0 1.21,14,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,50,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 1.21,34,0,0,0,0,26,0,0,56,0,0,0,0,0,0,0,0,0,0,0,0,0,0,54,9,297,13,0,0,16,0,0,0,0,0,0,0,0,0,0,0,0 2.2,0,0,0,0,0,0,39,0,0,0,0,0,0,0,0,25,0,17,12,20,25,0,0,0,0,0,0,0,0,0,393,0,7,0,0,0,0,0,0,0,0,0 1.21,177,0,0,0,0,8,0,0,775,0,0,0,0,0,0,0,0,0,0,0,0,0,0,113,0,227,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0 2.2,0,0,0,0,0,0,21,17,0,0,0,0,0,0,0,0,0,42,30,16,0,0,0,0,0,0,0,0,165,0,0,0,0,0,0,0,0,0,0,0,0,0 1.1,0,6,0,28,0,0,0,0,0,0,0,9,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 1.22,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,4,37,0,0,0,0,0,0,0,0,3,0,0,0,0,14,7,0,0,18,0,0,0,0 
1.22,0,0,0,0,0,0,0,0,0,44,785,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,21,0,44,177,13,24,0,0,0,0 1.22,0,0,0,0,0,0,30,0,0,182,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,12,0,1231,135,17,0,0,0,0 1.22,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73,1308,0,669,16,0,0,0,8 1.21,0,0,0,0,0,0,0,0,0,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,33,197,626,0,44,0,0,0,0 1.22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,37,12,80,0,0,0,0,16 2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,54,0 2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,27,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,75,0,0,0 2.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,58,0,1,0,0,0,0,28,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,61,2,0,0 1.22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,9,0,0,0,0")) ids <- read.csv(text=",x 1,1.21 2,1.1 3,2.2 4,1.1 5,1.1 6,1.21 7,2.2 8,2.2 9,1.21 10,1.22 11,1.22 12,1.1 13,1.1 14,2.2 15,2.1 16,2.2 17,2.1 18,2.2 19,2.2 20,2.2 21,1.21 22,2.1 23,2.1 24,1.21 25,1.21 26,1.21 27,1.21 28,1.21 29,2.2 30,1.21 31,2.2 32,1.1 33,1.22 34,1.22 35,1.22 36,1.22 37,1.21 38,1.22 39,2.1 40,2.1 41,2.1 42,1.22") mat <- mat[,-1] rownames(mat) <- ids$x colnames(mat) <- ids$x ans <- mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max) Any help is much appreciated, thanks.
Use ave to get the maxima:
mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
Each (row label, column label) pair is its own block, so with 3 row labels and 3 column labels there are 9 blocks, not 3. For example, there are 9 ones, as expected, and there is one 1 in each block, also as expected. (There could be more than 9 if the matrix happened to have multiple maxima in one or more blocks, but there should not be fewer than 9.)
set.seed(123)
mat <- matrix(round(runif(90, 0, 50),),9,9)
rownames(mat) <- rep(LETTERS[1:3],3)
colnames(mat) <- rep(LETTERS[1:3],3)
ans <- mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)
sum(ans == 1)
## [1] 9
# there are no duplicates (i.e. a block showing up more than once), hence
# there is exactly one 1 in each block
w <- which(ans == 1, arr.ind = TRUE)
anyDuplicated(cbind(rownames(mat)[w[, 1]], colnames(mat)[w[, 2]]))
## [1] 0
ADDED
If some blocks are entirely zero (which is the case in UPDATE 2), then you will get NaNs for those blocks. If you want 0s instead for the all-zero blocks, try this:
xmax <- function(x) if (all(x == 0)) 0 else x/max(x)
ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = xmax)
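To make the block logic and the NaN issue easy to check by hand, here is a sketch on a small made-up 4 x 4 matrix (the values are illustrative, not from the question). With labels A/B there are 2 x 2 = 4 blocks, one of which (row label A, column label B) is all zero.

```r
# Hypothetical 4 x 4 example: a "block" is every cell sharing a
# (row label, column label) pair
mat <- matrix(c(1,  2, 0, 0,
                3,  4, 0, 0,
                5,  0, 8, 2,
                0, 10, 4, 6),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "A", "B", "B"), c("A", "A", "B", "B")))

# One label per cell, used by ave() as grouping factors
row_lab <- rownames(mat)[row(mat)]
col_lab <- colnames(mat)[col(mat)]

# Plain version: the all-zero A/B block becomes 0/0 = NaN
naive <- mat / ave(mat, row_lab, col_lab, FUN = max)
sum(is.nan(naive))  # 4

# Zero-safe version from the answer: all-zero blocks stay 0
xmax <- function(x) if (all(x == 0)) 0 else x / max(x)
ans <- ave(mat, row_lab, col_lab, FUN = xmax)
sum(ans == 1)       # 3: one per non-zero block
```

ave() writes each block's result back into the positions that block came from, which is why the output lines up cell by cell with the input even though the blocks are not contiguous in the question's matrix.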
Trouble transforming a data set in R; making a look up table
I would like to transform my data set, which has sample numbers, treatment days and concentrations (variables), into a single matrix where the cells are filled with only concentration values. My output should be a lookup table, where the user can look up a sample number along the first row and a day along the first column (header), and follow these along to get a concentration.
This is not my data set (it comes as a matrix); however, I quickly made these three vectors for the example.
Samplenb <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
Day <- c(1,5,10,15,1,5,10,15,1,5,10,15,1,5,10,15)
Concentration <- c(0.2, 0.3, 0.5, 0.9,0.2, 0.3, 0.5, 0.9,0.2, 0.3, 0.5, 0.9,0.2, 0.3, 0.5, 0.9)
Any help is much appreciated. I have been playing around with the reshape package functions; however, they do not seem suitable. Thank you for taking the time to help me!
For variety (and since you mentioned "reshape"), here are a few options (though MrFlick's is by far the most appropriate).
The first two options assume we have grouped your vectors into a data.frame:
DF <- data.frame(Samplenb, Day, Concentration)
Option 1: reshape
reshape(DF, direction = "wide", idvar = "Day", timevar = "Samplenb")
#   Day Concentration.1 Concentration.2 Concentration.3 Concentration.4
# 1   1             0.2             0.2             0.2             0.2
# 2   5             0.3             0.3             0.3             0.3
# 3  10             0.5             0.5             0.5             0.5
# 4  15             0.9             0.9             0.9             0.9
Option 2: dcast from "reshape2"
library(reshape2)
dcast(DF, Day ~ Samplenb, value.var="Concentration")
#   Day   1   2   3   4
# 1   1 0.2 0.2 0.2 0.2
# 2   5 0.3 0.3 0.3 0.3
# 3  10 0.5 0.5 0.5 0.5
# 4  15 0.9 0.9 0.9 0.9
Option 3: A manual approach. It should be fast, but unless you're a coding masochist, it's best left as a lesson in matrix indexing in R.
Nrow <- unique(Day)
Ncol <- unique(Samplenb)
M <- matrix(0, nrow = length(Nrow), ncol = length(Ncol), dimnames = list(Nrow, Ncol))
M[cbind(match(Day, rownames(M)), match(Samplenb, colnames(M)))] <- Concentration
M
#      1   2   3   4
# 1  0.2 0.2 0.2 0.2
# 5  0.3 0.3 0.3 0.3
# 10 0.5 0.5 0.5 0.5
# 15 0.9 0.9 0.9 0.9
Good ol' xtabs can help out here:
xtabs(Concentration ~ Day + Samplenb)
will produce
    Samplenb
Day    1   2   3   4
  1  0.2 0.2 0.2 0.2
  5  0.3 0.3 0.3 0.3
  10 0.5 0.5 0.5 0.5
  15 0.9 0.9 0.9 0.9
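One behaviour worth knowing: xtabs() builds a contingency table, so it sums repeated Day/Samplenb combinations and reports 0 for combinations that never occur. If a missing measurement should show up as NA instead, a sketch with base R's tapply() (not from the answers; mean is an assumed rule for combining repeats) looks like this:

```r
Samplenb <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
Day <- c(1,5,10,15,1,5,10,15,1,5,10,15,1,5,10,15)
Concentration <- rep(c(0.2, 0.3, 0.5, 0.9), times = 4)

# Rows are days, columns are sample numbers; mean() combines any repeated
# measurements of the same Day/Samplenb pair (an assumed choice)
lookup <- tapply(Concentration, list(Day = Day, Samplenb = Samplenb), mean)
lookup["10", "3"]   # 0.5

# Drop the first measurement (Day 1, sample 1): that cell becomes NA,
# whereas xtabs() would report 0 for it
lookup2 <- tapply(Concentration[-1],
                  list(Day = Day[-1], Samplenb = Samplenb[-1]), mean)
lookup2["1", "1"]   # NA
```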