Subset does not work with some numeric values but with others - r

I ran into a very strange problem that I don't know how to solve and have never seen before. I can subset a data.frame for some numeric values but not for others.
Here is the data I use:
library(dplyr)
ws <- seq(0, 1, by=.1)
kombos <- expand.grid(weightjaw2 = ws,
                      weightjaw3 = ws) %>% as.data.frame
kombos$kombi <- 1:nrow(kombos)
kombos$weightjaw2 <- as.numeric(kombos$weightjaw2)
kombos$weightjaw3 <- as.numeric(kombos$weightjaw3)
class(kombos$weightjaw2)
[1] "numeric"
Now, I need to subset this data.frame. This works well for some values, for example 0.1.
kombos %>% filter(weightjaw2==0.1)
weightjaw2 weightjaw3 kombi
1 0.1 0.0 2
2 0.1 0.1 13
3 0.1 0.2 24
4 0.1 0.3 35
5 0.1 0.4 46
6 0.1 0.5 57
7 0.1 0.6 68
8 0.1 0.7 79
9 0.1 0.8 90
10 0.1 0.9 101
11 0.1 1.0 112
Strangely enough, this does not work for values of 0.3, 0.6, and 0.7.
kombos %>% filter(weightjaw2==0.3)
[1] weightjaw2 weightjaw3 kombi
<0 rows> (or 0-length row.names)
The same holds for subset(kombos, weightjaw2==0.3). Why is that and how can I solve this?
EDIT
I solved this using dplyr::near():
kombos %>% filter(near(weightjaw2, 0.3))

The == operator requires both the lhs and the rhs to be exactly equal. Because of floating point representation, the values stored in the 'weightjaw2' column are not exactly equal to the literal 0.3. One option is to convert the column to character inside filter to subset the rows:
library(dplyr)
kombos %>%
  filter(as.character(weightjaw2) == 0.3)
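To see what is actually stored (an illustrative check using the ws vector from the question): the fourth element of ws is computed as 3 * 0.1, which is not the same double as the literal 0.3, while near() compares with a small tolerance.
print(ws[4], digits = 22)
# [1] 0.3000000000000000444089
ws[4] == 0.3
# [1] FALSE
near(ws[4], 0.3)
# [1] TRUE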

Related

How to collect outputs of multivariable vector-valued function into a dataframe?

I have a function f1 that takes a pair of real numbers (x, y) and returns a triple of real numbers. I would like to collect all outputs of this function for all x in a vector a and y in a vector b. Could you please elaborate on how to do so?
f1 <- function(x, y) {
  return(c(x + y, x - y, x * y))
}
a <- seq(0, pi, 0.1)
b <- seq(0, 2 * pi, 0.1)
Update: I mean for all pairs $(x, y) \in a \times b$.
Here is a data.table option
library(data.table)
setDT(expand.grid(a, b))[, fval := do.call(Vectorize(f1, SIMPLIFY = FALSE), unname(.SD))][]
where expand.grid creates all (x, y) pairs, setDT converts the result to a data.table, and Vectorize(f1, SIMPLIFY = FALSE) applies f1 element-wise to the two columns, so fval becomes a list column holding one length-3 vector per row, giving
Var1 Var2 fval
1: 0.0 0.0 0,0,0
2: 0.1 0.0 0.1,0.1,0.0
3: 0.2 0.0 0.2,0.2,0.0
4: 0.3 0.0 0.3,0.3,0.0
5: 0.4 0.0 0.4,0.4,0.0
---
2012: 2.7 6.2 8.90,-3.50,16.74
2013: 2.8 6.2 9.00,-3.40,17.36
2014: 2.9 6.2 9.10,-3.30,17.98
2015: 3.0 6.2 9.2,-3.2,18.6
2016: 3.1 6.2 9.30,-3.10,19.22
A more compact alternative is CJ(a, b) instead of setDT(expand.grid(a, b)) (thanks to @akrun's advice).
We can use expand.grid to create all combinations of the 'a' and 'b' values, then loop over the rows with apply (MARGIN = 1) and apply f1:
out <- as.data.frame(t(apply(expand.grid(a, b), 1, function(x) f1(x[1], x[2]))))
Or with tidyverse
library(dplyr)
library(purrr)
library(tidyr)
out2 <- crossing(x = a, y = b) %>%
  pmap_dfr(f2)
Output:
head(out2)
# A tibble: 6 x 3
# add subtract multiply
# <dbl> <dbl> <dbl>
#1 0 0 0
#2 0.1 -0.1 0
#3 0.2 -0.2 0
#4 0.3 -0.3 0
#5 0.4 -0.4 0
#6 0.5 -0.5 0
where f2 is defined as
f2 <- function(x, y) {
  return(tibble(add = x + y, subtract = x - y, multiply = x * y))
}
It may be better to have the function return a named list or a tibble, so that the per-pair results are easier to bind into a data frame, as in the sketch below.
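For instance, a minimal sketch (f1_list is a hypothetical variant of f1 that returns a named list, so pmap_dfr() can bind each result straight into columns):
f1_list <- function(x, y) {
  list(sum = x + y, difference = x - y, product = x * y)
}
out3 <- crossing(x = a, y = b) %>%
  pmap_dfr(f1_list)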
Create all possible combinations with expand.grid and use Map to apply f1 to every pair.
val <- expand.grid(a, b)
result <- do.call(rbind, Map(f1, val$Var1, val$Var2))
head(result)
# [,1] [,2] [,3]
#[1,] 0.0 0.0 0
#[2,] 0.1 0.1 0
#[3,] 0.2 0.2 0
#[4,] 0.3 0.3 0
#[5,] 0.4 0.4 0
#[6,] 0.5 0.5 0
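Since result is a matrix, it can be combined with the input pairs and turned into a data frame if that is the preferred output (a small sketch building on the objects above):
out <- cbind(val, as.data.frame(result))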

How to use dplyr::mutate to multiply pairs of columns specified by parts of the variable name

I have the following example:
df <- data.frame(
  id = c(1, 2, 3),
  fix_01.2012 = c(2, 5, 7),
  fix_02.2012 = c(5, 1, 7),
  fix_03.2012 = c(6, 1, 5),
  fox_01.2012 = c(0.4, 0.5, 0.7),
  fox_02.2012 = c(0.6, 0.5, 0.8),
  fox_03.2012 = c(0.7, 0.5, 0.9)
)
id fix_01.2012 fix_02.2012 fix_03.2012 fox_01.2012 fox_02.2012 fox_03.2012
1 1 2 5 6 0.4 0.6 0.7
2 2 5 1 1 0.5 0.5 0.5
3 3 7 7 5 0.7 0.8 0.9
The table below is what I want to get.
I want to create a new column for each date (e.g. "01.2012"):
res_date = fix_date * fox_date
As I have many dates / pairs of dates, I guess this needs to be done by looping through the names.
id fix_01.2012 fix_02.2012 fix_03.2012 fox_01.2012 fox_02.2012 fox_03.2012 res_01.2012 res_02.2012 res_03.2012
1 1 2 5 6 0.4 0.6 0.7 0.8 3.0 4.2
2 2 5 1 1 0.5 0.5 0.5 2.5 0.5 0.5
3 3 7 7 5 0.7 0.8 0.9 4.9 5.6 4.5
Can anyone help? Thanks very much in advance!
Here is an idea that uses split.default to split the data frame based on similar column names (based on your conditions). We then loop over that list and multiply the columns. In this case, we use Reduce (rather than i[1]*i[2]) to multiply in order to account for more than two columns
do.call(cbind,
        lapply(split.default(df[-1], gsub('.*_', '', names(df[-1]))),
               function(i) Reduce(`*`, i)))
# 01.2012 02.2012 03.2012
#[1,] 0.8 3.0 4.2
#[2,] 2.5 0.5 0.5
#[3,] 4.9 5.6 4.5
Bind them back to the original with cbind.data.frame()
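A minimal sketch of that last step (the res_ prefix for the new columns is my choice here, matching the names in the question):
res <- do.call(cbind,
               lapply(split.default(df[-1], gsub('.*_', '', names(df[-1]))),
                      function(i) Reduce(`*`, i)))
colnames(res) <- paste0("res_", colnames(res))
cbind.data.frame(df, res)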
If you want a tidyverse approach, it will take using a bit of tidy evaluation to get what you want.
library(tidyverse)
df <- data.frame(
  id = c(1, 2, 3),
  fix_01.2012 = c(2, 5, 7),
  fix_02.2012 = c(5, 1, 7),
  fix_03.2012 = c(6, 1, 5),
  fox_01.2012 = c(0.4, 0.5, 0.7),
  fox_02.2012 = c(0.6, 0.5, 0.8),
  fox_03.2012 = c(0.7, 0.5, 0.9)
)
# colnames with "fix"
fix <- names(df)[grepl("fix",names(df))]
# colnames with "fox"
fox <- names(df)[grepl("fox",names(df))]
# Iterate over the two vectors of names and column bind the results (map2_dfc).
# Since these are strings, we need to have them evaluated as symbols
# Creating the column name just requires the string to be evaluated.
map2_dfc(fix, fox, ~transmute(df, !!paste0("res", str_extract(.x, "_(0\\d)")) := !!sym(.x) * !!sym(.y)))
#> res_01 res_02 res_03
#> 1 0.8 3.0 4.2
#> 2 2.5 0.5 0.5
#> 3 4.9 5.6 4.5
Much more verbose than the other answers, but to my eye easier to read, edit, and adapt, is a heavy gather-spread approach (the way I'd reason through the problem if I were solving it step by step):
library(tidyr)
library(dplyr)
df %>%
  gather(-id, key = colname, value = value) %>%
  separate(colname, c('fixfox', 'date'), sep = '_') %>%
  spread(key = fixfox, value = value) %>%
  mutate(res = fix * fox) %>%
  gather(-id, -date, key = colname, value = value) %>%
  unite(new_colname, colname, date, sep = '_') %>%
  spread(key = new_colname, value = value)
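In newer versions of tidyr, where gather/spread are superseded, a roughly equivalent sketch with pivot_longer()/pivot_wider() (same logic; the resulting column order may differ slightly) would be:
df %>%
  pivot_longer(-id, names_to = c("fixfox", "date"), names_sep = "_") %>%
  pivot_wider(names_from = fixfox, values_from = value) %>%
  mutate(res = fix * fox) %>%
  pivot_longer(c(fix, fox, res), names_to = "colname") %>%
  pivot_wider(names_from = c(colname, date), values_from = value, names_sep = "_")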

R: Apply function on data frame A dependent on values of data frame B

I have two data frames A and B.
A = data.frame(x = c(3,-4,2), y=c(-4,7,1), z=c(-5,-1,6))
B = data.frame(x = c(0.5,0.9,0.3), y=c(0.7,0.2,0.1), z=c(0.9,0.8,0.6))
If a value in A is negative, the corresponding value in B (at the same position as in A) should be subtracted from 1. If the value in A is positive, the corresponding value in B should not change.
In the end B should look like this
x y z
1 0.5 0.3 0.1
2 0.1 0.2 0.2
3 0.3 0.1 0.6
Does anyone have an idea how this problem can be solved?
Thanks in advance,
Christian
This seems to work: B[A<0] <- 1 - B[A<0]
x y z
1 0.5 0.3 0.1
2 0.1 0.2 0.2
3 0.3 0.1 0.6
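It works because A < 0 evaluates to a logical matrix of the same dimensions as B, so B[A < 0] selects (and here replaces) exactly the cells of B that sit at a negative position in A. Roughly:
A < 0
#       x     y     z
# 1 FALSE  TRUE  TRUE
# 2  TRUE FALSE  TRUE
# 3 FALSE FALSE FALSE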

Round sequence of numbers to chosen numbers

I have a vector of numbers from 0 to 1. I'd like to divide them into X groups - for example, if X=5, then round the numbers down to 5 groups: all numbers from 0 to 0.2 become 0, all from 0.2 to 0.4 become 0.2, etc.
For example, if I have x <- c(0.34, 0.07, 0.56) and X=5 as in the explanation above, I'll get (0.2, 0, 0.4).
So far, the only way I found to do that is by looping over the entire vector. Is there a more elegant way?
You can simply do:
floor(x*X)/X
# [1] 0.2 0.0 0.4
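To see the arithmetic for the first element: floor(0.34 * 5)/5 = floor(1.7)/5 = 1/5 = 0.2.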
More test cases:
X = 10
floor(x*X)/X
# [1] 0.3 0.0 0.5
X = 2
floor(x*X)/X
# [1] 0.0 0.0 0.5
X = 5
floor(x*X)/X
# [1] 0.2 0.0 0.4
Data:
x <- c(0.34,0.07,0.56)
Try:
cut.alt <- function(x, X) {
  out <- cut(x, breaks = (0:X)/X, right = FALSE, include.lowest = TRUE)
  levels(out) <- as.character((0:(X - 1))/X)
  out
}
cut with breaks set to (0:X)/X divides the vector x into the X groups the OP asks for (right = FALSE makes the intervals left-closed, so an exact 0.2 is grouped as 0.2, and include.lowest = TRUE keeps an exact 1 from becoming NA). Changing the levels to the lower cutoff of each interval then gives the answer (as a factor rather than a numeric vector).
Or using plyr:
library(plyr)
round_any(x, 1/X, floor)
# [1] 0.2 0.0 0.4

Trouble transforming a data set in R; making a look up table

I would like to transform my data set, which has sample numbers, treatment days and concentrations (the variable), into a single matrix where the cells are filled with only concentration values. My desired output is a lookup table, where the user can look up a sample number along the first row (the header) and a day along the first column, and follow these along to get a concentration.
This is not my actual data set (that one comes as a matrix); I quickly made these three vectors for the example.
Samplenb <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
Day <- c(1,5,10,15,1,5,10,15,1,5,10,15,1,5,10,15)
Concentration <- c(0.2, 0.3, 0.5, 0.9,0.2, 0.3, 0.5, 0.9,0.2, 0.3, 0.5, 0.9,0.2, 0.3, 0.5, 0.9)
Any help is much appreciated. I have been playing around with the reshape package functions; however, they do not seem suitable.
Thank you for taking the time to help me!
For variety (and since you mentioned "reshape"), here are a few options (though MrFlick's is by far the most appropriate).
The first two options assume we have grouped your vectors into a data.frame:
DF <- data.frame(Samplenb, Day, Concentration)
Option 1: reshape
reshape(DF, direction = "wide", idvar = "Day", timevar = "Samplenb")
# Day Concentration.1 Concentration.2 Concentration.3 Concentration.4
# 1 1 0.2 0.2 0.2 0.2
# 2 5 0.3 0.3 0.3 0.3
# 3 10 0.5 0.5 0.5 0.5
# 4 15 0.9 0.9 0.9 0.9
Option 2: dcast from "reshape2"
library(reshape2)
dcast(DF, Day ~ Samplenb, value.var="Concentration")
# Day 1 2 3 4
# 1 1 0.2 0.2 0.2 0.2
# 2 5 0.3 0.3 0.3 0.3
# 3 10 0.5 0.5 0.5 0.5
# 4 15 0.9 0.9 0.9 0.9
Option 3: A manual approach--should be fast, but unless you're a coding masochist, best left as a lesson in matrix indexing in R.
Nrow <- unique(Day)
Ncol <- unique(Samplenb)
M <- matrix(0, nrow = length(Nrow), ncol = length(Ncol),
            dimnames = list(Nrow, Ncol))
M[cbind(match(Day, rownames(M)), match(Samplenb, colnames(M)))] <- Concentration
M
# 1 2 3 4
# 1 0.2 0.2 0.2 0.2
# 5 0.3 0.3 0.3 0.3
# 10 0.5 0.5 0.5 0.5
# 15 0.9 0.9 0.9 0.9
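The trick in the assignment line is matrix indexing with a two-column index matrix: cbind(match(Day, rownames(M)), match(Samplenb, colnames(M))) holds one (row, column) pair per observation, and assigning Concentration at those pairs fills each cell. For example, the first few index pairs are:
head(cbind(match(Day, rownames(M)), match(Samplenb, colnames(M))), 4)
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    1
# [3,]    3    1
# [4,]    4    1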
Good ol' xtabs can help out here
xtabs(Concentration ~ Day + Samplenb)
will produce
Samplenb
Day 1 2 3 4
1 0.2 0.2 0.2 0.2
5 0.3 0.3 0.3 0.3
10 0.5 0.5 0.5 0.5
15 0.9 0.9 0.9 0.9
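If a plain data.frame lookup table is preferred over the contingency table that xtabs() returns, wrapping the call in as.data.frame.matrix() should do it (a small suggested addition, not part of the original answer):
as.data.frame.matrix(xtabs(Concentration ~ Day + Samplenb))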
