Calculate row-specific values based on min - r

My data looks like this
df <- data.frame(x = c(3, 5, 4, 4, 3, 2),
y = c(.9, .8, 1, 1.2, .5, .1))
I am trying to multiply each x value by either y or 1, whichever is smaller.
df$z <- df$x * min(df$y, 1)
The problem is it is taking the min of the whole column, so it is multiplying every x by 0.1.
Instead, I need x multiplied by .9, .8, 1, 1, .5, .1...

We need pmin, which compares each value of 'y' with the second argument (recycled to the same length) and returns the element-wise minimum
pmin(df$y, 1)
#[1] 0.9 0.8 1.0 1.0 0.5 0.1
Likewise, pmin accepts any number of arguments (its first parameter is ...)
pmin(df$y, 1, 0)
#[1] 0 0 0 0 0 0
To get the output, just multiply 'x' with the pmin output
df$x * pmin(df$y, 1)
which can also be written as
with(df, x * pmin(y, 1))
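As a quick check of the difference, here is a minimal sketch using the data frame from the question. min() collapses all of its arguments into a single number, while pmin() returns one value per row.

```r
df <- data.frame(x = c(3, 5, 4, 4, 3, 2),
                 y = c(.9, .8, 1, 1.2, .5, .1))

# min() pools every value of y together with 1 and returns one number
min(df$y, 1)

# pmin() recycles 1 against y and keeps the smaller value in each position
pmin(df$y, 1)

# row-wise multiplier applied to x
df$z <- df$x * pmin(df$y, 1)
df$z
#[1] 2.7 4.0 4.0 4.0 1.5 0.2
```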

Maybe you could use an ifelse function:
df <- data.frame(x = c(3, 5, 4, 4, 3, 2),
y = c(.9, .8, 1, 1.2, .5, .1))
df$z = ifelse(df$y<1, df$x*df$y, df$x*1)
This will compare the values of each row.
Hope it helps! :)


How can I match coordinates with their associated values in R

I have 250 points that I generated within a rectangle (-6,6)x(-4,4). If the points are within a certain space they are blue, and if they are outside of that space they are red.
The code I used for this is here, where I defined the confined space with squares:
library(sf)
border <- matrix(c(
-6, -4,
-6, 4,
6, 4,
6, -4,
-6, -4
), ncol = 2, byrow = TRUE) |>
sfheaders::sfc_polygon()
# sample random points
rand_points <- st_sample(border, size = 250)
squares1 <- matrix(c(
-4, 0,
-4, 3,
-1, 3,
-1, 0,
-4, 0
), ncol = 2, byrow = TRUE) |>
sfheaders::sfc_polygon()
squares2 <- matrix(c(
-2, -4,
-2, -1,
1, -1,
1, -4,
-2, -4
), ncol = 2, byrow = TRUE) |>
sfheaders::sfc_polygon()
squares3 <- matrix(c(
2, -2,
2, 1,
5, 1,
5, -2,
2, -2
), ncol = 2, byrow = TRUE) |>
sfheaders::sfc_polygon()
squares <- c(squares1, squares2, squares3)
red_vals <- st_difference(rand_points, squares)
blue_vals <- st_intersection(rand_points, squares)
plot(border)
plot(red_vals, add = TRUE, col = "red")
plot(blue_vals, add = TRUE, col = "blue")
My goal is to match the points' coordinates with their expected value. Example:
In the table, the third column is for the blue points and the fourth column for the red. If the point at that coordinate is blue it gets a +1 and if it is not blue at that coordinate -1, and vice versa for the red points.
So far, I have attained the coordinates of all the points.
y <- c(red_vals)
x <- c(blue_vals)
cdata <- c(x, y)
coord <- st_coordinates(cdata)
I am now stuck on trying to figure out how I can classify x and y to their respective coordinates and indicate this in a dataframe.
Any help is appreciated.
You could do:
red_vals <- rand_points[rowSums(st_intersects(rand_points, squares, sparse = FALSE)) == 0]
blue_vals <- st_intersection(rand_points, squares)
df <- rbind(cbind(st_coordinates(red_vals), PosGroup = 1, NegGroup = -1),
cbind(st_coordinates(blue_vals), PosGroup = -1, NegGroup = 1)) |>
as.data.frame()
head(df)
#>             X           Y PosGroup NegGroup
#> X1 -5.2248158  0.03710509        1       -1
#> X2 -5.8932331 -1.41421992        1       -1
#> X3 -0.0609895  0.26541100        1       -1
#> X4  1.7345333 -3.04312404        1       -1
#> X5 -4.6801643  0.24656851        1       -1
#> X6  1.3190239  3.36491623        1       -1
Obviously the first few values are all red dots.
We can see that the points are correct by using this data frame to draw points in ggplot:
library(ggplot2)
ggplot(df) +
geom_sf(data = squares) +
geom_point(aes(X, Y, color = factor(PosGroup)), pch = 1, size = 3) +
theme_classic() +
scale_color_brewer(palette = "Set1", direction = -1)
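If the original point order should be kept, one alternative is to label every point in place instead of splitting into two sets and stacking them. The sketch below assumes the sf package is installed and follows the sign convention of the data frame above (points outside the squares get PosGroup = 1). The helper make_square() and the names pts and inside are made up for illustration, and the polygons are built with st_polygon() rather than sfheaders.

```r
library(sf)

# hypothetical helper: closed square with lower-left corner (xmin, ymin)
make_square <- function(xmin, ymin, side = 3) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin, ymin + side),
    c(xmin + side, ymin + side),
    c(xmin + side, ymin),
    c(xmin, ymin)
  )))
}

border <- st_sfc(make_square(-6, -4, side = 1))  # placeholder, replaced below
border <- st_sfc(st_polygon(list(rbind(
  c(-6, -4), c(-6, 4), c(6, 4), c(6, -4), c(-6, -4)
))))
squares <- st_sfc(make_square(-4, 0), make_square(-2, -4), make_square(2, -2))

set.seed(1)
pts <- st_sample(border, size = 250)

# TRUE when a point touches at least one square (a "blue" point)
inside <- lengths(st_intersects(pts, squares)) > 0

df <- data.frame(st_coordinates(pts),
                 PosGroup = ifelse(inside, -1, 1),
                 NegGroup = ifelse(inside, 1, -1))
head(df)
```

Because nothing is reordered, row i of df still corresponds to point i of pts.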

Specifying x values when converting approx() to data frame

I am trying to get a data frame from the output of approx(t, z, n = 120) below. My intent is for the returned input values to be in increments of 0.25 (for instance, 0, 0.25, 0.5, 0.75, ...), so I've set n = 120.
However, the data frame I get doesn't return those input values.
t <- c(0, 0.5, 2, 5, 10, 30)
z <- c(1, 0.9869, .9478, 0.8668, .7438, .3945)
data.frame(approx(t, z, n = 120))
I appreciate any assistance in this matter.
There are 121, not 120, points from 0 to 30 inclusive in steps of 0.25:
length(seq(0, 30, 0.25))
## [1] 121
so use this:
approx(t, z, n = 121)
Another approach is:
approx(t, z, xout = seq(min(t), max(t), 0.25))
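To confirm that the xout version lands on the intended grid, here is a small sketch with the data from the question. The names grid and res are only illustrative.

```r
t <- c(0, 0.5, 2, 5, 10, 30)
z <- c(1, 0.9869, .9478, 0.8668, .7438, .3945)

# explicit grid of input values in steps of 0.25
grid <- seq(min(t), max(t), by = 0.25)

# approx() returns a list with components x and y, which
# data.frame() turns into two columns
res <- data.frame(approx(t, z, xout = grid))

head(res, 4)
nrow(res)
```

Passing xout makes the step size explicit, so there is no need to work out the right value of n by hand.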

R function with multiple operators

My data ranges from -6 to 6 and I am trying to create 3 categories. However, my function is not assigning anyone to category 2, even though there are people whose scores fall in that range.
FFMIBMDcopdcases$lowBMD = ifelse((FFMIBMDcopdcases$copd_Tscore >= -1) , 0,
ifelse((FFMIBMDcopdcases$copd_Tscore < -1), 1,
ifelse((FFMIBMDcopdcases$copd_Tscore <= -2.5), 2, NA)))
Try using the cut function. Example:
myValues <- runif(n = 20, min = -6, max = 6)
as.numeric(as.character(cut(x = myValues, breaks = c(-Inf, -2.5, -1, Inf), labels = c(2, 1, 0))))
Since you want a numeric result, it might be easiest to use findInterval, although you will need to subtract the result from 2 to reverse the order (2 for the lowest scores and 0 for the highest):
FFMIBMDcopdcases$lowBMD = 2 - findInterval(FFMIBMDcopdcases$copd_Tscore,
c(-2.5, -1))
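For reference, the nested ifelse in the question never reaches category 2 because every score at or below -2.5 is also below -1, so the second test catches it first. A minimal sketch with made-up scores (tscore is a hypothetical stand-in for copd_Tscore) shows the effect and one way to reorder the tests:

```r
tscore <- c(-3, -2.5, -2, -1, 0, 2)

# order from the question: the "< -1" test swallows everything <= -2.5
wrong <- ifelse(tscore >= -1, 0,
         ifelse(tscore < -1, 1,
         ifelse(tscore <= -2.5, 2, NA)))

# test the most extreme category before the broader one
fixed <- ifelse(tscore >= -1, 0,
         ifelse(tscore <= -2.5, 2, 1))

wrong
#[1] 1 1 1 0 0 0
fixed
#[1] 2 2 1 0 0 0
```

Note that the boundary values -2.5 and -1 follow the inequalities written in the question here; interval-based functions may assign those exact values to a neighbouring category depending on which side of each interval is closed.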

Using optimisation function (optimise) together with dbinom in R (optimisation issue)

When p = 0.5, n = 5 and x = 3
dbinom(3,5,0.5) = 0.3125
Let's say I don't know p (n and x are known) and want to find it.
binp <- function(bp) dbinom(3,5,bp) - 0.3125
optimise(binp, c(0,1))
It does not return 0.5. Also, why is
dbinom(3,5,0.5) == 0.3125 #FALSE
But,
x <- dbinom(3,5,0.5)
x == dbinom(3,5,0.5) #TRUE
optimize() searches for the parameter that minimizes the output of the function. Your function can return a negative value (e.g., binp(0.1) is -0.3044), so its minimum is not where the output is zero. If you want the parameter that minimizes the distance from zero, it would be a good idea to use sqrt((...)^2), i.e. the absolute value. If you want the parameter that makes the output zero, uniroot will help you. Also, the parameter you want isn't uniquely determined: two values of p give 0.3125. (Note: x <- dbinom(3, 5, 0.5); x == dbinom(3, 5, 0.5) is equivalent to dbinom(3, 5, 0.5) == dbinom(3, 5, 0.5), which compares a value with itself and is therefore TRUE.)
## check output of dbinom(3, 5, prob)
input <- seq(0, 1, 0.001)
output <- Vectorize(dbinom, "prob")(3, 5, input)
plot(input, output, type="l")
abline(h = dbinom(3, 5, 0.5), col = 2) # there are two answers
max <- optimize(function(x) dbinom(3, 5, x), c(0, 1), maximum = T)$maximum # [1] 0.6000006
binp <- function(bp) dbinom(3,5,bp) - 0.3125 # your function
uniroot(binp, c(0, max))$root # [1] 0.5000036
uniroot(binp, c(max, 1))$root # [1] 0.6946854
binp2 <- function(bp) sqrt((dbinom(3,5,bp) - 0.3125)^2)
optimize(binp2, c(0, max))$minimum # [1] 0.499986
optimize(binp2, c(max, 1))$minimum # [1] 0.6947186
dbinom(3, 5, 0.5) == 0.3125 # [1] FALSE
round(dbinom(3, 5, 0.5), 4) == 0.3125 # [1] TRUE
format(dbinom(3, 5, 0.5), digits = 16) # [1] "0.3124999999999999"
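Rather than rounding, floating-point results are usually compared with a tolerance. The sketch below uses all.equal() for that, and also tightens the tol argument of uniroot() to get closer to 0.5. The interval c(0, 0.6) relies on the maximum of dbinom(3, 5, p) being at p = 0.6, as found above.

```r
binp <- function(bp) dbinom(3, 5, bp) - 0.3125

# all.equal() compares with a small relative tolerance instead of exactly
isTRUE(all.equal(dbinom(3, 5, 0.5), 0.3125))
#[1] TRUE

# a smaller tol makes uniroot() iterate further towards the root
root <- uniroot(binp, c(0, 0.6), tol = 1e-10)$root
isTRUE(all.equal(root, 0.5, tolerance = 1e-6))
```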

Stepwise creation of one big matrix from smaller matrices in R for-loops

I have the following code:
beta <- c(1, 2, 3)
X1 <- matrix(c(1, 1, 1, 1,
0, 1, 0, 1,
0, 0, 1, 1),
nrow = 4,
ncol = 3)
Z1 <- matrix(c(1, 1, 1, 1,
0, 1, 0, 1),
nrow = 4,
ncol = 2)
Z2 <- matrix(c(1, 1, 1, 1,
0, 1, 0, 1),
nrow = 4,
ncol = 2)
library(MASS)
S1 <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
S2 <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 4, 4, 2), ncol = 2))
z <- list()
y <- list()
for(j in 1:dim(S1)[1]){
for(i in 1:dim(S2)[1]){
z[[i]] <- X1 %*% beta+Z1 %*% S1[j,]+Z2 %*% S2[i,]+matrix(rnorm(4, mean = 0 , sd = 0.27), nrow = 4)
Z <- unname(do.call(rbind, z))
}
y[[j]] <- Z
Y <- unname(do.call(rbind, y))
}
X1 is a 4x3 matrix, and Z1 and Z2 are 4x2 matrices. So every time X1 %*% beta+Z1 %*% S1[j,]+Z2 %*% S2[i,]+matrix(rnorm(4, mean = 0 , sd = 0.27), nrow = 4) is evaluated, it outputs a 4x1 matrix. So far I store all these values from the inner and outer loops in two lists and then call rbind() to transform them into a matrix. Is there a way to store them directly in matrices?
You can avoid using lists if you rely on the apply functions and on vector recycling. I broke down your equation into its parts. (I hope I interpreted it accurately!)
Mb <- as.vector(X1 %*% beta)
M1 <- apply(S1,1,function(x) Z1 %*% x )
M2 <- apply(S2,1,function(x) Z2 %*% x ) + Mb
Mout <- apply(M1,2,function(x) M2 + as.vector(x))
as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
Because the random numbers are added after the matrix multiplication (i.e. they are not involved in any other calculation), you can just add them at the end.
Also note that you can't add a smaller matrix to a larger one, but if you make it a vector first then R will recycle it as necessary. So when Mb (a vector of length 4) is added to a matrix with 4 rows and n columns, it is recycled n times, once per column.
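If the loops are kept, the results can also be written straight into a preallocated matrix. The sketch below does that and checks it against the apply-based version. It leaves the noise term out of the comparison so both results are deterministic, and it uses small rnorm() matrices as stand-ins for the mvrnorm() draws (3 and 2 rows instead of 70 and 40); the names n1, n2, k and rows are illustrative.

```r
set.seed(42)
beta <- c(1, 2, 3)
X1 <- matrix(c(1, 1, 1, 1,  0, 1, 0, 1,  0, 0, 1, 1), nrow = 4)
Z1 <- matrix(c(1, 1, 1, 1,  0, 1, 0, 1), nrow = 4)
Z2 <- Z1
S1 <- matrix(rnorm(6), ncol = 2)  # stand-in for mvrnorm(70, ...)
S2 <- matrix(rnorm(4), ncol = 2)  # stand-in for mvrnorm(40, ...)
n1 <- nrow(S1)
n2 <- nrow(S2)

# preallocate one column with 4 rows per (j, i) pair and fill it block by block
Y <- matrix(NA_real_, nrow = 4 * n1 * n2, ncol = 1)
k <- 0
for (j in seq_len(n1)) {
  for (i in seq_len(n2)) {
    rows <- k + 1:4
    Y[rows, 1] <- X1 %*% beta + Z1 %*% S1[j, ] + Z2 %*% S2[i, ]
    k <- k + 4
  }
}

# apply-based version of the same deterministic part
Mb <- as.vector(X1 %*% beta)
M1 <- apply(S1, 1, function(x) Z1 %*% x)
M2 <- apply(S2, 1, function(x) Z2 %*% x) + Mb
Mout <- apply(M1, 2, function(x) M2 + as.vector(x))

all.equal(as.vector(Y), as.vector(Mout))

# the noise can be added once at the end in either version
noisy <- as.vector(Mout) + rnorm(length(Mout), mean = 0, sd = 0.27)
```

Both versions produce the blocks in the same order (outer index j, inner index i), which is why the flattened results can be compared directly.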
