Subsetting a vector based on another vector - r

I have 2 input vectors, iv1 and iv2, as shown below. I would like to separate the elements of the second vector according to elements of the first. It works like this: The values in iv2 between the first 2 values of iv1 are stored in ov1, the values in iv2 between the second and third values of iv1 are stored in ov2, and so on. Note: The values in iv1 and iv2 are already in ascending order. Any thoughts please?
Input:
iv1 <- c(100, 200, 300, 400, 435)
iv2 <- c(60, 120, 140, 160, 180, 230, 250, 255, 265, 270, 295, 340, 355, 401, 422, 424, 430)
Desired output:
ov1 = c(120, 140, 160, 180)
ov2 = c(230, 250, 255, 265, 270, 295)
ov3 = c(340, 355)
ov4 = c(401, 422, 424, 430)

As @RonakShah suggested, the most efficient way in this case may be this:
split(iv2, cut(iv2, breaks = iv1,labels = paste0('ov',1:4)))
Output:
$ov1
[1] 120 140 160 180
$ov2
[1] 230 250 255 265 270 295
$ov3
[1] 340 355
$ov4
[1] 401 422 424 430
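The call above hard-codes four labels. A minimal sketch that derives the number of groups from iv1 instead, assuming the same inputs as in the question (the names `labels` and `out` are only illustrative):

```r
iv1 <- c(100, 200, 300, 400, 435)
iv2 <- c(60, 120, 140, 160, 180, 230, 250, 255, 265, 270, 295,
         340, 355, 401, 422, 424, 430)

# n break points define n - 1 intervals, so build n - 1 labels
labels <- paste0("ov", seq_len(length(iv1) - 1))

# cut() returns NA for values outside the breaks (here 60),
# and split() drops those NA groups
out <- split(iv2, cut(iv2, breaks = iv1, labels = labels))
out
```

Note that cut() uses right-closed intervals by default, so a value exactly equal to 200 would land in ov1; pass right = FALSE if the opposite convention is wanted.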

Related

Interpolate with splines without surpassing next value R

I have a dataset of accumulated data. I am trying to interpolate some missing values but at some points I get a superior value. This is an example of my data:
library(tibble)
dat <- tibble(day=c(1:30),
              value=c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
                      383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
                      NA, NA, 487, 487, 487, 487))
My dataset is quite long, and when I use smooth.spline to interpolate the missing values I get a value greater than the next observation, which is absurd considering I am dealing with accumulated data. This is the output I get:
value.smspl <- c(278, 278, 278, 287.7574, 295.2348, 302, 316, 326.5689, 335,
359, 364.7916, 377.3012, 383, 403, 419, 419, 444, 439.765, 447.1823,
444, 464, 487, 487, 487, 521.6235, 526.3715, 487, 487, 487, 487)
My question is: can you somehow set boundaries for the interpolation so the result is reliable? If so, how could you do it?
You have monotonic data for interpolation. We can use the "hyman" method in spline():
x <- dat$day
yi <- y <- dat$value
naInd <- is.na(y)
yi[naInd] <- spline(x[!naInd], y[!naInd], xout = x[naInd], method = "hyman")$y
plot(x, y, pch = 19) ## non-NA data (black)
points(x[naInd], yi[naInd], pch = 19, col = 2) ## interpolation at NA (red)
Package zoo has a number of functions to fill NA values, one of which is na.spline. So as G. Grothendieck (a wizard for time series) suggests, the following does the same:
library(zoo)
library(dplyr)
dat %>% mutate(value.interp = na.spline(value, method = "hyman"))
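A quick way to check that the interpolated series still behaves like accumulated data is to test that it never decreases. The sketch below uses only base R and the data from the question; the names `filled` and `is_monotone` and the small numerical tolerance are assumptions made for illustration.

```r
day <- 1:30
value <- c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
           383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
           NA, NA, 487, 487, 487, 487)

obs <- !is.na(value)

# Fill only the missing positions with the monotone (Hyman-filtered) spline
filled <- value
filled[!obs] <- spline(day[obs], value[obs],
                       xout = day[!obs], method = "hyman")$y

# Accumulated data should never go down; allow for floating-point noise
is_monotone <- all(diff(filled) >= -1e-8)
is_monotone
```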

Nested vectors in a function that is applied to dataframe in R

I have the basic data below (dput provided at the end of the question):
P <- 72
E_c <- 80000
head (deflections)
    L   I_comp
1  60 17299.21
2  70 25760.94
3  80 32734.69
4  90 51343.59
5 100 60167.30
6 110 64887.87
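I want to find the maximum deflection, using the equation (as implemented in the code below):
$\delta = \frac{P\,b\,x}{6\,E_c\,I_{comp}\,L}\left(L^2 - b^2 - x^2\right)$
where b is defined as b = L - a, and a is a vector that can be defined as
a <- seq(10,L,10)
and x is a nested sequence within a such that
x <- seq(1,a,1)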
I have written the function below to calculate the maximum deflection delta at all x values for all a values.
delta_less <- function(x){
  a_seq <- seq(10,L,10)
  L <- x[1]
  I_comp <- x[2]
  X_seq <- seq(a_seq)
  delta <- ((P*(L-a_seq)*X_seq)/(6*E_c*I_comp*L))*(L^2-(L-a_seq)^2-X_seq^2)
  c(val = max(delta), pos = which.max(delta))
}
test <- cbind(deflections, vars = t(apply(deflections,1,delta_less)))
The problem is that my code is not recognizing the nested sequence. It only works if I give a_seq a singular numerical value.
Data provided:
dput(deflections)
structure(list(L = c(60, 70, 80, 90, 100, 110, 120, 130, 140,
60, 70, 80, 90, 100, 110, 120, 130, 140, 60, 70, 80, 90, 100,
110, 120, 130, 140, 60, 70, 80, 90, 100, 110, 120, 130, 140),
I_comp = c(17299.2063000114, 25760.9420686703, 32734.6949444858,
51343.5889683982, 60167.3001695375, 64887.8729936444, 83451.9473750139,
103000.852590734, 112631.744898846, 23539.3443398425, 23544.8558575609,
32808.739059987, 40880.4964809276, 60171.1172692296, 64894.6680546358,
83469.4777437875, 103114.132264558, 120223.534252571, 23539.3443398425,
25772.7904955165, 32810.8231970971, 51345.5936366056, 74673.7079705815,
80752.3694583712, 103114.132264558, 103114.132264558, 147386.853127916,
23539.3443398425, 25767.8092758621, 40881.3376639665, 55608.9342154021,
80694.6568665257, 80752.3694583712, 119068.355471205, 119402.817753462,
147386.853127916)), class = "data.frame", row.names = c(NA,
-36L))
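One possible way to get the nested sequence is to build every (a, x) pair explicitly before evaluating the formula. The following is only a sketch under that assumption: `delta_max` is a hypothetical helper, not part of the original code, and it is shown on the first three rows of the data only. Note that L is read before a_seq is built, and that x runs from 1 to a separately for each a.

```r
P <- 72
E_c <- 80000

# Hypothetical helper: evaluate delta on all (a, x) pairs with x <= a
delta_max <- function(L, I_comp) {
  a_seq <- seq(10, L, by = 10)
  # One row per (a, x) combination, with x running from 1 to a
  grid <- do.call(rbind, lapply(a_seq, function(a) {
    data.frame(a = a, x = seq_len(a))
  }))
  b <- L - grid$a
  delta <- (P * b * grid$x) / (6 * E_c * I_comp * L) *
    (L^2 - b^2 - grid$x^2)
  i <- which.max(delta)
  c(val = delta[i], a = grid$a[i], x = grid$x[i])
}

# First three rows of the data from the question
deflections <- data.frame(L = c(60, 70, 80),
                          I_comp = c(17299.21, 25760.94, 32734.69))

# mapply() walks over the rows; t() gives one result row per beam
res <- t(mapply(delta_max, deflections$L, deflections$I_comp))
cbind(deflections, res)
```

Returning a and x alongside the value makes it easier to see where on the grid the maximum occurs than a single position index would.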

Loop and plot rectangles with colors automatically

I have the dataset listed below, which I will use for my question.
Data<-read.table(file=file.choose(),header=T)
Data;
VARIABLE TYPE NGENES BETA BETA_STD SE P
black SET 43 -0.049246 -0.0078434 0.14654 0.63156
blue SET 152 -0.080217 -0.023193 0.08137 0.83781
brown SET 163 -0.057881 -0.017266 0.079054 0.76791
cyan SET 42 0.1498 0.023586 0.14128 0.1446
darkgreen SET 2 -0.65338 -0.022727 0.67635 0.83292
green SET 172 -0.13458 -0.041115 0.073527 0.96631
greenyellow SET 40 0.026733 0.0041104 0.14624 0.42749
grey SET 4 0.16388 0.0080567 0.53064 0.37874
grey60 SET 23 -0.1455 -0.017054 0.20066 0.76576
lightcyan SET 41 0.083008 0.012918 0.15225 0.29284
magenta SET 32 -0.10777 -0.014858 0.16601 0.74184
midnightblue SET 23 0.00024188 2.84E-05 0.19544 0.49951
pink SET 64 -0.017662 -0.0034093 0.12521 0.55608
purple SET 60 0.12025 0.022504 0.12624 0.17048
red SET 73 0.40737 0.083745 0.11427 0.00018742
royalblue SET 7 -0.27895 -0.018125 0.36009 0.78067
salmon SET 170 0.040831 0.01241 0.076001 0.29559
turquoise SET 450 0.027806 0.012383 0.050585 0.29131
With this dataset I want to draw several rectangles on a plot, each color coded and labeled with a p-value on top of the rectangle. I want to loop through the VARIABLE column and assign a color to each rectangle, and loop through the P column and write the P value on top of each rectangle. Thus, for each rectangle, the color and p-value should come from the same row of the dataset. This is the script I am trying right now. I do not see how to loop over the associated columns with this script. Any help would be appreciated.
coords <- matrix(
c(100, 300, 110, 310,
120, 300, 130, 310,
140, 300, 150, 310,
160, 300, 170, 310,
180, 300, 190, 310,
100, 320, 110, 330,
120, 320, 130, 330,
140, 320, 150, 330,
160, 320, 170, 330,
180, 320, 190, 330,
100, 340, 110, 350,
120, 340, 130, 350,
140, 340, 150, 350,
160, 340, 170, 350,
180, 340, 190, 350,
100, 360, 110, 370,
120, 360, 130, 370,
140, 360, 150, 370),
ncol=4,byrow=TRUE)
plot(c(100, 200), c(300, 450), type = "n",
main = "Test")
rfun <- function(x,i) {
do.call(rect,as.list(x))
}
apply(coords,1,rfun)
text((coords[,1]+coords[,3])/2,
(coords[,2]+coords[,4])/2,
seq(nrow(coords)))
I am not sure, but maybe you want something like this?
DF <- structure(list(VARIABLE = c("black", "blue", "brown", "cyan",
"darkgreen", "green", "greenyellow", "grey", "grey60", "lightcyan",
"magenta", "midnightblue", "pink", "purple", "red", "royalblue",
"salmon", "turquoise"),
TYPE = c("SET", "SET", "SET", "SET",
"SET", "SET", "SET", "SET", "SET", "SET", "SET", "SET", "SET",
"SET", "SET", "SET", "SET", "SET"),
NGENES = c(43L, 152L, 163L, 42L, 2L, 172L, 40L, 4L, 23L, 41L, 32L, 23L,
64L, 60L, 73L, 7L, 170L, 450L),
BETA = c(-0.049246, -0.080217, -0.057881, 0.1498, -0.65338, -0.13458,
0.026733, 0.16388, -0.1455, 0.083008, -0.10777, 0.00024188,
-0.017662, 0.12025, 0.40737, -0.27895, 0.040831, 0.027806),
BETA_STD = c(-0.0078434, -0.023193, -0.017266, 0.023586, -0.022727,
-0.041115, 0.0041104, 0.0080567, -0.017054, 0.012918,
-0.014858, 2.84e-05, -0.0034093, 0.022504, 0.083745,
-0.018125, 0.01241, 0.012383),
SE = c(0.14654, 0.08137, 0.079054, 0.14128, 0.67635, 0.073527, 0.14624,
0.53064, 0.20066, 0.15225, 0.16601, 0.19544, 0.12521, 0.12624,
0.11427, 0.36009, 0.076001, 0.050585),
P = c(0.63156, 0.83781, 0.76791, 0.1446, 0.83292, 0.96631, 0.42749,
0.37874, 0.76576, 0.29284, 0.74184, 0.49951, 0.55608, 0.17048,
0.00018742, 0.78067, 0.29559, 0.29131)),
class = "data.frame",
row.names = c(NA, -18L))
coords <- matrix(
c(100, 300, 110, 310,
120, 300, 130, 310,
140, 300, 150, 310,
160, 300, 170, 310,
180, 300, 190, 310,
100, 320, 110, 330,
120, 320, 130, 330,
140, 320, 150, 330,
160, 320, 170, 330,
180, 320, 190, 330,
100, 340, 110, 350,
120, 340, 130, 350,
140, 340, 150, 350,
160, 340, 170, 350,
180, 340, 190, 350,
100, 360, 110, 370,
120, 360, 130, 370,
140, 360, 150, 370),
ncol=4,byrow=TRUE)
rfun <- function(x, i) do.call(rect, c(as.list(x), border = i))
plot(c(100, 200), c(300, 450), type = "n",
main = "Test")
invisible(sapply(seq_len(nrow(DF)),
function(y) do.call(rect, c(as.list(coords[y,]), border = DF$VARIABLE[y]))))
text((coords[,1]+coords[,3])/2,
(coords[,2]+coords[,4])/2,
round(DF$P, 2))
Created on 2020-08-04 by the reprex package (v0.3.0)
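Since rect() and text() are vectorized, the explicit loop can also be avoided. The sketch below is a variation on the answer above, not the original code: it uses only the first four rows of the data, fills the rectangles with col instead of coloring the border, and places the label slightly above each rectangle. The 3-unit offset and the names `label_x` and `label_y` are assumptions chosen for illustration.

```r
vars  <- c("black", "blue", "brown", "cyan")
pvals <- c(0.63156, 0.83781, 0.76791, 0.1446)

# One row per rectangle: left, bottom, right, top
coords <- cbind(xleft   = c(100, 120, 140, 160),
                ybottom = 300,
                xright  = c(110, 130, 150, 170),
                ytop    = 310)

pdf(NULL)  # off-screen device so the sketch also runs non-interactively
plot(c(100, 200), c(300, 450), type = "n", main = "Test",
     xlab = "", ylab = "")

# All rectangles in one call; colors are matched to rows by position
rect(coords[, "xleft"], coords[, "ybottom"],
     coords[, "xright"], coords[, "ytop"], col = vars)

# Labels centered horizontally and placed just above each rectangle
label_x <- (coords[, "xleft"] + coords[, "xright"]) / 2
label_y <- coords[, "ytop"] + 3
text(label_x, label_y, signif(pvals, 2), cex = 0.7)
invisible(dev.off())
```

This works because the color names in the VARIABLE column happen to be valid R color names; otherwise a lookup from name to color would be needed.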

Replicating a repeated measures compound symmetry structure from SAS to R using lme

I'm trying to replicate an analysis in a paper by Milliken (https://sci-hub.tw/10.1016/s0169-7161(03)22007-1, section 8) from SAS code to R. I'm quite stumped to be honest. It's a split-plot repeated measures design where the correlation structure is compound symmetry. Below are the data, the SAS code, and its results.
Data
library(magrittr)
library(tidyr)
library(dplyr)
dta <- data.frame(
tmp = c(rep(900, 3), rep(1000, 3), rep(1100, 3)),
posit = rep(c("top", "mid", "bot"), 3),
lot_1 = c(189, 211, 178, 213, 220, 197, 194, 212, 189),
lot_2 = c(195, 206, 162, 199, 230, 198, 215, 208, 193),
lot_3 = c(183, 210, 173, 189, 228, 202, 194, 201, 180),
lot_4 = c(187, 223, 181, 183, 221, 168, 232, 215, 192),
lot_5 = c(173, 191, 149, 202, 213, 151, 190, 198, 182)
)
dta <- dta %>%
tidyr::pivot_longer(., cols = c(lot_1, lot_2, lot_3, lot_4, lot_5),
names_to = "Lot") %>%
dplyr::mutate(Lot = as.factor(Lot),
tmp = as.factor(tmp),
lot_tmp = as.factor(paste0(Lot, "-", tmp)))
SAS Code
proc mixed data = dta cl covtest ic;
  class Posit temp lot;
  model thick = temp Posit Posit*temp / ddfm = kr;
  random lot;
  repeated posit / type = cs subject = lot*temp r rcorr;
run;
Output from SAS
R code attempt
## this works but isn't doing the same thing as above
## (column names follow the data frame built above: tmp, Lot, value)
library(nlme)
m1 <- lme(
  value ~ tmp + posit + tmp:posit,
  random = ~ 1 | Lot,
  correlation = corCompSymm(form = ~ 1 | Lot),
  data = dta, method = "REML"
)
I'm stuck at this point on how to add a repeated structure to the posit factor.
Thank you for the help!
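One direction that might be worth trying, offered as an assumption rather than a verified replication of the SAS output: a random intercept for each lot-by-temperature combination induces a compound symmetry correlation among the positions measured within that combination (restricted to non-negative correlation). In nlme this can be written as nested random intercepts. The sketch below rebuilds the long data with base R so that it is self-contained; the names `wide`, `long` and `m2` are illustrative.

```r
library(nlme)

wide <- data.frame(
  tmp   = rep(c(900, 1000, 1100), each = 3),
  posit = rep(c("top", "mid", "bot"), 3),
  lot_1 = c(189, 211, 178, 213, 220, 197, 194, 212, 189),
  lot_2 = c(195, 206, 162, 199, 230, 198, 215, 208, 193),
  lot_3 = c(183, 210, 173, 189, 228, 202, 194, 201, 180),
  lot_4 = c(187, 223, 181, 183, 221, 168, 232, 215, 192),
  lot_5 = c(173, 191, 149, 202, 213, 151, 190, 198, 182)
)

# Stack the five lot columns into long format (one row per measurement)
lot_cols <- paste0("lot_", 1:5)
long <- data.frame(
  tmp   = factor(rep(wide$tmp, times = length(lot_cols))),
  posit = factor(rep(wide$posit, times = length(lot_cols))),
  Lot   = factor(rep(lot_cols, each = nrow(wide))),
  value = unlist(wide[lot_cols], use.names = FALSE)
)

# Random intercepts for Lot and for tmp within Lot; the second term
# plays the role of the CS structure with subject = lot*temp
m2 <- lme(value ~ tmp * posit,
          random = ~ 1 | Lot/tmp,
          data = long, method = "REML")
VarCorr(m2)
```

The variance components reported by VarCorr() can then be compared with the covariance parameter estimates in the SAS output. The ddfm = kr adjustment has no direct counterpart in nlme, so degrees of freedom and p-values may still differ.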

Iterate a numeric vector in a list

I have a list NList of numeric vectors like
[[1]]
[1] 1959 9 4 62
[[2]]
[1] 2280 2 13
[[3]]
[1] 15 4 13
[[4]]
[1] 2902 178 13
and the structure is like
list(c(1959, 13), c(2280, 178, 13), c(2612, 178, 13), c(2902,
178, 13), c(2389, 178, 13), c(216, 736, 13), c(2337, 178, 13),
c(2639, 2126, 13), c(2924, 676, 178, 13), c(2416, 674, 178,
13), c(2223, 13), c(842, 178, 13), c(2618, 1570, 178, 13),
c(854, 178, 13), c(1847, 178, 13), c(2529, 178, 13), c(511,
178, 13), c(2221, 736, 13), c(415, 674, 178, 13), c(2438,
178, 13), c(2127, 178, 13), c(1910, 2126, 13), c(1904, 674,
178, 13), c(2310, 674, 178, 13), c(1732, 178, 13), c(1843,
178, 13), c(2539, 178, 13), c(1572, 676, 178, 13), c(1616,
876, 13).....)
I want to iterate over the numeric vectors in this list. I would like to do something like:
sum<- 0
index<-1
list1 <- apply(NList,1,function (i){
#I want to get each of the numeric vector here
row <- NList[i]
#then I want to iterate the numeric vector for some calculation.
#I am expecting, for [[1]], I get f(1959,9)+f(9,4)+f(4,62), in which f is my customized function, below I use a simple multiple as example
for (j in (1:(length(row)-1)))
{
origin <- row[j]
dest <- row[j+1]
#a simple calculation example...I am expecting an array of sum which is the calculation result
sum[index] <- sum[index] + origin*dest
}
index <- index+1
})
but it does not work and returns:
dim(X) must have a positive length
lapply is not working for me either and returns sum as 0...
listR1 <- lapply(NList,function (i){
row <- i
for (j in 1:length(row))
{origin <- row[j]
dest <- row[j+1]
sum[index] <- sum[index] + origin*dest
}
})
Did I miss something? How can I do this?
Thanks!
I took the function out of your apply statement to look at it a bit closer.
f <- function(Row)
{
  Sum <- 0
  for (j in seq_len(length(Row) - 1))
  {
    # accumulate the product of each pair of adjacent elements
    Sum <- Sum + Row[j] * Row[j + 1]
  }
  Sum # returns the Sum
}
Then I can apply the function to each row with:
list1 <- lapply(NList,f)
Okay, so this code would work:
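f <- function(a, b) sum(a, b)
test.func <- function(i) {
  # pre-allocate one slot per pair of adjacent elements
  ret.val <- numeric(length(i) - 1)
  for (j in seq_len(length(i) - 1))
    ret.val[j] <- f(i[j], i[j + 1])
  ret.val
}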
# Use lapply for a list.
lapply(NList,test.func)
Or you could do it in one line:
lapply(NList, function(i) sapply(seq_along(i)[-length(i)], function(x) f(i[x], i[x + 1])))
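For the simple multiplication example from the question, the inner loop can also be replaced by vector arithmetic on shifted copies of each vector. This is a minimal sketch using the four vectors printed in the question; `pair_sums` is an illustrative name, and a custom f would need to be vectorized (or wrapped in mapply) for the same pattern to apply.

```r
NList <- list(c(1959, 9, 4, 62), c(2280, 2, 13),
              c(15, 4, 13), c(2902, 178, 13))

# head(v, -1) drops the last element, tail(v, -1) drops the first,
# so their element-wise product pairs each value with its successor
pair_sums <- sapply(NList, function(v) sum(head(v, -1) * tail(v, -1)))
pair_sums
# first element: 1959*9 + 9*4 + 4*62 = 17915
```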
