I have a dataset of accumulated (cumulative) data. I am trying to interpolate some missing values, but at some points the interpolated value is higher than the next observation. This is an example of my data:
library(tibble)
dat <- tibble(day=c(1:30),
value=c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
NA, NA, 487, 487, 487, 487))
My dataset is quite long, and when I use smooth.spline to interpolate the missing values I get values greater than the next observation, which is absurd given that I am dealing with accumulated data. This is the output I get:
value.smspl <- c(278, 278, 278, 287.7574, 295.2348, 302, 316, 326.5689, 335,
359, 364.7916, 377.3012, 383, 403, 419, 419, 444, 439.765, 447.1823,
444, 464, 487, 487, 487, 521.6235, 526.3715, 487, 487, 487, 487)
My question is: can you somehow set boundaries for the interpolation so the result is reliable? If so, how could you do it?
Your data are monotonic, so you can use the "hyman" method in spline(), which computes a monotone cubic spline:
x <- dat$day
yi <- y <- dat$value
naInd <- is.na(y)
yi[naInd] <- spline(x[!naInd], y[!naInd], xout = x[naInd], method = "hyman")$y
plot(x, y, pch = 19) ## non-NA data (black)
points(x[naInd], yi[naInd], pch = 19, col = 2) ## interpolation at NA (red)
Package zoo has a number of functions to fill NA values, one of which is na.spline. So as G. Grothendieck (a wizard for time series) suggests, the following does the same:
library(zoo)
library(dplyr)
dat %>% mutate(value.interp = na.spline(value, method = "hyman"))
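As a quick sanity check, here is a minimal, self-contained sketch that rebuilds the example data and confirms that the "hyman" fill never decreases. The helper `is_nondecreasing`, its tolerance of `1e-8`, and the comparison with linear `approx()` are my own additions for illustration, not part of the answer above.

```r
# Example data from the question (cumulative values with gaps)
x <- 1:30
y <- c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
       383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
       NA, NA, 487, 487, 487, 487)
naInd <- is.na(y)

# Monotone cubic spline (Hyman filtering), evaluated only at the gaps
y_hyman <- y
y_hyman[naInd] <- spline(x[!naInd], y[!naInd], xout = x[naInd],
                         method = "hyman")$y

# Linear interpolation: a simpler alternative that is also monotone
y_linear <- y
y_linear[naInd] <- approx(x[!naInd], y[!naInd], xout = x[naInd])$y

# Hypothetical helper: TRUE if the series never decreases
# (the small tolerance absorbs floating-point rounding)
is_nondecreasing <- function(v, tol = 1e-8) all(diff(v) >= -tol)

is_nondecreasing(y_hyman)
is_nondecreasing(y_linear)
```

If a smooth curve is not required, the linear version may be enough; the spline version only matters when the shape between observations is of interest.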
I have the basic data below (dput provided at the end of the question):
P <- 72
E_c <- 80000
head (deflections)
L I_comp
1 60 17299.21
2 70 25760.94
3 80 32734.69
4 90 51343.59
5 100 60167.30
6 110 64887.87
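I want to find the maximum deflection, using the equation (as implemented in the function further down):
$$\delta = \frac{P\,b\,x}{6\,E_c\,I_{comp}\,L}\left(L^2 - b^2 - x^2\right)$$
where b is defined as b = L - a, and a is a vector that can be defined as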
a <- seq(10,L,10)
and x is a nested sequence within a such that
x <- seq(1,a,1)
I have written the function below to calculate the maximum deflection delta at all x values for all a values.
delta_less <- function(x){
a_seq <- seq(10,L,10)
L <- x[1]
I_comp <- x[2]
X_seq <- seq(a_seq)
delta <- ((P*(L-a_seq)*X_seq)/(6*E_c*I_comp*L))*(L^2-(L-a_seq)^2-X_seq^2)
c(val = max(delta), pos = which.max(delta))
}
test <- cbind(deflections, vars = t(apply(deflections,1,delta_less)))
The problem is that my code does not handle the nested sequence. It only works if I give a_seq a single numeric value.
Data provided:
dput(deflections)
structure(list(L = c(60, 70, 80, 90, 100, 110, 120, 130, 140,
60, 70, 80, 90, 100, 110, 120, 130, 140, 60, 70, 80, 90, 100,
110, 120, 130, 140, 60, 70, 80, 90, 100, 110, 120, 130, 140),
I_comp = c(17299.2063000114, 25760.9420686703, 32734.6949444858,
51343.5889683982, 60167.3001695375, 64887.8729936444, 83451.9473750139,
103000.852590734, 112631.744898846, 23539.3443398425, 23544.8558575609,
32808.739059987, 40880.4964809276, 60171.1172692296, 64894.6680546358,
83469.4777437875, 103114.132264558, 120223.534252571, 23539.3443398425,
25772.7904955165, 32810.8231970971, 51345.5936366056, 74673.7079705815,
80752.3694583712, 103114.132264558, 103114.132264558, 147386.853127916,
23539.3443398425, 25767.8092758621, 40881.3376639665, 55608.9342154021,
80694.6568665257, 80752.3694583712, 119068.355471205, 119402.817753462,
147386.853127916)), class = "data.frame", row.names = c(NA,
-36L))
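One way the nested sequence could be handled is with an explicit loop over a, building x = 1, ..., a inside it. This is only a sketch under that reading of the question: the helper name `delta_max` and its return layout (`val`, `a`, `x`) are hypothetical, and only the first three rows of the data are used.

```r
P   <- 72
E_c <- 80000

# Hypothetical helper: largest deflection for one beam (one row of the data)
delta_max <- function(L, I_comp) {
  best <- c(val = -Inf, a = NA, x = NA)
  for (a in seq(10, L, 10)) {        # outer sequence: load positions
    b <- L - a
    x <- seq_len(a)                  # inner sequence: 1, 2, ..., a
    delta <- (P * b * x) / (6 * E_c * I_comp * L) * (L^2 - b^2 - x^2)
    i <- which.max(delta)
    # keep the largest deflection seen so far, with the a and x that gave it
    if (delta[i] > best[["val"]]) best <- c(val = delta[i], a = a, x = x[i])
  }
  best
}

# First rows of the question's data
deflections <- data.frame(L = c(60, 70, 80),
                          I_comp = c(17299.21, 25760.94, 32734.69))

# One result row per beam, bound back onto the data
res <- t(mapply(delta_max, deflections$L, deflections$I_comp))
cbind(deflections, res)
```

Defining L before it is used to build the a sequence matters here: in the original function `a_seq` is computed before `L` is assigned.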
I have the dataset below, which I will use for my question.
Data<-read.table(file=file.choose(),header=T)
Data;
VARIABLE TYPE NGENES BETA BETA_STD SE P
black SET 43 -0.049246 -0.0078434 0.14654 0.63156
blue SET 152 -0.080217 -0.023193 0.08137 0.83781
brown SET 163 -0.057881 -0.017266 0.079054 0.76791
cyan SET 42 0.1498 0.023586 0.14128 0.1446
darkgreen SET 2 -0.65338 -0.022727 0.67635 0.83292
green SET 172 -0.13458 -0.041115 0.073527 0.96631
greenyellow SET 40 0.026733 0.0041104 0.14624 0.42749
grey SET 4 0.16388 0.0080567 0.53064 0.37874
grey60 SET 23 -0.1455 -0.017054 0.20066 0.76576
lightcyan SET 41 0.083008 0.012918 0.15225 0.29284
magenta SET 32 -0.10777 -0.014858 0.16601 0.74184
midnightblue SET 23 0.00024188 2.84E-05 0.19544 0.49951
pink SET 64 -0.017662 -0.0034093 0.12521 0.55608
purple SET 60 0.12025 0.022504 0.12624 0.17048
red SET 73 0.40737 0.083745 0.11427 0.00018742
royalblue SET 7 -0.27895 -0.018125 0.36009 0.78067
salmon SET 170 0.040831 0.01241 0.076001 0.29559
turquoise SET 450 0.027806 0.012383 0.050585 0.29131
With this dataset I want to draw several rectangles on a plot, each color coded and with its p-value written on top of it. I want to loop through the VARIABLE column and assign a color to each rectangle, and loop through the P column and write the p-value on top of each rectangle, so that the color and the p-value of a rectangle come from the same row of the dataset. This is the script I am trying right now. I do not see how to loop over the associated columns with this script. Any help would be appreciated.
coords <- matrix(
c(100, 300, 110, 310,
120, 300, 130, 310,
140, 300, 150, 310,
160, 300, 170, 310,
180, 300, 190, 310,
100, 320, 110, 330,
120, 320, 130, 330,
140, 320, 150, 330,
160, 320, 170, 330,
180, 320, 190, 330,
100, 340, 110, 350,
120, 340, 130, 350,
140, 340, 150, 350,
160, 340, 170, 350,
180, 340, 190, 350,
100, 360, 110, 370,
120, 360, 130, 370,
140, 360, 150, 370),
ncol=4,byrow=TRUE)
plot(c(100, 200), c(300, 450), type = "n",
main = "Test")
rfun <- function(x,i) {
do.call(rect,as.list(x))
}
apply(coords,1,rfun)
text((coords[,1]+coords[,3])/2,
(coords[,2]+coords[,4])/2,
seq(nrow(coords)))
I am not sure, but maybe you want something like this?
DF <- structure(list(VARIABLE = c("black", "blue", "brown", "cyan",
"darkgreen", "green", "greenyellow", "grey", "grey60", "lightcyan",
"magenta", "midnightblue", "pink", "purple", "red", "royalblue",
"salmon", "turquoise"),
TYPE = c("SET", "SET", "SET", "SET",
"SET", "SET", "SET", "SET", "SET", "SET", "SET", "SET", "SET",
"SET", "SET", "SET", "SET", "SET"),
NGENES = c(43L, 152L, 163L, 42L, 2L, 172L, 40L, 4L, 23L, 41L, 32L, 23L,
64L, 60L, 73L, 7L, 170L, 450L),
BETA = c(-0.049246, -0.080217, -0.057881, 0.1498, -0.65338, -0.13458,
0.026733, 0.16388, -0.1455, 0.083008, -0.10777, 0.00024188,
-0.017662, 0.12025, 0.40737, -0.27895, 0.040831, 0.027806),
BETA_STD = c(-0.0078434, -0.023193, -0.017266, 0.023586, -0.022727,
-0.041115, 0.0041104, 0.0080567, -0.017054, 0.012918,
-0.014858, 2.84e-05, -0.0034093, 0.022504, 0.083745,
-0.018125, 0.01241, 0.012383),
SE = c(0.14654, 0.08137, 0.079054, 0.14128, 0.67635, 0.073527, 0.14624,
0.53064, 0.20066, 0.15225, 0.16601, 0.19544, 0.12521, 0.12624,
0.11427, 0.36009, 0.076001, 0.050585),
P = c(0.63156, 0.83781, 0.76791, 0.1446, 0.83292, 0.96631, 0.42749,
0.37874, 0.76576, 0.29284, 0.74184, 0.49951, 0.55608, 0.17048,
0.00018742, 0.78067, 0.29559, 0.29131)),
class = "data.frame",
row.names = c(NA, -18L))
coords <- matrix(
c(100, 300, 110, 310,
120, 300, 130, 310,
140, 300, 150, 310,
160, 300, 170, 310,
180, 300, 190, 310,
100, 320, 110, 330,
120, 320, 130, 330,
140, 320, 150, 330,
160, 320, 170, 330,
180, 320, 190, 330,
100, 340, 110, 350,
120, 340, 130, 350,
140, 340, 150, 350,
160, 340, 170, 350,
180, 340, 190, 350,
100, 360, 110, 370,
120, 360, 130, 370,
140, 360, 150, 370),
ncol=4,byrow=TRUE)
rfun <- function(x, i) do.call(rect, c(as.list(x), border = i))
plot(c(100, 200), c(300, 450), type = "n",
main = "Test")
invisible(sapply(seq_len(nrow(DF)),
function(y) do.call(rect, c(as.list(coords[y,]), border = DF$VARIABLE[y]))))
text((coords[,1]+coords[,3])/2,
(coords[,2]+coords[,4])/2,
round(DF$P, 2))
Created on 2020-08-04 by the reprex package (v0.3.0)
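Since rect() and text() are vectorised, the same plot can also be drawn without a loop. The sketch below is an illustration, not the answer's method: it fills the rectangles (`col`) instead of colouring the border, places the p-value just above each one, and rebuilds the grid arithmetically. It assumes every module name is a valid R colour name; `modules`, `pvals`, `xleft` and `ybottom` are names I introduced.

```r
# Module names and p-values from the question's table
modules <- c("black", "blue", "brown", "cyan", "darkgreen", "green",
             "greenyellow", "grey", "grey60", "lightcyan", "magenta",
             "midnightblue", "pink", "purple", "red", "royalblue",
             "salmon", "turquoise")
pvals <- c(0.63156, 0.83781, 0.76791, 0.1446, 0.83292, 0.96631, 0.42749,
           0.37874, 0.76576, 0.29284, 0.74184, 0.49951, 0.55608, 0.17048,
           0.00018742, 0.78067, 0.29559, 0.29131)

# Same layout as the coords matrix: 5 rectangles per row, 10 x 10 each
idx     <- seq_along(modules) - 1
xleft   <- 100 + 20 * (idx %% 5)
ybottom <- 300 + 20 * (idx %/% 5)

plot(c(100, 200), c(300, 450), type = "n", main = "Test",
     xlab = "", ylab = "")
# One call draws all rectangles; row i gets colour modules[i]
rect(xleft, ybottom, xleft + 10, ybottom + 10, col = modules)
# p-value centred just above the top edge of each rectangle
text(xleft + 5, ybottom + 13, signif(pvals, 2), cex = 0.7)
```

Because colours and labels are indexed by the same row order, each rectangle keeps the colour and p-value of its own row.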
I'm trying to replicate an analysis in a paper by Milliken (https://sci-hub.tw/10.1016/s0169-7161(03)22007-1, section 8), translating it from SAS code to R. I'm quite stumped, to be honest. It's a split-plot repeated measures design where the correlation structure is compound symmetry. Below are the data, the SAS code, and its results.
Data
library(magrittr)
library(tidyr)
library(dplyr)
dta <- data.frame(
tmp = c(rep(900, 3), rep(1000, 3), rep(1100, 3)),
posit = rep(c("top", "mid", "bot"), 3),
lot_1 = c(189, 211, 178, 213, 220, 197, 194, 212, 189),
lot_2 = c(195, 206, 162, 199, 230, 198, 215, 208, 193),
lot_3 = c(183, 210, 173, 189, 228, 202, 194, 201, 180),
lot_4 = c(187, 223, 181, 183, 221, 168, 232, 215, 192),
lot_5 = c(173, 191, 149, 202, 213, 151, 190, 198, 182)
)
dta <- dta %>%
tidyr::pivot_longer(., cols = c(lot_1, lot_2, lot_3, lot_4, lot_5),
names_to = "Lot") %>%
dplyr::mutate(Lot = as.factor(Lot),
tmp = as.factor(tmp),
lot_tmp = as.factor(paste0(Lot, "-", tmp)))
SAS Code
proc mixed data = dta cl covtest ic;
class Posit temp lot;
model thick = temp Posit Posit*temp / ddfm = kr;
random lot;
repeated posit / type = cs subject = lot*temp r rcorr;
run;
Output from SAS
R code attempt
## this works but isn't doing the same thing as above
library(nlme)
# column names must match the reshaped data above (tmp, Lot, value)
m1 <- lme(
value ~ tmp + posit + tmp:posit,
random = ~ 1 | Lot,
correlation = corCompSymm(form = ~ 1 | Lot),
data = dta, method = "REML"
)
I'm stuck at this point on how to add a repeated structure to the posit factor.
Thank you for the help!
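One possible direction, offered as a sketch rather than a verified replication of the paper: compound symmetry within each lot-by-temperature whole plot implies the same covariance as a random intercept for that whole plot, as long as the within-whole-plot correlation is non-negative. In nlme this can be written with nested random intercepts, `Lot/tmp`. nlme has no Kenward-Roger adjustment, so denominator degrees of freedom may differ from `ddfm = kr`. The base-R reshaping and the names `wide`, `dta_long` and `m2` are my own choices.

```r
library(nlme)

# Data from the question, in wide form
wide <- data.frame(
  tmp   = rep(c(900, 1000, 1100), each = 3),
  posit = rep(c("top", "mid", "bot"), 3),
  lot_1 = c(189, 211, 178, 213, 220, 197, 194, 212, 189),
  lot_2 = c(195, 206, 162, 199, 230, 198, 215, 208, 193),
  lot_3 = c(183, 210, 173, 189, 228, 202, 194, 201, 180),
  lot_4 = c(187, 223, 181, 183, 221, 168, 232, 215, 192),
  lot_5 = c(173, 191, 149, 202, 213, 151, 190, 198, 182)
)

# Long form: one row per lot x temperature x position
lots <- paste0("lot_", 1:5)
dta_long <- data.frame(
  tmp   = factor(rep(wide$tmp, times = length(lots))),
  posit = factor(rep(wide$posit, times = length(lots))),
  Lot   = factor(rep(lots, each = nrow(wide))),
  value = unlist(wide[lots], use.names = FALSE)
)

# Random intercept for Lot, plus one for each tmp-within-Lot whole plot
m2 <- lme(value ~ tmp * posit,
          random = ~ 1 | Lot/tmp,
          data = dta_long, method = "REML")

VarCorr(m2)  # variance components: Lot, tmp within Lot, residual
anova(m2)    # tests for tmp, posit and their interaction
```

The variance components can then be compared with the covariance parameter estimates in the SAS output to judge whether the two fits agree.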
I have a list NList of numeric vectors like
[[1]]
[1] 1959 9 4 62
[[2]]
[1] 2280 2 13
[[3]]
[1] 15 4 13
[[4]]
[1] 2902 178 13
and the structure is like
list(c(1959, 13), c(2280, 178, 13), c(2612, 178, 13), c(2902,
178, 13), c(2389, 178, 13), c(216, 736, 13), c(2337, 178, 13),
c(2639, 2126, 13), c(2924, 676, 178, 13), c(2416, 674, 178,
13), c(2223, 13), c(842, 178, 13), c(2618, 1570, 178, 13),
c(854, 178, 13), c(1847, 178, 13), c(2529, 178, 13), c(511,
178, 13), c(2221, 736, 13), c(415, 674, 178, 13), c(2438,
178, 13), c(2127, 178, 13), c(1910, 2126, 13), c(1904, 674,
178, 13), c(2310, 674, 178, 13), c(1732, 178, 13), c(1843,
178, 13), c(2539, 178, 13), c(1572, 676, 178, 13), c(1616,
876, 13).....)
I want to iterate over the numeric vectors in this list. I would like to do something like:
sum<- 0
index<-1
list1 <- apply(NList,1,function (i){
#I want to get each of the numeric vector here
row <- NList[i]
#then I want to iterate the numeric vector for some calculation.
#I am expecting, for [[1]], I get f(1959,9)+f(9,4)+f(4,62), in which f is my customized function, below I use a simple multiple as example
for (j in (1:(length(row)-1)))
{
origin <- row[j]
dest <- row[j+1]
#a simple calculation example...I am expecting an array of sum which is the calculation result
sum[index] <- sum[index] + origin*dest
}
index <- index+1
})
but it does not work and returns:
dim(X) must have a positive length
lapply is not working for me either; it returns sum as 0...
listR1 <- lapply(NList,function (i){
row <- i
for (j in 1:length(row))
{origin <- row[j]
dest <- row[j+1]
sum[index] <- sum[index] + origin*dest
}
})
Did I miss something? How can I do this?
Thanks!
I took the function out of your apply statement to look at it a bit closer.
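f=function(Row)
{
Sum<- 0
# seq_len() avoids the 1:0 trap when Row has a single element
for (j in seq_len(length(Row)-1) )
{
# accumulate the product of each adjacent pair
Sum<- Sum + Row[j]*Row[j+1]
}
Sum # returns the Sum
}
Then I can apply the function to each element of the list with: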
list1 <- lapply(NList,f)
Okay, so this code would work:
f=function(a,b) sum(a,b)
test.func=function (i){
# preallocate the result: one value per adjacent pair
ret.val=numeric(length(i)-1)
for (j in seq_len(length(i)-1))
ret.val[j]=f(i[j],i[j+1])
ret.val
}
# Use lapply for a list.
lapply(NList,test.func)
Or you could do it in one line:
lapply(NList,function(i) sapply(seq_along(i)[-length(i)],function(x) f(i[x],i[x+1])))
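Putting the pieces together, here is a small self-contained sketch that returns one total per vector, as the question asks. The helper name `pair_sum` and the default multiplication `f` are assumptions for illustration; any function of two numbers that returns a single number can be passed instead. The test list is the printed example from the question.

```r
# Hypothetical helper: sum of f(v[j], v[j + 1]) over all adjacent pairs
pair_sum <- function(v, f = function(a, b) a * b) {
  n_pairs <- max(length(v) - 1, 0)   # 0 pairs for vectors of length 0 or 1
  vals <- vapply(seq_len(n_pairs),
                 function(j) f(v[j], v[j + 1]),
                 numeric(1))
  sum(vals)                          # sum(numeric(0)) is 0
}

# The four vectors printed in the question
NList <- list(c(1959, 9, 4, 62), c(2280, 2, 13), c(15, 4, 13),
              c(2902, 178, 13))

# One total per list element, returned as a numeric vector
totals <- vapply(NList, pair_sum, numeric(1))
totals
```

Returning the value from the function, rather than assigning to `sum[index]` inside it, is what makes this work: assignments inside the function passed to lapply() do not modify variables outside it.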