how to separate the mixed models, and fit separate linear models - r

I am trying to fit a linear model, separate the mixed models, and then fit separate linear models to model_steeper and model_flatter. First, I create training samples with Input >= 5 and separate the points:
nSample<-length(data$Input)
Train.Sample<-data.frame(trainInput=data$Input,trainOutput=rep(NA,nSample))
Train.Sample.Steeper<-data.frame(trainSteepInput=data$Input,trainSteepOutput=rep(NA,nSample))
Train.Sample.Flatter<-data.frame(trainFlatInput=data$Input,trainFlatOutput=rep(NA,nSample))
head(cbind(data,Train.Sample,Train.Sample.Steeper,Train.Sample.Flatter))
and the result is:
dput(head(cbind(data,Train.Sample,Train.Sample.Steeper,Train.Sample.Flatter)))
structure(list(Output = c(0.430030802963404, -0.387872242279496,
-0.773463398992163, 3.47962503801818, -1.18311295613965, -0.534018180113726
), Input = c(-0.707348558586091, -0.596670078579336, -1.55126970726997,
2.00976222474128, -1.69353070948273, -0.437843651510775), trainInput = c(-0.707348558586091,
-0.596670078579336, -1.55126970726997, 2.00976222474128, -1.69353070948273,
-0.437843651510775), trainOutput = c(NA, NA, NA, NA, NA, NA),
trainSteepInput = c(-0.707348558586091, -0.596670078579336,
-1.55126970726997, 2.00976222474128, -1.69353070948273, -0.437843651510775
), trainSteepOutput = c(NA, NA, NA, NA, NA, NA), trainFlatInput = c(-0.707348558586091,
-0.596670078579336, -1.55126970726997, 2.00976222474128,
-1.69353070948273, -0.437843651510775), trainFlatOutput = c(NA,
NA, NA, NA, NA, NA)), row.names = c(NA, 6L), class = "data.frame")
Then, I tried:
Train.Sample.Steep.lm <- lm(trainSteepOutput ~ trainSteepInput, Train.Sample.Steeper)
But the error is:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases
I do not know what to do next. Does anyone know how to fix this?
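The error message says that no complete rows are left once missing values are dropped: trainSteepOutput is NA in every row, so lm() has nothing to fit. Below is a minimal sketch of that behaviour. It assumes the response column is meant to be filled from data$Output for the points assigned to a group. The toy data and the selection rule (Input >= 0) are placeholders for illustration, not the separation rule from the question.

```r
# Toy data standing in for the question's `data` (hypothetical values)
set.seed(1)
data <- data.frame(Input = rnorm(50))
data$Output <- 2 * data$Input + rnorm(50, sd = 0.1)

# Same layout as in the question: the response column is NA everywhere
Train.Sample.Steeper <- data.frame(
  trainSteepInput  = data$Input,
  trainSteepOutput = rep(NA, nrow(data))
)

# Fitting now fails because every row is dropped as incomplete
err <- tryCatch(
  lm(trainSteepOutput ~ trainSteepInput, Train.Sample.Steeper),
  error = function(e) conditionMessage(e)
)
print(err)

# Fill the response for the rows that belong to this group.
# `keep` is an assumed rule; replace it with the real separation criterion.
keep <- data$Input >= 0
Train.Sample.Steeper$trainSteepOutput[keep] <- data$Output[keep]

# Rows that are still NA are dropped automatically; the rest are fitted
Train.Sample.Steep.lm <- lm(trainSteepOutput ~ trainSteepInput, Train.Sample.Steeper)
print(coef(Train.Sample.Steep.lm))
```

The same pattern would apply to Train.Sample.Flatter with the complementary set of rows.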

Related

Conditional statements for one row vs all rows

I don't know what's going on here. Do I have some logical flaw in the code?
I want to match two datasets by their time difference. One case is around 4 hours apart from the entry in the other set. I am calculating the difference, e.g.:
qnr$submitdate[10]-raw1$time[7]
Time difference of 4 hours
I am specifying a time window:
sum(qnr$submitdate[10]-raw1$time[7] <= 4 & qnr$submitdate[10]-raw1$time[7] > 3.995)
[1] 1
Perfect, 1 match!
Now when I am considering the whole data set, I get 0 matches, how can that be?
sum(qnr$submitdate[10]-raw1$time <= 4 & qnr$submitdate[10]-raw1$time > 3.995)
[1] 0
Specifically, I want to match an identifier:
for (i in 1:nrow(qnr)){
match <- raw1$subject[(qnr$submitdate[i]-raw1$time <= 4 & qnr$submitdate[i]-raw1$time > 3.995)]
if(length(match)>0) qnr$subject[i] <- match
}
This works, but only for some cases, not the one mentioned above. Can someone please help me and enlighten me?
Data:
qnr <- structure(list(submitdate = structure(c(1635427498, 1635427876,
1635428218, 1635429757, 1635430844, 1635432380, 1635435962, 1635453487,
1635464448, 1635508264, 1635509440, 1635509727, 1635510277, 1635511263,
1635511718, 1635514199, 1635514329, 1635517928, 1635519441, 1635519704,
1635520386, 1635521108, 1635522747, 1635525148, 1635526577), tzone = "UTC", class = c("POSIXct",
"POSIXt")), subject = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-25L), class = c("tbl_df", "tbl", "data.frame"))
raw1 <- structure(list(time = structure(c(1635413099, 1635413819, 1635416446,
1635417980, 1635421563, 1635439088, 1635493864, 1635495041, 1635495326,
1635495876, 1635496863, 1635499803, 1635499932, 1635503528, 1635505042,
1635508347, 1635512177, 1635512850, 1635518752, 1635519382), class = c("POSIXct",
"POSIXt"), tzone = ""), subject = c("9wtd4kldpun6bhgq", "qbvhqxuw67x1eduw",
"k2dc9c88t3jcfssy", "vmvwfc6z7j236nhk", "7qo7ra1jj25ue3fb", "5xx9qkkb53nzxev5",
"o6zaaq469c7t2jps", "dfsj021ojphza6uc", "4k0l4a3yrb33hel1", "vf6usaa0cl8kz17t",
"f1wwfeoeekoru88z", "oe8e2u6w4a1f6f6m", "tnxxywtpsj8nejoa", "zht8w1bfhq4dk22l",
"atd314r9a4htlaal", "mwbh9eafxczk0x8u", "ke7m4qqp4aodd1fb", "v13fx76lsohsa1hh",
"8kvynhcvfs09g658", "5scqtdz8ha8cuxt1")), row.names = c(79226L,
26641L, 79425L, 79624L, 79823L, 26789L, 2961L, 3109L, 3257L,
47585L, 3405L, 3553L, 3701L, 47784L, 3849L, 47983L, 48182L, 48381L,
48580L, 48779L), class = c("data.table", "data.frame"))
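The reason is that subtracting date-times calls difftime(), which uses units="auto" by default. With "auto", the unit is chosen from the smallest absolute difference in the result, so the single-element subtraction comes back in hours while the whole-vector subtraction can come back in minutes or seconds, and comparing it with 4 no longer means 4 hours.
It works when changing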
qnr$submitdate[i]-raw1$time
to
difftime(qnr$submitdate[i],raw1$time, units="hours")
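A small sketch of the unit switch, using made-up timestamps rather than the data from the question (t0 and times are illustrative names):

```r
# One reference time and two earlier times: 4 hours and 90 seconds before it
t0    <- as.POSIXct("2021-10-29 12:00:00", tz = "UTC")
times <- t0 - c(4 * 3600, 90)

# Single element: the only difference is 4 hours, so "auto" picks hours
d_single <- t0 - times[1]
print(units(d_single))

# Whole vector: the smallest difference is 90 seconds, so "auto" picks minutes
d_all <- t0 - times
print(units(d_all))

# Fixing the unit makes the comparison mean the same thing in both cases
d_hours <- as.numeric(difftime(t0, times, units = "hours"))
n_match <- sum(d_hours <= 4 & d_hours > 3.995)
print(n_match)
```

Converting with as.numeric() after fixing the unit also avoids comparing a difftime object against a bare number.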

R - object not found error when it exists

I am trying to understand why "R" cannot find a variable that is definitely in my dataframe.
Here is the dput for "DF.1" in my code below:
library("dplyr")
library("stringr")
DF.1 <- structure(list(`ID` = c("APP-5XUEJHC1XN-2019",
"APP-AVO1K5F33B-2019", "APP-J12JZHOWTM-2019", "APP-VROJDQSZ3P-2019",
"APP-00AURK6GEP-2019", "APP-00VACS4YZI-2018", "APP-00W7N0XXSO-2019",
"APP-01AQMLSHX6-2019", "APP-021R8JXC6O-2018", "APP-022XIXHHIQ-2019",
"APP-025ZNBC262-2018", "APP-02IUB6YJ05-2019", "APP-02PSFXZI1U-2019",
"APP-02TZN2M3JT-2019", "APP-034IPEAN7E-2018", "APP-03XWZT90ZW-2018",
"APP-040I2UPEEI-2019", "APP-0442F1YUCB-2019", "APP-04DKWB5EF3-2019",
"APP-04E58XMYDH-2018"), `Observations` = c("Single",
"Single", "Single", "Single", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -20L))
DF.2 <- DF.1 %>% dplyr::mutate(
"New Var" = case_when(
str_detect(tolower(`Observations`), "single") ~ "Single Protocol",
str_detect(tolower(`Observations`), "multiple") |
!(str_detect(tolower(`Observations`), paste(c("single", "multiple"), collapse = '|'))) |
is.na(`Observations`) ~ "Multiple Protocol"))
When I run the above code, I get the following error:
Error in eval_tidy(pair$lhs, env = default_env) :
object 'Observations' not found
The variable is in the dataframe, so I am wondering if there is a conflict with either case_when or str_detect.
You need to assign the structure(...) piece to an object (DF.1 <- ...):
DF.1 <- structure(list(`ID` = c("APP-5XUEJHC1XN-2019",
"APP-AVO1K5F33B-2019", "APP-J12JZHOWTM-2019", "APP-VROJDQSZ3P-2019",
"APP-00AURK6GEP-2019", "APP-00VACS4YZI-2018", "APP-00W7N0XXSO-2019",
"APP-01AQMLSHX6-2019", "APP-021R8JXC6O-2018", "APP-022XIXHHIQ-2019",
"APP-025ZNBC262-2018", "APP-02IUB6YJ05-2019", "APP-02PSFXZI1U-2019",
"APP-02TZN2M3JT-2019", "APP-034IPEAN7E-2018", "APP-03XWZT90ZW-2018",
"APP-040I2UPEEI-2019", "APP-0442F1YUCB-2019", "APP-04DKWB5EF3-2019",
"APP-04E58XMYDH-2018"), `Observations` = c("Single",
"Single", "Single", "Single", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -20L))

linear regression model with dplyr on specified columns by name

I have the following data frame, each row containing four dates ("y") and four measurements ("x"):
df = structure(list(x1 = c(69.772808673525, NA, 53.13125414839,
17.3033274666411,
NA, 38.6120670385487, 57.7229000792707, 40.7654208618078, 38.9010405201831,
65.7108936694177), y1 = c(0.765671296296296, NA, 1.37539351851852,
0.550277777777778, NA, 0.83037037037037, 0.0254398148148148,
0.380671296296296, 1.368125, 2.5250462962963), x2 = c(81.3285388496182,
NA, NA, 44.369872853302, NA, 61.0746827226573, 66.3965114460601,
41.4256874481852, 49.5461413070349, 47.0936997726146), y2 =
c(6.58287037037037,
NA, NA, 9.09377314814815, NA, 7.00127314814815, 6.46597222222222,
6.2462962962963, 6.76976851851852, 8.12449074074074), x3 = c(NA,
60.4976916064608, NA, 45.3575294731303, 45.159758146854, 71.8459173097114,
NA, 37.9485456227131, 44.6307631013742, 52.4523342186143), y3 = c(NA,
12.0026157407407, NA, 13.5601157407407, 16.1213657407407, 15.6431018518519,
NA, 15.8986805555556, 13.1395138888889, 17.9432638888889), x4 = c(NA,
NA, NA, 57.3383407228293, NA, 59.3921356160536, 67.4231673171527,
31.853845252547, NA, NA), y4 = c(NA, NA, NA, 18.258125, NA,
19.6074768518519,
20.9696527777778, 23.7176851851852, NA, NA)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
I would like to create an additional column containing the slope of all the y's versus all the x's, for each row (each row is a patient with these 4 measurements).
Here is what I have so far:
df <- df %>% mutate(Slope = lm(vars(starts_with("y") ~
vars(starts_with("x"), data = .)
I am getting an error:
invalid type (list) for variable 'vars(starts_with("y"))'...
What am I doing wrong, and how can I calculate the rowwise slope?
You are using a tidyverse syntax but your data is not tidy...
Maybe you should rearrange your data.frame and rethink the way you store your data.
Here is how to do it in a quick and dirty way (at least if I understood your explanations correctly):
df <- merge(reshape(df[,(1:4)*2-1], dir="long", varying = list(1:4), v.names = "x", idvar = "patient"),
reshape(df[,(1:4)*2], dir="long", varying = list(1:4), v.names = "y", idvar = "patient"))
df$patient <- factor(df$patient)
Then you could loop over the patients, perform a linear regression and get the slopes as a vector:
sapply(levels(df$patient), function(pat) {
coef(lm(y~x,df[df$patient==pat,],na.action = "na.omit"))[2]
})
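If reshaping is not wanted, the slope can also be computed row by row in base R. This is only a sketch under assumptions: the small data frame below is a made-up stand-in with the same x1/y1, x2/y2, ... layout, row_slope is a hypothetical helper, and rows with fewer than two complete (x, y) pairs are given NA.

```r
# Made-up stand-in with the same column layout as the question's df
df <- data.frame(
  x1 = c(1, 2, NA),  y1 = c(2, 1, NA),
  x2 = c(2, 4, 5),   y2 = c(4, 2, 7),
  x3 = c(3, NA, NA), y3 = c(6, NA, NA)
)

# Select the columns by name prefix, as starts_with() would
x_cols <- grep("^x", names(df), value = TRUE)
y_cols <- grep("^y", names(df), value = TRUE)

# Hypothetical helper: slope of y ~ x for one patient, NA if under 2 pairs
row_slope <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)
  if (sum(ok) < 2) return(NA_real_)
  unname(coef(lm(y[ok] ~ x[ok]))[2])
}

# Apply the helper to every row
df$Slope <- sapply(seq_len(nrow(df)), function(i) {
  row_slope(unlist(df[i, x_cols]), unlist(df[i, y_cols]))
})
print(df$Slope)
```

This keeps the wide layout intact, at the cost of one lm() call per row.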

custom rmeta - forest plot generation does not work: " 'x' and 'units' must have length > 0"

I tried to generate a "forest plot" without summary estimates using the rmeta package. However, starting from the description or the example in ?forestplot does not help; I always get the same error. I assume it is a simple one to do with the matrix/vector lengths somehow not lining up, but I kept changing and adjusting things and still cannot find the error.
Here is the example code:
tabletext<-cbind(c(NA, NA, NA, NA, NA, NA),
c(NA, NA, NA, NA, NA, NA),
c("variable1","subgroup","2nd", "3rd", "4th", "5th"),
c(NA,"mean","1.8683639", "2.5717301", "4.4966049, 9.0008054")
)
tabletext
png("forestplot.png")
forestplot(tabletext, mean = c(NA, NA, 1.8683639, 2.5717301, 4.4966049, 9.0008054), lower = c(NA, NA, 1.4604643, 2.0163468, 3.5197956, 6.9469213), upper = c(NA, NA, 2.3955105, 3.2897459, 5.7672966, 11.7288609),
is.summary = c(rep(FALSE, 6)), zero = 1, xlog=FALSE, boxsize=0.75, xticks = NULL, clip = c(0.9, 12))
dev.off()
Error message:
clip = c(0.9, 12))
Error in unit(rep(1, sum(widthcolumn)), "grobwidth", labels[[1]][widthcolumn]) :
'x' and 'units' must have length > 0
dev.off()
Any help is very much appreciated!
This works with the forestplot package, although you need to remove the xticks=NULL:
library(forestplot)
tabletext<-cbind(c(NA, NA, NA, NA, NA, NA),
c(NA, NA, NA, NA, NA, NA),
c("variable1","subgroup","2nd", "3rd", "4th", "5th"),
# the last two values are separate strings so that every column has 6 entries
c(NA,"mean","1.8683639", "2.5717301", "4.4966049", "9.0008054")
)
png("forestplot.png")
forestplot(tabletext,
mean = c(NA, NA, 1.8683639, 2.5717301, 4.4966049, 9.0008054),
lower = c(NA, NA, 1.4604643, 2.0163468, 3.5197956, 6.9469213),
upper = c(NA, NA, 2.3955105, 3.2897459, 5.7672966, 11.7288609),
is.summary = c(rep(FALSE, 6)), zero = 1,
xlog=FALSE, boxsize=0.75, clip = c(0.9, 12))
dev.off()
This gives the forest plot (image not included here; I recommend some polishing before submitting for publishing).

How to create a proper dataset for boxplots

I'm having trouble creating a proper boxplot of my dataset. None of the solutions on this platform work for me because their datasets all look different, with variables plotted against each other.
So I want to ask: how do I need to format my dataset if it only contains 3 variables and their measured values in 3 columns? In the boxplot examples here, they plot one variable against another, but that is not the case here, right?
Using boxplot(data) gives me 3 boxplots. But I want to show the MEAN and also the population size on each boxplot. I don't know how to use the solutions, as they are all about ggplot2 or boxplot with variables against each other.
I know that this must be simple, but I think I'm plotting the boxplots in the wrong way, and that's why the solutions on this site don't work?
Data:
structure(list(Rest = c(3.479386607, 3.478445796, 2.52227462,
1.726115552, 3.917693859, 2.300840122), Peat = c(16.79515746,
22.76673699, 24.43289941, 15.64168939, 31.60459098, 16.2369787
), Top.culture = c(8.288, 8.732, 5.199, 6.539, 3.248, 10.156)), .Names = c("Rest",
"Peat", "Top.culture"), row.names = c(NA, 6L), class = "data.frame")
If text annotation is what is meant by 'show the mean and also the population size' then:
boxplot(dat)
text(1:3, 12.5, paste( "Mean= ",round(sapply(dat,mean, na.rm=TRUE), 2),
"\n N= ",
sapply(dat, function(x) length( x[!is.na(x)] ) )
) )
This used your more complex data-object from the other (duplicated) question.
dat <- structure(list(Rest = c(3.479386607, 3.478445796, 2.52227462, 1.726115552, 3.917693859, 2.300840122, 2.326307503, 2.344828287, 4.654278623, 3.68669447, 3.343706863, 0.712228306, 2.735897248, 1.936723375, 2.724260325, 2.069633651, 1.741484154, 2.304391217, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Peat = c(16.79515746, 22.76673699, 24.43289941, 15.64168939, 31.60459098, 16.2369787, 32.63285246, 35.91852324, 19.27802839, 21.78974576, 30.39119451, 35.4846573, 42.21807817, 42.00913743, 40.96996704, 19.85075354, 17.247096, 22.81689524, 43.35990368, 37.57273508, 23.76889902, 38.34604591, 20.98376674, 16.44173119, 17.27639888, NA, NA, NA, NA, NA, NA), Top.culture = c(8.288, 8.732, 5.199, 6.539, 3.248, 10.156, 3.436, 5.584, 4.483, 2.087, 3.28, 2.71, 2.196, 4.971, 4.475, 6.361, 5.49, 9.085, 3.52, 5.772, 9.308, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Rest", "Peat", "Top.culture" ), class = "data.frame", row.names = c(NA, -31L))
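As a variation on the annotation above, the sample size can go into the group labels and the mean can be drawn as a point at its actual height, so no fixed y position is needed. This is a sketch using the six-row data from the question; the label format and plotting symbols are arbitrary choices.

```r
# The six-row data from the question
dat <- data.frame(
  Rest = c(3.479386607, 3.478445796, 2.52227462,
           1.726115552, 3.917693859, 2.300840122),
  Peat = c(16.79515746, 22.76673699, 24.43289941,
           15.64168939, 31.60459098, 16.2369787),
  Top.culture = c(8.288, 8.732, 5.199, 6.539, 3.248, 10.156)
)

# Per-column mean and number of non-missing values
means <- sapply(dat, mean, na.rm = TRUE)
n     <- sapply(dat, function(x) sum(!is.na(x)))

# Put N into the axis labels and mark each mean on its own box
box_labels <- paste0(names(dat), " (N = ", n, ")")
boxplot(dat, names = box_labels)
points(seq_along(means), means, pch = 18, col = "red")
text(seq_along(means), means, labels = round(means, 2), pos = 4)
```

Because a wide data frame is passed directly, boxplot() draws one box per column, so no reshaping into a "variable against variable" layout is required.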
