I ran into a strange problem when trying to get the predicted means for a trajectory plot.
I used the results from the following 3-class model:
WK$TIME_25 <- (WK$TIME_QRT_R - 25) / 10  # TIME_QRT_R is the original time variable
m3a <- hlme(GFR_QRT_MEAN ~ poly(TIME_25, degree = 3, raw = TRUE) + GENDER + AGE_BL + HOSP + PROTEINURIA_bl,
            random = ~ 1,
            mixture = ~ poly(TIME_25, degree = 3, raw = TRUE),
            subject = "ID", data = WK, ng = 3, B = m1)
After I got the results, I tried to create a new dataset to draw the mean trajectory plot. My code is below:
datnew<-data.frame(TIME_QRT_R=seq(0,27,length=100))
datnew$TIME_25<-(datnew$TIME_QRT_R-25)/10
datnew$GENDER<-1
datnew$AGE_BL<-64
datnew$HOSP<-1 # 1 stands for hospitalization
datnew$PROTEINURIA_bl<-1 # 1 stands for the presence of renal damage
mon_p<-predictY(m3a, datnew, var.time='TIME_QRT_R', draws=T)
To my surprise, I got the following error message after running the 'predictY' statement above:
"Error: cannot subset columns that don't exist. x column 'TIME_25' doesn't exist"
I created 'TIME_25' in the 'datnew' data, and the hlme model also includes this variable. Why did I get this error? Could you let me know how to fix it? Thank you!
Sincerely,
Liang Feng
Hi, unfortunately I'm new to genetics.
I have data containing SNPs, and the outcome variable is disease severity (severe/mild). I need to perform the Cochran–Armitage test for trend to test the association between each SNP and disease severity, so that every SNP gets a p-value. I read about the test on Wikipedia and found that there is a function to perform it in R:
catt(y, x, score = c(0, 1, 2))
But I still can't grasp how to assign values to the x variable based on my data (for every SNP, should I take the CHROM and POS columns into account?). I know that I'm supposed to end up with a 2 x 3 table.
My data: [screenshot not included]
thank you :)
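A minimal sketch of how the 2 x 3 table could be built for a single SNP, assuming the genotype is coded as the number of minor alleles (0, 1, 2). The data below are made up, and the sketch uses base R's prop.trend.test() (a chi-squared test for trend in proportions, which as far as I know corresponds to the Cochran–Armitage test) rather than the catt() function quoted above. CHROM and POS presumably only identify which SNP a row belongs to and would not enter the table itself.

```r
# Hypothetical data for one SNP: genotype coded as the number of minor alleles
# (0, 1, 2) and severity as "mild"/"severe". These values are invented.
genotype <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0, 1, 2, 0)
severity <- c("mild", "mild", "mild", "severe", "mild", "severe", "severe",
              "mild", "severe", "severe", "severe", "mild", "mild", "mild",
              "severe", "mild")

# 2 x 3 table: rows = severity, columns = genotype
tab <- table(severity = factor(severity, levels = c("mild", "severe")),
             genotype = factor(genotype, levels = 0:2))

# Number of severe cases and total number of subjects per genotype
severe_counts <- as.vector(tab["severe", ])
totals <- as.vector(colSums(tab))

# Test for a linear trend in the proportion of severe cases across the
# ordered genotype scores 0, 1, 2
res <- prop.trend.test(x = severe_counts, n = totals, score = c(0, 1, 2))
res$p.value
```

To get one p-value per SNP, the same steps would be repeated for each SNP's genotype column.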
Good morning,
I'm currently trying to run a truncated regression in a loop over my dataset. Below is a reproducible example of my data frame.
library(plyr)
library(truncreg)
df <- data.frame("grid_id" = rep(c(1,2), 6),
"htcm" = rep(c(160,170,175), 4),
stringsAsFactors = FALSE)
View(df)
Now I tried to run a truncated regression on the variable "htcm", grouped by grid_id, to get only the coefficients (the intercept as well as sigma), which I then stored in a data frame. This code is based on the ideas of #hadley
reg <- dlply(df, "grid_id", function(.)
truncreg(htcm ~ 1, data = ., point = 160, direction = "left")
)
regcoef <- ldply(reg, coef)
While this code works for one of my three datasets, I get error messages for the other two. The datasets do not differ in any column, only in their absolute length
(length(df1) = 4,000; length(df2) = 100,000; length(df3) = 13,000)
The error message which occurs is
"Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of type vector, was 'NULL'
I do not even know how to reproduce an example where this error code occurs, because this code works totally fine with one of my three datasets.
I already accounted for missing values in both columns.
Does anyone have a guess as to what I can fix in this code?
Thanks!!
EDIT:
I think I found the origin of the error in my code. The problem is most likely that a truncated regression model estimates a standard deviation, which requires more than one observation in every group. Some of my groups contain only n = 1 observation, so their standard deviation equals zero, which apparently leaves the code with a NULL vector. How can I drop the groups with fewer than two observations within the regression code?
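One way to drop the small groups before fitting, sketched in base R on made-up data (the column names follow the example above; whether this removes the error in the real datasets is untested):

```r
# Hypothetical data: group 3 has a single observation
df <- data.frame(grid_id = c(1, 1, 1, 2, 2, 3),
                 htcm    = c(161, 170, 175, 165, 172, 168))

# ave() returns, for every row, the size of the group that row belongs to
group_size <- ave(df$htcm, df$grid_id, FUN = length)

# Keep only rows from groups with at least two observations
df_ok <- df[group_size >= 2, ]
table(df_ok$grid_id)
```

The filtered data frame df_ok could then be passed to dlply() in place of df.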
I am getting a weird problem with ezANOVA. When I try to execute code below it says that some data is missing, but when I look at the data, nothing is missing.
model_acc <- ezANOVA(data = nback_acc_summary[complete.cases(nback_acc_summary),],
dv = Stimulus1.ACC,
wid = Subject,
within = c(ExperimentName, Target),
between = Group,
type = 3,
detailed = T)
When I run these lines I get an error message that says:
Error in ezANOVA_main(data = data, dv = dv, wid = wid, within = within, :
One or more cells is missing data. Try using ezDesign() to check your data.
Then I run
ezDesign(nback_acc_summary)
And get the message:
Error in as.list(c(x, y, row, col)) :
argument "x" is missing, with no default
I am not sure what to change in the code, because I can't figure out what the problem is. I've researched the issue online, and it seems that quite a lot of users have encountered it before, but very few solutions have been posted. I would be grateful for any kind of help.
Thanks!
For an ANOVA model you must have observations in all conditions created by the design of your model.
For example, if ExperimentName, Target, and Group have two levels each, you have 2 x 2 x 2 = 8 conditions, each of which requires multiple observations. On top of that, your model is repeated measures, which means that each Subject within a level of your between factor Group must have observations for all of the within conditions (i.e., ExperimentName x Target = 2 x 2 = 4).
The first error suggests that some of the conditions implied by your model do not have enough data.
The following should produce a plot to help identify which conditions are missing data:
ezDesign(
data = nback_acc_summary[complete.cases(nback_acc_summary), ],
x = Target,
y = Subject,
row = ExperimentName,
col = Group
)
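As a complement to the plot, the empty cells can also be listed numerically. This is a rough base R sketch on invented data; the column names mirror the question, but the data frame d and its values are hypothetical:

```r
# Hypothetical long-format data: subject 3 lacks one within-subject cell
d <- expand.grid(Subject = 1:3,
                 ExperimentName = c("A", "B"),
                 Target = c("yes", "no"))
d$Group <- ifelse(d$Subject == 3, "patient", "control")
d <- d[!(d$Subject == 3 & d$ExperimentName == "B" & d$Target == "no"), ]

# Count rows per Subject x ExperimentName x Target cell
cells <- as.data.frame(table(Subject = d$Subject,
                             ExperimentName = d$ExperimentName,
                             Target = d$Target))

# Cells with zero rows are the combinations that have no data
missing_cells <- cells[cells$Freq == 0, ]
missing_cells
```

Any subject that shows up in missing_cells lacks at least one within-subject condition and would have to be completed or excluded.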
I can't seem to see where I am going wrong. I am using the package phylolm to run some regressions with phylogenetic data.
My model does not run and returns the error: Error in phyloglm(testVar ~ ...the number of rows in the data does not match the number of tips in the tree.
I have checked everything I can think of, and the species in my tree match those in my data.
My code is:
diet<-read.csv("dat.csv",h=T,dec = ".")
phy=read.nexus("ConsTree.tre")# the phylogenetic data
keep.spp<-levels(diet$ScientificName)
phylo<-drop.tip(phy,phy$tip.label[-match(keep.spp, phy$tip.label)])
setdiff(phylo$tip.label,diet$ScientificName)# this confirms that all is OK
t1<-phyloglm(testVar~Var1+Var2+Var3, diet, phylo)
t1<-phyloglm(testVar~Var1+Var2+Var3, diet, phylo,
method = c("logistic_MPLE","logistic_IG10","poisson_GEE"),
btol = 10, log.alpha.bound = 4,
start.beta=NULL, start.alpha=NULL,
boot = 0, full.matrix = TRUE)
#
Error in phyloglm(testVar~Var1+Var2+Var3,..the number of rows in the data does not match the number of tips in the tree.
Can anyone point out where I am going wrong?
Apologies for being blind to this small thing...
I was supposed to set the row names of my data to the species names:
row.names(diet)<-diet$ScientificName
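A small sanity check along these lines can confirm that the row names and the tip labels agree in both directions. This is only a sketch: the data are invented, and a plain character vector (tip_labels) stands in for phylo$tip.label so that the example runs without a tree file.

```r
# Hypothetical data; tip_labels stands in for phylo$tip.label
diet <- data.frame(ScientificName = c("Sp_a", "Sp_b", "Sp_c"),
                   testVar = c(0, 1, 1),
                   stringsAsFactors = FALSE)
tip_labels <- c("Sp_c", "Sp_a", "Sp_b")

# Row names must be the species names (this fails if names are duplicated)
row.names(diet) <- diet$ScientificName

# Check the match in both directions, not just tree -> data
in_tree_not_data <- setdiff(tip_labels, row.names(diet))
in_data_not_tree <- setdiff(row.names(diet), tip_labels)
stopifnot(length(in_tree_not_data) == 0, length(in_data_not_tree) == 0)

# Optionally reorder the data to follow the order of the tips
diet <- diet[tip_labels, , drop = FALSE]
row.names(diet)
```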
I am attempting to build a Partial Least Squares Path Model using 'plspm'. After reading through the tutorial and formatting my data I am getting hung up on an error:
"Error in if (w.dif < tol || itermax == iter) break : missing value where TRUE/FALSE needed".
I assume that this error is the result of missing values in some of the latent variables (e.g. Soil_Displaced has a lot of NAs because it was only measured in a subset of the replicates in the experiment). Is there a way to get around this error and work with variables that have a lot of missing values? I am attaching my code and dataset here, and the dataset can also be found in this Dropbox folder: https://www.dropbox.com/sh/51x08p4yf5qlbp5/-al2pwdCol
This is my code for now:
# inner model matrix
warming = c(0,0,0,0,0,0)
Treatment=c(0,0,0,0,0,0)
Soil_Displaced = c(1,1,0,0,0,0)
Mass_Lost_10mm = c(1,1,0,0,0,0)
Mass_Lost_01mm = c(1,1,0,0,0,0)
Daily_CO2 = c(1,1,0,1,0,0)
Path_inner = rbind(warming, Treatment, Soil_Displaced, Mass_Lost_10mm, Mass_Lost_01mm,Daily_CO2 )
innerplot(Path_inner)
# develop the outer model
Path_outter = list (3, 4:5, 6, 7, 8, 9)
# modes
#designates the model as a reflective model
Path_modes = rep("A", 6)
# Run it plspm(Data, inner matrix, outer list, modes)
Path_pls = plspm(data.2011, Path_inner, Path_outter, Path_modes)
Any input on this issue would be helpful. Thanks!
plspm has limited support for missing values; you have to set the scaling to numeric.
For your example the code looks as follows:
example_scaling = list(c("NUM"),
c("NUM", "NUM"),
c("NUM"),
c("NUM"),
c("NUM"),
c("NUM"))
Path_pls = plspm(data.2011, Path_inner, Path_outter, Path_modes, scaling = example_scaling)
But here's the limitation:
If your dataset contains even one observation where all indicators of a latent variable are missing, this won't work.
First case: the latent variable "Treatment", for example, has 2 indicators. If only one of them is NA in an observation, it works fine.
Second case: if there is even one observation where both indicators are NA, it won't work.
Since you're measuring the other 5 latent variables with just one indicator each, and you say your data contains lots of missing values, the second case will likely apply.
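To see whether the second case applies, the rows in which every indicator of a block is NA can be located before calling plspm. The following is a base R sketch on made-up data; the data frame dat, its column names, and the shortened blocks list are hypothetical stand-ins for data.2011 and Path_outter:

```r
# Hypothetical data with columns laid out like an outer model list
dat <- data.frame(id = 1:4, plot = 1:4,
                  warming = c(0, 1, 0, 1),
                  treat_a = c(1, NA, NA, 2),
                  treat_b = c(3, 4, NA, NA),
                  soil    = c(NA, 0.5, 0.7, 0.2))
blocks <- list(3, 4:5, 6)  # column indices per latent variable

# For each block, flag the rows in which every indicator is NA
all_na <- sapply(blocks, function(cols) {
  rowSums(!is.na(dat[, cols, drop = FALSE])) == 0
})
colnames(all_na) <- c("warming", "Treatment", "Soil_Displaced")

colSums(all_na)             # number of fully missing rows per block
which(rowSums(all_na) > 0)  # rows with at least one fully missing block
```

Rows flagged here are the ones that would need to be dropped or imputed before the numeric scaling approach can work.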
PLSPM will not work with missing values, so I had to interpolate some of the missing values from known observations. Once this was done, the code above worked great!