Error in data(aml): data set 'aml' not found (R)

library(readr)
aml <- read.csv("~/Documents/MH4315/aml.dat", sep="")
View(aml)
data(aml)
aml.km<-survfit(Surv(time,status)~x, data = aml)
plot(aml.km, main = "Estimated survival function of the two groups",
     lty = c(1, 2))
When I run my code, it pops up the error "In data(aml): data set 'aml' not found".
I am not sure what is wrong with my code. Also, is the code for my aml.km object correct?

I'm not sure what you expected from the data(aml) call. The data() function is used to load datasets that ship with installed packages. There is an aml dataset in both the survival package and the boot package, and I think they are identical: at least they have the same number of rows and columns, and similar descriptions in their help pages. Since I have the survival package loaded at the moment, when I do this:
data(aml)
Nothing happens at the console but in this case no-news-is-good-news. The dataset is in the workspace.
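For comparison, loading the packaged copy looks like this (a minimal sketch, assuming the survival package is installed):

```r
library(survival)   # ships with base R as a recommended package
data(aml)           # silently places the packaged 'aml' in the workspace
str(aml)            # 23 obs. of 3 variables: time, status, x
```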
You, on the other hand, have read in an aml object from your local disk, so we cannot really be sure what it contains. When I run your survfit call I get no error:
aml.km<-survfit(Surv(time,status)~x, data = aml)
> aml.km
Call: survfit(formula = Surv(time, status) ~ x, data = aml)
                 n events median 0.95LCL 0.95UCL
x=Maintained    11      7     31      18      NA
x=Nonmaintained 12     11     23       8      NA
#----------------------
png();plot(aml.km); dev.off()
#RStudioGD
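If the two line types need labelling, a legend can be added; a minimal sketch using the packaged dataset:

```r
library(survival)
data(aml)
aml.km <- survfit(Surv(time, status) ~ x, data = aml)
plot(aml.km, lty = c(1, 2),
     main = "Estimated survival function of the two groups")
# label the curves so the line types are distinguishable
legend("topright", legend = names(aml.km$strata), lty = c(1, 2))
```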

Related

De-identifying survival or flexsurvreg objects in R

Please consider the following:
I need to provide some R code to analyse data with the flexsurv package. I am not allowed to receive or analyse the data directly or on-site. I am, however, allowed to receive the analysis results.
Problem
When we run the flexsurvreg() function on some data (here ovarian from the flexsurv package), the created object (here fitw) contains enough information to "re-create" or "back-engineer" the actual data. But then I would technically have access to the data I am not allowed to have.
# Load package
library("flexsurv")
#> Loading required package: survival
# Run flexsurvreg with data = ovarian
fitw <- flexsurvreg(formula = Surv(futime, fustat) ~ factor(rx) + age,
                    data = ovarian, dist = "weibull")
# Look at first observation in ovarian
ovarian[1, ]
#> futime fustat age resid.ds rx ecog.ps
#> 1 59 1 72.3315 2 1 1
# With the following from the survival object, the data could be re-created
fitw$data$Y[1, ]
#> time status start stop time1 time2
#> 59 1 0 59 59 Inf
fitw$data$m[1, ]
#> Surv(futime, fustat) factor(rx) age (weights)
#> 1 59 1 72.3315 1
Potential solution
We could write the code so that it also sets to NA all the components that might be used for this back-engineering, as follows:
# Setting all survival object observation to NA
fitw$data$Y <- NA
fitw$data$m <- NA
fitw$data$mml$scale <- NA
fitw$data$mml$rate <- NA
fitw$data$mml$mu <- NA
Created on 2021-08-27 by the reprex package (v2.0.0)
Question
If I proceed as the above and set all these parameters to NA, could I then receive the fitw object (e.g. as an .RDS file) without ever being able to "back-engineer" the original data? Or is there any other way to share fitw without the attached data?
Thanks!
Setting fitw$data <- NULL will remove all the individual-level data from the fitted model object. Some of the output functions may not work on objects stripped of their data, however. In the current development version on GitHub, printing the model object should work. The summary and predict methods should also work, as long as covariate values are supplied in newdata; omitting them won't work, since the default is to take the covariate values from the observed data.
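A rough sketch of that workflow (the file name is made up, and whether summary() works on the stripped object depends on the flexsurv version, per the caveats above):

```r
library(flexsurv)
fitw <- flexsurvreg(Surv(futime, fustat) ~ factor(rx) + age,
                    data = ovarian, dist = "weibull")
fitw$data <- NULL                   # strip all individual-level data
saveRDS(fitw, "fitw_stripped.rds")  # share this file instead of the data
# the recipient must then supply covariate values explicitly, e.g.
# summary(fitw, newdata = data.frame(rx = 1, age = 55))
```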

Error "$ operator is invalid for atomic vectors" despite not using atomic vectors or $

Hello fellow Stackers! This is my first question, so I am curious whether you can help me. :)
First: I checked similar questions and unfortunately none of the solutions worked for me; I have been trying for nearly three days. Since I am working with sensitive data I cannot provide the original table for a reprex, unfortunately. However, I will create a small substitute example table for testing.
To get to the problem:
I want to predict a norm value using the package cNORM. It requires raw data, classification data, a model, min/max values and some other, less important things. The problem: whatever I do, and whatever data type and working directory I use, I get the error "$ operator is invalid for atomic vectors". To fix that I converted the original .sav file to a data frame, but nothing changed. I tested the type of the data and it said data frame, not atomic vector. I also tried using [1] for position and ["Correct"] for names, but the same error showed up. The same happened with two separate data frames, and with lists. I even tried $ to see whether I would get a different error, but no. I even used another workspace to check whether the old workspace was bugged.
So maybe I just did really stupid mistakes but I really tried and it did not work out so I am asking you, what the solution might be. Here is some data to test! :)
install.packages("haven")
library(haven)
install.packages("cNORM")
library(cNORM)
SpecificNormValue <- predictNorm(Data_4[1], Data_4[2], model = T,
                                 minNorm = 12, maxNorm = 75,
                                 force = FALSE, covariate = NULL)
So that is one of the commands I used on the Dataframe "Data_4". I also tried not using brackets or using "xxx" to get the column names but to no avail.
The following is the example data frame. To test more realistically I would recommend an Excel file with 2 columns and 900 rows (plus column titles), like the original. The "Correct" values can be randomly generated in Excel and range from 35 to 50; the ages range from 6 to 12.
Correct  Age
     40    6
     45    7
     50    6
     35    6
I really hope someone of you can figure out the problem and how I get the command properly running. I really have no other idea right now.
Thanks for checking my question and thanks in advance for your time! I would be glad to hear from you!
The source of that error isn't your data, it's the third argument to predictNorm: model = T. According to the predictNorm documentation, this is supposed to be a "regression model or a cnorm object". Instead you are passing in a logical value (T = TRUE) which is an atomic vector and causes this error when predictNorm tries to access the components of the model with $.
I don't know enough about your problem to say what kind of model you need to use to get the answer you want, but for example passing it an object constructed by cnorm() returns without an error using your data and parameters (there are some warnings because of the small size of your test dataset):
library(haven)
library(cNORM)
#> Good morning star-shine, cNORM says 'Hello!'
Data_4 <- data.frame(correct = c(40, 45, 50, 35),
                     age = c(6, 7, 6, 6))
SpecificNormValue <- predictNorm(Data_4$correct,
                                 Data_4$age,
                                 model = cnorm(Data_4$correct, Data_4$age),
                                 minNorm = 12,
                                 maxNorm = 75,
                                 force = FALSE,
                                 covariate = NULL)
#> Warning in rankByGroup(raw = raw, group = group, scale = scale, weights =
#> weights, : The dataset includes cases, whose percentile depends on less than
#> 30 cases (minimum is 1). Please check the distribution of the cases over the
#> grouping variable. The confidence of the norm scores is low in that part of the
#> scale. Consider redividing the cases over the grouping variable. In cases of
#> disorganized percentile curves after modelling, it might help to reduce the 'k'
#> parameter.
#> Multiple R2 between raw score and explanatory variable: R2 = 0.0667
#> Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
#> force.in, : 21 linear dependencies found
#> Reordering variables and trying again:
#> Warning in log(vr): NaNs produced
#> Warning in log(vr): NaNs produced
#> Specified R2 falls below the value of the most primitive model. Falling back to model 1.
#> R-Square Adj. = 0.993999
#> Final regression model: raw ~ L4A3
#> Regression function: raw ~ 30.89167234 + (6.824413606e-09*L4A3)
#> Raw Score RMSE = 0.35358
#>
#> Use 'printSubset(model)' to get detailed information on the different solutions, 'plotPercentiles(model) to display percentile plot, plotSubset(model)' to inspect model fit.
Created on 2020-12-08 by the reprex package (v0.3.0)
Note that I used Data_4$correct and Data_4$age for the first two arguments. Data_4[,1] and Data_4[[1]] also work, but Data_4[1] doesn't, because it returns a one-column data frame rather than the vector predictNorm expects.
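The distinction is easy to check in base R: single brackets on a data frame return another data frame, while double brackets (or $) return the underlying column vector:

```r
df <- data.frame(correct = c(40, 45, 50, 35),
                 age     = c(6, 7, 6, 6))
class(df[1])                     # "data.frame": a one-column data frame
class(df[[1]])                   # "numeric": the column itself
identical(df[[1]], df$correct)   # TRUE
```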

adonis: Error right-hand-side of formula has no usable terms

I have this chao distance matrix based on all fungi abundances:
          CR10      CR11      CR13      CR14      CR17      CR18      CR19
CR11 0.4531840
CR13 0.4288178 0.4624915
CR14 0.5903908 0.5466617 0.4942469
CR17 0.4784990 0.3387325 0.6136265 0.5779121
CR18 0.7649840 0.7537409 0.7526077 0.5632825 0.4153391
CR19 0.3772907 0.4579895 0.3208187 0.3706775 0.5644193 0.7380274
CR20 0.4598706 0.5529427 0.6424340 0.6690386 0.3855154 0.5509150 0.6406800
and the table with 33 environmental variables for the same plots.
When I run:
fungAbundAdonis <- lapply(colnames(env2), function(x) {
  form <- as.formula(paste("OTU.table2", x, sep = "~"))
  z <- adonis(form, data = env2, permutations = 999)
  return(data.frame(env = rownames(z$aov.tab), Rsq = z$aov.tab$R2, P = z$aov.tab$P))
})
I get this error:
Error in adonis(form, data = env2, permutations = 999) :
right-hand-side of formula has no usable terms.
I don't understand why, because when I use the same script with the distance matrix for plots 1 to 9 plus 12, 15 and 16, and the environmental table for those plots, it works fine. Does anybody know what the source of the error could be?
Thanks!
Your question has no reproducible example, so I have to guess. However, I can reproduce your error message when a variable on the right-hand side is constant. This can happen when you subset env2 and, in that selected subset, a variable takes only one value. (This only concerns vegan 2.5-x, the release version; vegan 2.6-0 will not give an error message.)
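One way to hunt for the offending column is to flag variables that are constant within the subset. The column names below are made up for illustration:

```r
# toy environmental table: 'moisture' is constant in this subset
env2 <- data.frame(pH       = c(5.5, 6.1, 5.9),
                   moisture = c(0.3, 0.3, 0.3))
constant <- vapply(env2, function(v) length(unique(na.omit(v))) < 2,
                   logical(1))
names(env2)[constant]   # columns that would yield "no usable terms"
```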

Error in data frame `*tmp*`: replacement has x rows, data has y

I'm a beginner in R. Here is a very simple code where I'm trying to save the residual term:
# Create variables for child's EA:
dat$cldeacdi <- rowMeans(dat[,c('cdcresp', 'cdcinv')],na.rm=T)
dat$cldeacu <- rowMeans(dat[,c('cucresp', 'cucinv')],na.rm=T)
# Create a residual score for child EA:
dat$cldearesid <- resid(lm(cldeacu ~ cldeacdi, data = dat))
I'm getting the following message:
Error in `$<-.data.frame`(`*tmp*`, cldearesid, value = c(-0.18608488908881, :
replacement has 366 rows, data has 367
I searched for this error but couldn't find anything that could resolve this. Additionally, I've created the exact same code for mom's EA, and it saved the residual just fine, with no errors. I'd be grateful if someone could help me resolve this.
I have a feeling you have NAs in your data. Look at this example:
#mtcars data set
test <- mtcars
#adding just one NA in the cyl column
test[2, 2] <- NA
#running linear model and adding the residuals to the data.frame
test$residuals <- resid(lm(mpg ~ cyl, test))
Error in `$<-.data.frame`(`*tmp*`, "residuals", value = c(0.382245430809409, :
replacement has 31 rows, data has 32
As you can see this results in a similar error to yours.
As a validation:
length(resid(lm(mpg ~ cyl, test)))
#31
nrow(test)
#32
This happens because lm runs na.omit on the data set before fitting the regression, so any rows containing NA are dropped, leaving fewer residuals than rows in your data.
If you run na.omit on your dat data set (i.e. dat <- na.omit(dat)) at the very beginning of your code, then your code should work.
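If dropping the incomplete rows is undesirable, na.exclude keeps the row count intact by padding the residuals with NA; a sketch on the same mtcars example:

```r
test <- mtcars
test[2, 2] <- NA                          # one NA in the 'cyl' column
fit <- lm(mpg ~ cyl, data = test, na.action = na.exclude)
test$residuals <- resid(fit)              # length 32, with NA in row 2
```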
This is an old thread, but maybe this can help someone else facing the same issue. To LyzandeR's point, check for NAs as a first line of defense. In addition, make sure you don't have any factors among your predictors, as this can also cause the error.

Removing character level outlier in R

I have a linear model, model1 <- lm(divorce_rate ~ marriage_rate + median_age + population), for which the leverage plot shows an outlier at observation 28 (the State variable id for "Nevada"). I'd like to fit the model without Nevada in the dataset. I tried the following but got stuck.
data<-read.dta("census.dta")
attach(data)
data1<-data.frame(pop,divorce,marriage,popurban,medage,divrate,marrate)
attach(data1)
model1<-lm(divrate~marrate+medage+pop,data=data1)
summary(model1)
layout(matrix(1:4,2,2))
plot(model1)
dfbetaPlots(lm(divrate~marrate+medage+pop),id.n=50)
vif(model1)
dataNV<-data[!data$state == "Nevada",]
attach(dataNV)
model3<-lm(divrate~marrate+medage+pop,data=dataNV)
The last line of the above code gives me
Error in model.frame.default(formula = divrate ~ marrate + medage + pop, :
variable lengths differ (found for 'medage')
I suspect that you have some glitch in your code such that you have attach()ed copies still lying around in your environment; that's why it's really best practice not to use attach(). The following code works for me:
library(foreign)
## best not to call data 'data'
mydata <- read.dta("http://www.stata-press.com/data/r8/census.dta")
I didn't find divrate or marrate in the data set: I'm going to speculate that you want the per capita rates:
## best practice to use a new name rather than transforming 'in place'
mydata2 <- transform(mydata,marrate=marriage/pop,divrate=divorce/pop)
model1 <- lm(divrate~marrate+medage+pop,data=mydata2)
library(car)
plot(model1)
dfbetaPlots(model1)
This works fine for me in a clean session:
dataNV <- subset(mydata2,state != "Nevada")
## update() may be nice to avoid repeating details of the
## model specification (not really necessary in this case)
model3 <- update(model1,data=dataNV)
Or you can use the subset argument:
model4 <- update(model1,subset=(state != "Nevada"))
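As a toy illustration of update() with subset on a built-in dataset (the excluded row is arbitrary):

```r
fit  <- lm(mpg ~ wt, data = mtcars)
# refit without one row, analogous to dropping Nevada
fit2 <- update(fit, subset = rownames(mtcars) != "Chrysler Imperial")
nobs(fit)    # 32
nobs(fit2)   # 31
```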
