I'm trying to apply a CAP function to chemical data collected in different years.
I have a data archive:
head(ISPA_data)
SrCa MgCa MnCa RbCa CuCa ZnCa BaCa PbCa NaCa LiCa CoCa NiCa
1 5178 25.101 9.334 0.166 4.869 8.379 34.846 0.194 5464 0.313 2.510 25.181
2 6017 22.922 7.185 0.166 4.685 8.720 24.659 0.154 4600 0.300 2.475 25.060
3 5628 26.232 6.248 0.179 4.628 10.157 23.942 0.166 5378 0.300 2.529 25.252
4 4769 35.598 7.683 0.131 4.370 8.735 50.068 0.180 5938 0.568 2.159 21.645
5 5330 28.284 6.828 0.130 5.370 12.742 34.257 0.220 5614 0.397 2.275 23.852
6 5786 24.603 4.797 0.156 5.317 13.331 66.896 0.117 5001 0.423 2.298 24.361
and an environmental dataset:
head(ISPA.env)
Year OM Code Location
<dbl> <chr> <chr> <chr>
1 1975 0.04349 CSP75_25 CSP
2 1975 0.0433 CSP75_28 CSP
3 1975 0.04553 CSP75_31 CSP
4 1975 0.0439 CSP75_33 CSP
5 1975 0.02998 CSP75_37 CSP
6 1975 0.0246 CSP75_39 CSP
When performing CAPdiscrim,
Ordination.model1 <- CAPdiscrim(ISPA_data~Year,
ISPA.env,
dist="euclidean",
axes=4,
m=0,
add=FALSE,
permutations=999)
this error occurs:
Error in eval(predvars, data, env) :
numeric 'envir' arg not of length one
In addition: Warning message:
In cmdscale(distmatrix, k = nrow(x) - 1, eig = T, add = add) :
only 13 of the first 19 eigenvalues are > 0
Both datasets have the same number of rows.
Can anyone help me? Thanks!
This is my data. Daily return data for different sectors.
I would like to compute the 3 month rolling correlation between sectors but keep the date field and have it line up.
> head(data)
Date Communication Services Consumer Discretionary Consumer Staples Energy Financials - AREITs Financials - ex AREITs Health Care
1 2003-01-02 -0.0004 0.0016 0.0033 0.0007 0.0073 0.0006 0.0370
2 2003-01-03 -0.0126 -0.0008 0.0057 -0.0019 0.0016 0.0062 0.0166
3 2003-01-06 0.0076 0.0058 -0.0051 0.0044 0.0063 0.0037 -0.0082
4 2003-01-07 -0.0152 0.0052 -0.0024 -0.0042 -0.0037 -0.0014 0.0027
5 2003-01-08 0.0107 0.0017 0.0047 -0.0057 0.0013 -0.0008 -0.0003
6 2003-01-09 -0.0157 0.0019 -0.0020 0.0009 -0.0016 -0.0012 0.0055
My data type is this
$ Date : Date[1:5241], format: "2003-01-02" "2003-01-03" "2003-01-06" "2003-01-07" ...
$ Communication Services : num [1:5241] -0.0004 -0.0126 0.0076 -0.0152 0.0107 -0.0157 0.0057 -0.0131 0.0044 0.0103 ...
$ Consumer Discretionary : num [1:5241] 0.0016 -0.0008 0.0058 0.0052 0.0017 0.0019 -0.0022 0.0057 -0.0028 0.0039 ...
$ Consumer Staples : num [1:5241] 0.0033 0.0057 -0.0051 -0.0024 0.0047 -0.002 0.0043 -0.0005 0.0163 0.004 ...
$ Energy : num [1:5241] 0.0007 -0.0019 0.0044 -0.0042 -0.0057 0.0009 0.0058 0.0167 -0.0026 -0.0043 ...
$ Financials - AREITs : num [1:5241] 0.0073 0.0016 0.0063 -0.0037 0.0013 -0.0016 0 0.0025 -0.0051 0.0026 ...
Currently what I am doing is this:
rollingcor <- rollapply(data, width=60, function(x) cor(x[,2],x[,3]),by=60, by.column=FALSE)
This works fine: it computes the rolling 60-day correlation and shifts the window by 60 days. However, it doesn't keep the date column, and I find it hard to match the results back to the dates.
The end goal is to produce a data frame in which the date advances every 3 months and the other columns are the correlations between all pairs of sectors in my data.
Please read the information at the top of the r tag and, in particular, provide the input in an easily reproducible manner using dput. In the absence of that, we will use the data shown below, based on the 6x2 BOD data frame that comes with R, with a width of 4. The names on the correlation columns are the row:column numbers in the correlation matrix. For example, compare the 4th row of the output below with cor(data[1:4, -1]).
fill=NA causes it to output the same number of rows as the input by filling with NA's.
library(zoo)
# test data
data <- cbind(Date = as.Date("2023-02-01") + 0:5, BOD, X = 1:6)
# given data frame x return lower triangular part of cor matrix
# Last 2 lines add row:column names.
Cor <- function(x) {
k <- cor(x)
lo <- lower.tri(k)
k.lo <- k[lo]
m <- which(lo, arr.ind = TRUE) # rows & cols of lower tri
setNames(k.lo, paste(m[, 1], m[, 2], sep = ":"))
}
cbind(data, rollapplyr(data[-1], 4, Cor, by.column = FALSE, fill = NA))
giving:
Date Time demand X 2:1 3:1 3:2
1 2023-02-01 1 8.3 1 NA NA NA
2 2023-02-02 2 10.3 2 NA NA NA
3 2023-02-03 3 19.0 3 NA NA NA
4 2023-02-04 4 16.0 4 0.8280576 1.0000000 0.8280576
5 2023-02-05 5 15.6 5 0.4604354 1.0000000 0.4604354
6 2023-02-06 7 19.8 6 0.2959666 0.9827076 0.1223522
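If you want one row per window rather than one row per input row (the by=60 behaviour in the question), a minimal base R sketch is shown below. It does not use zoo; the toy data, the column names A, B, C and the helper pair_cor are made up for illustration, and a width of 4 stands in for 60. Each output row carries the date on which its window ends.

```r
# Toy data (hypothetical stand-in for the sector returns)
set.seed(1)
data <- data.frame(Date = as.Date("2023-02-01") + 0:11,
                   A = rnorm(12), B = rnorm(12), C = rnorm(12))

width <- 4  # rows per window (60 in the question)
by    <- 4  # step between window ends (60 in the question)

# Row indices at which each window ends
ends <- seq(width, nrow(data), by = by)

# Lower-triangular correlations of one window, named by column pair
pair_cor <- function(x) {
  k  <- cor(x)
  lo <- which(lower.tri(k), arr.ind = TRUE)  # row/col index of each pair
  setNames(k[lo], paste(colnames(k)[lo[, 2]], colnames(k)[lo[, 1]], sep = ":"))
}

# One row of correlations per window, then attach the window-end dates
cors <- t(sapply(ends, function(e) pair_cor(data[(e - width + 1):e, -1])))
result <- data.frame(Date = data$Date[ends], cors, check.names = FALSE)
result
```

Naming the columns by column pair instead of by index makes the result easier to read when there are many sectors.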
I have a data frame whose variables were created with the unite function from tidyr. When I load the new data frame, all the variables are there as they should be. However, when I try to work with it further and compute an index using the mutate function, I get this error: ! object 'inverted_facing' not found. I tried renaming the variables, but the rename function does not find them either. All the online advice is about either forgetting to create the variables or misspelling them in the code, but that is not the case here: they are in the data frame and they are spelled correctly in the code. Why are they not found? Can anyone help with fixing this?
Data Frame:
> subj_means_index_Acc
# A tibble: 100 × 7
# Groups: subject, site, category [100]
subject site category `inverted_facing ` inverted_nonFacing `upright _facing ` `upright _nonFacing`
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 P01 EBA "chairs " 0.875 0.969 0.969 1
2 P01 EBA "targets" 0.656 0.75 0.906 1
3 P01 OPA "chairs " 0.969 1 0.906 0.969
4 P01 OPA "targets" 0.469 0.688 0.906 0.969
5 P02 EBA "chairs " 0.906 0.875 0.938 0.906
6 P02 EBA "targets" 0.812 0.812 0.906 0.938
7 P02 OPA "chairs " 0.938 0.781 0.875 0.938
8 P02 OPA "targets" 0.781 0.906 0.875 0.906
9 P03 EBA "chairs " 0.719 0.938 0.906 0.781
10 P03 EBA "targets" 0.938 0.844 0.969 0.938
# … with 90 more rows
Code:
subj_means_index_Acc <- subjmeans_condition %>%
select(subject, site, category, orientation, direction, mAcc) %>%
unite("condition", orientation, direction) %>%
spread(condition, mAcc)
subj_means_index_Acc <- subj_means_index_Acc %>%
mutate(inv_eff_facing = (1-inverted_facing) - (1-upright_facing),
inv_eff_nonfacing =(1-inverted_nonFacing) - (1-upright_nonFacing)) %>%
mutate(inv_eff_fac_nf = inv_eff_facing - inv_eff_nonfacing)
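One thing the printout above does show: several column names are wrapped in backticks and contain spaces (for example `inverted_facing ` and `upright _facing `), and the category values are padded too ("chairs "). So inverted_facing without the trailing space is a different name. A minimal sketch of one way to clean this up, using a made-up two-row miniature of the table and base R only:

```r
# Hypothetical miniature of the spread() output, with the same padded names
df <- data.frame(subject = c("P01", "P02"),
                 `inverted_facing ` = c(0.875, 0.906),
                 `upright _facing ` = c(0.969, 0.938),
                 check.names = FALSE)

# Remove all whitespace from the column names
names(df) <- gsub("\\s+", "", names(df))

# The index can now be computed with the names used in the question
df$inv_eff_facing <- (1 - df$inverted_facing) - (1 - df$upright_facing)
df
```

It may be cleaner to trim the orientation and direction values (for example with trimws) before calling unite, so the padded names are never created.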
I have datasheets with multiple measurements that look like the following:
FILE DATE TIME LOC QUAD LAI SEL DIFN MTA SEM SMP
20 20210805 08:38:32 H 1161 2.80 0.68 0.145 49. 8. 4
ANGLES 7.000 23.00 38.00 53.00 68.00
CNTCT# 1.969 1.517 0.981 1.579 1.386
STDDEV 1.632 1.051 0.596 0.904 0.379
DISTS 1.008 1.087 1.270 1.662 2.670
GAPS 0.137 0.192 0.288 0.073 0.025
A 1 08:38:40 31.66 33.63 34.59 39.13 55.86
1 2 08:38:40 -5.0e-006
B 3 08:38:48 25.74 20.71 15.03 2.584 1.716
B 4 08:38:55 0.344 1.107 2.730 0.285 0.265
B 5 08:39:02 3.211 5.105 13.01 4.828 1.943
B 6 08:39:10 8.423 22.91 48.77 16.34 3.572
B 7 08:39:19 12.58 14.90 18.34 18.26 4.125
I would like to read the entire datasheet and extract the values for 'QUAD' and 'LAI' only. For example, for the data above I would only be extracting a QUAD of 1161 and an LAI of 2.80.
In the past the datasheets were formatted as long data, and I was able to use the following code:
library(stringr)
QUAD <- as.numeric(str_trim(str_extract(data, "(?m)(?<=^QUAD).*$")))
LAI <- as.numeric(str_trim(str_extract(data, "(?m)(?<=^LAI).*$")))
data_extract <- data.frame(
QUAD = QUAD[!is.na(QUAD)],
LAI = LAI[!is.na(LAI)]
)
data_extract
Unfortunately, this does not work because of the wide formatting in the current datasheet. Any help would be hugely appreciated. Thanks in advance for your time.
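A possible approach for the wide layout, sketched in base R under two assumptions: every record has a header line starting with FILE that is immediately followed by its line of values, and no field (LOC, for instance) contains spaces. The sheet vector below is a shortened copy of the sample; in practice it would come from readLines on the file.

```r
# Shortened copy of the sample datasheet, one element per line
sheet <- c(
  "FILE DATE TIME LOC QUAD LAI SEL DIFN MTA SEM SMP",
  "20 20210805 08:38:32 H 1161 2.80 0.68 0.145 49. 8. 4",
  "ANGLES 7.000 23.00 38.00 53.00 68.00",
  "CNTCT# 1.969 1.517 0.981 1.579 1.386"
)

# Locate each header line; the values are assumed to sit on the next line
hdr <- grep("^FILE\\s", sheet)
fields <- strsplit(trimws(sheet[hdr]), "\\s+")
values <- strsplit(trimws(sheet[hdr + 1]), "\\s+")

# Pick QUAD and LAI by matching field names to positions
data_extract <- do.call(rbind, Map(function(f, v) {
  data.frame(QUAD = as.numeric(v[f == "QUAD"]),
             LAI  = as.numeric(v[f == "LAI"]))
}, fields, values))
data_extract
```

Matching by field name rather than by fixed position keeps the code working if the column order changes between sheets.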
My dataset is very similar to the 'Melanoma' dataset included in the riskRegression package: 3307 patients, 502 events of interest (fracture), 264 deaths (competing risk). Time is the number of years after bone examination (DXA), and status is coded as 0=censored, 1=fracture, 2=death.
I am trying to fit a Fine-Gray model with an interaction, but when I introduce an interaction term of the form var1 * var2, I receive this error message:
Error in design[pos, , drop = FALSE] : subscript out of bounds
Here is my code :
fgr<-FGR(Hist(time,status)~age+htot_bmd+tot_bmd+amof+PR+atcdtfam+AlcFR+PR+BMI3C+malchronFR+malchronFR*BMI3C+atcdtfam*PR,data=df2,cause=1)
I tried the code provided in the paper of Zhongheng et al. "Model validation for competing risks data" with the data set 'Melanoma' introducing an interaction but the same error message appears.
Is it possible to include an interaction with FGR, and if so, how?
Thanks
You can do this with your data using the following code:
> library(riskRegression)
> library(survival)
> library(prodlim)
> library(cmprsk)
> library(readxl)
> df2 <- read_xlsx("/Users/zhang/Downloads/df2.xlsx")
New names:
* `` -> ...1
> df2
# A tibble: 300 x 14
...1 neck_bmd htot_bmd tot_bmd age AlcFR PR atcdtfam malchronFR amof BMI3C time event
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 0.960 0.953 1.04 79.1 0 0 0 0 2 3 9.00 Cen
2 2 0.612 0.620 0.988 79.2 0 0 0 0 0 3 4.76 MOF
3 3 0.880 0.990 0.827 78.6 0 0 0 1 1 2 9.14 Cen
4 4 0.869 0.905 0.866 79.0 0 0 0 0 0 2 9.11 Cen
5 5 0.863 0.991 1.17 79.0 1 0 1 0 0 2 10.2 Cen
6 6 0.722 0.902 0.842 78.8 0 0 0 0 0 2 9.09 Cen
7 7 0.853 0.929 1.33 76.9 0 0 0 0 0 3 10.1 Cen
8 8 0.830 0.912 0.947 77.0 0 0 0 1 0 2 8.13 Cen
9 9 0.872 0.968 1.22 77.2 1 0 0 0 0 2 8.12 Cen
10 10 0.639 0.776 0.822 76.7 0 0 0 1 0 2 8.12 Cen
# … with 290 more rows, and 1 more variable: status <dbl>
> modMatrix <- model.matrix(~age+htot_bmd+tot_bmd+amof+atcdtfam+
+ AlcFR+PR+BMI3C*malchronFR+neck_bmd,df2)[,-1]
> dtInteraction <- cbind(data.frame( modMatrix),
+ status=df2$status, time=df2$time)
> fgr.Interaction<- FGR(as.formula(paste("Hist(time,status)~",
+ paste(names(dtInteraction[1:11]),collapse = "+"))),
+ data = dtInteraction,cause = 1)
> score.cv<-riskRegression::Score(list("Fine-Gray"= fgr.Interaction),
+ formula = Hist(time,status)~1,
+ data=dtInteraction,times = sort(unique(dtInteraction$time))[25:200],
+ cens.method="jackknife",
+ se.fit=1L,plots="calibration")
> plotCalibration(score.cv,times = df2$time[11],
+ cens.method="local")
You can use the model.matrix function as follows. The crr() function can handle the interaction.
> library(riskRegression)
> library(survival)
> library(prodlim)
> library(cmprsk)
> data(Melanoma)
> Melanoma$id<-1:nrow(Melanoma)
> set.seed(123)
> ind.split<-sample(1:nrow(Melanoma),
+ round(nrow(Melanoma)*4/5),
+ replace = F)
> dftrain<-Melanoma[ind.split,]
> dftest<-Melanoma[-ind.split,]
> fgr.full<-FGR(Hist(time,status)~age+thick+ici+
+ epicel+ulcer+sex+invasion,
+ data=dftrain,cause=1)
> modMatrix <- model.matrix(~thick+ici+
+ epicel+ulcer*age+invasion,dftrain)[,-1]
>
> fgrMod <- crr(ftime = dftrain$time,
+ fstatus = dftrain$status,
+ cov1 = modMatrix,failcode=2)
> summary(fgrMod)
Competing Risks Regression
Call:
crr(ftime = dftrain$time, fstatus = dftrain$status, cov1 = modMatrix,
failcode = 2)
coef exp(coef) se(coef) z p-value
thick 0.1194 1.127 0.1292 0.924 0.3600
ici1 -0.7607 0.467 1.0721 -0.710 0.4800
ici2 -0.8531 0.426 0.9379 -0.910 0.3600
ici3 -0.1924 0.825 1.0895 -0.177 0.8600
epicelpresent 0.8973 2.453 0.8434 1.064 0.2900
ulcerpresent -0.7101 0.492 1.9776 -0.359 0.7200
age 0.0627 1.065 0.0227 2.766 0.0057
invasionlevel.1 -1.2031 0.300 0.7068 -1.702 0.0890
invasionlevel.2 -2.0365 0.130 1.4121 -1.442 0.1500
ulcerpresent:age 0.0152 1.015 0.0320 0.473 0.6400
exp(coef) exp(-coef) 2.5% 97.5%
thick 1.127 0.887 0.87475 1.45
ici1 0.467 2.140 0.05716 3.82
ici2 0.426 2.347 0.06780 2.68
ici3 0.825 1.212 0.09752 6.98
epicelpresent 2.453 0.408 0.46968 12.81
ulcerpresent 0.492 2.034 0.01019 23.71
age 1.065 0.939 1.01844 1.11
invasionlevel.1 0.300 3.330 0.07515 1.20
invasionlevel.2 0.130 7.664 0.00819 2.08
ulcerpresent:age 1.015 0.985 0.95348 1.08
Num. cases = 164
Pseudo Log-likelihood = -52.3
Pseudo likelihood ratio test = 21.1 on 10 df,
Then you can try the following code:
library(riskRegression)
library(survival)
library(prodlim)
library(cmprsk)
data(Melanoma)
Melanoma$id<-1:nrow(Melanoma)
set.seed(123)
ind.split<-sample(1:nrow(Melanoma),
round(nrow(Melanoma)*4/5),
replace = F)
dftrain<-Melanoma[ind.split,]
dftest<-Melanoma[-ind.split,]
fgr.NoInteraction<-FGR(Hist(time,status)~age+thick+ici+
epicel+ulcer+sex+invasion,
data=dftrain,cause=1)
modMatrix <- model.matrix(~thick+ici+
epicel+ulcer*age+invasion,dftrain)[,-1]
dtInteraction <- cbind(data.frame( modMatrix),status=dftrain$status,
time=dftrain$time)
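# Use every model-matrix column (10 here), so the ulcer:age interaction column is included
fgr.Interaction<- FGR(as.formula(paste("Hist(time,status)~",paste(names(dtInteraction)[seq_len(ncol(modMatrix))],collapse = "+"))),
data = dtInteraction,cause = 1)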
score.cv<-riskRegression::Score(list("Fine-Gray"= fgr.Interaction),
formula = Hist(time,status)~1,
data=dtInteraction,times = sort(unique(dtInteraction$time)),
cens.method="jackknife",
se.fit=1L,plots="calibration")
plotCalibration(score.cv,times = 3330,cens.method="local")
The interaction term still has to be built with model.matrix. However, the Score function needs an object from FGR as input, so the model is refitted with FGR on the expanded columns. The other figures in the paper can be produced with similar tricks.
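To see what the model.matrix step actually produces, here is a small base R sketch on a made-up four-row data frame (the values are illustrative, not taken from Melanoma). It shows how ulcer*age is expanded into separate columns, and how data.frame() renames the interaction column so that it can be pasted into a formula.

```r
# Made-up data: one numeric and one two-level factor covariate
df <- data.frame(age   = c(50, 60, 70, 80),
                 ulcer = factor(c("absent", "present", "absent", "present")))

# Expand main effects and interaction; [, -1] drops the intercept column
mm <- model.matrix(~ ulcer * age, df)[, -1]
colnames(mm)   # "ulcerpresent" "age" "ulcerpresent:age"

# data.frame() makes the names syntactic: "ulcerpresent:age" -> "ulcerpresent.age"
safe <- data.frame(mm)
names(safe)

# Build the formula from all expanded columns, interaction included
# (as.formula only parses the text, so prodlim is not needed for this step)
f <- as.formula(paste("Hist(time, status) ~", paste(names(safe), collapse = " + ")))
f
```

Taking the names from the expanded data frame, rather than hard-coding an index range, avoids accidentally dropping the last column, which is where the interaction ends up.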
I am new to coding and to R. Currently working with package relsurv. For this I would like to calculate the relative survival at certain timepoints.
I am using the following to assess RS at five years:
rcurve2 <- rs.surv(Surv(time_days17/365.241,event_17)~1+
ratetable(age = age_diagnosis*365.241, sex = sex,
year = year_diagnosis_days), data = survdata, ratetable = swepop,
method="ederer1",conf.int=0.95,type="kaplan-meier",
add.times = 5*365.241)
summary(rcurve2)
However, I get the same result in my summary output regardless of what number I put after add.times, i.e. only the event/censoring time points are listed (see below):
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0.205 177 1 0.9944 0.00562 0.9834 1.005
0.627 176 1 0.9888 0.00792 0.9734 1.004
0.742 175 1 0.9831 0.00968 0.9644 1.002
0.827 174 1 0.9775 0.01114 0.9559 1.000
0.849 173 1 0.9718 0.01242 0.9478 0.996
0.947 172 1 0.9662 0.01356 0.9400 0.993
...cont.
I am clearly not getting it right! Would be grateful for your help!
A very good question!
When you add "imaginary" times using add.times, they are automatically treated as censored, so they won't show up in the default summary() output. To see your added times, set censored = TRUE:
summary(rcurve2, censored = TRUE)
You should now find your added time in the list that follows.
EXAMPLE
Using built in data with the relsurv package
data(slopop)
data(rdata)
#note the last argument add.times=1000
rcurve2 <- rs.surv(Surv(time,cens)~sex+ratetable(age=age*365.241, sex=sex,
year=year), ratetable=slopop, data=rdata, add.times = 1000)
When using summary(rcurve2), the time 1000 won't show up:
>summary(rcurve2)
[...]
973 200 1 0.792 0.03081 0.734 0.855
994 199 1 0.790 0.03103 0.732 0.854
1002 198 1 0.783 0.03183 0.723 0.848
[...]
BUT using summary(rcurve2, censored=TRUE) it will!
>summary(rcurve2, censored=TRUE)
[...]
973 200 1 0.792 0.03081 0.734 0.855
994 199 1 0.790 0.03103 0.732 0.854
1000 198 0 0.791 0.03106 0.732 0.854
1002 198 1 0.783 0.03183 0.723 0.848
[...]
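If you only need the estimate at a few specific time points, the times argument of summary() may be more convenient than scanning the full list. The sketch below uses a plain survfit object on the lung data from the survival package, not relsurv. That summary() on an rs.surv result accepts times in the same way is an assumption, suggested by the fact that it already accepts censored = TRUE as shown above.

```r
library(survival)

# Ordinary Kaplan-Meier fit on the lung data (time is in days)
fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Report the curve only at the requested times (same units as the time variable)
s <- summary(fit, times = c(365, 730))
data.frame(time = s$time, n.risk = s$n.risk, survival = s$surv)

# For comparison: censored = TRUE lists censoring times as well as event times
n_default  <- length(summary(fit)$time)
n_censored <- length(summary(fit, censored = TRUE)$time)
c(n_default, n_censored)
```

The requested times must be in the same units as the time variable passed to Surv().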