Function throws error while using lapply, works fine otherwise - r

I have a list containing data tables. A sample list can be created using following code.
library(data.table)
mydata=read.table(textConnection("
MSA_id code variable Caucasian African.American Asian Hispanic Other
412 111011 1 64 2 0 0 0
412 111011 2 464 17 4 11 0
412 111021 1 2006 43 32 22 61
412 111021 2 559 18 6 10 0
412 111031 1 56 1 0 0 0
412 111031 2 1 0 0 0 0"),header=TRUE)
setDT(mydata)
z = split(mydata,mydata$code)
> z[1:2]
$`111011`
MSA_id code variable Caucasian African.American Asian Hispanic Other
1: 412 111011 1 64 2 0 0 0
2: 412 111011 2 464 17 4 11 0
$`111021`
MSA_id code variable Caucasian African.American Asian Hispanic Other
1: 412 111021 1 2006 43 32 22 61
2: 412 111021 2 559 18 6 10 0
I want to reformat the elements of this list (data.tables) based on their values.
With my code, the elements of the reformatted list should look like this:
First Element:
[,1] [,2]
[1,] 64 2
[2,] 464 32
Second Element:
Caucasian African.American Asian Hispanic
1: 2006 43 32 22
2: 559 18 6 10
Algorithm for this is:
1. Remove the first 3 columns and the last column.
2. If the minimum value of Caucasian is 0, or the sum of the minimum values of the remaining 3 categories (African.American, Asian, Hispanic) is 0, then set the element to NA.
3. Else if the minimum of African.American is 0, or the sum of the minimum values of Asian and Hispanic is 0, then sum up African.American, Asian, and Hispanic as a single category.
4. Else if the minimum value of Asian is 0 or the minimum value of Hispanic is 0, sum up Asian and Hispanic as a single category.
5. Else keep the format as it is.
I created a function to do it. When I use this function on one element at a time, it works fine, but when I use lapply, it breaks.
formatTable <- function(z){
a = z[[1]]
b = a[,list(Caucasian,African.American,Asian,Hispanic),] # Deleting columns 1,2,3 and 8
if ( min(b$Caucasian) == 0) {
formatTable=NA
} else if ( (min(b$African.American) + min(b$Asian) + min(b$Hispanic)) == 0) {
formatTable=NA
} else if ( (min(b$African.American) == 0) | (min(b$Asian) + min(b$Hispanic)==0)) {
formatTable = cbind(b$Caucasian, b$African.American+b$Asian+b$Hispanic)
} else if ( min(b$Asian)==0 | min(b$Hispanic)==0) {
formatTable = cbind(b$Caucasian, b$African.American, b$Asian+b$Hispanic)
} else
formatTable = b
}
Using this function, t1=formatTable(z[1]) and t2=formatTable(z[2]) give the correct result. However, tbls = lapply(z[1:2],formatTable) fails with Error in FUN(X[[1L]], ...) : object 'Caucasian' not found.
Please help me understand why lapply throws this error.
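A likely cause, offered as a sketch rather than something stated in the original post: lapply() passes each element of z (a single table) to the function, so z[[1]] inside formatTable then pulls out the first column (MSA_id) instead of the table, and the column lookup fails. Calling formatTable(z[1]) works only because z[1] is still a one-element list. The version below uses base R data frames to stay self-contained, and the name format_table is hypothetical; it takes the table itself, so it can be handed to lapply directly.

```r
# Rebuild the sample data as a plain data.frame (no data.table needed for this sketch).
mydata <- read.table(text = "MSA_id code variable Caucasian African.American Asian Hispanic Other
412 111011 1 64 2 0 0 0
412 111011 2 464 17 4 11 0
412 111021 1 2006 43 32 22 61
412 111021 2 559 18 6 10 0
412 111031 1 56 1 0 0 0
412 111031 2 1 0 0 0 0", header = TRUE)
z <- split(mydata, mydata$code)

# Hypothetical replacement for formatTable: `tbl` is ONE table, not a list of tables.
format_table <- function(tbl) {
  # Keep only the four race columns (drops columns 1, 2, 3 and 8).
  b <- as.data.frame(tbl)[, c("Caucasian", "African.American", "Asian", "Hispanic")]
  min_c  <- min(b$Caucasian)
  min_aa <- min(b$African.American)
  min_as <- min(b$Asian)
  min_h  <- min(b$Hispanic)
  if (min_c == 0 || (min_aa + min_as + min_h) == 0) {
    NA
  } else if (min_aa == 0 || (min_as + min_h) == 0) {
    # Collapse the three non-Caucasian categories into one column.
    cbind(b$Caucasian, b$African.American + b$Asian + b$Hispanic)
  } else if (min_as == 0 || min_h == 0) {
    # Collapse only Asian and Hispanic.
    cbind(b$Caucasian, b$African.American, b$Asian + b$Hispanic)
  } else {
    b
  }
}

# Each list element is passed to format_table as a table, so no [[1]] is needed.
tbls <- lapply(z, format_table)
```

The function returns its value explicitly from each branch instead of assigning to a variable named after the function, and it uses || because each condition is a single TRUE/FALSE.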

Related

Looking for code in R to summarize by ____H or ____D?

I have a chart with ASVs per sample; the samples are labelled by a number (the sample) and a letter which corresponds to human (H) or dog (D). I am trying to see which ASVs are found only in humans or only in dogs. My idea is to sum all rows by dog or human, ignoring individual samples, and check whether the totals are 0 or greater than zero.
I am unsure of the code; I have tried a few things but none have worked. I am mainly working with phyloseq and DESeq2. This is the table I'm working with, 11,000 ASV samples.
I'm a little confused what the row names and column names represent but I gave it a go. Correct me if this is not exactly what you meant.
The data.table package has a neat function, melt( ) that allows you to transform data from wide to long format. This will make it easier for you to analyze and sum your values.
library(data.table)
data <- data.table(
`ASV_ID` = c(3,5,6,7,10,11,12,14,15,16,20),
`2104H` = c(0,353,483,305,289,200,0,0,0,284,406),
`2104D` = c(470,39,43,427,48,488,356,390,482,0,0),
`2105H` = c(0,784,816,0,704,100,0,0,0,158,141),
`2105D` = c(0,0,0,0,0,0,0,0,0,0,0))
data
ASV_ID 2104H 2104D 2105H 2105D
1: 3 0 470 0 0
2: 5 353 39 784 0
3: 6 483 43 816 0
4: 7 305 427 0 0
5: 10 289 48 704 0
6: 11 200 488 100 0
7: 12 0 356 0 0
8: 14 0 390 0 0
9: 15 0 482 0 0
10: 16 284 0 158 0
11: 20 406 0 141 0
data2 <- melt(
data = data,
id.vars = c("ASV_ID"),
measure.vars = c("2104H","2104D","2105H","2105D"),
variable.name = "sample",
value.name = "value")
data2[,.(Sum = sum(value)),by=.(sample)]
sample Sum
1: 2104H 2320
2: 2104D 2743
3: 2105H 2703
4: 2105D 0
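The sums above are per sample; to answer the human-only versus dog-only question, the totals can instead be taken per ASV across all H columns and all D columns. The following is a minimal base R sketch on the same example values. It assumes every sample column is named as digits followed by H or D; the object names (asv, human_only, dog_only) are made up for illustration.

```r
# Same example values as above, kept in wide format.
asv <- data.frame(
  ASV_ID  = c(3, 5, 6, 7, 10, 11, 12, 14, 15, 16, 20),
  `2104H` = c(0, 353, 483, 305, 289, 200, 0, 0, 0, 284, 406),
  `2104D` = c(470, 39, 43, 427, 48, 488, 356, 390, 482, 0, 0),
  `2105H` = c(0, 784, 816, 0, 704, 100, 0, 0, 0, 158, 141),
  `2105D` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  check.names = FALSE
)

# Pick sample columns by suffix; the digits prefix keeps "ASV_ID" from matching "D$".
human_cols <- grep("^[0-9]+H$", names(asv), value = TRUE)
dog_cols   <- grep("^[0-9]+D$", names(asv), value = TRUE)

# Total counts per ASV within each host.
human_total <- rowSums(asv[human_cols])
dog_total   <- rowSums(asv[dog_cols])

# ASVs seen in one host and never in the other.
human_only <- asv$ASV_ID[human_total > 0 & dog_total == 0]
dog_only   <- asv$ASV_ID[dog_total > 0 & human_total == 0]
```

With the example values, human_only holds ASVs 16 and 20, and dog_only holds ASVs 3, 12, 14 and 15.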

Why does the frequency reduce if I use ifelse function in R? Is there a way to create categories from the combination of 2 variables/columns?

When I run
table(df$strategy.x)
0 1 2 3
70 514 223 209
table(df$strategy.y)
0 1 2 3
729 24 7 4
I want to create a variable with both of these combined. I tried this
df <- df %>%
mutate(nstrategy1 = ifelse(strategy.x==1| strategy.y==1 , 1, 0))
table(df$nstrategy1)
0 1
399 519
I am supposed to get 514 + 24 = 538 but I got 519 instead
df <- df %>% mutate(nstrategy2 = ifelse(strategy.x==2| strategy.y==2 , 1, 0))
table(df$nstrategy2)
0 1
578 228
Similarly, I am supposed to get 223 + 7 = 230, but I got 228 instead
Is there a good way to merge both strategy.x and strategy.y and end up with a table like the following with 4 categories?
0 1 2 3
799 538 230 213
table(mtcars$am) # 13 1's
table(mtcars$vs) # 14 1's
mtcars$ones = ifelse(mtcars$am == 1 | mtcars$vs == 1, 1, 0)
table(mtcars$ones) # 20 1's < 13 + 14 = 27
Why is it showing only 20 1's instead of 27? It's because there are 7 + 6 + 7 = 20 cars with either one or two 1's in am and vs. There are 13 with am==1 (6+7), and 14 with vs==1 (7+7). Seven cars are in the bottom right cell of the table below because they have 1's in both dimensions, and these are the ones you are expecting to count twice.
table(mtcars$am, mtcars$vs)
# 0 1
# 0 12 7
# 1 6 7
The simplest way to get the sum of the two results would be by adding the two table objects:
table(mtcars$am) + table(mtcars$vs)
# 0 1
# 37 27
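The arithmetic behind this can be checked directly on mtcars. The sketch below verifies that the "or" count equals the two separate counts minus the overlap, and shows an alternative to adding tables: stacking the two columns and tabulating once, which should also cope with the case where one column is missing a category that the other has. The variable names are illustrative only.

```r
n_am   <- sum(mtcars$am == 1)                   # cars with am == 1
n_vs   <- sum(mtcars$vs == 1)                   # cars with vs == 1
both   <- sum(mtcars$am == 1 & mtcars$vs == 1)  # cars counted in both groups
either <- sum(mtcars$am == 1 | mtcars$vs == 1)  # what the ifelse() approach counts

# Inclusion-exclusion: the "or" count removes the double-counted overlap.
either == n_am + n_vs - both

# Stack both columns into one vector, then tabulate every occurrence.
stacked <- table(c(mtcars$am, mtcars$vs))
stacked
```

Here either is 20 while n_am + n_vs is 27; the difference of 7 is exactly the overlap.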

Conditional filling NA rows with comparing non-NA labeled rows

I want to fill NA rows based on checking the differences between the closest non-NA labeled rows.
For instance
data <- data.frame(sd_value=c(34,33,34,37,36,45),
value=c(383,428,437,455,508,509),
label=c(c("bad",rep(NA,4),"unable")))
> data
sd_value value label
1 34 383 bad
2 33 428 <NA>
3 34 437 <NA>
4 37 455 <NA>
5 36 508 <NA>
6 45 509 unable
I want to decide how to fill the NA rows by checking the differences in sd_value and value relative to the rows close to the bad and unable rows.
If we want to get the differences between consecutive rows, we can do:
library(dplyr)
data%>%
mutate(diff_val=c(0,diff(value)), diff_sd_val=c(0,diff(sd_value)))
sd_value value label diff_val diff_sd_val
1 34 383 bad 0 0
2 33 428 <NA> 45 -1
3 34 437 <NA> 9 1
4 37 455 <NA> 18 3
5 36 508 <NA> 53 -1
6 45 509 unable 1 9
The condition for labelling the NA rows is:
if diff_val<50 and diff_sd_val<9, label the row with the last non-NA label; otherwise use the first non-NA label after the last NA row.
So the expected output would be
sd_value value label diff_val diff_sd_val
1 34 383 bad 0 0
2 33 428 bad 45 -1
3 34 437 bad 9 1
4 37 455 bad 18 3
5 36 508 unable 53 -1
6 45 509 unable 1 9
The possible solution I cooked up so far:
custom_labelling <- function(x,y,label){
diff_sd_val<-c(NA,diff(x))
diff_val<-c(NA,diff(y))
label <- NA
for (i in 1:length(label)){
if(is.na(label[i])&diff_sd_val<9&diff_val<50){
label[i] <- label
}
else {
label <- label[i]
}
}
return(label)
}
which gives
data%>%
mutate(diff_val=c(0,diff(value)), diff_sd_val=c(0,diff(sd_value)))%>%
mutate(custom_label=custom_labelling(sd_value,value,label))
Error in mutate_impl(.data, dots) :
Evaluation error: missing value where TRUE/FALSE needed.
In addition: Warning message:
In if (is.na(label[i]) & diff_sd_val < 9 & diff_val < 50) { :
the condition has length > 1 and only the first element will be used
One option is to find NA and non-NA index and based on the condition select the closest label to it.
library(dplyr)
#Create a new dataframe with diff_val and diff_sd_val
data1 <- data%>% mutate(diff_val=c(0,diff(value)), diff_sd_val=c(0,diff(sd_value)))
#Get the NA indices
NA_inds <- which(is.na(data1$label))
#Get the non-NA indices
non_NA_inds <- setdiff(1:nrow(data1), NA_inds)
#For every NA index
for (i in NA_inds) {
#Check the condition
if(data1$diff_sd_val[i] < 9 && data1$diff_val[i] < 50)
#Get the last non-NA label before row i
data1$label[i] <- data1$label[max(non_NA_inds[non_NA_inds < i])]
else
#Get the first non-NA label after row i
data1$label[i] <- data1$label[min(non_NA_inds[non_NA_inds > i])]
}
data1
# sd_value value label diff_val diff_sd_val
#1 34 383 bad 0 0
#2 33 428 bad 45 -1
#3 34 437 bad 9 1
#4 37 455 bad 18 3
#5 36 508 unable 53 -1
#6 45 509 unable 1 9
You can remove diff_val and diff_sd_val columns later if not needed.
We can also create a function
custom_label <- function(label, diff_val, diff_sd_val) {
NA_inds <- which(is.na(label))
non_NA_inds <- setdiff(seq_along(label), NA_inds)
new_label = label
for (i in NA_inds) {
if(diff_sd_val[i] < 9 && diff_val[i] < 50)
# last non-NA label before position i
new_label[i] <- label[max(non_NA_inds[non_NA_inds < i])]
else
# first non-NA label after position i
new_label[i] <- label[min(non_NA_inds[non_NA_inds > i])]
}
return(new_label)
}
and then apply it
data%>%
mutate(diff_val = c(0, diff(value)),
diff_sd_val = c(0, diff(sd_value)),
new_label = custom_label(label, diff_val, diff_sd_val))
# sd_value value label diff_val diff_sd_val new_label
#1 34 383 bad 0 0 bad
#2 33 428 <NA> 45 -1 bad
#3 34 437 <NA> 9 1 bad
#4 37 455 <NA> 18 3 bad
#5 36 508 <NA> 53 -1 unable
#6 45 509 unable 1 9 unable
If we want to apply it by group we can add a group_by statement and it should work.
data%>%
group_by(group) %>%
mutate(diff_val = c(0, diff(value)),
diff_sd_val = c(0, diff(sd_value)),
new_label = custom_label(label, diff_val, diff_sd_val))
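For data with more than two labelled rows, or with NA rows at the very start or end, the choice of neighbour needs a little more care. The following is a self-contained base R sketch of the same rule under those assumptions; fill_labels is a hypothetical name, and rows with no labelled neighbour on the required side are simply left as NA.

```r
# Hypothetical helper: fill NA labels from the nearest labelled row.
# Small differences -> nearest labelled row above; otherwise -> nearest labelled row below.
fill_labels <- function(label, diff_val, diff_sd_val) {
  known <- which(!is.na(label))
  out <- label
  for (i in which(is.na(label))) {
    prev_known <- known[known < i]   # labelled rows above row i
    next_known <- known[known > i]   # labelled rows below row i
    use_prev <- diff_val[i] < 50 && diff_sd_val[i] < 9
    if (use_prev && length(prev_known) > 0) {
      out[i] <- label[max(prev_known)]
    } else if (!use_prev && length(next_known) > 0) {
      out[i] <- label[min(next_known)]
    }
    # otherwise: no labelled neighbour on that side, so the row stays NA
  }
  out
}

# Example data from the question.
sd_value <- c(34, 33, 34, 37, 36, 45)
value    <- c(383, 428, 437, 455, 508, 509)
label    <- c("bad", NA, NA, NA, NA, "unable")

filled <- fill_labels(label,
                      diff_val    = c(0, diff(value)),
                      diff_sd_val = c(0, diff(sd_value)))
filled
```

On the example data this gives bad for rows 1 to 4 and unable for rows 5 and 6, matching the expected output.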

IF "OR" multiple conditions

I have a standard 2x2 table
Yes No
Yes a b
No c d
I want to create a condition whereby IF(a or b or c or d = 0) then 0.5 is added on to each of the cells a,b,c,d.
I have tried this:
if(a && b && c && d == 0){
a=a+0.5, b=b+0.5, c=c+0.5, d=d+0.5
}
But I am getting an error saying
Error: unexpected ',' in:
"if(a && b && c && d== 0){
a=a+0.5,"
i.e. I don't think it is letting me put multiple things to execute.
Also I don't think that the && is right between each of the letters as I believe that means IF(a and b and ...)
UPDATE TO QUESTION:
I have another related question.
If I have a set of n tables, all in the format:
Yes No
Yes a b
No c d
and if one of the a,b,c or d in any of the n tables is equal to zero then 0.5 is added on to each of the a,b,c,d for all of the n tables. How would I do that?
My list looks like the following:
n11 n12 n21 n22
1 188 1157 173 1168
2 2 201 1 101
3 369 2280 354 2289
4 1 61 0 61
5 1306 16870 1333 16773
6 4 81 3 79
7 6 117 5 118
8 19 334 15 318
9 1 49 0 48
10 0 36 1 33
11 2 114 3 113
12 13 433 37 696
13 1 64 0 65
14 4 157 1 160
15 1 42 0 43
16 1 150 5 146
17 7 1124 10 1117
18 2 78 2 77
and what I am trying to say is that if any of the aspects of the cells of the table are 0, then I want 0.5 to be added on to every cell.
In R you can't use , to separate statements, but you can use ; (or put each statement on its own line).
Also, the way you wrote the condition treats a, b and c as booleans (TRUE/FALSE), which is not the case as they are numbers. Your condition should be:
if (a == 0 || b == 0 || c == 0 || d == 0)
Note that your code will run nevertheless, even if a,b and c are not boolean since they are numbers and there is an equivalence between FALSE and a == 0. This means you could also write your condition as :
if (!a || !b || !c || !d)
For the UPDATE, I assume matList is the list of matrices:
for (ii in seq_along(matList)) {
if (any(matList[[ii]] == 0)) {
matList = lapply(matList, function(X){X+0.5})
break # Exit the for loop
}
}
lapply applies X + 0.5 (i.e. adds 0.5 to each element of the matrix, thanks to R's vectorized arithmetic) to every element (here, matrices) of the list matList and returns the resulting list.
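If the n tables are stored as rows of one data frame, as in the listing in the question (columns n11, n12, n21, n22), the same rule needs no loop. This is a minimal sketch using only the first four rows of that listing; the name tabs is an assumption.

```r
# First four rows of the listing in the question, one 2x2 table per row.
tabs <- data.frame(
  n11 = c(188, 2, 369, 1),
  n12 = c(1157, 201, 2280, 61),
  n21 = c(173, 1, 354, 0),
  n22 = c(1168, 101, 2289, 61)
)

# If any cell in any table is zero, add 0.5 to every cell of every table.
if (any(tabs == 0)) {
  tabs <- tabs + 0.5
}
tabs
```

Row 4 has n21 equal to 0, so every cell is shifted by 0.5.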
The problem is with the commas that separate your variables. R syntax does not allow you to do it. Write it this way:
if (a && b && c && d == 0){
a=a+0.5
b=b+0.5
c=c+0.5
d=d+0.5
}
Another problem is that the behaviour you described does not match your code: && means and, not or. To check whether any element is equal to 0 (modified based on Rodrigo's comment), the correct code would be:
if (0 %in% c(a,b,c,d)){
a=a+0.5
b=b+0.5
c=c+0.5
d=d+0.5
}
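As a variation, the four cells can be kept in a single matrix so that one assignment updates all of them. A small sketch with made-up counts (the values 12, 5, 0, 9 and the name tab are purely illustrative):

```r
# Illustrative cell counts a, b, c, d filled row by row; c is zero here.
tab <- matrix(c(12, 5, 0, 9), nrow = 2, byrow = TRUE,
              dimnames = list(c("Yes", "No"), c("Yes", "No")))

# any() is TRUE as soon as one cell equals zero; the addition is applied to all cells.
if (any(tab == 0)) {
  tab <- tab + 0.5
}
tab
```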

mistake in multivePenal but not in frailtyPenal

The libraries used are survival, splines, boot and frailtypack; the functions in question come from frailtypack.
library(survival)
library(splines)
library(boot)
library(frailtypack)
In my data I have two recurrent events (delta.stable and delta.unstable) and one terminal event (delta.censor). There are some time-varying explanatory variables, such as the unemployment rate (u.rate), which is quarterly; that is why my dataset has been split by quarters.
Here is a link to the subsample used in the code below, in case it helps to find the mistake: https://www.dropbox.com/s/spfywobydr94bml/cr_05_males_services.rda
The problem is that the function runs for a long time before the warning message appears.
Main variables of the Survival function are:
I have two recurrent events:
delta.unstable (unst.): takes value one when the individual finds an unstable job.
delta.stable (stable): takes value one when the individual finds a stable job.
And one terminal event:
delta.censor (d.censor): takes value one when the individual has died, retired or emigrated.
row id contadorbis unst. stable d.censor .t0 .t
1 78 1 0 1 0 0 88
2 101 2 0 1 0 0 46
3 155 3 0 1 0 0 27
4 170 4 0 0 0 0 61
5 170 4 1 0 0 61 86
6 213 5 0 0 0 0 92
7 213 5 0 0 0 92 182
8 213 5 0 0 0 182 273
9 213 5 0 0 0 273 365
10 213 5 1 0 0 365 394
11 334 6 0 1 0 0 6
12 334 7 1 0 0 0 38
13 369 8 0 0 0 0 27
14 369 8 0 0 0 27 119
15 369 8 0 0 0 119 209
16 369 8 0 0 0 209 300
17 369 8 0 0 0 300 392
When I apply multivePenal I obtain the following message:
Error in aggregate.data.frame(as.data.frame(x), ...) :
arguments must have same length
In addition: Warning message:
In Surv(.t0, .t, delta.stable) : Stop time must be > start time, NA created
#### multivePenal function
fit.joint.05_malesP<-multivePenal(Surv(.t0,.t,delta.stable)~cluster(contadorbis)+terminal(as.factor(delta.censor))+event2(delta.unstable),formula.terminalEvent=~1, formula2=~as.factor(h.skill),data=cr_05_males_serv,Frailty=TRUE,recurrentAG=TRUE,cross.validation=F,n.knots=c(7,7,7), kappa=c(1,1,1), maxit=1000, hazard="Splines")
I have checked if Surv(.t0,.t,delta.stable) contains NA, and there are no NA's.
In addition, when I apply frailtyPenal to the same data for both possible combinations, the function runs well and I get results. I have spent a week looking at this without finding the cause. I would appreciate any light on this problem.
#delta unstable+death
fit.joint.05_males<-frailtyPenal(Surv(.t0,.t,delta.unstable)~cluster(id)+u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(non.manual)+as.factor(municipio)+as.factor(spanish.speakers)+ as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+ as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+ as.factor(responsabilities)+
terminal(delta.censor),formula.terminalEvent=~u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+ as.factor(responsabilities),data=cr_05_males_services,n.knots=12,kappa1=1000,kappa2=1000,maxit=1000, Frailty=TRUE,joint=TRUE, recurrentAG=TRUE)
###Be patient. The program is computing ...
###The program took 2259.42 seconds
#delta stable+death
fit.joint.05_males<-frailtyPenal(Surv(.t0,.t,delta.stable)~cluster(id)+u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(non.manual)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+as.factor(responsabilities)+terminal(delta.censor),formula.terminalEvent=~u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+as.factor(responsabilities),data=cr_05_males_services,n.knots=12,kappa1=1000,kappa2=1000,maxit=1000, Frailty=TRUE,joint=TRUE, recurrentAG=TRUE)
###The program took 3167.15 seconds
Because you neither provide information about the packages used, nor the data necessary to run multivepenal or frailtyPenal, I can only help you with the Surv part (because I happened to have that package loaded).
The Surv warning message you provided (In Surv(.t0, .t, delta.stable) : Stop time must be > start time, NA created) suggests that something is strange with your variables .t0 (the time argument in Surv, referred to as 'start time' in the warning) and/or .t (the time2 argument, 'Stop time' in the warning). I checked this possibility with a simple example:
library(survival)
# read the data you feed `Surv` with
df <- read.table(text = "row id contadorbis unst. stable d.censor .t0 .t
1 78 1 0 1 0 0 88
2 101 2 0 1 0 0 46
3 155 3 0 1 0 0 27
4 170 4 0 0 0 0 61
5 170 4 1 0 0 61 86
6 213 5 0 0 0 0 92
7 213 5 0 0 0 92 182
8 213 5 0 0 0 182 273
9 213 5 0 0 0 273 365
10 213 5 1 0 0 365 394
11 334 6 0 1 0 0 6
12 334 7 1 0 0 0 38
13 369 8 0 0 0 0 27
14 369 8 0 0 0 27 119
15 369 8 0 0 0 119 209
16 369 8 0 0 0 209 300
17 369 8 0 0 0 300 392", header = TRUE)
# create survival object
mysurv <- with(df, Surv(time = .t0, time2 = .t, event = stable))
mysurv
# create a new data set where one .t for some reason is less than .t0
# on row five .t0 is 61, so I set .t to 60
df2 <- df
df2$.t[df2$.t == 86] <- 60
# create survival object using new data which contains at least one Stop time that is less than Start time
mysurv2 <- with(df2, Surv(time = .t0, time2 = .t, event = stable))
# Warning message:
# In Surv(time = .t0, time2 = .t, event = stable) :
# Stop time must be > start time, NA created
# i.e. the same warning message as you got
# check the survival object
mysurv2
# as you can see, the fifth interval contains NA
# I would recommend you check .t0 and .t in your data set carefully
# one way to examine rows where Stop time (.t) is less than start time (.t0) is:
df2[which(df2$.t0 > df2$.t), ]
I am not familiar with multivePenal, but it seems that it does not accept a survival object that contains intervals with NA, whereas frailtyPenal might.
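The offending intervals can also be located without building a Surv object at all. The sketch below uses a tiny made-up data frame (the values are illustrative, not taken from the poster's file) and flags rows where the start is not strictly before the stop. Flagging ties as well is an assumption based on the warning's wording ("must be >"), so it is worth verifying against your installed survival version.

```r
# Illustrative spells only: row 2 ends before it starts, row 4 has equal start and stop.
spells <- data.frame(
  id  = c(170, 170, 213, 213),
  .t0 = c(0, 61, 0, 92),
  .t  = c(61, 60, 92, 92)
)

# Rows whose interval is empty or reversed (start >= stop), ignoring missing times.
is_bad <- !is.na(spells$.t0) & !is.na(spells$.t) & spells$.t0 >= spells$.t
bad_rows <- spells[is_bad, ]
bad_rows
```

Running the same filter on the full data set before calling multivePenal should show which rows produce the NA intervals.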
The authors of the package have told me that the function is not finished yet, so perhaps that is the reason that it is not working well.
I encountered the same error and arrived at this solution.
frailtyPenal() will not accept data.frames of different length. The data.frame used in Surv and data.frame named in data= in frailtyPenal must be the same length. I used a Cox regression to identify the incomplete cases, reset the survival object to exclude the missing cases and, finally, run frailtyPenal:
library(survival)
library(frailtypack)
data(readmission)
#Reproduce the error
#change the first start time to NA
readmission[1,3] <- NA
#create a survival object with one missing time
surv.obj1 <- with(readmission, Surv(t.start, t.stop, event))
#observe the error
frailtyPenal(surv.obj1 ~ cluster(id) + dukes,
data=readmission,
cross.validation=FALSE,
n.knots=10,
kappa=1,
hazard="Splines")
#repair by resetting the surv object to omit the missing value(s)
#identify NAs using a Cox model
cox.na <- coxph(surv.obj1 ~ dukes, data = readmission)
#remove the NA cases from the original set to create complete cases
readmission2 <- readmission[-cox.na$na.action,]
#reset the survival object using the complete cases
surv.obj2 <- with(readmission2, Surv(t.start, t.stop, event))
#run frailtyPenal using the complete cases dataset and the complete cases Surv object
frailtyPenal(surv.obj2 ~ cluster(id) + dukes,
data = readmission2,
cross.validation = FALSE,
n.knots = 10,
kappa = 1,
hazard = "Splines")
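Fitting a Cox model only to find the incomplete rows is one option; base R's complete.cases can do the same filtering directly. A minimal sketch on a made-up data frame (toy and its values are hypothetical; the column names mirror those used with readmission above):

```r
# Hypothetical toy data with one missing start time and one missing stop time.
toy <- data.frame(
  id      = c(1, 1, 2, 3),
  t.start = c(NA, 10, 0, 0),
  t.stop  = c(10, 20, 15, NA),
  event   = c(1, 0, 1, 0)
)

# Columns that the survival object depends on.
needed <- c("t.start", "t.stop", "event")

# Keep only rows where all needed columns are present.
toy_complete <- toy[complete.cases(toy[needed]), ]
toy_complete
```

The Surv object and the data argument can then both be built from the filtered data frame, so their lengths agree.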
