I have a dataframe and trying to execute a shapiro-wilk test in multiples columns.
When a try to use the following code:
DF.Shapiro <- do.call(rbind, lapply(DF[c(3:41)], function(x) shapiro.test(x)[c("statistic", "p.value")]))
Always appears this message:
"Error in shapiro.test(x) : all 'x' values are identical"
Or
"Error in FUN(X[[i]], ...) : all 'x' values are identical"
How can a solve this?
Without data it's difficult to say but maybe the following untested solution will do what the question asks for.
num <- which(sapply(DF[3:41], is.numeric))
num <- intersect(3:41, num)
do.call(
rbind.data.frame,
lapply(DF[num], function(x){
tryCatch(shapiro.test(x)[c("statistic", "p.value")],
error = function(e) e)
})
)
Edit
If some of the tests return an error, the lapply instruction will return different types of data and the rbind.data.frame method will also give an error. The following code solves that problem by saving the lapply results in a named list, test_list and checking the list members for errors before binding the right ones.
test_list <- lapply(DF[num], function(x){
tryCatch(shapiro.test(x)[c("statistic", "p.value")],
error = function(e) e)
})
err <- sapply(test_list, inherits, "error")
err_list <- test_list[err]
do.call(rbind.data.frame, test_list[!err])
Related
I have been utilizing code from the answer of a previous question with great success. Last night, after successfully using the code many times I started to receive an error when trying to execute the second part of the code to access list results. The lst_elements list is not being created. This is my code.
# Run Google Distance API ALl Transit
res <- lapply(1:nrow(Lankenau), function(x) {
google_distance(origins = c(Lankenau[x,"LAT"],Lankenau[x,"LONG"]),
destinations = c(Lankenau[x,"O_Lat"],Lankenau[x,"O_Long"]),
mode = "transit",arrival_time = time)})
lst_elements <- lapply(res, function(x){
stats::setNames(
cbind(
distance_elements(x)[[1]][['duration']],
distance_elements(x)[[1]][['distance']]
)
, c("duration_text", "duration_value", "distance_text", "distance_value")
)
})
and the error received
Error in names(object) <- nm : attempt to set an attribute on NULL
3.
stats::setNames(cbind(distance_elements(x)[[1]][["duration"]],
distance_elements(x)[[1]][["distance"]]), c("duration_text",
"duration_value", "distance_text", "distance_value"))
2.
FUN(X[[i]], ...)
1.
lapply(res, function(x) {
stats::setNames(cbind(distance_elements(x)[[1]][["duration"]],
distance_elements(x)[[1]][["distance"]]), c("duration_text",
"duration_value", "distance_text", "distance_value")) ...
any tips would be great! I'm not sure what happened. The exact same good is still working for a different dataframe. Would this suggest that the error is stemming from the data.frame itself?
Likely it is data-specific due to maybe missing or bad inputs to the google_distance() call where NULL is returned in corresponding position in your res list.
Consider wrapping tryCatch to return a one-row data frame of NAs for those problem elements. If all elements emerge as one-row NA's then all runs of google_distance() failed.
lst_elements <- lapply(res, function(x){
tryCatch(setNames(cbind(distance_elements(x)[[1]][['duration']],
distance_elements(x)[[1]][['distance']]
), c("duration_text", "duration_value",
"distance_text", "distance_value")
),
error = function(e)
data.frame(duration_text=NA, duration_value=NA,
distance_text=NA, distance_value=NA)
)
})
final_df <- do.call(rbind, lst_elements)
Basically i'm looping through items in a database which give me list objects from which i am trying to extract data. When i try to extract things from the list to a database, and When data is not available for a given object i will get the followign error:
>scd[a,paste0("b",".iss..",b)]= sc[["content"]][["abstracts-retrieval-response"]][["item"]][["bibrecord"]][["tail"]][["bibliography"]][["reference"]][[b]][["ref-info"]][["ref-volisspag"]][["voliss"]][["#issue"]]
Error in `*tmp*`[[jj]] : subscript out of bounds
when the error occurs it stops my loop. I simply want R to return NA when there is no infomation available, so how would i go about doign this. I've tried wrapping the above code around try({}) but it doesn't seem to fix the problem- I think this is because it only works for functions? is there an easy way to get it to work for the above?
I tried the following code:
scd[a,paste0("b",".vol..",b)]= as.numeric(tryCatch(sc[["content"]][["abstracts-retrieval-response"]][["item"]][["bibrecord"]][["tail"]][["bibliography"]][["reference"]][[b]][["ref-info"]][["ref-volisspag"]][["voliss"]][["#volume"]],error = function(e) NA))
Error in `*tmp*`[[jj]] : subscript out of bounds
After the further investigation the issue arises because when there is no data available in the list it returns NULL rather than NA and then it cannot add this to the dataframe.
tryCatch will work for lists when the object doesn't actually exist:
test_list <- list(a = NULL, b = "some_value")
for(i in 1:3){
obj <- tryCatch({
x <- test_list[[i]]
x <- ifelse(is.null(x), NA, x)
}
, error = function(e) NA)
print(obj)
}
[1] NA
[1] "some_value"
[1] NA
Obviously, use with caution.
With your code:
scd[a,paste0("b",".iss..",b)] <- tryCatch({
x <- sc[["content"]][["abstracts-retrieval-response"]][["item"]][["bibrecord"]][["tail"]][["bibliography"]][["reference"]][[b]][["ref-info"]][["ref-volisspag"]][["voliss"]][["#issue"]]
x <- ifelse(is.null(x), NA, x)
}, error = function(e) NA)
I have some code below. It works fine, except for a strange error if I do not "re-reinforce" my list objects as data.frame's (see comment in code). The error occurs when I try to filter out empty list objects.
If I do not "re-reinforce" (df.list <- lapply(df.list, as.data.frame)) I get the error:
Error in LoadData(LCap.small) :
(list) object cannot be coerced to type 'double'
You can try out the code by inputting any vector with tickers. such as LoadData(c('YHOO', 'GOOG', 'FALSEGOOG')). Note that the error happens when I filter out tickers that do not exist in the Yahoo databse.
What is the cause and how can it be resolved?
Code
LoadData <- function(x) {
if(is.atomic(x) != TRUE & is.data.frame(x) != TRUE) stop('x must be either a data.frame or an atomic object')
if(is.data.frame(x) == TRUE) x <- as.character(x[,1])
df.list <- lapply(x, function(x) {
poss.error <- tryCatch(
{
quantmod::getSymbols(x, env = NULL, return.class = 'data.frame') # quantmod returns data.frame class object
},
error = function(e) {
message(cat(x, "could not be retrieved"))
return(e)
})
if(!inherits(poss.error, 'error')) {
x <- poss.error %>% setNames(gsub(pattern = '^(.*)[.]', replacement = "", colnames(poss.error))) %>%
mutate(Index = as.Date(rownames(poss.error))) %>%
rename(Adj.Close = Adjusted) %>%
select(Index, Open:Adj.Close)
return(x)
}
})
# add names
names(df.list) <- x
# make list to data.frame
df.list <- lapply(df.list, as.data.frame) # bypass error (list) object cannot be coerced to type 'double'
# filter out empty data frames, error occurs here if previous line of code is not present
df.list <- df.list[sapply(df.list, nrow) > 0]
return(df.list)
}
The anonymous function called in the first lapply sometimes returns NULL (when the ticker symbol does not exist). Then your df.list has data frames and NULL objects. Since nrow(NULL) = NULL your sapply then returns an object with numbers and NULLs, this does not convert to double. Use sapply(ldf.list, is.NULL) in its place (or, maybe better, fix the function in the first lapply to always return a data frame).
I'm trying to use tryCatch to generate a list of p-values there are several rows in the matrix that don't have enough observations for a t test. Here is the code I generated so far:
pValues <- c()
for(i in row.names(collapsed.gs.raw)){
tryCatch({
t <- t.test(as.numeric(collapsed.gs.raw[i,]) ~ group)
pValues <- c(pValues, t$p.value)
},
error = function(err) {
pValues <- c(pValues, "NA")
message("Error")
return(pValues)
})}
It definitely throws an error [I put in the message("Error") line to confirm]. The problem is that the vector pValues doesn't have any "NA" in it, though it should.
Thanks in advance for your help!
The pvalues in your function is a local variable. You might be able to fix this with <<-, but it would be preferred to have the function just return the one desired value and collect them outside the function with sapply. Perhaps something like (untested):
pValues <- sapply(rownames(collapsed.gs.raw), function(i) {
tryCatch({
t.test(as.numeric(collapsed.gs.raw[i,]) ~ group)$p.value
},
error = function(err) {
message("Error")
return(NA)
})
})
Using penalizedSVM R package, I am trying to do feature selection. There is a list of several data.frames called trainingdata.
trainingdata <-lapply(trainingdata, function(data)
{
levels(data$label) <- c(-1, 1)
train_x<-data[, -1]
train_x<-data.matrix(train_x)
trainy<-data[, 1]
print(which(!is.finite(train_x)))
scad.fix<-svm.fs(train_x, y=trainy, fs.method="scad",
cross.outer=0, grid.search="discrete",
lambda1.set=lambda1.scad, parms.coding="none",
show="none", maxIter=1000, inner.val.method="cv",
cross.inner=5, seed=seed, verbose=FALSE)
data <- data[c(1, scad.fix$model$xind)]
data
})
Some iterations go well but then on one data.frame I am getting the following error message.
[1] "feature selection method is scad"
Error in svd(m, nv = 0, nu = 0) : infinite or missing values in 'x'
Calls: lapply ... scadsvc -> .calc.mult.inv_Q_mat2 -> rank.condition -> svd
Using the following call, I am also checking whether x is really infinite but the call returns 0 for all preceding and the current data.frame where the error has occurred.
print(which(!is.finite(train_x)))
Is there any other way to check for infinite values? What else could be done to rectify this error? Is there any way that one can determine the index of the current data.frame being processed within lapply?
For the first question , infinite or missing values in 'x' suggests that you change your condition to something like .
idx <- is.na(train_x) | is.infinite(train_x)
You can assign 0 for example to theses values.
train_x[idx] <- 0
For the second question , concerning how to get the names of current data.frame within lapply you can loop over the names of data.farmes, and do something like this :
lapply(names(trainingdata), function(data){ data <- trainingdata[data]....}
For example:
ll <- list(f=1,c=2)
> lapply(names(list(f=1,c=2)), function(x) data <- ll[x])
[[1]]
[[1]]$f
[1] 1
[[2]]
[[2]]$c
[1] 2
EDIT
You can use tryCatch before this line scad.fix<-svm.fs
tryCatch(
scad.fix<-svm.fs(....)
, error = function(e) e)
})
for example, here I test it on this list, the code continues to be executing to the end of list ,even there is a NA in the list.
lapply(list(1,NA,2), function(x){
tryCatch(
if (any(!is.finite(x)))
stop("infinite or missing values in 'x'")
, error = function(e) e)
})