Missing values often result in errors in lapply (in R)

I got errors in this code:
FUN = function(files) {
  df_week <- data.table::fread(files)
  # Sun rate
  for (i in 1:nrow(df_week)) {
    # check if df is not NA
    if (!is.na(df_week[i])) {
      if (df_week$Sun[i] >= 10) { df_week$Sunr[i] = 5 }
      ....
    }
  }
}
files = list.files(pattern = "1_Stas*")
lapply(files, FUN)
Output:
Error in if (!is.na(df_week[i])) { : argument is of length zero
In addition: Warning message:
13 failed to parse.
Why does the if () {} code give errors?
If the input contains missing values (NA), the output should be NaN or NA, and lapply should continue to the next file in the list.
I tried it on a single file without lapply or the function, and the output appears in the environment as empty data.
So when I process the files one by one, there's no error, but with lapply there are often problems. Should I use a for loop instead?
Any suggestions on how to fix this and make lapply continue to the next file when the previous one contains missing values?
Thanks.

Just use a for loop. The apply family is good for quick one-liners when you need to apply a single operation to one dimension of your data. Your code will become:
files = list.files(pattern = "1_Stas*")
df_week <- data.table::fread(files)  # Make sure length(files) == 1
# Sun rate
for (i in 1:nrow(df_week)) {
  # check if df is not NA
  if (!is.na(df_week[i])) {
    if (df_week$Sun[i] >= 10) {
      df_week$Sunr[i] = 5
    }
    ...
  }
}
The line containing if (!is.na(df_week[i])) { needs clarification. From context, length(dim(df_week)) > 1, so is.na(df_week[i]) returns one value per column rather than a single TRUE or FALSE, and you probably want
if (all(!is.na(df_week[i]))) {
  ...
}
all(!is.na(df_week[i])) returns TRUE when the ith row of df_week contains no NA values.
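As a minimal sketch of the "continue to the next file" behaviour the question asks about, each file can be processed inside tryCatch so that a failure yields NULL instead of stopping lapply. This is an illustration, not the method from the answer above: rate_file is a hypothetical helper, base R's read.csv stands in for data.table::fread so the example has no dependencies, and only the single Sun >= 10 rule visible in the question is implemented.

```r
# Hypothetical helper: rate one file, returning NULL instead of stopping on failure.
rate_file <- function(path) {
  tryCatch({
    df_week <- utils::read.csv(path)  # stand-in for data.table::fread (assumption)
    # Vectorised rating: rows where Sun is NA or below 10 get NA here,
    # because the other rules are elided in the question.
    df_week$Sunr <- ifelse(df_week$Sun >= 10, 5, NA_real_)
    df_week
  }, error = function(e) {
    message("Skipping ", path, ": ", conditionMessage(e))
    NULL
  })
}

# Demo with one readable temporary file and one path that does not exist.
good <- tempfile(fileext = ".csv")
writeLines(c("Sun", "12", "NA", "3"), good)
results <- lapply(c(good, tempfile()), rate_file)
```

lapply then returns one data frame per readable file and NULL for the rest; Filter(Negate(is.null), results) would drop the failures.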

Related

How to handle exception in R like in Java?

I am writing a server program in R to which I upload PDF files; it then extracts data from the tables inside each PDF.
If the required table and data are there, it works fine. But if not, it gives me an error for files[i][[1]][[3]] and files[i][[1]][[4]]:
error: subscript out of bounds
I want the buy and sell prices to default to NA if the table is not there.
all_data <- eventReactive(input$done, {
  req(input$files)
  files = {}
  cost_price_list = {}
  sell_price_list = {}
  df <- data.frame(cost_price_list = character(), sell_price_list = character())
  files <- lapply(input$files$datapath, extract_tables)
  for (i in 1:length(input$files$datapath))
  {
    tryCatch(
      {
        cost_price_list <- files[i][[1]][[3]]
        sell_price_list <- files[i][[1]][[4]]
      },
      error=function(cond) {
        cost_price_list[i] = NA
        sell_price_list[i] = NA
      }
    )
    df[nrow(df) + 1,] <- c(cost_price_list[i],sell_price_list[i])
  }
  #return dataframe as table
  df
})
But the above code doesn't work for me if the table is not in the PDF.
What am I doing wrong?
Please help.
There are multiple problems with your code.
First off, files = {} works but almost certainly doesn’t do what you intended. Did you mean files = list()? Otherwise, the idiomatic way of expressing what your code does is to write files = NULL in R. A more idiomatic way would be not to assign an empty object at all. In fact, your code overrides this initial value of files anyway, so files = {} is entirely redundant; and so are the next two assignments.
Next, since files is a list, you need to use double brackets to extract single elements from it: files[[i]], not files[i].
Another problem with your code is that assignment in R is always local. So cost_price_list[i] = NA creates a local variable inside the error handler function, even though cost_price_list exists in an outer scope.
If you want to assign to the outer variable, you need to explicitly specify the scope (or use <<-, but I recommend against this practice):
…
outer = environment()
for (i in 1:length(input$files$datapath))
{
  tryCatch(
    {
      cost_price_list <- files[i][[1]][[3]]
      sell_price_list <- files[i][[1]][[4]]
    },
    error=function(cond) {
      outer$cost_price_list[i] = NA
      outer$sell_price_list[i] = NA
    }
  )
  df[nrow(df) + 1,] <- c(cost_price_list[i],sell_price_list[i])
}
…
But this still won’t work, because these variables do not have a value that can be meaningfully subset into (and it is not entirely clear what your code is attempting to do). Still, I hope the above gives you a foundation to work with.
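One way to sidestep the scoping issue entirely is to use the value that tryCatch returns rather than assigning inside the handler. The following is only a sketch under assumptions: the nested list files stands in for the output of extract_tables (which is not reproduced here), get_or_na is a hypothetical helper, and the prices are assumed to be single values sitting at positions 3 and 4.

```r
# Hypothetical stand-in for lapply(input$files$datapath, extract_tables):
# one list of extracted tables per uploaded PDF.
files <- list(
  list("table 1", "table 2", "10", "12"),  # positions 3 and 4 exist
  list("table 1")                          # positions 3 and 4 are missing
)

# Return the n-th element of x, or NA when the subscript is out of bounds.
get_or_na <- function(x, n) {
  tryCatch(x[[n]], error = function(cond) NA)
}

# Build the result in one step; a missing table becomes NA.
df <- data.frame(
  cost_price_list = vapply(files, function(f) as.character(get_or_na(f, 3)), character(1)),
  sell_price_list = vapply(files, function(f) as.character(get_or_na(f, 4)), character(1)),
  stringsAsFactors = FALSE
)
```

Because the handler only returns a value, no assignment crosses a function boundary, and the data frame is built without growing it row by row.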

R: How to read in file and skip lines that have errors with fread function?

I'm trying to read in a CSV file with the fread function, but the read breaks because of extra characters in a row. Is there a way to read the file, skip the rows with errors, and continue reading? Thank you for any advice.
Below, you can see the error I get:
In fread("data.csv", :
Stopped early on line 617854. Expected 52 fields but found 54. Consider fill=TRUE and comment.char=. First discarded non-empty line:
I think you could use the nrows and skip arguments of fread to kind of do this yourself. I haven't got an appropriately broken csv to hand to test this on so no promises that this will work, but maybe something like the stuff below. This is basically an attempt to automate taking that row number flagged in the warning, and resuming reading the csv in for all rows after that row.
Essentially I'm reading in 100,000 rows at a time, and if that's successful, I write that data to a list called data_chunks. If it throws a warning, I pick up the warning message, use some regex to figure out what the line number is, and read up to that line. I then return that data.table and write to the data_chunks list. I then update the rows_to_skip value by the number of rows across all my data.tables in data_chunks, plus the number of problem rows (I return a bad_row boolean along with the data.table to indicate this, and add it to bad_rows at each iteration).
It is all in a while loop so will keep executing until the number of rows_to_skip exceeds the number of rows to be read, in which case, an error is thrown and the if statement triggers the break, and you exit the loop. Finally, use rbindlist to bind all the rows together across your list. This feels pretty hacky and probably isn't all that reliable but for the sake of getting your data loaded into R it may be a start at least:
data_chunks <- list()
i <- 1
rows_to_skip <- 0
rows_to_read <- 100000
bad_rows <- 0
file_name <- "my.csv"
while (TRUE) {
  out <- tryCatch(
    list(
      data = data.table::fread(file_name, nrows = rows_to_read, skip = rows_to_skip, header = FALSE),
      bad_row = FALSE
    ),
    error = function(e) {
      e
    },
    warning = function(w) {
      warn_msg <- conditionMessage(w)
      warn_matches <- regexec("line (\\d+)", warn_msg)
      rows_to_read <- as.numeric(regmatches(warn_msg, warn_matches)[[1]][2]) - 1
      if (!is.na(rows_to_read)) {
        list(
          data = data.table::fread(file_name, nrows = rows_to_read, skip = rows_to_skip, header = FALSE),
          bad_row = TRUE
        )
      } else {
        NULL
      }
    })
  if ("error" %in% class(out) || is.null(out)) {
    break
  } else {
    data_chunks[[i]] <- out[["data"]]
  }
  bad_rows <- bad_rows + out[["bad_row"]]
  rows_to_skip <- sum(sapply(data_chunks, nrow)) + bad_rows
  i <- i + 1
}
mydata <- data.table::rbindlist(data_chunks, use.names = FALSE)
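The regex step inside the warning handler can be checked in isolation, which may help when debugging the loop. This small sketch uses only base R; the warning text is a shortened copy of the message quoted in the question, and bad_line and no_match are names introduced here for illustration.

```r
# Warning text shortened from the message quoted in the question.
warn_msg <- "Stopped early on line 617854. Expected 52 fields but found 54."

# Capture the digits that follow "line ".
warn_matches <- regexec("line (\\d+)", warn_msg)
bad_line <- as.numeric(regmatches(warn_msg, warn_matches)[[1]][2])

# Number of rows that come before the flagged line.
rows_to_read <- bad_line - 1

# A message without a line number gives NA, which is the case the
# handler's is.na() check is guarding against.
other_msg <- "some other warning"
no_match <- as.numeric(regmatches(other_msg, regexec("line (\\d+)", other_msg))[[1]][2])
```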

For Loop to rename objects in a data frame (+ ignore NA) in R

I have a data frame that contains a column with a binary variable (pointed or broad). To do my calculations I need to replace the values with 0 or 1. I want to write a for loop that does this for me.
My code:
binary_To_Number <- function(df)
{
  for (i in df)
  {
    if (i == "pointed")
    {
      i <- 1
    }
    else if (i == "broad")
    {
      i <- 0
    }
    else if (is.na(i))
    {
      print("NA")
    }
    else
    {
    }
  }
}
binary_To_Number(town$shape)
I tried to use this piece of code. My first problem with it is that I don't know how to save the results. So my code is changing the i temporarily but won't save it in the df. I know that you can create an empty storage vector to store results in it, but can I replace the variable in my df immediately?
The second problem is that my code stops and gives me an error message if it comes to an i which contains NA.
Error in if (i == "pointed") { : missing value where TRUE/FALSE needed
Is there something I can do about it or do I need to replace the NA with a placeholder first?
You can also use dplyr (ensures 0 for not pointed):
library(dplyr)
df <- df %>%
  mutate(
    isPointed = as.integer(tolower(shape) == 'pointed')
  )
Output:
    shape isPointed
1 Pointed         1
2   broad         0
3 pointed         1
The dataframe I used:
df <- data.frame(
  shape = c('Pointed', 'broad', 'pointed'),
  stringsAsFactors = FALSE
)
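The error in the question comes from if () receiving NA, since NA == "pointed" is itself NA. A vectorised comparison in base R avoids the loop and simply carries NA through. The sketch below uses a made-up shape vector and stores the result in a new vector; with the question's data the same expression could be assigned back as town$shape, which is an assumption about the asker's data frame.

```r
# Made-up example data containing a missing value.
shape <- c("pointed", "broad", NA, "pointed")

# Comparison is vectorised; NA stays NA instead of raising an error.
is_pointed <- as.integer(shape == "pointed")

# Inside a loop, isTRUE() turns an NA comparison into FALSE, so if () is safe.
n_pointed <- 0
for (s in shape) {
  if (isTRUE(s == "pointed")) {
    n_pointed <- n_pointed + 1
  }
}
```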

R - Apply custom function to single column row by row

I have created a custom function and wish to apply it to a single column of a dataframe row by row, then assign it back to the original column
The custom function is below, and aims to fix the dates in an excel file.
format_dates = function(x) {
  x = trimws(x)
  if ( grepl('/', x, fixed=TRUE) ) {
    as.Date(x, tryFormats = c("%d/%m/%Y", "%m/%d/%Y"))
  } else {
    tryCatch(
      { as.Date(as.integer(x), origin='1899-12-30') },
      warning=function(x) { return( NA ) }
    )
  }
}
It is mandatory to do this row by row. I have searched high and low and I have seen many replies using lapply, apply, and sapply but they do not work. As an example, I tried:
df$Child_Date_of_Birth = apply(df$Child_Date_of_Birth, 2, format_dates)
With the result of
Error in apply(df$Child_Date_of_Birth, 2, format_dates) :
dim(X) must have a positive length
This is frustrating, as in Pandas you can simply run
df['Child_Date_of_Birth'] = df['Child_Date_of_Birth'].apply(format_dates)
but in R this becomes the most obscure thing ever??
Anyone able to enlighten me... will appreciate it
Example data would be helpful, but I think you can try sapply:
df$Child_Date_of_Birth <- sapply(df$Child_Date_of_Birth, format_dates)
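One caveat worth knowing: apply fails here because a single column is a plain vector with no dim, and sapply simplifies a list of Date values to bare numbers (days since 1970-01-01), dropping the Date class. A hedged sketch of a workaround is to convert the result back with as.Date. The input vector raw below is made up for illustration, and format_dates is copied from the question.

```r
# format_dates as given in the question.
format_dates <- function(x) {
  x <- trimws(x)
  if (grepl("/", x, fixed = TRUE)) {
    as.Date(x, tryFormats = c("%d/%m/%Y", "%m/%d/%Y"))
  } else {
    tryCatch(
      as.Date(as.integer(x), origin = "1899-12-30"),
      warning = function(w) NA
    )
  }
}

# Made-up inputs: a d/m/Y string, an Excel serial number, and junk.
raw <- c("25/12/2020", "44197", "abc")

# sapply returns plain numbers here, not Dates.
as_numbers <- sapply(raw, format_dates, USE.NAMES = FALSE)

# Restore the Date class; the origin matches R's internal Date epoch.
fixed <- as.Date(as_numbers, origin = "1970-01-01")
```

With a real data frame the last two steps would be applied to df$Child_Date_of_Birth and assigned back to that column.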

How to check if subscript will be out of bounds?

If I want to check the existence of a variable I use
exists("variable")
In a script I am working on I sometimes encounter a "subscript out of bounds" error after running, and then my script stops. In an if statement I would like to be able to check whether a subscript will be out of bounds or not. If the outcome is "yes", then execute an alternative piece of the script, and if "not", then just continue the script as it was intended.
In my imagination, in the case of a list it would look something like:
if (subscriptOutofBounds(listvariable[[number]]) == TRUE) {
  ## execute this part of the code
} else {
  ## execute this part
}
Does something like that exist in R?
You can compare the index with the length of your list. As an illustration, say I have a list with 3 elements and want to check it against a vector of the numbers 1 to 100.
lol <- list(c(1:10),
            c(100:200),
            c(3:50))
lol
check_out <- function(x) {
  maxi <- max(x)
  if (maxi > length(lol)) {
    # Execute this part of code
    print("Yes")
  }
  else {
    # Execute this part of code
    print("No")
  }
}
num <- 1:100
check_out(num)
The largest number in the vector num is 100 and your list only has 3 elements (length 3), so that index would be out of bounds for your list, and the function prints Yes.
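Closer to the shape sketched in the question, the same length comparison can be wrapped in a small predicate that takes the list and a single index. This is only an illustrative sketch: in_bounds is a hypothetical helper (not a base R function), and it handles positive numeric indices only, not names or negative subscripts.

```r
# Hypothetical helper: TRUE when i is a single valid position in x.
in_bounds <- function(x, i) {
  is.numeric(i) && length(i) == 1 && !is.na(i) && i >= 1 && i <= length(x)
}

lol <- list(1:10, 100:200, 3:50)

if (in_bounds(lol, 5)) {
  value <- lol[[5]]   # safe to subset
} else {
  value <- NA         # alternative branch: index 5 does not exist
}
```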
