why write.csv is not updating the changes on test$number_s - r

I am changing the values of a column of a data frame and then saving the file, but the saved file does not contain the changes. What am I missing? Thanks,
test <- data.frame(name_s = c("x","y","z"), number_s = c(1,2,3))
lapply(1:length(test$number_s), function(x) {
test$number_s[x] <- test$number_s[[x]] + 1
})
write.csv(test,paste0("test ",format(Sys.time(),"%Y%m%d"),".csv"),
row.names = F)
that was oversimplified, the real deal is this one:
date_format_1 = "[0-9]-[:alpha:][:alpha:][:alpha:]"
date_format_2 = "[:alpha:][:alpha:][:alpha:]-[0-9][0-9]"
test <- data.frame(name_s = c("v","w","x","y","z"), event_text = c("Aug-89","7-May","9-Jun","4-Dec-2021","Feb-99"))
lapply(1:length(test$event_text), function(x) {
if (str_detect(test$event_text[[x]], paste0("\\b",date_format_1,"\\b")) == T){
test$event_text[x] <- paste0(str_sub(test$event_text[[x]],1,1), "/F",
which(month.abb %in% str_sub(
test$event_text[[x]], 3,5)))
} else if(str_detect(test$event_text[[x]], paste0("\\b", date_format_2,"\\b"))
== T) {
test$event_text[[x]] = paste0(which(month.abb %in% str_sub(
test$event_text[x],1,3)),"/F",str_sub(test$event_text[[x]],-2))
} else {
test$event_text[x] <- test$event_text[[x]]
}
})
write.csv(test,paste0("test ",format(Sys.time(),"%Y%m%d"),".csv"),
row.names = F)

Below I have written two calls to lapply() that fix the issue you were having. The problem is scoping: the assignment inside the anonymous function modifies a copy of test that is local to that function call, so the data frame in your workspace is never changed. The function does return the new value, but the result of lapply() is only printed and then discarded instead of being assigned back. I have demonstrated this by printing the data frame after each of the lapply() calls below.
We can fix this in two ways. The first, more correct version applies the function to the vector itself, returns x + 1 for each value, and assigns the result back to the column. (Note that I have skipped the curly braces; a function returns the value of the last expression it evaluates, in this case x + 1. Alternatively, you could write function(x) {return(x + 1)} in that argument.)
An alternative approach, which runs slower but keeps the indexing method, is to use superassignment. <<- assigns to the variable in an enclosing environment (here the global environment, where test lives) rather than in the local scope of the function. (Note that this code runs sequentially, so the call to this function below adds 1 to the data frame for the second time.)
test <- data.frame(name_s = c("x","y","z"), number_s = c(1,2,3))
# Original Behaviour, doesn't work due to scoping issues
lapply(1:length(test$number_s), function(x) {
test$number_s[x] <- test$number_s[[x]] + 1
})
#> [[1]]
#> [1] 2
#>
#> [[2]]
#> [1] 3
#>
#> [[3]]
#> [1] 4
print(test)
#> name_s number_s
#> 1 x 1
#> 2 y 2
#> 3 z 3
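# function that is syntactically and functionally correct
# instead of modifying the vector in the function scope, the function returns the
# new values, which we then assign back to the data frame's column
# (unlist() turns the list returned by lapply() back into a numeric vector, so
# number_s stays an ordinary column instead of becoming a list column)
test$number_s <- unlist(lapply(test$number_s, function(x) x + 1))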
print(test)
#> name_s number_s
#> 1 x 2
#> 2 y 3
#> 3 z 4
# function that is syntactically odd but functionally correct
# the function affects the values in the global scope, this works but is slower
# and is not best practice as it would be difficult to read
lapply(1:length(test$number_s), function(x) {
test$number_s[x] <<- test$number_s[[x]] + 1
})
#> [[1]]
#> [1] 3
#>
#> [[2]]
#> [1] 4
#>
#> [[3]]
#> [1] 5
print(test)
#> name_s number_s
#> 1 x 3
#> 2 y 4
#> 3 z 5
Created on 2021-07-23 by the reprex package (v2.0.0)
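For the real case in the question, the same idea applies: return the converted string from a function and assign the result back to the column. The following is only a sketch under some assumptions: convert_event is a hypothetical helper name, it uses base R regular expressions instead of stringr, and the file is written to a temporary directory rather than the working directory.

```r
# Hypothetical helper: convert one event string, or return it unchanged
convert_event <- function(txt) {
  if (grepl("\\b[0-9]-[[:alpha:]]{3}\\b", txt, perl = TRUE)) {
    # day first, e.g. "7-May": keep the digit, look up the month number
    paste0(substr(txt, 1, 1), "/F", match(substr(txt, 3, 5), month.abb))
  } else if (grepl("\\b[[:alpha:]]{3}-[0-9]{2}\\b", txt, perl = TRUE)) {
    # month first, e.g. "Aug-89": month number, then the last two characters
    n <- nchar(txt)
    paste0(match(substr(txt, 1, 3), month.abb), "/F", substr(txt, n - 1, n))
  } else {
    txt
  }
}

test <- data.frame(name_s = c("v", "w", "x", "y", "z"),
                   event_text = c("Aug-89", "7-May", "9-Jun", "4-Dec-2021", "Feb-99"),
                   stringsAsFactors = FALSE)

# Assign the returned values back to the column, so the data frame itself changes
test$event_text <- vapply(test$event_text, convert_event, character(1),
                          USE.NAMES = FALSE)

out_file <- file.path(tempdir(), paste0("test ", format(Sys.time(), "%Y%m%d"), ".csv"))
write.csv(test, out_file, row.names = FALSE)
```

Because the column is reassigned before write.csv() is called, the written file contains the converted values.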

Related

Recreation of tibble::tibble()

I need to write a function that creates objects similar to tibble::tibble(). The function should take a single parameter x, a list that contains the columns (I don't need to check the type or length of the columns).
I can't use tibble::tibble() or similar functions like data.frame() or data.table::data.table().
I am currently stuck because I can't figure out how to bind two columns of different types.
I tried collecting the types of the columns to display them in a row and then putting the columns together somehow, but I still haven't figured out how.
My code looks like the following:
library(tibble) #implemented but isnt used
my_tibble <- function(x) {
ncol <- as.integer(length(x))
nrow <- length(x[[1]])
y <- list()
i <- 1
while (i <= length(x)) {
y <- append(y, typeof(x[[i]]), length(y))
i = i + 1
}
}
An example of the output should look like this:
my_tb <- my_tibble(list(x=1:3, y=letters[1:3]))
my_tb
A tibble: 3 x 2
      x y
  <int> <chr>
1     1 a
2     2 b
3     3 c
A tibble and a data.frame are essentially just lists with at least one additional attribute ("row.names") and a special printing method. So here is a very simple implementation of a tibble:
my_tbl <- function(...) {
  out <- list(...)
  class(out) <- c("my_tbl", "data.frame")
  attr(out, "row.names") <- seq_along(out[[1]])
  return(out)
}
print.my_tbl <- function(x, ...) {
  cat("my tibble: ", nrow(x), " x ", ncol(x), "\n")
  cat(colnames(x), "\n")
  cat(paste0("<", sapply(x, class), ">"), "\n")
  # print one line per row; index with r, not with a fixed 1
  for (r in seq_along(x[[1]])) {
    cat(unlist(x[r, ]), "\n")
  }
  invisible(x)
}
So now you can use the new function:
my_tbl(x=1:3, y=letters[1:3])
#> my tibble: 3 x 2
#> x y
#> <integer> <character>
#> 1 a
#> 2 b
#> 3 c
It also runs with more complicated input, although a single list argument is treated as one list column:
my_tbl(list(x=1:3, y=letters[1:3]))
#> my tibble: 2 x 1
#>
#> <list>
#> 1 2 3
#> a b c
Created on 2022-11-03 with reprex v2.0.2
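If the function really has to take a single list and must not rely on the "data.frame" class at all, something along these lines might work. This is a rough sketch only: the class name "my_tibble" and the layout of the printed output are my own assumptions, and it does no checking beyond requiring a named list.

```r
# Constructor: keep the list as it is and only attach a class
my_tibble <- function(x) {
  stopifnot(is.list(x), !is.null(names(x)))
  structure(x, class = "my_tibble")
}

print.my_tibble <- function(x, ...) {
  cols <- unclass(x)
  n_row <- length(cols[[1]])
  cat("# my tibble:", n_row, "x", length(cols), "\n")
  # One padded character vector per column: name, type, then the values
  cells <- lapply(names(cols), function(nm) {
    format(c(nm, paste0("<", typeof(cols[[nm]]), ">"), as.character(cols[[nm]])))
  })
  # Paste the padded columns together row by row
  cat(do.call(paste, cells), sep = "\n")
  invisible(x)
}

my_tb <- my_tibble(list(x = 1:3, y = letters[1:3]))
my_tb
```

Columns of different types are never bound together here. Each column is converted to character only for printing, which sidesteps the problem of combining an integer and a character column.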

R - rbind a dataframe with a NULL

I am trying to understand an example from a textbook.
The example code is like this:
x <- cbind(x1,x2,x3)
z <- NULL
y <- rbind(z,x)
My question is: why rbind to a NULL when the output seems to be the same as just x?
This example may be to illustrate that rbind()ing an object to NULL just returns the non-NULL object. This is something you can make use of if you have code that may or may not return a data.frame/matrix/vector (and otherwise returns NULL). The case you show is fairly trivial, but consider this example:
results <- lapply(1:4, function(i) {
if (i %% 2 == 0) {
return(data.frame(a = i, b = i / 2))
} else {
return(NULL)
}
})
# a list of mixed results (some may be NULL)
results
#> [[1]]
#> NULL
#>
#> [[2]]
#> a b
#> 1 2 1
#>
#> [[3]]
#> NULL
#>
#> [[4]]
#> a b
#> 1 4 2
# get a data.frame of just the non-NULL rows
do.call('rbind', results)
#> a b
#> 1 2 1
#> 2 4 2
Here, do.call() passes every element of the list returned by lapply() to a single rbind() call.
The function we apply is arbitrary, but it has some internal logic that returns a data.frame in some cases and NULL in others. Consider for example an API request that may or may not return data, or that fails when you are not connected to the internet.
Since rbind() of an object with NULL just returns the original object, there is no additional handling or need to return dummy values for the cases that do not meet the condition.
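The textbook's z <- NULL may well be setting up the same behaviour for a loop: starting from NULL means the first iteration needs no special case. A small sketch (the variable name acc and the columns are made up for illustration):

```r
acc <- NULL  # nothing collected yet
for (i in 1:3) {
  # on the first pass this is rbind(NULL, <one row>), which is just that row
  acc <- rbind(acc, data.frame(id = i, square = i^2))
}
acc
```

Growing an object row by row like this is fine for small loops, but for many iterations collecting the pieces in a list and calling do.call('rbind', ...) once, as above, is usually the faster choice.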

How to get and properly identify a function stored in a list in R

I need to be able to add functions to a list and call them in a for() loop.
Example:
Sample1 = function(a,c,b) c*3 + b*2 + a*1 + 3
Sample2 = function(d,e,f) d*1 + e*2 + f*3 + 5
sampleRList = list(Sample1,Sample2)
When I call
for(item in sampleRList){
print(typeof(item))
}
it prints out closure like normal since it is a function, but when I try to get the variables in the function using
foo <- function(x) {
if(length(x) > 1) {
res <- lapply(x, foo)
} else {
res <- if(is.numeric(x)) x else NA
}
na.omit(unlist(res))
}
foo(body(item))
(sample code, where x would be the current item), it does not return the list containing the variables/numbers.
An example of the expected result is
[1] 1 2 3 4
How do I fix this? Thanks.
The foo code was taken from another answer.
It's not terribly clear what you are asking here. I think you are saying that you want to extract the multiplicative coefficients from the functions Sample1 and Sample2 using the function foo, but you need to be able to do it if Sample1 and Sample2 are in a list.
In your example, you call foo(body(Ex1)), but you have no object called Ex1, so this code throws an error. If on the other hand you call:
foo(body(sampleRList[[1]]))
#> [1] 3 2 1 3
Then you can see that you retrieve the correct coefficients from Sample1, and to retrieve those from Sample2 you would do:
foo(body(sampleRList[[2]]))
#> [1] 1 2 3 5
If you wanted to get both at the same time you could do:
lapply(sampleRList, function(x) foo(body(x)))
#> [[1]]
#> [1] 3 2 1 3
#>
#> [[2]]
#> [1] 1 2 3 5
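If you also want to know which function each result belongs to, one option is to name the list elements and loop over the names. This is just a sketch: the list names are my own choice, and names(formals(f)) is included on the assumption that "variables" may also mean the argument names.

```r
Sample1 <- function(a, c, b) c*3 + b*2 + a*1 + 3
Sample2 <- function(d, e, f) d*1 + e*2 + f*3 + 5
sampleRList <- list(Sample1 = Sample1, Sample2 = Sample2)

# foo() as given in the question
foo <- function(x) {
  if (length(x) > 1) {
    res <- lapply(x, foo)
  } else {
    res <- if (is.numeric(x)) x else NA
  }
  na.omit(unlist(res))
}

# as.numeric() drops the attributes that na.omit() attaches
results <- lapply(sampleRList, function(f) as.numeric(foo(body(f))))

for (nm in names(sampleRList)) {
  cat(nm, "\n")
  cat("  arguments:", names(formals(sampleRList[[nm]])), "\n")
  cat("  numbers:  ", results[[nm]], "\n")
}
```

Looping over names rather than over the functions themselves keeps the identifier available inside the loop, which a plain for(item in sampleRList) does not give you.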

Retrieving the path to all data.frame class objects in rData files

I have multiple .rData files whose top level Global Environment variables are a mix of data.frames, lists, deeply nested lists. I know that many of the nested lists have within them data.frame types, but I'm having trouble retrieving the path to them.
I faced a similar problem before with another class, which I solved using the following code
names(rapply(mget(ls(.GlobalEnv), envir=.GlobalEnv), length, classes="fluor.spectral.data", how="unlist"))
While not the most elegant solution, it achieved what I needed, and quickly, returning names like "Fluor.Spec.WA.M12.SC.13", which then allowed me to manipulate the object after converting each '.' into '$'.
Can someone help me retrieve the path to all data.frame class types, nested or otherwise in highly variable .rData files? Thanks in advance
If you want to return all data.frames loaded in the global environment, either as individual objects or as elements of nested lists, use rrapply() from the rrapply package (an extension of base rapply()).
library(rrapply)
w <- data.frame(1)
x <- list(1, 2, 3)
y <- 5
z <- list(1, 2, list(1, df = data.frame(a = 1, b = 2)))
rrapply(as.list(.GlobalEnv), classes = "data.frame", how = "flatten")
#> $w
#> X1
#> 1 1
#>
#> $df
#> a b
#> 1 1 2
Setting classes = "data.frame" avoids recursion into data.frame columns (as base rapply would do), and how = "flatten" will return the collected data.frames as a flattened list.
NB: If you want to return the complete object paths to the found data.frames, set how = "prune" instead of how ="flatten":
rrapply(as.list(.GlobalEnv), classes = "data.frame", how = "prune")
#> $w
#> X1
#> 1 1
#>
#> $z
#> $z[[1]]
#> $z[[1]]$df
#> a b
#> 1 1 2
Edit: In order to also return data.frames present in slots of some S4-class, a possible way to extend the above call would be:
## define S4-class with a data.frame in "df" slot
userClass <- setClass("user", slots = c(df = "data.frame"))
v <- userClass(df = data.frame(user = 1))
rrapply(as.list(.GlobalEnv),
classes = c("data.frame", "user"),
f = function(x) {
if(class(x) == "user") {
slot(x, "df")
} else {
x
}
},
how = "flatten")
#> $v
#> user
#> 1 1
#>
#> $w
#> X1
#> 1 1
#>
#> $df
#> a b
#> 1 1 2
In this case, classes = c("data.frame", "user") checks for data.frames and S4 objects of class "user". The f function is applied to each matching object and returns the object itself if it is a data.frame, or its "df" slot if it is an S4 object.
Note that this code assumes that the S4-class name is known as well as the slot(s) which contain the data.frame objects.
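If what you need are the paths as strings rather than the objects themselves, a small recursive function in base R can build them. This is a sketch, not part of rrapply: find_df_paths is a hypothetical helper, it only descends into plain lists (not S4 slots), and it does not add backticks around names that are not syntactically valid.

```r
# Hypothetical helper: return the path of every data.frame found below x
find_df_paths <- function(x, path) {
  if (is.data.frame(x)) return(path)      # found one, do not descend into columns
  if (!is.list(x)) return(character(0))   # leaf that is not a data.frame
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  unlist(lapply(seq_along(x), function(i) {
    # named elements are reached with $name, unnamed ones with [[i]]
    step <- if (nzchar(nms[i])) paste0("$", nms[i]) else paste0("[[", i, "]]")
    find_df_paths(x[[i]], paste0(path, step))
  }))
}

# Stand-in for as.list(.GlobalEnv), using the objects from the example above
objs <- list(
  w = data.frame(1),
  x = list(1, 2, 3),
  y = 5,
  z = list(1, 2, list(1, df = data.frame(a = 1, b = 2)))
)

paths <- unlist(lapply(names(objs), function(nm) find_df_paths(objs[[nm]], nm)))
paths
#> [1] "w"         "z[[3]]$df"
```

Each string can then be evaluated or parsed to reach the object, for example objs$z[[3]]$df for the second entry.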

tryCatch inside dplyr's mutate?

Is there any exception handling mechanism in dplyr's mutate()? What I mean is a way to catch exceptions and handle them.
Let us suppose that I have a function that throws an error in some cases (in the example if the input is negative), for the sake of simplicity I define the function, but in real life it will be a function in some R package. Let us suppose this function is vectorized:
# function throwing an error
my_func <- function(x){
if(x > 0) return(sqrt(x))
stop('x must be positive')
}
my_func_vect <- Vectorize(my_func)
Now, let's suppose I want to use this function inside mutate().
If this function is used inside a mutate(), it stops at the first error and no result is returned:
library(dplyr)
# dummy data
data <- data.frame(x = c(1, -1, 4, 9))
data %>% mutate(y = my_func_vect(x))
# Error in mutate_impl(.data, dots) : Evaluation error: x must be positive.
Is there a way to catch the error, and do something (e.g. return an NA) in this case, while getting results for the other elements?
The result I expect is what would be achieved using a loop with tryCatch(), i.e. something along the lines of:
y <- rep(NA_real_, length(data$x))
for(i in seq_along(data$x)) {
tryCatch({
y[i] <- my_func_vect(data$x[i])
}, error = function(err){})
}
y
# Result is: 1 NA 2 3
We can also make use of purrr's safely() or possibly() functions.
From the purrr help:
safely: wrapped function instead returns a list with components result and error. One value is always NULL.
quietly: wrapped function instead returns a list with components result, output, messages and warnings.
possibly: wrapped function uses a default value (otherwise) whenever an error occurs.
It doesn't change the fact that you have to apply the function to each row separately.
library(dplyr)
library(purrr)
# function throwing an error
my_func <- function(x){
if(x > 0) return(sqrt(x))
stop('x must be positive')
}
my_func_vect <- Vectorize(my_func)
# dummy data
data <- data.frame(x = c(1, -1, 4, 9))
With map:
data %>%
mutate(y = map_dbl(x, ~possibly(my_func_vect, otherwise = NA_real_)(.x)))
#> x y
#> 1 1 1
#> 2 -1 NA
#> 3 4 2
#> 4 9 3
Using rowwise():
data %>%
rowwise() %>%
mutate(y = possibly(my_func_vect, otherwise = NA_real_)(x))
#> Source: local data frame [4 x 2]
#> Groups: <by row>
#>
#> # A tibble: 4 x 2
#> x y
#> <dbl> <dbl>
#> 1 1 1
#> 2 -1 NA
#> 3 4 2
#> 4 9 3
The other functions are somewhat more difficult to use in a data frame context, as they are better suited to working with lists and return lists.
Created on 2018-05-15 by the reprex package (v0.2.0).
If you want to evaluate every occurring error individually, maybe you shouldn't use the vectorized function. Instead, use map() from the purrr package, which is effectively the same as lapply() here.
If you want NA values whenever an error occurs, write a small wrapper function that catches the error:
try_my_func <- function(x) {
tryCatch(my_func(x), error = function(err){NA})
}
Then use mutate with map
data %>% mutate(y = purrr::map(x, try_my_func))
x y
1 1 1
2 -1 NA
3 4 2
4 9 3
Or similarly, if you don't want to declare a new function.
data %>% mutate(y = purrr::map(x, ~ tryCatch(my_func(.), error = function(err){NA})))
And lastly, if you do want to use a vectorized function, you can skip map() altogether. Personally I never use Vectorize(), so I'd do it with map().
data %>% mutate(y = Vectorize(try_my_func)(x))
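Note that map() returns a list, so y becomes a list column. If you would rather have a plain numeric column and no extra packages, a base R sketch along the following lines may be enough. Here with_na_on_error is a hypothetical helper name, and the column is assigned with $<- only so that the example runs without dplyr.

```r
# function throwing an error
my_func <- function(x) {
  if (x > 0) return(sqrt(x))
  stop("x must be positive")
}

# Hypothetical helper: wrap f so that it is applied element by element
# and every error is replaced by NA_real_
with_na_on_error <- function(f) {
  function(xs) {
    vapply(xs, function(v) tryCatch(f(v), error = function(e) NA_real_),
           numeric(1))
  }
}

safe_func <- with_na_on_error(my_func)

data <- data.frame(x = c(1, -1, 4, 9))
data$y <- safe_func(data$x)  # inside dplyr: data %>% mutate(y = safe_func(x))
data
```

vapply() with numeric(1) guarantees a double vector of the same length as the input, which is what mutate() expects for a regular column.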

Resources