I just started learning to write functions in R. As a start, I am trying to replicate the summary function as shown below, but I am not able to get the expected result.
summary_function <- function(df = as.data.frame(x)){
result <- summary(x)
return(as.table(result))
}
> summary_function(df = iris) ## below is the output I am getting
Length Class Mode
a 4 -none- numeric
b 10 -none- numeric
c 20 -none- numeric
d 100 -none- numeric
Expected output is actual summary of iris. Is there a way to achieve this?
I also tried the function below, which should return the first 10 rows of the dataset, but the output is not what I expected.
first_ten_rows <- function(df = x){
result <- head(x, n = 10)
return(result)
}
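Your function body calls summary(x) rather than summary(df), so it summarises whatever object named x exists in your workspace instead of the data frame you passed in. The "as.table()" call is also unnecessary, since it coerces the summary object into a less legible format.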
summary_function <- function(df = as.data.frame(x)){
result <- summary(df)
return(result)
}
An R function returns the value of its last evaluated expression, so if you want to shorten it:
summary_function <- function(df = as.data.frame(x)){
summary(df)
}
would get you the same result.
If you would like to read more about "summary" objects, running
?summary
will open more documentation in the help panel in RStudio
In your second question, there is an "x" out of place in "head()" which should be "df"
first_ten_rows <- function(df = x){
result <- head(df, n = 10)
return(result)
}
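To make the scoping issue concrete, here is a minimal sketch. The object x and the function names bad_summary and good_summary are made up for illustration; the point is only that a body referring to x silently picks up a global object, while a body referring to the argument df uses the data you passed in.

```r
# A hypothetical unrelated object that happens to live in the workspace
x <- list(a = 1:4, b = 1:10)

# Ignores its argument and summarises the global x instead
bad_summary <- function(df) {
  summary(x)
}

# Uses the argument that was actually passed in
good_summary <- function(df) {
  summary(df)
}

identical(bad_summary(iris), summary(x))      # TRUE: iris was never used
identical(good_summary(iris), summary(iris))  # TRUE: matches summary(iris)
```

Dropping the default value for df also means a forgotten argument raises an error right away instead of falling back to some x in the workspace.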
I am trying to write a function with an unspecified number of arguments using ... but I am running into issues where those arguments are column names. As a simple example, if I want a function that takes a data frame and uses within() to make a new column that is several other columns pasted together, I would intuitively write it as
example.fun <- function(input,...){
res <- within(input,pasted <- paste(...))
res}
where input is a data frame and ... specifies column names. This gives an error saying that the column names cannot be found (they are treated as objects). e.g.
df <- data.frame(x = c(1,2),y=c("a","b"))
example.fun(df,x,y)
This returns "Error in paste(...) : object 'x' not found "
I can use attach() and detach() within the function as a work around,
example.fun2 <- function(input,...){
attach(input)
res <- within(input,pasted <- paste(...))
detach(input)
res}
This works, but it's clunky and runs into issues if there happens to be an object in the global environment that is called the same thing as a column name, so it's not my preference.
What is the correct way to do this?
Thanks
1) Wrap the code in eval(substitute(...code...)) like this:
example.fun <- function(data, ...) {
eval(substitute(within(data, pasted <- paste(...))))
}
# test
df <- data.frame(x = c(1, 2), y = c("a", "b"))
example.fun(df, x, y)
## x y pasted
## 1 1 a 1 a
## 2 2 b 2 b
1a) A variation of that would be:
example.fun.2 <- function(data, ...) {
data.frame(data, pasted = eval(substitute(paste(...)), data))
}
example.fun.2(df, x, y)
2) Another possibility is to convert each argument to a character string and then use indexing.
example.fun.3 <- function(data, ...) {
vnames <- sapply(substitute(list(...))[-1], deparse)
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.3(df, x, y)
3) Other possibilities are to change the design of the function and pass the variable names as a formula or character vector.
example.fun.4 <- function(data, formula) {
data.frame(data, pasted = do.call("paste", get_all_vars(formula, data)))
}
example.fun.4(df, ~ x + y)
example.fun.5 <- function(data, vnames) {
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.5(df, c("x", "y"))
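As a rough cross-check of the options above, the following sketch compares a bare-name interface with a character-vector interface on the same data. The helper names paste_nse and paste_chr are hypothetical, and unname() is my own addition so that a column called "sep" or "collapse" would not be mistaken for an argument of paste().

```r
df <- data.frame(x = c(1, 2), y = c("a", "b"))

# Bare column names: capture the unevaluated arguments and turn them into strings
paste_nse <- function(data, ...) {
  vnames <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  data$pasted <- do.call(paste, unname(as.list(data[vnames])))
  data
}

# Character vector of column names: no non-standard evaluation needed
paste_chr <- function(data, vnames) {
  data$pasted <- do.call(paste, unname(as.list(data[vnames])))
  data
}

paste_nse(df, x, y)$pasted          # "1 a" "2 b"
paste_chr(df, c("x", "y"))$pasted   # "1 a" "2 b"
```

The character-vector version is usually easier to call from other functions, since the column names can be stored in a variable.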
I am trying to store in a column as a list the results of a function taking its input from a tibble.
I was thinking to use a vectorized version with mutate from tidyverse as follow:
test <- tibble(a=rep(1,10), b=seq(1,10))
return_list <- function(x){
return(c("a1" = x, "a2"= x + x))
}
return_list_v <- Vectorize(return_list)
test %>% mutate(results=return_list_v(a))
but I am getting the following error:
Error: Column `results` must be length 10 (the number of rows) or one, not 20
Is there something that could be done to make this example work?
Thank you in advance.
Pass SIMPLIFY=FALSE to Vectorize() so that the output stays a list with one element per row, instead of being simplified to a matrix.
test <- tibble(a=rep(1,10), b=seq(1,10))
return_list <- function(x){
return(c("a1" = x, "a2"= x + x))
}
return_list_v <- Vectorize(return_list, SIMPLIFY=FALSE)
test %>% mutate(results=return_list_v(a))
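The length-20 error can be seen without any tidyverse code. This is a small base R sketch (the variable names simplified and as_list are mine) showing what the two versions of the vectorized function return for ten inputs.

```r
return_list <- function(x) {
  c("a1" = x, "a2" = x + x)
}

a <- rep(1, 10)

# Default: results are simplified into a 2 x 10 matrix, i.e. 20 values
simplified <- Vectorize(return_list)(a)
dim(simplified)     # 2 10
length(simplified)  # 20

# SIMPLIFY = FALSE: a list with one named vector per input element
as_list <- Vectorize(return_list, SIMPLIFY = FALSE)(a)
length(as_list)     # 10
as_list[[1]]        # named vector with a1 = 1 and a2 = 2
```

A list of length 10 matches the number of rows, which is why it can be stored as a list column.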
I am trying to write a code that would automatically calculate Wilcoxon test p-value for several comparisons.
Data used: two data sets with the same structure, representing two groups of participants who completed the same 5 tasks. Each table therefore contains 5 columns (tasks) and X rows of task scores.
data_17_18_G2 # first data set (in data.table format)
data_18_20_G2 # second data set (in data.table format)
Both data sets have identical column names, which are to be used in the Wilcoxon test as follows:
wilcox.test(Group1Task1, Group2Task1, paired = F)
wilcox.test(Group1Task2, Group2Task2, paired = F)
and so on.
The inputs (e.g., Group1Task1) are two vectors of task scores (the first from data_17_18_G2 and the other from data_18_20_G2).
Desired output: a data table with a column of p-values
The problem I faced is that no matter how I manipulated the empty objects val1 and val2, in the second and third lines of the loop the right-hand side "as.numeric(unlist(data_17_18_G2[, ..i]))" gives the correct output (a numeric vector), but the left-hand side "val1[i]" always keeps only one value from the vector. That gave me the idea that the main problem arises when creating the empty vector; however, I wasn't able to solve it.
Empty objects:
result <- data.table(matrix(ncol=2))
val1 <- as.numeric() # here I also tried functions "numeric" and "vector"
val2 <- as.numeric()
res <- vector(mode = "list", length = 7)
For loop
for (i in 1:5) {
val1[i] <- as.numeric(unlist(data_17_18_G2[ , ..i]))
val2[i] <- as.numeric(unlist(data_18_20_G2[ , ..i]))
res[i] <- wilcox.test(val1[i], val2[i], paired = F)
result[i, 1] <- i
result[i, 2] <- res$p.value
}
Output:
Error in `[<-.data.table`(`*tmp*`, i, 2, value = NULL) :
When deleting columns, i should not be provided
1: In val1[i] <- as.numeric(unlist(data_17_18_G2[, ..i])) :
number of items to replace is not a multiple of replacement length
2: In val2[i] <- as.numeric(unlist(data_18_20_G2[, ..i])) :
number of items to replace is not a multiple of replacement length
3: In res[i] <- wilcox.test(val1[i], val2[i], paired = F) :
number of items to replace is not a multiple of replacement length
Alternative:
I changed the second and the third lines
for (i in 1:5) {
val1[i] <- as.numeric(data_17_18_G2[ , ..i])
val2[i] <- as.numeric(data_18_20_G2[ , ..i])
res[i] <- wilcox.test(val1[i], val2[i], paired = F)
result[i, 1] <- i
result[i, 2] <- res$p.value
}
And got this
Error in as.numeric(data_17_18_G2[, ..i]) :
(list) object cannot be coerced to type 'double'
which means that the function wilcox.test cannot interpret this type of input.
How can I improve the code so that I get a data table of p-values?
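There are a few bugs in the code: val1[i] <- ... tries to store a whole column in a single vector element, res[i] should be res[[i]] to store the complete test object, and res$p.value is NULL because res is a list of test results. I have rewritten the code using the cars dataset as an example.
## use the cars dataset as an example (replace with your own data)
library(data.table)
data(cars)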
data_17_18_G2 <- as.data.table(cars)
data_18_20_G2 <- data_17_18_G2[,2:1]
## Fixed code
# One row per column (task): column index and p-value, filled in the loop
result <- data.table(matrix(as.numeric(), nrow=ncol(data_17_18_G2), ncol=2))
res <- vector(mode = "list", length = ncol(data_17_18_G2))
for (i in seq_len(ncol(data_17_18_G2))) {
# Assign the whole column to val1/val2, not to a single element val1[i]
val1 <- as.numeric(unlist(data_17_18_G2[ , ..i]))
val2 <- as.numeric(unlist(data_18_20_G2[ , ..i]))
# [[i]] stores the complete test object in the list
res[[i]] <- wilcox.test(val1, val2, paired = FALSE)
result[i, 1] <- as.numeric(i)
result[i, 2] <- as.numeric(res[[i]]$p.value)
}
Hope this gives you the output you are after.
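If the loop is not essential, the same table of p-values can be sketched with mapply(), which walks over the columns of both tables in parallel. The data below are simulated stand-ins (g1, g2 and the column names are made up). Plain data frames are used to keep the example dependency-free, and since data.table objects are also lists of columns, the same call should apply to them.

```r
set.seed(1)

# Hypothetical stand-ins for the two groups: same column names, different row counts
g1 <- data.frame(task1 = rnorm(20), task2 = rnorm(20))
g2 <- data.frame(task1 = rnorm(25, mean = 0.5), task2 = rnorm(25))

# Run one unpaired Wilcoxon test per pair of matching columns, keep only the p-value
p_values <- mapply(
  function(a, b) wilcox.test(a, b, paired = FALSE)$p.value,
  g1, g2
)

# Collect the results in a table, one row per task
result <- data.frame(task = names(g1), p_value = unname(p_values))
result
```

This relies on both tables having their columns in the same order; if that is not guaranteed, reordering one table by the other's column names first would be a sensible precaution.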
I am trying to build a function that takes a numeric vector of homework scores (of length n), and an optional logical argument drop, to compute a single homework value. If drop = TRUE, the lowest HW score must be dropped.
Step 1: a function to get the average
get_average <- function(x,na.rm=TRUE) {
if(na.rm==TRUE){
x = remove_missing(x)}
total <- 0
for (n in 1:length(x)) {
total= total + x[n]
}
return(total/length(x))
}
Putting it all together:
score_homework <- function(x,drop=TRUE)
{
if(drop==TRUE)
x = drop_lowest(x)
{get_average(x)}}
However, I keep getting this error: Error in score_homework() : argument "x" is missing, with no default
I'm not sure this is what you want, but here goes.
First generate some dummy data:
# Set seed
set.seed(1234)
# Generate dummy homework data with <NA> values
homework <- c(rep(NA, 20), rnorm(n = 100, mean = 50, sd = 10))
# Have a quick look
hist(homework)
Then we write the function:
# Make function
homework_func <- function(data, drop = TRUE) {
# Remove NA
data <- data[!is.na(data)]
# Calculate the average depending on whether 'drop' is T or F
if(drop == TRUE) {
# Note: this drops every score tied for the minimum, not just one
data <- data[data > min(data)]
mean(data)
} else {
mean(data)
}
}
# Use function with 'drop = TRUE'
homework_func(data = homework, drop = TRUE)
#> [1] 48.65349
# Use function with 'drop = FALSE'
homework_func(data = homework, drop = FALSE)
#> [1] 48.43238
Here is a function to remove the lowest non-missing score that is less complicated than the version in the original post. I sort the scores in descending order so that, if there is a tie for the lowest score, only one instance of it is removed. Also, you're really better off using R's mean() function than writing your own.
scores <- c(78,93,61,NA,61,83,92,95,NA,100)
removeMinScore <- function(x) {
x <- x[order(-x)] # order descending
x <- x[!is.na(x)] # remove NAs
x[-length(x)] # return all but the last (lowest) score, removes only 1 tied value
}
That said, if you must write your own version of mean(), here is a simpler approach that takes advantage of existing R functions.
TIP: Since is.na() returns a vector of TRUE and FALSE values, you can sum !is.na(x) to count the number of non-missing values in a vector.
mymean <- function(x) {sum(x, na.rm=TRUE) / sum(!is.na(x))}
The results look like this.
The modified version of score_homework() would be:
score_homework <- function(x,drop=TRUE){
if(drop == TRUE) return(mean(removeMinScore(x),na.rm=TRUE))
else mean(x,na.rm=TRUE)
}
The results from testing the function are as follows.
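As a quick check of the idea, here is a self-contained sketch using the same scores vector. The helper names drop_one_lowest and score_hw are hypothetical, and which.min() is swapped in for the sort-based approach, since it returns the position of the first minimum and therefore also removes just one tied value.

```r
scores <- c(78, 93, 61, NA, 61, 83, 92, 95, NA, 100)

# Remove NAs, then drop exactly one instance of the lowest score
drop_one_lowest <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 2) return(x)  # nothing sensible to drop
  x[-which.min(x)]
}

# Average the scores, optionally dropping the single lowest one first
score_hw <- function(x, drop = TRUE) {
  if (drop) x <- drop_one_lowest(x)
  mean(x, na.rm = TRUE)
}

score_hw(scores, drop = TRUE)   # 86     (602 / 7, one of the two 61s removed)
score_hw(scores, drop = FALSE)  # 82.875 (663 / 8)
```

Note that calling either function with no arguments at all, as in score_homework(), reproduces the 'argument "x" is missing' error from the question, because x has no default value.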
I'm trying to create an empty numeric object like this
corr <- cor()
to use it later on in a loop.
but it keeps returning this error:
Error in is.data.frame(x) : argument "x" is missing, with no default.
Here is my full script:
EVI <- "D:\\Modis_EVI\\Original\\EVI_Stack_single5000.tif"
y.EVI <- brick(EVI)
m.EVI.cropped <- as.matrix(y.EVI)
time <- 1:nlayers(y.EVI)
corr <- cor()
inf2NA <- function(x) { x[is.infinite(x)] <- NA; x }
for (i in 1:nrow(m.EVI.cropped)){
EVI.m <- m.EVI.cropped[i,]
time <- 1:nlayers(y.EVI)
Corr[i] <- cor(EVI.m, time, method="pearson", use="pairwise.complete.obs")
}
Any advice please?
Since you are asking for advice:
It is very likely that you don't need to do this, since you can probably use (i) a vectorized function or (ii) an lapply loop, which pre-allocates the return object for you. If you insist on using a for loop, set it up properly: pre-allocate the result, e.g. with corr <- numeric(n), where n is the number of iterations. Growing a vector by appending to it inside a loop is extremely slow. Note also that cor() computes a correlation and cannot be called without arguments, which is why corr <- cor() fails.
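A minimal sketch of both suggestions, assuming a small random matrix m as a stand-in for m.EVI.cropped (the real raster data are not reproduced here, and apply() is used as the row-wise analogue of the lapply idea):

```r
set.seed(42)

# Hypothetical stand-in: 5 pixels (rows) observed over 12 time steps (columns)
m <- matrix(rnorm(5 * 12), nrow = 5)
time <- seq_len(ncol(m))

# Option 1: pre-allocate the result, then fill it inside the loop
corr <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  corr[i] <- cor(m[i, ], time, method = "pearson", use = "pairwise.complete.obs")
}

# Option 2: let apply() build the result vector, one value per row
corr_apply <- apply(m, 1, function(r) {
  cor(r, time, method = "pearson", use = "pairwise.complete.obs")
})

all.equal(corr, corr_apply)  # TRUE
```

R is case-sensitive, so the result has to be assigned to the same name that was pre-allocated (corr, not Corr).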
We can create empty objects with numeric(0), logical(0), character(0) etc.
For example
num_vec <- numeric(0)
creates an empty numeric vector that can be filled up later on:
num_vec[1] <- 2
num_vec
# [1] 2
num_vec[2] <- 1
num_vec
# [1] 2 1