Function applied to a dataset in R

Below are two data frames, 'A' and 'C'. I have written a function that takes the top 5 rows of a data frame, and I want to apply it to data frame C. However, it only returns the rows of A. How can I have this function applied to C? Thanks!
L3 <- LETTERS[1:3]
fac <- sample(L3, 10, replace = TRUE)
(d <- data.frame(x = 1, y = 1:10, fac = fac))
## The "same" with automatic column names:
A<-data.frame(1, 1:10, sample(L3, 10, replace = TRUE))
L3 <- LETTERS[7:9]
fac <- sample(L3, 10, replace = TRUE)
(d <- data.frame(x = 1, y = 1:10, fac = fac))
## The "same" with automatic column names:
C<-data.frame(1, 1:10, sample(L3, 10, replace = TRUE))
function_y <- function(Data_Analysis_Task) {
  # Refers to the global object A, not to the argument
  sample2 <- head(A, 5)
  return(sample2)
}
D <- function_y(C)  # still returns the first 5 rows of A

The function has to use the argument that is passed to it:
function_y <- function(Data_Analysis_Task) {
  head(Data_Analysis_Task, 5)
}
D <- function_y(C)
If we use head(A, 5) inside the function, R looks for the object 'A' in the function's own environment first; if it doesn't find it there, it looks in the enclosing environment, and so on, until it finds 'A' in the global environment. So the function returns the head of 'A' every time it is called, whatever argument is passed.
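A minimal sketch of this lookup, using two small hypothetical data frames and two hypothetical functions, one that ignores its argument and one that uses it:

```r
A <- data.frame(v = 1:10)
C <- data.frame(v = 101:110)

# Ignores its argument: `A` is not found in the function's own environment,
# so R looks it up in the enclosing environment (here, the global one).
head_global <- function(Data_Analysis_Task) {
  head(A, 5)
}

# Uses its argument, so the result depends on what is passed in.
head_arg <- function(Data_Analysis_Task) {
  head(Data_Analysis_Task, 5)
}

identical(head_global(C), head(A, 5))  # TRUE: C is ignored
identical(head_arg(C), head(C, 5))     # TRUE
```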

Related

How to merge dataframe lists of unequal length?

This question is similar to Joining dataframes from lists of unequal length.
I have a Shiny script where I am using fileImport to allow the user to import a variable number of data files. Each data file is then split into a list of data frames, and these are imported as a list. So I have a list of lists of data frames.
The input data files come in two formats: one contains 129 data frames and the other 67, where the 67 are a subset of the 129 (all 67 are present in the 129, but not all 129 are present in the 67). I am then trying to rbind the data frames by name.
A reproducible example:
# Some data
df.l1 <- list(df1 = data.frame(A = letters[1:10],
                               B = rnorm(10, 5, 1)),
              df2 = data.frame(A = letters[11:20],
                               B = rnorm(10, 10, 2)))
df.l2 <- list(df1 = data.frame(A = letters[1:10],
                               B = rnorm(10, 5, 1)),
              df2 = data.frame(A = letters[11:20],
                               B = rnorm(10, 10, 2)))
df.l3 <- list(df1 = data.frame(A = letters[1:10],
                               B = rnorm(10, 5, 1)),
              df2 = data.frame(A = letters[11:20],
                               B = rnorm(10, 10, 2)),
              df3 = data.frame(A = LETTERS[1:10],
                               B = rnorm(10, 15, 2)))
This works when binding lists of equal length (e.g. df.l1 and df.l2):
df.two <- list(df.l1, df.l2)
list.merged <- do.call(function(...) Map(rbind, ...), df.two)
But it fails when binding lists of data frames with different lengths:
df.three <- list(df.l1, df.l2, df.l3)
list.merged <- do.call(function(...) Map(rbind, ...), df.three)
This gives the following warnings:
Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
2: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
As I said above, similar questions have been asked, but this situation is unique given the variable number of lists I am trying to merge. Help is greatly appreciated!
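The warnings come from how Map() works: it pairs list elements by position, not by name, and recycles the shorter list when the lengths differ. A small sketch with hypothetical one-row data frames shows the third result silently rbinding df1 of the short list with df3 of the long one:

```r
# Two small lists: the second has an extra element `df3`
l1 <- list(df1 = data.frame(A = "a"), df2 = data.frame(A = "b"))
l3 <- list(df1 = data.frame(A = "a"), df2 = data.frame(A = "b"),
           df3 = data.frame(A = "C"))

# Capture the warning message that Map() (via mapply) raises
msg <- tryCatch(Map(rbind, l1, l3), warning = conditionMessage)

# Map() recycles l1, so element 3 is rbind(l1[[1]], l3[[3]])
merged <- suppressWarnings(Map(rbind, l1, l3))
as.character(merged[[3]]$A)  # "a" "C"
```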
For robust handling of this, I would use dplyr::bind_rows or data.table::rbindlist. First bind each list, then bind at the upper level:
tidyverse version:
library(dplyr)
bind_rows(lapply(df.three, bind_rows))
data.table version:
library(data.table)
rbindlist(lapply(df.three, rbindlist))
Not only does this handle corner cases you might not expect (bind_rows, for example, fills columns that are missing from some data frames with NA), but it is also typically much faster than do.call.
Edit in response to a comment:
Try this:
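library(purrr)
library(dplyr)
# All data frame names that occur in any of the lists
df_names <- unique(unlist(lapply(df.three, names)))
result <- list()
for (n in df_names) {
  # map() with a string extracts the element with that name (NULL if absent)
  result[[n]] <- map(df.three, n)
}
# Bind the pieces for each name; bind_rows() skips the NULLs
map(result, dplyr::bind_rows)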
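If purrr is not available, the same merge by name can be sketched in base R. This is an alternative, not the answerer's code: it rebuilds the question's lists (here df.l2 is simply a copy of df.l1) and uses `[[`, Filter() and do.call(rbind, ...):

```r
set.seed(1)
df.l1 <- list(df1 = data.frame(A = letters[1:10], B = rnorm(10, 5, 1)),
              df2 = data.frame(A = letters[11:20], B = rnorm(10, 10, 2)))
df.l2 <- df.l1
df.l3 <- c(df.l1, list(df3 = data.frame(A = LETTERS[1:10], B = rnorm(10, 15, 2))))
df.three <- list(df.l1, df.l2, df.l3)

# Every name that occurs in at least one list
df_names <- unique(unlist(lapply(df.three, names)))

list.merged <- lapply(setNames(df_names, df_names), function(n) {
  # `[[` returns NULL when a list has no element called `n`
  pieces <- lapply(df.three, `[[`, n)
  do.call(rbind, Filter(Negate(is.null), pieces))
})

sapply(list.merged, nrow)  # df1 = 30, df2 = 30, df3 = 10
```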

Is it possible to loop over multiple objects and access elements within each object in the loop?

In R, I would like to loop over a set of three functions and save the output of each function under a name related to its input. This works when applied to one data frame, but I would like to loop over 300+ objects, and the functions require specifying elements (columns) within each object.
I attempted to create lists of the objects and output names and loop over them with a for loop for a single function (a.ppp), and received the error "Error in i[["X"]] : subscript out of bounds". I am very new to for loops, have a limited coding background, and am unsure whether the loop structure I have created is correct. I have tried multiple options, including looping over a data frame and nesting loops, based on other Stack Overflow questions.
Here is some toy data representing my setup. I have data frames a to g, e.g.:
a <- data.frame(X = c(1, 2, 3),
                Y = c(3, 2, 1),
                Z = c(4, 5, 6),
                M = c('A', 'B', 'C'))
I would like to loop over the following three functions.
library(spatstat)
a.ppp = ppp(a$X,a$Y,c(0,3),c(0,3),marks = a$M)
a.nnd = nndist(a.ppp,by=a.ppp$marks)
a.append = cbind(a,a.nnd)
My attempt:
listObj <- c("a", "b", "c", "d", "e", "f", "g")
list.ppp <- c("a.ppp", "b.ppp", "c.ppp", "d.ppp", "e.ppp", "f.ppp", "g.ppp")
for (i in listObj) {
  for (j in list.ppp) {
    j = ppp(i[["X"]], i[["Y"]], c(0, 12), c(0, 12), marks = i[["M"]])
  }
}
I received the error:
#Error in i[["X"]] : subscript out of bounds
My expected results would be a .ppp and an .append output for each of a to g.
Just thought I'd follow up, based on the extremely helpful comment from Joran: I figured the issue out by modifying the code he provided. The code I used was as follows:
library(spatstat)
a <- data.frame(X = c(1, 2, 3),
                Y = c(3, 2, 1),
                Z = c(4, 5, 6),
                M = c('A', 'B', 'C'))
# Create a list of all the objects (data frames) in the environment; not an ideal method, but suitable for this case
dfs <- mget(ls())
# Create empty lists to be populated during the loop
dfs_ppp <- list()
dfs_nnd <- list()
dfs_final <- list()
for (i in seq_along(dfs)) {
  dfs_ppp[[i]] <- ppp(dfs[[i]]$X, dfs[[i]]$Y, c(-1, 14), c(-1, 14), marks = dfs[[i]]$M)
  dfs_nnd[[i]] <- nndist(dfs_ppp[[i]], by = dfs_ppp[[i]]$marks)
  dfs_final[[i]] <- cbind(dfs[[i]], dfs_nnd[[i]])
}
Try something more like this:
library(spatstat)
a <- data.frame(X = c(1, 2, 3),
                Y = c(3, 2, 1),
                Z = c(4, 5, 6),
                M = c('A', 'B', 'C'))
# Put your data frames (a, b, c, etc.) in a list; `a` is reused here for illustration
dfs <- list(x = a, b = a, z = a)
for (i in seq_along(dfs)) {
  ppp_obj <- ppp(dfs[[i]]$X, dfs[[i]]$Y, c(0, 3), c(0, 3), marks = dfs[[i]]$M)
  nnd_obj <- nndist(ppp_obj, by = ppp_obj$marks)
  dfs[[i]]$nnd <- nnd_obj
}
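In the question's loop, i takes the values "a", "b", and so on, which are character strings rather than the data frames, so i[["X"]] has nothing to extract. A minimal base-R sketch of the fix, assuming mget() is used to fetch the objects by name and lapply() to keep the results named after their inputs. The spatstat steps are replaced by a hypothetical stand-in, nearest_dist(), an unmarked nearest-neighbour distance computed with dist(), so the sketch runs without spatstat; it does not reproduce nndist(..., by = marks):

```r
# Toy data frames standing in for a to g (only two here)
a <- data.frame(X = c(1, 2, 3), Y = c(3, 2, 1), M = c("A", "B", "C"))
b <- data.frame(X = c(0, 1, 3), Y = c(1, 1, 2), M = c("A", "A", "B"))

# In the original loop `i` is the string "a", so this is what i[["X"]] does:
err <- tryCatch("a"[["X"]], error = conditionMessage)  # "subscript out of bounds"

# mget() turns a vector of names into a named list of the objects themselves
dfs <- mget(c("a", "b"))

# Hypothetical stand-in for ppp()/nndist(): distance from each point to its
# nearest other point, ignoring marks
nearest_dist <- function(df) {
  d <- as.matrix(dist(df[, c("X", "Y")]))
  diag(d) <- Inf  # ignore each point's zero distance to itself
  unname(apply(d, 1, min))
}

# lapply() keeps the input names, so results stay tied to a, b, ...
dfs_final <- lapply(dfs, function(df) cbind(df, nnd = nearest_dist(df)))
```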

How can I delete values by column in a data frame?

I need to take the abundance values in each column without zeros, so I used an empty list and a for loop. When I delete [i] in the first line of my loop, I get the desired result, but only for the column of total values (sum by object); written the way I learned, I only obtain an undesired result.
set.seed(1000)
df <- data.frame(Category = sample(LETTERS[1:10]),
                 Object = sample(letters[1:10]),
                 A = sample(0:20, 10, rep = TRUE),
                 B = sample(0:20, 10, rep = TRUE),
                 C = sample(0:20, 10, rep = TRUE))
sincero <- list()
for (i in colnames(df[ , 3:5])) {
  sincero[i] = df[df[ , i] != 0, ]
  sincero
}
sincero
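The question has no answer here. One likely cause, offered as an assumption, is the single bracket in sincero[i]: assigning a whole data frame with `[` tries to fit several columns into one list slot, so only the first column is kept (with a warning). A sketch using `[[` instead, plus an alternative that keeps only the non-zero values of each column:

```r
set.seed(1000)
df <- data.frame(Category = sample(LETTERS[1:10]),
                 Object = sample(letters[1:10]),
                 A = sample(0:20, 10, replace = TRUE),
                 B = sample(0:20, 10, replace = TRUE),
                 C = sample(0:20, 10, replace = TRUE))

sincero <- list()
for (i in colnames(df)[3:5]) {
  # `[[` stores the whole filtered data frame as one list element
  sincero[[i]] <- df[df[[i]] != 0, ]
}

# Alternative: keep only the non-zero values of each abundance column
nonzero_values <- lapply(df[3:5], function(col) col[col != 0])
```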

R: object y not found in function (x,y) [function to pass through data frames in r]

I am writing a function to build new data frames based on existing data frames, so I essentially have:
f1 <- function(x, y) {
  x_adj <- data.frame("DID*" = df.y$`DM`[x], "LDI" = df.y$`DirectorID*`[-(x)],
                      "LDM" = df.y$`DM`[-(x)], "IID*" = y)
}
I have 4,000 data frames named like df.1, so I really need this to work, but R returns an error saying that df.y is not found. y is meant to iterate over a list of all 4,000 names of the different data frames. I am very new to R, so any help would be really appreciated.
In case more specifics are needed, I essentially have something like:
df.1 <- data.frame(x = 1:3, b = 5)
And I need a function that produces the following:
df.11 <- data.frame(x = 1, c = 2:3, b = 5)
df.12 <- data.frame(x = 2, c = c(1,3), b = 5)
df.13 <- data.frame(x = 3, c = 1:2, b = 5)
Thanks in advance!
The OP seems to want to access a data frame by a dynamically built name.
One option is to use get:
get(paste("df",y,sep = "."))
With y = 1, the get call above returns df.1.
Hence, the function can be modified as:
f1 <- function(x, y) {
  # Look up the data frame whose name is "df.<y>"
  temp_df <- get(paste("df", y, sep = "."))
  x_adj <- data.frame("DID*" = temp_df$`DM`[x], "LDI" = temp_df$`DirectorID*`[-(x)],
                      "LDM" = temp_df$`DM`[-(x)], "IID*" = y)
  x_adj
}
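The df.11, df.12 and df.13 the question asks for can be produced from df.1 with a short sketch. The helper name leave_one_out and the choice to store the results in a named list (instead of separate global objects) are assumptions, not part of the original answer; get() is used as in the answer above:

```r
df.1 <- data.frame(x = 1:3, b = 5)

# Hypothetical helper: row i keeps x[i] and lists the other x values as `c`
leave_one_out <- function(df, i) {
  data.frame(x = df$x[i], c = df$x[-i], b = df$b[i])
}

# Look the data frame up by its constructed name
y <- 1
temp_df <- get(paste("df", y, sep = "."))

# One result per row, named df.11, df.12, df.13
results <- lapply(seq_len(nrow(temp_df)), function(i) leave_one_out(temp_df, i))
names(results) <- paste0("df.", y, seq_len(nrow(temp_df)))

results$df.12
#   x c b
# 1 2 1 5
# 2 2 3 5
```

With many data frames, mget(paste0("df.", 1:4000)) returns them all as one named list, which is usually easier to loop over than repeated get() calls; this assumes the names run from df.1 to df.4000.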

Return col name for unique value in data frame

I need to know how to return the column name for a single value in a data frame, as in this example:
L3 <- LETTERS[1:3]
fac <- sample(L3, 10, replace = TRUE)
(d <- data.frame(x = 1, y = 1:10, fac = fac))
This command just returns NULL:
colnames(d[5,2])
but the result should be "y".
How can I fix this?
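d[5, 2] extracts a single value as a plain vector, not a data frame, and a vector has no column names, so colnames() returns NULL. You have to index the vector that contains the column names instead; try colnames(d)[2].
You should use either colnames(d[2]) or colnames(d)[2] to get the column name.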
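A short sketch of why the original call returns NULL and of the fixes above; drop = FALSE is a standard argument of data frame indexing that keeps the one-cell result as a data frame:

```r
L3 <- LETTERS[1:3]
d <- data.frame(x = 1, y = 1:10, fac = sample(L3, 10, replace = TRUE))

d[5, 2]                          # 5: a plain vector, so no column names
colnames(d[5, 2])                # NULL
colnames(d[5, 2, drop = FALSE])  # "y": drop = FALSE keeps a one-cell data frame
colnames(d)[2]                   # "y"
```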
