How to expand data.frame over unused factor levels?

I need to expand data frames over unused factor levels, for a list of data frames that all share a common variable (year in the example below). I want to expand each data frame so that the common variable covers all of the levels present across all of the data frames.
myList <- list(A = data.frame(A1 = rnorm(10), A2 = rnorm(10), A3 = rnorm(10),
year = factor(c(2000:2009))),
B = data.frame(B1 = rnorm(10), B2 = rnorm(10), B3 = rnorm(10),
year = factor(c(2001:2010))))
masterYear <- unique(unlist(lapply(myList, function(x) levels(x$year)), use.names = F))
So far I've tried using the dplyr and tidyr packages in a function:
funExpand <- function(x){
levels(x$year) <- c(levels(x$year), setdiff(masterYear, levels(x$year)))
vars <- names(x)[-length(names(x))]
x %>%
tidyr::complete_(x, c(vars), fill = list(0))
x
}
myList2 <- lapply(myList, funExpand)
But that yields an error. I've tried various combinations of tidyr::complete and tidyr::complete_ functions (first argument x or year?), all yielding some error. That tells me that I'm not interpreting the complete functions correctly.
Aside from fixes for this error, I also welcome any suggestions for improving the process.

Updated to reflect a comment by the OP.
Try this:
myList2 <- lapply(myList,
function(db) {
db$year <- factor(as.character(db$year), levels=masterYear)
merge(db, data.frame(year=setdiff(masterYear, db$year)), all=T)
})
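The new rows will have NA values. If you really need them to be 0, assign the merge() result to a variable (say res), run res[is.na(res)] <- 0 on it, and return res from the function; setting db[is.na(db)] <- 0 would not touch the merged rows.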
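A minimal sketch of that zero-filling variant, assuming the same masterYear construction as in the question. The small myList below is hypothetical toy data so the block runs on its own. The key point is that the 0 is written into the merged result, because merge() returns a new data frame rather than modifying db.

```r
# Hypothetical toy data: two data frames whose year factors only partly overlap
myList <- list(
  A = data.frame(A1 = 1:2, year = factor(c(2000, 2001))),
  B = data.frame(B1 = 3:4, year = factor(c(2001, 2002)))
)
# All year levels seen in any data frame of the list
masterYear <- unique(unlist(lapply(myList, function(x) levels(x$year)), use.names = FALSE))

myList2 <- lapply(myList, function(db) {
  # Give every data frame the full, common set of levels
  db$year <- factor(as.character(db$year), levels = masterYear)
  # Add one row per missing year; the other columns come back as NA
  res <- merge(db, data.frame(year = setdiff(masterYear, db$year)), all = TRUE)
  # Replace those NAs with 0 in the merged result
  res[is.na(res)] <- 0
  res
})
```

Only columns that actually contain NA are modified, so the year factor itself is left untouched.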

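I guess you don't need the x %>%: complete_() already receives x as its first argument, so piping x into it as well passes x twice. Two more things: the column to complete is year (not the other columns), and fill must be a named list with one entry per column to fill. Plain complete() is used below, since the underscore variant complete_() is deprecated in current tidyr:
funExpand <- function(x) {
  levels(x$year) <- c(levels(x$year), setdiff(masterYear, levels(x$year)))
  vars <- names(x)[-length(names(x))]
  # complete() on a factor expands over all of its levels, including unused ones
  tidyr::complete(x, year, fill = setNames(as.list(rep(0, length(vars))), vars))
}
lapply(myList, funExpand)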

Related

How to add a list name as a column to the data frames in a nested list in R

I am trying to add the name of the outer list that wraps each sublist of data frames as a new column of those data frames, so that I can call rbind afterwards and join all the datasets. In other words, to join the datasets I need to keep the original list name for reference. I accomplished it with a for loop, but I wonder whether I could use the purrr functions or the apply functions instead, because my real data is far more complex than the example below.
To illustrate what I mean, here is a simple example:
x1 <- data.frame(a = 1:3, b = letters[1:3])
x2 <- data.frame(a = 4:6, b = letters[4:6])
y1 <- data.frame(a = 10:15, b = letters[10:15])
y2 <- data.frame(a = 13:17, b = letters[13:17])
l1 <- list(x1, y1)
l2 <- list(x2, y2)
l <- list(l1, l2)
nm <- c("list1", "list2")
names(l) <- nm
lmod <- l
for (i in 1:length(lengths(l))) {
for(j in 1:length(l[[i]])) {
lmod[[i]][[j]]$nm = names(l[i])
}
}
Using lapply, I tried something like:
lmod <- lapply(l, function(x) {
x[[1]] <- names(x)
x[[2]] <- names(x)
return(x)
})
But it did not work at all.
Does anyone have a clue about this one?
Map(lapply, l, list(cbind), nm=names(l))
This can also be written with an anonymous function (the \(x, y) shorthand requires R 4.1.0 or later):
Map(\(x, y) lapply(x, cbind, nm = y), l, names(l))
all.equal(lmod, Map(lapply, l, list(cbind), nm=names(l)))
[1] TRUE
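Since the goal was to rbind everything afterwards, a possible base R follow-up is sketched below. It rebuilds the toy lists from the question so it runs on its own, flattens one level of nesting with unlist(recursive = FALSE), and stacks the pieces with do.call(rbind, ...). The name combined is just for illustration.

```r
x1 <- data.frame(a = 1:3, b = letters[1:3])
x2 <- data.frame(a = 4:6, b = letters[4:6])
y1 <- data.frame(a = 10:15, b = letters[10:15])
y2 <- data.frame(a = 13:17, b = letters[13:17])
l <- list(list1 = list(x1, y1), list2 = list(x2, y2))

# Tag every inner data frame with the name of the outer list it belongs to
lmod <- Map(lapply, l, list(cbind), nm = names(l))

# Drop one level of nesting (giving 4 data frames), then stack them;
# unname() stops rbind() from building row names out of the list names
combined <- do.call(rbind, unname(unlist(lmod, recursive = FALSE)))
```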

Function to change all variables of factor type to lower case

I need to create a function to change all my factor variables to lower case.
Here is what I have so far:
change_lower=function(x){if(is.factor(x)) tolower(x)}
But I think I'm doing something wrong; maybe the if isn't right for what I want. Any ideas?
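You can use mutate_if if you want to automatically convert a large number of columns. Be sure to convert to character first (as @DanY pointed out):
library(dplyr)
# stringsAsFactors = TRUE is needed for y and z to be factors in R >= 4.0.0
df <- data.frame(x = c(1,2,3), y = c("A","B","C"), z = c("i","K","l"), stringsAsFactors = TRUE)
# mutate_if() is superseded in dplyr >= 1.0.0; mutate(across(where(is.factor), ...)) is the newer form
df <- df %>% mutate_if(is.factor, function(x) tolower(as.character(x)))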
In base R:
df <- data.frame(x = c(1,2,3), y = c("A","B","C"), z = c("i","K","l"), stringsAsFactors = TRUE)
ind <- names(df)[sapply(df, is.factor)]
for (i in ind){
df[[i]] <- tolower(as.character(df[[i]]))
}
or
df[,ind] <- lapply(ind, function(x) tolower(as.character(df[[x]])))
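# Input data (stringsAsFactors = TRUE makes y and z factors in R >= 4.0.0 too):
df <- data.frame(x = c(1,2,3), y = c("A","B","C"), z = c("i","K","l"), stringsAsFactors = TRUE)
# Convert factors to lowercase; df[] <- keeps df a data.frame (plain lapply() returns a list):
df[] <- lapply(df, function(x){if(is.factor(x)) as.factor(tolower(as.character(x))) else x})
# Proof:
str(df)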
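The original change_lower returns NULL for any non-factor input, because its if has no else branch. If you want an actual function that takes a whole data frame, here is a sketch (lower_factors is a hypothetical name) that lower-cases only the factor columns and keeps them as factors. Instead of converting every value to character, it relabels the levels; levels that become identical after tolower(), such as "A" and "a", are merged into one.

```r
# Hypothetical helper: lower-case every factor column, leave other columns as they are
lower_factors <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], function(f) {
    levels(f) <- tolower(levels(f))  # duplicated new labels collapse into one level
    f
  })
  df
}

df <- data.frame(x = c(1, 2, 3), y = c("A", "B", "C"), z = c("i", "K", "l"),
                 stringsAsFactors = TRUE)
out <- lower_factors(df)
merged <- lower_factors(data.frame(f = factor(c("A", "a"))))
```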

R: object y not found in function (x,y) [function to pass through data frames in r]

I am writing a function to build new data frames based on existing data frames. So far I essentially have:
f1 <- function(x,y) {
x_adj <- data.frame("DID*"= df.y$`DM`[x], "LDI"= df.y$`DirectorID*`[-(x)], "LDM"= df.y$`DM`[-(x)], "IID*"=y)
}
I have 4,000 data frames named df.1, df.2 and so on, so I really need a function for this, but R returns an error saying that df.y is not found. y is meant to iterate over a list of the 4,000 names of the different data frames. I am very new to R, so any help would be really appreciated.
In case more specifics are needed, I essentially have something like:
df.1 <- data.frame(x = 1:3, b = 5)
And, using a function, I need the following as a result:
df.11 <- data.frame(x = 1, c = 2:3, b = 5)
df.12 <- data.frame(x = 2, c = c(1,3), b = 5)
df.13 <- data.frame(x = 3, c = 1:2, b = 5)
Thanks in advance!
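The OP seems to want to access a data.frame whose name is built dynamically. Writing df.y does not do that: R looks for an object literally named df.y, which does not exist.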
One option is to use get:
get(paste("df",y,sep = "."))
With y = 1, the get() call above returns df.1.
Hence, the function can be modified as:
f1 <- function(x,y) {
  # look up the data frame whose name is "df." followed by the value of y
  temp_df <- get(paste("df",y,sep = "."))
  x_adj <- data.frame("DID*"= temp_df$`DM`[x], "LDI"= temp_df$`DirectorID*`[-(x)],
                      "LDM"= temp_df$`DM`[-(x)], "IID*"=y)
  x_adj
}
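For thousands of similarly named objects, a common alternative to calling get() on each one is to collect them into a single named list once and lapply() over it. This is a sketch under assumptions: df.1 and df.2 are hypothetical stand-ins for the 4,000 df.* objects, and split_rows() is a made-up helper that reproduces the df.11/df.12/df.13 pattern from the question. It assumes every data frame has at least two rows; with one row, d$x[-i] is empty and data.frame() fails.

```r
# Hypothetical stand-ins for the real df.* objects
df.1 <- data.frame(x = 1:3, b = 5)
df.2 <- data.frame(x = 4:5, b = 7)

# Gather every object named df.<number> into one named list
dfs <- mget(ls(pattern = "^df\\.[0-9]+$"))

# For row i of d: keep x[i] as "x" and put the remaining x values in "c"
split_rows <- function(d) {
  lapply(seq_len(nrow(d)), function(i) {
    data.frame(x = d$x[i], c = d$x[-i], b = d$b[-i])
  })
}

result <- lapply(dfs, split_rows)
# result$df.1[[1]] has the same values as df.11 in the question
```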

Least error prone way to add columns to an R data.frame through functions

The case I have is I want to "tack on" a bunch of columns to an existing data.frame, where each column is a function that does math on other columns. My goals are:
1. I want to specify the functions once
2. I don't want to worry about having to pass arguments in the right order and/or match them by name
3. I want to specify the order in which to apply the functions once
4. I want the new column names to be the function names
Ideally I want something like:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) a + b
z <- function (x) b * y
df2 <- lapply (list (y, z), df)
where df2 is a data.frame with 4 columns: a, b, y and z. I think this achieves the goals.
The closest I've gotten to this is the following:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) x$a + x$b
z <- function (x) x$b * x$y
funs <- list (
y = y,
z = z
)
df2 <- df
df2$y <- funs$y(df2)
df2$z <- funs$z(df2)
This achieves goals 1 and 2, but not 3 and 4.
Thanks in advance for the help.
This may be what you want. Once the function dfapply is defined, it can be used much as you originally intended, without lots of x$a-style references; the difference is that you pass expressions instead of functions.
dfapply <- function(exprs, df){
for (expr in exprs) {
df <- within(df, eval(expr))
}
df
}
df <- data.frame(a = rnorm(10), b = rnorm(10))
expr1 <- expression(y <- a + b)
expr2 <- expression(z <- b * y)
df2 <- dfapply(c(expr1, expr2), df)
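If you would rather stay with functions, a sketch that covers all four goals keeps them in a named list: the list names become the new column names, the list order is the order of application, and each function receives the data frame built so far, so there is no argument order to worry about. with() avoids writing x$a. The helper name add_columns and the small deterministic data are assumptions for illustration.

```r
# Each function takes the data frame built so far and returns one new column
funs <- list(
  y = function(d) with(d, a + b),
  z = function(d) with(d, b * y)   # can use y because y is added first
)

# Hypothetical helper: apply funs in list order, naming columns after the list names
add_columns <- function(df, funs) {
  for (nm in names(funs)) {
    df[[nm]] <- funs[[nm]](df)
  }
  df
}

df <- data.frame(a = 1:3, b = c(2, 4, 6))
df2 <- add_columns(df, funs)
```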

Rename multiple datasets after applying a function in R

I am trying to apply a function to different data frames. After doing that, I want to save the resulting data frames, keeping their original names but adding a suffix to distinguish the new data frames.
This is what I've tried, which is obviously not working.
# Creating dummy data
N <- 8
df1 <- data.frame(x1 = rnorm(N), x2 = sample(1:10, size = N, replace = TRUE), x3 = 1*(runif(n = N) < .75))
df2 <- data.frame(y1 = rnorm(N), y2 = sample(100:200, size = N, replace = TRUE), y3 = runif(N))
df3 <- data.frame(z1 =rnorm(N), z2 = sample(8:80, size = N,replace = TRUE), Z3 = runif(N))
# Making a list of the three data frames
mydata <- list(df1=df1, df2=df2, df3= df3)
#Applying a function to mydata list
mydata2 <- lapply(mydata, function(x) mean(unlist(x)))
# Renaming each dataset
n <- 1:length(mydata2)
noms <- names(mydata2)
for (i in 1:n){
mynewlist <- lapply(mydata2, function(x) {names(x) <-("_mean", sep ="");
return(x))}
Any help will be deeply appreciated.
We can use list2env if we need to create multiple objects in the global environment (though this is not recommended, as most operations can be done within the list itself).
We change the names of the list by pasting a suffix onto them with paste0() and then call list2env:
list2env(setNames(mydata2, paste0(names(mydata2),
"_newname")), envir=.GlobalEnv)

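Following the advice to keep things in the list, here is a sketch of both options. The two small data frames are hypothetical stand-ins for df1 to df3, and the "_mean" suffix matches what the question was aiming for. For demonstration, list2env() writes into a dedicated environment instead of .GlobalEnv, so nothing in the workspace is overwritten.

```r
# Hypothetical stand-ins for the question's data frames
mydata <- list(df1 = data.frame(x = 1:4), df2 = data.frame(y = c(2, 4)))

# Apply the function; the result keeps the names df1, df2
mydata2 <- lapply(mydata, function(x) mean(unlist(x)))

# Option 1 (usually preferable): keep everything in one renamed list
means <- setNames(mydata2, paste0(names(mydata2), "_mean"))

# Option 2: create separate objects df1_mean, df2_mean in an environment
out_env <- new.env()
list2env(means, envir = out_env)
```

Passing envir = .GlobalEnv instead of out_env creates the objects in the workspace, as in the answer above.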