I have 40 different data.frames with no system in their names, such as nat69eqte, nahi_il and nahc_cpwtre. I want to create a function/macro in R that runs the following code in an easy way for all the data frames:
nat69eqte_wide <- spread(nat69eqte, key = time, value = values)
attr(nat69eqte_wide, "Symname") <- "nat69eqte"
lst_nat69eqte_wide <- wgdx.reshape(nat69eqte_wide, 2)
In each data.frame there are the columns time and values to be passed to spread.
Without any knowledge of your data, I can only guess:
The following function assumes that each data.frame contains the columns time and values to be passed to spread. If that is not the case, you can add these columns as arguments of the function.
NB: spread is superseded; for new code, use pivot_wider instead.
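library(tidyr)  # spread
library(rlang)  # enquo, as_label
myfun <- function(df){
  # capture the name of the object passed as df, e.g. "nat69eqte"
  name <- as_label(enquo(df))
  df_wide <- spread(df, key = time, value = values)
  attr(df_wide, "Symname") <- name
  lst_df_wide <- wgdx.reshape(df_wide, 2)
  return(lst_df_wide)
}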
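To run this over all 40 data frames without typing each name, one option is to keep the names in a character vector and look the objects up with mget. The following is only a sketch under assumptions: the two small data frames and the geo column are made up, pivot_wider is used in place of spread as suggested in the NB, and the wgdx.reshape step is left as a comment so the example runs without that package.

```r
library(tidyr)

# Hypothetical stand-ins for two of the 40 data frames
nat69eqte <- data.frame(geo = c("A", "A", "B", "B"),
                        time = c(2000, 2001, 2000, 2001),
                        values = 1:4)
nahi_il <- data.frame(geo = c("A", "A"),
                      time = c(2000, 2001),
                      values = c(10, 20))

# Widen one data frame and tag it with its own name
widen <- function(df, name) {
  df_wide <- pivot_wider(df, names_from = time, values_from = values)
  df_wide <- as.data.frame(df_wide)
  attr(df_wide, "Symname") <- name
  df_wide  # in the real workflow: wgdx.reshape(df_wide, 2)
}

# Names of the data frames to process (extend to all 40)
df_names <- c("nat69eqte", "nahi_il")

# mget() fetches the objects by name; Map() pairs each with its name
wide_list <- Map(widen, mget(df_names), df_names)
```

The result is a list named after the original data frames, so wide_list$nat69eqte holds the wide version of nat69eqte.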
I am only a few days into the R ecosystem and am trying to figure out a way to add a new column for each numeric column found in the original dataframe.
I have successfully written a way to change the values in the existing columns of the dataframe, but what I need is to put the calculated values into new columns rather than overwriting the existing ones.
Here is what I've done so far:
myDf <- read.csv("MyData.csv",header = TRUE)
normalize <- function(x) {
return ((x - min(x,na.rm = TRUE)) / (max(x,na.rm = TRUE) - min(x,na.rm = TRUE)))
}
normalizeAllCols <- function(df){
df[,sapply(df, is.numeric)] <- lapply(df[,sapply(df, is.numeric)], normalize)
df
}
normalizedDf<-normalizeAllCols(myDf)
I came up with the above snippet (with a lot of help from the internet) to apply the normalize function to all numeric columns in the given data frame. I want to know how to put the calculated values into new columns instead (in the given snippet, how to put the normalized values in a new column named like "norm" + colname).
You can find the names of the numeric columns and use paste0 to create the new columns.
normalizeAllCols <- function(df){
cols <- names(df)[sapply(df, is.numeric)]
df[paste0('norm_', cols)] <- lapply(df[cols], normalize)
df
}
normalizedDf<-normalizeAllCols(myDf)
In dplyr you can use across to apply a function to only numeric columns directly.
library(dplyr)
normalizeAllCols <- function(df){
df %>%
mutate(across(where(is.numeric), list(norm = normalize)))
}
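With a named list, across appends the name as a suffix (height_norm rather than norm_height). If the "norm" + colname pattern from the question matters, the .names argument of across can set it. A small sketch, assuming a made-up data frame with one character and two numeric columns:

```r
library(dplyr)

# Min-max scaling, same idea as the normalize function in the question
normalize <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical example data
myDf <- data.frame(id = c("a", "b", "c"),
                   height = c(150, 175, 200),
                   weight = c(50, 60, 90))

# .names controls the new column names; {.col} is the original column name
normalizedDf <- myDf %>%
  mutate(across(where(is.numeric), normalize, .names = "norm_{.col}"))
```

The original columns stay untouched and the new norm_height and norm_weight columns are added at the end.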
I have a fairly large data.frame with outdated names, and I want to replace them with the correct names, which are stored in another data.frame.
I am using the stringdist function to find the closest match between the two name columns, and then I want to put the new names into the original data.frame.
My code is based on sapply, as in the following example:
dat1 <- data.frame("name" = paste0("abc", seq(1:5)),
"value" = round(rnorm(5), 1))
dat2 <- data.frame("name" = paste0("abd", seq(1:5)),
"other_info" = seq(11:15))
dat1$name2 <- sapply(dat1$name,
function(x){
char_min <- stringdist::stringdist(x, dat2$name)
dat2[which.min(char_min), "name"]
})
dat1
However, this code is too slow considering the size of my data.frame.
Is there a more optimized alternative solution, using for example data.table R package?
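First convert the data frames into data tables:
library(data.table)
dat1 <- data.table(dat1)
dat2 <- data.table(dat2)
Then use := together with the amatch function from stringdist to create a new column holding the closest name in dat2. Note that amatch defaults to maxDist = 0.1, which with the default method effectively accepts only exact matches and returns NA otherwise, so raise it explicitly:
dat1[, name2 := dat2$name[stringdist::amatch(name, dat2$name, maxDist = Inf)]]
This should be much faster than the sapply version. Hope this helps!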
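For reference, here is the same idea as a self-contained sketch on the toy data from the question (with fixed values instead of rnorm so the result is reproducible). The maxDist = Inf setting is my assumption about what "closest match" should mean here: it forces amatch to always return the nearest candidate, however far away it is.

```r
library(data.table)
library(stringdist)

# Toy data modelled on the question
dat1 <- data.table(name = paste0("abc", 1:5),
                   value = c(0.1, -0.4, 1.2, 0.3, -0.9))
dat2 <- data.table(name = paste0("abd", 1:5),
                   other_info = 11:15)

# amatch() is vectorised over its first argument and returns, for each
# name in dat1, the row index of the closest name in dat2
idx <- amatch(dat1$name, dat2$name, maxDist = Inf)

# Add the matched names by reference
dat1[, name2 := dat2$name[idx]]
```

Each abcN ends up paired with abdN, because that candidate differs by a single character while every other one differs by two.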
I am curious why the following code doesn't work for adding column data to a data frame.
a <- c(1:3)
b <- c(4:6)
df <- data.frame(a,b) # create a data frame example
add <- function(df, vector){
df[[3]] <- vector
} # create a function to add column data to a data frame
d <- c(7:9) # a new vector to be added to the data frame
add(df,d) # execute the function
If you run the code in R, the new vector is not added to the data frame, and no error is raised either.
R passes arguments to functions by value, not by reference. That means that inside the function you work on a copy of the data.frame df; when the function returns, the modified copy "dies" and the original data.frame outside the function is still unchanged.
This is why @RichScriven proposed storing the return value of your function in df again. For that to work, the function must also return the modified data.frame as its last expression.
Credits go to @RichScriven please...
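A minimal sketch of that fix, reusing the a, b and d vectors from the question: the function returns the modified copy, and the caller assigns it back.

```r
df <- data.frame(a = 1:3, b = 4:6)
d <- 7:9

add <- function(df, vector) {
  df[[3]] <- vector  # changes only the local copy
  df                 # return the modified copy to the caller
}

add(df, d)             # result is printed or discarded; df itself is unchanged
ncol_unchanged <- ncol(df)

df <- add(df, d)       # assigning the result back is what updates df
ncol_updated <- ncol(df)
```

After the first call df still has two columns; only after the assignment does it have three.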
PS: You should use cbind ("column bind") to extend your data.frame independently of how many columns already exist and ensure unique column names:
add <- function(df, vector){
res <- cbind(df, vector)
names(res) <- make.names(names(res), unique = TRUE)
res # return value
}
PS2: You could use a data.table instead of a data.frame which is passed by reference (not by value).
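As a rough illustration of the data.table route, assuming the data.table package is installed: the := operator adds the column in place, so no reassignment is needed. The function name add_by_ref and its col_name argument are made up for this example.

```r
library(data.table)

dt <- data.table(a = 1:3, b = 4:6)
d <- 7:9

# Hypothetical helper: adds a column to dt by reference
add_by_ref <- function(dt, new_values, col_name) {
  dt[, (col_name) := new_values]  # := modifies dt in place, no copy is made
  invisible(dt)
}

add_by_ref(dt, d, "d")  # no assignment back to dt is required
```

Because the change happens in place, every other variable that refers to the same data.table sees the new column too, which is worth keeping in mind.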
I would like to build a function that adds many columns of random variables, or the output of another function, to a dataframe. Here I am trying to append them to map data.
library(plyr)
add <- function(name, df){
new.df = mutate(df, name = runif(length(df[,1])))
new.df
}
The function works to add a column of data...
add("e", iris)
iris2<- add("f", iris)
The apply does not work...
I am trying to add 26 columns from the list of letters so that df$a, df$b, df$c are all random vectors.
new <- lapply(letters, add, df = tx)
What is the most efficient way to add columns from a list of column names?
I would like to later loop through all of the column names in another function.
It's not very clear to me what you want to achieve. This adds multiple columns of random numbers to a data.frame:
cbind(iris,
matrix(runif(nrow(iris)*5), ncol=5))
I don't see a reason to use an *apply function.
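If the columns should carry the names from letters (so that df$a, df$b, ... exist), a base R sketch could look like the following. The function name add_random_cols is made up, and iris stands in for the map data from the question.

```r
set.seed(1)  # only so the random numbers are reproducible

# Hypothetical helper: one uniform random column per requested name
add_random_cols <- function(df, col_names) {
  df[col_names] <- lapply(col_names, function(nm) runif(nrow(df)))
  df
}

iris2 <- add_random_cols(iris, letters)

# The new columns can later be looped over by name
col_means <- sapply(letters, function(nm) mean(iris2[[nm]]))
```

Assigning a list to df[col_names] creates all the columns in one step, which avoids calling the function 26 times and ending up with 26 separate data frames.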
I'd like to learn how to apply functions to specific columns of my dataframe without dropping the other columns from my df. For example, I'd like to multiply some specific columns by 1000 and leave the others as they are.
Using the sapply function for example like this:
a<-as.data.frame(sapply(table.xy[,1], function(x){x*1000}))
I get new dataframes with the first column multiplied by 1000 but without the other columns that I didn't use in the operation. So my attempt was to do it like this:
a<-as.data.frame(sapply(table.xy, function(x) if (colnames=="columnA") {x/1000} else {x}))
but this one didn't work.
My workaround was to give both dataframes an extra column with IDs and later merge the old dataframe with the newly created one to get a complete one. But I think there must be a better solution. Is there one?
If you only want to do a computation on one or a few columns, you can use transform or simply index the column manually:
# with transform:
df <- data.frame(A = 1:10, B = 1:10)
df <- transform(df, A = A*1000)
# Manually:
df <- data.frame(A = 1:10, B = 1:10)
df$A <- df$A * 1000
The following code will apply the desired function only to the columns you specify.
I'll create a simple data frame as a reproducible example.
(df <- data.frame(x = 1, y = 1:10, z=11:20))
(df <- cbind(df[1], apply(df[2:3],2, function(x){x*1000})))
Basically, use cbind() to keep the columns you don't want the function to run on, then use apply() with the desired function on the target columns.
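A variation on the same idea, shown only as a sketch: assigning back into the selected columns keeps the original column order and avoids the matrix that apply() returns. The cols vector is just a name I chose for the target columns.

```r
df <- data.frame(x = 1, y = 1:10, z = 11:20)

# Columns the function should be applied to
cols <- c("y", "z")

# lapply() returns one vector per column; assigning into df[cols]
# overwrites only those columns and leaves x untouched
df[cols] <- lapply(df[cols], function(col) col * 1000)
```

This also works when the target columns are not next to each other, since they are addressed by name rather than by position.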
In dplyr we would use mutate_at (superseded by across() since dplyr 1.0.0), in which you can select or exclude (by preceding the variable name with a "-" minus sign) specific variables.
You can just name a function:
df <- df %>%
mutate_at(vars(columnA), scale)
or create your own:
df <- df %>%
mutate_at(vars(columnA, columnC), function(x) x * 1000)
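In current dplyr the same thing is usually written with across() inside mutate(). A short sketch, assuming a made-up data frame with columns columnA, columnB and columnC:

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(columnA = 1:3, columnB = 4:6, columnC = 7:9)

# Multiply only columnA and columnC by 1000; columnB is left as it is
df <- df %>%
  mutate(across(c(columnA, columnC), function(x) x * 1000))
```

Columns can be excluded instead of selected with a minus sign, for example across(-columnB, ...).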