Create new columns using mutate and across beside existing columns in R

I have the following sample data, and I am trying to get the new columns placed directly beside the existing columns rather than at the end of the data frame. I do not want to use sort, as I need to keep the column order.
library(dplyr)
df <- data.frame(data_in = 2:10, # Data frame
                 data_ft = 3:11,
                 data_mile = 4:12)
df
data_in data_ft data_mile
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
7 8 9 10
8 9 10 11
9 10 11 12
df_new<- df%>%
mutate(across(contains("in"), #Why this does not work?
~cbind(.x * 25.4),
.names = "{sub('in', 'mm', col)}")) # ETC
How can I let the new columns be directly beside the existing columns they came from and NOT at the end of the data frame? Also I do NOT want to use sort as I have many columns and need to maintain the order of the data frame.
I'm new to R so please bear with me.
I have tried using add_column as well.
I'm expecting the data frame to look like this:
data_in data_mm data_ft data_cm data_mile data_km
1 25.4 2 60.96 4 6.4
This gets the desired order:
df <- data.frame(data_in = 2:10, # Data frame
                 data_ft = 3:11,
                 data_mile = 4:12)
df <- df %>% mutate(across(everything(), ~ cbind(., . * 2))) # Works like this
df
But when I use contains() and .names as in the first code block above, the outcome is different.

You can specify the position by setting the .after argument in mutate. See documentation here: https://dplyr.tidyverse.org/reference/mutate.html
df_new <- df %>%
mutate(across(contains("in"),
~ .x * 25.4,
.names = "{sub('in', 'mm', col)}"), .after = 1)
This places data_mm directly after data_in:
> df_new
data_in data_mm data_ft data_mile
1 2 50.8 3 4
2 3 76.2 4 5
3 4 101.6 5 6
4 5 127.0 6 7
5 6 152.4 7 8
6 7 177.8 8 9
7 8 203.2 9 10
8 9 228.6 10 11
9 10 254.0 11 12
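The expected output in the question also has data_cm and data_km columns. One way to extend the .after idea to several columns is sketched below. The conversions lookup table, its column names, and the conversion factors are assumptions made for illustration; they are not part of the original answer.

```r
library(dplyr)

df <- data.frame(data_in = 2:10, data_ft = 3:11, data_mile = 4:12)

# Hypothetical lookup table: source column, new column name, multiplier
conversions <- data.frame(
  old    = c("data_in", "data_ft", "data_mile"),
  new    = c("data_mm", "data_cm", "data_km"),
  factor = c(25.4, 30.48, 1.609344),
  stringsAsFactors = FALSE
)

df_new <- df
for (i in seq_len(nrow(conversions))) {
  old <- conversions$old[i]
  new <- conversions$new[i]
  k   <- conversions$factor[i]
  # Create the converted column and place it right after its source column
  df_new <- df_new %>%
    mutate("{new}" := .data[[old]] * k, .after = all_of(old))
}

names(df_new)
```

Each pass adds one converted column next to its source, so the original column order is kept and no sorting is needed.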

Related

How to use a fulljoin on my dataframes and rename columns with the same name R

I have two dataframes and they both have the exact same column names, however the data in the columns is different in each dataframe. I am trying to join the two frames (as seen below) by a full join. However, the hard part for me is the fact that I have to rename the columns so that the columns corresponding to my one dataset have some text added to the end while adding different text to the end of the columns that correspond to the second data set.
combined_df <- full_join(any.drinking, binge.drinking, by = ?)
A look at one of my df's:
Without custom function and shorter:
df <- cbind(cars, cars)
colnames(df) <- c(paste0(colnames(cars), "_any"), paste0(colnames(cars), "_binge"))
Output:
> head(df)
speed_any dist_any speed_binge dist_binge
1 4 2 4 2
2 4 10 4 10
3 7 4 7 4
4 7 22 7 22
5 8 16 8 16
6 9 10 9 10
Certainly not the most elegant way but maybe it is what you want:
custom_bind <- function(df1, suffix1, df2, suffix2){
colnames(df1) <- paste(colnames(df1), suffix1, sep = "_")
colnames(df2) <- paste(colnames(df2), suffix2, sep = "_")
df <- cbind(df1, df2)
return(df)
}
custom_bind(cars, "any", cars, "binge")
I made it as a function in case you want to do it with other tables. If not then it is not necessary.
Output:
> head(custom_bind(cars, "any", cars, "binge"))
speed_any dist_any speed_binge dist_binge
1 4 2 4 2
2 4 10 4 10
3 7 4 7 4
4 7 22 7 22
5 8 16 8 16
6 9 10 9 10
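The answers above bind the columns side by side rather than joining. If the two data frames share a key column, the suffix argument of dplyr's full_join may do the renaming directly. The sketch below assumes a hypothetical key column called state and a single hypothetical value column called rate, since the question does not show the real data.

```r
library(dplyr)

# Hypothetical stand-ins for the asker's data frames
any.drinking   <- data.frame(state = c("A", "B"), rate = c(10, 20))
binge.drinking <- data.frame(state = c("B", "C"), rate = c(5, 7))

# Non-key columns that appear in both inputs get the given suffixes
combined_df <- full_join(any.drinking, binge.drinking,
                         by = "state", suffix = c("_any", "_binge"))
combined_df
```

Note that suffix only applies to columns with the same name in both inputs that are not used as join keys.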

Moving down columns in data frames in R

Suppose I have the next data frame:
df<-data.frame(step1=c(1,2,3,4),step2=c(5,6,7,8),step3=c(9,10,11,12),step4=c(13,14,15,16))
step1 step2 step3 step4
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16
and what I have to do is something like the following:
df2<-data.frame(col1=c(1,2,3,4,5,6,7,8,9,10,11,12),col2=c(5,6,7,8,9,10,11,12,13,14,15,16))
col1 col2
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
7 7 11
8 8 12
9 9 13
10 10 14
11 11 15
12 12 16
How can I do that? consider that more steps can be included (example, 20 steps).
Thanks!!
We can design a function to achieve this task. df_final is the final output. Note that bin is an argument that sets how many adjacent columns are stacked into each new column.
# A function to conduct data transformation
trans_fun <- function(df, bin = 3){
  # Calculate the number of new columns
  new_ncol <- (ncol(df) - bin) + 1
  # Create a list to store all data frames
  df_list <- lapply(1:new_ncol, function(num){
    return(df[, num:(num + bin - 1)])
  })
  # Convert each data frame to a vector
  dt_list2 <- lapply(df_list, unlist)
  # Convert dt_list2 to data frame
  df_final <- as.data.frame(dt_list2)
  # Set the column and row names of df_final
  colnames(df_final) <- paste0("col", 1:new_ncol)
  rownames(df_final) <- 1:nrow(df_final)
  return(df_final)
}
# Apply the trans_fun
df_final <- trans_fun(df)
df_final
col1 col2
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
7 7 11
8 8 12
9 9 13
10 10 14
11 11 15
12 12 16
Here is a method using dplyr and reshape2 - this assumes all of the columns are the same length.
library(dplyr)
library(reshape2)
Drop the last column from the dataframe
df[, 1:(ncol(df) - 1)] %>%
melt() %>%
dplyr::select(col1=value) -> col1
Drop the first column from the dataframe
df %>%
dplyr::select(-step1) %>%
melt() %>%
dplyr::select(col2=value) -> col2
Combine the dataframes
bind_cols(col1, col2)
This should do the work:
df2 <- data.frame(col1 = 1:(length(df$step1) + length(df$step2) + length(df$step3)))
df2$col1 <- c(df$step1, df$step2, df$step3)
df2$col2 <- c(df$step2, df$step3, df$step4)
Things to point out:
The important thing in the first line of the code is that the table must be created with the right number of rows (here 12, three stacked columns of 4 rows each)
Assigning to a column that does not exist creates one with that name
Deleting a column in R is done like this: df2$col <- NULL
Are you not just looking to do:
df2 <- data.frame(col1 = unlist(df[,-ncol(df)]),
col2 = unlist(df[,-1]))
rownames(df2) <- NULL
df2
col1 col2
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
7 7 11
8 8 12
9 9 13
10 10 14
11 11 15
12 12 16

Split a dataset into a list of dataframes with equal number of columns

I have a data set with 36 columns and single observation. I want to split it into a list with each dataframe having 3 columns and then rbind them into a single data frame.
I have been using the following code:
m=12
nc<-ncol(df)
df1<-lapply(split(as.list(df), cut(1:nc, m, labels = FALSE)), as.data.frame)
df1<-do.call("rbind",df1)
This code works, but the problem comes when I try to run it in a Shiny app.
Can someone suggest a replacement for the above code?
We can split the one row dataframe by generating a specific sequence
do.call("rbind", split(c(t(df)), rep(seq(1, ncol(df)/3), each = 3)))
where
rep(seq(1, ncol(df)/3), each = 3)
would generate
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8
9 9 9 10 10 10 11 11 11 12 12 12
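Because the data is a single row, the same reshaping can also be written without split by filling a three-column matrix row by row. This is an alternative technique rather than the answer's method, and the 36-column data frame below is a made-up stand-in for the asker's data.

```r
# Hypothetical one-row data frame with 36 columns
df <- as.data.frame(t(1:36))

# Approach from the answer above: split into groups of 3, then rbind
res_split <- do.call("rbind",
                     split(c(t(df)), rep(seq(1, ncol(df) / 3), each = 3)))

# Alternative: fill a 3-column matrix row by row
res_matrix <- matrix(unlist(df), ncol = 3, byrow = TRUE)

# Convert to a data frame if one is needed downstream
df1 <- as.data.frame(res_matrix)
dim(df1)
```

Both versions assume that the number of columns is a multiple of 3.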

R Looking up closest value in data.frame less than equal to another value

I have two data.frames, lookup_df and values_df. For each row in lookup_df I want to lookup the closest value in the values_df that is less than or equal to an index value.
Here's my code so far:
lookup_df <- data.frame(ids = 1:10)
values_df <- data.frame(idx = c(1,3,7), values = c(6,2,8))
What I'm wanting for the result_df is the following:
> result_df
ids values
1 1 6
2 2 6
3 3 2
4 4 2
5 5 2
6 6 2
7 7 8
8 8 8
9 9 8
10 10 8
I know how to do this with SQL fairly easily, but I'm curious whether there is a straightforward R way. I could iterate over the rows of lookup_df and then loop through the rows of values_df, but that is not computationally efficient. I'm open to using the dplyr library if someone knows how to use that to solve the problem.
If values_df is sorted by idx ascending, then findInterval will work:
lookup_df <- data.frame(ids = 1:10)
values_df <- data.frame(idx = c(1,3,7), values = c(6,2,8))
lookup_df$values <- values_df$values[findInterval(lookup_df$ids,values_df$idx)]
lookup_df
ids values
1 1 6
2 2 6
3 3 2
4 4 2
5 5 2
6 6 2
7 7 8
8 8 8
9 9 8
10 10 8
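One edge case worth guarding against: findInterval returns 0 for any id smaller than the smallest idx, and indexing with 0 silently drops that element, so the assignment would fail with a length mismatch. A minimal sketch of a guard follows; it assumes that such ids should get NA, and it uses a lookup_df that starts at 0 purely to trigger the case.

```r
lookup_df <- data.frame(ids = 0:10)  # 0 is below the smallest idx
values_df <- data.frame(idx = c(1, 3, 7), values = c(6, 2, 8))

# findInterval needs idx in ascending order
values_df <- values_df[order(values_df$idx), ]

pos <- findInterval(lookup_df$ids, values_df$idx)
pos[pos == 0] <- NA  # no idx <= id, so there is no value to look up

lookup_df$values <- values_df$values[pos]
lookup_df
```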

Excel OFFSET function in r

I am trying to simulate the OFFSET function from Excel. I understand that this can be done for a single value but I would like to return a range. I'd like to return a group of values with an offset of 1 and a group size of 2. For example, on row 4, I would like to have a group with values of column a, rows 3 & 2. Sorry but I am stumped.
Is it possible to add this result to the data frame as another column using cbind or similar? Alternatively, could I use this in a vectorized function so I could sum or mean the result?
Mockup Example:
> df <- data.frame(a=1:10)
> df
a
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
> #PROCESS
> df
a b
1 1 NA
2 2 (1)
3 3 (1,2)
4 4 (2,3)
5 5 (3,4)
6 6 (4,5)
7 7 (5,6)
8 8 (6,7)
9 9 (7,8)
10 10 (8,9)
This should do the trick:
df$b1 <- c(rep(NA, 1), head(df$a, -1))
df$b2 <- c(rep(NA, 2), head(df$a, -2))
Note that the result lives in two columns here, because an ordinary (atomic) data frame column holds one value per row (unless you resort to a list column or, less seriously, complex numbers). head with a negative argument drops that many elements from the tail; try head(1:10, -2). rep is repetition, c is concatenation. The <- assignment adds a new column if it's not there yet.
What Excel calls OFFSET is sometimes also referred to as lag.
EDIT: Following Greg Snow's comment, here's a version that's more elegant, but also more difficult to understand:
df <- cbind(df, as.data.frame((embed(c(NA, NA, df$a), 3))[,c(3,2)]))
Try it component by component to see how it works.
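The question also asks whether the offset group can be summed or averaged. With the two lag columns in place, rowSums and rowMeans give a vectorized way to do that. This is a sketch built on the b1 and b2 columns from the first snippet; the names b_sum and b_mean are chosen here for illustration.

```r
df <- data.frame(a = 1:10)

# Lag columns: the values one and two rows above
df$b1 <- c(rep(NA, 1), head(df$a, -1))
df$b2 <- c(rep(NA, 2), head(df$a, -2))

# Row-wise sum and mean of each group; rows with an NA stay NA
df$b_sum  <- rowSums(df[, c("b1", "b2")])
df$b_mean <- rowMeans(df[, c("b1", "b2")])
df
```

Passing na.rm = TRUE to rowSums or rowMeans would instead use whatever values are available in the first rows.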
Do you want something like this?
> df <- data.frame(a=1:10)
> b=t(sapply(1:10, function(i) c(df$a[(i+2)%%10+1], df$a[(i+4)%%10+1])))
> s = sapply(1:10, function(i) sum(b[i,]))
> df = data.frame(df, b, s)
> df
a X1 X2 s
1 1 4 6 10
2 2 5 7 12
3 3 6 8 14
4 4 7 9 16
5 5 8 10 18
6 6 9 1 10
7 7 10 2 12
8 8 1 3 4
9 9 2 4 6
10 10 3 5 8
