Rename multiple columns with series index using dplyr in R

My data frame looks like this
X0 <- c(11,2,3,4)
X1 <- c(10,2,3,4)
X2 <- c(8,2,3,4)
X3 <- c(4,6,3,4)
test <- data.frame(X0,X1,X2,X3)
X0 X1 X2 X3
1 11 10 8 4
2 2 2 2 6
3 3 3 3 3
4 4 4 4 4
I would like to rename the first three columns using the character "t" and the series 0:2.
I want my data frame to look like this
t0 t1 t2 X3
1 11 10 8 4
2 2 2 2 6
3 3 3 3 3
4 4 4 4 4
EDIT
It works like this
test %>%
rename_at(vars(X0:X2), list(~paste0("t", 0:2)))

Or using rename_with
library(dplyr)
library(stringr)
test %>%
rename_with(~ str_c('t', 0:2), X0:X2)

Here is a data.table option with setnames
setnames(setDT(test),1:3,function(v) gsub("X","t",v))
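For comparison, the same rename can be sketched in base R with no packages. This assumes the columns to rename are the first three; the helper name idx is only illustrative.

```r
# Rebuild the example data frame from the question
test <- data.frame(X0 = c(11, 2, 3, 4),
                   X1 = c(10, 2, 3, 4),
                   X2 = c(8, 2, 3, 4),
                   X3 = c(4, 6, 3, 4))

# Positions of the columns to rename (assumed here to be the first three)
idx <- 1:3

# Build "t0", "t1", "t2" from the positions and assign them in place
names(test)[idx] <- paste0("t", idx - 1)

names(test)
```

Unlike the dplyr and data.table versions, this modifies test directly rather than returning a renamed copy.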

Related

Add ID column to a list of data frames

I have a list of 142 data frames, file_content, and a list created with id_list <- list(as.character(1:length(file_content)))
I am trying to add a new column period to each data frame in file_content.
All data frames are similar to 2021-03-16 below.
`2021-03-16` <- file_content[[1]] # take a look at 1/142 dataframes in file_content
head(`2021-03-16`)
author_id created_at id tweet
1 3.304380e+09 2018-12-01 22:58:55+00:00 1.069003e+18 #Acosta I hope he didn’t really say “muck”.
2 5.291559e+08 2018-12-01 22:57:31+00:00 1.069003e+18 #Acosta I like Mattis, but why does he only speak this way when Individual-1 isn't around?
3 2.195313e+09 2018-12-01 22:56:41+00:00 1.069002e+18 #Acosta What did Mattis say about the informal conversation between Trump and Putin at the G20?
4 3.704188e+07 2018-12-01 22:56:41+00:00 1.069002e+18 #Acosta Good! Tree huggers be damned!
5 1.068995e+18 2018-12-01 22:56:11+00:00 1.069002e+18 #Acosta #NinerMBA_01
6 9.983321e+17 2018-12-01 22:55:13+00:00 1.069002e+18 #Acosta Really?
I have tried to add the period column using the following code but it adds all 142 values from the id_list to every row in every data frame in file_content.
for (id in length(id_list)) {
file_content <- lapply(file_content, function(x) { x$period <- paste(id_list[id], sep = "_"); x })
}
You were close; the mistake is that you need double brackets, as in id_list[[id]].
for (id in length(id_list)) {
file_content <- lapply(file_content, function(x) {
x$period <- paste(id_list[[id]], sep = "_")
x
})
}
# $`1`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
#
# $`2`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
#
# $`3`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
You could also try Map() and save a few lines.
Map(`[<-`, file_content, 'period', value=id_list)
# $`1`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
#
# $`2`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
#
# $`3`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
Data:
file_content <- replicate(3, data.frame(matrix(1:12, 3, 4)), simplify=FALSE) |> setNames(1:3)
id_list <- list(as.character(1:length(file_content)))
We may use imap
library(purrr)
library(dplyr)
imap(file_content, ~ .x %>%
mutate(period = .y))
Or with Map from base R
Map(cbind, file_content, period = names(file_content))
In the OP's code, the id_list is created as a single list element by wrapping with list i.e.
list(1:5)
vs
as.list(1:5)
Here, we don't need to convert to list as a vector is enough
id_list <- seq_along(file_content)
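The difference between list() and as.list() is easy to check on a small vector. A minimal sketch; the names ids, wrapped and split_up are illustrative only:

```r
ids <- as.character(1:5)

wrapped  <- list(ids)     # one element holding the whole vector
split_up <- as.list(ids)  # five elements, one per id

length(wrapped)   # number of list elements in the wrapped version
length(split_up)  # number of list elements in the split version

wrapped[[1]]      # the full character vector
split_up[[1]]     # only the first id
```

This is why indexing the question's id_list with [[1]] returns all ids at once.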
Also, the for loop iterates over a single value, i.e. the one number returned by length
for (id in length(id_list)) {
^^
Instead, it should iterate over 1:length(id_list) or, more safely, seq_along(id_list). In addition, the assignment should be on the single list element file_content[[id]] and not on the entire list
for(id in seq_along(id_list)) {
file_content[[id]]$period <- id_list[id]
}
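The two loop headers can be compared side by side. This is a sketch on the three-element sample data rather than the 142 real data frames; visited_wrong is an illustrative name.

```r
file_content <- replicate(3, data.frame(matrix(1:12, 3, 4)), simplify = FALSE)
id_list <- seq_along(file_content)

# length() yields a single number, so this loop body runs only once
visited_wrong <- c()
for (id in length(id_list)) visited_wrong <- c(visited_wrong, id)

# seq_along() yields one index per element, so every data frame is visited
for (id in seq_along(id_list)) {
  file_content[[id]]$period <- id_list[id]
}

visited_wrong
sapply(file_content, function(x) unique(x$period))
```

After the second loop each data frame carries exactly one period value, which matches the intent of the question.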

variable names in for loop

x_names <-c("x1","x2","x3")
data <- c(1,2,3,4)
fake <- c(2,3,4,5)
for (i in x_names)
{
x = fake
data = as.data.frame(cbind(data,x))
#data <- data %>% rename(x_names = x)
}
I made a toy example. This code will generate a data frame with 1 column called data, and 3 columns called x. Instead of calling the columns x, I want them to be named x1, x2, x3 (stored in x_names). I put a rename using x_names in the code (commented out), but it does not work. Could you help me with it?
We can also use map_dfc from tidyverse:
library(tidyverse)
cbind(data, map_dfc(x_names, ~ tibble(!!.x := fake)))
Output:
data x1 x2 x3
1 1 2 2 2
2 2 3 3 3
3 3 4 4 4
4 4 5 5 5
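The !!.x := pattern above unquotes a string into a column name. The same idea can be applied to the commented-out rename() line from the question. This is a sketch that assumes dplyr is installed and that data starts out as a one-column data frame:

```r
library(dplyr)

x_names <- c("x1", "x2", "x3")
data <- data.frame(data = c(1, 2, 3, 4))
fake <- c(2, 3, 4, 5)

for (i in x_names) {
  # Add the new column under the temporary name x ...
  data <- cbind(data, x = fake)
  # ... then rename it: !!i := x uses the value of i as the new name,
  # whereas rename(x_names = x) would name the column literally "x_names"
  data <- data %>% rename(!!i := x)
}

names(data)
```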
We can avoid the for loop and use replicate to repeat fake data using setNames to name the dataframe with x_names.
cbind(data, setNames(data.frame(replicate(length(x_names), fake)), x_names))
# data x1 x2 x3
#1 1 2 2 2
#2 2 3 3 3
#3 3 4 4 4
#4 4 5 5 5
Ideally one should avoid growing objects in a loop, however one way to solve OP's problem in loop is
for (i in seq_along(x_names)) {
data = cbind.data.frame(data, fake)
names(data)[i + 1] <- x_names[i]
}
An option is just to assign the 'fake' to create the new columns in base R
data[x_names] <- fake
data
# data x1 x2 x3
#1 1 2 2 2
#2 2 3 3 3
#3 3 4 4 4
#4 4 5 5 5
EDIT: Based on comments from @avid_useR
Data (the assignment above assumes data is a data frame):
data <- data.frame(data)
If you replace your commented-out line
#data <- data %>% rename(x_names = x)
with
colnames(data)[ncol(data)] <- i
it should set the right column names.
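With that replacement, the loop from the question might look like this (a sketch using only base R):

```r
x_names <- c("x1", "x2", "x3")
data <- c(1, 2, 3, 4)
fake <- c(2, 3, 4, 5)

for (i in x_names) {
  x <- fake
  # Append x as a new last column
  data <- as.data.frame(cbind(data, x))
  # Rename the column that was just added, using the current loop value
  colnames(data)[ncol(data)] <- i
}

names(data)
```

Because the new column is always the last one, ncol(data) points at it on every iteration.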

How to split a list and save objects individually?

I am trying to add a new column to multiple data frames, and then replace the original data frame with the new one. This is how I am creating the new data frames:
df1 <- data.frame(X1=c(1,2,3),X2=c(1,2,3))
df2 <- data.frame(X1=c(4,5,6),X2=c(4,5,6))
groups <- list(df1,df2)
groups <- lapply(groups,function(x) cbind(x,X3=x[,1]+x[,2]))
groups
[[1]]
X1 X2 X3
1 1 1 2
2 2 2 4
3 3 3 6
[[2]]
X1 X2 X3
1 4 4 8
2 5 5 10
3 6 6 12
I'm satisfied with how the new data frames have been created. What I'm stuck on is then breaking up my groups list and then saving the list elements back into their respective original data frames.
Desired Output
Essentially, I want to do something like df1,df2 <- groups[[1]],groups[[2]] but that is of course not syntactically valid. I have more than 2 data frames, which is why I'm hoping for a more programmatic approach than simply typing out N lines of code.
for (i in 1:length(groups)){
assign(paste("df",i,sep=""),as.data.frame(groups[[i]]))
}
should do it. Try it out, please.
@Rockbar led me to a general solution as well:
for(i in 1:length(groups)){
assign(names(groups)[i],as.data.frame(groups[[i]]))
}
> df1
X1 X2 X3
1 1 1 2
2 2 2 4
3 3 3 6
> df2
X1 X2 X3
1 4 4 8
2 5 5 10
3 6 6 12
I should note that this only works if the objects in the list are all named. Thank you again @Rockbar for guiding me to this.
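As an alternative to the loop, base R's list2env() copies every element of a named list into an environment in one call. A minimal sketch, assuming the list elements are named after the original data frames (the names are added here when the list is built):

```r
df1 <- data.frame(X1 = c(1, 2, 3), X2 = c(1, 2, 3))
df2 <- data.frame(X1 = c(4, 5, 6), X2 = c(4, 5, 6))

# Name the elements so each one can be matched back to its source object
groups <- list(df1 = df1, df2 = df2)
groups <- lapply(groups, function(x) cbind(x, X3 = x[, 1] + x[, 2]))

# Overwrite df1 and df2 in the current environment with the updated versions
invisible(list2env(groups, envir = environment()))

df1
df2
```

lapply() keeps the element names, so the updated data frames land back under df1 and df2.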

R - Output of aggregate and range gives 2 columns for every column name - how to restructure?

I am trying to produce a summary table showing the range of each variable by group. Here is some example data:
df <- data.frame(group=c("a","a","b","b","c","c"), var1=c(1:6), var2=c(7:12))
group var1 var2
1 a 1 7
2 a 2 8
3 b 3 9
4 b 4 10
5 c 5 11
6 c 6 12
I used the aggregate function like this:
df_range <- aggregate(df[,2:3], list(df$group), range)
Group.1 var1.1 var1.2 var2.1 var2.2
1 a 1 2 7 8
2 b 3 4 9 10
3 c 5 6 11 12
The output looked normal, but the dimensions are 3x3 instead of 3x5 and there are only 3 names:
names(df_range)
[1] "Group.1" "var1" "var2"
How do I get this back to the normal data frame structure with one name per column? Or alternatively, how do I get the same summary table without using aggregate and range?
That is the documented behavior: because range returns two values per group, aggregate stores each result as a matrix column within the data frame. You can undo the effect with:
newdf <- do.call(data.frame, df_range)
# Group.1 var1.1 var1.2 var2.1 var2.2
#1 a 1 2 7 8
#2 b 3 4 9 10
#3 c 5 6 11 12
dim(newdf)
#[1] 3 5
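To see why the dimensions look wrong, it can help to inspect the columns directly. A small sketch that rebuilds the example data (flat is an illustrative name):

```r
df <- data.frame(group = c("a", "a", "b", "b", "c", "c"), var1 = 1:6, var2 = 7:12)
df_range <- aggregate(df[, 2:3], list(df$group), range)

# Each summary column is itself a two-column matrix (min, max)
is.matrix(df_range$var1)
dim(df_range$var1)

# do.call(data.frame, ...) spreads those matrices into ordinary columns
flat <- do.call(data.frame, df_range)
sapply(flat, is.matrix)
dim(flat)
```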
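Here's an approach using dplyr (it returns the width of the range, max minus min, for each variable):
library(dplyr)
df %>%
group_by(group) %>%
# summarise_each() and funs() are deprecated; across() is the current equivalent
summarise(across(c(var1, var2), ~ max(.x) - min(.x)))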
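The dplyr answer above returns max minus min for each variable. If separate minimum and maximum columns are wanted instead, closer to the aggregate output, a sketch with across() and a named list of functions could look like this (assumes dplyr 1.0.0 or later; range_tbl is an illustrative name):

```r
library(dplyr)

df <- data.frame(group = c("a", "a", "b", "b", "c", "c"), var1 = 1:6, var2 = 7:12)

# One output column per variable and function, named <variable>_<function>
range_tbl <- df %>%
  group_by(group) %>%
  summarise(across(c(var1, var2), list(min = min, max = max)))

names(range_tbl)
dim(range_tbl)
```

Every column here is an ordinary vector, so no flattening step is needed afterwards.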

Convert Univariate Contingency Table to Data Frame in R

I have a univariate contingency table that I would like to convert to a data frame.
>t <- table(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4))
>t
1 2 3 4
4 4 4 4
But converting t to a data frame yields something I don't need:
>data.frame(t)
Var1 Freq
1 1 4
2 2 4
3 3 4
4 4 4
I would like a data frame that looks exactly like the table t, with 4 columns named 1, 2, 3 and 4 (or X1, X2, X3, X4), and one row. Everything I have found, such as as.data.frame.matrix(), returns errors for me, I think because my data is univariate and not multivariate.
We can use as.data.frame.list()
tbl <- table(rep(1:4, 4))
as.data.frame.list(tbl)
# X1 X2 X3 X4
# 1 4 4 4 4
Or to use the original names, add optional = TRUE
as.data.frame.list(tbl, optional = TRUE)
# 1 2 3 4
# 1 4 4 4 4
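The same one-row layout can also be built by hand, which makes each step explicit. A minimal base R sketch (wide is an illustrative name):

```r
tbl <- table(rep(1:4, 4))

# Strip the table attributes, lay the counts out as a single row,
# then restore the original category labels as column names
wide <- setNames(data.frame(t(as.vector(tbl))), names(tbl))

dim(wide)
names(wide)
```

Column names such as "1" are not syntactic in R, so they need backticks when accessed with $, for example wide$`1`.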
