Merging every nth column from different data frames into text files - r

I have three data frames with 50 columns. I would like to write 50 text files in a way that the first text file should contain the first columns from the three data frames, the second text file should contain the second columns from the data frames, and so on. How can I use the column headers, which are the same in the three data frames, as the names of the output text files? Kindly suggest an R solution.
df1 <- data.frame(C1=c(1,2),C2=c(3,4))
df2 <- data.frame(C1=c(5,6),C2=c(7,8))
df3 <- data.frame(C1=c(9,10),C2=c(11,12))

Sample data, a list of three identically-structured frames:
frames <- list(
frame1 = data.frame(x=1:3, y=4:6, z=7:9),
frame2 = data.frame(x=11:13, y=14:16, z=17:19),
frame3 = data.frame(x=21:23, y=24:26, z=27:29))
If instead you have three individual frames (not a list of them), create a named list, so that the names carry through as column headers:
frames <- list(df1 = df1, df2 = df2, df3 = df3)
Regrouping by column (a transpose of sorts):
newframes <- lapply(seq_len(ncol(frames[[1]])), function(i) {
as.data.frame(lapply(frames, `[[`, i))
})
names(newframes) <- names(frames[[1]])
newframes
# $x
# frame1 frame2 frame3
# 1 1 11 21
# 2 2 12 22
# 3 3 13 23
# $y
# frame1 frame2 frame3
# 1 4 14 24
# 2 5 15 25
# 3 6 16 26
# $z
# frame1 frame2 frame3
# 1 7 17 27
# 2 8 18 28
# 3 9 19 29
Writing to files:
for (nm in names(newframes)) write.csv(newframes[[nm]], paste0(nm, ".csv"))
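Since the question asks for text files, and write.csv writes row names by default, here is a minimal sketch of the same idea using write.table on the question's own sample frames. The output directory (out_dir, set to tempdir()) and the tab separator are assumptions, not part of the original answer.

```r
# Sample data mirroring the three frames from the question
df1 <- data.frame(C1 = c(1, 2), C2 = c(3, 4))
df2 <- data.frame(C1 = c(5, 6), C2 = c(7, 8))
df3 <- data.frame(C1 = c(9, 10), C2 = c(11, 12))
frames <- list(df1 = df1, df2 = df2, df3 = df3)

# Assumption: files go to a temporary directory; point this elsewhere as needed
out_dir <- tempdir()

for (nm in names(frames[[1]])) {
  # Pull column `nm` out of every frame and bind the pieces side by side
  combined <- as.data.frame(lapply(frames, `[[`, nm))
  # One tab-separated text file per column header, without row names
  write.table(combined, file.path(out_dir, paste0(nm, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
```

Each file is named after a column header (C1.txt, C2.txt) and holds one column per source frame.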
This implementation uses the name of the original frame as each column name, which is less ambiguous (in my opinion) than giving every column the same name (x,x,x, y,y,y, etc.). If you really need repeated names (and don't mind that R will complain about duplicates in any follow-on work), then rename the columns before writing:
newframes <- Map(function(nm, x) `colnames<-`(x, rep(nm, ncol(x))),
names(newframes), newframes)
newframes
# $x
# x x x
# 1 1 11 21
# 2 2 12 22
# 3 3 13 23
# $y
# y y y
# 1 4 14 24
# 2 5 15 25
# 3 6 16 26
# $z
# z z z
# 1 7 17 27
# 2 8 18 28
# 3 9 19 29

Related

How to use a fulljoin on my dataframes and rename columns with the same name R

I have two dataframes and they both have the exact same column names, however the data in the columns is different in each dataframe. I am trying to join the two frames (as seen below) by a full join. However, the hard part for me is the fact that I have to rename the columns so that the columns corresponding to my one dataset have some text added to the end while adding different text to the end of the columns that correspond to the second data set.
combined_df <- full_join(any.drinking, binge.drinking, by = ?)
A look at one of my df's:
Without custom function and shorter:
df <- cbind(cars, cars)
colnames(df) <- c(paste0(colnames(cars), "_any"), paste0(colnames(cars), "_binge"))
Output:
> head(df)
speed_any dist_any speed_binge dist_binge
1 4 2 4 2
2 4 10 4 10
3 7 4 7 4
4 7 22 7 22
5 8 16 8 16
6 9 10 9 10
Certainly not the most elegant way but maybe it is what you want:
custom_bind <- function(df1, suffix1, df2, suffix2){
colnames(df1) <- paste(colnames(df1), suffix1, sep = "_")
colnames(df2) <- paste(colnames(df2), suffix2, sep = "_")
df <- cbind(df1, df2)
return(df)
}
custom_bind(cars, "any", cars, "binge")
I made it as a function in case you want to do it with other tables. If not then it is not necessary.
Output:
> head(custom_bind(cars, "any", cars, "binge"))
speed_any dist_any speed_binge dist_binge
1 4 2 4 2
2 4 10 4 10
3 7 4 7 4
4 7 22 7 22
5 8 16 8 16
6 9 10 9 10
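Both answers above bind the frames side by side rather than joining them. If the two frames share a key column, base R's merge can do a full join and add the suffixes in one call. This is only a sketch: the key column state, the column rate, and all values are hypothetical, because the original data is not shown. (dplyr's full_join accepts a similar suffix argument.)

```r
# Hypothetical stand-ins for the asker's two data frames
any_drinking   <- data.frame(state = c("AL", "AK", "AZ"), rate = c(10, 20, 30))
binge_drinking <- data.frame(state = c("AK", "AZ", "CA"), rate = c(5, 6, 7))

# all = TRUE keeps unmatched rows from both sides (a full join);
# suffixes are appended to non-key columns that share a name
combined_df <- merge(any_drinking, binge_drinking,
                     by = "state", all = TRUE,
                     suffixes = c("_any", "_binge"))
combined_df
```

Rows present in only one frame get NA in the other frame's columns.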

How to merge subsets of data frames from a list (i.e., merge all of the first dfs from each list component)

I have seen a number of answers as to how to merge dataframes from a list when each list element is a single data frame. However, in my case, each list element contains two data frames. I want to merge all of the first together and all of the second. As a dummy example:
lst<-list()
lst[[1]]<-list(data.frame(cat=c(1:5), type=c(11:15)), data.frame(group=c("A","B","C"), num=c(1:3)))
lst[[2]]<-list(data.frame(cat=c(22:26), type=c(50:54)), data.frame(group=c("H","I","J"), num=c(7:9)))
I want to merge the first elements together and the second elements together, to yield two data frames:
df1:
cat type
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 22 50
7 23 51
8 24 52
9 25 53
10 26 54
df2:
group num
1 A 1
2 B 2
3 C 3
4 H 7
5 I 8
6 J 9
I am sure there is some straightforward way to do this (somehow with do.call and rbind??) but I cannot figure out how to reference the various elements within each list properly.
Clearly with this small example I could just do it manually by:
df1<-rbind(lst[[1]][[1]], lst[[2]][[1]])
However, my actual list includes hundreds of data frames. I can do it by creating a loop and rbinding in one at a time sequentially, but I'm sure there is a more efficient way...Thanks for any help!
You can use Reduce to rbind the data frames. Reduce takes two elements of the list at a time and combines them into one using the function you supply. Here each element is itself a pair of data frames that must be bound position by position, so use Map(rbind, x, y) as the reducing function:
Reduce(function(x, y) Map(rbind, x, y), lst)
# [[1]]
# cat type
# 1 1 11
# 2 2 12
# 3 3 13
# 4 4 14
# 5 5 15
# 6 22 50
# 7 23 51
# 8 24 52
# 9 25 53
# 10 26 54
# [[2]]
# group num
# 1 A 1
# 2 B 2
# 3 C 3
# 4 H 7
# 5 I 8
# 6 J 9
Or maybe a faster way:
lapply(1:2, function(i) do.call(rbind, lapply(lst, `[[`, i)))
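The hard-coded 1:2 above can be replaced with seq_along so the same approach works for any number of data frames per list element. A minimal sketch on the question's dummy data, assuming every element of lst holds the same number of frames in the same order:

```r
# Dummy data from the question: each list element holds two data frames
lst <- list(
  list(data.frame(cat = 1:5, type = 11:15),
       data.frame(group = c("A", "B", "C"), num = 1:3)),
  list(data.frame(cat = 22:26, type = 50:54),
       data.frame(group = c("H", "I", "J"), num = 7:9))
)

# For each position i, collect the i-th frame of every element and rbind them
merged <- lapply(seq_along(lst[[1]]), function(i) {
  do.call(rbind, lapply(lst, `[[`, i))
})
```

merged[[1]] then holds all the cat/type rows and merged[[2]] all the group/num rows.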

Create multiple data frames from one based off values with a for loop

I have a large data frame that I would like to convert into smaller subset data frames using a for loop. I want the new data frames to be based on the values in a column of the large/parent data frame. Here is an example:
x<- 1:20
y <- c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","C","C","C")
df <- data.frame(x, y)  # as.data.frame(cbind(x, y)) would coerce x to character
OK, now I want three data frames: the first with columns x and y but only where y == "A", the second where y == "B", and so on. The end result would be three new data frames, df.A, df.B, and df.C. I realize this would be easy to do without a for loop, but my actual data has a lot of levels of y, so using a for loop (or similar) would be nice.
Thanks!
If you want to create separate objects in a loop, you can use assign. I used unique because you said you had many levels.
for(i in unique(df$y)) {
nam <- paste("df", i, sep = ".")
assign(nam, df[df$y==i,])
}
> df.A
x y
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
7 7 A
8 8 A
> df.B
x y
9 9 B
10 10 B
11 11 B
12 12 B
13 13 B
14 14 B
I think you just need the split function:
split(df, df$y)
$A
x y
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
7 7 A
8 8 A
$B
x y
9 9 B
10 10 B
11 11 B
12 12 B
13 13 B
14 14 B
15 15 B
16 16 B
17 17 B
$C
x y
18 18 C
19 19 C
20 20 C
It is then just a matter of subsetting the output of split and storing the pieces in objects, e.g. dfA <- split(df, df$y)[[1]], dfB <- split(df, df$y)[[2]], and so on.
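With many levels, it may be simpler to keep the result of split as a named list and index it by level. The sketch below rebuilds the question's data; the final list2env step, which creates df.A, df.B, and df.C in the global environment, is an optional extra and not part of the original answers.

```r
# Rebuild the example data: 8 A's, 9 B's, 3 C's
df <- data.frame(x = 1:20, y = rep(c("A", "B", "C"), times = c(8, 9, 3)))

# One call splits the frame into a named list, one element per level of y
pieces <- split(df, df$y)

# Access a piece by name instead of by position
pieces[["B"]]

# Apply the same operation to every piece, here a row count per level
row_counts <- vapply(pieces, nrow, integer(1))

# Optional: create df.A, df.B, df.C as separate objects
list2env(setNames(pieces, paste0("df.", names(pieces))), envir = globalenv())
```

Keeping the list avoids cluttering the workspace and lets lapply or vapply process all subsets at once.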

automating a normal transformation function in R over multiple columns

I have a data frame m with:
>m
id w y z
1 2 5 8
2 18 5 98
3 1 25 5
4 52 25 8
5 5 5 4
6 3 3 5
Below is a general function for normally transforming a variable that I need to apply to columns w,y,z.
y<-qnorm((rank(x,na.last="keep")-0.5)/sum(!is.na(x)))
For example, if I wanted to run this function on "column w" to get the output column appended to dataframe "m" then:
m$w_n<-qnorm((rank(m$w,na.last="keep")-0.5)/sum(!is.na(m$w)))
Can someone help me automate this to run on multiple columns in data frame m?
Ideally, I would want an output data frame with the following columns:
id w y z w_n y_n z_n
Note this is a sample data frame, the one I have is much larger and I have more letter columns to run this function on other than w, y,z.
Thanks!
Probably a way to do it in a single step, but what about:
df <- data.frame(id = 1:6, w = sample(50, 6), z = sample(50, 6) )
df
id w z
1 1 39 40
2 2 20 26
3 3 43 11
4 4 4 37
5 5 36 24
6 6 27 14
transCols <- function(x) qnorm((rank(x,na.last="keep")-0.5)/sum(!is.na(x)))
tmpdf <- lapply(df[, -1], transCols)
names(tmpdf) <- paste0(names(tmpdf), "_n")
df_final <- cbind(df, tmpdf)
df_final
id w z w_n z_n
1 1 39 40 0.6744898 1.3829941
2 2 20 26 -0.6744898 0.2104284
3 3 43 11 1.3829941 -1.3829941
4 4 4 37 -1.3829941 0.6744898
5 5 36 24 0.2104284 -0.2104284
6 6 27 14 -0.2104284 -0.6744898
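The "single step" hinted at above can be done by assigning the transformed columns straight into the data frame. A minimal sketch using the question's data frame m, assuming every column except id should be transformed:

```r
# Data frame from the question
m <- data.frame(id = 1:6,
                w = c(2, 18, 1, 52, 5, 3),
                y = c(5, 5, 25, 25, 5, 3),
                z = c(8, 98, 5, 8, 4, 5))

# Rank-based normal transformation; NAs are kept as NA
transCols <- function(x) qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))

# Assumption: transform everything except the id column
cols <- setdiff(names(m), "id")

# Add w_n, y_n, z_n in one assignment
m[paste0(cols, "_n")] <- lapply(m[cols], transCols)
names(m)
```

Tied values share an averaged rank (the default of rank), so they also share a transformed value.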

Keep columns of a data frame based on a data frame

I have a data frame, called df, which contains 4000 values. I have a list of 1000 column numbers, in a data frame called list, which is 1000 rows by 1 column. How can I keep the rows with the numbers in list in the data frame df and throw the rest out. I already tried using:
listv <- as.vector(list)
and then using
dfnew <- df[,listv]
but I get the error
Error in .subset(x, j) : invalid subscript type 'list'
You're mixing up row and column subsetting: df[, listv] selects columns, whereas you want rows. Here is a minimal example:
df <- data.frame(matrix(1:21, ncol = 3))
df
# X1 X2 X3
# 1 1 8 15
# 2 2 9 16
# 3 3 10 17
# 4 4 11 18
# 5 5 12 19
# 6 6 13 20
# 7 7 14 21
list <- data.frame(V1 = c(1, 4, 6))
list
# V1
# 1 1
# 2 4
# 3 6
df[list[, 1], ]
# X1 X2 X3
# 1 1 8 15
# 4 4 11 18
# 6 6 13 20
df[unlist(list), ]
# X1 X2 X3
# 1 1 8 15
# 4 4 11 18
# 6 6 13 20
Note also that as.vector(list) doesn't create a vector, as you thought it would. You need unlist here (as I used in the last example).
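The difference between as.vector and unlist, and between row and column indexing, can be checked directly. In this sketch the index frame is called idx (a hypothetical name, chosen to avoid masking base R's list function) and holds made-up positions:

```r
df  <- data.frame(matrix(1:21, ncol = 3))
idx <- data.frame(V1 = c(1, 3))  # hypothetical positions to keep

# as.vector() on a data frame still yields a list, hence the subscript error
is.list(as.vector(idx))

# unlist() flattens it to a plain numeric vector that can be used as an index
pos <- unlist(idx, use.names = FALSE)
is.numeric(pos)

rows_kept <- df[pos, ]  # index before the comma: keep rows 1 and 3
cols_kept <- df[, pos]  # index after the comma: keep columns X1 and X3
```

Which of the two forms is needed depends on whether the numbers refer to rows or to columns of df.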
