How to stack columns of a data frame in R?

I have a data-frame with these characteristics:
Z Y X1 X2 X3 X4 X5 ... X30
A n1 1 2 1 2 1 2 1 2
B n2 1 2 1 2 1 2 1 2
C n3 1 2 1 2 1 2 1 2
D n4 1 2 1 2 1 2 1 2
.
.
.
My goal is to stack the columns X1, X2, … X30 and associate each stacked value with a label built from columns Z and Y and the name of the X column. Something like this:
Newcolumn zyx
1 x-y-z
... I need a data-frame like this:
column1 column2
1 A+n1+X1.headername 1
2 B+n2+X2.headername 2
3 C+n3+X3.headername 1
4 D+n4+X4.headername 2
. .
. .
. .
I’m trying to build a function, but I’m having some trouble.
I use this code on the data frame:
df$zy <- paste(df$z,"-",df$y)
After that, I eliminate the columns “z” and “y”:
df$z <- NULL
df$y <- NULL
Then I save column df$zy as a data frame for later use:
df_zy <- as.data.frame(df$zy)
Then I remove df$zy from the original data frame:
df$zy <- NULL
After that, I save column x1 as a data frame and combine it with df_zy and the name of column x1 (the name is “1”):
a <- as.data.frame(df$`1`)
b <- cbind(a, df_zy, x_column = 1)
b$zy <- paste(b$x_column,"-",b$` df$zy`)
b$` df$zy ` <- NULL
b$ x_column <- NULL
colnames(b)
names(b)[names(b) == "b$`1`"] <- "new_column"
This works, but only for column x1. I need it for x1 to x30, with all the new columns stacked.
Does anybody have a solution to this problem? Thanks!

You can use the tidyr and dplyr packages:
library(dplyr)
library(tidyr)
df_zy = df %>% pivot_longer(., cols = starts_with("X"), names_to = "Variables", values_to = "Value") %>%
mutate(NewColumn = paste0(Z,"-",Y,"-",Variables)) %>% select(NewColumn, Value)
And you get:
> df_zy
# A tibble: 8 x 2
NewColumn Value
<chr> <dbl>
1 A-n1-X1 1
2 A-n1-X2 2
3 B-n2-X1 1
4 B-n2-X2 2
5 C-n3-X1 1
6 C-n3-X2 2
7 D-n4-X1 1
8 D-n4-X2 2
Data
df = data.frame("Z" = LETTERS[1:4],
"Y" = c("n1","n2","n3","n4"),
"X1" = c(1,1,1,1),
"X2" = c(2,2,2,2))
Is this what you are looking for?
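For comparison, here is a minimal base R sketch of the same reshaping, assuming the columns to stack all start with "X" as in the sample data above. The names x_cols, stacked and result are only illustrative. Note that stack() goes column by column, so the rows come out ordered by X column rather than by row as with pivot_longer().

```r
# Sample data from the answer above
df <- data.frame(Z = LETTERS[1:4],
                 Y = c("n1", "n2", "n3", "n4"),
                 X1 = c(1, 1, 1, 1),
                 X2 = c(2, 2, 2, 2))

# Columns to stack (assumption: they all start with "X")
x_cols <- grep("^X", names(df), value = TRUE)

# stack() returns a data frame with `values` and `ind` (the source column name)
stacked <- stack(df[x_cols])

# Repeat Z and Y once per stacked column and build the label
result <- data.frame(
  NewColumn = paste(rep(df$Z, times = length(x_cols)),
                    rep(df$Y, times = length(x_cols)),
                    stacked$ind, sep = "-"),
  Value = stacked$values
)
result
```

The same code should work unchanged for X1 to X30, since x_cols is derived from the column names.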

Related

Move several chunks of columns dynamically to another position

My data is:
df <- data.frame(a = 1:2,
x = 1:2,
b = 1:2,
y = 3:4,
x_2 = 1:2,
y_2 = 3:4,
c = 1:2,
x_3 = 5:6,
y_3 = 1:2)
I now want to put together the x vars, and the y vars so that the order of columns would be:
a, x, x_2, x_3, b, y, y_2, y_3, c
I thought, I could use tidyverse's relocate function in combination with lapply or map or reduce (?), but it doesn't work out.
E.g. if I do:
move_names <- c("x", "y")
library(tidyverse)
moved_data <- lapply(as.list(move_names), function(x)
{
df <- df |>
relocate(!!!syms(paste0(x, "_", 2:3)),
.after = all_of(x))
}
)
It does the moving for x and y separately, but it returns a list of separate data frames, whereas I want just my original df with the columns relocated.
Update:
I should have been clear that my real data frame has ~500 columns where the to-be-moved columns are all over the place. So providing the full vector of desired column name order won't be feasible.
What I instead have: I have the names of my original columns, i.e. x and y, and I have the names of the to-be-moved columns, i.e. x_2, x_3, y_2, y_3.
In base R:
df[match(c('a', 'x', 'x_2', 'x_3', 'b', 'y', 'y_2', 'y_3', 'c'), names(df))]
#> a x x_2 x_3 b y y_2 y_3 c
#> 1 1 1 1 5 1 3 3 1 1
#> 2 2 2 2 6 2 4 4 2 2
Not sure if it's what you want.
Vector with order of column names
Let's say you have a vector relocate_name that contains the order of your columns:
library(tidyverse)
relocate_name <- c("a", "x", "x_2", "x_3", "b", "y", "y_2", "y_3", "c")
df %>% relocate(any_of(relocate_name))
Vector with prefix of column names
Or if you only have the prefix of the order, let's call it relocate_name2:
relocate_name2 <- c("a", "x", "b", "y", "c")
df %>% relocate(starts_with(relocate_name2))
Group x and y together
Or if you only want to "group" x and y together:
df %>%
relocate(starts_with("x"), .after = "x") %>%
relocate(starts_with("y"), .after = "y")
Output
All of the above produce the same output.
a x x_2 x_3 b y y_2 y_3 c
1 1 1 1 5 1 3 3 1 1
2 2 2 2 6 2 4 4 2 2
library(rlist)
# split based in colname-part before _
L <- split.default(df, f = gsub("(.*)_.*", "\\1", names(df)))
# remove names with an underscore
# this is the new order, it should match the names of list L !!
neworder <- names(df)[!grepl("_", names(df))]
# [1] "a" "x" "b" "y" "c"
# cbind list elements together
ans <- rlist::list.cbind(L[neworder])
# a x.x x.x_2 x.x_3 b y.y y.y_2 y.y_3 c
# 1 1 1 1 5 1 3 3 1 1
# 2 2 2 2 6 2 4 4 2 2
# create tidy names again
names(ans) <- gsub(".*\\.(.*)", "\\1", names(ans))
# a x x_2 x_3 b y y_2 y_3 c
# 1 1 1 1 5 1 3 3 1 1
# 2 2 2 2 6 2 4 4 2 2
Ok, this is probably the worst workaround ever and I don't really understand what exactly I'm doing (especially with the <<-), but it does the trick.
My general idea after realizing the problem a bit more with the help of you guys here was to "loop" through both of my x and y names, remove these new _2 and _3 columns from the vector of column names and re-append them after their "base" x and y columns.
search_names <- c("x", "y")
df_names <- names(df)
new_names <- lapply(search_names, function(x)
{
start <- which(df_names == x)
without_new_names <- setdiff(df_names, paste0(x, "_", 2:3))
df_names <<- append(without_new_names, values = paste0(x, "_", 2:3), after = start)
})[[length(search_names)]]
df |>
relocate(any_of(new_names))
a x x_2 x_3 b y y_2 y_3 c
1 1 1 1 5 1 3 3 1 1
2 2 2 2 6 2 4 4 2 2
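The same idea can probably be written without <<- by using Reduce(), which carries the vector of column names from one prefix to the next. This is only a sketch under the assumptions of the question (base columns x and y, suffixes _2 and _3). The names new_order and to_move are illustrative, and the insert position is looked up after the moved columns have been removed.

```r
df <- data.frame(a = 1:2, x = 1:2, b = 1:2, y = 3:4,
                 x_2 = 1:2, y_2 = 3:4, c = 1:2,
                 x_3 = 5:6, y_3 = 1:2)

search_names <- c("x", "y")

# Reduce() feeds the updated name vector into the next iteration
new_order <- Reduce(function(nms, nm) {
  to_move <- paste0(nm, "_", 2:3)           # columns to move for this prefix
  rest <- setdiff(nms, to_move)             # all other columns, order kept
  append(rest, to_move, after = match(nm, rest))
}, search_names, names(df))

# Reorder the original data frame in one step
df[new_order]
```

Because only the names are shuffled and the data frame is indexed once at the end, this should scale to a few hundred columns without trouble.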

Rename specific column names in a data frame (in R)

I have a data frame where I would like to prefix some column names with "high_" or "low_". Columns X2 to X4 should get the "high_" prefix (e.g. high_X2) and columns X5 to X7 the "low_" prefix (e.g. low_X6).
Please see an example below.
X1 X2 X3 X4 X5 X6 X7
a 1 0 1 1 1 1 0
b 2 2 1 1 1 1 0
result
X1 high_X2 high_X3 high_X4 low_X5 low_X6 low_X7
a 1 0 1 1 1 1 0
b 2 2 1 1 1 1 0
You can use rep and paste -
names(df)[-1] <- paste(rep(c('high', 'low'), each = 3), names(df)[-1], sep = '_')
df
# X1 high_X2 high_X3 high_X4 low_X5 low_X6 low_X7
#a 1 0 1 1 1 1 0
#b 2 2 1 1 1 1 0
If you want to select the columns by range, the dplyr code is easier.
library(dplyr)
df %>%
rename_with(~paste('high', ., sep = '_'), X2:X4) %>%
rename_with(~paste('low', ., sep = '_'), X5:X7)
The base solution (which is more straightforward for this kind of thing, imo)
df <- data.frame(X1=c(a=1L,b=2L),
X2=c(a=0L,b=2L),
X3=c(a=1L,b=1L),
X4=c(a=1L,b=1L),
X5=c(a=1L,b=1L),
X6=c(a=1L,b=1L),
X7=c(a=1L,b=1L))
cn <- colnames(df)
cond <- as.integer(substr(cn,2L,nchar(cn))) %% 2L == 0L
colnames(df)[cond] <- paste0(cn[cond],"_is_pair")
A tidyverse solution (a bit more awkward due to the tidyeval)
library(dplyr)
library(stringr)
library(tidyselect)
df <- data.frame(X1=c(a=1L,b=2L),
X2=c(a=0L,b=2L),
X3=c(a=1L,b=1L),
X4=c(a=1L,b=1L),
X5=c(a=1L,b=1L),
X6=c(a=1L,b=1L),
X7=c(a=1L,b=1L))
is_pair <- function(vars = peek_vars(fn = "is_pair")) {
vars[as.integer(str_sub(vars,2L,nchar(vars))) %% 2L == 0L]
}
df %>% rename_with(~paste0(.x,"_is_pair"),
is_pair())

Adding column with information from another dataframe R

I have two data frames and I need to join information from them.
Here the first df where I have different points (1,2,3..):
eleno elety resno
1 N 1
2 CA 1
3 C 1
4 O 1
5 CB 1
6 CG 1
The second one indicates distances between points, "eleno" represents the first point and "ele2" the second one:
eleno ele2 values
<chr> <chr> <dbl>
1 2 1.46
1 3 2.46
1 4 2.86
1 5 2.46
1 6 3.83
1 7 4.47
I'd like to have a new column in the 1st df with info from df 2. For example, for point 1 I'd like to have -2 (second point):1.46 (distance), -3:2.46, -4:2.86 and so on, preferably in one column.
Something like this
eleno elety resno dist
1 N 1 -2:1.46, -3:2.46, -4:2.86 ...
2 CA 1
3 C 1
4 O 1
5 CB 1
6 CG 1
Thank you!
If I understand your preference for a single column, then one possibility without dplyr is as follows. First, we create the new column by concatenating the ele2 and values columns from df2 using the paste() function, with a colon as the separator:
new_column <- paste(-df2$ele2, df2$values, sep = ":")
Then, we use cbind() to bind it to df1:
new_df1 <- cbind(df1, ele2_values = new_column)
This will give us a new data frame like so:
eleno elety resno ele2_values
1 1 N 1 -2:1.46
2 2 CA 1 -3:2.46
3 3 C 1 -4:2.86
4 4 O 1 -5:2.46
5 5 CB 1 -6:3.83
6 6 CG 1 -7:4.47
Here is the data that I used, based on what you have given:
df1 <- data.frame(
eleno = 1:6,
elety = c("N", "CA", "C", "O", "CB", "CG"),
resno = rep(1, 6)
)
df2 <- data.frame(
eleno = rep(1, 6),
ele2 = 2:7,
values = c(1.46, 2.46, 2.86, 2.46, 3.83, 4.47)
)
If we want to get this column as a single element for each point, we can modify our code in the following manner:
Instantiate new_column as an empty vector:
new_column <- vector()
Then call some variant of *apply() or use a for loop to subset the original data frame by points, while applying our original code and appending our singular character elements back to new_column:
lapply(unique(df2$eleno), FUN = function(x) {
subset <- subset(df2, eleno == x)
new_elem <- paste(-subset$ele2, subset$values, sep = ":", collapse = ", ")
new_column <<- c(new_column, new_elem)
})
Once this operation is complete, we use cbind() as before to bind new_column to df1:
new_df1 <- cbind(df1, ele2_values = new_column)
Our output is as follows:
eleno elety resno ele2_values
1 1 N 1 -2:1.13703411305323, -3:6.22299404814839, -4:6.09274732880294, -5:6.23379441676661, -6:8.60915383556858, -7:6.40310605289415
2 2 CA 1 -2:0.094957563560456, -3:2.32550506014377, -4:6.66083758231252, -5:5.14251141343266, -6:6.93591291783378, -7:5.44974835589528
3 3 C 1 -2:2.82733583590016, -3:9.23433484276757, -4:2.92315840255469, -5:8.37295628152788, -6:2.86223284667358, -7:2.66820780001581
4 4 O 1 -2:1.86722789658234, -3:2.32225910527632, -4:3.16612454829738, -5:3.02693370729685, -6:1.59046002896503, -7:0.399959180504084
5 5 CB 1 -2:2.18799541005865, -3:8.10598552459851, -4:5.25697546778247, -5:9.14658166002482, -6:8.3134504687041, -7:0.45770263299346
6 6 CG 1 -2:4.56091482425109, -3:2.65186671866104, -4:3.04672203026712, -5:5.0730687007308, -6:1.81096208281815, -7:7.59670635452494
Here is my random data that I used for df2 in this case:
set.seed(1234)
df2 <- data.frame(
eleno = rep(1:6, rep(6, 6)),
ele2 = 2:7,
values = runif(length(rep(1:6, rep(6, 6)))) * 10
)
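As a variation on the steps above, here is a sketch that avoids <<- and matches the strings to df1 by eleno rather than by row position, using aggregate() and merge(). It assumes the small df1 and df2 from the first part of this answer. The names pair, dist_by_point and result are only illustrative.

```r
df1 <- data.frame(
  eleno = 1:6,
  elety = c("N", "CA", "C", "O", "CB", "CG"),
  resno = rep(1, 6)
)
df2 <- data.frame(
  eleno = rep(1, 6),
  ele2 = 2:7,
  values = c(1.46, 2.46, 2.86, 2.46, 3.83, 4.47)
)

# One "-ele2:value" string per row of df2
df2$pair <- paste0("-", df2$ele2, ":", df2$values)

# Collapse the strings to a single element per first point
dist_by_point <- aggregate(pair ~ eleno, data = df2, FUN = paste, collapse = ", ")
names(dist_by_point)[2] <- "dist"

# Left join on eleno so every row of df1 is kept
result <- merge(df1, dist_by_point, by = "eleno", all.x = TRUE)
result
```

With this data only point 1 has distances, so the other rows get NA in dist, which is close to the layout sketched in the question.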

R Split dataframe into dataframe and matrix after column X using dplyr

I am trying to split a dataframe vertically after certain column. Preferably by name. The first half of the split should remain a dataframe and the second should become a matrix. Here is an example.
pp <- rep(1:4,each=4)
cond <- rep(c("A","B"),each=2)
time <- rep(1:2,8)
value <- rnorm(16,1)
df <- data.frame(pp,cond,time,value)
as.data.frame(df %>%
pivot_wider(names_from = c(time), values_from = value))
pp cond 1 2
1 1 A 0.4121770 2.13178625
2 1 B 2.8638453 -0.64314357
3 2 A 2.2587738 1.74448028
4 2 B 0.2737670 0.89784427
5 3 A 0.5831763 2.37123498
6 3 B 0.5158274 1.40670718
7 4 A -0.6313988 1.06272354
8 4 B 2.0142500 0.01102302
Now I'd like to continue piping and split the cols pp and cond into a new dataframe and cols 1 and 2 into a matrix. Any suggestions?
You can try this :
tidyr::pivot_wider(df, names_from = time, values_from = value) %>%
split.default(rep(c(1, 2), each = 2)) -> data1
#change cols 1 and 2 into matrix.
data1[[2]] <- as.matrix(data1[[2]])
data1
#$`1`
# A tibble: 8 x 2
# pp cond
# <int> <chr>
#1 1 A
#2 1 B
#3 2 A
#4 2 B
#5 3 A
#6 3 B
#7 4 A
#8 4 B
#$`2`
# 1 2
#[1,] 1.4442871 1.43913039
#[2,] 2.0406232 1.48409939
#[3,] 0.7551162 1.91599206
#[4,] 1.8006224 0.06343097
#[5,] -0.4007874 1.16027754
#[6,] 0.7260376 0.01446089
#[7,] 1.0839307 -0.31999653
#[8,] 1.1612264 0.37507161
If you want the data as two separate objects instead of a list, selected by column name, try:
col1 <- c('pp', 'cond')
col2 <- c('1', '2')
df1 <- tidyr::pivot_wider(df, names_from = time, values_from = value)
data1 <- subset(df1, select = col1)
data2 <- subset(df1, select = col2)
Or
data1 <- df1 %>% dplyr::select(all_of(col1))
data2 <- df1 %>% dplyr::select(all_of(col2)) %>% as.matrix()
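If the goal is to split "after column X" by name without listing every column, a small base R sketch could look like the following. It assumes a wide data frame shaped like the pivot_wider() result above. The wide data here is built by hand, and split_after and cut_at are hypothetical names.

```r
set.seed(1)
# Hand-built stand-in for the pivot_wider() output (assumption)
wide <- data.frame(pp = rep(1:4, each = 2),
                   cond = rep(c("A", "B"), 4),
                   `1` = rnorm(8, 1),
                   `2` = rnorm(8, 1),
                   check.names = FALSE)

split_after <- "cond"                        # last column of the left part
cut_at <- match(split_after, names(wide))    # its position

left  <- wide[seq_len(cut_at)]               # stays a data frame
right <- as.matrix(wide[-seq_len(cut_at)])   # remaining columns as a matrix
```

Only the name of the boundary column is needed, so the number of columns on either side can change without touching the code.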

writing table from a list in R

I have a SNP file and I want to count how many times each value occurs in each column. When I write a table from the resulting list, I get the error "arguments imply differing number of rows". I want a solution so that I can write the list to a table.
Please help me.
Input file: attached as an image.
The input file contains 830 rows and 210 columns.
#1 R code
require(gdata)
library(plyr)
df = read.xls ("jTest_file.xlsx", sheet = 1, header = TRUE)
combine = c()
for(i in 1:v){
vec = count(df[,i])
colnames(vec) <- c (colnames(df[i]),"freq")
combine = c(combine,vec)
}
write.table(combine,file="test_output.xls",sep="\t",quote=FALSE,row.names =FALSE)
There are some blank values in the input, so I substituted the blanks with XX to keep the number of rows the same, but that did not work either.
#2 R code
require(gdata)
library(plyr)
df = read.xls ("jTest_file.xlsx", sheet = 1, header = TRUE)
combine = c()
for(i in 1:v){
data=sub("^$", "XX", df[,i])
vec = count(data)
colnames(vec) <- c (colnames(df[i]),"freq")
combine = c(combine,vec)
}
write.table(combine,file="test_output.xls",sep="\t",quote=FALSE,row.names =FALSE)
There is a much cleaner way to do these counts using the dplyr and tidyr packages.
Since you did not provide sample data, I will make some first:
#Make sample data
li = lapply(1:10, function(X) {
sample(x = c("A", "C", "G", "T"), size = 10,
replace = TRUE)
})
df = data.frame(li, stringsAsFactors = FALSE)
names(df) = paste("X", 1:10, sep = "")
head(df, 3)
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1 T G C T C A T T C T
# 2 A A A G G G T G C A
# 3 C C A T A A C A T G
Now the actual answer - doing the counts:
library(tidyr)
library(dplyr)
df_long = gather(df, var, value)
df_groups = group_by(df_long, var, value)
df_counts = summarise(df_groups, count = n())
df_wide = spread(df_counts, value, count, fill = 0)
df_wide
# Source: local data frame [10 x 5]
# Groups: var [10]
#
# var A C G T
# * <chr> <dbl> <dbl> <dbl> <dbl>
# 1 X1 3 4 0 3
# 2 X10 5 0 2 3
# 3 X2 3 2 2 3
# 4 X3 4 3 1 2
# 5 X4 2 1 4 3
# 6 X5 2 3 3 2
# 7 X6 4 2 1 3
# 8 X7 2 4 2 2
# 9 X8 2 3 2 3
# 10 X9 2 2 2 4
I encourage you to explore individual steps (df_long, df_groups, df_counts, df_wide). This will give you a sense of what is going on with the data.
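For reference, the same counts can also be sketched in base R with table(). Fixing the factor levels up front means every column yields a count vector of the same length, which is what the original loop was missing when it hit "arguments imply differing number of rows". The vector bases and the seed are assumptions made for the example.

```r
set.seed(42)
# Sample data built as above
li <- lapply(1:10, function(i) {
  sample(c("A", "C", "G", "T"), size = 10, replace = TRUE)
})
df <- data.frame(li, stringsAsFactors = FALSE)
names(df) <- paste("X", 1:10, sep = "")

# Fixed set of values to count (assumption: only these four occur)
bases <- c("A", "C", "G", "T")

# One row per column of df, one column per value
counts <- t(sapply(df, function(col) table(factor(col, levels = bases))))
counts
```

The resulting matrix can be passed to write.table() directly. If blanks should be counted too, the empty string could be added to bases.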
