Merging all the columns in R with column names - r

I have the following data
> df
X1 X2 X3
1 3 4
1 0 0
1 1 0
and I want to merge all the columns so that the final output will be
new colName
1 X1
1 X1
1 X1
3 X2
0 X2
1 X2
4 X3
0 X3
0 X3

You can try stack
> setNames(stack(df),c("new","colName"))
new colName
1 1 X1
2 1 X1
3 1 X1
4 3 X2
5 0 X2
6 1 X2
7 4 X3
8 0 X3
9 0 X3
Data
> dput(df)
structure(list(X1 = c(1L, 1L, 1L), X2 = c(3L, 0L, 1L), X3 = c(4L,
0L, 0L)), class = "data.frame", row.names = c(NA, -3L))
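For readers who want to see what stack is doing, here is a minimal base R sketch of the same reshaping done by hand. It assumes all columns share a compatible type; the name long is only illustrative. Unlike stack, which returns the labels as a factor, this version keeps them as character strings in R 4.0 or later.

```r
# Same data as in the dput() output above
df <- data.frame(X1 = c(1L, 1L, 1L), X2 = c(3L, 0L, 1L), X3 = c(4L, 0L, 0L))

# unlist() concatenates the columns in order; rep() repeats each column
# name once per row so the labels line up with the values
long <- data.frame(
  new     = unlist(df, use.names = FALSE),
  colName = rep(names(df), each = nrow(df))
)
long
```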

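library(tidyverse)
# names_to/values_to give the requested column names; note that the rows
# come out row by row (X1, X2, X3, X1, ...), so sort by colName if the order matters
pivot_longer(df, X1:X3, names_to = "colName", values_to = "new")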

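You can try gathering the column names with tidyr (gather is superseded by pivot_longer, but it still works)
library(tidyr)
X1 <- c(1,1,1)
X2 <- c(3,0,1)
X3 <- c(4,0,0)
df <- data.frame(X1, X2, X3)
# key holds the old column names, value holds the cell values
df <- df %>%
gather(key = "colName", value = "new", X1, X2, X3)
print(df)
colName new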
1 X1 1
2 X1 1
3 X1 1
4 X2 3
5 X2 0
6 X2 1
7 X3 4
8 X3 0
9 X3 0

Related

Creating new column in data frame based on value matched to participant ID

I know there is a simple solution to this problem, as I solved it a couple of months ago, but have since lost the relevant file, and cannot for the life of me work out how I did it.
My data is in a long form, where each row represents a participant's answer to one question, with all rows for one participant sharing a common participant ID - e.g.
ParticipantID Question Resp
1 Age x1
1 Gender x2
1 Education x3
1 Q1 x4
1 Q2 x5
...
2 Age y1
2 Gender y2
...
etc
I want to add new columns to the data to associate the various demographic values with each answer provided by a given participant. So in the example above, I would have a new column "Age" which would take the value x1 for all rows where ParticipantID = 1, y1 for all rows where ParticipantID = 2, etc., like so:
ParticipantID Question Resp Age Gender ...
1 Age x1 x1 x2
1 Gender x2 x1 x2
1 Education x3 x1 x2
1 Q1 x4 x1 x2
1 Q2 x5 x1 x2
...
2 Age y1 y1 y2
2 Gender y2 y1 y2
...
etc
Importantly, I can't just rotate the table from long to wide, because I need the study questions (represented as Q1, Q2, ... above) to remain in long form.
Any help that can be offered is greatly appreciated!
As long as each participant has the same questions in the same order, you can do
cbind(df, do.call(rbind, lapply(split(df, df$ParticipantID), function(x) {
setNames(as.data.frame(t(x[-1])[rep(2, nrow(x)),]), x[[2]])
})), row.names = NULL)
#> ParticipantID Question Resp Age Gender Education Q1 Q2
#> 1 1 Age x1 x1 x2 x3 x4 x5
#> 2 1 Gender x2 x1 x2 x3 x4 x5
#> 3 1 Education x3 x1 x2 x3 x4 x5
#> 4 1 Q1 x4 x1 x2 x3 x4 x5
#> 5 1 Q2 x5 x1 x2 x3 x4 x5
#> 6 2 Age y1 y1 y2 y3 y4 y5
#> 7 2 Gender y2 y1 y2 y3 y4 y5
#> 8 2 Education y3 y1 y2 y3 y4 y5
#> 9 2 Q1 y4 y1 y2 y3 y4 y5
#> 10 2 Q2 y5 y1 y2 y3 y4 y5
Data used
df <- structure(list(ParticipantID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L), Question = c("Age", "Gender", "Education", "Q1",
"Q2", "Age", "Gender", "Education", "Q1", "Q2"), Resp = c("x1",
"x2", "x3", "x4", "x5", "y1", "y2", "y3", "y4", "y5")), class = "data.frame",
row.names = c(NA, -10L))
df
#> ParticipantID Question Resp
#> 1 1 Age x1
#> 2 1 Gender x2
#> 3 1 Education x3
#> 4 1 Q1 x4
#> 5 1 Q2 x5
#> 6 2 Age y1
#> 7 2 Gender y2
#> 8 2 Education y3
#> 9 2 Q1 y4
#> 10 2 Q2 y5
Created on 2022-09-19 with reprex v2.0.2
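If participants do not all have the same questions in the same order, a lookup by ParticipantID may be safer. The following is a minimal base R sketch, not the answer's method: the vector demographics is an assumed list of the questions to spread into columns, and the sample data is deliberately out of order to show that ordering does not matter.

```r
# Small example with participant 2's rows in a different order
df <- data.frame(
  ParticipantID = c(1L, 1L, 1L, 2L, 2L),
  Question = c("Age", "Gender", "Q1", "Gender", "Age"),
  Resp = c("x1", "x2", "x4", "y2", "y1"),
  stringsAsFactors = FALSE
)

demographics <- c("Age", "Gender")  # assumption: which questions are demographics

for (q in demographics) {
  # one row per participant holding that participant's answer to question q
  lookup <- df[df$Question == q, ]
  # match() finds each row's participant in the lookup table
  df[[q]] <- lookup$Resp[match(df$ParticipantID, lookup$ParticipantID)]
}
df
```

A participant who never answered one of the demographic questions gets NA in that column, because match returns NA when there is no matching ID.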

how to group_by one variable and count based on another variable?

Is it possible to use group_by to group one variable and count the target variable based on another variable?
For example,
x1 x2 x3
A 1 0
B 2 1
C 3 0
B 1 1
A 1 1
I want to count the 0s and 1s of x3 within each group of x1
x1 x3=0 x3=1
A 1 1
B 0 2
C 1 0
Is it possible to use group_by and add something to summarize? I tried group_by both x1 and x3, but that gives x3 as the second column which is not what we are looking for.
If it's not possible to just use group_by, I was thinking we could group_by both x1 and x3, then split by x3 and cbind them, but the two dataframes after split have different lengths of rows, and there's no cbind_fill. What should I do to cbind them and fill the extra blanks?
Using the data.table package (here dataset is your data frame):
library(data.table)
dat <- as.data.table(dataset)
dat[, x3:= paste0("x3=", x3)]
result <- dcast(dat, x1~x3, value.var = "x3", fun.aggregate = length)
A tidyverse approach to achieve your desired result using dplyr::count + tidyr::pivot_wider:
library(dplyr)
library(tidyr)
df %>%
count(x1, x3) %>%
pivot_wider(names_from = "x3", values_from = "n", names_prefix = "x3=", values_fill = 0)
#> # A tibble: 3 × 3
#> x1 `x3=0` `x3=1`
#> <chr> <int> <int>
#> 1 A 1 1
#> 2 B 0 2
#> 3 C 1 0
DATA
df <- data.frame(
x1 = c("A", "B", "C", "B", "A"),
x2 = c(1L, 2L, 3L, 1L, 1L),
x3 = c(0L, 1L, 0L, 1L, 1L)
)
Yes, it is possible. Here is an example:
library(dplyr)
library(tidyr)
dat = read.table(text = "x1 x2 x3
A 1 0
B 2 1
C 3 0
B 1 1
A 1 1", header = TRUE)
dat %>% group_by(x1) %>%
count(x3) %>%
pivot_wider(names_from = x3,
names_glue = "x3 = {x3}",
values_from = n) %>%
replace(is.na(.),0)
# A tibble: 3 x 3
# Groups: x1 [3]
# x1 `x3 = 0` `x3 = 1`
# <chr> <int> <int>
#1 A 1 1
#2 B 0 2
#3 C 1 0
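For comparison, the same cross-tabulation can be sketched in base R with table, which fills missing combinations with 0 automatically. This is only an illustrative alternative to the answers above; the names counts and wide are made up for the example.

```r
df <- data.frame(
  x1 = c("A", "B", "C", "B", "A"),
  x3 = c(0L, 1L, 0L, 1L, 1L)
)

# Two-way contingency table: rows are x1 groups, columns are x3 values
counts <- table(df$x1, df$x3)

# Convert to a regular data frame and label the columns like the desired output
wide <- as.data.frame.matrix(counts)
names(wide) <- paste0("x3=", names(wide))
wide <- cbind(x1 = rownames(wide), wide)
rownames(wide) <- NULL
wide
```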

Extract data based on a time series column in R

I have an annual daily time series of pixel data in a data frame, arranged so that each date occurs multiple times, once for each pixel. I would like to extract/subset this data based on a set of dates stored in another data frame. How can I do this in R using dplyr?
Sample data
X Y T Value
X1 Y1 1/1/2004 1
X2 Y2 1/1/2004 2
X3 Y3 1/1/2004 3
X1 Y1 1/2/2004 4
X2 Y2 1/2/2004 5
X3 Y3 1/2/2004 6
X1 Y1 1/3/2004 7
X2 Y2 1/3/2004 8
X3 Y3 1/3/2004 9
Dates of interest
1/1/2004
1/2/2004
Code
library(dplyr)
X = c("X1", "X2", "X3", "X1", "X2", "X3", "X1", "X2", "X3")
Y = c("Y1", "Y2", "Y3", "Y1", "Y2", "Y3", "Y1", "Y2", "Y3")
T = c("1/1/2004", "1/2/2004", "1/3/2004", "1/1/2004", "1/2/2004", "1/3/2004","1/1/2004", "1/2/2004", "1/3/2004")
Value = c("1", "2", "3", "4", "5", "6", "7", "8", "9")
df = data.frame(X, Y, T, Value)
# Desired dates
TS = read.csv("TS.csv")
TS
"1/1/2004", "1/2/2004"
#stuck...___
If your TS is TS = c("1/1/2004", "1/2/2004"), simply use filter:
library(dplyr)
df %>%
filter(T %in% TS)
X Y T Value
1 X1 Y1 1/1/2004 1
2 X2 Y2 1/2/2004 2
3 X1 Y1 1/1/2004 4
4 X2 Y2 1/2/2004 5
5 X1 Y1 1/1/2004 7
6 X2 Y2 1/2/2004 8
If your TS is the single string TS = "1/1/2004, 1/2/2004", split it first:
library(stringr)
df %>%
filter(T %in% str_split(gsub("\\s+", "", TS), ",", simplify = TRUE))
Base R:
> df[df$T %in% TS,]
X Y T Value
1 X1 Y1 1/1/2004 1
2 X2 Y2 1/2/2004 2
4 X1 Y1 1/1/2004 4
5 X2 Y2 1/2/2004 5
7 X1 Y1 1/1/2004 7
8 X2 Y2 1/2/2004 8
>
If TS is the single string "1/1/2004, 1/2/2004", use stringr:
> df[df$T %in% stringr::str_split(TS, ", ", simplify=TRUE),]
X Y T Value
1 X1 Y1 1/1/2004 1
2 X2 Y2 1/2/2004 2
4 X1 Y1 1/1/2004 4
5 X2 Y2 1/2/2004 5
7 X1 Y1 1/1/2004 7
8 X2 Y2 1/2/2004 8
>
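The answers above compare the dates as plain strings, so "1/1/2004" and "01/01/2004" would not match. A minimal sketch of a more robust variant converts both sides to Date first. The month/day/year format is an assumption about the data, and the helper column Date and the names wanted and subset_df are only illustrative.

```r
df <- data.frame(
  X = c("X1", "X2", "X3", "X1", "X2", "X3"),
  T = c("1/1/2004", "1/2/2004", "1/3/2004", "01/01/2004", "1/2/2004", "1/3/2004"),
  Value = 1:6
)
TS <- c("1/1/2004", "1/2/2004")

# Assumed format: month/day/year; use "%d/%m/%Y" if the data are day/month/year
df$Date <- as.Date(df$T, format = "%m/%d/%Y")
wanted  <- as.Date(TS, format = "%m/%d/%Y")

# Keep only the rows whose parsed date is one of the dates of interest
subset_df <- df[df$Date %in% wanted, ]
subset_df
```

The row written as "01/01/2004" is kept here, whereas a string comparison would drop it.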

how to combine different data into one row?

I want to combine two records of data frame "df", with IDs "A" and "B", each of which lacks some data (NA), into one row with ID "C" (goal). I know matrix indexing [ , ] can do this kind of work, but in the data frame no row number is available.
Below is my data.
df
ID Y1 Y2 Y3 Y4 Y5 Y6
A 7 4 NA NA NA NA
B NA NA 5 5 4 4
goal:
ID Y1 Y2 Y3 Y4 Y5 Y6
C 7 4 5 5 4 4
We can use summarise with across from dplyr
library(dplyr)
df1 %>%
summarise(ID = 'C', across(where(is.numeric), na.omit))
# ID Y1 Y2 Y3 Y4 Y5 Y6
#1 C 7 4 5 5 4 4
data
df1 <- structure(list(ID = c("A", "B"), Y1 = c(7L, NA), Y2 = c(4L, NA
), Y3 = c(NA, 5L), Y4 = c(NA, 5L), Y5 = c(NA, 4L), Y6 = c(NA,
4L)), class = "data.frame", row.names = c(NA, -2L))
We could use adorn_totals from the janitor package (it sums each column, which works here because each column has only one non-NA value):
library(dplyr)
library(janitor)
df1 %>%
adorn_totals("row") %>%
slice(3)
Output:
ID Y1 Y2 Y3 Y4 Y5 Y6
Total 7 4 5 5 4 4
Does this work:
as.data.frame(cbind(ID = 'C',t(apply(df[-1], 2, sum, na.rm = TRUE))))
ID Y1 Y2 Y3 Y4 Y5 Y6
1 C 7 4 5 5 4 4
Some base R options
colSums
> cbind(ID = "C", data.frame(t(colSums(df[-1], na.rm = TRUE))))
ID Y1 Y2 Y3 Y4 Y5 Y6
1 C 7 4 5 5 4 4
na.omit + list2DF
> list2DF(c(ID = "C", Map(na.omit, df[-1])))
ID Y1 Y2 Y3 Y4 Y5 Y6
1 C 7 4 5 5 4 4
If you have several pairs of rows that you want to coalesce into each other, you may follow this simple strategy
df <- structure(list(ID = c("A", "B", "C", "E"), Y1 = c(7L, NA, NA,
7L), Y2 = c(4L, NA, 5L, NA), Y3 = c(NA, 5L, NA, 5L), Y4 = c(NA,
5L, NA, 5L), Y5 = c(NA, 4L, 14L, NA), Y6 = c(NA, 4L, 5L, NA)), row.names = c(NA,
-4L), class = "data.frame")
df
#> ID Y1 Y2 Y3 Y4 Y5 Y6
#> 1 A 7 4 NA NA NA NA
#> 2 B NA NA 5 5 4 4
#> 3 C NA 5 NA NA 14 5
#> 4 E 7 NA 5 5 NA NA
library(dplyr)
df %>% group_by(ID = (row_number()+1) %/% 2) %>%
summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
#> # A tibble: 2 x 7
#> ID Y1 Y2 Y3 Y4 Y5 Y6
#> <dbl> <int> <int> <int> <int> <int> <int>
#> 1 1 7 4 5 5 4 4
#> 2 2 7 5 5 5 14 5
Created on 2021-05-30 by the reprex package (v2.0.0)
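Several answers above rely on sum with na.rm = TRUE, which only coalesces correctly when at most one row has a value in each column. A minimal base R sketch that instead takes the first non-NA value per column is shown below; first_non_na is a hypothetical helper name, and the extra column Y4 is added only to show the difference when both rows have a value.

```r
df <- data.frame(
  ID = c("A", "B"),
  Y1 = c(7L, NA), Y2 = c(4L, NA), Y3 = c(NA, 5L),
  Y4 = c(2L, 3L)  # both rows filled: sum() would give 5, coalescing gives 2
)

# Return the first non-missing element, or NA if every element is missing
first_non_na <- function(v) v[!is.na(v)][1]

# Apply the helper to every column except ID and attach the new ID
combined <- data.frame(ID = "C", lapply(df[-1], first_non_na))
combined
```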

Repeated values are stored when copied to data frame

I have a data frame like x,
> x
x1 x2 x3 x4
1 3 5 7
3 4 7 2
1 7 8 7
2 3 7 4
I want to change each row based on some calculations. The resulting rows are not all the same size. Say I want to copy in a row of length 2:
y <- c(1,2)
x[1,] <- y
Then the values are recycled to fill the row of x,
> x
x1 x2 x3 x4
1 2 1 2
3 4 7 2
1 7 8 7
2 3 7 4
But my output should be,
> x
x1 x2 x3 x4
1 2 NA NA
3 4 7 2
1 7 8 7
2 3 7 4
How to do this?
You could pad the NAs based on the number of columns of 'x' by assigning length of 'y' to ncol(x). If the length of 'y' is less than ncol(x), it will pad the additional elements with NA.
x[1,] <- `length<-`(y, ncol(x))
x
# x1 x2 x3 x4
#1 1 2 NA NA
#2 3 4 7 2
#3 1 7 8 7
#4 2 3 7 4
Just for easier understanding, this is similar to the two-step process @mpalanco mentioned in the comments, i.e. first we change length(y) to be length(x) (or ncol(x); in a 'data.frame', length and ncol are the same) to pad with NAs, and then replace the first row of 'x' with 'y'.
length(y) <- length(x)
x[1,] <- y
data
x <- structure(list(x1 = c(1L, 3L, 1L, 2L), x2 = c(3L, 4L, 7L, 3L),
x3 = c(5L, 7L, 8L, 7L), x4 = c(7L, 2L, 7L, 4L)), .Names = c("x1",
"x2", "x3", "x4"), class = "data.frame", row.names = c(NA, -4L))
No clever solution came to mind, so here is a small function which pads y with NA. Using the padded y gives the intended behavior.
pad_with_NA <- function(y, dim_x) {
if(length(y)<dim_x) {
y <- c(y, rep(NA, dim_x-length(y)))
}
y
}
x[1,] <- pad_with_NA(y, dim(x)[2])
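One thing to be aware of: assigning a shorter length pads with NA, but assigning a length smaller than the current one silently truncates the vector. The sketch below wraps the padding in a hypothetical helper, set_row, that stops with an error instead of truncating when y is too long; it is an illustration under that assumption, not part of either answer.

```r
x <- data.frame(x1 = c(1L, 3L), x2 = c(3L, 4L), x3 = c(5L, 7L), x4 = c(7L, 2L))

# Replace row i of x with y, padding y with NA up to ncol(x)
set_row <- function(x, i, y) {
  if (length(y) > ncol(x)) {
    stop("y has more elements than x has columns")
  }
  length(y) <- ncol(x)  # pads the missing positions with NA
  x[i, ] <- y
  x
}

x <- set_row(x, 1, c(1, 2))
x
```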
