Combine the contents of two columns into one column using R [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 1 year ago.
I have some data like this:
df1 <- structure(list(id = c(1, 1, 1), time1 = c(10, 20, 30), time2 = c(15, 25, 35)), row.names = c(NA, 3L), class = "data.frame")
and I want to combine the two time columns into a single column:
structure(list(id = c(1, 1, 1, 1, 1, 1), time = c(10, 15, 20, 25, 30, 35)), row.names = c(NA, 6L), class = "data.frame")
I don't think this is the same as converting to long format, because I don't want the two columns that gather() produces (one holding the original column names and one holding the values).

We can use pivot_longer(). It is more general, since it can also reshape based on name patterns and into multiple value columns. Note that pivot_longer() supersedes gather() and, like it, plays the role of reshape2's melt(), with more capabilities and bug fixes. Dropping the resulting name column leaves just id and time:
library(dplyr)
library(tidyr)
pivot_longer(df1, cols = time1:time2, values_to = 'time') %>%
select(-name)
Output:
# A tibble: 6 x 2
# id time
# <dbl> <dbl>
#1 1 10
#2 1 15
#3 1 20
#4 1 25
#5 1 30
#6 1 35
Or, in base R, with stack() (this stacks column by column, so the values come out as 10, 20, 30, 15, 25, 35):
transform(stack(df1[-1])[1], id = rep(df1$id, 2))[2:1]
Or use data.frame() with unlist() (same column-by-column order):
data.frame(id = df1$id, value = unlist(df1[-1], use.names = FALSE))
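If you want the base R result in the same row-by-row order as the expected output (10, 15, 20, ...), one option, sketched here under the assumption that every column except id holds a time value, is to transpose the time columns before flattening them. The column name time is chosen to match the question's expected output.

```r
df1 <- data.frame(id = c(1, 1, 1), time1 = c(10, 20, 30), time2 = c(15, 25, 35))

time_cols <- df1[-1]  # every column except id (assumed to hold times)
# t() gives a matrix with one column per original row,
# so as.vector() reads the values row by row: 10, 15, 20, 25, 30, 35
out <- data.frame(
  id   = rep(df1$id, each = ncol(time_cols)),
  time = as.vector(t(time_cols))
)
out
```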

As an alternative to tidyr (though that is a good way to do it), use reshape2::melt():
reshape2::melt(df1, "id")[, -2]
# id value
# 1 1 10
# 2 1 20
# 3 1 30
# 4 1 15
# 5 1 25
# 6 1 35
(melt() normally returns the original column names as a column of their own; [, -2] drops that column because your expected output doesn't have it. Use melt() alone if you want or need to keep it.)

Related

Writing a function with variable times in the repeat function in R

I'm hoping someone can help me write a more elegant function to do the following.
Let's say I have a data frame looking approximately like the following:
library(tidyverse)
d =
tibble(
ID = as.factor(c("1", "2")),
dialect_TCU = as.numeric(c(8, 12)),
standard_TCU = as.numeric(c(12, 9)),
mixture_TCU = as.numeric(c(14, 5))
)
I cannot, for the life of me, figure out how to write a function that:
repeats each header the number of times listed for each participant, and
repeats the participant ID as many times as the headers are repeated.
The ending data frame should look like this:
d2 =
tibble(
ID = c(rep("1", 34),
rep("2", 26)),
successfulRow = c(rep("dialect_TCU", 8),
rep("standard_TCU", 12),
rep("mixture_TCU", 14),
rep("dialect_TCU", 12),
rep("standard_TCU", 9),
rep("mixture_TCU", 5))
)
If anyone could help me out in writing a function that does this (it's probably really easy and I'm just overthinking the whole thing...), that would be extremely helpful!
Thanks!
We can reshape to 'long' format with pivot_longer() and then use uncount() to replicate each row by its value:
library(dplyr)
library(tidyr)
d %>%
pivot_longer(cols = -ID, names_to = "successfulRow") %>%
uncount(value)
Output:
# A tibble: 60 × 2
ID successfulRow
<fct> <chr>
1 1 dialect_TCU
2 1 dialect_TCU
3 1 dialect_TCU
4 1 dialect_TCU
5 1 dialect_TCU
6 1 dialect_TCU
7 1 dialect_TCU
8 1 dialect_TCU
9 1 standard_TCU
10 1 standard_TCU
# … with 50 more rows
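Since the question asks for a function, here is a base R sketch of the same idea without tidyr. The function name expand_counts() and its id_col argument are hypothetical; it assumes every column other than the ID holds a non-negative whole-number count.

```r
# Hypothetical helper: one output row per unit of count, per participant
expand_counts <- function(d, id_col = "ID") {
  count_cols <- setdiff(names(d), id_col)
  counts <- as.matrix(d[count_cols])
  # t() + as.vector() lists the counts participant by participant,
  # in column order within each participant
  times <- as.vector(t(counts))
  data.frame(
    ID = rep(rep(as.character(d[[id_col]]), each = length(count_cols)), times),
    successfulRow = rep(rep(count_cols, times = nrow(d)), times),
    stringsAsFactors = FALSE
  )
}

d <- data.frame(
  ID = factor(c("1", "2")),
  dialect_TCU = c(8, 12),
  standard_TCU = c(12, 9),
  mixture_TCU = c(14, 5)
)
d2 <- expand_counts(d)
table(d2$ID)  # 34 rows for ID "1", 26 rows for ID "2"
```

The design mirrors pivot_longer() plus uncount(): the transpose produces the long layout, and rep() with a vector of counts does the replication.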

Create multiple columns in R using a formula

I'm a bit new to R and trying to find a simplified way of creating multiple columns based on a formula.
I have a dataset that has a base date followed by scores that were taken weekly (score1 = score from 1 week after base date). I would like to generate a date for each week i.e. adding X*7 to the base date. I have found a way to do this by simply creating each date variable one at a time (see below) but since I have over 500 scores, I was wondering if there is a simplified way of doing this that does not take up hundreds of lines of code.
Dataset$score1_date <- Dataset$base_date + (1*7)
Dataset$score2_date <- Dataset$base_date + (2*7)
Dataset$score3_date <- Dataset$base_date + (3*7)
Here is an example dataset:
Dataset <- structure(list(id = c(1, 2, 3), base_date = structure(c(18628, 18633, 18641), class = "Date"), score1 = c(4, 5, 5), score2 = c(6, 5, 2), score3 = c(5, 5, 1)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
Thank you!
We can use lapply() to loop over the multiplier index (1:3 in the OP's example), multiply it by 7 and add it to base_date, then assign the resulting list of vectors to new columns whose names are built by pasting 'score', the index, and '_date':
Dataset[paste0('score', 1:3, '_date')] <- lapply(1:3,
function(i) Dataset$base_date + i*7)
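With 500+ score columns you may not want to hard-code 1:3. A sketch, assuming the score columns are all named score followed by a number, reads the week numbers from the column names instead (the example data is rebuilt as a plain data.frame so it runs without tibble):

```r
Dataset <- data.frame(
  id = c(1, 2, 3),
  base_date = as.Date(c("2021-01-01", "2021-01-06", "2021-01-14")),
  score1 = c(4, 5, 5), score2 = c(6, 5, 2), score3 = c(5, 5, 1)
)

# Week numbers taken from names like "score12"; the anchored pattern
# skips any existing "score12_date" columns
score_cols <- grep("^score[0-9]+$", names(Dataset), value = TRUE)
weeks <- as.integer(sub("^score", "", score_cols))

# Each new column is the base date plus that many weeks
Dataset[paste0(score_cols, "_date")] <- lapply(weeks, function(i) Dataset$base_date + i * 7)
Dataset
```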
Or, with dplyr, loop across() the 'score' columns: extract the number from each column name (cur_column()) with readr::parse_number(), multiply it by 7, and add it to 'base_date', using .names to append '_date' so the results become new columns instead of overwriting the scores:
library(dplyr)
Dataset <- Dataset %>%
mutate(across(starts_with('score'), ~ base_date +
(readr::parse_number(cur_column())) * 7, .names = '{.col}_date'))
Output:
Dataset
# A tibble: 3 x 8
# id base_date score1 score2 score3 score1_date score2_date score3_date
# <dbl> <date> <dbl> <dbl> <dbl> <date> <date> <date>
#1 1 2021-01-01 4 6 5 2021-01-08 2021-01-15 2021-01-22
#2 2 2021-01-06 5 5 5 2021-01-13 2021-01-20 2021-01-27
#3 3 2021-01-14 5 2 1 2021-01-21 2021-01-28 2021-02-04
You can also use a for loop, referring to each new column by name with double brackets ([[ ]]). For example, for 500 scores:
for (i in 1:500) {
  # score<i>_date = base date plus i weeks
  Dataset[[paste0("score", i, "_date")]] <- Dataset$base_date + i * 7
}

R data frame: convert column names that share a common string, with their non-zero values, into one row item per date [duplicate]

This question already has answers here:
Tidying data with several repeating variables in R
(2 answers)
Stack dataframe columns with two distinct suffix into two columns, preferably using tidyverse [duplicate]
(1 answer)
Function to filter data equal to or greater than a certain value
(2 answers)
How to identify an ID with values in at least one column for all rows?
(2 answers)
Closed 1 year ago.
I want to convert R data frame column names into rows, per date, under these conditions:
If two column names share a common part after the '_' separator, like x_a01 and y_a01, they should become one row item named by that common part (a01) for each date.
Ex: x_a01, y_a01 -> a01; x_b01, y_b01 -> b01
Only column pairs with non-zero values should be converted into row items.
Ex: x_c01 and y_c01 are both 0 in the first row, so they should be ignored for that date.
The dataframe:
Convert the above dataframe to:
We can use pivot_longer() to reshape the data into 'long' format and then filter() to remove any rows where both x and y are 0:
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(cols = -date, names_to = c(".value", "colname"),
names_sep = "_", values_drop_na = TRUE) %>%
filter(if_any(c(x, y), ~ . > 0))
Output:
# A tibble: 5 x 4
# date colname x y
# <chr> <chr> <dbl> <dbl>
#1 01-01-2021 a01 1 2
#2 01-01-2021 b01 0 4
#3 01-01-2021 d01 3 4
#4 02-01-2021 b01 3.1 1.1
#5 02-01-2021 c01 4.5 6.2
data
df1 <- structure(list(date = c("01-01-2021", "02-01-2021"), x_a01 = c(1,
0), y_a01 = c(2, 0), x_b01 = c(0, 3.1), y_b01 = c(4, 1.1), x_c01 = c(0,
4.5), y_c01 = c(0, 6.2), x_d01 = c(3, 0), y_d01 = c(4, 0)),
class = "data.frame", row.names = c(NA,
-2L))
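For comparison, a base R sketch of the same reshape. It assumes, as in this data, that every value column is named x_<key> or y_<key>; the x/y prefixes are hard-coded rather than detected.

```r
df1 <- data.frame(
  date = c("01-01-2021", "02-01-2021"),
  x_a01 = c(1, 0), y_a01 = c(2, 0), x_b01 = c(0, 3.1), y_b01 = c(4, 1.1),
  x_c01 = c(0, 4.5), y_c01 = c(0, 6.2), x_d01 = c(3, 0), y_d01 = c(4, 0),
  stringsAsFactors = FALSE
)

# Common keys such as "a01", taken from the part after the x_/y_ prefix
keys <- unique(sub("^[xy]_", "", setdiff(names(df1), "date")))

# One small data frame per key, stacked into long format
long <- do.call(rbind, lapply(keys, function(k) {
  data.frame(date = df1$date, colname = k,
             x = df1[[paste0("x_", k)]], y = df1[[paste0("y_", k)]],
             stringsAsFactors = FALSE)
}))

# Keep rows where x or y is positive (same rule as if_any(c(x, y), ~ . > 0))
long <- long[long$x > 0 | long$y > 0, ]
long <- long[order(long$date, long$colname), ]
rownames(long) <- NULL
long
```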

Combining multiple across() calls in a single mutate() while controlling the variable names in R

I have the following dataframe:
df = data.frame(a = 10, b = 20, a_sd = 2, b_sd = 3)
a b a_sd b_sd
1 10 20 2 3
I want to compute a/a_sd and b/b_sd, add the results to the data frame, and name them ratio_a and ratio_b. My real data frame has many variables, so I need a solution that scales across columns. I tried:
df %>%
mutate( across(one_of( c('a','b')))/across(ends_with('_sd')))
This gave:
a b a_sd b_sd
1 5 6.666667 2 3
This computes the ratios, but the results replace the original a and b columns. How can I add them as new columns and control their names?
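You can use the .names argument of the numerator's across(), so the quotients are added as new columns instead of replacing a and b. This pairs columns by position, so the _sd columns must appear in the same order as a and b. (one_of() is superseded in tidyselect; all_of() is the current equivalent.)
library(dplyr)
df %>%
mutate(across(all_of(c('a','b')), .names = 'ratio_{.col}')/across(ends_with('_sd')))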
# a b a_sd b_sd ratio_a ratio_b
# 1 10 20 2 3 5 6.666667
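If the _sd columns might not be in the same order as their base columns, pairing them by name avoids relying on position. A base R sketch, assuming every base column v has a partner named v_sd:

```r
df <- data.frame(a = 10, b = 20, a_sd = 2, b_sd = 3)

# Base columns; in practice e.g. sub("_sd$", "", grep("_sd$", names(df), value = TRUE))
vars <- c("a", "b")
# Divide each column by its own "_sd" partner, matched by name
df[paste0("ratio_", vars)] <- df[vars] / df[paste0(vars, "_sd")]
df
```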

R code to generate numbers in sequence and insert rows [duplicate]

This question already has answers here:
R code to insert rows based on a column's value and increment it by 1
(3 answers)
Closed 6 years ago.
I have a dataset with 2 columns. The first column is an ID and the second is the total number of quarters. If column B (Quarters) has the value 8, then 8 rows should be created, numbered 1 to 8. The ID in column A should be the same for all of those rows. The dataset below is an example.
ID Quarters
A 5
B 2
C 1
Expected output
ID Quarters
A 1
A 2
A 3
A 4
A 5
B 1
B 2
C 1
Here is what I tried.
library(data.table)
setDT(df.WQuarter)[, (Quarters=1:Quarters), ID]
I get an error. Can you please help? I have been stuck on this all day, and I am just learning the basics of R.
In base R, we can replicate 'ID' by 'Quarters' with rep() and build the new 'Quarters' column with sequence(), which concatenates 1:n for each value n.
with(df1, data.frame(ID= rep(ID, Quarters), Quarters = sequence(Quarters)))
# ID Quarters
#1 A 1
#2 A 2
#3 A 3
#4 A 4
#5 A 5
#6 B 1
#7 B 2
#8 C 1
If we are using data.table, convert the data.frame to a data.table with setDT(df1) and, grouped by 'ID', take sequence(Quarters) (or just seq(Quarters)):
library(data.table)
setDT(df1)[, .(Quarters=sequence(Quarters)) , by = ID]
As @PierreLaFortune commented on the post, if there are NA values, we need to remove them first:
setDT(df1)[!is.na(Quarters), .(Quarters = sequence(Quarters)), by = ID]
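The same NA handling in base R is a matter of dropping the NA rows before replicating. A sketch, where row D with a missing Quarters value is a hypothetical addition to the example data:

```r
df1 <- data.frame(ID = c("A", "B", "C", "D"), Quarters = c(5, 2, 1, NA),
                  stringsAsFactors = FALSE)

keep <- !is.na(df1$Quarters)  # rows with a usable count
out <- with(df1[keep, ], data.frame(ID = rep(ID, Quarters),
                                    Quarters = sequence(Quarters),
                                    stringsAsFactors = FALSE))
out
```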
Or using dplyr/tidyr:
library(dplyr)
library(tidyr)
df1 %>%
group_by(ID) %>%
mutate(Quarters = list(seq(Quarters))) %>%
ungroup() %>%
unnest(Quarters)
If the OP's "Quarters" column is non-numeric, convert it to numeric before proceeding:
df1$Quarters <- as.numeric(as.character(df1$Quarters))
The as.character() step handles the case where the column is a factor (as.numeric() on a factor returns the level codes, not the labels); if the column is character, as.numeric() alone is enough.
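To see why the as.character() step matters for factors, here is a small illustration with made-up values: as.numeric() on a factor returns its internal level codes, not the numbers printed.

```r
f <- factor(c("5", "2", "10"))  # levels sort as "10", "2", "5"
as.numeric(f)                   # level codes: 3 2 1
as.numeric(as.character(f))     # actual values: 5 2 10
```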
data
df1 <- structure(list(ID = c("A", "B", "C"), Quarters = c(5L, 2L, 1L
)), .Names = c("ID", "Quarters"), class = "data.frame", row.names = c(NA,
-3L))
