R convert df from wide to long by splitting column names [duplicate]

This question already has answers here:
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
(8 answers)
Closed 4 years ago.
I am trying to convert the df_original data.frame below into the form shown in df_goal. Each column name of the original data.frame should be split at the underscore: the latter part acts as a key, while the first part is kept as a variable name. I would prefer a tidyverse solution, but am open to any approach. Thank you very much!
df_original <-
  data.frame(id = c(1, 2, 3),
             variable1_partyx = c(4, 5, 6),
             variable1_partyy = c(14, 15, 16),
             variable2_partyx = c(24, 25, 26),
             variable2_partyy = c(34, 35, 36))
df_goal <-
  data.frame(id = c(1, 1, 2, 2, 3, 3),
             key = c("partyx", "partyy", "partyx", "partyy", "partyx", "partyy"),
             variable1 = c(4, 14, 5, 15, 6, 16),
             variable2 = c(24, 34, 25, 35, 26, 36))

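library(tidyr)  # tidyr also re-exports the %>% pipe

df_original %>%
  gather(key, value, -id) %>%                           # one row per id/column pair
  separate(key, into = c("var", "key"), sep = "_") %>%  # "variable1_partyx" -> var + key
  spread(var, value)                                    # var values become columns again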
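Because the question is open to any approach, here is a minimal base R sketch using stats::reshape instead of tidyr. With a character vector for varying and sep = "_", reshape() guesses the split itself: the part before the underscore becomes the variable name and the part after it becomes the key. It assumes every non-id column contains exactly one underscore; the final reordering only serves to match the row order of df_goal.

```r
# Base R sketch: wide -> long by splitting column names at "_".
# Assumption: every non-id column is named <variable>_<key> with a single "_".
df_original <- data.frame(id = c(1, 2, 3),
                          variable1_partyx = c(4, 5, 6),
                          variable1_partyy = c(14, 15, 16),
                          variable2_partyx = c(24, 25, 26),
                          variable2_partyy = c(34, 35, 36))

long <- reshape(df_original,
                direction = "long",
                varying = setdiff(names(df_original), "id"),  # columns to stack
                sep = "_",        # split "variable1_partyx" into "variable1" + "partyx"
                idvar = "id",     # existing identifier column
                timevar = "key")  # column that will hold "partyx" / "partyy"

# Sort by id, then key, and keep the same column layout as df_goal
long <- long[order(long$id, long$key), c("id", "key", "variable1", "variable2")]
rownames(long) <- NULL
long
```

Without the reordering step, reshape() stacks all partyx rows first and then all partyy rows.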

Related

arrange data based on user defined variables order? [duplicate]

This question already has answers here:
Order data frame rows according to vector with specific order
(6 answers)
Closed 1 year ago.
I have the following data.frame and would like to change the order of the rows in such a way that rows with variable == "C" come at the top followed by rows with "A" and then those with "B".
library(tidyverse)
set.seed(123)
D1 <- data.frame(Serial = 1:10,
                 A = runif(10, 1, 5),
                 B = runif(10, 3, 6),
                 C = runif(10, 2, 5)) %>%
  pivot_longer(-Serial, names_to = "variables", values_to = "Value") %>%
  arrange(-desc(variables))
D1 %>%
  mutate(variables = ordered(variables, c('C', 'A', 'B'))) %>%
  arrange(variables)
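Perhaps I did not get the question. If you want C, then A, then B repeated within each Serial, make variables an ordered factor and then sort by Serial first:
D1 %>%
  mutate(variables = ordered(variables, c('C', 'A', 'B'))) %>%
  arrange(Serial, variables)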
Onyambu's answer is probably the most "tidyverse-ish" way to do it, but another is:
D1[order(match(D1$variables,c("C","A","B"))),]
or
D1 %>% slice(order(match(variables,c("C","A","B"))))
or
D1 %>% slice(variables %>% match(c("C","A","B")) %>% order())
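A small base R sketch of both orderings discussed above, on a hypothetical six-row toy data frame (not the question's runif() data, so nothing outside base R is needed): ordering by match() against the target vector puts all C rows first, while adding Serial as the first sort key repeats the C, A, B pattern within each Serial.

```r
# Hypothetical toy data in the same long layout as D1 (values are made up)
D1 <- data.frame(Serial = rep(1:2, each = 3),
                 variables = rep(c("A", "B", "C"), times = 2),
                 Value = c(1.5, 3.2, 2.8, 4.1, 5.0, 2.2))

target <- c("C", "A", "B")

# Custom order across the whole table: all C rows, then all A, then all B
by_var <- D1[order(match(D1$variables, target)), ]

# Custom order within each Serial: C, A, B repeated per Serial
by_serial <- D1[order(D1$Serial, match(D1$variables, target)), ]

by_var
by_serial
```

match() turns each label into its position in target, so order() sorts by that position rather than alphabetically.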

Remove a pattern in a string and mutate these values to a new column [duplicate]

This question already has answers here:
Replace specific characters within strings
(7 answers)
Closed 3 years ago.
Let's say I have this data frame:
df <- as.data.frame(c("77111","77039","5005","4032"))
and I want to create a new column: if a value starts with "77", remove the "77" and keep the remaining digits; otherwise, keep the value as is. The new column should look like this:
df <- df %>% mutate(new_numbers =c("111","039","5005","4032"))
We can use str_remove to remove the 77 from the start (^) of the column and store the result in a new column:
library(dplyr)
library(stringr)
df <- df %>%
  mutate(new_numbers = str_remove(col, "^77"))
Data:
df <- data.frame(col = c("77111", "77039", "5005", "4032"))
Another option uses gsub() inside mutate() (original_column stands for the name of your column):
df <- df %>%
  mutate(new_numbers = gsub('^77', '', original_column))
For an approach in base R, just use gsub:
df$new <- gsub(pattern = "^77",
               replacement = "",
               x = df[, 1])
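If you would rather avoid regular expressions altogether, a base R alternative (not one of the answers above) is to test the prefix with startsWith() and drop the first two characters with substring(). This sketch assumes the column is character and named col, as in the data used above; startsWith() errors on factors, hence stringsAsFactors = FALSE.

```r
df <- data.frame(col = c("77111", "77039", "5005", "4032"),
                 stringsAsFactors = FALSE)

# startsWith() flags values beginning with "77";
# substring(x, 3) keeps everything from the third character onward
df$new_numbers <- ifelse(startsWith(df$col, "77"),
                         substring(df$col, 3),
                         df$col)
df
```

Because the values are kept as strings, leading zeros such as in "039" survive.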

group by a id and concatenate where matches into a new features [duplicate]

This question already has answers here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Closed 4 years ago.
sample_data <- data.frame(id = c("123abc", "def456", "789ghi", "123abc"),
                          some_str = c("carrots", "bananas", "apples", "cabbage"))
I would like to know how to wrangle sample_data to look like this:
desired_df <- data.frame(id = c("123abc", "def456", "789ghi"),
                         some_str_concat = c("carrots, cabbage", "bananas", "apples"))
Each id may appear multiple times. In that case I would like to take the corresponding values from some_str and concatenate them into a new feature, so that the new df is grouped on id.
In the example above, id 123abc appears twice: first with a value of "carrots" and then again with a value of "cabbage". Thus, the desired data frame has a single row for 123abc with the value "carrots, cabbage".
How can I do this? Ideally within either base R or dplyr.
library(dplyr)

sample_data %>%
  group_by(id) %>%
  mutate(some_str = paste(some_str, collapse = ", ")) %>%  # same string on every row of a group
  distinct() %>%                                           # keep one row per id
  ungroup()
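Since base R was also acceptable, here is a minimal sketch with aggregate(), which passes the extra collapse argument through to paste(). Unlike the mutate()/distinct() answer, aggregate() returns the groups sorted by id rather than in order of first appearance; the renaming step to some_str_concat just matches desired_df.

```r
sample_data <- data.frame(id = c("123abc", "def456", "789ghi", "123abc"),
                          some_str = c("carrots", "bananas", "apples", "cabbage"),
                          stringsAsFactors = FALSE)

# One row per id; collapse = ", " is forwarded to paste()
res <- aggregate(some_str ~ id, data = sample_data, FUN = paste, collapse = ", ")
names(res)[names(res) == "some_str"] <- "some_str_concat"
res
```

Within each id, values are pasted in their original row order, so 123abc gives "carrots, cabbage".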

Compress rows of data by summing values, if they have the same ID [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 4 years ago.
This is what my data usually looks like when the AS 400 produces schedules:
df <- data.frame(
  Date = c(rep("Dec 4", 10)),
  Line = c(rep(1, 7), rep(2, 3)),
  Style = c(rep(24510, 7), rep(18605, 3)),
  Qty = c(1, 1, 3, 1, 2, 1, 1, 2, 1, 3))
This is what I want my data to look like. Notice that the rows with style number 24510 have now been compressed into one row with a quantity of 10; before, there were 7 individual rows with different quantities.
df_goal <- data.frame(
  Date_goal = c(rep("Dec 4", 2)),
  Line_goal = c(1, 2),
  Style_goal = c(24510, 18605),
  Qty_goal = c(10, 6))
Pretty easy with dplyr
library(dplyr)
df_goal <- df %>%
  group_by(Date, Line, Style) %>%
  summarize(Qty = sum(Qty)) %>%
  ungroup() %>%
  rename(Date_goal = Date, Line_goal = Line, Style_goal = Style, Qty_goal = Qty)
If you just want a total count, this one is the simplest:
plyr::count(df, vars=c('Date', 'Line', 'Style'), wt_var = 'Qty')
If you don't have the plyr-package yet, run install.packages('plyr') first.
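The same grouped sum can be sketched in base R with aggregate(), with nothing to install. The formula puts Qty on the left and the grouping columns on the right; pasting "_goal" onto the names is only there to match df_goal. Row order follows aggregate()'s grouping rather than the original data, so look rows up by Style rather than by position.

```r
df <- data.frame(
  Date = c(rep("Dec 4", 10)),
  Line = c(rep(1, 7), rep(2, 3)),
  Style = c(rep(24510, 7), rep(18605, 3)),
  Qty = c(1, 1, 3, 1, 2, 1, 1, 2, 1, 3))

# Sum Qty within each Date / Line / Style combination
res <- aggregate(Qty ~ Date + Line + Style, data = df, FUN = sum)
names(res) <- paste0(names(res), "_goal")  # match df_goal's column names
res
```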

Reshaping a data frame from wide to long in R [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 7 years ago.
I have the following data frame with temperature and pressure data from 3 sensors:
df <- data.frame(
  Test = 1:10,
  temperature_sensor1 = rnorm(10, 25, 5),
  temperature_sensor2 = rnorm(10, 25, 5),
  temperature_sensor1 = rnorm(10, 25, 5),
  pressure_sensor1 = rnorm(10, 10, 2),
  pressure_sensor2 = rnorm(10, 10, 2),
  pressure_sensor3 = rnorm(10, 10, 2))
How can I reshape it into long format, such that each row has the temperature and pressure data for a single sensor, i.e. with the columns:
Test Sensor Temperature Pressure
Thanks!
Here are a couple of approaches:
1) dplyr/tidyr: Convert df to long form using gather, and then separate the generated variable column by underscore into two columns. Finally, convert from long to wide based on the variable column (which contains the strings pressure and temperature) and the value column (which contains the numbers):
library(dplyr)
library(tidyr)
df %>%
  gather("variable", "value", -Test) %>%
  separate(variable, c("variable", "sensor"), sep = "_") %>%
  spread(variable, value)
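In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch, assuming tidyr 1.0.0 or later is installed: the special ".value" entry in names_to tells pivot_longer() that the part of each name before the underscore (temperature or pressure) should stay a column, while the part after it becomes the sensor column, so gather, separate and spread collapse into one call. The data is created as in the note at the end of this answer (seed 123, third temperature column named temperature_sensor3).

```r
library(tidyr)

set.seed(123)
df <- data.frame(
  Test = 1:10,
  temperature_sensor1 = rnorm(10, 25, 5),
  temperature_sensor2 = rnorm(10, 25, 5),
  temperature_sensor3 = rnorm(10, 25, 5),
  pressure_sensor1 = rnorm(10, 10, 2),
  pressure_sensor2 = rnorm(10, 10, 2),
  pressure_sensor3 = rnorm(10, 10, 2))

# ".value" keeps "temperature" / "pressure" as column names;
# the text after "_" goes into a new column called sensor
long <- pivot_longer(df,
                     cols = -Test,
                     names_to = c(".value", "sensor"),
                     names_sep = "_")
long
```

The result has 30 rows (10 tests × 3 sensors) with the columns Test, sensor, temperature and pressure.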
2) Base R: use reshape. No packages are needed. The line marked optional removes the row names; it can be omitted if they do not matter.
unames <- grep("_", names(df), value = TRUE)
varying <- split(unames, sub("_.*", "", unames))
sensors <- unique(sub(".*_", "", unames))
long <- reshape(df, direction = "long", varying = varying, v.names = names(varying),
                times = sensors, timevar = "sensor")
rownames(long) <- NULL # optional
If df has fixed columns, we could simplify the above a bit by hard-coding varying and sensors with these definitions in place of the more complex but general code above:
varying <- list(temperature = 2:4, pressure = 5:7)  # columns 2:4 are temperature_sensor*, 5:7 are pressure_sensor*
sensors <- c("sensor1", "sensor2", "sensor3")
Note: Because random numbers are used, the seed must be set first to create df reproducibly, so to be definite we created df like this. Also note that in the question temperature_sensor1 was used for two columns; we assumed that the second occurrence was intended to be temperature_sensor3.
set.seed(123)
df <- data.frame(
  Test = 1:10,
  temperature_sensor1 = rnorm(10, 25, 5),
  temperature_sensor2 = rnorm(10, 25, 5),
  temperature_sensor3 = rnorm(10, 25, 5),
  pressure_sensor1 = rnorm(10, 10, 2),
  pressure_sensor2 = rnorm(10, 10, 2),
  pressure_sensor3 = rnorm(10, 10, 2))
