I have the following dataframe:
df1 <- data.frame(
date = c("14-Mar-20", "14-Mar-20", "14-Mar-20", "15-Mar-20", "15-Mar-20", "15-Mar-20"),
status = c("new", "progress", "completed", "new", "progress", "completed"),
count = c("1", "2", "3", "4", "5", "6"),
stringsAsFactors = FALSE
)
I want to reshape it into the following wide format, with one row per date and one column per status:
date      new progress completed
14-Mar-20   1        2         3
15-Mar-20   4        5         6
How can I do so? I am trying to use the "melt" function but I am unable to make any headway!
We can use pivot_wider from tidyr
library(dplyr)
library(tidyr)
df1 %>%
pivot_wider(names_from = status, values_from = count)
# A tibble: 2 x 4
# date new progress completed
# <chr> <chr> <chr> <chr>
#1 14-Mar-20 1 2 3
#2 15-Mar-20 4 5 6
Or dcast from data.table:
library(data.table)
setDT(df1)
dcast(df1, date ~ status, value.var = 'count')
Here is a base R solution using reshape
res <- reshape(df1,direction = "wide",idvar = "date",timevar = "status")
> res
date count.new count.progress count.completed
1 14-Mar-20 1 2 3
4 15-Mar-20 4 5 6
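Since the question mentions melt: melt reshapes wide to long, and its long-to-wide counterpart in reshape2 is dcast. A minimal sketch, assuming df1 as defined in the question:
library(reshape2)
# cast the long data wide: one row per date, one column per status
dcast(df1, date ~ status, value.var = "count")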
After years of using your advice to other users, here is my, for now, unsolvable issue...
I have a dataset with thousands of rows and hundreds of columns, where some rows share the same value in one particular column. Here is a subset of my dataset:
ID <- c("A", "B", "C", "D", "E")
Dose <- c("1", "5", "3", "4", "5")
Value <- c("x1", "x2", "x3", "x2", "x3")
mat <- cbind(ID, Dose, Value)
What I want is to assign a unique code to each set of rows that share the same "Value", like this:
ID <- c("A", "B", "C", "D", "E")
Dose <- c("1", "5", "3", "4", "5")
Value <- c("153254", "258634", "896411", "258634", "896411")
Code <- c("1", "2", "3", "2", "3")
mat <- cbind(ID, Dose, Value, Code)
Does anyone have an idea that could help me?
Thanks!
We may use match here
library(dplyr)
mat %>%
mutate(Code = match(Value, unique(Value)))
-output
ID Dose Value Code
1 A 1 153254 1
2 B 5 258634 2
3 C 3 896411 3
4 D 4 258634 2
5 E 5 896411 3
data
mat <- data.frame(ID, Dose, Value)
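For reference, a base R sketch of the same match() idea, applied directly to the object from the question (it works whether mat is the character matrix built with cbind() or the data frame above):
# match() returns the position of each Value in the vector of unique Values,
# so rows sharing a Value get the same code
Code <- match(mat[, "Value"], unique(mat[, "Value"]))
cbind(mat, Code = Code)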
You should consider using a data.frame:
mat <- data.frame(ID, Dose, Value)
Using dplyr you could create the desired output:
library(dplyr)
mat %>%
group_by(Value) %>%
mutate(Code = cur_group_id()) %>%
ungroup()
This returns
# A tibble: 5 x 4
ID Dose Value Code
<chr> <chr> <chr> <int>
1 A 1 153254 1
2 B 5 258634 2
3 C 3 896411 3
4 D 4 258634 2
5 E 5 896411 3
I have a dataset with ids and associated values:
df <- data.frame(id = c("1", "2", "3"), value = c("12", "20", "16"))
I have a lookup table that matches the id to another reference label ref:
lookup <- data.frame(id = c("1", "1", "1", "2", "2", "3", "3", "3", "3"), ref = c("a", "b", "c", "a", "d", "d", "e", "f", "a"))
Note that id to ref is a many-to-many match: the same id can be associated with multiple ref, and the same ref can be associated with multiple id.
I'm trying to split the value associated with the df$id column equally into the associated ref columns. The output dataset would look like:
output <- data.frame(ref = c("a", "b", "c", "d", "e", "f"), value = c(18, 4, 4, 14, 4, 4))
ref  value
a    18
b     4
c     4
d    14
e     4
f     4
I tried splitting this into four steps:
(1) calling pivot_wider on lookup, turning rows with the same id value into columns (e.g., a, b, c)
(2) merging the two datasets based on id
(3) dividing each df$value equally into the a, b, c, etc. columns that are not empty
(4) transposing the dataset and summing across the id columns
I can't figure out how to make step (3) work, though, and I suspect there's a much easier approach.
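For what it's worth, step (3) becomes straightforward if the pivoted ref columns are treated as a 0/1 indicator matrix; a minimal sketch of that route, assuming df and lookup as defined above:
library(dplyr)
library(tidyr)
# steps (1)-(2): one row per id, a 0/1 column per ref, plus the value
wide <- lookup %>%
  mutate(present = 1) %>%
  pivot_wider(names_from = ref, values_from = present, values_fill = 0) %>%
  left_join(df, by = "id")
# steps (3)-(4): split each value equally over its refs, then sum per ref
m <- as.matrix(wide[setdiff(names(wide), c("id", "value"))])
colSums(m * as.numeric(wide$value) / rowSums(m))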
A variation of #thelatemail's answer with base pipes.
merge(df, lookup) |> type.convert(as.is=TRUE) |>
transform(value=ave(value, id, FUN=\(x) x/length(x))) |>
with(aggregate(list(value=value), list(ref=ref), sum))
# ref value
# 1 a 18
# 2 b 4
# 3 c 4
# 4 d 14
# 5 e 4
# 6 f 4
Here's a potential logic. Merge value from df into lookup by id, divide value by number of matching rows, then group by ref and sum. Then take your pick of how you want to do it.
Base R
tmp <- merge(lookup, df, by="id", all.x=TRUE)
tmp$value <- ave(as.numeric(tmp$value), tmp$id, FUN=\(x) x/length(x) )
aggregate(value ~ ref, tmp, sum)
dplyr
library(dplyr)
lookup %>%
left_join(df, by="id") %>%
group_by(id) %>%
mutate(value = as.numeric(value) / n() ) %>%
group_by(ref) %>%
summarise(value = sum(value))
data.table
library(data.table)
setDT(df)
setDT(lookup)
lookup[df, on="id", value := as.numeric(value)/.N, by=.EACHI][
, .(value = sum(value)), by=ref]
# ref value
#1: a 18
#2: b 4
#3: c 4
#4: d 14
#5: e 4
#6: f 4
This may work
lookup %>%
left_join(lookup %>%
group_by(id) %>%
summarise(n = n()) %>%
left_join(df, by = "id") %>%
mutate(value = as.numeric(value)) %>%
mutate(repl = value/n) %>%
select(id, repl) ,
by = "id"
) %>% select(ref, repl) %>%
group_by(ref) %>% summarise(value = sum(repl))
ref value
<chr> <dbl>
1 a 18
2 b 4
3 c 4
4 d 14
5 e 4
6 f 4
If my data looks like this:
q2_3 q2_4 q2_5
<chr> <chr> <chr>
1 1A 2B 3C
2 4D 5E 6F
How can I delete only the text?
I want only the numbers to remain!
You could remove all the characters which are not digits using \\D.
Using dplyr
library(dplyr)
df %>% mutate_all(~gsub('\\D', '', .))
# q2_3 q2_4 q2_5
#1 1 2 3
#2 4 5 6
Or in base R :
df[] <- lapply(df, function(x) gsub('\\D', '', x))
data
df <- structure(list(q2_3 = c("1A", "4D"), q2_4 = c("2B", "5E"), q2_5 = c("3C",
"6F")), class = "data.frame", row.names = c("1", "2"))
Also, you can use parse_number() from the readr package (which will extract the first number from each value):
library(readr)
data <- data.frame(q2_3 = c("1A", "4D"),
q2_4 = c("2B", "5E"),
q2_5 = c("3C", "6F"))
data[] <- lapply(data, parse_number)
results in
> print(data)
q2_3 q2_4 q2_5
1 1 2 3
2 4 5 6
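As noted above, parse_number() keeps only the first number it finds, so strings containing several separated digit groups are truncated rather than concatenated; a quick check:
library(readr)
parse_number("1A2")
# [1] 1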
Another option with mutate/across
library(dplyr)
library(stringr)
df1 %>%
mutate(across(everything(), str_remove_all, "\\D+"))
# q2_3 q2_4 q2_5
#1 1 2 3
#2 4 5 6
data
df1 <- structure(list(q2_3 = c("1A", "4D"), q2_4 = c("2B", "5E"), q2_5 = c("3C",
"6F")), class = "data.frame", row.names = c("1", "2"))
I have a data set of teachers as follows:
df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg = c("1", "1", "2", "2", "1", "2", "1", "2"),
  claim = c("beth", "john", "john", "beth", "summer", "summer", "hannah", "hannah")
)
I would ideally like to spread my dataset like this:
Desired output.
Any ideas for how I can use either spread or pivot_wider to achieve this? The issue is that there are two grouping variables here (teacher and segment). Some teachers have multiple rows for the same segment, but some teachers don't.
One option would be to create a sequence column grouped by 'teacher', 'seg', and then use pivot_wider
library(dplyr)
library(tidyr)
library(stringr)
df %>%
group_by(teacher, seg) %>%
mutate(segN = c("", "double")[row_number()]) %>%
ungroup %>%
mutate(seg = str_c("seg", seg, segN)) %>%
select(-segN) %>%
pivot_wider(names_from = seg, values_from = claim)
# A tibble: 3 x 5
# teacher seg1 seg1double seg2 seg2double
# <fct> <fct> <fct> <fct> <fct>
#1 A beth john john beth
#2 B summer <NA> summer <NA>
#3 C hannah <NA> hannah <NA>
It can be simplified with rowid from data.table
library(data.table)
df %>%
mutate(seg = str_c('seg', c('', '_double')[rowid(teacher, seg)], seg)) %>%
pivot_wider(names_from = seg, values_from = claim)
#or use spread
# spread(seg, claim)
# teacher seg1 seg_double1 seg2 seg_double2
#1 A beth john john beth
#2 B summer <NA> summer <NA>
#3 C hannah <NA> hannah <NA>
You can also use a base R approach with the powerful reshape function and some minor data preparation:
# find duplicate values
dups <- duplicated(df[, 1:2])
# assign new names to duplicates
df[dups, 2] <- paste0(df[dups, 2], "double")
# use base r reshape function that automatically builds suitable names
wide <- reshape(df, v.names = "claim", idvar = "teacher",
timevar = "seg", direction = "wide", sep = "")
# change varnames to the desired output
names(wide) <- gsub("claim", "seg", names(wide))
wide
I have a similar problem to the following one, but the solution presented in the linked question does not work for me:
tidyr spread does not aggregate data
I have a df in the following structure:
UndesiredIndex DesiredIndex DesiredRows Result
1 x1A x1 A 50,32
2 x1B x2 B 7,34
3 x2A x1 A 50,33
4 x2B x2 B 7,35
Using the code below:
dftest <- bd_teste %>%
select(-UndesiredIndex) %>%
spread(DesiredIndex, Result)
I expected the following result:
DesiredIndex A B
A 50,32 50,33
B 7,34 7,35
Although, I keep getting the following result:
DesiredIndex x1 x2
1 A 50.32 NA
2 B 7.34 NA
3 A NA 50.33
4 B NA 7.35
PS: Sometimes I force the column UndesiredIndex out with select(-UndesiredIndex), but I keep getting the following message:
Adding missing grouping variables: UndesiredIndex
It might be something simple to stack those rows, but I'm new to R and have been trying hard to solve this, without success so far.
Thanks in advance!
We group by DesiredIndex, create a sequence column and then do the spread:
library(tidyverse)
df1 %>%
select(-UndesiredIndex) %>%
group_by(DesiredIndex) %>%
mutate(new = LETTERS[row_number()]) %>%
ungroup %>%
select(-DesiredIndex) %>%
spread(new, Result)
# A tibble: 2 x 3
# DesiredRows A B
# <chr> <chr> <chr>
#1 A 50,32 50,33
#2 B 7,34 7,35
Data
df1 <- structure(
list(
UndesiredIndex = c("x1A", "x1B", "x2A", "x2B"),
DesiredIndex = c("x1", "x2", "x1", "x2"),
DesiredRows = c("A", "B", "A", "B"),
Result = c("50,32", "7,34", "50,33", "7,35")
),
class = "data.frame",
row.names = c("1", "2", "3", "4")
)
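As an aside, spread is superseded in current tidyr; the same answer can be written with pivot_wider (a sketch using the df1 above):
library(dplyr)
library(tidyr)
df1 %>%
  select(-UndesiredIndex) %>%
  group_by(DesiredIndex) %>%
  mutate(new = LETTERS[row_number()]) %>%
  ungroup() %>%
  select(-DesiredIndex) %>%
  pivot_wider(names_from = new, values_from = Result)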
Shorter, but conceptually a bit more round-about.
Data
(Thanks to #akrun!)
df1 <- structure(
list(
UndesiredIndex = c("x1A", "x1B", "x2A", "x2B"),
DesiredIndex = c("x1", "x2", "x1", "x2"),
DesiredRows = c("A", "B", "A", "B"),
Result = c("50,32", "7,34", "50,33", "7,35")
),
class = "data.frame",
row.names = c("1", "2", "3", "4")
)
This is a great technique for concatenating rows.
df1 %>%
group_by(DesiredRows) %>%
summarise(Result = paste(Result, collapse = "|")) %>% #<Concatenate rows
separate(Result, into = c("A", "B"), sep = "\\|") #<Separate by '|'
#> # A tibble: 2 x 3
#> DesiredRows A B
#> <chr> <chr> <chr>
#> 1 A 50,32 50,33
#> 2 B 7,34 7,35
Created on 2018-08-06 by the reprex package (v0.2.0).