This question already has answers here:
How to reshape data from long to wide format
(14 answers)
dcast warning: ‘Aggregation function missing: defaulting to length’
(2 answers)
Closed 5 years ago.
I want to create several variables that count how many times each value of var occurs for each user.id. Here is an example:
user.id var
1 A
1 B
2 A
2 A
2 C
3 C
Expected result:
user.id var_A var_B var_C
1 1 1 0
2 2 0 1
3 0 0 1
We can do this with tidyverse
library(tidyverse)
df1 %>%
count(user.id, var) %>%
spread(var, n, fill = 0)
# A tibble: 3 x 4
# user.id A B C
#* <int> <dbl> <dbl> <dbl>
#1 1 1 1 0
#2 2 2 0 1
#3 3 0 0 1
Or a more efficient approach with data.table
library(data.table)
dcast(setDT(df1), user.id ~ var, value.var = "var", fun.aggregate = length)
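If the var_ prefix from the expected result matters, a minimal sketch with pivot_wider (available in tidyr 1.0.0 and later) could look like the following. The df1 construction is an assumption that mirrors the example data in the question.

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of the example data from the question
df1 <- data.frame(
  user.id = c(1, 1, 2, 2, 2, 3),
  var     = c("A", "B", "A", "A", "C", "C")
)

res <- df1 %>%
  count(user.id, var) %>%        # one row per user.id/var pair with its count n
  pivot_wider(
    names_from   = var,
    values_from  = n,
    names_prefix = "var_",       # column names become var_A, var_B, var_C
    values_fill  = 0             # combinations that never occur become 0
  )

res
```

Here names_prefix only changes the column names; the counts are the same as in the spread version.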
This question already has an answer here:
Split a column into multiple binary dummy columns [duplicate]
(1 answer)
Closed 9 months ago.
I made the stupid mistake of enabling people to select multiple categories in a survey question.
Now the data column for this question looks something along the lines of this.
respondent answer_openq
1 a
2 a,c
3 b
4 a,d
Using the following line in R,
datanum <- data %>% mutate(dummy=1) %>%
spread(key=answer_openq,value=dummy, fill=0)
I get the following:
However, I want the dataset to transform into this:
respondent a b c d
1 1 0 0 0
2 1 0 1 0
3 0 1 0 0
4 1 0 0 1
Any help is appreciated (my thesis depends on it). Thanks :)
Try this:
library(dplyr)
library(tidyr)
df %>%
separate_rows(answer_openq, sep = ',') %>%
pivot_wider(names_from = answer_openq, values_from = answer_openq,
values_fn = function(x) 1, values_fill = 0)
# A tibble: 4 × 5
respondent a c b d
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 0 0
2 2 1 1 0 0
3 3 0 0 1 0
4 4 1 0 0 1
This question already has an answer here:
R code to assign a sequence based off of multiple variables [duplicate]
(1 answer)
Closed 3 years ago.
I have the following data and need the output shown in the second data frame:
a <- c(1,1,1,1,2,2,2,2,2,2,2)
b <- c(1,1,1,2,3,3,3,3,4,5,6)
d <- c(1,2,3,4,1,2,3,4,5,6,7)
df <- as.data.frame(cbind(a,b,d))
output <- c(1,1,1,2,1,1,1,1,2,3,4)
df_output <- as.data.frame(cbind(df,output))
I have tried cumsum and I am not able to get the desired results. Please guide. Regards, Enthu.
The counter should reset to 1 whenever the value in column a changes.
Within a group, rows that share the same value of b should get the same number.
For example, at the 5th record column b has the value 3, so the counter resets to 1. Rows 5 to 8 all have the same value of b, so they all get 1, and every later change in b should increment the counter by 1.
We can group by column 'a' and then create the new column by matching 'b' against its unique values
library(dplyr)
df2 <- df %>%
group_by(a) %>%
mutate(out = match(b, unique(b)))
df2
# A tibble: 11 x 4
# Groups: a [2]
# a b d out
# <dbl> <dbl> <dbl> <int>
# 1 1 1 1 1
# 2 1 1 2 1
# 3 1 1 3 1
# 4 1 2 4 2
# 5 2 3 1 1
# 6 2 3 2 1
# 7 2 3 3 1
# 8 2 3 4 1
# 9 2 4 5 2
#10 2 5 6 3
#11 2 6 7 4
Another option is to convert 'b' to a factor and coerce it to integer
df %>%
group_by(a) %>%
mutate(out = as.integer(factor(b)))
data
df <- data.frame(a, b, d)
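The two options above agree on this data because 'b' never decreases within a group. A small sketch of where they differ, using a hypothetical vector that is not sorted:

```r
# Hypothetical vector whose values are not in sorted order
b <- c(3, 3, 1, 2, 1)

# Numbers each value by the order in which it first appears
by_appearance <- match(b, unique(b))

# Numbers each value by its position among the sorted factor levels
by_sorted <- as.integer(factor(b))

by_appearance
by_sorted
```

If the counter should follow the order of appearance, match is the safer choice. The factor version ranks values by their sorted order instead.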
This question already has answers here:
Dummify character column and find unique values [duplicate]
(7 answers)
Closed 3 years ago.
I have a data frame with a column in comma-separated format:
dframe = data.frame(id=c(1,2,43,53), title=c("text1,color","color,text2","text2","text3,text2"))
I want to convert it into 0/1 indicator columns that show whether each value exists in every row, like this expected output:
dframe = data.frame(id=c(1,2,43,53), text1=c(1,0,0,0), color=c(1,1,0,0), text2=c(0,1,1,1), text3=c(0,0,0,1))
We can use separate_rows and spread from tidyverse:
library(tidyverse)
dframe %>%
separate_rows(title, sep = ",") %>%
mutate(id2 = 1) %>%
spread(title, id2, fill = 0)
Output:
# A tibble: 4 x 5
# Groups: id [4]
id color text1 text2 text3
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 0 0
2 2 1 0 1 0
3 43 0 0 1 0
4 53 0 0 1 1
This question already has answers here:
R: reshaping wide to long [duplicate]
(1 answer)
Using tidyr to combine multiple columns [duplicate]
(1 answer)
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
(8 answers)
Closed 4 years ago.
I'm hoping to reshape a dataframe in R so that a set of columns read in with duplicated names, and then renamed as var, var.1, var.2, anothervar, anothervar.1, anothervar.2 etc. can be treated as independent observations. I would like the number appended to the variable name to be used as the observation so that I can melt my data.
For example,
dat <- data.frame(ID=1:3, var=c("A", "A", "B"),
anothervar=c(5,6,7),var.1=c("C","D","E"),
anothervar.1 = c(1,2,3))
> dat
ID var anothervar var.1 anothervar.1
1 1 A 5 C 1
2 2 A 6 D 2
3 3 B 7 E 3
How can I reshape the data so it looks like the following:
ID obs var anothervar
1 1 A 5
1 2 C 1
2 1 A 6
2 2 D 2
3 1 B 7
3 2 E 3
Thank you for your help!
We can use melt from data.table, which accepts multiple patterns in the measure argument
library(data.table)
melt(setDT(dat), measure = patterns("^var", "anothervar"),
variable.name = "obs", value.name = c("var", "anothervar"))[order(ID)]
# ID obs var anothervar
#1: 1 1 A 5
#2: 1 2 C 1
#3: 2 1 A 6
#4: 2 2 D 2
#5: 3 1 B 7
#6: 3 2 E 3
As for a tidyverse solution, we can use unite with gather
library(tidyverse)
dat %>%
unite("1", var, anothervar) %>%
unite("2", var.1, anothervar.1) %>%
gather(obs, value, -ID) %>%
separate(value, into = c("var", "anothervar"))
# ID obs var anothervar
#1 1 1 A 5
#2 2 1 A 6
#3 3 1 B 7
#4 1 2 C 1
#5 2 2 D 2
#6 3 2 E 3
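With tidyr 1.0.0 or later, the same reshape can be sketched with pivot_longer and the special ".value" name. This is an alternative to the unite/gather approach, not the answerer's method. The renaming step and the "_1"/"_2" suffixes are assumptions made so that every measure column carries an explicit observation number.

```r
library(tidyr)

dat <- data.frame(ID = 1:3, var = c("A", "A", "B"),
                  anothervar = c(5, 6, 7),
                  var.1 = c("C", "D", "E"),
                  anothervar.1 = c(1, 2, 3))

# Assumed renaming: give every measure column an observation suffix
names(dat) <- c("ID", "var_1", "anothervar_1", "var_2", "anothervar_2")

# ".value" keeps the part before "_" as the output column name,
# and the part after "_" goes into the obs column
long <- pivot_longer(dat, cols = -ID,
                     names_to = c(".value", "obs"),
                     names_sep = "_")

long
```

Unlike the unite/separate route, this keeps anothervar numeric, because the values are never pasted into a string.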
This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Closed 5 years ago.
I have data, stored as a data.table dt or a data.frame df, with multiple observations per id-month combination. I want to store the row number in a variable, let's call it row.
I know how to do this in dplyr but want to learn how to do it in (pure) data.table. I assume it is a trivially easy operation, but I can't seem to find a solution that works.
Reprex:
library(dplyr)
library(data.table)
df <- tibble(id = c(1, 1, 1, 2, 2, 2), month = c(1, 1, 2, 1, 1, 2))
dt <- data.table(df)
My dplyr solution gives the expected output:
df %>%
group_by(id, month) %>%
mutate(row = row_number(id))
# A tibble: 6 x 3
# Groups: id, month [4]
id month row
<dbl> <dbl> <int>
1 1 1 1
2 1 1 2
3 1 2 1
4 2 1 1
5 2 1 2
6 2 2 1
Doing similar operations on a data.table yields something different:
dt[, row := row_number(id), by = c("id", "month")]
id month row
1: 1 1 1
2: 1 1 1
3: 1 2 1
4: 2 1 1
5: 2 1 1
6: 2 2 1
Or:
dt[, row := .I, by = c("id", "month")]
id month row
1: 1 1 1
2: 1 1 2
3: 1 2 3
4: 2 1 4
5: 2 1 5
6: 2 2 6
I assume I understand why this happens (row_number(id) simply refers to the first row number of the first row of each group), but do not know how to get the expected result in pure data.table.
dt[, row := seq_len(.N), by = c("id", "month")]
dt
id month row
1: 1 1 1
2: 1 1 2
3: 1 2 1
4: 2 1 1
5: 2 1 2
6: 2 2 1
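data.table also provides rowid(), which numbers rows within each combination of its arguments without a by clause. A minimal sketch on the same data as the reprex:

```r
library(data.table)

dt <- data.table(id = c(1, 1, 1, 2, 2, 2), month = c(1, 1, 2, 1, 1, 2))

# rowid() counts occurrences of each id-month combination in row order
dt[, row := rowid(id, month)]
dt
```

Inside a grouped call, .I holds row positions in the whole table, which is why it produced 1 to 6 in the question. seq_len(.N) or rowid() restart the count within each group.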