This question already has answers here:
Generate a dummy-variable
(17 answers)
Closed 7 years ago.
A while ago, I asked a question about creating a categorical variable from mutually exclusive dummy variables. Now, it turns out I want to do the opposite.
How would one go about creating dummy variables in a long-form dataset from a single categorical variable (time)? For example, the data frame below...
id time
1 1
1 2
1 3
1 4
would become...
id time time_dummy_1 time_dummy_2 time_dummy_3 time_dummy_4
1 1 1 0 0 0
1 2 0 1 0 0
1 3 0 0 1 0
1 4 0 0 0 1
I'm sure this is trivial (and please let me know if this question is a duplicate -- I'm not sure it is, but will happily remove if so). Thanks!
You can try the dummies package.
R Code:
# Create the data frame
id <- c(1,1,1,1)
time <- c(1,2,3,4)
data <- data.frame(id, time)
# Install once, then load the package
install.packages("dummies")
library(dummies)
# Append one 0/1 column per distinct value of time
data <- cbind(data, dummy(data$time))
Output:
id time data1 data2 data3 data4
1 1 1 0 0 0
1 2 0 1 0 0
1 3 0 0 1 0
1 4 0 0 0 1
You can then rename the newly added dummy columns to suit your needs.
R Code:
# Rename column headers
colnames(data)[colnames(data)=="data1"] <- "time_dummy_1"
colnames(data)[colnames(data)=="data2"] <- "time_dummy_2"
colnames(data)[colnames(data)=="data3"] <- "time_dummy_3"
colnames(data)[colnames(data)=="data4"] <- "time_dummy_4"
Output:
id time time_dummy_1 time_dummy_2 time_dummy_3 time_dummy_4
1 1 1 0 0 0
1 2 0 1 0 0
1 3 0 0 1 0
1 4 0 0 0 1
Hope this helps.
If your data is
id <- c(1,1,1,1)
time <- c(1,2,3,4)
df <- data.frame(id,time)
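you can try
unique.time <- as.character(unique(df$time))
# Create a dichotomous dummy variable for each time
x <- sapply(unique.time, function(x) as.numeric(df$time == x))
or
time.f <- factor(time)
# Drop the intercept (- 1) so that every level gets its own 0/1 column;
# with the default intercept, the first level would be absorbed as the baseline
dummies <- model.matrix(~ time.f - 1)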
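To get from the model.matrix output to the layout asked for in the question, the columns still need the time_dummy_ names and have to be bound back onto the data frame. A minimal base R sketch, assuming the df defined in this answer; the names time_f and result are only illustrative:

```r
# Example data from the answer above
df <- data.frame(id = c(1, 1, 1, 1), time = c(1, 2, 3, 4))

# Treat time as categorical; "- 1" removes the intercept so all levels are kept
time_f <- factor(df$time)
dummies <- model.matrix(~ time_f - 1)

# Name the columns after the factor levels, as in the question
colnames(dummies) <- paste0("time_dummy_", levels(time_f))

# Bind the dummy columns onto the original data frame
result <- cbind(df, dummies)
print(result)
```

Because the column names are built from levels(time_f), the same code should work for any number of distinct time values.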
I have a data frame looking like this:
ID cat1 cat2 cat3
1 cat1_A cat2_A cat3_A
2 cat1_B cat2_A cat3_B
3 cat1_B cat2_B cat3_A
I would now like to convert this to a kind of transposed table using all values in each column as new column names, and a 0/1 (presence/absence) call for the respective column name as new value:
ID cat1_A cat1_B cat2_A cat2_B cat3_A cat3_B
1 1 0 1 0 1 0
2 0 1 1 0 0 1
3 0 1 0 1 1 0
I hope it's clear what I'd like to do, not sure how to explain it in a better way. Any help would be greatly appreciated!
Thanks!
We can use mtabulate from qdapTools:
library(qdapTools)
# Transpose so that each ID becomes a column, then tabulate the values per ID
res <- cbind(df1[1], mtabulate(as.data.frame(t(df1[-1]))))
row.names(res) <- NULL
res
# ID cat1_A cat2_A cat3_A cat1_B cat3_B cat2_B
#1 1 1 1 1 0 0 0
#2 2 0 1 0 1 1 0
#3 3 0 0 1 1 0 1
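If installing qdapTools is not an option, a similar presence/absence table can probably be built with base R's table. This is only a sketch, assuming df1 holds the data from the question as character columns; the names ids, vals, tab and res are illustrative:

```r
# Example data from the question
df1 <- data.frame(
  ID   = 1:3,
  cat1 = c("cat1_A", "cat1_B", "cat1_B"),
  cat2 = c("cat2_A", "cat2_A", "cat2_B"),
  cat3 = c("cat3_A", "cat3_B", "cat3_A"),
  stringsAsFactors = FALSE
)

# Stack all category values into one vector and repeat the IDs to match
vals <- unlist(df1[-1], use.names = FALSE)
ids  <- rep(df1$ID, times = ncol(df1) - 1)

# Cross-tabulate ID against value: one row per ID, one column per distinct value
tab <- table(ids, vals)

# Convert the table to a data frame and put the ID column in front
res <- cbind(df1["ID"], as.data.frame.matrix(tab))
row.names(res) <- NULL
print(res)
```

Unlike the mtabulate result, the columns here come out in alphabetical order, which happens to match the layout requested in the question.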
Suppose I have a matrix D that holds death counts per year for specific ages.
I want to fill this matrix with the appropriate death counts, which are stored in
the vector Age, but the following code gives me the wrong answer. How should I write the code without a loop?
# Year and age grid for tables
Years=c(2007:2017)
Ages=c(60:70)
#Data.frame of deaths
D=data.frame(matrix(ncol=length(Years),nrow=length(Ages))); D[is.na(D)]=0
colnames(D)=Years
rownames(D)=Ages
Age=c(60,61,62,65,65,65,68,69,60)
year=2010
D[as.character(Age),as.character(year)]<-
D[as.character(Age),as.character(year)]+1
D[,'2010'] # 1 1 1 0 0 1 0 0 1 1 0
# Should be 2 1 1 0 0 3 0 0 1 1 0
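You need to use table. Indexed assignment with repeated row names is applied only once per distinct row, so the duplicates are lost; count the ages first and then assign the counts: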
AgeTable = table(Age)
D[names(AgeTable), as.character(year)] = AgeTable
D[,'2010']
[1] 2 1 1 0 0 3 0 0 1 1 0
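The assignment above overwrites the selected cells, which is fine while D is all zeros. If the column may already hold counts, one option is to tabulate against the full age grid and add the result. A rough sketch, assuming D is stored as a plain numeric matrix rather than a data frame; counts is an illustrative name:

```r
# Year and age grid, as in the question
Years <- 2007:2017
Ages  <- 60:70

# Zero-filled matrix with ages as row names and years as column names
D <- matrix(0, nrow = length(Ages), ncol = length(Years),
            dimnames = list(as.character(Ages), as.character(Years)))

Age  <- c(60, 61, 62, 65, 65, 65, 68, 69, 60)
year <- 2010

# factor(..., levels = Ages) keeps ages with zero deaths, so counts lines up with the rows of D
counts <- table(factor(Age, levels = Ages))

# Add the counts to whatever is already stored for that year
D[, as.character(year)] <- D[, as.character(year)] + as.vector(counts)
print(D[, "2010"])
```

Fixing the factor levels to the age grid means no name-based indexing is needed, and repeated calls for further batches of deaths accumulate rather than overwrite.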
I have survey results (categorical) stored in a CSV file, with multiple responses within the same cell. I'd like to split them into separate columns (dummy variables).
The data looks like
response <-c(1,2,3,123)
df <-data.frame(response)
I tried the code below
for(t in unique(df$response))
{df[paste("response",t,sep="")] <- ifelse(df$response==t,1,0)}
The result is below, but it creates a separate column for 123:
head(df)
response response1 response2 response3 response123
1 1 1 0 0 0
2 2 0 1 0 0
3 3 0 0 1 0
4 123 0 0 0 1
I'd like the data to look as below
response response1 response2 response3
1 1 1 0 0
2 2 0 1 0
3 3 0 0 1
4 123 1 1 1
Appreciate your help and advice :)
We can do
df1 <- cbind(df, +(sapply(1:3, grepl, x = df$response)))
colnames(df1)[-1] <- paste0("response", colnames(df1)[-1])
df1
# response response1 response2 response3
#1 1 1 0 0
#2 2 0 1 0
#3 3 0 0 1
#4 123 1 1 1
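The grepl call hard-codes the codes 1 to 3. When the set of codes is not known in advance, one possibility is to split each response into its digits and derive the columns from whatever digits occur. This is a sketch only and assumes every answer code is a single digit; digits, codes and dummies are illustrative names:

```r
# Example data from the question
response <- c(1, 2, 3, 123)
df <- data.frame(response)

# Split each response into its individual digits, e.g. 123 -> "1" "2" "3"
digits <- strsplit(as.character(df$response), "")

# All distinct codes that occur anywhere in the data
codes <- sort(unique(unlist(digits)))

# One row per response, one 0/1 column per code
dummies <- t(sapply(digits, function(d) as.integer(codes %in% d)))
colnames(dummies) <- paste0("response", codes)

df1 <- cbind(df, dummies)
print(df1)
```

Multi-digit codes would need a delimiter in the raw data, since 12 could not be told apart from the pair 1 and 2.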
Reference:
Transpose and create categorical values in R
Follow-up to this question. While both model.matrix and data.table work very well with values that are already in the data, how can we use them to derive a column that is not?
That is, from data in the same data frame,
data <- read.table(header=T, text='
subject weight sex test
1 2 M control
2 3 F cond1
3 2 F cond2
4 4 M control
5 3 F control
6 2 F control
')
If I were to simulate the case statement with OR condition from SQL in R, how do I go about it? In SQL I would do:
case when (sex = 'F' OR sex = 'M') AND CONTROL IS NOT NULL THEN 1 ELSE 0 END AS F_M_CONTROL
case when (sex = 'F' OR sex = 'M') AND COND1 IS NOT NULL THEN 1 ELSE 0 END AS F_M_COND1
bringing the output to:
subject weight control_F_M control_M control_F cond1_F_M cond1_F cond1_M
1 2 0 1 0 0 0 0
2 3 0 0 1 0 0 0
3 2 0 0 0 0 1 0
4 4 0 1 0 0 0 0
5 3 1 0 0 0 0 0
6 2 1 0 0 0 0 0
Any idea how I can generate the "Control_F_M" and Cond1_F_M columns in R?
Thanks in advance,
Bee
Edit:
To generate the aforementioned output, I'm using data.table and dcast as suggested before.
I could use if-else if I knew all the values in the test column; I apologize for not clarifying this earlier. The challenge, of course, is that the column is dynamic, so I'm hoping to generate that many columns dynamically as an extension of the code below or with a similar approach.
dcast(data, subject+weight~test+sex, fun=length, drop=c(TRUE,FALSE))
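One literal reading of the SQL above is "one column per value of test, set to 1 when the row has that test value and sex is F or M". Under that assumption, which may not match the sample output exactly, the combined columns could be generated dynamically in base R along these lines; test_levels, either_sex, combined and result are illustrative names:

```r
# Example data from the question
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
subject weight sex test
1 2 M control
2 3 F cond1
3 2 F cond2
4 4 M control
5 3 F control
6 2 F control
")

# The values of test are not known in advance, so read them from the data
test_levels <- unique(data$test)

# TRUE where sex is either F or M (the OR condition from the SQL)
either_sex <- data$sex %in% c("F", "M")

# One 0/1 column per test value: the row has that value AND the sex condition holds
combined <- sapply(test_levels, function(lv) as.integer(data$test == lv & either_sex))
colnames(combined) <- paste0(test_levels, "_F_M")

result <- cbind(data[c("subject", "weight")], combined)
print(result)
```

The per-sex columns from the dcast call could then be bound onto result with another cbind.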
I'd like to create a new variable that contains 1s and 0s. A 1 represents agreement between the raters (both 1 or both 0) and a 0 represents disagreement.
rater_A <- c(1,0,1,1,1,0,0,1,0,0)
rater_B <- c(1,1,0,0,1,1,0,1,0,0)
df <- cbind(rater_A, rater_B)
The new variable would be like the following vector I created manually:
df$agreement <- c(1,0,0,0,1,0,1,1,1,1)
Maybe there's a package or a function I don't know. Any help would be great.
You could create df as a data.frame (instead of using cbind) and use within and ifelse:
rater_A <- c(1,0,1,1,1,0,0,1,0,0)
rater_B <- c(1,1,0,0,1,1,0,1,0,0)
df <- data.frame(rater_A, rater_B)
##
df <- within(df,
agreement <- ifelse(
rater_A==rater_B,1,0))
##
> df
rater_A rater_B agreement
1 1 1 1
2 0 1 0
3 1 0 0
4 1 0 0
5 1 1 1
6 0 1 0
7 0 0 1
8 1 1 1
9 0 0 1
10 0 0 1
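Since the comparison already yields TRUE/FALSE, the ifelse call can arguably be skipped by converting the logical result directly. A short sketch of that variant, using the same data:

```r
rater_A <- c(1, 0, 1, 1, 1, 0, 0, 1, 0, 0)
rater_B <- c(1, 1, 0, 0, 1, 1, 0, 1, 0, 0)
df <- data.frame(rater_A, rater_B)

# TRUE where both raters gave the same score; as.integer maps TRUE/FALSE to 1/0
df$agreement <- as.integer(df$rater_A == df$rater_B)
print(df)
```

If either rating can be NA, the comparison returns NA for that row, so missing values would need separate handling.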