This question already has answers here:
Generate a dummy-variable
(17 answers)
How do I make a dummy variable in R?
(3 answers)
Create new dummy variable columns from categorical variable
(8 answers)
How to create dummy variables?
(3 answers)
One-Hot Encoding in [R] | Categorical to Dummy Variables [duplicate]
(1 answer)
Closed 5 years ago.
I have categorical survey results stored in a CSV file, with multiple responses within the same cell. I'd like to split them into separate columns (dummy variables).
The data looks like this:
response <- c(1, 2, 3, 123)
df <- data.frame(response)
I tried the code below:
for (t in unique(df$response)) {
  df[paste("response", t, sep = "")] <- ifelse(df$response == t, 1, 0)
}
Here is the result. It created a separate column for 123 instead of flagging responses 1, 2 and 3:
head(df)
response response1 response2 response3 response123
1 1 1 0 0 0
2 2 0 1 0 0
3 3 0 0 1 0
4 123 0 0 0 1
I'd like the data to look like this:
response response1 response2 response3
1 1 1 0 0
2 2 0 1 0
3 3 0 0 1
4 123 1 1 1
Appreciate your help and advice :)
We can use grepl to test whether each of the codes 1 to 3 appears in the response:
df1 <- cbind(df, +(sapply(1:3, grepl, x = df$response)))
colnames(df1)[-1] <- paste0("response", colnames(df1)[-1])
df1
# response response1 response2 response3
#1 1 1 0 0
#2 2 0 1 0
#3 3 0 0 1
#4 123 1 1 1
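Note that grepl matches substrings, so it would also flag "1" inside a response such as 10. As a minimal sketch, assuming every response code is a single character (an assumption; the question only shows the codes 1, 2 and 3), each cell can instead be split into its characters. The names digits, codes and dummies are illustrative only:

```r
response <- c(1, 2, 3, 123)
df <- data.frame(response)

# Split every cell into its individual characters, e.g. "123" -> "1" "2" "3"
digits <- strsplit(as.character(df$response), "")

# All distinct codes that occur anywhere in the column
codes <- sort(unique(unlist(digits)))

# One row per response, one 0/1 column per code
dummies <- do.call(rbind, lapply(digits, function(d) as.integer(codes %in% d)))
colnames(dummies) <- paste0("response", codes)

df1 <- cbind(df, dummies)
df1
```

Because the codes are discovered from the data, this does not need the hard-coded 1:3, but it would break for multi-character codes unless the cells use an explicit separator.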
This question already has answers here:
What is the right way to multiply data frame by vector?
(6 answers)
Most efficient way to multiply a data frame by a vector
(4 answers)
Closed 1 year ago.
I'd like to multiply the columns of a data frame by a vector of coefficients. Here is some example data:
set.seed(123)
df <- data.frame(var1=sample(0:1,10,TRUE),var2=sample(0:1,10,TRUE),var3=sample(0:1,10,TRUE) )
coef <- c(1,2,3)
df holds the variables and coef holds the coefficients. I tried multiplying them as follows, but I get this:
> df*coef
var1 var2 var3
1 0 2 0
2 0 3 1
3 0 1 0
4 1 0 0
5 0 3 0
6 3 0 0
7 1 2 3
8 2 0 1
9 0 0 0
10 0 0 3
I would have expected column 1 to be multiplied by 1, column 2 by 2, and so on.
Is there a way to do this multiplication? Any help is greatly appreciated. Thanks.
You could do:
df * coef[col(df)]
or even
data.frame(t(t(df) * coef))
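Note that "df .* coef" is MATLAB/Octave syntax and is not valid R. In R, "*" is already element-wise, but it recycles coef down each column rather than across the columns, which is why df*coef gives the result shown in the question.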
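Two other base R ways to make the column-wise intent explicit are sweep() and Map(). The following is a minimal sketch on a small hand-written data frame (the values are made up for illustration, not taken from the seeded example); by_sweep and by_map are illustrative names:

```r
df <- data.frame(var1 = c(0, 1, 1), var2 = c(1, 1, 0), var3 = c(1, 0, 1))
coef <- c(1, 2, 3)

# sweep() applies `*` along margin 2, so column j is multiplied by coef[j]
by_sweep <- sweep(df, 2, coef, `*`)

# Map() pairs each column with its coefficient and multiplies them
by_map <- as.data.frame(Map(`*`, df, coef))

by_sweep
```

Both keep the original column names, and both fail loudly or visibly if the length of coef does not match the number of columns, which is easier to notice than silent recycling.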
This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 4 years ago.
id <- c('1','1','1','2','2','3')
name <- c('myfile_1','myfile_2','myfile_4','myfile_1','myfile_2','myfile_3')
count <- c(5,4,2,1,3,1)
input <- data.frame(id, name, count)
Given the data frame input created above:
id name count
1 myfile_1 5
1 myfile_2 4
1 myfile_4 2
2 myfile_1 1
2 myfile_2 3
3 myfile_3 1
How can I get a new data frame like this?
id myfile_1 myfile_2 myfile_3 myfile_4
1 5 4 0 2
2 1 3 0 0
3 0 0 1 0
library(tidyverse)
input %>%
  spread(name, count, fill = 0)
# id myfile_1 myfile_2 myfile_3 myfile_4
#1 1 5 4 0 2
#2 2 1 3 0 0
#3 3 0 0 1 0
More details (other than the link given in the duplicate flag) on long-to-wide conversion can be found here.
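In current tidyr, spread() is documented as superseded by pivot_wider(). If you would rather avoid a package dependency, the same reshaping can be sketched in base R with xtabs(), assuming each id/name pair occurs at most once (otherwise the counts are summed). The names counts and wide are illustrative:

```r
id <- c('1', '1', '1', '2', '2', '3')
name <- c('myfile_1', 'myfile_2', 'myfile_4', 'myfile_1', 'myfile_2', 'myfile_3')
count <- c(5, 4, 2, 1, 3, 1)
input <- data.frame(id, name, count)

# Cross-tabulate: rows are ids, columns are names, cells are summed counts
# (combinations that never occur are filled with 0)
counts <- as.data.frame.matrix(xtabs(count ~ id + name, data = input))

# Turn the row names back into a regular id column
wide <- data.frame(id = rownames(counts), counts, row.names = NULL)
wide
```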
This question already has an answer here:
Get a square matrix out of a non symetric data frame
(1 answer)
Closed 5 years ago.
I have a data frame looking like this:
ID cat1 cat2 cat3
1 cat1_A cat2_A cat3_A
2 cat1_B cat2_A cat3_B
3 cat1_B cat2_B cat3_A
I would now like to convert this to a kind of transposed table using all values in each column as new column names, and a 0/1 (presence/absence) call for the respective column name as new value:
ID cat1_A cat1_B cat2_A cat2_B cat3_A cat3_B
1 1 0 1 0 1 0
2 0 1 1 0 0 1
3 0 1 0 1 1 0
I hope it's clear what I'd like to do; I'm not sure how to explain it better. Any help would be greatly appreciated!
Thanks!
We can use mtabulate from the qdapTools package (note that the resulting columns are not sorted by name):
res <- cbind(df1[1], mtabulate(as.data.frame(t(df1[-1]))))
row.names(res) <- NULL
res
# ID cat1_A cat2_A cat3_A cat1_B cat3_B cat2_B
#1 1 1 1 1 0 0 0
#2 2 0 1 0 1 1 0
#3 3 0 0 1 1 0 1
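If qdapTools is not available, a base R sketch of the same presence/absence table is possible. It assumes the data frame from the question is called df1 with character columns (the question does not show how it was created, so the construction below is an assumption); values and indicators are illustrative names:

```r
df1 <- data.frame(
  ID   = 1:3,
  cat1 = c("cat1_A", "cat1_B", "cat1_B"),
  cat2 = c("cat2_A", "cat2_A", "cat2_B"),
  cat3 = c("cat3_A", "cat3_B", "cat3_A"),
  stringsAsFactors = FALSE
)

# Every distinct value across the category columns becomes a new column
values <- sort(unique(unlist(df1[-1])))

# For each row, mark which of those values are present (1) or absent (0)
indicators <- t(apply(df1[-1], 1, function(row) as.integer(values %in% row)))
colnames(indicators) <- values

res <- cbind(df1[1], indicators)
res
```

Sorting the values first gives the column order requested in the question.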
This question already has an answer here:
R: apply simple function to specific columns by grouped variable
(1 answer)
Closed 5 years ago.
I'm trying to convert a dataset that has multiple observations per person over a period of time. For example, person 1 can be obese and not obese (just overweight) during this time. Here's an example from person 1:
ID Obese Overweight
1 NA NA
1 NA NA
1 0 1
1 1 0
1 0 0
2 NA 0
2 0 1
2 0 NA
For each ID, I need to set every value in a column to 1 if a 1 appears anywhere within that column for that ID. This has to be done across a specified set of columns (there are 700+; e.g. c(5:749)). Ideally, the output would look like:
ID Obese Overweight
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
2 0 1
2 0 1
2 0 1
First I changed all the NAs to 0's; I then figured I could take the maximum along each column and replace (by ID), but can't find documentation on how to do this by group ("ID") AND a given set of columns (i.e. c(5:749)). Also I would not want to create new columns, but rather just replace values within columns already existing within the data frame.
I got it to work for a single variable, but couldn't translate this into a loop to go through a set of variables...
dat2 <- dat[, Obese:= max(Obese), by=ID]
Also I think a loop would take too long given the data size. Any other recommendations? Thanks in advance. Here's an example dataset:
dat <- as.data.frame(matrix(NA,18))
dat$id <- as.character(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3))
dat$ob1 <- as.character(c(NA,NA,0,1,0,NA,0,1,0,0,0,0,0,0,0,0,0,0))
dat$ob2 <- as.character(c(NA,NA,1,0,0,NA,0,0,1,0,0,0,0,1,0,0,0,0))
dat <- dat[,-1]
As for the linked page using "lapply", it doesn't seem to work when all values are NA (or 0) for a given individual. In that scenario, it seems to "fill in" / impute values from other columns (values which never appeared in that column in the original dataset); I spotted this when a binary variable was replaced with a continuous value. Any idea why this may be happening?
I think tapply is helpful for this case.
You can find the max for each id by
with(dat, tapply(ob1, id, max))
My solution is:
dat$ob1 <- as.numeric(dat$ob1)
dat$ob2 <- as.numeric(dat$ob2)
dat[is.na(dat)] <- 0
dat$ob1 <- with(dat,tapply(ob1,id,max)[id])
dat$ob2 <- with(dat,tapply(ob2,id,max)[id])
dat
id ob1 ob2
1 1 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 1
6 1 1 1
7 2 1 1
8 2 1 1
9 2 1 1
10 2 1 1
11 2 1 1
12 2 1 1
13 3 0 1
14 3 0 1
15 3 0 1
16 3 0 1
17 3 0 1
18 3 0 1
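To avoid repeating the tapply line for each of several hundred columns, the same idea can be applied to a whole set of columns at once. The following is a minimal base R sketch, not part of the answer above: group_max is a hypothetical helper, cols stands in for the real column selection (e.g. names(dat)[5:749]), and NAs are treated as 0 as in the question:

```r
dat <- data.frame(
  id  = as.character(rep(1:3, each = 6)),
  ob1 = c(NA, NA, 0, 1, 0, NA, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  ob2 = c(NA, NA, 1, 0, 0, NA, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0)
)

# Columns to overwrite in place (illustrative; replace with your own selection)
cols <- c("ob1", "ob2")

# Hypothetical helper: treat NA as 0, then spread each group's maximum
# back over every row of that group
group_max <- function(x, g) {
  x[is.na(x)] <- 0
  ave(x, g, FUN = max)
}

dat[cols] <- lapply(dat[cols], group_max, g = dat$id)
dat
```

ave() returns a vector of the same length as its input, so the existing columns are replaced rather than new ones being created.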
This question already has answers here:
Generate a dummy-variable
(17 answers)
Closed 7 years ago.
A while ago, I asked a question about creating a categorical variable from mutually exclusive dummy variables. Now, it turns out I want to do the opposite.
How would one go about creating dummy variables in a long-form dataset from a single categorical variable (time)? e.g. the dataframe below...
id time
1 1
1 2
1 3
1 4
would become...
id time time_dummy_1 time_dummy_2 time_dummy_3 time_dummy_4
1 1 1 0 0 0
1 2 0 1 0 0
1 3 0 0 1 0
1 4 0 0 0 1
I'm sure this is trivial (and please let me know if this question is a duplicate; I'm not sure it is, but I will happily remove it if so). Thanks!
You can try the dummies package.
R Code:
# Creating the data frame
id <- c(1,1,1,1)
time <- c(1,2,3,4)
data <- data.frame(id, time)
# Install the package once, then load it
install.packages("dummies")
library(dummies)
# Append one dummy column per distinct value of time
data <- cbind(data, dummy(data$time))
Output:
id time data1 data2 data3 data4
1 1 1 0 0 0
1 2 0 1 0 0
1 3 0 0 1 0
1 4 0 0 0 1
You can then rename the newly added dummy variable columns to suit your needs.
R Code:
# Rename column headers
colnames(data)[colnames(data)=="data1"] <- "time_dummy_1"
colnames(data)[colnames(data)=="data2"] <- "time_dummy_2"
colnames(data)[colnames(data)=="data3"] <- "time_dummy_3"
colnames(data)[colnames(data)=="data4"] <- "time_dummy_4"
Output:
id time time_dummy_1 time_dummy_2 time_dummy_3 time_dummy_4
1 1 1 0 0 0
1 2 0 1 0 0
1 3 0 0 1 0
1 4 0 0 0 1
Hope this helps.
If your data is
id <- c(1,1,1,1)
time <- c(1,2,3,4)
df <- data.frame(id,time)
you can try
time <- as.character(time)
unique.time <- as.character(unique(df$time))
# Create a dichotomous dummy-variable for each time
x <- sapply(unique.time, function(x)as.numeric(df$time == x))
or
time.f <- factor(time)
# "- 1" drops the intercept so that every level gets its own dummy column
dummies <- model.matrix(~ time.f - 1)
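Putting the model.matrix idea together with the column names requested in the question, a minimal base R sketch could look like this (result is an illustrative name; the time_dummy_ prefix comes from the question):

```r
df <- data.frame(id = c(1, 1, 1, 1), time = c(1, 2, 3, 4))

# "- 1" removes the intercept, so each level of time gets its own 0/1 column
dummies <- model.matrix(~ factor(time) - 1, data = df)

# Replace the default "factor(time)1", ... names with the requested ones
colnames(dummies) <- paste0("time_dummy_", levels(factor(df$time)))

result <- cbind(df, dummies)
result
```

Without the "- 1", model.matrix would return an intercept column plus dummies for all levels except the first, which is what a regression needs but not what the question asks for.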