There is a vector and a data frame. The vector is raw data that recorded a respondent's responses from the survey (in each trial they had to choose one of three alternatives), so its values range from 1 to 3. The data frame (d.f) holds the alternatives shown in each trial. What I need is to pull the result from the vector for each Trial and record it in d.f in a new column 'chosen'.
I am going to create a new column 'chosen' in d.f. For each trial, the alternative chosen by the respondent should have 1 in the 'chosen' column and all others 0. I first need to find the chosen value by matching d.f$Trial with the digit that comes right after "conjoint_full_info" in the names of the vector. After finding the value, I need to put 1 in the 'chosen' column of the row for the corresponding alternative. (Looking at the vector, in trial 1 the respondent chose alternative 1. So the row with Trial = 1 and alternative = 1 gets 1 in 'chosen', and the remaining 2 rows get 0.) I am looking for a way to apply this to every trial below, but I am not sure how to code it efficiently. Maybe using for loops? Sorry for the unclear explanation and thanks in advance!
This is how the two datasets look:
Vector
conjoint_full_info.1. conjoint_full_info.2. conjoint_full_info.3. conjoint_full_info.4.
1 2 2 2
d.f
Ind Trial alternative price privacy battery stars
1 R_2Xb32PAT3WjGBnc 1 1 2 3 1 1
2 R_2Xb32PAT3WjGBnc 1 2 3 1 2 2
3 R_2Xb32PAT3WjGBnc 1 3 1 2 3 3
4 R_2Xb32PAT3WjGBnc 2 1 1 2 2 1
5 R_2Xb32PAT3WjGBnc 2 2 2 3 3 2
6 R_2Xb32PAT3WjGBnc 2 3 3 1 1 3
7 R_2Xb32PAT3WjGBnc 3 1 3 1 3 1
8 R_2Xb32PAT3WjGBnc 3 2 1 2 1 2
9 R_2Xb32PAT3WjGBnc 3 3 2 3 2 3
10 R_2Xb32PAT3WjGBnc 4 1 1 1 1 2
11 R_2Xb32PAT3WjGBnc 4 2 2 2 2 3
12 R_2Xb32PAT3WjGBnc 4 3 3 3 3 1
and this is what I want
d.f
Ind Trial alternative price privacy battery stars chosen
1 R_2Xb32PAT3WjGBnc 1 1 2 3 1 1 1
2 R_2Xb32PAT3WjGBnc 1 2 3 1 2 2 0
3 R_2Xb32PAT3WjGBnc 1 3 1 2 3 3 0
4 R_2Xb32PAT3WjGBnc 2 1 1 2 2 1 0
5 R_2Xb32PAT3WjGBnc 2 2 2 3 3 2 1
6 R_2Xb32PAT3WjGBnc 2 3 3 1 1 3 0
7 R_2Xb32PAT3WjGBnc 3 1 3 1 3 1 0
8 R_2Xb32PAT3WjGBnc 3 2 1 2 1 2 1
9 R_2Xb32PAT3WjGBnc 3 3 2 3 2 3 0
10 R_2Xb32PAT3WjGBnc 4 1 1 1 1 2 0
11 R_2Xb32PAT3WjGBnc 4 2 2 2 2 3 1
12 R_2Xb32PAT3WjGBnc 4 3 3 3 3 1 0
your data:
Vector <- c(conjoint_full_info.1. = 1, conjoint_full_info.2. = 2, conjoint_full_info.3. = 2,
conjoint_full_info.4. = 2)
d.f <- structure(list(Ind = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "R_2Xb32PAT3WjGBnc", class = "factor"),
Trial = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
alternative = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L), price = c(2L, 3L, 1L, 1L, 2L, 3L, 3L, 1L, 2L, 1L, 2L,
3L), privacy = c(3L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 3L, 1L,
2L, 3L), battery = c(1L, 2L, 3L, 2L, 3L, 1L, 3L, 1L, 2L,
1L, 2L, 3L), stars = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
2L, 3L, 1L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
We create a mapping vector:
mapVector = Vector
names(mapVector) = sapply(strsplit(names(Vector),"[.]"),"[[",2)
# now mapVector has names that match trial
mapVector
1 2 3 4
1 2 2 2
If we do mapVector[as.character(d.f$Trial)], we get the chosen alternative for each row:
head(cbind(d.f,mapVector[as.character(d.f$Trial)]))
Ind Trial alternative price privacy battery stars
1 R_2Xb32PAT3WjGBnc 1 1 2 3 1 1
2 R_2Xb32PAT3WjGBnc 1 2 3 1 2 2
3 R_2Xb32PAT3WjGBnc 1 3 1 2 3 3
4 R_2Xb32PAT3WjGBnc 2 1 1 2 2 1
5 R_2Xb32PAT3WjGBnc 2 2 2 3 3 2
6 R_2Xb32PAT3WjGBnc 2 3 3 1 1 3
mapVector[as.character(d.f$Trial)]
1 1
2 1
3 1
4 2
5 2
6 2
So it's a matter of creating another column that checks whether this agrees with the alternative column:
library(dplyr)
d.f %>%
mutate(chosen=as.numeric(alternative == mapVector[as.character(Trial)]))
Ind Trial alternative price privacy battery stars chosen
1 R_2Xb32PAT3WjGBnc 1 1 2 3 1 1 1
2 R_2Xb32PAT3WjGBnc 1 2 3 1 2 2 0
3 R_2Xb32PAT3WjGBnc 1 3 1 2 3 3 0
4 R_2Xb32PAT3WjGBnc 2 1 1 2 2 1 0
5 R_2Xb32PAT3WjGBnc 2 2 2 3 3 2 1
6 R_2Xb32PAT3WjGBnc 2 3 3 1 1 3 0
7 R_2Xb32PAT3WjGBnc 3 1 3 1 3 1 0
8 R_2Xb32PAT3WjGBnc 3 2 1 2 1 2 1
9 R_2Xb32PAT3WjGBnc 3 3 2 3 2 3 0
10 R_2Xb32PAT3WjGBnc 4 1 1 1 1 2 0
11 R_2Xb32PAT3WjGBnc 4 2 2 2 2 3 1
12 R_2Xb32PAT3WjGBnc 4 3 3 3 3 1 0
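The same lookup can be sketched in base R without dplyr. This is only an illustration: it rebuilds a trimmed copy of d.f with just the Trial and alternative columns, and trial_id is a helper name introduced here, not part of the answer above.

```r
# Responses as given in the question: one chosen alternative per trial
Vector <- c(conjoint_full_info.1. = 1, conjoint_full_info.2. = 2,
            conjoint_full_info.3. = 2, conjoint_full_info.4. = 2)

# Trimmed stand-in for d.f (only the two columns the lookup needs)
d.f <- data.frame(Trial = rep(1:4, each = 3),
                  alternative = rep(1:3, times = 4))

# Strip every non-digit from the names to recover the trial number
trial_id <- as.integer(gsub("[^0-9]", "", names(Vector)))

# For each row, look up the response for its trial and compare it
# with the alternative in that row
d.f$chosen <- as.integer(d.f$alternative == Vector[match(d.f$Trial, trial_id)])
d.f
```

Using match() on the extracted trial numbers means the lookup does not depend on the vector being stored in trial order.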
Related
See the table below: I want to assign 1 or 0 to new_col, but the sum of 1s per unique hhid should not exceed that household's value in the column "nets". Assume new_col doesn't exist yet.
hhid nets new_col
1 1 3 1
1 1 3 1
1 1 3 1
1 1 3 0
1 2 2 1
1 2 2 1
1 2 2 0
1 3 2 1
1 3 2 1
1 3 2 0
1 3 2 0
I tried the code below:
df %>% group_by(hhid) %>% mutate(new_col = ifelse(summarise(across(new_col), sum)<= df$nets),1,0)
Try this:
Data:
df <- structure(list(hhid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
3L), nets = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-11L))
hhid nets
1 1 3
2 1 3
3 1 3
4 1 3
5 2 2
6 2 2
7 2 2
8 3 2
9 3 2
10 3 2
11 3 2
Code:
library(dplyr)
df %>%
  group_by(hhid) %>%
  # flag the first `nets` rows of each household with 1, the rest with 0
  mutate(new_col = ifelse(row_number() <= nets, 1, 0))
Output:
# A tibble: 11 x 3
# Groups: hhid [3]
hhid nets new_col
<int> <int> <dbl>
1 1 3 1
2 1 3 1
3 1 3 1
4 1 3 0
5 2 2 1
6 2 2 1
7 2 2 0
8 3 2 1
9 3 2 1
10 3 2 0
11 3 2 0
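For comparison, here is a base R sketch of the same rule that also checks the stated constraint (the number of 1s per hhid never exceeds nets). The names flags, ones_per_hh and nets_per_hh are introduced only for this illustration.

```r
df <- data.frame(hhid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
                 nets = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L))

# Split nets by household, flag the first `nets` rows of each piece,
# then put the pieces back in the original row order
flags <- lapply(split(df$nets, df$hhid),
                function(n) as.integer(seq_along(n) <= n))
df$new_col <- unsplit(flags, df$hhid)

# Check the constraint: the count of 1s per household never exceeds nets
ones_per_hh <- tapply(df$new_col, df$hhid, sum)
nets_per_hh <- tapply(df$nets, df$hhid, max)
all(ones_per_hh <= nets_per_hh)
```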
Same solution but using data.table instead of dplyr
dt <- structure(list(hhid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
3L), nets = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), row.names = c(NA,
-11L), class = c("data.frame"))
library(data.table)
setDT(dt)
dt[, new_col := +(seq_len(.N) <= nets), by = hhid]
dt
hhid nets new_col
1: 1 3 1
2: 1 3 1
3: 1 3 1
4: 1 3 0
5: 2 2 1
6: 2 2 1
7: 2 2 0
8: 3 2 1
9: 3 2 1
10: 3 2 0
11: 3 2 0
I am working on conjoint analysis and trying to create a choice-task data frame. So far, I have created an orthogonal design using caEncodedDesign() from the conjoint package. I am struggling to find a way to add two additional rows under each row of the design2 data frame.
All the values in the first added row should be the original value + 1, and in the second added row the original value + 2. Values wrap around after 3: when a value reaches 4, it has to become 1.
This is the original design2 d.f
> design2
price color privacy battery stars
17 2 3 2 1 1
21 3 1 3 1 1
34 1 3 1 2 1
60 3 2 1 3 1
64 1 1 2 3 1
82 1 1 1 1 2
131 2 2 3 2 2
153 3 3 2 3 2
171 3 3 1 1 3
175 1 2 2 1 3
201 3 1 2 2 3
218 2 1 1 3 3
241 1 3 3 3 3
I did the first row by hand, and I am looking for R code that can do the same for all the remaining rows.
>design2
price color privacy battery stars
17 2 3 2 1 1
3 1 3 2 2
1 2 1 3 3
21 3 1 3 1 1
34 1 3 1 2 1
60 3 2 1 3 1
64 1 1 2 3 1
82 1 1 1 1 2
131 2 2 3 2 2
153 3 3 2 3 2
171 3 3 1 1 3
175 1 2 2 1 3
201 3 1 2 2 3
218 2 1 1 3 3
241 1 3 3 3 3
Here's an attempt, based on duplicating rows, adding 0:2 to each column, and then replacing anything >= 4 by subtracting 3
design2 <- design2[rep(seq_len(nrow(design2)), each=3),]
design2 <- design2 + 0:2
sel <- design2 >= 4
design2[sel] <- (design2 - 3)[sel]
design2
# price color privacy battery stars
#17 2 3 2 1 1
#17.1 3 1 3 2 2
#17.2 1 2 1 3 3
#21 3 1 3 1 1
#21.1 1 2 1 2 2
#21.2 2 3 2 3 3
#34 1 3 1 2 1
#34.1 2 1 2 3 2
#34.2 3 2 3 1 3
# ..
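The wrap-around can also be written with modular arithmetic, which avoids the separate "subtract 3" step. A minimal sketch, assuming every attribute has exactly 3 levels coded 1 to 3; only the first three rows of design2 are used, and expanded and shift are names introduced for the illustration.

```r
design2 <- data.frame(price   = c(2L, 3L, 1L),
                      color   = c(3L, 1L, 3L),
                      privacy = c(2L, 3L, 1L),
                      battery = c(1L, 1L, 2L),
                      stars   = c(1L, 1L, 1L),
                      row.names = c("17", "21", "34"))

# Repeat each design row three times
expanded <- design2[rep(seq_len(nrow(design2)), each = 3), ]

# Shift of 0, 1, 2 for the three copies of each original row
shift <- rep(0:2, times = nrow(design2))

# Move to 0-based levels, add the shift, wrap with %% 3, move back to 1-based
expanded[] <- lapply(expanded, function(col) (col - 1L + shift) %% 3L + 1L)
expanded
```

If an attribute had a different number of levels, the 3L in the modulo would have to change per column.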
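We can use apply row-wise and, for every value in the row, append the missing levels using setdiff. Note that setdiff returns the remaining levels in ascending order, so the added rows are not always in the +1/+2 cyclic order of the question (compare rows 2 and 3 below with the hand-made example).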
out_df <- do.call(rbind, apply(design2, 1, function(x)
data.frame(sapply(x, function(y) c(y, setdiff(1:3, y))))))
rownames(out_df) <- NULL
out_df
# price color privacy battery stars
#1 2 3 2 1 1
#2 1 1 1 2 2
#3 3 2 3 3 3
#4 3 1 3 1 1
#5 1 2 1 2 2
#6 2 3 2 3 3
#7 1 3 1 2 1
#8 2 1 2 1 2
#9 3 2 3 3 3
#.....
data
design2 <- structure(list(price = c(2L, 3L, 1L, 3L, 1L, 1L, 2L, 3L, 3L,
1L, 3L, 2L, 1L), color = c(3L, 1L, 3L, 2L, 1L, 1L, 2L, 3L, 3L,
2L, 1L, 1L, 3L), privacy = c(2L, 3L, 1L, 1L, 2L, 1L, 3L, 2L,
1L, 2L, 2L, 1L, 3L), battery = c(1L, 1L, 2L, 3L, 3L, 1L, 2L,
3L, 1L, 1L, 2L, 3L, 3L), stars = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L)), class = "data.frame", row.names = c("17",
"21", "34", "60", "64", "82", "131", "153", "171", "175", "201", "218", "241"))
I have a data frame.
household person trip loop
1 1 1 1
1 1 2 1
1 1 3 1
1 1 4 2
1 1 5 2
1 2 1 1
1 2 2 1
1 2 3 2
2 1 1 1
2 1 2 1
2 1 3 2
2 1 4 2
For each person in each household I want to change some of the indices in column trip as below:
whenever loop changes, I want the trip index to start from 1 again.
output
household person trip loop
1 1 1 1
1 1 2 1
1 1 3 1
1 1 1 2
1 1 2 2
1 2 1 1
1 2 2 1
1 2 1 2
2 1 1 1
2 1 2 1
2 1 1 2
2 1 2 2
We can use
library(dplyr)
df1 %>%
group_by(household, person, loop) %>%
mutate(trip = row_number())
# A tibble: 12 x 4
# Groups: household, person, loop [6]
# household person trip loop
# <int> <int> <int> <int>
# 1 1 1 1 1
# 2 1 1 2 1
# 3 1 1 3 1
# 4 1 1 1 2
# 5 1 1 2 2
# 6 1 2 1 1
# 7 1 2 2 1
# 8 1 2 1 2
# 9 2 1 1 1
#10 2 1 2 1
#11 2 1 1 2
#12 2 1 2 2
data
df1 <- structure(list(household = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), person = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L), trip = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 1L, 2L,
3L, 4L), loop = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L,
2L)), class = "data.frame", row.names = c(NA, -12L))
Using data.table :
library(data.table)
df <- setDT(df) # Making sure your data is a data table
df[, trip := seq_len(.N), by = .(household, person, loop)]
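If neither dplyr nor data.table is available, the same restart-per-group numbering can be sketched in base R with ave(). The data below copies the question's example; nothing beyond base R is assumed.

```r
df1 <- data.frame(
  household = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  person    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
  trip      = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 1L, 2L, 3L, 4L),
  loop      = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L)
)

# ave() applies FUN within each household/person/loop combination and
# returns the results in the original row order
df1$trip <- ave(seq_len(nrow(df1)),
                df1$household, df1$person, df1$loop,
                FUN = seq_along)
df1
```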
Hoping to create the new variable X based on three existing variables: "SubID" "Day" and "Time". I used to have three sorting functions in excel to do this manually: first sort by the "SubID," and then sort by the "Day," and lastly sort by "Time." X should be from 1 to the largest number of rows for each SubID, based on the order of Day and Time.
SubID: assigned subject number
Day: each subject's day number (1,2,3...21)
Time: 1, 2, 3
X: the number of rows marked as the same SubID
SubID Day Time X
1 1 1 1
1 1 2 2
1 1 3 3
1 2 1 4
1 2 2 5
2 1 1 1
2 1 2 2
2 1 3 3
2 2 3 6
2 2 2 5
2 2 1 4
I have been doing this manually in excel and I am sure there must be a smarter way to do it in R, but I am new to R and don't know how. Thank you in advance!
Maybe with the data.table package. You will have to install it in case you haven't already; I have commented out the install command.
# install.packages("data.table")
library(data.table)
We can generate some example data in the following way (sample() is random, so your values will differ).
df <- data.frame(SubId=sample(1:2,10,replace=TRUE),
Day=sample(1:2,10,replace=TRUE),
Time=sample(1:2,10,replace=TRUE))
Then convert the data.frame into data.table.
setDT(df)
##> df
## SubId Day Time
## 1: 1 2 1
## 2: 1 1 1
## 3: 1 1 2
## 4: 2 2 1
## 5: 2 1 1
## 6: 1 2 2
## 7: 1 2 1
## 8: 1 2 2
## 9: 2 1 1
## 10: 2 1 2
Finally we can order by SubId, Day and Time. Once the rows are visited in that order, we just have to number them from 1 to the number of observations in each SubId.
df[order(SubId,Day,Time),X:=1:.N,SubId]
##> df
## SubId Day Time X
## 1: 1 2 1 3
## 2: 1 1 1 1
## 3: 1 1 2 2
## 4: 2 2 1 4
## 5: 2 1 1 1
## 6: 1 2 2 5
## 7: 1 2 1 4
## 8: 1 2 2 6
## 9: 2 1 1 2
## 10: 2 1 2 3
Maybe this helps
library(dplyr)
df1 %>%
group_by(SubID) %>%
mutate(X1 = row_number(as.numeric(paste0(Day, Time))))
# A tibble: 11 x 5
# Groups: SubID [2]
# SubID Day Time X X1
# <int> <int> <int> <int> <int>
# 1 1 1 1 1 1
# 2 1 1 2 2 2
# 3 1 1 3 3 3
# 4 1 2 1 4 4
# 5 1 2 2 5 5
# 6 2 1 1 1 1
# 7 2 1 2 2 2
# 8 2 1 3 3 3
# 9 2 2 3 6 6
#10 2 2 2 5 5
#11 2 2 1 4 4
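Or using order. Note that order() returns the sorting permutation rather than the ranks, so it is applied twice to get each row's position in the sorted order:
df1 %>%
  group_by(SubID) %>%
  mutate(X1 = order(order(Day, Time)))
Or with data.table
library(data.table)
setDT(df1)[, X1 := order(order(Day, Time)), by = SubID]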
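A small base R sketch of the difference between the sorting permutation and the rank, on made-up values (not the question's data). A single order() call gives the permutation; it coincides with the rank only in special cases, such as the sample data here, while order(order(...)) gives the rank whenever there are no ties.

```r
# Three rows of one subject, deliberately not in sorted order
Day  <- c(2L, 1L, 1L)
Time <- c(1L, 1L, 2L)

# Sorting permutation: which row comes 1st, 2nd, 3rd when sorted
perm <- order(Day, Time)

# Rank: the position of each row in the sorted order
ranks <- order(perm)

perm   # 2 3 1
ranks  # 3 1 2
```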
data
df1 <- structure(list(SubID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L), Day = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L),
Time = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 3L, 2L, 1L), X = c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 6L, 5L, 4L)), class = "data.frame",
row.names = c(NA,
-11L))
I have a dataset in long format in which measurements (Time) are nested in network partners (NP), which are nested in persons (ID). Here is an example of what it looks like (the real dataset has thousands of rows):
ID NP Time Outcome
1 11 1 4
1 11 2 3
1 11 3 NA
1 12 1 2
1 12 2 3
1 12 3 3
2 21 1 2
2 21 2 NA
2 21 3 NA
2 22 1 4
2 22 2 4
2 22 3 4
Now I would like to create 3 new variables:
a) The number of network partners (who have no NA in the outcome at this measurement) a specific person (ID) has at Time 1
b) The number of network partners (who have no NA in the outcome at this measurement) a specific person (ID) has at Time 2
c) The number of network partners (who have no NA in the outcome at this measurement) a specific person (ID) has at Time 3
So I would like to create a dataset like this:
ID NP Time Outcome NP.T1 NP.T2 NP.T3
1 11 1 4 2 2 1
1 11 2 3 2 2 1
1 11 3 NA 2 2 1
1 12 1 2 2 2 1
1 12 2 3 2 2 1
1 12 3 3 2 2 1
2 21 1 2 2 1 1
2 21 2 NA 2 1 1
2 21 3 NA 2 1 1
2 22 1 4 2 1 1
2 22 2 4 2 1 1
2 22 3 4 2 1 1
I would really appreciate your help.
You can just create one variable rather than three. I am using ddply from the plyr package for that.
mydata<-structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L), NP = c(11L, 11L, 11L, 12L, 12L, 12L, 21L, 21L, 21L,
22L, 22L, 22L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), Outcome = c(4L, 3L, NA, 2L, 3L, 3L, 2L, NA, NA,
4L, 4L, 4L)), .Names = c("ID", "NP", "Time", "Outcome"), class = "data.frame", row.names = c(NA,
-12L))
library(plyr)
# count the network partners with a non-missing Outcome per ID and Time
mydata1 <- ddply(mydata, .(ID, Time), transform, NP.T = sum(!is.na(Outcome)))
>mydata1
ID NP Time Outcome NP.T
1 1 11 1 4 2
2 1 12 1 2 2
3 1 11 2 3 2
4 1 12 2 3 2
5 1 11 3 NA 1
6 1 12 3 3 1
7 2 21 1 2 2
8 2 22 1 4 2
9 2 21 2 NA 1
10 2 22 2 4 1
11 2 21 3 NA 1
12 2 22 3 4 1
Updated: You can also use interaction to create the unique variable that combines ID and Time (comb)
mydata1 <- ddply(mydata, .(ID, Time), transform, NP.T = sum(!is.na(Outcome)), comb = interaction(ID, Time))
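If the three separate columns from the question (NP.T1, NP.T2, NP.T3) are still wanted, one possible base R sketch is to count per ID and Time with tapply() and then look the counts up for every row. The helper name counts is introduced here for illustration, and the data is rebuilt from the question's example.

```r
mydata <- data.frame(
  ID      = rep(1:2, each = 6),
  NP      = rep(c(11L, 12L, 21L, 22L), each = 3),
  Time    = rep(1:3, times = 4),
  Outcome = c(4L, 3L, NA, 2L, 3L, 3L, 2L, NA, NA, 4L, 4L, 4L)
)

# ID x Time matrix holding the number of non-missing outcomes
counts <- tapply(!is.na(mydata$Outcome),
                 list(mydata$ID, mydata$Time), sum)

# One new column per time point, filled by looking up each row's ID
for (t in colnames(counts)) {
  mydata[[paste0("NP.T", t)]] <- unname(counts[as.character(mydata$ID), t])
}
mydata
```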