Creating additional rows in R

I am working on a conjoint analysis and trying to create a choice-task data frame. So far I have created an orthogonal design with caEncodedDesign() from the conjoint package. I am struggling to find a way to add two additional rows under each row of the design2 data frame.
Every value in the first added row should be the original value + 1, and every value in the second added row should be the original value + 2. Values wrap around: when a value reaches 4, it has to become 1.
This is the original design2 data frame:
> design2
price color privacy battery stars
17 2 3 2 1 1
21 3 1 3 1 1
34 1 3 1 2 1
60 3 2 1 3 1
64 1 1 2 3 1
82 1 1 1 1 2
131 2 2 3 2 2
153 3 3 2 3 2
171 3 3 1 1 3
175 1 2 2 1 3
201 3 1 2 2 3
218 2 1 1 3 3
241 1 3 3 3 3
I did the first row by hand, and I am looking for R code that does the same for all the remaining rows.
> design2
price color privacy battery stars
17 2 3 2 1 1
3 1 3 2 2
1 2 1 3 3
21 3 1 3 1 1
34 1 3 1 2 1
60 3 2 1 3 1
64 1 1 2 3 1
82 1 1 1 1 2
131 2 2 3 2 2
153 3 3 2 3 2
171 3 3 1 1 3
175 1 2 2 1 3
201 3 1 2 2 3
218 2 1 1 3 3
241 1 3 3 3 3

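Here's an attempt, based on duplicating each row three times, adding 0:2 down each column, and then wrapping anything >= 4 back around by subtracting 3:
# repeat every row three times (original, +1, +2)
design2 <- design2[rep(seq_len(nrow(design2)), each=3),]
# 0:2 is recycled down each column, giving offsets 0, 1, 2 within each triple
design2 <- design2 + 0:2
# values of 4 or 5 wrap around to 1 or 2
sel <- design2 >= 4
design2[sel] <- (design2 - 3)[sel]
design2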
# price color privacy battery stars
#17 2 3 2 1 1
#17.1 3 1 3 2 2
#17.2 1 2 1 3 3
#21 3 1 3 1 1
#21.1 1 2 1 2 2
#21.2 2 3 2 3 3
#34 1 3 1 2 1
#34.1 2 1 2 3 2
#34.2 3 2 3 1 3
# ..
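The wrap-around can also be written with modular arithmetic instead of the explicit >= 4 check. This is a minimal sketch, assuming every attribute has exactly three levels coded 1 to 3; n_levels, expanded and offset are names introduced here for illustration, and only the first three rows and two columns of design2 are reproduced to keep it short:

```r
# Illustrative subset of design2 (first three rows, two columns)
design2 <- data.frame(price = c(2L, 3L, 1L), color = c(3L, 1L, 3L),
                      row.names = c("17", "21", "34"))

n_levels <- 3L  # assumption: every attribute is coded 1..3

# Repeat each row once per level, then build the matching offsets 0, 1, 2, 0, 1, 2, ...
expanded <- design2[rep(seq_len(nrow(design2)), each = n_levels), ]
offset <- rep(0:(n_levels - 1L), times = nrow(design2))

# Shift to 0-based, add the offset, wrap with %%, shift back to 1-based
expanded[] <- lapply(expanded, function(col) (col - 1L + offset) %% n_levels + 1L)
expanded
```

Changing n_levels would adapt the same line to attributes with a different number of levels, as long as all columns share that number.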

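We can use apply row-wise and, for every value in the row, append the missing levels using setdiff. Note that setdiff lists the missing levels in ascending order, so a 2 is followed by 1 and 3 rather than by 3 and 1 as in the +1/+2 scheme of the question:
out_df <- do.call(rbind, apply(design2, 1, function(x)
  data.frame(sapply(x, function(y) c(y, setdiff(1:3, y))))))
rownames(out_df) <- NULL
out_df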
# price color privacy battery stars
#1 2 3 2 1 1
#2 1 1 1 2 2
#3 3 2 3 3 3
#4 3 1 3 1 1
#5 1 2 1 2 2
#6 2 3 2 3 3
#7 1 3 1 2 1
#8 2 1 2 1 2
#9 3 2 3 3 3
#.....
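To see where a cyclic +1/+2 shift and the setdiff construction differ, the two can be compared for a single value. This is a small sketch assuming three levels coded 1 to 3; cyclic() and via_setdiff() are hypothetical helper names, not part of either answer:

```r
# Cyclic successor sequence: y, y + 1, y + 2 with wrap-around after n
cyclic <- function(y, n = 3L) (y - 1L + 0:(n - 1L)) %% n + 1L

# The value followed by the remaining levels in ascending order
via_setdiff <- function(y, n = 3L) c(y, setdiff(seq_len(n), y))

cyclic(2L)       # 2 3 1
via_setdiff(2L)  # 2 1 3
```

For 1 and 3 both helpers return the same sequence; only a starting value of 2 is ordered differently.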
data
design2 <- structure(list(price = c(2L, 3L, 1L, 3L, 1L, 1L, 2L, 3L, 3L,
1L, 3L, 2L, 1L), color = c(3L, 1L, 3L, 2L, 1L, 1L, 2L, 3L, 3L,
2L, 1L, 1L, 3L), privacy = c(2L, 3L, 1L, 1L, 2L, 1L, 3L, 2L,
1L, 2L, 2L, 1L, 3L), battery = c(1L, 1L, 2L, 3L, 3L, 1L, 2L,
3L, 1L, 1L, 2L, 3L, 3L), stars = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L)), class = "data.frame", row.names = c("17",
"21", "34", "60", "64", "82", "131", "153", "171", "175", "201", "218", "241"))

Related

Assigning 1 or 0 in new column based on similar ID BUT sum not to exceed value in another column in R

See the table below. I want to assign 1 or 0 to a new column, new_col, such that the number of 1s within each hhid does not exceed the value in the nets column for that hhid. The table shows the desired result (assume new_col does not exist yet):
hhid nets new_col
1 1 3 1
1 1 3 1
1 1 3 1
1 1 3 0
1 2 2 1
1 2 2 1
1 2 2 0
1 3 2 1
1 3 2 1
1 3 2 0
1 3 2 0
I tried the code below:
df %>% group_by(hhid) %>% mutate(new_col = ifelse(summarise(across(new_col), sum)<= df$nets),1,0)
Try this:
Data:
df <- structure(list(hhid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
3L), nets = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-11L))
hhid nets
1 1 3
2 1 3
3 1 3
4 1 3
5 2 2
6 2 2
7 2 2
8 3 2
9 3 2
10 3 2
11 3 2
Code:
library(dplyr)
df %>%
  group_by(hhid) %>%
  # flag the first `nets` rows of each hhid with 1, the rest with 0
  mutate(new_col = ifelse(row_number() <= nets, 1, 0))
Output:
# A tibble: 11 x 3
# Groups: hhid [3]
hhid nets new_col
<int> <int> <dbl>
1 1 3 1
2 1 3 1
3 1 3 1
4 1 3 0
5 2 2 1
6 2 2 1
7 2 2 0
8 3 2 1
9 3 2 1
10 3 2 0
11 3 2 0
Same solution but using data.table instead of dplyr
dt <- structure(list(hhid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
3L), nets = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), row.names = c(NA,
-11L), class = c("data.frame"))
library(data.table)
setDT(dt)
dt[, new_col := +(seq_len(.N) <= nets), by = hhid]
dt
hhid nets new_col
1: 1 3 1
2: 1 3 1
3: 1 3 1
4: 1 3 0
5: 2 2 1
6: 2 2 1
7: 2 2 0
8: 3 2 1
9: 3 2 1
10: 3 2 0
11: 3 2 0
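The same rule can be expressed in base R without dplyr or data.table. This is a sketch assuming, as in the answers above, that the first nets rows of each hhid receive the 1s; pos is a name introduced here, and ave() with seq_along numbers the rows within each group:

```r
df <- data.frame(hhid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
                 nets = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L))

# Position of each row within its hhid group: 1, 2, 3, ...
pos <- ave(df$nets, df$hhid, FUN = seq_along)

# 1 while the position is within the allowance given by nets, 0 afterwards
df$new_col <- as.integer(pos <= df$nets)
df
```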

R offset all the values in certain columns by one unit [duplicate]

This is my dataset below. I am trying to offset all the values in Col1, Col2, Col3 and Col4 by 1, that is, subtract 1 from each of them.
ID Col1 Col2 Col3 Col4
314 3 4 4 3
820 1 3 1 2
223 1 1 3 1
915 1 3 4 4
542 1 2 3 4
521 4 1 3 4
978 4 2 4 2
260 3 3 1 2
892 1 4 1 2
The final dataset should look like this:
ID Col1 Col2 Col3 Col4
314 2 3 3 2
820 0 2 0 1
223 0 0 2 0
915 0 2 3 3
542 0 1 2 3
521 3 0 2 3
978 3 1 3 1
260 2 2 0 1
892 0 3 0 1
I know a few ways to do this, but I am worried they may not give me accurate results. Any suggestions are much appreciated. Thanks in advance.
Arithmetic operations are vectorized. We can directly subtract 1 from the numeric columns (everything except the first column, ID) and assign the result back:
df1[-1] <- df1[-1] - 1
Output:
> df1
ID Col1 Col2 Col3 Col4
1 314 2 3 3 2
2 820 0 2 0 1
3 223 0 0 2 0
4 915 0 2 3 3
5 542 0 1 2 3
6 521 3 0 2 3
7 978 3 1 3 1
8 260 2 2 0 1
9 892 0 3 0 1
data
df1 <- structure(list(ID = c(314L, 820L, 223L, 915L, 542L, 521L, 978L,
260L, 892L), Col1 = c(3L, 1L, 1L, 1L, 1L, 4L, 4L, 3L, 1L), Col2 = c(4L,
3L, 1L, 3L, 2L, 1L, 2L, 3L, 4L), Col3 = c(4L, 1L, 3L, 4L, 3L,
3L, 4L, 1L, 1L), Col4 = c(3L, 2L, 1L, 4L, 4L, 4L, 2L, 2L, 2L)),
class = "data.frame", row.names = c(NA,
-9L))
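If ID is not guaranteed to be the first column, the columns can be selected by name instead of by position. This is a sketch using the first three rows of the data above; cols is a name introduced here for illustration:

```r
df1 <- data.frame(ID = c(314L, 820L, 223L),
                  Col1 = c(3L, 1L, 1L), Col2 = c(4L, 3L, 1L),
                  Col3 = c(4L, 1L, 3L), Col4 = c(3L, 2L, 1L))

# Name the columns to shift explicitly so ID is never touched
cols <- c("Col1", "Col2", "Col3", "Col4")
df1[cols] <- df1[cols] - 1L
df1
```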

How to extract specific value from a vector?

There is a vector and a data frame. The vector holds the raw survey data: in each trial the respondent had to choose one alternative among three, so each value is between 1 and 3. The data frame organizes the alternatives shown in each trial. What I need is to pull the result for each Trial from the vector and record it in the data frame in a new column 'chosen'.
For each trial, the alternative chosen by the respondent should have 1 in the 'chosen' column and the others 0. I first need to find the chosen value by matching d.f$Trial with the digit that comes right after "conjoint_full_info" in the names of the vector. After finding the value, I need to put 1 in the 'chosen' column on the row of the corresponding alternative. (Looking at the vector, in trial 1 the respondent chose alternative 1, so the row with alternative = 1 gets 1 and the remaining two rows get 0.) I am looking for a way to apply this to every trial below, but I am not sure how to code it efficiently. Maybe using for loops? Sorry for the unclear explanation and thanks in advance!
This is what the two datasets look like:
Vector
conjoint_full_info.1. conjoint_full_info.2. conjoint_full_info.3. conjoint_full_info.4.
1 2 2 2
d.f
Ind Trial alternative price privacy battery stars
1 R_2Xb32PAT3WjGBnc 1 1 2 3 1 1
2 R_2Xb32PAT3WjGBnc 1 2 3 1 2 2
3 R_2Xb32PAT3WjGBnc 1 3 1 2 3 3
4 R_2Xb32PAT3WjGBnc 2 1 1 2 2 1
5 R_2Xb32PAT3WjGBnc 2 2 2 3 3 2
6 R_2Xb32PAT3WjGBnc 2 3 3 1 1 3
7 R_2Xb32PAT3WjGBnc 3 1 3 1 3 1
8 R_2Xb32PAT3WjGBnc 3 2 1 2 1 2
9 R_2Xb32PAT3WjGBnc 3 3 2 3 2 3
10 R_2Xb32PAT3WjGBnc 4 1 1 1 1 2
11 R_2Xb32PAT3WjGBnc 4 2 2 2 2 3
12 R_2Xb32PAT3WjGBnc 4 3 3 3 3 1
and this is what I want
d.f
Ind Trial alternative price privacy battery stars chosen
1 R_2Xb32PAT3WjGBnc 1 1 2 3 1 1 1
2 R_2Xb32PAT3WjGBnc 1 2 3 1 2 2 0
3 R_2Xb32PAT3WjGBnc 1 3 1 2 3 3 0
4 R_2Xb32PAT3WjGBnc 2 1 1 2 2 1 0
5 R_2Xb32PAT3WjGBnc 2 2 2 3 3 2 1
6 R_2Xb32PAT3WjGBnc 2 3 3 1 1 3 0
7 R_2Xb32PAT3WjGBnc 3 1 3 1 3 1 0
8 R_2Xb32PAT3WjGBnc 3 2 1 2 1 2 1
9 R_2Xb32PAT3WjGBnc 3 3 2 3 2 3 0
10 R_2Xb32PAT3WjGBnc 4 1 1 1 1 2 0
11 R_2Xb32PAT3WjGBnc 4 2 2 2 2 3 1
12 R_2Xb32PAT3WjGBnc 4 3 3 3 3 1 0
your data:
Vector <- c(conjoint_full_info.1. = 1, conjoint_full_info.2. = 2, conjoint_full_info.3. = 2,
conjoint_full_info.4. = 2)
d.f <- structure(list(Ind = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "R_2Xb32PAT3WjGBnc", class = "factor"),
Trial = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
alternative = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L), price = c(2L, 3L, 1L, 1L, 2L, 3L, 3L, 1L, 2L, 1L, 2L,
3L), privacy = c(3L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 3L, 1L,
2L, 3L), battery = c(1L, 2L, 3L, 2L, 3L, 1L, 3L, 1L, 2L,
1L, 2L, 3L), stars = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
2L, 3L, 1L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
We create a mapping vector:
mapVector = Vector
names(mapVector) = sapply(strsplit(names(Vector),"[.]"),"[[",2)
# now mapVector has names that match trial
mapVector
1 2 3 4
1 2 2 2
If we do mapVector[as.character(d.f$Trial)], we get the chosen alternative for each row:
head(cbind(d.f,mapVector[as.character(d.f$Trial)]))
Ind Trial alternative price privacy battery stars
1 R_2Xb32PAT3WjGBnc 1 1 2 3 1 1
2 R_2Xb32PAT3WjGBnc 1 2 3 1 2 2
3 R_2Xb32PAT3WjGBnc 1 3 1 2 3 3
4 R_2Xb32PAT3WjGBnc 2 1 1 2 2 1
5 R_2Xb32PAT3WjGBnc 2 2 2 3 3 2
6 R_2Xb32PAT3WjGBnc 2 3 3 1 1 3
mapVector[as.character(d.f$Trial)]
1 1
2 1
3 1
4 2
5 2
6 2
So it's a matter of creating another column that checks whether this agrees with the alternative column:
library(dplyr)
d.f %>%
mutate(chosen=as.numeric(alternative == mapVector[as.character(Trial)]))
Ind Trial alternative price privacy battery stars chosen
1 R_2Xb32PAT3WjGBnc 1 1 2 3 1 1 1
2 R_2Xb32PAT3WjGBnc 1 2 3 1 2 2 0
3 R_2Xb32PAT3WjGBnc 1 3 1 2 3 3 0
4 R_2Xb32PAT3WjGBnc 2 1 1 2 2 1 0
5 R_2Xb32PAT3WjGBnc 2 2 2 3 3 2 1
6 R_2Xb32PAT3WjGBnc 2 3 3 1 1 3 0
7 R_2Xb32PAT3WjGBnc 3 1 3 1 3 1 0
8 R_2Xb32PAT3WjGBnc 3 2 1 2 1 2 1
9 R_2Xb32PAT3WjGBnc 3 3 2 3 2 3 0
10 R_2Xb32PAT3WjGBnc 4 1 1 1 1 2 0
11 R_2Xb32PAT3WjGBnc 4 2 2 2 2 3 1
12 R_2Xb32PAT3WjGBnc 4 3 3 3 3 1 0
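The lookup can also be done in base R with match(). This is a sketch on a shortened version of the data (trials 1 and 4 only, without the attribute columns); trial_id is a name introduced here, and the gsub() pattern assumes that the only digits in the names of the vector are the trial number:

```r
Vector <- c(conjoint_full_info.1. = 1, conjoint_full_info.2. = 2,
            conjoint_full_info.3. = 2, conjoint_full_info.4. = 2)
d.f <- data.frame(Trial = c(1L, 1L, 1L, 4L, 4L, 4L),
                  alternative = c(1L, 2L, 3L, 1L, 2L, 3L))

# Strip every non-digit from the names to recover the trial number
trial_id <- as.integer(gsub("[^0-9]", "", names(Vector)))

# For each row, look up the alternative chosen in its trial and compare
d.f$chosen <- as.integer(d.f$alternative == Vector[match(d.f$Trial, trial_id)])
d.f
```

Because match() works on the trial numbers themselves, the rows of d.f do not need to be sorted by Trial.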

selecting the last row of a group [duplicate]

I have a data frame.
household person trip loop
1 1 1 1
1 1 2 1
1 1 3 1
1 1 4 2
1 1 5 2
1 2 1 1
1 2 2 1
1 2 3 2
2 1 1 1
2 1 2 1
2 1 3 2
2 1 4 2
For each person in each household, I want to renumber the trip column as shown below:
whenever loop changes, the trip index should start from 1 again.
Output:
household person trip loop
1 1 1 1
1 1 2 1
1 1 3 1
1 1 1 2
1 1 2 2
1 2 1 1
1 2 2 1
1 2 1 2
2 1 1 1
2 1 2 1
2 1 1 2
2 1 2 2
We can use
library(dplyr)
df1 %>%
group_by(household, person, loop) %>%
mutate(trip = row_number())
# A tibble: 12 x 4
# Groups: household, person, loop [6]
# household person trip loop
# <int> <int> <int> <int>
# 1 1 1 1 1
# 2 1 1 2 1
# 3 1 1 3 1
# 4 1 1 1 2
# 5 1 1 2 2
# 6 1 2 1 1
# 7 1 2 2 1
# 8 1 2 1 2
# 9 2 1 1 1
#10 2 1 2 1
#11 2 1 1 2
#12 2 1 2 2
data
df1 <- structure(list(household = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), person = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L), trip = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 1L, 2L,
3L, 4L), loop = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L,
2L)), class = "data.frame", row.names = c(NA, -12L))
Using data.table:
library(data.table)
setDT(df1) # convert the data frame to a data.table by reference
df1[, trip := seq_len(.N), by = .(household, person, loop)]
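A base R equivalent may be useful where neither dplyr nor data.table is available. This is a sketch using ave(), which applies seq_along within every household/person/loop combination:

```r
df1 <- data.frame(
  household = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  person    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
  trip      = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 1L, 2L, 3L, 4L),
  loop      = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L)
)

# Restart the counter at 1 inside each household/person/loop group
df1$trip <- ave(df1$trip, df1$household, df1$person, df1$loop, FUN = seq_along)
df1
```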

Creating a new columns from a data.frame

I have a dataset in long format in which measurements (Time) are nested in network partners (NP), which are nested in persons (ID). Here is an example of what it looks like (the real dataset has thousands of rows):
ID NP Time Outcome
1 11 1 4
1 11 2 3
1 11 3 NA
1 12 1 2
1 12 2 3
1 12 3 3
2 21 1 2
2 21 2 NA
2 21 3 NA
2 22 1 4
2 22 2 4
2 22 3 4
Now I would like to create 3 new variables:
a) The number of network partners (with no NA in the outcome at this measurement) a specific person (ID) has at Time 1
b) The number of network partners (with no NA in the outcome at this measurement) a specific person (ID) has at Time 2
c) The number of network partners (with no NA in the outcome at this measurement) a specific person (ID) has at Time 3
So I would like to create a dataset like this:
ID NP Time Outcome NP.T1 NP.T2 NP.T3
1 11 1 4 2 2 1
1 11 2 3 2 2 1
1 11 3 NA 2 2 1
1 12 1 2 2 2 1
1 12 2 3 2 2 1
1 12 3 3 2 2 1
2 21 1 2 2 1 1
2 21 2 NA 2 1 1
2 21 3 NA 2 1 1
2 22 1 4 2 1 1
2 22 2 4 2 1 1
2 22 3 4 2 1 1
I would really appreciate your help.
You can create just one variable rather than three. I am using ddply from the plyr package for that.
mydata<-structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L), NP = c(11L, 11L, 11L, 12L, 12L, 12L, 21L, 21L, 21L,
22L, 22L, 22L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), Outcome = c(4L, 3L, NA, 2L, 3L, 3L, 2L, NA, NA,
4L, 4L, 4L)), .Names = c("ID", "NP", "Time", "Outcome"), class = "data.frame", row.names = c(NA,
-12L))
library(plyr)
# count the non-missing outcomes within each ID and Time
mydata1<-ddply(mydata,.(ID,Time),transform, NP.T=sum(!is.na(Outcome)))
> mydata1
ID NP Time Outcome NP.T
1 1 11 1 4 2
2 1 12 1 2 2
3 1 11 2 3 2
4 1 12 2 3 2
5 1 11 3 NA 1
6 1 12 3 3 1
7 2 21 1 2 2
8 2 22 1 4 2
9 2 21 2 NA 1
10 2 22 2 4 1
11 2 21 3 NA 1
12 2 22 3 4 1
Updated: You can also use interaction to create a unique variable (comb) that combines ID and Time:
mydata1<-ddply(mydata,.(ID,Time),transform, NP.T=sum(!is.na(Outcome)),comb=interaction(ID,Time))
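If the three separate columns from the question are needed, they can be built with a short loop in base R. This is a sketch assuming the column names NP.T1, NP.T2 and NP.T3 requested in the question; valid is a helper name introduced here:

```r
mydata <- data.frame(
  ID      = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  NP      = c(11L, 11L, 11L, 12L, 12L, 12L, 21L, 21L, 21L, 22L, 22L, 22L),
  Time    = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  Outcome = c(4L, 3L, NA, 2L, 3L, 3L, 2L, NA, NA, 4L, 4L, 4L)
)

for (t in sort(unique(mydata$Time))) {
  # 1 for rows measured at time t with a non-missing outcome, 0 otherwise
  valid <- as.integer(mydata$Time == t & !is.na(mydata$Outcome))
  # Sum within each person and repeat the count on all of that person's rows
  mydata[[paste0("NP.T", t)]] <- ave(valid, mydata$ID, FUN = sum)
}
mydata
```

Unlike the ddply result, this keeps the original row order and puts the count for every time point on every row of the person.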
