I am doing some computation as part of a scientific research project, and I am stuck on a problem that has to do with data visualization.
I have a list of sublists of different lengths. Each sublist is a vector of numeric values of the main variable for a single situation. The problem is this:
is there a way to display it in a 3D plot in the following way:
Let's say the x-axis stands for one factor of the experiment, the y-axis stands for another factor, and the z-axis carries the numerical values of our numeric variable. I need to display the data as vertical lines (parallel to the z-axis). The number of those vertical lines equals the number of factor combinations (on the x-axis and y-axis). Here is the way it looked before, with a smaller number of values (when the lists were of the same size):
https://www.dropbox.com/s/wdcgihjcqzobsqs/sample0.jpeg
I would like to keep the same layout, only with a bigger number of points. Each of these sublists stands for one of those 6 combinations of factors.
Or maybe there is a different, better way to visualize this kind of data in 3D.
Here is the list of sublists I need to visualize (I do not know whether this is relevant here):
> temp
[[1]]
[1] 395 310 235 290 240 490 270 225 430 385 170 55 295 320 270 130 300 285 130 200 225 90 205
[24] 340
[[2]]
[1] 3 8
[[3]]
[1] 1 0 0 0 3 2 5 2 3 5 2 3
[[4]]
[1] 1 0 0 0 3 2 5 2 3 5 2 3
[[5]]
[1] 1 1 1 2 3 5 2 5 3 3 3 2 3 2 3
[[6]]
[1] 0 0 195 150 2 2 0 2 1 1 2 1 2 1 1 1 3 2 2 1 2 2 1
[24] 1 2 3 2 2 1 3 1 1
Any help/suggestions will be appreciated.
Here is an alternate visualization. Note that you don't have a 6D problem; it's really a 3D problem with two factor dimensions and one continuous one, giving 6 possible factor combinations. Note I had to make assumptions about which factor combination corresponds to which item in your list:
f1 <- 1:2; f2 <- 1:3  # levels for the two factors (2 x 3 = 6 combinations)
facs <- cbind(f1=rep(f1, length(f2)), f2=rep(f2, each=length(f1))) # create factor combos
lst <- list(c(395, 310, 235, 290, 240, 490, 270, 225, 430, 385, 170, 55, 295, 320, 270, 130, 300, 285, 130, 200, 225, 90, 205, 340 ), c(3, 8), c(1, 0, 0, 0, 3, 2, 5, 2, 3, 5, 2, 3), c(1, 0, 0, 0, 3, 2, 5, 2, 3, 5, 2, 3), c(1, 1, 1, 2, 3, 5, 2, 5, 3, 3, 3, 2, 3, 2, 3), c(0, 0, 195, 150, 2, 2, 0, 2, 1, 1, 2, 1, 2, 1, 1, 1, 3, 2, 2, 1, 2, 2, 1, 1, 2, 3, 2, 2, 1, 3, 1, 1))
library(data.table)
facs.dt <- as.data.table(facs)[,list(time=sort(lst[[.GRP]])), by=list(f1, f2)]
facs.dt[, id:=seq_along(time), by=list(f1, f2)]
library(ggplot2)
ggplot(facs.dt, aes(x=id, y=time)) +
geom_bar(stat="identity", position="dodge") +
scale_y_log10() + facet_grid(f1 ~ f2)
The resulting plot displays, for each of the 6 factor combinations, the log of all the time values. This makes it much easier to read the continuous variable than a 3D cube.
And an alternate view with free scales:
ggplot(facs.dt, aes(x=id, y=time)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~ f1 + f2, scales="free") +
theme(axis.text.x=element_blank(), axis.ticks.x=element_blank())
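If the original vertical-line 3D layout is still wanted, here is a minimal sketch with the scatterplot3d package, whose type = "h" draws drop lines from each point down to the x-y plane. The factor coding below is an assumption (I don't know which sublist belongs to which factor combination), and lst is the list defined above:

```r
# Sketch: vertical-line 3D layout via scatterplot3d (type = "h").
# f1/f2 are a hypothetical 2 x 3 factor coding, one pair per sublist.
library(scatterplot3d)
f1 <- rep(1:2, 3)                 # assumed level of factor 1 for each sublist
f2 <- rep(1:3, each = 2)          # assumed level of factor 2 for each sublist
x <- rep(f1, lengths(lst))        # repeat each factor value once per data point
y <- rep(f2, lengths(lst))
z <- unlist(lst)
scatterplot3d(x, y, z, type = "h", pch = 16,
              xlab = "Factor 1", ylab = "Factor 2", zlab = "Value")
```

With many overlapping values per combination this gets hard to read, which is why the faceted 2D view above is usually preferable.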
I have a data.frame (corresponding to a leaderboard) like this one:
structure(list(PJ = c(4, 4, 4, 4, 4, 4), V = c(4, 2, 2, 2, 1,
1), E = c(0, 0, 0, 0, 0, 0), D = c(0, 2, 2, 2, 3, 3), GF = c(182,
91, 92, 185, 126, 119), GC = c(84, 143, 144, 115, 141, 168),
Dif = c(98, -52, -52, 70, -15, -49), Pts = c(12, 6, 6, 6,
3, 3)), class = "data.frame", row.names = c("Player1", "Player2",
"Player3", "Player4", "Player5", "Player6"))
I would like to order the rows according to the number of points Pts. This can be done with df[order(df$Pts, decreasing=TRUE),]. The issue appears when there is a tie between several players; in that case, I want to order the tied rows according to Dif.
How can this be done?
The order function, which you are already using, can take multiple arguments, each used sequentially to break ties in the previous one; see ?order.
So you simply have to add Dif to your existing call:
df[order(df$Pts, df$Dif, decreasing=T),]
You can add further terms to break any remaining ties, e.g. Player2 and Player3 who have identical Pts and Dif.
If you want to specify the direction for each argument (increasing or decreasing), you can either pass a vector as the decreasing argument, as in @r.user.05apr's comment, or use my preferred lazy solution of adding - to any term that should be ordered in decreasing direction:
df[order(-df$Pts, df$Dif),]
(this will order by Pts decreasing and Dif increasing; it won't work if e.g. one of the ordering columns is character)
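For reference, the vector form of decreasing mentioned above needs method = "radix" in base R. This sketch (using df as defined in the question) reproduces the same Pts-descending, Dif-ascending ordering and, unlike the - trick, also works on character columns:

```r
# per-column sort directions: Pts descending, Dif ascending
# (a vector-valued `decreasing` requires method = "radix")
df[order(df$Pts, df$Dif, decreasing = c(TRUE, FALSE), method = "radix"), ]
```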
You can use the sqldf or dplyr library:
library(sqldf)
sqldf('select *
from "df"
order by "Pts" desc, "Dif" desc ')
Output
PJ V E D GF GC Dif Pts
1 4 4 0 0 182 84 98 12
2 4 2 0 2 185 115 70 6
3 4 2 0 2 91 143 -52 6
4 4 2 0 2 92 144 -52 6
5 4 1 0 3 126 141 -15 3
6 4 1 0 3 119 168 -49 3
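The dplyr route mentioned above is a one-liner with arrange; this sketch (assuming the df from the question) gives the same ordering as the SQL query:

```r
library(dplyr)
# sort by Pts, then break ties by Dif, both descending
df %>% arrange(desc(Pts), desc(Dif))
```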
I have a dataframe in wide format, and I want to subtract specific columns from different series of columns. Ideally I'd like the results to be in a new dataframe.
For example:
From this sample dataframe (dfOld), I would like columns A, B and C to each subtract D, and columns E, F and G to each subtract column H. In the real dataset, this keeps going and needs to be iterated.
image of dfOld as table
Sample Data:
dfOld <- data.frame(ID = c(1,2,3,4,5,6,7,8,9,10), A = c(2, 3, 4,5,4,6,7,1,9,12), B = c(3, 4, 5,2,4,5,1,7,0,8), C = c(5, 6, 7,2,4,1,5,4,6,13), D = c(68, 7, 8,2,1,5,7,9,78,7), E = c(2, 3, 42,5,4,6,7,1,9,12), F = c(37, 4, 5,2,48,5,1,7,60,8), G = c(5, 6, 7,2,4,1,5,4,6,13), H = c(35, 7, 8,2,1,5,7,9,78,7))
The results would ideally be in a new dataframe, with columns that have values and names for A-D, B-D, C-D, E-H, F-H, G-H, and look like this:
image of dfNew as table
In Excel, the formula would be "=B2-$E2" dragged down the rows and across 3 columns, and then repeated again for "=F2-$I2" etc., using the "$" sign to lock the column.
In R, I've only been able to do this manually, kind of like the answer previously posted for a similar question (Subtracting two columns to give a new column in R)
dfOld$`A-D` <- dfOld$A - dfOld$D
dfOld$`B-D` <- dfOld$B - dfOld$D
dfOld$`C-D` <- dfOld$C - dfOld$D
dfOld$`E-H` <- dfOld$E - dfOld$H
dfOld$`F-H` <- dfOld$F - dfOld$H
dfOld$`G-H` <- dfOld$G - dfOld$H
And then separated the new columns out into a new dataset.
However, this obviously isn't scalable for my much larger dataset, and I'd really like to learn another way to do this kind of operation that's so easy in Excel (although still not scalable for large datasets there either).
Part of the answer may already be here: Subtract a column in a dataframe from many columns in R
But this answer (and several other similar ones) changes the values in the same dataframe, and the columns keep the same names.
I haven't been able to adapt it so that the new values have new columns, with new names (and ideally in a new dataframe)
Another part of the answer may be here:
Iterative function to subtract columns from a specific column in a dataframe and have the values appear in a new column
These answers put the subtracted results in new columns with new names, but every column in this dataframe subtracts values of every other column (A,B,C,D,E,F,G,H each minus C). And I can't seem to adapt it so that it works over specific series of columns (A, B, C each minus D, then E, F, G each minus H, etc.)
Thanks in advance for your help.
Probably others have better ways - but here is one possibility.
Load two libraries and convert dfOld to a data.table:
library(data.table)
library(magrittr)
setDT(dfOld)
Get information about the columns and make it into a list:
lv = names(dfOld)[-1][seq(1,ncol(dfOld)-1)%%4>0]
lv = split(lv, ceiling(seq_along(lv)/3))
names(lv) = names(dfOld)[-1][seq(1,ncol(dfOld)-1)%%4==0]
lv looks like this:
> lv
$D
[1] "A" "B" "C"
$H
[1] "E" "F" "G"
This is a bit convoluted, but basically I'm taking each element of the lv list and reshaping columns from dfOld so I can do all subtractions at once. Then I retain only the variables I need, and bind the resulting list of data.tables into a single data.table using rbindlist:
res = rbindlist(lapply(names(lv), function(x) {
melt(dfOld,id=c("ID", x),measure.vars = lv[[x]]) %>%
.[,`:=`(nc=value-get(x),variable=paste0(variable,"-",x))] %>%
.[,.(ID,variable,nc)]
}))
The last step is simple - just dcast back:
dcast(res,ID~variable, value.var="nc")
Output
ID A-D B-D C-D E-H F-H G-H
1: 1 -66 -65 -63 -33 2 -30
2: 2 -4 -3 -1 -4 -3 -1
3: 3 -4 -3 -1 34 -3 -1
4: 4 3 0 0 3 0 0
5: 5 3 3 3 3 47 3
6: 6 1 0 -4 1 0 -4
7: 7 0 -6 -2 0 -6 -2
8: 8 -8 -2 -5 -8 -2 -5
9: 9 -69 -78 -72 -69 -18 -72
10: 10 5 1 6 5 1 6
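For comparison, here is a minimal base R sketch of the same operation, assuming dfOld from the question and the same pairing (each block of three columns followed by its reference column). Data frame subtraction is elementwise, so all six differences happen in one step:

```r
lhs <- c("A", "B", "C", "E", "F", "G")   # columns to subtract from
rhs <- rep(c("D", "H"), each = 3)        # matching reference column for each
dfNew <- data.frame(ID = dfOld$ID, dfOld[lhs] - dfOld[rhs],
                    check.names = FALSE)
names(dfNew)[-1] <- paste0(lhs, "-", rhs)
```

Extending to more blocks only requires growing lhs and rhs, which could itself be generated from the column names.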
First, I create a function to do the simple calculation, taking the dataframe and the column names as inputs. Then I use purrr's map2 to pass the function (replicated the number of times needed, which in this case is 6) along with the list of parameters for each column pair, and use invoke to apply each function to its parameters. This leaves a list of dataframes (as each output is an individual column with the ID). Then I use reduce to combine them back into one dataframe, and update the column names.
library(tidyverse)
subtract <- function(x, a, b){
x %>%
mutate(!! a := !!rlang::parse_expr(a) - !!rlang::parse_expr(b)) %>%
dplyr::select(ID, which(colnames(x)==a))
}
col_names <- c("ID", "A-D", "B-D", "C-D", "E-H", "F-H", "G-H")
map2(
flatten(list(rep(list(
subtract
), 6))),
list(
expression(a = "A", b = "D"),
expression(a = "B", b = "D"),
expression(a = "C", b = "D"),
expression(a = "E", b = "H"),
expression(a = "F", b = "H"),
expression(a = "G", b = "H")
),
~ invoke(.x, c(list(dfOld), as.list(.y)))
) %>%
reduce(left_join, by = "ID") %>%
set_names(col_names)
Output
ID A-D B-D C-D E-H F-H G-H
1 1 -66 -65 -63 -33 2 -30
2 2 -4 -3 -1 -4 -3 -1
3 3 -4 -3 -1 34 -3 -1
4 4 3 0 0 3 0 0
5 5 3 3 3 3 47 3
6 6 1 0 -4 1 0 -4
7 7 0 -6 -2 0 -6 -2
8 8 -8 -2 -5 -8 -2 -5
9 9 -69 -78 -72 -69 -18 -72
10 10 5 1 6 5 1 6
Data
dfOld <- structure(
list(
ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
A = c(2,
3, 4, 5, 4, 6, 7, 1, 9, 12),
B = c(3, 4, 5, 2, 4, 5, 1, 7, 0,
8),
C = c(5, 6, 7, 2, 4, 1, 5, 4, 6, 13),
D = c(68, 7, 8, 2,
1, 5, 7, 9, 78, 7),
E = c(2, 3, 42, 5, 4, 6, 7, 1, 9, 12),
F = c(37,
4, 5, 2, 48, 5, 1, 7, 60, 8),
G = c(5, 6, 7, 2, 4, 1, 5, 4, 6,
13),
H = c(35, 7, 8, 2, 1, 5, 7, 9, 78, 7)
),
class = "data.frame",
row.names = c(NA,-10L)
)
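The same pairing idea can be condensed considerably with map2_dfc. This is a sketch under the same assumption about which columns pair with which reference column:

```r
library(purrr)
lhs <- c("A", "B", "C", "E", "F", "G")   # columns to subtract from
rhs <- rep(c("D", "H"), each = 3)        # matching reference columns
# each iteration returns a named one-column data frame; map2_dfc binds them
dfNew <- map2_dfc(lhs, rhs,
                  ~ setNames(dfOld[.x] - dfOld[[.y]], paste0(.x, "-", .y)))
dfNew <- cbind(ID = dfOld$ID, dfNew)
```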
I have educational data in R that looks like this:
df <- data.frame(
"StudentID" = c(101, 102, 103, 104, 105, 106, 111, 112, 113, 114, 115, 116, 121, 122, 123, 124, 125, 126),
"FedEthn" = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3),
"HIST.11.LEV" = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 5, 3, 3),
"HIST.11.SCORE" = c(96, 95, 95, 97, 88, 99, 89, 96, 79, 83, 72, 95, 96, 93, 97, 98, 96, 87),
"HIST.12.LEV" = c(2, 2, 1, 2, 1, 1, 2, 3, 2, 2, 2, 2, 4, 3, 3, 3, 3, 3),
"SCI.9.LEV" = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
"SCI.9.SCORE" = c(91, 99, 82, 95, 65, 83, 96, 97, 99, 94, 95, 96, 89, 78, 96, 95, 97, 90),
"SCI.10.LEV" = c(1, 2, 1, 2, 1, 1, 3, 3, 2, 2, 2, 3, 3, 3, 4, 3, 4, 3)
)
## StudentID FedEthn HIST.11.LEV HIST.11.SCORE HIST.12.LEV SCI.9.LEV SCI.9.SCORE SCI.10.LEV
## 1 101 1 1 96 2 1 91 1
## 2 102 1 1 95 2 1 99 2
## 3 103 2 1 95 1 1 82 1
## 4 104 2 1 97 2 1 95 2
## 5 105 3 1 88 1 1 65 1
## 6 106 3 1 99 1 1 83 1
## 7 111 1 2 89 2 2 96 3
## 8 112 1 2 96 3 2 97 3
## 9 113 2 2 79 2 2 99 2
## 10 114 2 2 83 2 2 94 2
## 11 115 3 2 72 2 2 95 2
## 12 116 3 2 95 2 2 96 3
## 13 121 1 3 96 4 3 89 3
## 14 122 1 3 93 3 3 78 3
## 15 123 2 3 97 3 3 96 4
## 16 124 2 3 98 3 3 95 3
## 17 125 3 3 96 3 3 97 4
## 18 126 3 3 87 3 3 90 3
HIST.11.LEV stands for the student's academic level in their 11th grade history course. (5 = highest academic level, 1 = lowest academic level. For example, 5 might be an AP or IB course.) HIST.11.SCORE indicates the student's score in the course.
When a student scores 95 or higher in a course, they're eligible to move up to a higher academic level in the following year (such that HIST.12.LEV = 1 + HIST.11.LEV). However, only some of these eligible students actually move up, and the teacher must agree to it. What I'm analyzing is whether these move-up rates for eligible students differ by reported federal ethnicity.
Here's how I'm achieving this so far:
var.level <- 1
var.ethn <- 1
actual.move.ups <-
(df %>% filter(FedEthn==var.ethn,
HIST.11.LEV==var.level,
HIST.11.SCORE>94,
HIST.12.LEV==var.level+1) %>%
count) +
(df %>% filter(FedEthn==var.ethn,
SCI.9.LEV==var.level,
SCI.9.SCORE>94,
SCI.10.LEV==var.level+1) %>%
count)
eligible.move.ups <-
(df %>% filter(FedEthn==var.ethn,
HIST.11.LEV==var.level,
HIST.11.SCORE>94) %>%
count) +
(df %>% filter(FedEthn==var.ethn,
SCI.9.LEV==var.level,
SCI.9.SCORE>94) %>%
count)
This works, and I could iterate var.level from 1:5 and var.ethn from 1:7 and store the results in a data frame. But in my actual data, this approach would require 15 iterations of df %>% filter(...) %>% count (and I'd sum them all), because there are 15 opportunities to move up across 5 subjects (HIST, SCI, MATH, ENG, WL) and 4 grade levels (9, 10, 11, 12).
My question is whether there's a more compact way to filter and count all instances where COURSE.GRADE.LEV==i, COURSE.GRADE+1.LEV==i+1, and COURSE.GRADE.SCORE>94 without typing/hard-coding each course name (HIST, SCI, MATH, ENG, WL) and each grade level (9, 10, 11, 12). And, what's the best way to store the results in a data frame?
For my sample data above, here's the ideal output. The data frame doesn't need to have this exact structure, though.
## FedEthn L1.Actual L1.Eligible L2.Actual L2.Eligible L3.Actual L3.Eligible
## 1 1 3 3 3 3 1 1
## 2 2 2 3 0 1 1 3
## 3 3 0 1 1 3 1 2
Note: I've read this helpful answer, but for my variable names, the grade level (9, 10, 11, 12) doesn't have a consistent string location (e.g., SCI.9 vs. HIST.11). Also, in some instances, I need to count a single row multiple times, since a single student could move up in multiple classes. Maybe the solution is to reshape the data from wide to long before performing the count?
Using this great answer from @akrun, I was able to come up with a solution. I think I'm still making it unnecessarily complicated, though, and I hope to accept someone else's more compact answer.
course.names <- c("HIST.","SCI.")
grade.levels <- 9:11
tally.actual <- function(var.ethn, var.level){
total.tally.actual <- NULL
for(i in course.names){
course.tally.actual <- NULL
for(j in grade.levels){
new.tally.actual <- df %>% filter(
FedEthn == var.ethn,
!!(rlang::sym(paste0(i,j,".LEV"))) == var.level,
!!(rlang::sym(paste0(i,(j+1),".LEV"))) == (var.level+1),
!!(rlang::sym(paste0(i,j,".SCORE"))) > 94
) %>% count
course.tally.actual <- c(new.tally.actual, course.tally.actual)
}
total.tally.actual <- c(total.tally.actual, course.tally.actual)
}
return(sum(unlist(total.tally.actual)))
}
tally.eligible <- function(var.ethn, var.level){
total.tally.eligible <- NULL
for(i in course.names){
course.tally.eligible <- NULL
for(j in grade.levels){
new.tally.eligible <- df %>% filter(
FedEthn == var.ethn,
!!(rlang::sym(paste0(i,j,".LEV"))) == var.level,
!!(rlang::sym(paste0(i,j,".SCORE"))) > 94
) %>% count
course.tally.eligible <- c(new.tally.eligible, course.tally.eligible)
}
total.tally.eligible <- c(total.tally.eligible, course.tally.eligible)
}
return(sum(unlist(total.tally.eligible)))
}
results <- data.frame("FedEthn" = 1:3,
"L1.Actual" = NA, "L1.Eligible" = NA,
"L2.Actual" = NA, "L2.Eligible" = NA,
"L3.Actual" = NA, "L3.Eligible" = NA)
for(var.ethn in 1:3){
for(var.level in 1:3){
results[var.ethn,(var.level*2)] <- tally.actual(var.ethn,var.level)
results[var.ethn,(var.level*2+1)] <- tally.eligible(var.ethn,var.level)
}
}
This approach works, but it requires df to contain every combination of course (SCI, MATH, HIST, ENG, WL) and year (9, 10, 11, 12). See below for how I added to the original df. Including all possible combinations isn't a problem for my actual data, but I'm hoping there's a solution that doesn't require adding a bunch of columns filled with NA:
df$HIST.9.LEV = NA
df$HIST.9.SCORE = NA
df$HIST.10.LEV = NA
df$HIST.10.SCORE = NA
df$HIST.12.SCORE = NA
df$SCI.10.SCORE = NA
df$SCI.11.LEV = NA
df$SCI.11.SCORE = NA
df$SCI.12.LEV = NA
df$SCI.12.SCORE = NA
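For what it's worth, the wide-to-long reshape hinted at in the question can avoid the NA padding entirely. This is a sketch assuming tidyr and dplyr; the names_pattern regex splits column names like HIST.11.LEV into subject, grade, and measure, and missing subject/grade combinations simply never produce rows:

```r
library(dplyr)
library(tidyr)

# one row per student x subject x grade, with LEV and SCORE columns
long <- df %>%
  pivot_longer(-c(StudentID, FedEthn),
               names_to = c("subject", "grade", ".value"),
               names_pattern = "([A-Z]+)\\.(\\d+)\\.(LEV|SCORE)") %>%
  mutate(grade = as.integer(grade))

# attach next year's level to each (student, subject, grade) row
nxt <- long %>% transmute(StudentID, subject, grade = grade - 1, next.LEV = LEV)

long %>%
  inner_join(nxt, by = c("StudentID", "subject", "grade")) %>%
  filter(SCORE > 94) %>%                   # eligible rows only
  group_by(FedEthn, LEV) %>%
  summarise(Eligible = n(),
            Actual = sum(next.LEV == LEV + 1), .groups = "drop")
```

Because the counting happens per row of the long table, a student who is eligible in several courses is counted once per course, as required.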
I am reproducing some Stata code on R and I would like to perform a multinomial logistic regression with the mlogit function, from the package of the same name (I know that there is a multinom function in nnet but I don't want to use this one).
My problem is that, to use mlogit, I need my data to be formatted using mlogit.data and I can't figure out how to format it properly. Comparing my data to the data used in the examples in the documentation and in this question, I realize that it is not in the same form.
Indeed, the data I use is like:
df <- data.frame(ID = seq(1, 10),
type = c(2, 3, 4, 2, 1, 1, 4, 1, 3, 2),
age = c(28, 31, 12, 1, 49, 80, 36, 53, 22, 10),
dum1 = c(1, 0, 0, 0, 0, 1, 0, 1, 1, 0),
dum2 = c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0))
ID type age dum1 dum2
1 1 2 28 1 1
2 2 3 31 0 0
3 3 4 12 0 1
4 4 2 1 0 1
5 5 1 49 0 0
6 6 1 80 1 0
7 7 4 36 0 1
8 8 1 53 1 0
9 9 3 22 1 1
10 10 2 10 0 0
whereas the data they use is like:
key altkey A B C D
1 201005131 1 2.6 118.17 117 0
2 201005131 2 1.4 117.11 115 0
3 201005131 3 1.1 117.38 122 1
4 201005131 4 24.6 NA 122 0
5 201005131 5 48.6 91.90 122 0
6 201005131 6 59.8 NA 122 0
7 201005132 1 20.2 118.23 113 0
8 201005132 2 2.5 123.67 120 1
9 201005132 3 7.4 116.30 120 0
10 201005132 4 2.8 118.86 120 0
11 201005132 5 6.9 124.72 120 0
12 201005132 6 2.5 123.81 120 0
As you can see, in their case, there is a column altkey that details every category for each key and there is also a column D showing which alternative is chosen by the person.
However, I only have one column (type) which shows the choice of the individual but does not show the other alternatives or the values of the other variables for each of these alternatives. When I try to apply mlogit, I get:
library(mlogit)
mlogit(type ~ age + dum1 + dum2, df)
Error in data.frame(lapply(index, function(x) x[drop = TRUE]), row.names = rownames(mydata)) :
row names supplied are of the wrong length
Therefore, how can I format my data so that it corresponds to the type of data mlogit requires?
Edit: following the advice of @edsandorf, I modified my dataframe and mlogit.data works, but now all the other explanatory variables have the same value for each alternative. Should I set these variables to 0 in the rows where the chosen alternative is 0 or FALSE? (In fact, can somebody show me the procedure from where I am to the results of the mlogit, because I don't see where I'm wrong in the estimation?)
The data I show here (df) is not my true data. However, it is exactly the same form: a column with the choice of the alternative (type), columns with dummies and age, etc.
Here's the procedure I've made so far (I did not set the alternatives to 0):
# create a dataframe with all alternatives for each ID
qqch <- data.frame(ID = rep(df$ID, each = 4),
choice = rep(1:4, 10))
# merge both dataframes
df2 <- dplyr::left_join(qqch, df, by = "ID")
# change the values in type to 1 or 0
for (i in 1:length(df2$ID)){
df2[i, "type"] <- ifelse(df2[i, "type"] == df2[i, "choice"], 1, 0)
}
# format for mlogit
df3 <- mlogit.data(df2, choice = "type", shape = "long", alt.var = "choice")
head(df3)
ID choice type age dum1 dum2
1.1 1 1 FALSE 28 1 1
1.2 1 2 TRUE 28 1 1
1.3 1 3 FALSE 28 1 1
1.4 1 4 FALSE 28 1 1
2.1 2 1 FALSE 31 0 0
2.2 2 2 FALSE 31 0 0
If I do :
mlogit(type ~ age + dum1 + dum2, df3)
I have the error:
Error in solve.default(H, g[!fixed]) : system is computationally singular: reciprocal condition number
Your data doesn't lend itself well to being estimated with an MNL model unless we make more assumptions. In general, since all your variables are individual specific and do not vary across alternatives (types), the model cannot be identified. All of your individual specific characteristics will drop out unless we treat them as alternative specific. By the sounds of it, each professional program carries meaning in and of itself. In that case, we could estimate the MNL model using constants only, where the constant captures everything about the program that makes an individual choose it.
library(mlogit)
df <- data.frame(ID = seq(1, 10),
type = c(2, 3, 4, 2, 1, 1, 4, 1, 3, 2),
age = c(28, 31, 12, 1, 49, 80, 36, 53, 22, 10),
dum1 = c(1, 0, 0, 0, 0, 1, 0, 1, 1, 0),
dum2 = c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0))
Now, just to be on the safe side, I create dummy variables for each of the programs. type_1 refers to program 1, type_2 to program 2 etc.
qqch <- data.frame(ID = rep(df$ID, each = 4),
choice = rep(1:4, 10))
# merge both dataframes
df2 <- dplyr::left_join(qqch, df, by = "ID")
# change the values in type to 1 or 0
for (i in 1:length(df2$ID)){
df2[i, "type"] <- ifelse(df2[i, "type"] == df2[i, "choice"], 1, 0)
}
# Add alternative specific variables (here only constants)
df2$type_1 <- ifelse(df2$choice == 1, 1, 0)
df2$type_2 <- ifelse(df2$choice == 2, 1, 0)
df2$type_3 <- ifelse(df2$choice == 3, 1, 0)
df2$type_4 <- ifelse(df2$choice == 4, 1, 0)
# format for mlogit
df3 <- mlogit.data(df2, choice = "type", shape = "long", alt.var = "choice")
head(df3)
Now we can run the model. I include the dummies for each of the alternatives keeping alternative 4 as my reference level. Only J-1 constants are identified, where J is the number of alternatives. In the second half of the formula (after the pipe operator), I make sure that I remove all alternative specific constants that the model would have created and I add your individual specific variables, treating them as alternative specific. Note that this only makes sense if your alternatives (programs) carry meaning and are not generic.
model <- mlogit(type ~ type_1 + type_2 + type_3 | -1 + age + dum1 + dum2,
reflevel = 4, data = df3)
summary(model)
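As a small aside, the row-by-row recoding loop used above can be vectorised; this sketch (assuming df2 as constructed above, before the loop has run) produces the same 1/0 column in one step:

```r
# vectorised equivalent of the for-loop: 1 if this row is the chosen
# alternative, 0 otherwise
df2$type <- as.integer(df2$type == df2$choice)
```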
I want to create several pie charts at once. I have a list of the names:
[1] 361 456 745 858 1294 1297 2360 2872 3034 5118 5189...
So the first pie chart should be labeled '361', and so on.
Then I have several lists with values for each pie chart
[1] 102 99 107 30 2 8 24 16 57 117 ...
[1] 1 1 2 1 0 0 0 1 1 2 ...
[1] 4 2 2 1 3 0 0 1 1 2 ...
So for '361', the first element is 102, the second is 1 and the third is 4. The total is 107.
I want to do all of the charts at once.
One way to get that is by setting par("mfrow"). I also adjusted the margins a bit to eliminate some unwanted whitespace around the charts.
par(mfrow=c(2,5), mar=rep(0, 4), oma=rep(0,4))
for(i in seq_len(nrow(df))) {
pie(df[i, ][df[i,] > 0], labels=(1:3)[df[i,] > 0])
title(names[i], line = -3) }
Data
## data
names = c(361, 456, 745, 858, 1294, 1297, 2360, 2872, 3034, 5118, 5189)
x = c(102, 99, 107, 30, 2, 8, 24, 16, 57, 117)
y = c(1, 1, 2, 1, 0, 0, 0, 1, 1, 2)
z = c(4, 2, 2, 1, 3, 0, 0, 1, 1, 2)
df = data.frame(x,y,z)