I have three regressions in one plot, and I am trying to display the equation of each one. I've been working off of this question to try and do this. However, the filtering doesn't seem to do anything, and the same equation is displayed 3 times.
The end goal is to compare cpue in relation to veg, while controlling for location (block), and get the slopes/r^2 values for each of the three regression lines.
Data
cpue<- structure(list(lake = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L), veg = c(254.8026498, 219.9422136, 450.9662078, 484.8605026,
407.1662151, 286.7015617, 351.6441798, 179.9959443, 340.4276843,
247.2907435, 502.4119071, 336.4259995, 349.1543197, 281.7493811,
201.8284859, 325.6380404, 288.3855723, 230.8755861, 214.8890894,
326.6376698, 214.7468224, 132.0511504, 335.2727641, 336.8727253,
143.8923225, 277.3053436, 302.7005649, 355.0332852, 307.5736711,
371.8407176, 168.7645221, 365.9156811, 349.205548, 273.8392697,
171.4513348, 197.1067049, 350.5833827, 202.9605797, 365.3415045,
413.2762633, 329.8539209, 377.1415341, 180.8524994, 217.4007852,
258.5909286, 146.7092479, 258.7440138, 393.2014549, 492.6719497,
208.5002392, 219.1466664, 182.1366352, 308.0534171, 317.6037795,
131.7534807, 324.0011761, 469.5861988, 237.4492916, 318.6897863,
47.94967582, 223.5382632, 386.2227607, 343.7657123, 493.6393726,
204.2960349, 294.4218332, 178.7555635, 454.0358039, 207.1363947,
364.6063223, 462.8508521, 292.8613255, 330.3893897, 209.1769838,
237.4264742, 427.8856667), cpue = c(32.63512612, 47.98168449,
33.26735173, 14.41435377, 30.94664495, 40.26817963, 41.26204388,
31.63227286, 36.97932408, 21.54620143, 34.27556883, 6.506644061,
32.24677471, 38.24536746, 30.95968644, 24.86408391, 31.15438304,
21.69779047, 39.86223079, 27.92263229, 23.55684281, 34.6157024,
42.06943746, 24.70597527, 28.36396188, 50.34591832, 55.06361184,
48.69468021, 26.00084784, 44.77320597, 14.56328001, 33.29291085,
21.55078237, 29.95980975, 40.61006429, 43.46931237, 26.26407484,
15.87009067, 39.47297313, 20.50811378, 35.66157343, 35.64563497,
44.47319537, 42.06574907, 40.16356125, 35.57462201, 32.10051291,
34.1254268, 34.21084448, 28.18410732, 32.11249307, 38.39890418,
31.24778375, 29.76951583, 41.52508487, 34.48914051, 28.30923803,
29.33886042, 37.57268795, 59.29849175, 28.9317113, 41.27342427,
38.44878019, 44.53768204, 44.48611219, 33.15553274, 34.48894561,
34.86722967, 31.92515626, 50.04825584, 53.67528105, 37.53150868,
33.16255301, 33.22374846, 28.28172263, 42.5795616), block = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1",
"2", "3"), class = "factor")), row.names = c(NA, -76L), class = "data.frame")
Code
# Make lm() with blocking variable----------
lm_eqn2 <- function(df2){
m2 <- lmer(cpue ~ veg + (1|block), cpue);
eq2 <- substitute(italic(CPUE) == a + b*","~~italic(r)^2~"="~r2, # Write CPUE = a+b, r^2 = x
list(a = format(unname(coef(m2)[1]), digits = 4), # define 'a'
b = format(unname(coef(m2)[2]), digits = 2), # define 'b'
r2 = format(summary(m)$r.squared, digits = 3))) # define 'r2'
as.character(as.expression(eq)); # declare expression as a character
}
ggplot(cpue, aes(x=veg, y=cpue, col=block))+
geom_point()+
geom_smooth(method="lm", show.legend=F, se=F)+
annotate("text", x=100, y=20, label= lm_eqn2(cpue %>% filter(block==1)), parse=T)+
annotate("text", x=200, y=30, label= lm_eqn2(cpue %>% filter(block==2)), parse=T)+
annotate("text", x=300, y=40, label= lm_eqn2(cpue %>% filter(block==3)), parse=T)
When I try to view the equation for each line with the following code:
lm_eqn2(cpue %>% filter(block==2))
it returns the same equation for each blocking number that I filter it by. This makes me think there's something wrong with the code that I made the model and the equation with? The only thing different (that I can tell) from the linked question is that my model has a blocking variable. Not sure if that would actually affect anything though.
Any help would be greatly appreciated.
You have a few problems here.
Firstly, it isn't good practice to use the same name for the dataframe and a vector within. It makes lines like lmer(cpue ~ veg + (1|block), cpue); and ggplot(cpue, aes(x=veg, y=cpue, col=block))+ confusing to many.
But also, because you use cpue for the dataframe inside your function, the function ignores whatever you pass to it later. m2 <- lmer(cpue ~ veg + (1|block), cpue); fits the same model every time, hence the same equation is produced. cpue %>% filter(block==2) is ignored as an argument because df2 is never used within your function. So you need something like this:
lm_eqn2 <- function(df2){
m2 <- lmer(cpue ~ veg + (1|block), df2); ## note the change to df2 here
eq2 <- substitute(italic(CPUE) == a + b*","~~italic(r)^2~"="~r2,
list(a = format(unname(coef(m2)[1]), digits = 4),
b = format(unname(coef(m2)[2]), digits = 2),
r2 = format(summary(m2)$r.squared, digits = 3)))
as.character(as.expression(eq2));
}
Note also that m and eq were not defined in your original code, so I changed them to m2 and eq2 respectively.
This gives the error:
Error: grouping factors must have > 1 sampled level
which makes sense, because you've fit block as a random intercept in your model code, yet you are filtering your data by the blocking factor. So there is only one "type" of blocking factor in each of the lines cpue %>% filter(block==1), cpue %>% filter(block==2), and cpue %>% filter(block==3). That means there is no information added to your regression when you use (1|block), since block is now a constant.
You might want to explain what you are hoping to do with this blocking factor. Some relevant posts: https://stats.stackexchange.com/q/4700/238878 and https://stats.stackexchange.com/q/31569/238878
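If the aim is only the slope and r^2 of each coloured line, one possible reading is to fit a separate ordinary lm() per block, which is also what geom_smooth(method="lm") does for each colour group. The sketch below assumes that reading; it uses made-up toy data in place of the question's cpue dataframe, and the names dat and fit_block are hypothetical. It does not control for block in a single model.

```r
# Toy data standing in for the question's dataframe (values are made up).
set.seed(1)
dat <- data.frame(
  block = factor(rep(1:3, each = 20)),
  veg   = runif(60, 100, 500)
)
dat$cpue <- 30 + 0.02 * dat$veg + rnorm(60, sd = 5)

# Hypothetical helper: fit one plain lm() to a single block's rows
# and return its intercept, slope and R^2.
fit_block <- function(d) {
  m <- lm(cpue ~ veg, data = d)
  c(intercept = unname(coef(m)[1]),
    slope     = unname(coef(m)[2]),
    r2        = summary(m)$r.squared)
}

# split() gives one dataframe per block; sapply() fits each in turn.
per_block <- t(sapply(split(dat, dat$block), fit_block))
per_block
```

Each row of per_block then holds the numbers for one regression line, which could be formatted into the annotate() labels.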
I am plotting a distribution of angles with the rose.diag function from the circular library. Input data are degrees.
circ <- circular(data2$mean_ang)
summary(circ)
rose.diag(circ, pch = 16, cex = 1, axes = TRUE, shrink = 1, col=3, prop = 2,
bins=36, upper=TRUE, ticks=TRUE, units="degrees")
I post a sample of data:
structure(list(sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L), .Label = c("F", "Fc", "M"), class = "factor"), area = structure(c(2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("AA", "AM", "AR"
), class = "factor"), mean_ang = c(37.3785, 54.1439666666667,
58.26328, 26.0818202247191, 16.500981981982, 58.7045, 64.6254,
88.7488, 68.0051315789474, 50.7701449275362, 71.9524307692308,
29.7501111111111, 21.7672323943662, 14.6700987654321, 15.4569794238683,
12.7011125, 13.0968235294118, 28.6825, 12.7437857142857, 16.0024827586207,
21.531, 6.09045454545455, 7.09880503144654, 33.8071123595506,
40.1566071428571, 34.6079540983607, 18.1940236686391, 26.8186338028169,
27.2129230769231, 75.331826446281, 92.2394705882353, 38.6603613445378,
80.2414871794872, 68.7810454545454, 57.3119345238095, 99.8082886597938,
50.8413857142857, 16.519125, 52.6062, 79.46416875, 55.1253798882682,
41.7809574468085, 65.9881707317073, 56.9886991869919, 66.7067129186603,
81.9102918918919, 52.7566941747573, 75.806781512605, 52.1151818181818,
54.1975875, 65.5264748201439, 47.1095353535354, 47.7130379746835,
89.4254302325581, 32.5949724770642, 62.4567419354839, 65.1905301204819,
53.4842941176471, 9.7815641025641, 10.3269556650246, 36.6245238095238,
35.7347155963303, 40.1823980582524, 29.6765, 30.5416129032258,
20.003, 40.4984444444444, 82.9355, 35.5801836734694, 8.4906,
82.3376666666667, 68.5343045977012, 69.924, 76.5723333333333,
97.1923333333333, 32.8840909090909, 50.603, 31.5014230769231,
42.2313333333333, 27.7946888888889, 53.2960545454545, 48.8556814814815,
40.6237714285714, 67.7999126984127, 66.855390625, 99.226275862069,
102.765611111111, 53.9172142857143, 66.6297692307692, 73.7972580645161
)), row.names = c(NA, -90L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = c("sex", "area"), drop = TRUE, indices = list(
0:4, 5:12, 13:27, 28:57, 58:65, 66:89), group_sizes = c(5L,
8L, 15L, 30L, 8L, 24L), biggest_group_size = 30L, labels = structure(list(
sex = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("F",
"Fc", "M"), class = "factor"), area = structure(c(2L, 3L,
2L, 3L, 2L, 3L), .Label = c("AA", "AM", "AR"), class = "factor")), row.names = c(NA,
-6L), class = "data.frame", vars = c("sex", "area"), drop = TRUE))
Apparently, the diagram does not match the results found.
I had previously converted all angles to positive values.
I would like to build a diagram (if possible with ggplot) that shows the frequency density of the angles from 0 to 180 degrees.
The calculated maximum angle was 102.77 degrees.
In addition, I would like the colors to differ for the different sexes analyzed and areas.
Something like this:
Although you can do a polar chart in ggplot, you can't easily do a 180-degree polar plot. It is possible only with a bit of hacking. In this example, I have made a full polar plot, shifted it down the page, removed the gridlines, drawn in new gridlines, and created the "slices" from a stacked polar histogram.
The code isn't pretty, but the end result is quite nice.
library(ggplot2)
ggplot() +
geom_line(aes(x = c(0, 180), y = c(4, 4)), colour = "gray75") +
geom_line(aes(x = c(0, 180), y = c(8, 8)), colour = "gray75") +
geom_line(aes(x = c(0, 180), y = c(12, 12)), colour = "gray75") +
geom_line(aes(x = c(0, 180), y = c(16, 16)), colour = "gray75") +
geom_line(aes(x = c(0, 180), y = c(20, 20)), colour = "gray75") +
geom_vline(aes(xintercept = 0:6 * 30), colour = "gray75") +
geom_histogram(data = data2, aes(x = mean_ang, fill = sex),
position = "stack", colour = "black", binwidth = 15,
boundary = 0) +
coord_polar(start = 3 * pi / 2) +
scale_x_continuous(limits = c(0, 360), breaks = 0:6 * 30) +
scale_y_continuous(limits = c(0, 20)) +
theme_bw() +
theme(panel.border = element_blank(),
legend.margin = margin(0, 5.5, 100, 5.5, unit = "pt"),
axis.title.y = element_text(hjust = 0.75),
axis.title.x = element_text(vjust = 5),
plot.margin = margin(50, 5.5, -100, 5.5, unit = "pt"),
panel.grid = element_blank()) +
labs(title = "Mean degrees by sex", y = "Count")
Created on 2020-05-12 by the reprex package (v0.3.0)
My colleague and I are trying to order a stacked bar graph based on the y-values instead of alphabetically by the x-values.
The sample data is:
library(ggplot2)
samp.data <- structure(list(fullname = c("LJ", "PR",
"JB", "AA", "NS",
"MJ", "FT", "DA", "DR",
"AB", "BA", "RJ", "BA2",
"AR", "GG", "RA", "DK",
"DA2", "BJ2", "BK", "HN",
"WA2", "AE2", "JJ2"), I = c(2L,
1L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L), S = c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 3L, 3L,
3L), D = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 2L, 3L, 3L), C = c(0L, 2L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
2L, 3L, 3L, 3L, 3L)), .Names = c("fullname", "I", "S", "D", "C"
), class = "data.frame", row.names = c(NA, 24L))
md <- reshape2::melt(samp.data, id = (c("fullname")))
ggplot(data = md, aes(x = fullname, y = value, fill = variable)) +
geom_col()
But I ultimately want to sort by the sum of the 4 variables (I, S, D, and C) instead of the alphabetical order of the fullnames.
The general (non ggplot-specific) answer is to use reorder() to reset the factor levels in a categorical column, based on some function of the other columns.
## Examine the default factor order
## (fullname is a character column in the sample data, so this returns NULL)
levels(samp.data$fullname)
## Reorder fullname based on the sum of the other columns
samp.data$fullname <- reorder(samp.data$fullname, rowSums(samp.data[-1]))
## Examine the new factor order
levels(samp.data$fullname)
attributes(samp.data$fullname)
Then just replot, using the code from the original question:
md <- reshape2::melt(samp.data, id=(c("fullname")))
temp.plot<-ggplot(data=md, aes(x=fullname, y=value, fill=variable) ) +
geom_col()+
theme(axis.text.x=element_text(angle=90)) +
labs(title = "Score Distribution")
## ggsave(temp.plot,filename="test.png")
A much simpler solution is to change the summary function that reorder applies (the default is mean) to sum:
ggplot(data = md, aes(x = reorder(fullname, value, sum), y = value, fill = variable)) +
geom_col()
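To see what reorder() does with FUN = sum, here is a minimal base R sketch on a made-up long-format dataframe (toy and ordered_name are hypothetical names, not part of the question's data):

```r
# Hypothetical long-format scores: three names, two values each.
toy <- data.frame(
  name  = c("A", "A", "B", "B", "C", "C"),
  value = c(3, 3, 1, 1, 2, 2)
)

# reorder() applies FUN to `value` within each `name`
# and sorts the factor levels by the result, smallest first.
ordered_name <- reorder(toy$name, toy$value, FUN = sum)
levels(ordered_name)   # "B" "C" "A"  (sums are 2, 4, 6)
```

Because ggplot2 draws a discrete axis in factor-level order, the bars then appear from the smallest total to the largest.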
I have a survey data set that I'm creating contingency tables for. Each column in the data frame is a question and generally speaking, the questions tend to group together. So to make life easy, I've been using lapply to loop through sections and return the contingency tables with the following code:
> out <- lapply(dat[,162:170], function(x) round(prop.table(table(x,dat$seg_2),2),3)*100)
> out
$r3a_1
x 1 2
Don't Know 1.9 1.4
No 14.2 4.9
Yes 83.9 93.7
$r3a_2
x 1 2
Don't Know 2.7 1.7
No 14.8 6.6
Yes 82.4 91.6
etc...
As you can see, I'm looping through columns 162:170 and creating a prop table that shows the different responses between groups 1 and 2.
However, I'd like to weight this data. So I'm using the survey package to create a simple weighted survey design object called dat_weight and using svytable() instead of table(). I can run the updated code on a single column manually:
> round(prop.table(svytable(~dat[,162] + dat$seg_2, dat_weight),2),3)*100
dat$seg_2
dat[, 162] 1 2
Don't Know 2.5 2.7
No 16.5 5.4
Yes 80.9 91.9
However, when I try to use lapply it doesn't work:
> out <- lapply(dat[,162:170], function(x) round(prop.table(svytable(~x + dat$seg_2, dat_weight),2),3)*100)
Error in eval(expr, envir, enclos) : object 'x' not found
Clearly the anonymous function call and svytable aren't playing nicely together. I've tried creating a for loop which doesn't work either. I'm guessing this has something to do with scoping but I'm at a loss as to how to fix it.
Surely there has to be a way to loop through chunks of this survey and avoid having to create a unique line of code for each column. Any help would be greatly appreciated.
Edit to add some sample data:
> library("survey")
> dat <- structure(list(r3a_1 = structure(c(3L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Don't Know",
"No", "Yes"), class = "factor"), r3a_2 = structure(c(3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L), .Label = c("Don't Know", "No", "Yes"), class = "factor"),
r3a_3 = structure(c(3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("Don't Know",
"No", "Yes"), class = "factor"), r3a_4 = structure(c(3L,
2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 2L, 3L, 2L, 2L,
3L, 3L, 3L, 1L), .Label = c("Don't Know", "No", "Yes"), class = "factor"),
r3a_5 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L,
2L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 1L), .Label = c("Don't Know",
"No", "Yes"), class = "factor"), r3a_6 = structure(c(3L,
3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 2L, 2L, 3L,
2L, 3L, 3L, 3L), .Label = c("Don't Know", "No", "Yes"), class = "factor"),
r3a_7 = structure(c(1L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 2L,
3L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("Don't Know",
"No", "Yes"), class = "factor"), r3a_8 = structure(c(3L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 2L,
2L, 2L, 3L, 3L), .Label = c("Don't Know", "No", "Yes"), class = "factor"),
r3a_9 = structure(c(1L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L), .Label = c("Don't Know",
"No", "Yes"), class = "factor"), weight = c(0.34, 0.34, 0.34,
0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.43,
0.43, 0.43, 0.34, 0.34, 0.34, 0.34, 0.34), seg_2 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), .Names = c("r3a_1",
"r3a_2", "r3a_3", "r3a_4", "r3a_5", "r3a_6", "r3a_7", "r3a_8",
"r3a_9", "weight", "seg_2"), row.names = c(NA, 20L), class = "data.frame")
> dat_weight <- svydesign(ids = ~1, weights = ~weight, data = dat)
From there you can get the weighted and unweighted tables:
round(prop.table(table(dat[,1],dat$seg_2),2),3)*100 #unweighted
round(prop.table(svytable(~dat[,1] + dat$seg_2, dat_weight),2),3)*100 #weighted
However, this works:
lapply(dat[,1:9], function(x) round(prop.table(table(x,dat$seg_2),2),3)*100)
While this doesn't:
lapply(dat[,1:9], function(x) round(prop.table(svytable(~x + dat$seg_2, dat_weight),2),3)*100)
Ok, well, it seems the svytable function is picky and will only look up data in the design object. It doesn't seem to look for x in the enclosing environment. So an alternative approach is to build the formula dynamically: instead of passing in the columns of data themselves, we pass in the names of the columns from the data.frame. We then plug those names into the formula, and they are resolved by the design object, which points to the original data.frame. Here's a bit of working code using your sample data:
lapply(names(dat)[1:9], function(x) round(prop.table(
svytable(bquote(~.(as.name(x)) + seg_2), dat_weight),
2),3)*100)
So here we use bquote to build the formula. The .() allows us to plug in expressions, and here we take the character value in x and convert it to a proper name object. Thus it goes from "r3a_9" to r3a_9.
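The substitution step can be inspected on its own, without the survey package. This small base R sketch shows what bquote produces for one column name; the reformulate() line is an alternative way to build a formula from strings that I am adding for comparison, and I have not checked it against svytable here.

```r
x <- "r3a_9"

# bquote() leaves the expression unevaluated, except for the part inside .()
f <- bquote(~ .(as.name(x)) + seg_2)
f             # ~r3a_9 + seg_2
all.vars(f)   # "r3a_9" "seg_2"

# Evaluating the call yields an ordinary formula object.
inherits(eval(f), "formula")   # TRUE

# Alternative (an addition, not from the answer): build it from strings.
g <- reformulate(c(x, "seg_2"))
g             # ~r3a_9 + seg_2
```

Either way, the formula refers to columns by name, so they can be looked up inside the design object rather than in the calling environment.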