I am using the survey package in R to compute some initial descriptive statistics for my dissertation. In the output of both svymean and barplot, I would like the value labels of the variable to be displayed, which would make the descriptive statistics much easier to interpret. Instead of the output labels F1RTRCC1 and F1RTRCC2, I would like to see the value labels "academic" and "occupational".
How do I go about making this happen?
I am including a minimal reproducible example with a small subset of my actual data:
#Calling Survey Package
library (survey)
#Using dput to display a subset of my actual data
MRE1 <- structure(list(STU_ID = c(101101L, 101102L, 101104L, 101105L, 101106L, 101107L),
PSU = c(1L, 1L, 1L, 1L, 1L, 1L), STRAT_ID = c(101L,
101L, 101L, 101L, 101L, 101L), BYSCTRL = c(1L, 1L, 1L, 1L, 1L,
1L), G10COHRT = c(1L, 1L, 1L, 1L, 1L, 1L), F1RTRCC = c(2L, 1L,
4L, 2L, 2L, 4L), F1SEX = c(2L, 2L, 2L, 2L, 2L, 1L), F1RACE = c(5L,
2L, 7L, 3L, 4L, 4L), F1PARED = c(5L, 5L, 2L, 2L, 1L, 2L), F1SES2QU = c(2L,
4L, 1L, 1L, 1L, 1L), F1HIMATH = c(5L, 6L, 6L, 4L, 5L, 4L), F1RGPP2 = c(2L,
4L, 4L, 4L, 4L, 1L), F3ATTAINMENT = c(3L, 10L, 6L, 4L, 4L, 3L
), F3EDSTAT = c(5L, 5L, 5L, 2L, 2L, 5L)), .Names = c("STU_ID",
"PSU", "STRAT_ID", "BYSCTRL", "G10COHRT", "F1RTRCC", "F1SEX",
"F1RACE", "F1PARED", "F1SES2QU", "F1HIMATH", "F1RGPP2", "F3ATTAINMENT",
"F3EDSTAT"), row.names = c(NA, 6L), class = "data.frame")
#Svymean of variable F1RTRCC
#This call raises an error: Error in UseMethod("svymean", design) :
#no applicable method for 'svymean' applied to an object of class "data.frame".
#I think this is likely related to the dput subset of my data: when I use
#this function with my full dataset, the error does not appear.
CC <- svymean(~F1RTRCC, MRE1, na.rm=T)
CC
#Barplot for F1RTRCC
barplot(CC)
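One possible approach, sketched here under stated assumptions, is to convert the coded variable to a factor with labels before building the design object, then strip the variable-name prefix from the estimate names. The toy data, the unit weights in `wt`, the equal-probability design, and the code-to-label mapping (including the label "other" for code 4) are all hypothetical; the real mapping should come from the codebook and the real design from the survey's PSU, strata, and weight variables.

```r
library(survey)

# Toy data: same F1RTRCC codes as in the example above, plus hypothetical unit weights
toy <- data.frame(F1RTRCC = c(2L, 1L, 4L, 2L, 2L, 4L), wt = rep(1, 6))

# ASSUMPTION: this code-to-label mapping is illustrative only; check the codebook
toy$F1RTRCC <- factor(toy$F1RTRCC,
                      levels = c(1, 2, 4),
                      labels = c("academic", "occupational", "other"))

# svymean() needs a survey design object, not a plain data frame.
# ASSUMPTION: a simple equal-probability design stands in for the real one.
des <- svydesign(ids = ~1, weights = ~wt, data = toy)

# For a factor, svymean() returns one proportion per level
CC <- svymean(~F1RTRCC, des, na.rm = TRUE)

# Estimate names are the variable name followed by the level; drop the prefix
est <- coef(CC)
names(est) <- sub("^F1RTRCC", "", names(est))

barplot(est, ylab = "Proportion")
```

Because the labels live in the factor levels, the same labels appear in the printed svymean output (with the variable name as a prefix) and, after the prefix is removed, under the bars.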
I am using a dataset that has Likert-scale responses. Sample observations from the dataset are included below.
I always get an error. Can someone help me with this?
Thanks
att<-structure(list(att1_goodofall = c(3L, 3L, 1L, 3L, 3L, 3L, 3L,
2L, 3L, 3L), att2_pvtdisease = c(3L, 3L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L), att3_curedisease = c(3L, 1L, 3L, 2L, 2L, 1L, 2L,
3L, 3L, 3L), att4_timewaste = c(4L, 4L, 2L, 3L, 4L, 4L, 4L, 4L,
4L, 4L), att5_helpgenerations = c(3L, 3L, 3L, 3L, 2L, 3L, 3L,
2L, 3L, 3L)), row.names = c(NA, 10L), class = "data.frame")
#labelling the values
for(i in att) {
val_lab(att)<-make_labels("0 Strongly disagree
1 Disagree
2 Neither agree or disagree
3 Agree
4 Strongly agree")
}
#plot_likert function
plot_likert(att)
Error in freq[valid] <- counts :
NAs are not allowed in subscripted assignments
I'm pretty sure that this is a bug, but I couldn't tell whether it is a bug in expss or in sjPlot.
However, as a workaround you could switch to e.g. sjlabelled::set_labels to label your dataset:
library(sjPlot)
library(expss)
att <- sjlabelled::set_labels(att, labels = make_labels("0 Strongly disagree
1 Disagree
2 Neither agree or disagree
3 Agree
4 Strongly agree"))
sjPlot::plot_likert(att, cat.neutral = 3)
After I applied the solution, I wanted to do the same for another set of variables. I applied the same solution but did not get the desired result.
know<-structure(list(know1_spendmoney = c(0L, 9L, 0L, 0L, 0L), know2_access = c(0L,
0L, 0L, 0L, 9L), know3_medicalrecords = c(0L, 0L, 1L, 1L, 1L),
know4_pharmacy = c(0L, 9L, 9L, 9L, 9L), know5_infoprivacy = c(1L,
1L, 1L, 1L, 1L), know6_sharing = c(0L, 1L, 1L, 9L, 9L), know7_police = c(0L,
9L, 9L, 0L, 1L), know8_infoselling = c(0L, 0L, 9L, 0L, 1L
), know9_insurancecompanies = c(0L, 1L, 1L, 1L, 9L), know10_riskofdiseaseinfo = c(1L,
1L, 9L, 9L, 9L)), row.names = c(NA, 5L), class = "data.frame")
know<-sjlabelled::set_labels(know, labels = make_labels("0 False
1 True
9 Don't know"))
plot_likert(know,
expand.grid = TRUE,
show.prc.sign = TRUE, coord.flip = TRUE, reverse.scale = T, values = "sum.outside")
Warning: number of items to replace is not a multiple of replacement length
Any idea how I can resolve this? I tried multiple options, but nothing seems to work.
I am using the cld() function in R with emmeans, but the order of the factor levels in the output differs from the order I set. Before calling cld(), the by.years output is in the desired order, but after calling cld(), the output is in the alphabetical order Light - Moderate - No. I also checked cld.years$Grazing.intensity, and the levels are correct. Is there a way to specify the order of factor levels in the cld() output? Any help is appreciated.
# sample data
plants <- structure(list(Grazing.intensity = structure(c(3L, 2L, 3L, 3L, 3L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 3L), .Label = c("Light-grazing", "Moderate-grazing", "No-grazing"), class = "factor"), Grazing.intensity1 = structure(c(3L, 2L, 3L, 3L, 3L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 3L), .Label = c("LG", "MG", "NG"), class = "factor"), Years = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L), .Label = c("Dry-year", "Wet-year"), class = "factor"), Month = structure(c(2L, 2L, 2L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 3L, 2L, 2L, 3L), .Label = c("Aug.", "Jul.", "Sept."), class = "factor"), Plots = c(1L, 3L, 8L, 6L, 9L, 7L, 2L, 2L, 10L, 10L, 7L, 7L, 9L, 4L, 2L), Species.richness = c(8L, 6L, 10L, 11L, 9L, 5L, 7L, 13L, 10L, 6L, 5L, 5L, 14L, 8L, 10L)), class = "data.frame", row.names = c(NA, -15L))
# set the order of factor levels
plants$Grazing.intensity <- factor(plants$Grazing.intensity, levels =
c('No-grazing','Light-grazing','Moderate-grazing'))
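# load the packages used below (attach() is not needed because data = plants is passed)
library(lme4)
library(emmeans)
library(multcomp)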
lmer.mod <- lmer(Species.richness ~ Grazing.intensity*Years + (1|Month), data = plants)
by.years <- emmeans(lmer.mod, specs = ~ Grazing.intensity:Years, by = 'Years', type = "response")
# display cld
cld.years <- cld(by.years, Letters = letters)
This is my first time posting sample data on Stack Overflow, so it may be wrong. I used dput().
I solved the issue. The order changed because cld() displays the levels in increasing order of emmean by default. I set sort = FALSE, and the result was displayed in the original factor-level order. I should have read the documentation more thoroughly.
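The effect of sort = FALSE can be illustrated with a small sketch. It uses the built-in warpbreaks data and a plain lm() fit instead of the mixed model above, purely to keep the example short; it assumes the emmeans, multcomp, and multcompView packages are installed.

```r
library(emmeans)
library(multcomp)  # provides the cld() generic; the emmeans method also needs multcompView installed

# Built-in data; tension has levels L, M, H in that order
mod <- lm(breaks ~ wool * tension, data = warpbreaks)
em <- emmeans(mod, ~ tension | wool)

# Default: rows within each wool group are sorted by increasing emmean
cld_sorted <- cld(em, Letters = letters)

# sort = FALSE keeps the rows in factor-level order
cld_unsorted <- cld(em, Letters = letters, sort = FALSE)

print(cld_sorted)
print(cld_unsorted)
```

In the unsorted result the rows follow the factor levels, so setting the level order with factor(..., levels = ...) before fitting controls the display order.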
I am working with ggstatsplot to get visual representations of my statistical analyses.
I have numerous datasets, all very similar in make-up. Some work just fine, while others don't. data1 is a working example, and data2 doesn't work.
data1 <- structure(list(
treatment = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L),
.Label = c("negative_ctrl", "positive_ctrl", "treatmentA", "treatmentB", "treatmentC", "treatmentD"), class = "factor"),
value = c(1.74501, 2.04001, 1.89501, 1.84001,
1.89501, 9.75001, 8.50001, 8.80001, 11.50001, 10.25001, 7.90001,
9.25001, 11.45001, 7.75001, 7.75001, 7.55001, 8.70001, 8.20001,
6.95001, 6.60001, 7.40001, 7.15001, 8.25001, 9.20001, 8.95001,
6.45001, 6.05001, 5.40001, 7.95001, 6.80001, 4.65001, 6.40001,
6.40001, 6.70001, 5.40001, 3.20001, 2.70001, 4.30001, 4.10001,
3.60001, 4.00001, 3.00001, 4.70001, 3.10001, 3.50001, 6.45001,
5.45001, 4.90001, 7.25001, 4.55001, 4.70001, 6.25001, 5.65001,
6.00001, 5.10001)),
row.names = c(NA, -55L), class = c("tbl_df", "tbl", "data.frame"))
data2 <- structure(list(
treatment = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L),
.Label = c("negative_ctrl", "positive_ctrl", "treatmentA", "treatmentB", "treatmentC", "treatmentD"), class = "factor"),
value = c(1.00001, 1.00001, 1.00001, 1.00001, 1.00001, 6.77501,
5.68751, 5.99201, 8.24501, 7.01251, 4.79501, 5.99126, 8.26276,
5.35376, 5.38751, 4.60251, 5.38901, 4.85201, 4.44401, 5.20501,
6.20701, 5.77001, 4.05201, 3.65126, 3.02401, 4.68351, 3.90001,
2.56951, 3.70001, 3.61901, 3.96401, 2.93601, 1.53901, 1.40801,
2.05601, 2.08501, 1.89701, 1.79501, 1.50001, 2.09151, 1.53551,
1.57501, 3.88851, 3.09151, 2.75501, 4.40626, 2.42001, 2.60951,
3.83501, 3.37151, 3.70001, 2.92701)),
row.names = c(NA, -52L), class = c("tbl_df", "tbl", "data.frame"))
I call the most basic analysis for both datasets:
library(Rmpfr)
library(ggstatsplot)
ggstatsplot::ggbetweenstats(
data = data1,
x = treatment,
y = value,
messages = FALSE )
ggstatsplot::ggbetweenstats(
data = data2,
x = treatment,
y = value,
messages = FALSE )
For data1 I get the plot (image not shown).
For data2 I get:
> Error in stats::optim(par = 1.1 * rep(lambda, 2), fn = function(x) { : non-finite value supplied by optim
At first I thought the issue might be a few zeros that I passed on in the negative control, but I first upped them by a tiny amount and then by 1 to make sure the range of the values is not an issue. The only discrepancy I can see is that I only have 7 instead of 10 measurements for treatmentA (level 3) in data2 but 10 in data1 (had to remove a few NAs due to sample failure). However, in both cases the negative control (level 1) only has 5 values, and I don't think that in this type of analysis there is an issue with different sample sizes between the groups.
It's a good idea to try basic plots in these cases, e.g. isolate the boxplots.
Comparing the two datasets:
boxplot(value ~ treatment, data=data1)
boxplot(value ~ treatment, data=data2)
data2 has a treatment with no variability ("negative_ctrl"): its SD is 0. I'm guessing this function runs some tests that require variation. You will need to read the function's documentation to see whether this is mentioned, but you can get the plots either by removing this treatment or by forcing a very small amount of variation, e.g.:
# run without negative_ctrl
ggstatsplot::ggbetweenstats(
data = data2[data2$treatment != "negative_ctrl",],
x = treatment,
y = value,
messages = FALSE )
# add some tiny fake variation to force it through (this is a hack)
data3 <- data2
data3$value[which(data3$treatment == "negative_ctrl")[1]] <- 1.0001
ggstatsplot::ggbetweenstats(
data = data3,
x = treatment,
y = value,
messages = FALSE )
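A quick numeric check can confirm what the boxplots suggest before any workaround is applied. The sketch below uses a small hypothetical data frame (`toy`) with the same structure as data2 and flags groups whose standard deviation is effectively zero; the tolerance is an arbitrary choice.

```r
# Hypothetical miniature of data2: one constant group, one varying group
toy <- data.frame(
  treatment = factor(c(rep("negative_ctrl", 3), rep("positive_ctrl", 3))),
  value = c(1.00001, 1.00001, 1.00001, 6.77501, 5.68751, 5.99201)
)

# Standard deviation per treatment group
group_sd <- tapply(toy$value, toy$treatment, sd)

# Flag groups with (numerically) zero spread; the tolerance guards against rounding
tol <- sqrt(.Machine$double.eps)
zero_var <- names(group_sd)[group_sd < tol]

print(group_sd)
print(zero_var)
```

Running the same tapply() call on data1 and data2 shows directly which dataset contains a constant group.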
I am trying to make a PCA plot using ggplot and geom_point.
I would like to illustrate 3 factors (Diet, Time, Antibiotics).
I thought I could outline the points in black for one factor.
However, this isn't showing the third factor (Time) as the fill color.
Here is a subset of my data:
> dput(dat.pcx.annot.test)
structure(list(PC1 = c(25.296379160162, 1.4703101394886, 11.4138097811008,
1.41798772574591, 23.7253675969881, 15.5683516005535, -34.6012195481675,
-25.7129281491955, -2.97230018393742, 4.83421092719293, -0.0274189140249825,
23.227939504077, 15.2002258785889, -35.2243685702227, -34.2537374460037,
-7.6380794043063), PC2 = c(27.2678813936857, -9.88577494210313,
-6.19394322321806, -8.88953660465497, 33.6791127012231, -13.2912233546802,
7.77877968081575, 2.7371646557436, -8.41929538502921, -11.5151849519265,
-9.40733576034963, 32.3549860618533, -11.2170071727855, 10.0455709347794,
3.05679707335492, -6.66218028060621), Diet = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L), .Label = c("RC",
"WD"), class = "factor"), Time = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("ZT14",
"ZT2"), class = "factor"), Antibiotics = structure(c(2L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("Antibiotics ",
"None"), class = "factor")), row.names = c(1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 18L, 19L, 20L, 21L, 22L), class = "data.frame")
Here is the plotting command :
ggplot(dat.pcx.annot.test,aes(x=PC1,y=PC2,color=Diet,shape=Antibiotics,Fill=Time))+
geom_point(size=3,alpha=0.5)+
scale_color_manual(values = c("black","white") )
And the plot it produces:
I thought if I had both color and fill specified then they would both show.
I would like black outlines for Antibiotics, and Fill color for Time.
Right now Time is not represented.
Any help on how to view the 3 factors simultaneously would be appreciated.
Thanks
Yes, I had a typo in fill (I wrote Fill). I also finally figured out how to get the legends to correspond. Here is my final answer.
ggplot(dat.pcx.annot,aes(x=PC1,y=PC2,color=Diet,shape=Antibiotics,fill=Time))+
geom_point(size=3)+
scale_shape_manual(values = c(21, 22) )+
scale_color_manual(values = c("black","white") )+
scale_fill_manual(values=c("#EC9DAE","#AEDE94"))+
xlab(PC1var)+
ylab(PC2var)+
guides(color=guide_legend(override.aes=list(shape=21)))+
guides(fill=guide_legend(override.aes=list(shape=21,fill=c("#EC9DAE","#AEDE94"),color=c("black","white"))))
ggsave("cohort2_pca.pdf")
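The reason both color and fill show up in the final plot is that point shapes 21 to 25 have a separate outline (color) and interior (fill), whereas the default shapes use color only. A minimal self-contained sketch with hypothetical toy data (`toy`) and a grey outline in place of white so that every point stays visible:

```r
library(ggplot2)

# Hypothetical toy data with the same three factors as in the question
toy <- data.frame(
  PC1 = c(1, 2, 3, 4),
  PC2 = c(4, 3, 2, 1),
  Diet = factor(c("RC", "WD", "RC", "WD")),
  Time = factor(c("ZT14", "ZT14", "ZT2", "ZT2")),
  Antibiotics = factor(c("None", "Antibiotics", "Antibiotics", "None"))
)

p <- ggplot(toy, aes(x = PC1, y = PC2,
                     color = Diet, shape = Antibiotics, fill = Time)) +
  geom_point(size = 3, stroke = 1) +
  # Shapes 21 (circle) and 22 (square) accept both an outline color and a fill
  scale_shape_manual(values = c(21, 22)) +
  scale_color_manual(values = c("black", "grey60")) +
  scale_fill_manual(values = c("#EC9DAE", "#AEDE94")) +
  # Draw the legend keys with a fillable shape so the fill colors are visible
  guides(fill = guide_legend(override.aes = list(shape = 21)),
         color = guide_legend(override.aes = list(shape = 21)))

print(p)
```

Without the override.aes settings, the fill legend is drawn with a default shape that ignores fill, so its keys would not show the two fill colors.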
transport<- structure(list(date = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L), .Label = c("01.01.2001", "01.02.2001", "01.03.2001",
"01.04.2001", "01.05.2001", "01.06.2001", "01.07.2001", "01.08.2001",
"01.09.2001", "01.10.2001", "01.11.2001", "01.12.2001"), class = "factor"),
Market_82 = c(7000L, 7272L, 7668L, 7869L, 8057L, 8428L, 8587L,
8823L, 8922L, 9178L, 9306L, 9439L, 3725L, 4883L, 8186L, 7525L,
6335L, 4252L, 5642L, 1326L, 8605L, 3501L, 1944L, 7332L),
transport = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("plane", "train"), class = "factor")), .Names = c("date",
"Market_82", "transport"), class = "data.frame", row.names = c(NA,
-24L))
Let's create a seasonal plot for each group (plane and train) separately:
library(forecast)
par(mfrow = c(2, 1))
lapply(split(transport['Market_82'], transport$transport), seasonplot(ts(transport,frequency=12)))
Then I get this error:
Error in match.fun(FUN) :
'seasonplot(ts(transport, frequency = 12))' is not a function, character or symbol
How can I get a seasonal plot for each of the two groups?
lapply wants a function, without the arguments in brackets. If you want to pass additional arguments to your function, list them after the function, e.g. lapply(X, func, arg1, arg2).
Also, seasonplot(ts(transport,frequency=12)) would plot both the plane and the train data in one plot.
Since in your example you also want to build a time series object using ts, it is best to code it in a function you define within lapply:
Try:
lapply(split(transport['Market_82'], transport$transport), function(x)seasonplot(ts(x, frequency=12)))
Edit
To distinguish which group is for which plot, you could iterate over the names:
data = split(transport['Market_82'], transport$transport)
par(mfrow = c(2, 1))
lapply(names(data), function(x)seasonplot(ts(data[[x]], frequency=12), main=x))
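The three lapply patterns mentioned above can be sketched in base R without any plotting. The vectors `values` and `group` are a hypothetical cut-down version of the transport data, and summary functions stand in for seasonplot:

```r
# Hypothetical miniature of the transport data
values <- c(7000, 7272, 3725, 4883)
group <- c("plane", "plane", "train", "train")
by_group <- split(values, group)

# 1. Pass the function itself; extra arguments follow the function
group_means <- lapply(by_group, mean, na.rm = TRUE)

# 2. Wrap several steps in an anonymous function
group_ranges <- lapply(by_group, function(x) max(x) - min(x))

# 3. Iterate over the names when the group label is needed inside the function
group_labels <- lapply(names(by_group),
                       function(nm) paste(nm, length(by_group[[nm]])))

print(group_means)
print(group_ranges)
print(group_labels)
```

Writing lapply(by_group, mean(by_group)) would reproduce the original mistake: the second argument would be the result of a call rather than a function.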