Using plot_likert to plot Likert-scale data in R

I am using a dataset that has Likert-scale responses. I am attaching sample observations from the dataset below.
I always get an error. Can someone help me with this?
Thanks
att<-structure(list(att1_goodofall = c(3L, 3L, 1L, 3L, 3L, 3L, 3L,
2L, 3L, 3L), att2_pvtdisease = c(3L, 3L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L), att3_curedisease = c(3L, 1L, 3L, 2L, 2L, 1L, 2L,
3L, 3L, 3L), att4_timewaste = c(4L, 4L, 2L, 3L, 4L, 4L, 4L, 4L,
4L, 4L), att5_helpgenerations = c(3L, 3L, 3L, 3L, 2L, 3L, 3L,
2L, 3L, 3L)), row.names = c(NA, 10L), class = "data.frame")
library(expss)
library(sjPlot)
# label the values of all columns at once (val_lab<- applies to every column of a data.frame)
val_lab(att) <- make_labels("0 Strongly disagree
1 Disagree
2 Neither agree or disagree
3 Agree
4 Strongly agree")
# plot_likert function
plot_likert(att)
Error in freq[valid] <- counts :
NAs are not allowed in subscripted assignments

I'm pretty sure that this is a bug, but I couldn't tell whether it is a bug in expss or in sjPlot.
However, as a workaround you could switch to, e.g., sjlabelled::set_labels to label your dataset:
library(sjPlot)
library(expss)
att <- sjlabelled::set_labels(att, labels = make_labels("0 Strongly disagree
1 Disagree
2 Neither agree or disagree
3 Agree
4 Strongly agree"))
sjPlot::plot_likert(att, cat.neutral = 3)
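Independent of the expss/sjPlot bug, a package-free way to attach value labels is to turn each integer column into a factor with fixed levels and labels. The sketch below is base R only (it is not what sjlabelled::set_labels does internally) and re-creates two columns of att from the question; fixing levels = 0:4 keeps categories such as 0 that never occur in the sample:

```r
# Two columns copied from the question's att data
att <- data.frame(
  att1_goodofall = c(3L, 3L, 1L, 3L, 3L, 3L, 3L, 2L, 3L, 3L),
  att4_timewaste = c(4L, 4L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 4L)
)

likert_labels <- c("Strongly disagree", "Disagree",
                   "Neither agree or disagree", "Agree", "Strongly agree")

# Codes 0..4 map onto the five labels in order; att_f[] keeps the data.frame shape
att_f <- att
att_f[] <- lapply(att, factor, levels = 0:4, labels = likert_labels)

table(att_f$att4_timewaste)  # all five categories are listed, unused ones with 0
```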

After I applied the solution, I wanted to do the same for another set of variables. I applied the same solution but did not get the desired result.
know<-structure(list(know1_spendmoney = c(0L, 9L, 0L, 0L, 0L), know2_access = c(0L,
0L, 0L, 0L, 9L), know3_medicalrecords = c(0L, 0L, 1L, 1L, 1L),
know4_pharmacy = c(0L, 9L, 9L, 9L, 9L), know5_infoprivacy = c(1L,
1L, 1L, 1L, 1L), know6_sharing = c(0L, 1L, 1L, 9L, 9L), know7_police = c(0L,
9L, 9L, 0L, 1L), know8_infoselling = c(0L, 0L, 9L, 0L, 1L
), know9_insurancecompanies = c(0L, 1L, 1L, 1L, 9L), know10_riskofdiseaseinfo = c(1L,
1L, 9L, 9L, 9L)), row.names = c(NA, 5L), class = "data.frame")
know<-sjlabelled::set_labels(know, labels = make_labels("0 False
1 True
9 Don't know"))
plot_likert(know,
            expand.grid = TRUE,
            show.prc.sign = TRUE, coord.flip = TRUE, reverse.scale = TRUE,
            values = "sum.outside")
Warning: number of items to replace is not a multiple of replacement length
Any idea how I can resolve this? I have tried multiple options, but nothing seems to work.
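One thing worth checking, offered as an assumption rather than a confirmed diagnosis of the warning, is that not every item uses every response code: know5_infoprivacy, for example, is 1 in all five rows. A base-R sketch that tabulates each item against the full code set 0, 1, 9, using three columns copied from know:

```r
# Three columns copied from the question's know data
know <- data.frame(
  know1_spendmoney  = c(0L, 9L, 0L, 0L, 0L),
  know5_infoprivacy = c(1L, 1L, 1L, 1L, 1L),
  know6_sharing     = c(0L, 1L, 1L, 9L, 9L)
)

codes <- c(0L, 1L, 9L)  # False, True, Don't know

# One column per item, one row per code; codes that never occur show up as 0
counts <- sapply(know, function(x) table(factor(x, levels = codes)))
counts
```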


cld() output has the wrong order of factor levels

I am using the cld() function with emmeans in R, but the order of the factor levels in the output is different from what I set. Before calling cld(), the by.years output is in the desired order, but after cld() the output is in alphabetical order (Light, Moderate, No). I also checked cld.years$Grazing.intensity, and the levels are correct. Is there a way to specify the order of factor levels in the cld() output? Any help is appreciated.
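library(lme4)
library(emmeans)
library(multcomp)  # cld() generic; emmeans' cld method also needs multcompView installed
# sample data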
plants <- structure(list(Grazing.intensity = structure(c(3L, 2L, 3L, 3L, 3L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 3L), .Label = c("Light-grazing", "Moderate-grazing", "No-grazing"), class = "factor"), Grazing.intensity1 = structure(c(3L, 2L, 3L, 3L, 3L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 3L), .Label = c("LG", "MG", "NG"), class = "factor"), Years = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L), .Label = c("Dry-year", "Wet-year"), class = "factor"), Month = structure(c(2L, 2L, 2L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 3L, 2L, 2L, 3L), .Label = c("Aug.", "Jul.", "Sept."), class = "factor"), Plots = c(1L, 3L, 8L, 6L, 9L, 7L, 2L, 2L, 10L, 10L, 7L, 7L, 9L, 4L, 2L), Species.richness = c(8L, 6L, 10L, 11L, 9L, 5L, 7L, 13L, 10L, 6L, 5L, 5L, 14L, 8L, 10L)), class = "data.frame", row.names = c(NA, -15L))
# set the order of factor levels
plants$Grazing.intensity <- factor(plants$Grazing.intensity, levels =
c('No-grazing','Light-grazing','Moderate-grazing'))
lmer.mod <- lmer(Species.richness ~ Grazing.intensity*Years + (1|Month), data = plants)
by.years <- emmeans(lmer.mod, specs = ~ Grazing.intensity:Years, by = 'Years', type = "response")
# display cld
cld.years <- cld(by.years, Letters = letters)
This is my first time posting sample data on Stack Overflow, so it may be wrong. I used dput().
I solved the issue. The order changed because, by default, cld() displays the levels in increasing order of emmean. Setting sort = FALSE, i.e. cld(by.years, Letters = letters, sort = FALSE), displays the results in the original factor-level order. I should have read the documentation more thoroughly.
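The reordering can be reproduced without a model: tapply() returns group means in factor-level order, while sorting them by increasing mean, which is what cld() does unless sort = FALSE, gives a different order. A base-R sketch using the raw Species.richness means from the question's data (raw group means, not the model-based emmeans):

```r
# Grazing intensity and species richness copied from the question's plants data,
# with the factor levels in the order the asker set
grazing <- factor(
  c("No-grazing", "Moderate-grazing", "No-grazing", "No-grazing", "No-grazing",
    "Light-grazing", "No-grazing", "Moderate-grazing", "Moderate-grazing",
    "Moderate-grazing", "Light-grazing", "Moderate-grazing", "No-grazing",
    "No-grazing", "No-grazing"),
  levels = c("No-grazing", "Light-grazing", "Moderate-grazing"))
richness <- c(8, 6, 10, 11, 9, 5, 7, 13, 10, 6, 5, 5, 14, 8, 10)

means  <- tapply(richness, grazing, mean)  # factor-level order: No, Light, Moderate
sorted <- sort(means)                      # increasing mean: Light, Moderate, No
means
sorted
```

For this sample the increasing-mean order of the raw means happens to coincide with alphabetical order, which would explain why the cld() output looked alphabetically sorted.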

How to plot learning curves for binary data?

I would like to plot simple learning curves. My data looks like this:
id trial type choice
1 1 A 0
1 2 A 1
2 1 B 1
2 2 B 0
dd <- structure(list(id = c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 4L, 6L, 6L, 6L, 6L, 6L), trial = c(1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L), choice = c(0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L), type = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("A", "A3", "B"), class = "factor")), row.names = c(1L,
2L, 3L, 4L, 5L, 31L, 32L, 33L, 34L, 35L, 61L, 62L, 63L, 64L,
65L, 91L, 92L, 93L, 94L, 95L), class = "data.frame")
In this data, id, trial and choice are integers and type is a factor. I would like to plot the proportion of correct choices (a 1 in choice counts as correct) that each group made per trial. In my sketch of the desired graph, the smoothness of the curves is exaggerated.
I would also like to know how I can do calculations by group. For example, sum all the choices of group A during trials 1 to 10.
Thank you for your help!
Basically, you want to summarize your data first and then plot it. You can do this easily with dplyr and ggplot2. For example, if your data is stored in a data.frame named dd:
library(dplyr)
library(ggplot2)
dd %>%
group_by(type, trial) %>%
summarize(correct=mean(choice)) %>%
ggplot() +
geom_line(aes(trial, correct, color=type))
For each type and trial, we calculate the mean of choice to get the proportion of people who answered correctly. Then we plot that value for each trial with a line colored by type.
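A base-R sketch of the same summary, which also covers the second part of the question (summing choices per group over a range of trials). It re-creates dd from the dput above in compact form (the unused factor level A3 is dropped) and uses trials 1 to 5, because the sample only contains five trials:

```r
# Compact re-creation of the question's dd data
dd <- data.frame(
  id     = rep(c(2L, 3L, 4L, 6L), each = 5),
  trial  = rep(1:5, times = 4),
  choice = c(0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L,
             0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L),
  type   = factor(rep(c("A", "B"), each = 10))
)

# Proportion correct per type and trial (the data behind the learning curves)
learning <- aggregate(choice ~ type + trial, data = dd, FUN = mean)

# Total correct choices per type within a trial window
# (1:5 here; on the full data this would be 1:10)
window_sum <- aggregate(choice ~ type, data = subset(dd, trial %in% 1:5), FUN = sum)

learning
window_sum
```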

How do I reduce this data frame by groups?

I have the following data frame:
t <- structure(list(name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Alice", "Bob",
"Jane Doe", "John Doe"), class = "factor"), school = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Alice School",
"Bob School", "Someother School", "Someschool College"), class = "factor"),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"),
question = structure(c(2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L, 2L,
4L, 6L, 8L, 1L, 3L, 5L, 7L, 2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L,
2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L), .Label = c("q1", "q2", "q3",
"q4", "q5", "q6", "q7", "q8"), class = "factor"), mark = c(0L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L,
1L), subject = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("C", "M"), class = "factor")), .Names = c("name",
"school", "group", "question", "mark", "subject"), row.names = c(7L,
15L, 23L, 31L, 3L, 11L, 19L, 27L, 8L, 16L, 24L, 32L, 4L, 12L,
20L, 28L, 6L, 14L, 22L, 30L, 2L, 10L, 18L, 26L, 5L, 13L, 21L,
29L, 1L, 9L, 17L, 25L), class = "data.frame")
and I need to produce a data frame in which each student has one combined mark for each subject. The combination is simply the sum of the marks on each question. So, for example, Jane Doe will have 3 on subject C and 2 on subject M. I've been banging my head against this for long enough with Reduce and other approaches. I could probably solve this in a very procedural way, but if I could do it with a one-liner (or a close approximation), I'd be happier. I'm sure it can be done...
You said it in your question: you want to group_by student and subject and compute the sum:
library(tidyverse)
t %>%
group_by(name, subject) %>%
summarise(score = sum(mark))
Here is a data.table solution:
library(data.table)
setDT(t)[, .(score = sum(mark)), by = .(name, subject)]
And just for completeness, base R:
aggregate(mark ~ name + subject, data=t, sum)
This says "aggregate the response variable mark by the grouping variables name and subject, using sum as the aggregation function".
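If a wide layout (one row per student, one column per subject) is handier, base R's xtabs() sums a numeric left-hand side over the cross-classification. The data below is a compact re-creation of the name, subject and mark columns of t from the question:

```r
# name, subject and mark columns of the question's t, in the same row order
marks <- data.frame(
  name    = rep(c("Alice", "Bob", "Jane Doe", "John Doe"), each = 8),
  subject = rep(rep(c("C", "M"), each = 4), times = 4),
  mark    = c(0, 0, 0, 0, 1, 1, 1, 1,
              0, 0, 0, 0, 1, 1, 1, 1,
              1, 1, 0, 1, 0, 1, 0, 1,
              0, 1, 1, 0, 1, 0, 1, 1)
)

# Sum of mark for every name x subject combination, as a two-way table
wide <- xtabs(mark ~ name + subject, data = marks)
wide
```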

How to display value labels in R outputs?

I am using the survey package in R to do some initial descriptive statistics for my dissertation. For the outputs of both svymean and barplot, I would like the value labels of the variable to be displayed, as this would make the descriptive statistics much easier to interpret. Instead of the output labels F1RTRCC1 and F1RTRCC2, I would like to see the value labels "academic" and "occupational".
How do I go about making this happen?
I am including a minimal reproducible example with a small subset of my actual data:
#Calling Survey Package
library(survey)
#Using dput to display a subset of my actual data
MRE1 <- structure(list(STU_ID = c(101101L, 101102L, 101104L, 101105L, 101106L, 101107L),
PSU = c(1L, 1L, 1L, 1L, 1L, 1L), STRAT_ID = c(101L,
101L, 101L, 101L, 101L, 101L), BYSCTRL = c(1L, 1L, 1L, 1L, 1L,
1L), G10COHRT = c(1L, 1L, 1L, 1L, 1L, 1L), F1RTRCC = c(2L, 1L,
4L, 2L, 2L, 4L), F1SEX = c(2L, 2L, 2L, 2L, 2L, 1L), F1RACE = c(5L,
2L, 7L, 3L, 4L, 4L), F1PARED = c(5L, 5L, 2L, 2L, 1L, 2L), F1SES2QU = c(2L,
4L, 1L, 1L, 1L, 1L), F1HIMATH = c(5L, 6L, 6L, 4L, 5L, 4L), F1RGPP2 = c(2L,
4L, 4L, 4L, 4L, 1L), F3ATTAINMENT = c(3L, 10L, 6L, 4L, 4L, 3L
), F3EDSTAT = c(5L, 5L, 5L, 2L, 2L, 5L)), .Names = c("STU_ID",
"PSU", "STRAT_ID", "BYSCTRL", "G10COHRT", "F1RTRCC", "F1SEX",
"F1RACE", "F1PARED", "F1SES2QU", "F1HIMATH", "F1RGPP2", "F3ATTAINMENT",
"F3EDSTAT"), row.names = c(NA, 6L), class = "data.frame")
# svymean of variable F1RTRCC
# This line throws an error: Error in UseMethod("svymean", design) :
#   no applicable method for 'svymean' applied to an object of class "data.frame"
# I think this is related to my dput subset of the data, because when I use
# this function with my full dataset, the error does not appear.
CC <- svymean(~F1RTRCC, MRE1, na.rm = TRUE)
CC
#Barplot for F1RTRCC
barplot(CC)
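A sketch, assuming the labels for codes 1 and 2 are "academic" and "occupational" as stated in the question; the label for code 4 is not given, so "code_4" is a hypothetical placeholder. It relies on two points: svymean() needs a survey design object rather than a data.frame (which is what the UseMethod error above reports), and svymean() names the estimates of a factor after its levels, so relabeling the factor changes the names in both the printed output and barplot(). The design uses ids = ~1 with no weights purely as a placeholder; with the full data you would use your real PSU, strata and weight variables:

```r
library(survey)

# Two columns from the question's MRE1 (the ID column and the variable of interest)
MRE1 <- data.frame(
  STU_ID  = c(101101L, 101102L, 101104L, 101105L, 101106L, 101107L),
  F1RTRCC = c(2L, 1L, 4L, 2L, 2L, 4L)
)

# svymean() dispatches on a survey design, not on a data.frame.
# ids = ~1 without weights is only a placeholder (equal probability, with a warning).
des <- svydesign(ids = ~1, data = MRE1)

# Turn the integer codes into a factor whose labels are the value labels.
# "code_4" is a hypothetical placeholder: the question does not give the label for 4.
des <- update(des, F1RTRCC = factor(F1RTRCC, levels = c(1, 2, 4),
                                    labels = c("academic", "occupational", "code_4")))

CC <- svymean(~F1RTRCC, des)
CC           # estimates are named F1RTRCCacademic, F1RTRCCoccupational, F1RTRCCcode_4
barplot(CC)  # bar names come from the same labels
```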

Faceting bars in ggplot2

I have this problem: I want to build a stacked bar plot with faceting, so I can compare the distribution of frequencies for five common categories within two different objects, separated into three groups. I have six objects, five categories and three groups. The problem is that each group has only two distinct, mutually exclusive objects to plot, but so far I can only produce a plot in which all six objects are plotted in each of the three groups. This is not optimal, since each group then shows four objects with no data.
Is it possible to plot just two objects per group with faceting?
EDITED
This is my data:
races.df <- structure(list(Face = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L), .Label = c("LGH002", "LGH003", "LGM009",
"SCM018", "VAH022", "VAM028"), class = "factor"), Race = structure(c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
.Label = c("1. Amerindian", "2. White", "3. Mestizo", "4. Other races",
"5. Cannot tell"), class = "factor"), Count = c(19L, 0L, 13L, 8L, 0L, 2L,
7L, 23L, 6L, 2L, 1L, 1L, 29L, 6L, 3L, 29L, 0L, 11L, 0L, 0L, 0L, 38L, 1L, 0L,
1L, 0L, 30L, 9L, 0L, 1L), Density = c(0.475, 0, 0.325, 0.2, 0,
0.05, 0.175, 0.575, 0.15, 0.05, 0.025, 0.025, 0.725, 0.15,
0.075, 0.725, 0, 0.275, 0, 0, 0, 0.95, 0.025, 0, 0.025, 0,
0.75, 0.225, 0, 0.025), School = structure(c(1L, 1L, 1L,
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Municipal",
"Private Fee-Paying", "Private-Voucher"), class = "factor")),
.Names =c("Face", "Race", "Count", "Density", "School"),
class = "data.frame", row.names = c(NA, -30L))
This is the code I'm using to build the plot:
library(ggplot2)
library(scales)  # for percent()
P <- ggplot(data = races.df, aes(x = Face, y = Density, fill = Race)) +
geom_bar(stat="identity") +
scale_y_continuous(labels=percent)
P + facet_grid(School ~ ., scales="free") + coord_flip()
As you can imagine, I only want to see the x-values "SCM018" and "LGH002" in "Municipal"; "LGM009" and "LGH003" in "Private-Voucher"; and "VAH022" and "VAM028" in "Private Fee-Paying" (only two objects per group). Is it possible? Any help?
All the best,
Mauricio.
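A sketch of one common approach, not taken from an answer on this page: drop coord_flip(), map Face to the y axis directly, and give each facet row its own discrete y scale with scales = "free_y" and space = "free_y", so that each panel only shows the faces that have rows for that school. It assumes ggplot2 3.3.0 or later (where geom_col() infers horizontal bars from a discrete y) and re-creates races.df compactly from the dput above, without the Count column:

```r
library(ggplot2)
library(scales)

faces <- c("LGH002", "LGH003", "LGM009", "SCM018", "VAH022", "VAM028")
races <- c("1. Amerindian", "2. White", "3. Mestizo", "4. Other races", "5. Cannot tell")

races.df <- data.frame(
  Face    = factor(rep(faces, each = 5)),
  Race    = factor(rep(races, times = 6), levels = races),
  Density = c(0.475, 0, 0.325, 0.2, 0,
              0.05, 0.175, 0.575, 0.15, 0.05,
              0.025, 0.025, 0.725, 0.15, 0.075,
              0.725, 0, 0.275, 0, 0,
              0, 0.95, 0.025, 0, 0.025,
              0, 0.75, 0.225, 0, 0.025),
  School  = rep(c("Municipal", "Private-Voucher", "Private-Voucher",
                  "Municipal", "Private Fee-Paying", "Private Fee-Paying"), each = 5)
)

# Horizontal stacked bars; each School row trains its own y scale,
# so only the two faces present in that school are drawn
p <- ggplot(races.df, aes(x = Density, y = Face, fill = Race)) +
  geom_col() +
  scale_x_continuous(labels = percent) +
  facet_grid(School ~ ., scales = "free_y", space = "free_y")
p
```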
