I am trying to generate a polychoric correlation matrix with the R package psych for a 227 x 6 data table which I have called nepr. After importing the data from an Excel spreadsheet, entering the code:
nepr=as.data.frame(nepr)
attach(nepr)
library(psych)
out=polychoric(nepr)
neprpoly=out$rho
print(neprpoly,digits=2)
generates the following error message:
>Error in if (any(lower > upper)) stop("lower>upper integration
limits"): missing value where TRUE/FALSE needed
>In addition: warning messages:
>1. In polychoric(nepr): The items do not have an equal number
of response alternatives, global set to FALSE.
>2. In qnorm(cumsum(rsum)[-length(rsum)]): NaNs produced
I was expecting the code which I entered to produce a polychoric correlation matrix based on the dataframe nepr and don't know how to interpret/ act on the error messages which I have received.
Can anyone suggest what changes I need to make to the code to address the error messages?
A sample of the dataset is as follows:
structure(list(Balance = c(4, 4, 5, 5, 3, 4, 3, 4, 2, 2, 2, 5,
2, 2, 2, 2, 1, 2, 4, 1), Earth = c(4, 5, 5, 5, 5, 5, 5, 4, 4,
4, 4, 5, 3, 4, 4, 2, 5, 4, 5, 5), Plants = c(2, 2, 2, 3, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 2, 2, 4), Modify = c(2, 2, 1,
1, 2, 2, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 2), Growth =
c(2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 1, 4, 2, 2, 4, 4, 4, 1, 2),
Mankind = c(2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1,
1, 1, 2)), row.names = c(NA,20L), class = "data.frame")
The data consists of inputs of Likert scale rankings (ranked 1-5) to the items 'Balance', 'Earth', 'Plants', 'Modify', 'Growth', and 'Mankind'. There are no missing values in any cells of the 227 row x 6 item matrix; Balance, Plants, & Growth all contain the values 1-5; Earth contains the values 2-5 (no ranking of 1 recorded); Mankind contains the values 1-4 (no ranking of 5 recorded). When I ran the original data set (before reversing the valence of the last 3 columns) I was able to get a polychoric matrix with no problems even though the data contained the Earth data as it appears in the nepr data set. I assume that it is not uncommon to have similar data sets from surveys where variables do not necessarily contain the full range of response values.
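As a quick way to verify the description above, the sketch below tabulates each item against the full 1-5 scale, so unused response categories show up as zero counts. It only uses the 20-row sample from the dput output (not the full 227 rows), and it is a diagnostic, not a fix for the error; the names all_numeric, n_missing, counts and n_used are just illustrative.

```r
# 20-row sample copied from the dput() output above
nepr <- data.frame(
  Balance = c(4, 4, 5, 5, 3, 4, 3, 4, 2, 2, 2, 5, 2, 2, 2, 2, 1, 2, 4, 1),
  Earth   = c(4, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 5, 3, 4, 4, 2, 5, 4, 5, 5),
  Plants  = c(2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 2, 2, 4),
  Modify  = c(2, 2, 1, 1, 2, 2, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 2),
  Growth  = c(2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 1, 4, 2, 2, 4, 4, 4, 1, 2),
  Mankind = c(2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 2)
)

# Check that every column is numeric and count missing values per item
all_numeric <- all(vapply(nepr, is.numeric, logical(1)))
n_missing <- colSums(is.na(nepr))

# Tabulate each item against the full 1-5 scale (rows = response category)
counts <- sapply(nepr, function(x) table(factor(x, levels = 1:5)))
print(counts)

# Number of response categories actually used by each item
n_used <- colSums(counts > 0)
print(n_used)
```

Running the same check on the full data set shows at a glance which items use fewer than five categories and how sparse the rarely used categories are.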
What is the best way to represent the following trait rating scale? I'd like to label the traits (8 traits) and the degree of each emotion (1 being low feelings, 5 being strong feelings) across the Democratic and Republican parties. Do I need to aggregate the items? I'm new to R and not sure how to tackle this.
Survey question and scale:
"Below is a list of feelings or moods that could be caused by an object. Please use the list below to describe how the U.S. FEDERAL parties (and its elected officials) make you feel. If the word definitely describes how a party makes you feel, then choose the number 5. If you decide that the word does not at all describe how the party makes you feel, then choose the number 1. Use the intermediate numbers between 1 and 5 to indicate responses between these two extremes."
Survey sample:
dput(df[sample(1:nrow(df), 30), ])
structure(list(TRAITDEM1 = c(3, 4, 3, 3, 3, 3, 3, 1, 2, 2, 2,
3, 3, 2, 2, 1, 1, 3, 1, 5, 1, 1, 3, 1, 4, 4, 3, 1, 2, 4), TRAITDEM2 = c(3,
1, 1, 2, 2, 2, 3, 5, 4, 2, 2, 2, 3, 3, 3, 4, 1, 2, 3, 1, 4, 5,
2, 3, 1, 1, 1, 4, 1, 2), TRAITDEM3 = c(3, 4, 4, 2, 3, 3, 3, 1,
1, 2, 2, 3, 3, 2, 2, 1, 1, 3, 1, 5, 1, 1, 3, 1, 4, 5, 4, 1, 3,
5), TRAITDEM4 = c(3, 2, 1, 2, 2, 2, 4, 5, 4, 5, 2, 3, 2, 3, 3,
4, 3, 4, 3, 1, 5, 4, 1, 4, 3, 4, 2, 4, 2, 1), TRAITDEM5 = c(3,
4, 3, 4, 4, 3, 2, 1, 1, 2, 2, 3, 4, 2, 2, 1, 1, 3, 1, 5, 1, 1,
2, 1, 4, 4, 4, 1, 3, 4), TRAITDEM6 = c(3, 1, 1, 1, 1, 1, 1, 2,
1, 1, 1, 2, 2, 2, 2, 4, 3, 1, 1, 1, 4, 5, 1, 3, 1, 1, 1, 1, 1,
1), TRAITDEM7 = c(3, 1, 3, 3, 2, 2, 1, 1, 1, 2, 3, 4, 3, 2, 2,
1, 1, 2, 2, 5, 1, 1, 1, 3, 3, 4, 2, 1, 5, 5), TRAITDEM8 = c(3,
1, 1, 1, 2, 1, 3, 5, 2, 4, 1, 1, 2, 2, 3, 1, 3, 1, 2, 1, 5, 5,
2, 2, 1, 2, 1, 2, 1, 1), TRAITREP1 = c(1, 1, 1, 1, 1, 1, 1, 1,
1, 4, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1,
1), TRAITREP2 = c(1, 5, 5, 5, 5, 5, 5, 2, 5, 2, 5, 5, 5, 5, 4,
5, 1, 5, 5, 5, 5, 1, 5, 4, 5, 5, 5, 3, 5, 5), TRAITREP3 = c(1,
1, 1, 1, 2, 1, 1, 2, 1, 4, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 3,
1, 1, 1, 1, 1, 1, 1, 2), TRAITREP4 = c(1, 5, 5, 1, 5, 5, 5, 3,
5, 2, 5, 4, 5, 5, 5, 5, 3, 5, 5, 5, 5, 1, 5, 3, 5, 5, 5, 4, 5,
1), TRAITREP5 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 2,
1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1), TRAITREP6 = c(1,
5, 5, 5, 3, 3, 3, 1, 1, 1, 3, 3, 5, 3, 4, 5, 3, 4, 5, 4, 5, 1,
5, 3, 4, 4, 5, 1, 1, 3), TRAITREP7 = c(1, 1, 1, 1, 2, 2, 1, 1,
1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1,
2), TRAITREP8 = c(1, 5, 5, 5, 4, 5, 5, 2, 5, 2, 5, 4, 5, 5, 4,
1, 3, 5, 5, 5, 5, 3, 4, 4, 5, 5, 5, 3, 5, 5), PARTYID_Strength = c(5,
1, 2, 1, 2, 1, 8, 7, 6, 3, 1, 6, 6, 1, 7, 8, 7, 1, 1, 1, 2, 4,
1, 6, 1, 1, 1, 7, 6, 8)), row.names = c(NA, -30L), class = c("tbl_df",
"tbl", "data.frame"))
"PartyID_Strength" represents 8 measures of political parties:
1 - Strong Democrat
2 - Not very strong Democrat
3 - Strong Republican
4 - Not very strong Republican
5 - Independent
6 - Independent - Democrat
7 - Independent - Republican
8 - Other
I tried it this way (graph below) but it's still not plotting the remaining four traits:
Cleaning the data
To solve your problem, we first have to transform your data into tidy format.
Observation
There are a few particular problems with your original dataset:
The data are in wide format, i.e. most of the columns of your data frame can be represented by 3 variables;
The variable names are not self-explanatory. They are in upper case, which by itself carries no useful information, and they are hard to read and to type;
There is additional information we can extract from the variable names: the Party and the Feeling toward the Party. The first is an abbreviation ('dem' or 'rep'); the second is the numerically encoded feeling toward the political party. However, the order of the numbers encoding the feeling does not reflect the natural order of emotions from disgust up to joy;
The variable PARTYID_Strength is a numerically encoded Political Party [self-]Identification; it also does not reflect the natural order from strongest Democrats through independents to strongest Republicans.
Plan
Convert data from wide into long format using all variables starting with TRAIT, and leaving PARTYID_Strength variable unchanged;
Extract useful information from the TRAIT... variables (Political Party, Feelings Toward the Party);
Convert all numerically encoded variables into factors with sensibly ordered levels;
Give all variables meaningful names;
Summarize the data;
Transformations
We need to create several lookup tables, which will simplify the workflow.
Affiliation lookup table:
aff_lookup <- c(
'Strong Democrat',
'Not very strong Democrat',
'Strong Republican',
'Not very strong Republican',
'Independent',
'Independent-Democrat',
'Independent-Republican',
'Other'
)
We can further order aff_lookup by this vector:
aff_order = c(1, 2, 6, 5, 7, 4, 3, 8)
Emotions/Feelings lookup table:
emo_lookup <- c(
'Delighted',
'Angry',
'Happy',
'Annoyed',
'Joy',
'Hateful',
'Relaxed',
'Disgusted'
)
And we can order emo_lookup by this vector:
emo_order <- c(8, 6, 2, 4, 7, 3, 1, 5)
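To make the role of the two vectors concrete, here is a small sketch of how indexing the lookup table by the order vector produces the factor levels used later. The emotion labels come from the lookup table above; emo_levels and codes are illustrative names only.

```r
emo_lookup <- c('Delighted', 'Angry', 'Happy', 'Annoyed',
                'Joy', 'Hateful', 'Relaxed', 'Disgusted')
emo_order <- c(8, 6, 2, 4, 7, 3, 1, 5)

# Indexing by emo_order rearranges the labels from most negative to most positive
emo_levels <- emo_lookup[emo_order]
print(emo_levels)

# A few hypothetical numeric codes, as extracted from names like TRAITDEM2
codes <- c(2, 5, 8)
emotion <- factor(emo_lookup[codes], levels = emo_levels)

# The integer codes of the factor now follow the reordered levels
print(as.numeric(emotion))
```

The same pattern applies to aff_lookup and aff_order for the affiliation variable.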
Political party lookup table:
party_lookup <- c(
dem = 'National Democratic Party',
rep = 'National Republican Party'
)
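Finally, with all helper variables, we can transform our data into the desired form.
library(tidyverse)
library(magrittr) # provides the %<>% assignment pipe, which tidyverse does not attach
# dat is assumed to hold the survey data frame shown in the question
dat %<>%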
rename_all(tolower) %>%
pivot_longer(
cols = starts_with('trait'),
names_to = c('party', 'emotion'),
names_pattern = 'trait(dem|rep)(\\d)',
values_to = 'score'
) %>%
mutate(
party = factor(party_lookup[party]),
affiliation = factor(
aff_lookup[partyid_strength],
levels = aff_lookup[aff_order]
),
emotion = factor(
emo_lookup[as.numeric(emotion)],
levels = emo_lookup[emo_order]
)
) %>%
group_by(party, emotion, affiliation) %>%
summarise(score = median(score)) %>%
ungroup()
head(dat)
## A tibble: 6 x 4
# party emotion affiliation score
# <fct> <fct> <fct> <dbl>
#1 National Democratic Party Disgusted Strong Democrat 1
#2 National Democratic Party Disgusted Not very strong Democrat 2
#3 National Democratic Party Disgusted Independent-Democrat 2
#4 National Democratic Party Disgusted Independent 3
#5 National Democratic Party Disgusted Independent-Republican 3
#6 National Democratic Party Disgusted Not very strong Republican 5
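If the pivot_longer() step is unclear, the following sketch runs it in isolation on a tiny made-up data frame (toy and its values are hypothetical, chosen only to mirror the column naming of the survey data). It shows how names_pattern splits each column name into a party part and an emotion part.

```r
library(tidyr)

# Hypothetical two-respondent data frame with the same naming scheme
toy <- data.frame(
  partyid_strength = c(1, 3),
  traitdem1 = c(3, 1),
  traitdem2 = c(2, 5),
  traitrep1 = c(1, 4),
  traitrep2 = c(5, 2)
)

# Each capture group in names_pattern fills one of the names_to columns
long <- pivot_longer(
  toy,
  cols = starts_with("trait"),
  names_to = c("party", "emotion"),
  names_pattern = "trait(dem|rep)(\\d)",
  values_to = "score"
)
print(long)
```

Each respondent now contributes one row per party/emotion combination, and partyid_strength is simply repeated on every row.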
Plot the data
Plan
Now we can plot the data as two separate plots for Democrats and Republicans, with Affiliation (Political Party Identification) on the X-axis and Emotions (Feelings) on the Y-axis.
Each Emotion/Affiliation point is going to be represented as a bar, with the height of the bar representing the Score.
We can also add color encoding to our plot. From my point of view, encoding Emotions/Feelings with a color gradient from red (Disgust) to green (Joy) could help us to grasp the internal structure of our data.
Plot
dat %>%
ggplot(
aes(
x = affiliation,
y = as.numeric(emotion) + (score / max(score) * .95) / 2,
height = (score / max(score) * .95),
width = .95,
fill = emotion,
label = score
)
) +
geom_tile(show.legend = FALSE) +
geom_text(size = 3.5, color = 'gray25', alpha = .75) +
facet_wrap(~ party, scales = 'free') +
scale_fill_brewer(palette = 'RdYlGn') +
scale_y_continuous(breaks = sort(emo_order), labels = emo_lookup[emo_order]) +
labs(x = 'Affiliations', y = 'Emotions') +
ggthemes::theme_tufte() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.ticks.x = element_blank(),
axis.text.y = element_text(hjust = 0, vjust = -0.025),
axis.ticks.y = element_blank()
)
Which gives the following figure:
Explanation
There is a trick with this plot: it looks like a series of barplots, but they are not real barplots (they only look like them; functionally they are something else).
What I do:
The core of this solution is the use of geom_tile() for each data point. A tile is just a rectangle (square by default) whose geometric center is determined by the given coordinates (Affiliation, Emotion).
Both Affiliation and Emotion are factors, not numerics. That is fine for Affiliation, because we only want to position our tile according to the Affiliation it represents.
It is more complicated with Emotion, because we want to position each tile according to the Emotion it represents, but we also want to encode the Score by the height of the tile.
To define the height of the tile we use the height parameter within aes(). We want the tile height to be at most 0.95 (one minus a 0.05 offset), so that the tiles of adjacent emotions, say Angry and Annoyed, do not overlap. That's why we use (score / max(score) * .95) for the height parameter.
We also need to give a different y-coordinate to each tile, so that the center of the tile is placed not on the imaginary line representing each emotion, but half a height above it. When the tile is drawn, its center (on the y-axis) sits half a height above the "base line" and the tile extends half a height up and down, creating a fake barplot. That's what the following expression does: as.numeric(emotion) + (score / max(score) * .95) / 2.
We also give each tile a fixed width with width = .95, fill the tiles with a Red-Yellow-Green gradient and label each tile with the relevant Score.
The rest is just decoration. However, note how we relabel the Y-axis. As defined in aes(), it is a continuous scale, but we want it to look like a discrete axis, so we use this line:
scale_y_continuous(breaks = sort(emo_order), labels = emo_lookup[emo_order])
Here we just use emo_order to say that we want breaks at the integers from 1 to 8, and then we label these breaks with the feelings from the ordered emo_lookup table.
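To see the tile arithmetic in numbers, here is a minimal sketch with three hypothetical scores for an emotion sitting at y position 2 (emotion_pos and the scores are made up for illustration). It computes the height and center exactly as in the aes() call and then derives where the tile starts and ends.

```r
# Hypothetical scores for one emotion drawn at integer position 2 on the y-axis
score <- c(1, 3, 5)
emotion_pos <- c(2, 2, 2)

# Same expressions as in the aes() call of the plot
height <- score / max(score) * 0.95
y_center <- emotion_pos + height / 2

# geom_tile() draws from center - height / 2 to center + height / 2
y_bottom <- y_center - height / 2
y_top <- y_center + height / 2

print(data.frame(score, height, y_center, y_bottom, y_top))
```

Every tile starts on its emotion's base line and the tallest one stops at 2.95, just short of the next emotion's base line at 3, which is why the tiles read as bars that never overlap.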
I want to make a barplot using ggplot2 that displays multiple bars within each group, but in my plot I have 4 bars instead of 8 for each group. I would appreciate your help.
here is my code:
library(ggplot2)
levels = c('D', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9')
method = c('G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7','G8')
ave = c(4, 4, 4, 4, 5, 1, 2, 6, 3, 5, 2, 2, 2, 2, 5, 3, 4, 1, 1, 1, 2,
2, 2, 2, 3, 3, 2, 1, 1, 1, 1, 3, 4, 5, 6, 8, 9, 7, 1, 2, 3, 3, 4, 5, 7,
6, 1, 1, 1, 2, 5, 7, 7, 8, 9, 1, 4, 6, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
levels = factor(c(rep(levels,8)))
method = factor(c(rep(method,10)))
dat = data.frame(levels,ave,method)
dodge = position_dodge(width = .9)
p = ggplot(dat,mapping =aes(x = as.factor(levels),y = ave,fill =
as.factor(method)))
p + geom_bar(stat = "identity",position = "dodge") +
xlab("levels") + ylab("Mean")
It looks like geom_bar will only plot bars for observations that exist; if you want to have bars for every method (assuming you want each level to have a bar for each method), you need to have observations in your data corresponding to those pairings. Currently, each level is paired with only four of the eight methods, because rep(levels, 8) and rep(method, 10) recycle in step, so each pairing simply occurs twice. To artificially generate the missing pairings, you can use tidyr::complete() and tidyr::expand() before plotting. For each new pairing, ave will automatically be assigned NA, but you can change this behavior using the fill parameter in tidyr::complete().
Here's an example where ave is set to 0 for every new pairing instead of NA:
library(dplyr)
library(tidyr)
library(ggplot2)
dat %>%
# add the missing levels/method pairings, setting ave to 0 for each new row
complete(expand(dat, levels, method), fill = list(ave = 0)) %>%
ggplot(mapping = aes(x = as.factor(levels),
y = ave,
fill = as.factor(method))) +
geom_bar(stat = "identity", position = position_dodge(width = 1)) +
xlab("levels") +
ylab("Mean")
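It may also be worth checking how the two grouping vectors line up in the first place. The sketch below counts how many distinct methods each level is paired with under the original rep() calls, and compares that with rep(method, each = 10). Whether the 80 ave values were actually recorded in that order is an assumption only the asker can confirm; level_names, method_names, recycled and crossed are illustrative names.

```r
level_names <- c('D', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9')
method_names <- c('G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7', 'G8')

# Original construction: both vectors are recycled element by element
recycled <- data.frame(levels = rep(level_names, 8),
                       method = rep(method_names, 10))
methods_per_level <- rowSums(table(recycled$levels, recycled$method) > 0)
print(methods_per_level)

# Alternative (assumes ave is ordered method by method): repeat each method
# 10 times so that it meets every level exactly once
crossed <- data.frame(levels = rep(level_names, 8),
                      method = rep(method_names, each = 10))
methods_per_level_crossed <- rowSums(table(crossed$levels, crossed$method) > 0)
print(methods_per_level_crossed)
```

With the original construction each level meets only four distinct methods, which matches the four bars in the plot; with each = 10 all 80 rows are distinct level/method combinations, so no filling with zeros is needed.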
I am a beginner in R, and have a question about making boxplots of columns in R. I just made a dataframe:
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- c('Pleasant to use', 'Unnecessary complex', 'Easy to use',
'Need help of a technical person', 'Different functions well integrated','Various function incohorent', 'Imagine that it is easy to learn',
'Difficult to use', 'Confident during use', 'Long duration untill I could work with it')
I tried a number of times, but I did not succeed in making boxplots for all rows. Can someone help me out here?
You can also do this with the tidyverse:
library(tidyverse)
SUS %>%
#create new column and save the row.names in it
mutate(variable = row.names(.)) %>%
#convert your data from wide to long
tidyr::gather("var", "value", 1:7) %>%
#plot it using ggplot2
ggplot(., aes(x = variable, y = value)) +
geom_boxplot()+
theme(axis.text.x = element_text(angle=35,hjust=1))
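Since gather() is superseded in current tidyr, the same reshaping can be sketched with pivot_longer(). The example below uses a small hypothetical three-row data frame (toy) in the same shape as SUS, rather than the full data, and tibble::rownames_to_column() to keep the row names.

```r
library(tidyr)
library(tibble)
library(ggplot2)

# Hypothetical three-statement subset in the same shape as SUS
toy <- data.frame(
  RD = c(4, 3, 4),
  TK = c(4, 2, 4),
  NW = c(2, NA, 4),
  row.names = c('Pleasant to use', 'Unnecessary complex', 'Easy to use')
)

# Move the row names into a column, then reshape from wide to long
with_names <- rownames_to_column(toy, "variable")
long <- pivot_longer(with_names, cols = -variable,
                     names_to = "var", values_to = "value")

# One box per statement; na.rm = TRUE drops the missing ratings silently
p <- ggplot(long, aes(x = variable, y = value)) +
  geom_boxplot(na.rm = TRUE) +
  theme(axis.text.x = element_text(angle = 35, hjust = 1))
print(p)
```

The result is one row per statement/respondent pair, which is the layout ggplot2 expects for a boxplot per statement.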
As @blondeclover says in the comment, boxplot() should work fine for doing a boxplot of each column.
If what you want is a boxplot for each row, then actually your current rows need to be your columns. If you need to do this, you can transpose the data frame before plotting:
SUS.new <- as.data.frame(t(SUS))
boxplot(SUS.new)