How to plot a rating scale in R
What is the best way to represent the following trait rating scale? I'd like to label the traits (8 traits) and the degree of each emotion (1 being low feelings, 5 being strong feelings) across the Democratic and Republican parties. Do I need to aggregate the items? I'm new to R and not sure how to tackle this.
Survey question and scale:
"Below is a list of feelings or moods that could be caused by an object. Please use the list below to describe how the U.S. FEDERAL parties (and its elected officials) make you feel. If the word definitely describes how a party makes you feel, then choose the number 5. If you decide that the word does not at all describe how the party makes you feel, then choose the number 1. Use the intermediate numbers between 1 and 5 to indicate responses between these two extremes."
Survey sample:
dput(df[Book3(1:nrow(df), 30),])
structure(list(TRAITDEM1 = c(3, 4, 3, 3, 3, 3, 3, 1, 2, 2, 2,
3, 3, 2, 2, 1, 1, 3, 1, 5, 1, 1, 3, 1, 4, 4, 3, 1, 2, 4), TRAITDEM2 = c(3,
1, 1, 2, 2, 2, 3, 5, 4, 2, 2, 2, 3, 3, 3, 4, 1, 2, 3, 1, 4, 5,
2, 3, 1, 1, 1, 4, 1, 2), TRAITDEM3 = c(3, 4, 4, 2, 3, 3, 3, 1,
1, 2, 2, 3, 3, 2, 2, 1, 1, 3, 1, 5, 1, 1, 3, 1, 4, 5, 4, 1, 3,
5), TRAITDEM4 = c(3, 2, 1, 2, 2, 2, 4, 5, 4, 5, 2, 3, 2, 3, 3,
4, 3, 4, 3, 1, 5, 4, 1, 4, 3, 4, 2, 4, 2, 1), TRAITDEM5 = c(3,
4, 3, 4, 4, 3, 2, 1, 1, 2, 2, 3, 4, 2, 2, 1, 1, 3, 1, 5, 1, 1,
2, 1, 4, 4, 4, 1, 3, 4), TRAITDEM6 = c(3, 1, 1, 1, 1, 1, 1, 2,
1, 1, 1, 2, 2, 2, 2, 4, 3, 1, 1, 1, 4, 5, 1, 3, 1, 1, 1, 1, 1,
1), TRAITDEM7 = c(3, 1, 3, 3, 2, 2, 1, 1, 1, 2, 3, 4, 3, 2, 2,
1, 1, 2, 2, 5, 1, 1, 1, 3, 3, 4, 2, 1, 5, 5), TRAITDEM8 = c(3,
1, 1, 1, 2, 1, 3, 5, 2, 4, 1, 1, 2, 2, 3, 1, 3, 1, 2, 1, 5, 5,
2, 2, 1, 2, 1, 2, 1, 1), TRAITREP1 = c(1, 1, 1, 1, 1, 1, 1, 1,
1, 4, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1,
1), TRAITREP2 = c(1, 5, 5, 5, 5, 5, 5, 2, 5, 2, 5, 5, 5, 5, 4,
5, 1, 5, 5, 5, 5, 1, 5, 4, 5, 5, 5, 3, 5, 5), TRAITREP3 = c(1,
1, 1, 1, 2, 1, 1, 2, 1, 4, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 3,
1, 1, 1, 1, 1, 1, 1, 2), TRAITREP4 = c(1, 5, 5, 1, 5, 5, 5, 3,
5, 2, 5, 4, 5, 5, 5, 5, 3, 5, 5, 5, 5, 1, 5, 3, 5, 5, 5, 4, 5,
1), TRAITREP5 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 2,
1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1), TRAITREP6 = c(1,
5, 5, 5, 3, 3, 3, 1, 1, 1, 3, 3, 5, 3, 4, 5, 3, 4, 5, 4, 5, 1,
5, 3, 4, 4, 5, 1, 1, 3), TRAITREP7 = c(1, 1, 1, 1, 2, 2, 1, 1,
1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1,
2), TRAITREP8 = c(1, 5, 5, 5, 4, 5, 5, 2, 5, 2, 5, 4, 5, 5, 4,
1, 3, 5, 5, 5, 5, 3, 4, 4, 5, 5, 5, 3, 5, 5), PARTYID_Strength = c(5,
1, 2, 1, 2, 1, 8, 7, 6, 3, 1, 6, 6, 1, 7, 8, 7, 1, 1, 1, 2, 4,
1, 6, 1, 1, 1, 7, 6, 8)), row.names = c(NA, -30L), class = c("tbl_df",
"tbl", "data.frame"))
"PARTYID_Strength" encodes party identification in 8 categories:
1 - Strong Democrat
2 - Not very strong Democrat
3 - Strong Republican
4 - Not very strong Republican
5 - Independent
6 - Independent - Democrat
7 - Independent - Republican
8 - Other
I tried it this way (graph below) but it's still not plotting the remaining four traits:
Cleaning the data
To solve your problem, we first have to transform your data into tidy format.
Observation
There are a few particular problems with your original dataset:
The data are in wide format, i.e. most of the columns of your data frame can be represented by 3 variables;
The variable names are not self-explanatory. They are in upper case, which by itself carries no useful information and makes them harder to read and type;
There is additional information we can extract from the variable names: the party and the feeling towards the party. The first is an abbreviation ('dem' or 'rep'); the second is the numerically encoded feeling towards the political party. However, the order of the numbers encoding the feeling does not reflect a natural order of emotions from disgust up to joy;
The variable PARTYID_Strength is a numerically encoded political party [self-]identification. It also does not reflect a natural order from strong Democrats through Independents to strong Republicans.
Plan
Convert data from wide into long format using all variables starting with TRAIT, and leaving PARTYID_Strength variable unchanged;
Extract useful information from the TRAIT... variables (Political Party, Feelings Toward the Party);
Convert all numerically encoded variables into factors with reasonably ordered levels;
Give all variables meaningful names;
Summarize the data;
Transformations
We need to create several lookup tables, which will simplify the workflow.
Affiliation lookup table:
aff_lookup <- c(
'Strong Democrat',
'Not very strong Democrat',
'Strong Republican',
'Not very strong Republican',
'Independent',
'Independent-Democrat',
'Independent-Republican',
'Other'
)
We can further order aff_lookup by this vector:
aff_order <- c(1, 2, 6, 5, 7, 4, 3, 8)
Emotions/Feelings lookup table:
emo_lookup <- c(
'Delighted',
'Angry',
'Happy',
'Annoyed',
'Joy',
'Hateful',
'Relaxed',
'Disgusted'
)
And we can order emo_lookup by this vector:
emo_order <- c(8, 6, 2, 4, 7, 3, 1, 5)
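As a minimal sketch of how the lookup and order vectors work together (the `codes` vector below is a hypothetical example, not taken from the survey data): indexing `emo_lookup` by a numeric code gives the label, and indexing it by `emo_order` gives the labels in the intended order, which is then used as the factor levels.

```r
emo_lookup <- c('Delighted', 'Angry', 'Happy', 'Annoyed',
                'Joy', 'Hateful', 'Relaxed', 'Disgusted')
emo_order <- c(8, 6, 2, 4, 7, 3, 1, 5)

# Labels rearranged from most negative to most positive
ordered_labels <- emo_lookup[emo_order]
print(ordered_labels)

# Hypothetical trait codes as they would be parsed from column names
codes <- c(1, 8, 5)

# The factor stores the label, and its integer code is the rank in the ordering
f <- factor(emo_lookup[codes], levels = ordered_labels)
print(f)
print(as.integer(f))
```

The integer codes of the factor are what the plot later uses as y-positions, so the ordering chosen here decides which emotion ends up at the bottom of the axis.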
Political party lookup table:
party_lookup <- c(
dem = 'National Democratic Party',
rep = 'National Republican Party'
)
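Finally, with all helper variables in place, we can transform our data into the desired form.
library(tidyverse)
library(magrittr) # provides the %<>% assignment pipe, which tidyverse does not attach
# dat is the survey data frame built from the dput() output in the question
dat %<>%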
rename_all(tolower) %>%
pivot_longer(
cols = starts_with('trait'),
names_to = c('party', 'emotion'),
names_pattern = 'trait(dem|rep)(\\d)',
values_to = 'score'
) %>%
mutate(
party = factor(party_lookup[party]),
affiliation = factor(
aff_lookup[partyid_strength],
levels = aff_lookup[aff_order]
),
emotion = factor(
emo_lookup[as.numeric(emotion)],
levels = emo_lookup[emo_order]
)
) %>%
group_by(party, emotion, affiliation) %>%
summarise(score = median(score)) %>%
ungroup()
head(dat)
## A tibble: 6 x 4
# party emotion affiliation score
# <fct> <fct> <fct> <dbl>
#1 National Democratic Party Disgusted Strong Democrat 1
#2 National Democratic Party Disgusted Not very strong Democrat 2
#3 National Democratic Party Disgusted Independent-Democrat 2
#4 National Democratic Party Disgusted Independent 3
#5 National Democratic Party Disgusted Independent-Republican 3
#6 National Democratic Party Disgusted Not very strong Republican 5
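To make the reshaping step easier to follow, here is a small sketch of what `pivot_longer()` with `names_pattern` does. The `toy` data frame and its `id` column are hypothetical and only stand in for the survey data; the pattern is the same one used above.

```r
library(tidyr)

# Hypothetical two-respondent example, not the survey data
toy <- data.frame(
  id        = 1:2,
  traitdem1 = c(3, 4),
  traitdem2 = c(1, 5),
  traitrep1 = c(2, 2),
  traitrep2 = c(5, 1)
)

# Each capture group in names_pattern fills one of the names_to columns:
# (dem|rep) -> party, (\\d) -> emotion
long <- pivot_longer(
  toy,
  cols = starts_with("trait"),
  names_to = c("party", "emotion"),
  names_pattern = "trait(dem|rep)(\\d)",
  values_to = "score"
)

print(long)
```

Each original row becomes one row per trait column, so two respondents with four trait columns give eight rows. Note that `emotion` comes out as a character column, which is why the answer wraps it in `as.numeric()` before using it as an index.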
Plot the data
Plan
Now we can plot the data as two separate panels, one for the Democratic Party and one for the Republican Party, with Affiliation (political party identification) on the x-axis and Emotions (feelings) on the y-axis.
Each Emotion/Affiliation point is represented as a bar whose height encodes the Score.
We can also add color encoding to our plot. From my point of view, encoding Emotions/Feelings with a color gradient from red (Disgusted) to green (Joy) helps us to see the internal structure of the data.
Plot
dat %>%
ggplot(
aes(
x = affiliation,
y = as.numeric(emotion) + (score / max(score) * .95) / 2,
height = (score / max(score) * .95),
width = .95,
fill = emotion,
label = score
)
) +
geom_tile(show.legend = FALSE) +
geom_text(size = 3.5, color = 'gray25', alpha = .75) +
facet_wrap(~ party, scales = 'free') +
scale_fill_brewer(palette = 'RdYlGn') +
scale_y_continuous(breaks = sort(emo_order), labels = emo_lookup[emo_order]) +
labs(x = 'Affiliations', y = 'Emotions') +
ggthemes::theme_tufte() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.ticks.x = element_blank(),
axis.text.y = element_text(hjust = 0, vjust = -0.025),
axis.ticks.y = element_blank()
)
Which gives the following figure:
Explanation
There is a trick to this plot: it looks like a series of bar plots, but they are not real bar plots (they only look like them, they are not built as them).
What I do:
The core of this solution is the use of geom_tile() for each data point. A tile is just a rectangle (square by default) centred on the given coordinates (Affiliation, Emotion).
Both Affiliation and Emotion are factors, not numerics. That is fine for Affiliation, because we only want to position each tile according to the Affiliation it represents.
It is more complicated with Emotion, because we want to position each tile according to the Emotion it represents, but we also want to encode the Score in the height of the tile.
To define the height of the tile we use the height parameter within aes(). We want the tile height to be at most 0.95 (leaving a 0.05 gap) so that tiles in neighbouring rows, say Angry and Annoyed, do not overlap. That's why we use (score / max(score) * .95) for the height parameter.
We also need to shift the y-coordinate of each tile, so that its centre sits not on the imaginary line representing the emotion but half the tile height above it. The tile extends half its height up and down from its centre, so its bottom edge rests on the base line, creating a fake bar plot. That's what as.numeric(emotion) + (score / max(score) * .95) / 2 does.
We also give each tile a fixed width of .95 with width = .95, fill the tiles with a Red-Yellow-Green gradient and label each tile with its Score.
The rest is just decoration. However, note how we relabel the y-axis. As defined in aes() it is a continuous scale, but we want it to look like a discrete axis, so we use this line:
scale_y_continuous(breaks = sort(emo_order), labels = emo_lookup[emo_order])
Here we use emo_order to request breaks at the integers from 1 to 8, and then label these breaks with the feelings from the ordered emo_lookup table.
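The tile arithmetic can be checked with a few numbers. This is only a sketch: `emotion_index` and `max_score` are assumed values chosen for illustration (in the plot, `max(score)` is taken from the summarised data).

```r
emotion_index <- 2        # assumed y-position of one emotion's base line
max_score     <- 5        # assumed maximum score
score         <- c(1, 3, 5)

# Same expressions as in aes(): scaled height and shifted centre
height <- score / max_score * 0.95
centre <- emotion_index + height / 2

# geom_tile() draws from centre - height/2 to centre + height/2
tiles <- data.frame(
  score,
  height,
  centre,
  bottom = centre - height / 2,
  top    = centre + height / 2
)
print(tiles)
```

Every tile's bottom edge lands on the base line of its emotion, and even the tallest tile stops short of the next base line, which is what makes the tiles read as bars.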
Related
How do I generate a polychoric correlation matrix in R-psych
I am trying to generate a polychoric correlation matrix in R-psych for a 227 x 6 data table which I have called nepr. Importing the data from an Excel spreadsheet and entering the code:
nepr=as.data.frame(nepr)
attach(nepr)
library(psych)
out=polychoric(nepr)
neprpoly=out$rho
print(neprpoly,digits=2)
generates the following error message:
>Error in if (any(lower > upper)) stop("lower>upper integration limits"): missing value where TRUE/FALSE needed
>In addition: warning messages:
>1. In polychoric(nepr): The items do not have an equal number of response alternatives, global set to FALSE.
>2. In qnorm(cumsum(rsum)[-length(rsum)]): NaNs produced
I was expecting the code which I entered to produce a polychoric correlation matrix based on the dataframe nepr and don't know how to interpret or act on the error messages which I have received. Can anyone suggest what changes I need to make to the code to address the error messages?
A sample of the dataset is as follows:
structure(list(Balance = c(4, 4, 5, 5, 3, 4, 3, 4, 2, 2, 2, 5, 2, 2, 2, 2, 1, 2, 4, 1), Earth = c(4, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 5, 3, 4, 4, 2, 5, 4, 5, 5), Plants = c(2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 2, 2, 4), Modify = c(2, 2, 1, 1, 2, 2, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 2), Growth = c(2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 1, 4, 2, 2, 4, 4, 4, 1, 2), Mankind = c(2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 2)), row.names = c(NA,20L), class = "data.frame")
The data consists of inputs of Likert scale rankings (ranked 1-5) to the items 'Balance', 'Earth', 'Plants', 'Modify', 'Growth', and 'Mankind'. There are no missing values in any cells of the 227 row x 6 item matrix; Balance, Plants, & Growth all contain the values 1-5; Earth contains the values 2-5 (no ranking of 1 recorded); Mankind contains the values 1-4 (no ranking of 5 recorded).
When I ran the original data set (before reversing the valence of the last 3 columns) I was able to get a polychoric matrix with no problems even though the data contained the Earth data as it appears in the nepr data set. I assume that it is not uncommon to have similar data sets from surveys where variables do not necessarily contain the full range of response values.
Is there a way I can order the axis on a melted ggplot? [duplicate]
I have a problem with a plot that I want to order, but it seems it can't be.
install.packages("reshape2")
library(reshape2)
install.packages("ggplot2")
library(ggplot2)
df <- createRegressionTable(data,colname)
gg <- melt(df, id = "colname")
return(
  ggplot(gg, aes(x = colname, y = variable, fill = value)) +
    geom_tile(show.legend = FALSE) +
    geom_text(aes(label = value), alpha = 0.6) +
    scale_fill_gradient(low = "#D5E8D4", high = "#F8CECC") +
    labs(x = "Regressant", y = "Regressor") +
    theme(legend.key = element_blank())
)
I know the function createRegressionTable is a black box, but this is the result:
list(colname = c("zielrichtungU", "zielrichtungO", "imitationU", "imitationO", "steuerungU", "steuerungO", "neuheitU", "neuheitO", "netzwerkU", "netzwerkO"), zielrichtungU = c(5, 1, 5, 1, 3, 4, 1, 1, 1, 1), zielrichtungO = c(1, 5, 1, 5, 1, 5, 3, 5, 1, 1), imitationU = c(5, 1, 5, 5, 1, 5, 1, 1, 4, 1), imitationO = c(1, 5, 5, 5, 1, 1, 5, 5, 5, 5), steuerungU = c(3, 1, 1, 1, 5, 5, 1, 2, 1, 1), steuerungO = c(4, 5, 5, 1, 5, 5, 3, 5, 1, 3), neuheitU = c(1, 3, 1, 5, 1, 3, 5, 5, 1, 1), neuheitO = c(1, 5, 1, 5, 2, 5, 5, 5, 1, 1), netzwerkU = c(1, 1, 4, 5, 1, 1, 1, 1, 5, 5), netzwerkO = c(1, 1, 1, 5, 1, 3, 1, 1, 5, 5))
I tested whether the output of melt is scrambled, but it seems to be ordered as I wished, and now I don't know where the problem lies.
And here is the plot that I'd love to order:
Nodes sliding off path diagram in R
This is a sample dput since the dataset is huge: > dput(head(dat, n=20)) structure(list(q01 = c(2, 1, 2, 3, 2, 2, 2, 2, 3, 2, 2, 2, 3, 2, 2, 3, 1, 2, 2, 2), q02 = c(1, 1, 3, 1, 1, 1, 3, 2, 3, 4, 1, 1, 1, 2, 2, 1, 2, 2, 3, 1), q03 = c(4, 4, 2, 1, 3, 3, 3, 3, 1, 4, 5, 3, 3, 1, 3, 2, 5, 3, 4, 1), q04 = c(2, 3, 2, 4, 2, 2, 2, 2, 4, 3, 2, 3, 4, 2, 4, 2, 2, 3, 2, 2), q05 = c(2, 2, 4, 3, 2, 4, 2, 2, 5, 2, 2, 4, 3, 2, 2, 2, 1, 3, 3, 3), q06 = c(2, 2, 1, 3, 3, 4, 2, 2, 3, 1, 1, 3, 2, 2, 2, 2, 1, 4, 1, 4), q07 = c(3, 2, 2, 4, 3, 4, 2, 2, 5, 2, 2, 3, 3, 3, 3, 2, 1, 3, 1, 4), q08 = c(1, 2, 2, 2, 2, 2, 2, 2, 5, 2, 2, 1, 3, 2, 2, 2, 1, 2, 1, 1), q09 = c(1, 5, 2, 2, 4, 4, 3, 4, 3, 3, 5, 3, 2, 2, 2, 2, 4, 5, 5, 5), q10 = c(2, 2, 2, 4, 2, 3, 2, 2, 3, 2, 2, 2, 3, 3, 3, 3, 1, 2, 2, 1), q11 = c(1, 2, 3, 2, 2, 2, 2, 2, 5, 2, 1, 2, 3, 2, 2, 2, 1, 3, 1, 2), q12 = c(2, 3, 3, 2, 3, 4, 2, 3, 5, 3, 3, 3, 4, 4, 3, 3, 2, 3, 3, 5), q13 = c(2, 1, 2, 2, 3, 3, 2, 2, 5, 2, 1, 2, 4, 2, 2, 2, 1, 3, 1, 2), q14 = c(2, 3, 4, 3, 2, 3, 2, 2, 5, 1, 2, 2, 4, 4, 3, 3, 1, 3, 2, 5), q15 = c(2, 4, 2, 3, 2, 5, 2, 3, 5, 2, 1, 3, 4, 4, 3, 2, 1, 4, 2, 5), q16 = c(3, 3, 3, 3, 2, 2, 2, 2, 5, 3, 2, 3, 4, 4, 4, 3, 2, 3, 3, 5), q17 = c(1, 2, 2, 2, 2, 3, 2, 2, 5, 2, 2, 2, 3, 2, 2, 2, 2, 2, 1, 2), q18 = c(2, 2, 3, 4, 3, 5, 2, 2, 5, 2, 2, 2, 3, 4, 3, 3, 1, 2, 1, 5), q19 = c(3, 3, 1, 2, 3, 1, 3, 4, 2, 3, 5, 3, 2, 1, 3, 2, 4, 2, 4, 1), q20 = c(2, 4, 4, 4, 4, 5, 2, 3, 5, 3, 3, 4, 4, 5, 4, 3, 2, 3, 2, 5), q21 = c(2, 4, 3, 4, 2, 3, 2, 2, 5, 2, 2, 3, 4, 5, 4, 2, 1, 3, 2, 5), q22 = c(2, 4, 2, 4, 4, 1, 4, 4, 3, 4, 5, 4, 3, 3, 4, 3, 4, 3, 4, 5), q23 = c(5, 2, 2, 3, 4, 4, 4, 4, 3, 4, 5, 4, 4, 1, 4, 4, 4, 4, 4, 5)), variable.labels = c(q01 = "Statistics makes me cry", q02 = "My friends will think I'm stupid for not being able to cope with SPSS", q03 = "Standard deviations excite me", q04 = "I dream that Pearson is attacking me with correlation coefficients", q05 = "I don't understand statistics", q06 = "I have little experience of 
computers", q07 = "All computers hate me", q08 = "I have never been good at mathematics", q09 = "My friends are better at statistics than me", q10 = "Computers are useful only for playing games ", q11 = "I did badly at mathematics at school", q12 = "People try to tell you that SPSS makes statistics easier to understand but it doesn't", q13 = "I worry that I will cause irreparable damage because of my incompetenece with computers", q14 = "Computers have minds of their own and deliberately go wrong whenever I use them", q15 = "Computers are out to get me", q16 = "I weep openly at the mention of central tendency", q17 = "I slip into a coma whenever I see an equation", q18 = "SPSS always crashes when I try to use it", q19 = "Everybody looks at me when I use SPSS", q20 = "I can't sleep for thoughts of eigen vectors", q21 = "I wake up under my duvet thinking that I am trapped under a normal distribtion", q22 = "My friends are better at SPSS than I am", q23 = "If I'm good at statistics my friends will think I'm a nerd" ), codepage = 65001L, row.names = c(NA, 20L), class = "data.frame") I mostly copied another semPath model but edited it to fit the dataset I was using. First the nodes: nodeNames <- c( "Statistics makes me cry.", "My friends think I'm stupid for not being able to cope with SPSS.", "Standard deviations excite me.", "I dream that Pearson is attacking me with correlation coefficients.", "I don't understand statistics.", "I have little experience with computers.", "All computers hate me.", "I've never been good at mathematics.", "SPSS Anxiety" ) Then the actual semPath: semPaths(onefac8items_a, what = "std", # this argument controls what the color of edges represent. 
In this case, standardized parameters whatLabels = "est", style = "lisrel", residScale = 8, theme = "colorblind", manifests = paste0("q",1:8), nCharNodes = 0, reorder = FALSE, nodeNames = nodeNames, legend.cex = 0.5, rotation = 2, layout = "tree2", cardinal = "lat cov", curvePivot = TRUE, sizeMan = 4, sizeLat = 10, mar = c(2,5,2,5.5), filetype = "pdf", width = 8, height = 6, filename = "SPSS Anxiety" ) So I really only have one question here. When I try to run my path diagram, the nodes look like they are sliding off to the right of the page. How do I fix this? Below is a picture of what I'm referring to:
Since you didn't share your model, I reproduced a dummy model. It seems semPaths doesn't allow us to adjust how nodeNames are laid out. Maybe you could save the graph as an object and redraw it with plot() in order to rescale it, since semPaths has a lot of attributes.
semPaths(fit,
  what = "std",
  style = "lisrel",
  residScale = 8,
  theme = "colorblind",
  nCharNodes = 4,
  reorder = FALSE,
  nodeNames = nodeNames,
  legend.cex = 0.35,
  rotation = 2,
  layout = "tree2",
  cardinal = "lat cov",
  curvePivot = TRUE)
Or we could change the GLratio in the plotOptions:
a<-semPaths(onefac8items_a,
  what = "std",
  whatLabels = "est",
  style = "lisrel",
  residScale = 8,
  theme = "colorblind",
  nCharNodes = 0,
  reorder = FALSE,
  nodeNames = nodeNames,
  legend.cex = 0.5,
  rotation = 2,
  layout = "tree2",
  cardinal = "lat cov",
  curvePivot = TRUE,
  sizeMan = 4,
  sizeLat = 10,
  mar = c(2,5,2,5.5)
)
a$plotOptions$GLratio<-1 # you may need to play with this number
plot(a)
I ended up just shortening my questions in the nodes and it fixed the problem. I guess there's a limit to how much text you can put into your legend:
nodeNames <- c(
  "Statistics makes me cry.",
  "Friends think I'm stupid because I cant do SPSS.",
  "Standard deviations excite me.",
  "I dream that Pearson is attacking me with correlations.",
  "I don't understand statistics.",
  "I have little experience with computers.",
  "All computers hate me.",
  "I've never been good at mathematics.",
  "SPSS Anxiety"
)
Your page isn't big enough.
There are two graphics systems in R, base and grid. The one semPaths uses is base graphics, which mimics drawing on paper: first you set up the size, then you draw things; you can't go back. The other, grid, is used by lattice and ggplot2, which defer the plotting until you call for it. grid plots typically do not run off the page as base graphics can; the plots are usually scaled to fit within the plotting region.
Here is basically your problem, using an example from lavaan::cfa:
library('lavaan')
library('semPlot')
nodeNames <- c(
  "Statistics makes me cry.",
  "My friends think I'm stupid for not being able to cope with SPSS.",
  "Standard deviations excite me.",
  "I dream that Pearson is attacking me with correlation coefficients.",
  "I don't understand statistics.",
  "I have little experience with computers.",
  "All computers hate me.",
  "I've never been good at mathematics.",
  "SPSS Anxiety"
)
?semPlot::semPaths
example(cfa)
semPaths(
  fit,
  what = "std", # this argument controls what the color of edges represent. In this case, standardized parameters
  whatLabels = "est",
  style = "lisrel",
  residScale = 8,
  theme = "colorblind",
  # manifests = paste0("q",1:8),
  nCharNodes = 0,
  reorder = FALSE,
  nodeNames = nodeNames,
  legend.cex = 0.5,
  rotation = 2,
  layout = "tree2",
  cardinal = "lat cov",
  curvePivot = TRUE,
  sizeMan = 4,
  sizeLat = 10,
  mar = c(2,5,2,5.5),
  filetype = "pdf",
  width = 8,
  height = 6,
  filename = "SPSS-Anxiety"
)
I'm not sure what semPaths is doing here with the size, because it is definitely not coming out 8x6:
$ identify -verbose SPSS-Anxiety.pdf | grep "Print size"
8: Print size: 11.1944x6
I'm guessing it compensates for the extra features to fit everything on, but it is not doing a very good job.
The typical way to save base plots to file is
pdf() ## or png() or jpg() etc
plotting code
dev.off() ## or graphics.off() to close everything not just the current device
And to do this you need to remove the filetype part from your code:
pdf('SPSS-Anxiety-2.pdf', width = 8, height = 6)
par(oma = c(0, 2, 0, 25), xpd = NA)
semPaths(
  fit,
  what = "std", # this argument controls what the color of edges represent. In this case, standardized parameters
  whatLabels = "est",
  style = "lisrel",
  residScale = 8,
  theme = "colorblind",
  # manifests = paste0("q",1:8),
  nCharNodes = 0,
  reorder = FALSE,
  nodeNames = nodeNames,
  legend.cex = 0.5,
  rotation = 2,
  layout = "tree2",
  cardinal = "lat cov",
  curvePivot = TRUE,
  sizeMan = 4,
  sizeLat = 10,
  mar = c(2,5,2,5.5)
)
dev.off()
Now I am getting something 8x6:
$ identify -verbose SPSS-Anxiety-2.pdf | grep "Print size"
8: Print size: 8x6
I increased the size of the outer margins (oma, see ?par), which gives me 2 extra lines of space on the left and 25 on the right. Also, note xpd = NA, which turns off clipping, i.e. anything printed outside of the plotting area will be shown; this also comes up a lot in base plots.
But this is a lot of wasted space for some text. I would either scale it down or split the text into multiple lines. You can use strwrap to split each label at white space into <= some maximum width:
par(oma = c(0, 0, 3, 0))
semPaths(
  fit,
  what = "std", # this argument controls what the color of edges represent. In this case, standardized parameters
  whatLabels = "est",
  style = "lisrel",
  residScale = 8,
  theme = "colorblind",
  # manifests = paste0("q",1:8),
  nCharNodes = 0,
  reorder = FALSE,
  nodeNames = sapply(nodeNames, function(x) paste(strwrap(x, 30), collapse = '\n ')),
  legend.cex = 0.5,
  rotation = 2,
  layout = "tree2",
  cardinal = "lat cov",
  curvePivot = TRUE,
  sizeMan = 4,
  sizeLat = 10,
  mar = c(2,5,2,5.5)
)
title('Anxiety and Depression SEM Path Diagram', outer = TRUE)
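As a small sketch of the wrapping step on its own (using one of the labels from the question, outside of any semPaths call): strwrap() splits a string at white space into pieces shorter than the given width, and paste() joins the pieces with newlines.

```r
label <- "My friends think I'm stupid for not being able to cope with SPSS."

# Split at white space so that each piece stays below 30 characters
pieces <- strwrap(label, width = 30)
print(pieces)

# Join the pieces with newlines to get a multi-line label
wrapped <- paste(pieces, collapse = "\n")
cat(wrapped, "\n")
```

The width of 30 is just the value used above; a smaller width gives more, shorter lines in the legend.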
Generate a sequence number (1,1,1,2,2,2,3,3,3) within groups of different length
I have a data frame with a column "Tag", here with four different levels. I need help to create the "Seq" column, a sequence generated from the "Tag" Column: df <- data.frame(Tag = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), Seq = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3 ) Each "Tag" should be divided into 3 sub-groups defined by "Seq". We need to generate runs of 1, 2, and 3, with a total length of that of each "Tag". Thus, the length of each run of 1, 2, and 3 respectively depends on length of each "Tag". Note that the length each "Tag" differs. For example, Tag 1 is of length 31, and has a "Seq" 10 times 1, 10 times 2, and 11 times 3.
To begin with, Tag 1 has length 31 while Tag 2 has length 32. With the code below, the first number (1) will always get a run no longer than the next two (2, 3). I used a ceiling approach to come up with this. There is no clear criterion for what the code should do when the length is not divisible by 3, e.g. 31: should it give lengths of 10, 10, 11? Or would 9, 11, 11 be fine too? The code gives lengths of 9, 11, 11:
ec=table(Tag)
unlist(mapply(function(x,y)rep(c(1,2,3),c(x,y,y)),ec-2*ceiling(ec/3),ceiling(ec/3)))
To check the results, save them in a variable, d=mapply(...), then run sapply(d,table). Hope this helps.
ave(Tag, Tag, FUN = function(x){sort(rep(x = 1:3, length.out = length(x)))})
Explanation: for each level of "Tag" (ave(Tag, Tag, ...)), repeat the levels of "Seq" (x = 1:3) to the length of that subset of "Tag" (length.out = length(x)), then sort the numbers.
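Here is a sketch of the ave() approach on a shortened, hypothetical Tag vector (two groups of 31 and 32 rows rather than the full data). The second call is a variant that is not from the answers: it recycles 3:1 instead of 1:3, so that any leftover rows go to the last runs rather than the first.

```r
# Hypothetical Tag column: group 1 has 31 rows, group 2 has 32 rows
Tag <- c(rep(1, 31), rep(2, 32))

# Recycling 1:3 puts the leftover rows into the first runs
seq_front <- ave(Tag, Tag, FUN = function(x) sort(rep(1:3, length.out = length(x))))

# Variant (an assumption about what may be wanted): recycling 3:1
# puts the leftover rows into the last runs instead
seq_back <- ave(Tag, Tag, FUN = function(x) sort(rep(3:1, length.out = length(x))))

# Run lengths per group for both versions
print(table(Tag, seq_front))
print(table(Tag, seq_back))
```

Which of the two is appropriate depends on how the remainder should be distributed, which the question leaves open.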
Creating a barplot from matrix
So, I have a matrix like this:
> dput(tbl_sum_peaks[1:40])
structure(c(2, 8, 3, 4, 1, 2, 1, 3, 1, 3, 1, 4, 4, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 5, 4, 2, 1, 1, 2, 1, 4, 2), .Names = c("AT1G01050", "AT1G01080", "AT1G01090", "AT1G01320", "AT1G01470", "AT1G01800", "AT1G01910", "AT1G01960", "AT1G01980", "AT1G02150", "AT1G02470", "AT1G02500", "AT1G02560", "AT1G02780", "AT1G02816", "AT1G02880", "AT1G02920", "AT1G02930", "AT1G03030", "AT1G03090", "AT1G03110", "AT1G03210", "AT1G03220", "AT1G03230", "AT1G03330", "AT1G03475", "AT1G03630", "AT1G03680", "AT1G03740", "AT1G03870", "AT1G04080", "AT1G04170", "AT1G04270", "AT1G04410", "AT1G04420", "AT1G04530", "AT1G04640", "AT1G04650", "AT1G04690", "AT1G04750"))
I would like to make a barplot that shows, on the y-axis, the number of rows having each specific value. As we can see in the example data, most of the rows have the value 1, so the bar for 1 will be the tallest. That's basic, but I can't turn on my brain... so help from someone will be rewarded!
Try barplot(table(tbl_sum_peaks))
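A brief sketch of why this works, using a short hypothetical named vector in place of tbl_sum_peaks: table() counts how many elements take each distinct value, and barplot() draws one bar per count.

```r
# Hypothetical stand-in for tbl_sum_peaks: a named numeric vector
x <- c(g1 = 2, g2 = 8, g3 = 1, g4 = 1, g5 = 2, g6 = 1)

# Count how many elements take each distinct value
counts <- table(x)
print(counts)

# One bar per distinct value, bar height = number of elements with that value
barplot(counts, xlab = "Value", ylab = "Number of rows")
```

The names of the vector are ignored by table(); only the values are counted.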