I'm trying to create a sentiment analysis using the tidytext code here but my graph comes out vertical, without the output making sense compared to the original which is horizontal. How can I fix this?
#Unnest tokens
edAItext = edAI %>% select(Group, Participant_ID, Brainscape_Pattern) %>%
unnest_tokens(word, Brainscape_Pattern)
# Inner join
bing_word_counts <- edAItextTest %>%
inner_join(get_sentiments("bing")) %>%
count(word, sentiment, sort = TRUE) %>%
ungroup()
#Check
bing_word_counts
#Plot
bing_word_counts %>%
group_by(sentiment) %>%
slice_max(n, n = 5) %>%
ungroup() %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(n, word, fill = sentiment)) +
geom_col(show.legend = FALSE) +
facet_wrap(~sentiment, scales = "free_y") +
labs(x = "Contribution to sentiment",
y = NULL)
This is how it looks:
This is how it's supposed to look:
Related
I'm trying to get the variable labels and value labels to be displayed on a stacked bar chart.
library(tidyverse)
data <- haven::read_spss("http://staff.bath.ac.uk/pssiw/stats2/SAQ.sav")
data %>%
select(Q01:Q04) %>%
gather %>%
group_by(key, value) %>%
tally %>%
mutate(n = n/sum(n)*100, round = 1) %>%
mutate(n = round(n, 2)) %>%
ggplot(aes(x=key, y=n, fill=factor(value))) +
geom_col() +
geom_text(aes(label=as_factor(n)), position=position_stack(.5)) +
coord_flip() +
theme(aspect.ratio = 1/3) + scale_fill_brewer(palette = "Set2")
Instead of Q01, Q02, Q03, Q04, I would like to use the variable labels.
library(labelled)
var_label(data$Q01)
Statistics makes me cry
var_label(data$Q02)
My friends will think Im stupid for not being able to cope with SPSS
var_label(data$Q03)
Standard deviations excite me
var_label(data$Q04)
I dream that . . .
along with associated value labels
val_labels(data$Q01)
Strongly agree Agree Neither Disagree Strongly disagree Not answered
1 2 3 4 5 9
I tried using label = as_factor(n) but that didn't work.
We may extract the labels and then do a join
library(forcats)
library(haven)
library(dplyr)
library(tidyr)
library(labelled)
subdat <- data %>%
select(Q01:Q04)
d1 <- subdat %>%
summarise(across(everything(), var_label)) %>%
pivot_longer(everything())
subdat %>%
pivot_longer(everything(), values_to = 'val') %>%
left_join(d1, by = 'name') %>%
mutate(name = value, value = NULL) %>%
count(name, val) %>%
mutate(n = n/sum(n)*100, round = 1) %>%
mutate(n = round(n, 2)) %>%
ungroup %>%
mutate(labels = names(val_labels(val)[val])) %>%
ggplot(aes(x=name, y=n, fill=labels)) +
geom_col() +
geom_text(aes(label=as_factor(n)),
position=position_stack(.5)) +
coord_flip() +
theme(aspect.ratio = 1/3) +
scale_fill_brewer(palette = "Set2")
-output
I'm trying an example I found here where data is plotted using ggplot2. The code looks like this:
raw_text %>%
group_by(newsgroup) %>%
summarize(messages = n_distinct(id)) %>%
ggplot(aes(messages, newsgroup)) +
geom_col() +
labs(y = NULL)
The bars inside the diagram are supposed to be horizontal, i.e. from left to right, but for me they are vertical:
What do I have to change to also get horizontal bars?
I reproduced the example you indicated by downloading the data. My plot looks like:
code as given:
library(dplyr)
library(tidyr)
library(purrr)
library(readr)
library(ggplot2)
# dataset has to be downloaded see question user1406177
training_folder <- "data/20news-bydate-train/"
# Define a function to read all files from a folder into a data frame
read_folder <- function(infolder) {
tibble(file = dir(infolder, full.names = TRUE)) %>%
mutate(text = map(file, read_lines)) %>%
transmute(id = basename(file), text) %>%
unnest(text)
}
raw_text <- tibble(folder = dir(training_folder, full.names = TRUE)) %>%
mutate(folder_out = map(folder, read_folder)) %>%
unnest(cols = c(folder_out)) %>%
transmute(newsgroup = basename(folder), id, text)
raw_text %>%
group_by(newsgroup) %>%
summarize(messages = n_distinct(id)) %>%
ggplot(aes(messages, newsgroup)) +
geom_col() +
labs(y = NULL)
I would try this:
#Code
raw_text %>%
group_by(newsgroup) %>%
summarize(messages = n_distinct(id)) %>%
ggplot(aes(y=messages, x=newsgroup)) +
geom_col() +
labs(y = NULL)+
coord_flip()
I would like to see the y-axis (in the plot is flipped) starting at some arbitrary value, like 7.5
After a little bit of researching, I came across ylim, but in this case is giving me some
errors:
Scale for 'y' is already present. Adding another scale for 'y', which will
replace the existing scale.
Warning message:
Removed 10 rows containing missing values (geom_col).
This is my code, and a way to download the data I'm using:
install.packages("remotes")
remotes::install_github("tweed1e/werfriends")
library(werfriends)
friends_raw <- werfriends::friends_episodes
library(tidytext)
library(tidyverse)
#"best" writers with at least 10 episodes
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(desc(mean_rating)) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
coord_flip() + theme(legend.position = "None") + scale_y_continuous(breaks = seq(7.5,10,0.5)) +
ylim(7.5,10)
You should use coord_cartesian for zoom in a particular location (here the official documentation: https://ggplot2.tidyverse.org/reference/coord_cartesian.html).
With your example, your code should be something like that:
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(desc(mean_rating)) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
coord_flip() + theme(legend.position = "None") + scale_y_continuous(breaks = seq(7.5,10,0.5)) +
coord_cartesian(ylim = c(7.5,10))
If this is not working please provide a reproducible example of your dataset (see: How to make a great R reproducible example)
I found out the solution. With my actual plot, the answer submitted by #dc37 didn't work because coord_flip() and coord_cartesian() exclude each other. So the way to do this is:
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(mean_rating) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
theme(legend.position = "None") +
coord_flip(ylim = c(8,8.8))
Let's say I'm working with proportions, I have two main variables (sex and pain_level). It's not difficult to plot them:
With tidyverse and broom (and thanks for this link here: Calling prop.test function in R with dplyr) I can compare if the proportions are statistically different.
Now comes the question!
I want to add to the plot, the error bar. I know it's not as difficult as I'm thinking, but I could not find a way to do it. I've tried to replicate this link here (http://www.andrew.cmu.edu/user/achoulde/94842/labs/lab07_solution.html) but I'm trying to stay at tidyverse environment.
The desired output should be something like that:
Please feel free to use the script/syntax below that simulate the original dataset.
library(tidyverse)
ds <- data.frame(sex = rep(c("M","F"), 18),
pain_level = c("High","Moderate","low"))
#plot
ds %>%
group_by(pain_level, sex) %>%
summarise(n=n()) %>%
mutate(prop = n/sum(n)*100) %>%
ggplot(., aes(x = sex, fill = pain_level, y = prop)) +
geom_bar(stat = "summary") +
facet_wrap( ~ pain_level) +
theme(legend.position = "none")
#p values of proportion test
ds %>%
rowwise %>%
group_by(pain_level, sex) %>%
summarise(cases = n()) %>%
mutate(pop = sum(cases)) %>% #compute totals
distinct(., pain_level, .keep_all= TRUE) %>% #keep only one value of the row
mutate(tst = list(broom::tidy(prop.test(cases, pop, conf.level=0.95)))) %>%
tidyr::unnest(tst)
I think the following might roughly resemble your desired output:
ds %>%
group_by(pain_level, sex) %>%
summarise(cases = n()) %>%
mutate(pop = sum(cases)) %>%
rowwise() %>%
mutate(tst = list(broom::tidy(prop.test(cases, pop, conf.level=0.95)))) %>%
tidyr::unnest(tst) %>%
ggplot(aes(sex, estimate, group = pain_level)) +
geom_col(aes(fill = pain_level)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
facet_wrap(~ pain_level)
Although my query shows me values in descending order, ggplot then displays them alphabetically instead of ascending order.
Known solutions to this problem haven't seem to work. They suggest using Reorder or factor for values, which didn't work in this case
This is my code:
boxoffice %>%
group_by(studio) %>%
summarise(movies_made = n()) %>%
arrange(desc(movies_made)) %>%
top_n(10) %>%
arrange(desc(movies_made)) %>%
ggplot(aes(x = studio, y = movies_made, fill = studio, label = as.character(movies_made))) +
geom_bar(stat = 'identity') +
geom_label(label.size = 1, size = 5, color = "white") +
theme(legend.position = "none") +
ylab("Movies Made") +
xlab("Studio")
for those wanting a more complete example, here's where I got:
library(dplyr)
library(ggplot2)
# get some dummy data
boxoffice = boxoffice::boxoffice(dates=as.Date("2017-1-1"))
df <- (
boxoffice %>%
group_by(distributor) %>%
summarise(movies_made = n()) %>%
mutate(studio=reorder(distributor, -movies_made)) %>%
top_n(10))
ggplot(df, aes(x=distributor, y=movies_made)) + geom_col()
You'll need to convert boxoffice$studio to an ordered factor. ggplot will then respect the order of rows in the data set, rather than alphabetizing. Your dplyr chain will look like this:
boxoffice %>%
group_by(studio) %>%
summarise(movies_made = n()) %>%
arrange(desc(movies_made)) %>%
ungroup() %>% # ungroup
mutate(studio = factor(studio, studio, ordered = T)) %>% # convert variable
top_n(10) %>%
arrange(desc(movies_made)) %>%
ggplot(aes(x = studio, y... (rest of plotting code)