I'm struggling to spread the words out when using the wordcloud function.
library(tibble)
library(wordcloud)
data = tibble(Day = c("January", "February", "March" , "April", "May", "June", "July", "August", "September", "October", "November", "December"),
Freq = c(1294, 1073, 1071, 1019, 938, 912, 703, 680, 543, 201, 190, 343))
set.seed(10)
wordcloud(words = data$Day, freq = data$Freq, min.freq = 1,
random.order = TRUE, scale = c(3, .5), rot.per = 0)
I tried to save the output using the ggsave function, but this is what I got:
Desired output:
I couldn't find a way to do this in wordcloud, but wordcloud2 gives rather more flexibility. I managed to cobble this together with help from another SO question on saving the result as an image file.
#packages enable saving to png or pdf via html, see link at end of answer
library(webshot)
webshot::install_phantomjs()
library("htmlwidgets")
library(tibble)
library(wordcloud2)
data = tibble(Day = c("January", "February", "March" , "April", "May", "June", "July", "August", "September", "October", "November", "December"),
Freq = c(1294, 1073, 1071, 1019, 938, 912, 703, 680, 543, 201, 190, 343))
set.seed(10)
# Control appearance with wordcloud2 arguments. The padding between words is controlled by `gridSize`.
# You have to play around with `size`, `gridSize` and the image size
wc <- wordcloud2(data, size = 0.4, rotateRatio = 0, color = "black", gridSize = 75)
# save as html
saveWidget(wc, "wc.html", selfcontained = FALSE)
# and then as image:png
webshot("wc.html","wc.png", delay = 5, vwidth = 480, vheight = 480)
For saving the image to file see: How to Save the wordcloud in R
And you end up with:
Created on 2020-05-18 by the reprex package (v0.3.0)
Related
I am trying to convert a character column containing month names into a numerical column containing numbers (1=January, etc.).
Example 1. This works:
library(dplyr)
df2 <- structure(list(Month = c("January", "January", "March", "March", "April")), class = "data.frame", row.names = c(NA, -5L))
df2 %>%
group_by(Month) %>%
mutate(Month = which(Month == month.name)) -> df2
Example 2. This does not work (error: "Month must be size 31 or 1, not 2."):
df <- structure(list(Month = c("August", "August", "August", "August",
"August", "August", "August", "August", "August", "August", "August",
"August", "August", "August", "August", "August", "August", "August",
"August", "August", "August", "August", "August", "August", "August",
"August", "August", "August", "August", "August", "August", "December",
"December", "December", "December")), class = "data.frame", row.names = c(NA, -35L))
df %>%
group_by(Month) %>%
mutate(Month = which(Month == month.name)) -> df
The code for converting is the same in both cases. Why doesn't it work in the second example? I can't get my head around it.
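One possible explanation, sketched in base R with made-up vectors (the names aug, hits and month_num are illustrative, not from the question): inside each group, Month == month.name recycles the 12 month names against the group's values, so which() returns every position where the recycled names happen to line up. For a group of 2 "January" rows that is a single position, but for 31 "August" rows it is two positions (8 and 20), which mutate() cannot fit into 31 rows. A vectorised lookup with match() avoids the recycling altogether.

```r
# Base R only; month.name is a built-in constant.
aug <- rep("August", 31)

# Recycling month.name (length 12) against 31 values triggers a warning
# and lines up "August" at positions 8 and 20.
hits <- suppressWarnings(which(aug == month.name))

# match() looks up each value individually, so group size does not matter.
months <- c(aug, rep("December", 4))
month_num <- match(months, month.name)
```

With dplyr, the same idea would be mutate(Month = match(Month, month.name)), with no group_by() needed.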
I'm trying to create a boxplot based on timeseries data for multiple years. I want to group observations from multiple years by a variable "DAP" (similar to day of year 0-365), order them by day from November to March but only display the Month on the X-Axis.
I can create a custom order and X-axis by creating a factor with each month, and that works:
level_order <- c('November', 'December', 'January', 'February', 'March')
plot <- ggplot(data = df, aes(y = y, x = factor(Month, levels = level_order), group=DAP)) +
geom_boxplot(fill="grey85", width = 2.0) +
scale_x_discrete(limits = level_order)
plot
Now I'm stuck on aligning the boxes on the X-axis according to the day of the month. For example, the first data point, from November 26th, needs to sit further right, closer to December.
Changing the X-axis to "Date" creates monthly labels for each year and also removes the grouping.
plot <- ggplot(data = df, aes(y = y, x = Date, group=DAP)) +
geom_boxplot(fill="grey85")
plot + scale_x_date(date_breaks = "1 month", date_labels = "%B")
Setting the X-axis to "DAP" instead of Date gives me the correct order and spacing, but I need to display the month on the X-axis. How can I combine this last graph with the X-axis labeling of graph 1?
plot <- ggplot(data = df, aes(y = y, x = DAP, group=DAP)) +
geom_boxplot(fill="grey85")
plot
Here is a sample of the dataset:
DAP Date Month y
1 47 2010-11-26 November 0.6872708
21 116 2011-02-03 February 0.7643213
41 68 2011-12-17 December 0.7021531
61 137 2012-02-24 February 0.7178306
81 92 2013-01-10 January 0.7330749
101 44 2013-11-23 November 0.6610618
121 113 2014-01-31 January 0.7961012
141 68 2014-12-17 December 0.7510821
161 137 2015-02-24 February 0.7799938
181 92 2016-01-10 January 0.6861423
201 47 2016-11-26 November 0.7155526
221 116 2017-02-03 February 0.7397810
241 72 2017-12-21 December 0.7259670
261 144 2018-03-03 March 0.6725775
281 106 2019-01-24 January 0.7637322
301 65 2019-12-14 December 0.7184616
321 134 2020-02-21 February 0.6760159
The following approach uses the tidyverse. The date is separated into year, month and day, and those newly created columns are made numeric. In the ggplot part, position_dodge2(preserve = "single") is used, which keeps the box width the same. scale_x_discrete helps to redefine the x-axis breaks and tick labels. width = 1 controls the distance between the boxes.
library(tidyverse)
df <- tibble::tribble(
~DAP, ~Date, ~Month, ~y,
47, "2010-11-26", "November", 0.6872708,
116, "2011-02-03", "February", 0.7643213,
68, "2011-12-17", "December", 0.7021531,
137, "2012-02-24", "February", 0.7178306,
92, "2013-01-10", "January", 0.7330749,
44, "2013-11-23", "November", 0.6610618,
113, "2014-01-31", "January", 0.7961012,
68, "2014-12-17", "December", 0.7510821,
137, "2015-02-24", "February", 0.7799938,
92, "2016-01-10", "January", 0.6861423,
47, "2016-11-26", "November", 0.7155526,
116, "2017-02-03", "February", 0.7397810,
72, "2017-12-21", "December", 0.7259670,
144, "2018-03-03", "March", 0.6725775,
106, "2019-01-24", "January", 0.7637322,
65, "2019-12-14", "December", 0.7184616,
134, "2020-02-21", "February", 0.6760159
)
df$Date <- as.Date(df$Date)
df %>%
separate(Date, sep = "-", into = c("year", "month", "day")) %>%
mutate_at(vars("year":"day"), as.numeric) %>%
select(-c(year, Month)) %>%
ggplot(aes(
x = factor(month, levels = c(11, 12, 1, 2, 3)), y = y,
group = DAP, color = factor(month)
)) +
geom_boxplot(width = 1, lwd = 0.2, position = position_dodge2(preserve = "single")) +
scale_x_discrete(
breaks = c(11, 12, 1, 2, 3),
labels = c("November", "December", "January", "February", "March")
) +
labs(x = "") +
theme(legend.position = "none")
Try this. To get the right order, spacing and labels, I make a new date. As the year does not seem to be relevant, I set the year for the November and December observations to 2019,
and for the other observations to 2020.
df <- structure(list(DAP = c(
47L, 116L, 68L, 137L, 92L, 44L, 113L,
68L, 137L, 92L, 47L, 116L, 72L, 144L, 106L, 65L, 134L
), Date = c(
"2010-11-26",
"2011-02-03", "2011-12-17", "2012-02-24", "2013-01-10", "2013-11-23",
"2014-01-31", "2014-12-17", "2015-02-24", "2016-01-10", "2016-11-26",
"2017-02-03", "2017-12-21", "2018-03-03", "2019-01-24", "2019-12-14",
"2020-02-21"
), Month = c(
"November", "February", "December",
"February", "January", "November", "January", "December", "February",
"January", "November", "February", "December", "March", "January",
"December", "February"
), y = c(
0.6872708, 0.7643213, 0.7021531,
0.7178306, 0.7330749, 0.6610618, 0.7961012, 0.7510821, 0.7799938,
0.6861423, 0.7155526, 0.739781, 0.725967, 0.6725775, 0.7637322,
0.7184616, 0.6760159
)), row.names = c(NA, -17L), class = "data.frame")
library(ggplot2)
# Make a new Date to get the correct order as with DAP.
# Set the year for November and December observations to 2019,
# and for all other observations to 2020.
df$Date1 <- gsub("20\\d{2}-(1\\d{1})", "2019-\\1", df$Date)
df$Date1 <- gsub("20\\d{2}-(0\\d{1})", "2020-\\1", df$Date1)
df$Date1 <- as.Date(df$Date1)
# Using the new date gives the correct order, spacing and labels
# Limits are adjusted as well
plot <- ggplot(data = df, aes(y = y, x = Date1, group = DAP)) +
geom_boxplot(fill = "grey85")
plot +
scale_x_date(date_breaks = "1 month", date_labels = "%B", limits = c(as.Date("2019-11-01"), as.Date("2020-03-31")))
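The same dummy-year idea can also be written without regular expressions. This is only a sketch in base R, assuming the season always runs from November to March; the names dates, mon, yr and date1 are illustrative. Since 2020 is a leap year, a 29 February date would still parse.

```r
# Three dates taken from the sample data above
dates <- as.Date(c("2010-11-26", "2011-02-03", "2018-03-03"))

# Month number decides which dummy year an observation gets
mon <- as.integer(format(dates, "%m"))
yr <- ifelse(mon >= 11, 2019L, 2020L)

# Rebuild the date with the dummy year, keeping month and day
date1 <- as.Date(sprintf("%d-%s", yr, format(dates, "%m-%d")))
```

date1 can then be used as the x aesthetic in place of Date1.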
I've been around the forums looking for a solution to my issue but can't seem to find anything. Derivatives of my question and their answers haven't really helped either. My data has four columns, including one for Year and one for Month. I've been wanting to plot the data all in one graph, without using facets for the years, in ggplot. This is what I've been struggling with so far:
df<-data.frame(Month = rep(c("January", "February", "March", "April", "May", "June",
"July", "August", "September", "October",
"November", "February", "March"),each = 20),
Year = rep(c("2018", "2019"), times = c(220, 40)),
Type = rep(c("C", "T"), 260),
Value = runif(260, min = 10, max = 55))
df$Month<-ordered(df$Month, month.name)
df$Year<-ordered(df$Year)
ggplot(df) +
geom_boxplot(aes(x = Month, y = Value, fill = Type)) +
facet_wrap(~Year)
I'd ideally like to manage this using dplyr and lubridate. Any help would be appreciated!
One option would be to make a true date value; then you can use the date axis formatter. Something like this is a rough start:
ggplot(df) +
geom_boxplot(aes(x = lubridate::mdy(paste(Month, 1, Year)), y = Value, fill = Type, group=lubridate::mdy(paste(Month, 1, Year)))) +
scale_x_date(breaks="month", date_labels = "%m")
Do you mean this?
df<-data.frame(Month = rep(c("January", "February", "March", "April", "May", "June",
"July", "August", "September", "October",
"November", "February", "March"),each = 20),
Year = rep(c("2018", "2019"), times = c(220, 40)),
Type = rep(c("C", "T"), 260),
Value = runif(260, min = 10, max = 55))
df$Month <- factor(df$Month, levels = c("January", "February", "March", "April", "May", "June",
"July", "August", "September", "October",
"November", "December"), ordered = TRUE)
df$Month <- ordered(df$Month)
df$Year<-ordered(df$Year)
df$Year_Month <- paste0(df$Month, " ", df$Year)
df$Year_Month <- factor(df$Year_Month, levels = unique(df$Year_Month))
ggplot(df) +
geom_boxplot(aes(x = Year_Month, y = Value, fill = Type))
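Note that levels = unique(df$Year_Month) keeps the labels in the order they first appear, so it relies on the rows already being in chronological order. If that is not guaranteed, the levels can be sorted explicitly. A small sketch with made-up labels (ym, parts, mon, yr and lev are illustrative names):

```r
# Labels in the "Month Year" form, deliberately out of order
ym <- c("March 2019", "January 2018", "February 2019", "January 2018")

# Split each label into month name and year, then convert both to numbers
parts <- strsplit(ym, " ")
mon <- match(sapply(parts, `[`, 1), month.name)
yr <- as.integer(sapply(parts, `[`, 2))

# Order by year, then by month, and keep each label once
lev <- unique(ym[order(yr, mon)])
f <- factor(ym, levels = lev)
```

The factor f then sorts chronologically on a discrete axis regardless of row order.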
Here is my code. I am trying to make a Shiny page that shows the mean symptoms in the past 30 days versus the state, based on the slider position (1 to 12, for the month). I know it is a little sloppy, but I almost have it. I can get a graph whose title changes with the month on the slider, but the graph just shows all of the data rather than the data for that month. Any help would be great.
`asthma = read.csv("AsthmaChild.Ozone.2006_2007.Sample.csv")
state.month = asthma[,-3:-10]
state.month = state.month[,-4]
state.month = aggregate(state.month$Symptoms.Past30D ~ state.month$STATE +
state.month$Month, state.month, mean)
colnames(state.month) = c("STATE", "Month", "Symptoms.Past30D")
sd = asthma[,-3:-10]
sd = sd[,-4]
sd = aggregate(sd$Symptoms.Past30D ~ sd$STATE + sd$Month, sd, function(x)
sd = sd(x))
colnames(sd) = c("STATE", "Month", "sd")
merged = merge(state.month,sd, by=c("STATE", "Month"))
df = count(asthma, "STATE", "Month")
colnames(df) = c("STATE","Freq")
data = merge(df, merged,by=c("STATE"))
data$sem = (data$sd)/(sqrt(data$Freq))
merged = data
merged$ConfUp = (merged$Symptoms.Past30D) + (merged$sem)
merged$ConfDown = (merged$Symptoms.Past30D) - (merged$sem)
merged$Month = as.character(merged$Month)
merged$Month = gsub("12", "December", merged$Month)
merged$Month = gsub("11", "November", merged$Month)
merged$Month = gsub("10", "October", merged$Month)
merged$Month = gsub("9", "September", merged$Month)
merged$Month = gsub("8", "August", merged$Month)
merged$Month = gsub("7", "July", merged$Month)
merged$Month = gsub("6", "June", merged$Month)
merged$Month = gsub("5", "May", merged$Month)
merged$Month = gsub("4", "April", merged$Month)
merged$Month = gsub("3", "March", merged$Month)
merged$Month = gsub("2", "February", merged$Month)
merged$Month = gsub("1", "January", merged$Month)
index = c(1:12)
values = c("January", "February", "March", "April", "May", "June", "July",
"August", "September", "October", "November", "December")
ui = fluidPage(
sidebarPanel(
sliderInput("Month", "Month: Jan=1, Dec=12",min = 1, max =
12,step=1,value=1)),
mainPanel(plotOutput("plot")))
server = function(input,output){
sliderInput(inputId="Month",
label="Month: Jan=1, Dec=12",
min = 1,
max = 12,
value=1,
step=1)
mainPanel(plotOutput("plot"))
dat = reactive({
test <- merged[merged$Month %in%
seq(from=min(input$Month),to=max(input$Month),by=1),]
})
output$plot = renderPlot({
ggplot(data=merged, aes(x=Symptoms.Past30D, y = STATE)) +
geom_errorbarh(aes(xmin=ConfUp,xmax=ConfDown), height=1, linetype = 1) +
xlab ("Mean Sympotms.Past30D (SEM)") + ylab ("STATE") +
labs(title=paste(values[match(input$Month, index)]))
})
}
shinyApp(ui, server)`
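Two things in the code above look like likely causes, although this is a guess without the data: the plot is built from merged rather than from the filtered reactive dat(), and merged$Month has been converted to month names while the filter compares it with the slider's numbers. A minimal sketch of the filtering step, using a made-up stand-in for merged (merged_demo and filter_by_month are hypothetical names):

```r
# Hypothetical stand-in for the `merged` data frame built above
merged_demo <- data.frame(
  STATE = c("CA", "NY", "CA"),
  Month = c("January", "January", "February"),
  Symptoms.Past30D = c(2.1, 3.4, 1.8)
)

# Hypothetical helper: keep only the rows for the month picked on the slider.
# month_index is the slider value (1 to 12); month.name maps it to a name.
filter_by_month <- function(data, month_index) {
  data[data$Month == month.name[month_index], , drop = FALSE]
}

jan <- filter_by_month(merged_demo, 1)

# Inside server(), the same idea would look roughly like:
#   dat <- reactive(filter_by_month(merged, input$Month))
#   output$plot <- renderPlot(ggplot(data = dat(), aes(...)) + ...)
```

The stray sliderInput() and mainPanel() calls inside server have no effect there and could be dropped, since they already appear in ui.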
I am looking at the change in maximum temperature per month from 1954 to 2000, using the data here:
http://pastebin.com/37zUkaA4
I have decided to only plot the abline for each month on the graph for clarity. My code is as follows:
OxTemp$Month <- factor(OxTemp$Month, levels=c("January", "February", "March","April", "May", "June", "July", "August", "September", "October", "November", "December"), ordered=TRUE)
p<-ggplot(OxTemp, aes(x=Year, y=MaxT, group=Month, colour=Season, linetype=Month))
p+geom_smooth(method = 'lm',size = 1, se = F)
Which gives me the following plot:
I was wondering if there was a way to:
a) Change the colours in the "Month" legend to match the colours in the "Season" legend
b) Make the legends a little wider so that the linetypes are more visible
c) Add a label showing each line's gradient to the plot, so that the slope value is displayed at the right-hand side of each line
Many thanks!
OxTemp <- read.table("http://pastebin.com/raw.php?i=37zUkaA4",header=TRUE,stringsAsFactors=FALSE)
library(ggplot2)
OxTemp$Month <- factor(OxTemp$Month,
levels=c("Jan", "Feb", "Mar","Apr", "May", "Jun","Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), ordered=TRUE)
OxTemp$Season <- factor(OxTemp$Season,
levels=c("Spring", "Summer", "Autumn", "Winter"), ordered=TRUE)
library(plyr)
slopedat <- ddply(OxTemp,.(Month),function(df) data.frame(slope=format(signif(coef(lm(MaxT~Year,data=df))[2],2),scientific=-2),
y=max(predict(lm(MaxT~Year,data=df)))))
p <- ggplot(OxTemp, aes(x=Year, y=MaxT)) +
geom_smooth(aes(group=Month, colour=Season, linetype=Month),method = 'lm',size = 1, se = F) +
scale_colour_manual(values=c("Winter"= 4, "Spring" = 1, "Summer" = 2,"Autumn" = 3)) +
geom_text(data=slopedat,aes(x=2005,y=y,label=paste0("slope = ",slope))) +
scale_x_continuous(limits=c(1950, 2010)) +
guides(linetype=guide_legend(override.aes=list(colour=c("Jan"= 4, "Feb" = 4, "Mar" = 1,
"Apr" = 1, "May" = 1, "Jun" = 2,
"Jul" = 2, "Aug" = 2, "Sep" = 3,
"Oct" = 3, "Nov" = 3, "Dec" = 4)),keywidth=5))
print(p)
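The slope labels come from fitting one linear model per month. As a sketch of just that step in base R, without plyr and using a tiny made-up data frame (demo and slopes are illustrative names, and the values are not from the Oxford data):

```r
# Made-up data: Jan rises by 1 per year, Feb is flat
demo <- data.frame(
  Month = rep(c("Jan", "Feb"), each = 3),
  Year = rep(c(2000, 2001, 2002), times = 2),
  MaxT = c(5, 6, 7, 8, 8, 8)
)

# Fit MaxT ~ Year within each month and keep the Year coefficient (the slope)
slopes <- sapply(split(demo, demo$Month), function(d) {
  unname(coef(lm(MaxT ~ Year, data = d))[2])
})
```

slopes is a named numeric vector with one entry per month, which could be formatted and joined back to the label positions in the same way as slopedat.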