Plot two sub-variables during a 12 month period in R

The first row of the table holds the 12 month names, and the rows below give the visitor counts for Portuguese visitors (Portugal) and foreigners (ESTRANGEIRO). Ignore the row with no name.
How can I plot, in ggplot2, a bar graph that shows the Portuguese and the foreign visitors over the 12-month period?

It is usually better to provide a reproducible code example than to submit a screenshot.
To accomplish what you want, you will have to reshape your data from wide to long format. Given a data frame that looks like yours, and using reshape2:
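library(reshape2)
library(ggplot2)
# Use levels= (not labels=) so the months keep calendar order without being relabelled
df <- data.frame(month=factor(c("Jan","Feb","Mar"),levels=c("Jan","Feb","Mar"),ordered=TRUE),
portugal=c(4000,2330,3000),
foreigner=c(4999,2600,3244),
stringsAsFactors = FALSE)
# Reshape to long format: one row per month and visitor group
plotdf<-melt(df,id.vars="month")
colnames(plotdf)<-c("Month","Country","Visitors")
levels(plotdf$Country)<-c("Portugal","Foreigners")
# Dodged bars: one bar per visitor group within each month
ggplot(plotdf,aes(x=Month,y=Visitors,fill=Country)) +
geom_bar(stat="identity",position=position_dodge()) +
xlab("Month") +
ylab("Visitors")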
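As an alternative to reshape2, the same reshaping could be sketched with tidyr's pivot_longer(). This is a swapped-in approach rather than the one used above; it assumes the tidyr package is installed and reuses the three example months and their illustrative values.

```r
library(tidyr)
library(ggplot2)

# Same example data as above (three months only; values are illustrative)
df <- data.frame(
  month = factor(c("Jan", "Feb", "Mar"), levels = c("Jan", "Feb", "Mar")),
  portugal = c(4000, 2330, 3000),
  foreigner = c(4999, 2600, 3244)
)

# Wide to long: one row per month and visitor group
plotdf <- pivot_longer(df, cols = c(portugal, foreigner),
                       names_to = "Country", values_to = "Visitors")

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(plotdf, aes(x = month, y = Visitors, fill = Country)) +
  geom_col(position = "dodge") +
  labs(x = "Month", y = "Visitors")
print(p)
```

For the full table, list all 12 months in levels= so the bars stay in calendar order rather than alphabetical order.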

Related

Changing period of dates to standard date to do line graph

I'm trying to plot a line graph in R using the dataset that can be found here. Specifically, I want to plot the number of cases in each region (north east, north west, etc.) against time.
However, as the date is a period of a week rather than a standard date, how can I convert it to make the line graph actually possible? For example, right now it has the dates as 01/09/2020 - 07/09/2020. How can I use this for a line graph?
Sorry if my explanation isn't clear, here is a picture below.
I assume you're trying to plot a time series? You could just trim each period to the date the week begins on and label the time axis as "Week beginning on date". You can do this in base R with substr(), keeping the first 10 characters.
substr(data$column,1,10)
You may also want to format it as a date, easiest with the lubridate package, something like dmy() (day month year).
Here is the full code you would want:
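library(tidyverse)
library(lubridate) #dmy() comes from lubridate, which tidyverse versions before 2.0.0 do not attach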
#Read in data
data <- read.csv("/Users/sabrinaxie/Downloads/covid19casesbysociodemographiccharacteristicengland1sep2020to10dec20213.csv")
#Modify data and remove extraneous top rows
data <- data %>%
rename(Period=Table.9..Weekly.estimates.of.age.standardised.COVID.19.case.rates..per.100.000.person.weeks..by.region..England..1.September.2020.to.6.December.20211.2.3) %>%
slice(3:n())
#Keep first 10 characters of Period column and assign to old column to replace
data$Period <- substr(data$Period,1,10)
#Parse as date
data$Period <- dmy(data$Period)
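The trimming step can also be sketched in base R alone, without lubridate. The period string below is the one quoted in the question; the variable names are made up for illustration.

```r
# Example period label as quoted in the question
period <- c("01/09/2020 - 07/09/2020")

# Keep the first 10 characters, i.e. the date the week begins on
week_start_chr <- substr(period, 1, 10)

# Parse as day/month/year; as.Date() with an explicit format needs no extra package
week_start <- as.Date(week_start_chr, format = "%d/%m/%Y")

print(week_start)
print(class(week_start))
```

Once the column is a Date, it can go straight on the x-axis of a line graph.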

Creating a Cumulative Sum Plot using ggplot with duplicate x values

In my hypothetical example, people order ice-cream at a stand and each time an order is placed, the month the order was made and the number of orders placed is recorded. Each row represents a unique person who placed the order. For each flavor of ice-cream, I am curious to know the cumulative orders placed over the various months. For instance if a total of 3 Vanilla orders were placed in April and 4 in May, the graph should show one data point at 3 for April and one at 7 for May.
The issue I am running into is each row is being plotted separately (so there would be 3 separate points at April as opposed to just 1).
My secondary issue is that my dates are not in chronological order on my graph. I thought converting the Month column to Date format would fix this but it doesn't seem to.
Here is my code below:
library(lubridate)
Flavor <- c("Vanilla", "Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","chocolate","chocolate","chocolate")
Month <- c("1-Jun-21", "1-May-19", "1-May-19","1-Apr-19", "1-Apr-19","1-Apr-19","1-Apr-19", "1-Mar-19", "1-Mar-19", "1-Mar-19","1-Mar-19", "1-Apr-19", "1-Mar-19", " 1-Apr-19", " 1-Jan-21", "1-May-19", "1-May-19","1-May-19","1-May-19","1-Jun-19","2-September-19", "1-September-19","1-September-19","1-December-19","1-May-19","1-May-19","1-Jun-19")
Orders <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2)
data <- data.frame(Flavor,Month,Orders)
data$Month <- dmy(data$Month)
str(data)
data2 <- data[data$Flavor == "Vanilla",]
ggplot(data=data2, aes(x=Month, y=cumsum(Orders))) + geom_point()
In these situations, it's usually best to pre-compute your desired summary and send that to ggplot, rather than messing around with ggplot's summary functions. I've also added a geom_line() for clarity.
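library(dplyr)
library(ggplot2)
data %>%
#Total orders per flavor and month, so each month yields a single point
group_by(Flavor, Month) %>%
summarize(Orders = sum(Orders), .groups = "drop") %>%
#Running total within each flavor, in chronological order
group_by(Flavor) %>%
arrange(Month) %>%
mutate(Orders = cumsum(Orders)) %>%
ggplot(data = ., aes(x=Month, y=Orders, color = Flavor)) + geom_point() + geom_line()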
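To check the logic against the example in the question (3 Vanilla orders in April, then 4 in May), here is a small base R sketch of the same pre-computation. It uses aggregate() and ave() instead of dplyr, and the toy data frame is made up to match those numbers.

```r
# Toy data: one row per order, 3 in April and 4 in May (hypothetical)
orders <- data.frame(
  Flavor = "Vanilla",
  Month = as.Date(c("2019-04-01", "2019-04-01", "2019-04-01",
                    "2019-05-01", "2019-05-01", "2019-05-01", "2019-05-01")),
  Orders = 1
)

# Step 1: collapse to one row per flavor and month
monthly <- aggregate(Orders ~ Flavor + Month, data = orders, FUN = sum)

# Step 2: sort chronologically, then take a running total within each flavor
monthly <- monthly[order(monthly$Flavor, monthly$Month), ]
monthly$Cumulative <- ave(monthly$Orders, monthly$Flavor, FUN = cumsum)

print(monthly)
```

The Cumulative column holds one value per month, which is what the plot should receive.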

Time series plot in R skips dates on x-axis

I'm trying to create a simple time series plot in R with the following data (it's in tbl format):
  Date          sales
  <date>        <dbl>
1 2010-02-05 1105572.
2 2010-09-03 1048761.
3 2010-11-19 1002791.
4 2010-12-24 1798476.
5 2011-02-04 1025625.
6 2011-11-18 1031977.
When I use the following command: plot(by_date$Date, by_date$sales, type = 'l'), the resulting graph skips the individual dates that I want displayed and shows only the year on the x-axis (please ignore the axis labels for now).
I've checked the class of the date column using class(by_date$Date) and it does show up as 'Date'. I've also checked other threads here, and the closest one to answering my query is the one here. I tried that approach, but it didn't work for me, and neither did plotting in ggplot or converting the data to a data frame. Please help, thanks.
With ggplot this should work:
library(ggplot2)
ggplot(by_date, aes(Date, sales)) + geom_line()
You can use scale_x_date to format your x-axis as you want.
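As a rough sketch of what that could look like, the block below rebuilds the six rows from the question and asks for a labelled tick every three months. The break interval and label format are arbitrary choices for illustration, not something the question specifies.

```r
library(ggplot2)

# Data copied from the question (sales as printed, rounded)
by_date <- data.frame(
  Date = as.Date(c("2010-02-05", "2010-09-03", "2010-11-19",
                   "2010-12-24", "2011-02-04", "2011-11-18")),
  sales = c(1105572, 1048761, 1002791, 1798476, 1025625, 1031977)
)

p <- ggplot(by_date, aes(Date, sales)) +
  geom_line() +
  # One tick every 3 months, formatted as abbreviated month plus year
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y")
print(p)
```

To put a tick on each observation date instead, breaks = by_date$Date can be passed to scale_x_date() in place of date_breaks.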

Plotting 52 week range in R

I am trying to pull stock price data using tq_get in tidyquant, then want to plot the current price against the 52 week range. Here is an example of what I am looking to create.
Basically just a visual representation of where the stock is currently trading in relation to its 52 week range. Below is the code I have begun to load in the appropriate values for TSLA. First, I am wondering if it is possible to set the "from" and "to" dates so that they constantly update to be exactly one year ago and the current date, respectively? Second, is there a ggplot or another package that might be able to generate a similar plot? I've explored boxplots, but really I need something even more simple than that, as I really only need one axis. Thanks in advance!
X <- tq_get(c("^GSPC","TSLA"),get="stock.prices",from="2019-05-04", to="2020-05-04")
TSLA <- X %>% filter(symbol == "TSLA") %>% tk_xts()
chartSeries(TSLA)
TSLAlow <- min(TSLA$close)
TSLAlow
TSLAhigh <- max(TSLA$close)
TSLAhigh
TSLAclose <- tail(X$close, n=1)
TSLAclose
TSLArange <- tibble(TSLAlow, TSLAhigh, TSLAclose)

Force ggplot scales to start on e.g. 1st of year, 1st of month etc

I'm looking for a way to force the date labels on a ggplot to start at a (seemingly) logical time. I've had the problem a number of times but my current problem is I want the breaks to be on the 01/01/yyyy
My data is a large dataset with POSIXct Date column, data to plot in Flow column and a number of site names in the Site column.
library(ggplot2)
library(scales)
ggplot(AllFlowData, aes(x=Date, y = Flow, colour = Site))+geom_line()+
scale_x_datetime(date_breaks = "1 year", expand =c(0,0),labels=date_format("%Y"))
I can force the breaks to be every year and they appear okay without the labels=date_format("%Y") (starting on 01/01 each year) but if I include labels=date_format("%Y") (as there is 10 years of data so gets a bit messy) the date labels move to ~November, and 1989 is the first label even though my data starts on the 01/01/1990.
I have had this problem numerous times in the past on different time steps, such as wanting to force it to the 1st of the month or daily times to be at midnight instead during the day. Is there a generic way to do this?
I have looked at "create specific date range in ggplot2 (scale_x_date)", but I do not want to have to hard code my breaks as I have a fair few plots to do with different date ranges.
Thanks
If the dates come to you in a vector like:
dates <- seq.Date(as.Date("2001-03-04"), as.Date("2001-11-04"), by="day")
## "2001-03-04" "2001-03-05" "2001-03-06" ... "2001-11-03" "2001-11-04"
use pretty(), which dispatches to pretty.Date() for Date vectors, to make a best guess about the end points.
range(pretty(dates))
## "2001-01-01" "2002-01-01"
Then pass this range to ggplot.
However, I recommend coord_cartesian() instead of scale_x_date(). Typically I want to crop the graphic bounds, instead of flat-out exclude the values entirely (which can mess up things like a loess summary).
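Putting those pieces together, a minimal sketch might look like the following. The flow_data data frame and its Flow values are hypothetical stand-ins for the real data, and the label format is just one possible choice.

```r
library(ggplot2)

dates <- seq.Date(as.Date("2001-03-04"), as.Date("2001-11-04"), by = "day")

# Hypothetical values to plot; a running index stands in for real measurements
flow_data <- data.frame(Date = dates, Flow = seq_along(dates))

# "Pretty" end points guessed from the data
x_limits <- range(pretty(dates))

p <- ggplot(flow_data, aes(x = Date, y = Flow)) +
  geom_line() +
  scale_x_date(date_labels = "%b %Y") +
  # Zoom the view to the pretty range without dropping any rows
  coord_cartesian(xlim = x_limits)
print(p)
```

Because the limits are computed from the data, the same code can be reused for plots with different date ranges. For a POSIXct column, as in the question, scale_x_datetime() would take the place of scale_x_date().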
