The bottom image shows a graph produced by this code:
library(lubridate)
library(ggplot2)
shangPM$date <- with(shangPM, ymd_h(paste(year, month, day, hour, sep= ' ')))
ggplot(data = shangPM, aes(x = date, y = PM_US.Post)) +
geom_line()
However, four years with no data are shown on my x-axis, which makes the graph look odd. I tried using xlim and coord_cartesian, but neither seems to work well with my date variable (maybe I'm doing it wrong?)
I'm a bit of a beginner here: can someone help me zoom in on only the dates I have data for?
Here is my error:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
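A minimal sketch of two possible fixes, assuming (as the empty years suggest) that the missing years are rows where PM_US.Post is NA. The data frame below is a hypothetical, simulated stand-in for shangPM. Option 1 drops those rows before plotting. Option 2 keeps them and zooms the x-axis: the date column is POSIXct, so the limits must be POSIXct too, and passing plain numbers as limits is a likely cause of the 'origin' error.

```r
library(ggplot2)
library(lubridate)

# Hypothetical stand-in for shangPM: one row per day for 2010-2015, with
# PM_US.Post missing (NA) before 2012. The real data is hourly.
set.seed(42)
shangPM <- data.frame(
  date = seq(ymd_h("2010-01-01 00"), ymd_h("2015-12-31 00"), by = "1 day")
)
shangPM$PM_US.Post <- ifelse(year(shangPM$date) >= 2012,
                             rpois(nrow(shangPM), 50), NA)

# Option 1: plot only the rows that have a measurement.
pm <- shangPM[!is.na(shangPM$PM_US.Post), ]
p1 <- ggplot(pm, aes(x = date, y = PM_US.Post)) +
  geom_line()

# Option 2: keep every row but zoom the x-axis. The limits are POSIXct,
# matching the type of the date column.
lims <- range(pm$date)
p2 <- ggplot(shangPM, aes(x = date, y = PM_US.Post)) +
  geom_line() +
  coord_cartesian(xlim = lims)
```

Option 1 is usually the simpler choice; Option 2 is handy when you want to compare several zoom windows on the same data.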
I have a dataset in which one of the columns contains dates stored as character strings. I used the following code to convert it to dates and then extract the month:
library(lubridate)
dates <- dmy(Austria$date)
Month <- month(dates, label = TRUE, abbr = FALSE)
The problem is that I get the months back as a factor with levels, which I don't want. I searched for how to remove the levels, but everything I found was about removing unused levels (which is not my case).
I also tried as.Date, but I still have the same problem:
dates_Austria <- as.Date(Austria$date, "%d/%m/%Y")
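month(label = TRUE) returns an ordered factor by design, which is where the levels come from. If plain character month names are really needed, here is a minimal sketch using hypothetical dates in the same "%d/%m/%Y" format as Austria$date:

```r
library(lubridate)

# Hypothetical character dates in the same day/month/year format as Austria$date.
date_chr <- c("15/01/2020", "03/02/2020", "28/02/2020", "10/03/2020")
dates <- dmy(date_chr)

# An ordered factor: this is where the levels come from.
Month_fct <- month(dates, label = TRUE, abbr = FALSE)

# Plain character vectors with no levels attached:
Month_chr <- as.character(Month_fct)  # convert the factor
Month_fmt <- format(dates, "%B")      # or format the dates directly (base R)
```

For the plot itself, keeping the factor is usually preferable: its levels put the legend in calendar order, whereas a character vector is sorted alphabetically.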
My final goal is a plot with unemployment on the horizontal axis and income level on the vertical axis, with the points coloured by month, like this:
ggplot(data = my_data, aes(x = unemployment, y = income, colour = Month)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
But with that code I get a separate regression line for each month. I want one regression line for all the data, with the points of the scatter plot coloured by month.
Any help would be appreciated.
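Aesthetics set in ggplot() are inherited by every layer, so colour = Month also splits geom_smooth() into one fit per month. A minimal sketch with a hypothetical, simulated my_data: map colour only inside geom_point(), so the smoother sees a single group.

```r
library(ggplot2)
library(lubridate)

# Hypothetical data standing in for my_data.
set.seed(1)
n <- 200
my_data <- data.frame(
  date = ymd("2020-01-01") + sample(0:365, n, replace = TRUE),
  unemployment = runif(n, 3, 10)
)
my_data$income <- 50 - 2 * my_data$unemployment + rnorm(n, sd = 3)
my_data$Month <- month(my_data$date, label = TRUE, abbr = FALSE)

# colour is mapped only in geom_point(), so geom_smooth() fits a single
# regression line through all of the points.
p <- ggplot(my_data, aes(x = unemployment, y = income)) +
  geom_point(aes(colour = Month)) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)
```

An equivalent alternative is to keep colour = Month in ggplot() and add aes(group = 1) to geom_smooth().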
I am creating a frequency plot using the geom_freqpoly function in ggplot2. I have a large dataset of social media comments spanning 14 months and am plotting the number of comments per week. I first convert the UTC timestamps to POSIXct and then draw the frequency plot with this code:
ggplot(data = TRP) +
geom_freqpoly(mapping = aes(x = created_utc), binwidth = 604800) # 604800 seconds = 1 week
This creates a plot that looks like this:
However, I want to top and tail the plot: it drops to zero at both the start and the end, which makes it look as if there was rapid growth and then rapid decline. That is not the case; this is simply a snapshot of data that continues before and after the period I analyse. The weekly count starts at around 4,000 and ends at around 2,000, and I want the plot to show that. I have checked the 'pad' argument and made sure it is set to FALSE.
Any help as to why this may be occurring would be greatly appreciated.
Thanks!
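One possible explanation, as far as I can tell from the ggplot2 source: geom_freqpoly() sets pad = TRUE itself whenever it uses stat = "bin", so a pad = FALSE passed to it is overridden and an empty bin is added at each end. Calling stat_bin() directly with geom = "line" keeps stat_bin's default of pad = FALSE. A minimal sketch, with simulated timestamps as a hypothetical stand-in for TRP:

```r
library(ggplot2)

# Hypothetical stand-in for TRP: 5000 comment timestamps over ~14 months.
set.seed(1)
start <- as.POSIXct("2018-01-01", tz = "UTC")
TRP <- data.frame(created_utc = start + runif(5000, 0, 14 * 30 * 86400))

week <- 604800  # seconds in one week

# geom_freqpoly(): padded with an empty bin at each end, so the line
# starts and ends at zero.
p_freq <- ggplot(TRP) +
  geom_freqpoly(aes(x = created_utc), binwidth = week)

# stat_bin() drawn as a line: no padding, so the line starts and ends at
# the counts of the first and last weeks.
p_line <- ggplot(TRP) +
  stat_bin(aes(x = created_utc), binwidth = week, geom = "line")
```

Even without padding, the first and last bins can look low if the data covers only part of those weeks.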
Rather than forcing geom_freqpoly to work differently from how it was intended, it may be simpler to calculate the weekly totals yourself and use geom_line:
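library(lubridate); library(dplyr); library(ggplot2)
set.seed(1)
# Simulated data: 1000 random timestamps spread over roughly 14 months
df <- data.frame(
datetime = ymd_h(2018010101) + dhours(runif(1000, 0, 14*30*24))
)
# Count the rows in each calendar week, then draw the weekly counts as a line
df %>%
count(week_count = floor_date(datetime, "1 week")) %>%
ggplot(aes(week_count, n)) +
geom_line()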
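The first and last weeks are usually only partly covered by the data, so their counts are artificially low. A sketch that extends the weekly-count approach by keeping only weeks lying entirely inside the observed period, using the same kind of simulated data; week_start, first_obs and last_obs are hypothetical names:

```r
library(lubridate)
library(dplyr)
library(ggplot2)

set.seed(1)
df <- data.frame(
  datetime = ymd_h("2018-01-01 01") + dhours(runif(1000, 0, 14 * 30 * 24))
)

first_obs <- min(df$datetime)
last_obs <- max(df$datetime)

weekly <- df %>%
  count(week_start = floor_date(datetime, "1 week")) %>%
  # Keep only weeks fully covered by the data; partial weeks at either end
  # would otherwise show artificially low counts.
  filter(week_start >= first_obs, week_start + weeks(1) <= last_obs)

p <- ggplot(weekly, aes(week_start, n)) +
  geom_line()
```

With real data that continues beyond the analysis window, the same filter trims only the two edge weeks.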
I created a difftime variable giving the number of hours it takes for a crime to be reported after it has occurred. The same dataset has a variable indicating whether the crime occurred on a weekday or at the weekend. Now I'd like to create a ggplot2 boxplot with 'weekday' and 'weekend' on the x-axis and the difftime on the y-axis.
I used:
ggplot(data = data, aes(x = workday, y = difftime_var)) +
geom_boxplot()
However, this gives the warning: Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
I'd like the boxplot to look like a 'real' boxplot, showing the mean amount of time it takes etc. Right now, it's basically a flat line at the bottom of the graph with a few dots above it, and the y-axis goes from 0 to 40,000. This is probably because the minimum and maximum values of the difftime variable are very small and very large, respectively.
Thanks in advance for helping out!
Please provide a reproducible example dataset with your question.
I suspect the problem is that your difftime variable has a huge range, which squashes the boxes against the bottom of the plot. The first thing you can try is hiding the outliers:
ggplot(data = data, aes(x = workday, y = difftime_var)) +
geom_boxplot(outlier.shape=NA)
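Another (less elegant) way is to set limits on the y-axis. Be aware that ylim() removes observations outside the limits before the boxplot statistics are computed, so the boxes themselves can change:
ggplot(data = data, aes(x = workday, y = difftime_var)) +
geom_boxplot() + ylim(ymin, ymax) # ymin, ymax: the limits you choose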
For more information, there was a similar question asked before:
How to remove outliers in boxplot in R?
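A minimal sketch of another option, with hypothetical simulated data (the delays and the hours column are assumptions): convert the difftime to a plain number of hours, which avoids the scale warning, then either use a log scale or zoom with coord_cartesian(). Two caveats: a boxplot shows the median and quartiles, not the mean; and, as far as I know, outlier.shape = NA only hides the outlier points while the axis is still sized to include them, which is why zooming is needed.

```r
library(ggplot2)

# Hypothetical data: reporting delays, mostly around an hour but with a
# few very long ones, which is what squashes the boxes.
set.seed(7)
n <- 300
occurred <- as.POSIXct("2020-01-01", tz = "UTC") + runif(n, 0, 365 * 86400)
delay_s <- rexp(n, rate = 1 / 3600) * ifelse(runif(n) < 0.05, 500, 1)
is_weekend <- format(occurred, "%u") %in% c("6", "7")
data <- data.frame(
  workday = factor(ifelse(is_weekend, "weekend", "weekday")),
  difftime_var = difftime(occurred + delay_s, occurred, units = "hours")
)

# Plain numeric hours: ggplot2 then uses an ordinary continuous scale.
data$hours <- as.numeric(data$difftime_var, units = "hours")

# Option 1: a log scale spreads out values spanning orders of magnitude.
p_log <- ggplot(data, aes(x = workday, y = hours)) +
  geom_boxplot() +
  scale_y_log10()

# Option 2: zoom without dropping data. Unlike ylim(), coord_cartesian()
# keeps every observation in the box statistics. The 0-5 hour window is
# an arbitrary example.
p_zoom <- ggplot(data, aes(x = workday, y = hours)) +
  geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = c(0, 5))
```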
I created a dummy time-series xts object with data missing on 2015-09-02:
library(xts)
library(ggplot2)
library(scales)
set.seed(123)
times1 <- seq(as.POSIXct("2015-09-01"),as.POSIXct("2015-09-02"), by = "1 hour")
ob1 <- xts(rnorm(length(times1),150,5),times1)
times2 <- seq(as.POSIXct("2015-09-03"),as.POSIXct("2015-09-05"), by = "1 hour")
ob2 <- xts(rnorm(length(times2),170,5),times2)
final_ob <- rbind(ob1,ob2)
plot(final_ob)
# with ggplot
df <- data.frame(time = index(final_ob), val = coredata(final_ob) )
ggplot(df, aes(time, val)) + geom_line()+ scale_x_datetime(labels = date_format("%Y-%m-%d"))
After plotting, my data looks like this:
The red rectangle marks the day on which data is missing. How can I show in the main plot that data is missing on this day?
I think I should show the missing data in a different colour, but I don't know how to process the data so that the main plot reflects the missing period.
Thanks for the great reproducible example.
I think you are best off omitting the line in the "missing" portion. If you draw a straight line there (even in a different colour), it suggests that data was gathered in that interval and happened to fall on that straight line. If you omit the line in that interval, it is clear that there is no data there.
The problem is that you want the hourly data to be connected by lines, and then no lines in the "missing data section" - so you need some way to detect that missing data section.
You have not given a criterion for this in your question, so based on your example I will say that each line on the plot should consist of data at hourly intervals; if there's a break of more than an hour, a new line should start. You will have to adjust this criterion to your specific problem. All we're doing is splitting your data frame into pieces that are each drawn as one line.
So first create a variable that says which "group" (i.e. line) each data point belongs to:
# start a new group after every step of more than one hour
df$grp <- factor(c(0, cumsum(as.numeric(diff(df$time), units = "hours") > 1)))
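To see what the grouping line does, here is the same c(0, cumsum(...)) idea on a toy vector of hypothetical timestamps measured in hours: every gap longer than one hour bumps the running count, so all points after a gap get a new group id.

```r
# Toy timestamps in hours: an hourly run, a 24-hour gap, then another run.
hours <- c(0, 1, 2, 3, 27, 28, 29)

# TRUE wherever the step to the next timestamp is longer than one hour.
gap_after <- diff(hours) > 1

# The first point starts group 0; each gap increments the group id.
grp <- factor(c(0, cumsum(gap_after)))
print(grp)
```

The leading 0 is needed because diff() returns one element fewer than its input.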
Then you can use the group aesthetic, which geom_line uses to split the data into separate lines:
ggplot(df, aes(time, val)) + geom_line(aes(group=grp)) + # <-- only change
scale_x_datetime(labels = date_format("%Y-%m-%d"))
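If you still want the missing day to stand out in colour, one option (an addition to the answer above, not part of it) is to shade each gap with geom_rect() while keeping the split lines. A sketch that rebuilds the example without xts; the gaps data frame and its start/end columns are hypothetical names:

```r
library(ggplot2)

# Rebuild the example without xts: hourly values with 2015-09-02 missing.
set.seed(123)
t1 <- seq(as.POSIXct("2015-09-01", tz = "UTC"),
          as.POSIXct("2015-09-02", tz = "UTC"), by = "1 hour")
t2 <- seq(as.POSIXct("2015-09-03", tz = "UTC"),
          as.POSIXct("2015-09-05", tz = "UTC"), by = "1 hour")
df <- data.frame(time = c(t1, t2),
                 val = c(rnorm(length(t1), 150, 5), rnorm(length(t2), 170, 5)))

# A gap is any step between consecutive timestamps longer than one hour.
step_h <- as.numeric(diff(df$time), units = "hours")
gap_idx <- which(step_h > 1)
gaps <- data.frame(start = df$time[gap_idx], end = df$time[gap_idx + 1])
df$grp <- factor(c(0, cumsum(step_h > 1)))

p <- ggplot(df, aes(time, val)) +
  # Shade each gap across the full height of the panel.
  geom_rect(data = gaps,
            aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "red", alpha = 0.2) +
  geom_line(aes(group = grp)) +
  scale_x_datetime(date_labels = "%Y-%m-%d")
```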
I am having a really hard time with the ggplot function!
I'll try to explain my problem briefly.
I have a dataset of tweets, each associated with a timestamp; I would like to plot the data as a graph with time on the x axis and the frequency, or "tweet rate", per hour on the y axis.
What did I do?
library(ggplot2)
c4l.tweets <- read.csv("/Users/vincenzo/Desktop/Collect %23c4l13 Tweets - Archive.csv")
c4l.tweets$time <- as.POSIXct(strptime(c4l.tweets$time, "%d/%m/%Y %H:%M:%S", tz="CST") - 6*60*60)
library(chron)
c4l.tweets$by.hour <- trunc(c4l.tweets$time, units="hours")
ggplot(count(c4l.tweets, "by.hour"), aes(x=by.hour, y=freq))
+ geom_bar(stat="identity") + xlab("Number") + ylab("Date") + labs(title="tweets by hour")
So basically I truncated the timestamps to the hour and used the count function to get the values to plot.
I get
Error: No layers in plot
and
Error in +geom_bar(stat = "identity") : invalid argument to unary operator
But why? What am I doing wrong?
I run into this problem almost every time I try to plot something with ggplot.
Thank you!
Vincenzo
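As far as I can tell, both errors come from the line break before "+": R treats the first line as a complete statement (a ggplot with no layers) and then reads "+ geom_bar(...)" as a unary plus applied to a layer. Below is a sketch of a corrected version using simulated timestamps instead of the CSV. It swaps plyr's count for dplyr::count (which names the count column n), converts the POSIXlt returned by trunc() back to POSIXct, and exchanges the axis labels, which appear to be swapped in the original. The time zone America/Chicago is an assumption, since "CST" is not a standard Olson zone name.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the CSV: 500 tweet times over three days.
set.seed(10)
c4l.tweets <- data.frame(
  time = as.POSIXct("2013-02-11 08:00:00", tz = "America/Chicago") +
    runif(500, 0, 3 * 86400)
)

# trunc() returns POSIXlt; convert back to POSIXct before counting.
c4l.tweets$by.hour <- as.POSIXct(trunc(c4l.tweets$time, units = "hours"))

# One row per hour, with the number of tweets in column n.
counts <- count(c4l.tweets, by.hour)

# Each "+" sits at the END of a line, so R keeps reading the expression.
p <- ggplot(counts, aes(x = by.hour, y = n)) +
  geom_col() +  # same as geom_bar(stat = "identity")
  xlab("Hour") + ylab("Number of tweets") +
  labs(title = "Tweets by hour")
```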