I have a dataset where one of the columns holds dates stored as character strings. I used the following code to convert it to Date format and then extract only the month:
library(lubridate)
dates <- dmy(Austria$date)
Month <- month(dates, label = TRUE, abbr = FALSE)
The problem is that I get a factor with levels back for the months, which I don't want. I searched for how to remove the levels, but everything I found was about dropping unused levels (which is not my case).
I also tried as.Date, but I still have the same problem:
dates_Austria <- as.Date(Austria$date, "%d/%m/%Y")
My final purpose is to make a plot which will have unemployment on the horizontal axis, income level on the vertical axis and then change the color of the plot according to the month, like that:
ggplot(data = my_data, aes(x = unemployment, y = income, colour = Month)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
But with that code I get a separate regression line for each month. I want one line for all the data, with the points of the scatter plot coloured according to the month.
Any help would be appreciated.
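One possible way to get a single regression line, sketched here under assumptions: the colour mapping is moved out of ggplot() and into geom_point() only, so geom_smooth() no longer inherits a per-month grouping. The data frame below is made-up toy data standing in for the real one, and Month_chr is a hypothetical column name showing how as.character() drops the factor levels.
library(lubridate)
library(ggplot2)
# Made-up toy data (illustrative values only, not the real Austria data)
set.seed(1)
my_data <- data.frame(
  date = format(seq(as.Date("2020-01-15"), by = "month", length.out = 24), "%d/%m/%Y"),
  unemployment = runif(24, 4, 8)
)
my_data$income <- 30 - 1.5 * my_data$unemployment + rnorm(24)
# month(label = TRUE) returns an ordered factor; as.character() gives plain strings without levels
my_data$Month <- month(dmy(my_data$date), label = TRUE, abbr = FALSE)
my_data$Month_chr <- as.character(my_data$Month)
# colour is mapped only inside geom_point(), so geom_smooth() fits one line to all rows
p <- ggplot(my_data, aes(x = unemployment, y = income)) +
  geom_point(aes(colour = Month)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black")
p
Keeping Month as a factor for the colour mapping is usually convenient, because the legend then follows calendar order rather than alphabetical order.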
Related
I am using time series data (several months with timestamps at a resolution of seconds). However, for a reproducible example here, I used smaller but similar data from the NAB Corpus (number of Twitter posts regarding the Apple stock over time).
# packages
library(readr)
library(magrittr)
library(scales)
library(ggplot2)
# data
twttr_aapl <- "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AAPL.csv"
twttr_aapl_ts <- read_csv(twttr_aapl,
col_types = cols(timestamp = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
value = col_integer()))
I really like to use the ggforce package to highlight patterns in the data by zooming in via facet_zoom; e.g., before an anomaly appears. The result is two plots on top of each other that look something like this:
twttr_aapl_ts %>% ggplot(aes(x = timestamp, y = value)) +
geom_line() +
scale_x_datetime("time",
# date_breaks = "hour",
# position = "top",
labels = scales::label_date_short(),
breaks = breaks_pretty()) +
scale_y_continuous(expression("log"[10]*"(# tweets)"),
trans = "log10",
breaks = breaks_log()) +
ggforce::facet_zoom(xlim = c(as.POSIXct("2015-03-14 22:00:00"),
as.POSIXct("2015-03-18 18:00:00")),
horizontal = FALSE) +
theme_bw() -> twttr_plt_01
Is it possible to access only one of the plots, or at least its axes? I'd like to:
change the x-axis position of the upper plot only (to "top"), so that it no longer intersects the zoom lines; the axis of the lower plot should stay at the bottom. However, applying position = "top" moves both x-axes.
change the breaks so that hours are displayed in the bottom plot. But using date_breaks = "hour" also changes the breaks for both axes.
scale the y-axis of the upper plot only (e.g., with trans = "log10"), but not that of the lower plot.
change the way the labels are displayed (in the given example, only day and time are relevant in the bottom plot; month and year are redundant).
So my question is not only about scales; it is a general one. For example, what if I want to change the line color in the lower plot only?
I would like to create an interactive histogram with dates on the x-axis.
I have used ggplot+ggplotly.
I've read that I need to pass the proper information using the "text=as.character(mydates)" option and sometimes "tooltip=mytext".
This trick works for other kinds of plots, but there is a problem with histograms: instead of a single bar with a single value, I get many stacked sub-bars.
I guess the reason is that passing "text=as.character(fechas)" produces many values instead of just the one value defining that bar.
How can I solve this problem?
I have tried filtering the data myself, but I don't know how to make my parameters match those used by the histogram, such as where the dates start for each bar.
library(lubridate)
library(ggplot2)
library(plotly) # ggplotly() is a function in the plotly package
Ejemplo <- data.frame(fechas = dmy("1-1-20")+sample(1:100,100, replace=T),
valores=runif(100))
dibujo <- ggplot(Ejemplo, aes(x=fechas, text=as.character(fechas))) +
theme_bw() + geom_histogram(binwidth=7, fill="darkblue",color="black") +
labs(x="Fecha", y="Nº casos") +
theme(axis.text.x=element_text(angle=60, hjust=1)) +
scale_x_date(date_breaks = "weeks", date_labels = "%d-%m-%Y",
limits=c(dmy("1-1-20"), dmy("1-4-20")))
ggplotly(dibujo)
ggplotly(dibujo, tooltip = "text")
As you can see, the bars are not regular histogram bars but something complex.
Using just ggplot instead of ggplotly shows the same problem, though then you wouldn't need the extra "text" parameter.
Presently, feeding as.character(fechas) to the text = ... argument inside of aes() will display the relative counts of distinct dates within each bin. Note the height of the first bar is simply a count of the total number of dates between 6th of January and the 13th of January.
After a thorough reading of your question, it appears you want the maximum date within each weekly interval. In other words, one date should hover over each bar. If you're partial to converting ggplot objects into plotly objects, then I would advise pre-processing the data frame before feeding it to the ggplot() function. First, group by week. Second, pull the desired date by each weekly interval to show as text (i.e., end date). Next, feed this new data frame to ggplot(), but now layer on geom_col(). This will achieve similar output since you're grouping by weekly intervals.
library(dplyr)
library(lubridate)
library(ggplot2)
library(plotly)
set.seed(13)
Ejemplo <- data.frame(fechas = dmy("1-1-20") + sample(1:100, 100, replace = T),
valores = runif(100))
Ejemplo_stat <- Ejemplo %>%
arrange(fechas) %>%
filter(fechas >= ymd("2020-01-01"), fechas <= ymd("2020-04-01")) %>% # specify the limits manually
mutate(week = week(fechas)) %>% # create a week variable
group_by(week) %>% # group by week
summarize(total_days = n(), # number of observations (rows) in each week
last_date = max(fechas)) # pull the maximum date within each weekly interval
dibujo <- ggplot(Ejemplo_stat, aes(x = factor(week), y = total_days, text = as.character(last_date))) +
geom_col(fill = "darkblue", color = "black") +
labs(x = "Fecha", y = "Nº casos") +
theme_bw() +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
scale_x_discrete(label = function(x) paste("Week", x))
ggplotly(dibujo) # add more text (e.g., week id, total unique dates, and end date)
ggplotly(dibujo, tooltip = "text") # only the end date is revealed
The "end date" is displayed once you hover over each bar, as requested. Note, the value "2020-01-12" is not the last day of the second week. It is the last date observed in the second weekly interval.
The benefit of the preprocessing approach is your ability to modify your grouped data frame, as needed. For example, feel free to limit the date range to a smaller (or larger) subset of weeks, or start your weeks on a different day of the week (e.g., Sunday). Furthermore, if you want more textual options to display, you could also display your total number of unique dates next to each bar, or even display the date ranges for each week.
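As a rough sketch of the variations mentioned above, the grouping could be based on lubridate::floor_date() with week_start = 1, so that weeks start on Monday, and a label with the date range and counts could be built for the tooltip. The names Ejemplo_semana, n_obs, n_dates, week_end and label are hypothetical and chosen only for this illustration.
library(dplyr)
library(lubridate)
set.seed(13)
Ejemplo <- data.frame(fechas = dmy("1-1-20") + sample(1:100, 100, replace = TRUE),
                      valores = runif(100))
Ejemplo_semana <- Ejemplo %>%
  mutate(week_start = floor_date(fechas, unit = "week", week_start = 1)) %>% # Monday of each date's week
  group_by(week_start) %>%
  summarize(n_obs = n(),                 # rows falling in the week
            n_dates = n_distinct(fechas), # distinct calendar dates in the week
            last_date = max(fechas)) %>%  # last observed date in the week
  mutate(week_end = week_start + 6,      # Sunday closing the week
         label = paste0(week_start, " to ", week_end, ": ", n_obs, " obs"))
The label column could then be mapped with aes(x = week_start, y = n_obs, text = label) in a geom_col() plot and shown via ggplotly(..., tooltip = "text"), in the same way as last_date above.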
I am trying to plot values on the y axis against years on the x axis with ggplot2.
This is the dataset: https://drive.google.com/file/d/1nJYtXPrxD0xvq6rBz2NXlm4Epi52rceM/view?usp=sharing
I want to plot the values of specific countries.
It won't work by just specifying year as the x axis and a country's values on the y axis. I'm reading I need to melt the data frame, so I did that, but it's now in a format that doesn't seem convenient to get the job done.
I'm assuming I haven't correctly melted, but I have a hard time finding what I need to specifically do.
What I did beforehand is manually transpose the data and make the years a column, as well as all the countries.
This is the dataset transposed:
https://drive.google.com/file/d/131wNlubMqVEG9tID7qp-Wr8TLli9KO2Q/view?usp=sharing
Here's how I melted:
inv_melt.data <- melt(investments_t.data, id.vars="Year")
ggplot() +
geom_line(aes(x=Year, y=value), data = inv_melt.data)
The plot shows the aggregated values of all countries per year, but I want them per country in such a manner that I can also select to plot certain countries only.
How do I utilize melt in such a manner? Could someone walk me through this?
There is no column named "Year" in the linked data set; there is one column per year. So it needs to be melted by "country", and the resulting "variable" column then cleaned up with sub.
inv_melt.data <- reshape2::melt(investments_t.data, id.vars="country")
inv_melt.data$variable <- as.integer(sub("^X", "", inv_melt.data$variable))
ggplot(inv_melt.data, aes(variable, value, color = country)) +
geom_line(show.legend = FALSE)
Edit.
The following code keeps only some countries, filtering out the ones with more missing values.
i <- sapply(investments_t.data[-1], function(x) sum(is.na(x)) == 0)
i <- c(1, which(i))
inv_melt.data <- reshape2::melt(investments_t.data[i], id.vars = "Year")
ggplot(inv_melt.data, aes(Year, value, color = variable)) +
geom_line(show.legend = FALSE)
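To plot only certain countries by name, one option is to subset the melted data before plotting. The following is a minimal sketch on a made-up wide table (the country names and values are illustrative only, and investments, inv_long, wanted and inv_sel are hypothetical names), assuming the same layout of one row per country and one X-prefixed column per year.
library(reshape2)
library(ggplot2)
# Made-up wide table: one row per country, one column per year (illustrative values)
investments <- data.frame(
  country = c("Austria", "Belgium", "Chile"),
  X2000 = c(1.0, 2.0, 3.0),
  X2001 = c(1.5, 2.5, NA),
  X2002 = c(2.0, 3.0, 3.5)
)
# Long format: one row per country-year pair
inv_long <- melt(investments, id.vars = "country", variable.name = "year", value.name = "value")
inv_long$year <- as.integer(sub("^X", "", inv_long$year)) # "X2000" -> 2000
# Keep only the countries of interest
wanted <- c("Austria", "Belgium")
inv_sel <- subset(inv_long, country %in% wanted)
p <- ggplot(inv_sel, aes(year, value, colour = country)) +
  geom_line()
p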
I am trying to plot a line graph along with points for a data set.
Unfortunately, the points are plotted but the line is not.
I get the message: geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
Can anyone help me out?
I have already made sure that all the variables are either numeric or character, and not factors. I also tried to label them as groups but it did not work.
library(tidyr) # gather()
library(ggplot2)
df<- c('a','b','c','d','e')
df1<-1:5
df2<-11:15
df3<-21:25
df4<-cbind(df,df1,df2,df3)
colnames(df4)<-c("Names", "P1","P2","P3")
df4<-as.data.frame(df4)
dfplot <- gather(df4,key="Period", value="Price",-Names,P1,P2,P3 )
dfplot<-dfplot[order(dfplot$Names),]
vars<-c("Price")
vars1<-c("Names","Period")
dfplot[vars] <- sapply(dfplot[vars], as.numeric)
dfplot[vars1]<-sapply(dfplot[vars1], as.character)
ggplot(dfplot, aes(x = Period, y = Price, color = Names),group=5 ) + geom_line()+geom_point()
As your data is grouped, you need to tell geom_line() which field the data is grouped on:
ggplot(dfplot, aes(x = Period, y = Price, color = Names)) + geom_line(aes(group = Names)) + geom_point()
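For reference, here is a compact, self-contained variant of the same idea. It is only a sketch: it builds the table with data.frame() instead of cbind() (so the price columns stay numeric and need no later conversion) and swaps gather() for tidyr::pivot_longer(); the name prices is hypothetical. Because Period is a discrete x variable, ggplot2 would otherwise treat each Names/Period combination as its own group of one point, which is what triggers the message.
library(tidyr)
library(ggplot2)
# data.frame() keeps P1..P3 numeric; cbind() on mixed vectors would coerce everything to character
prices <- data.frame(Names = c("a", "b", "c", "d", "e"),
                     P1 = 1:5, P2 = 11:15, P3 = 21:25)
dfplot <- pivot_longer(prices, cols = c(P1, P2, P3),
                       names_to = "Period", values_to = "Price")
# group = Names connects the three periods of each name into one line
p <- ggplot(dfplot, aes(x = Period, y = Price, colour = Names, group = Names)) +
  geom_line() +
  geom_point()
p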
Some more information
I have about 20 years of daily data in a time series. It has columns Date, rainfall and other data.
I am trying to plot rainfall vs. time. I want to get 20 line plots in different colours in one graph, with a legend showing the years. I tried the following code, but it does not give me the desired result. Any suggestion to fix my issue would be most welcome.
library(ggplot2)
library(seas)
library(scales) # date_format() and date_breaks()
data(mscdata)
p<-ggplot(data=mscdata,aes(x=date,y=precip,group=year,color=year))
p+geom_line()+scale_x_date(labels=date_format("%m"),breaks=date_breaks("1 months"))
It doesn't look great, but here's a method. We first coerce the data into dates in the same year:
mscdata$dayofyear <- as.Date(format(mscdata$date, "%j"), format = "%j")
Then we plot:
library(ggplot2)
library(scales)
p <- ggplot(data = mscdata, aes(x = dayofyear, y = precip, group = year, color = year))
p + geom_line() +
scale_x_date(labels = date_format("%m"), breaks = date_breaks("1 months"))
While I agree with @Jaap that this may not be the best way to depict these data, try the following:
mscdata$doy <- as.numeric(strftime(mscdata$date, format="%j"))
ggplot(data=mscdata,aes(x=doy,y=precip,group=year)) +
geom_line(aes(color=year))
Although the given answers are good answers to your question as it stands, I don't think they will solve your problem. I think you should be looking at a different way to present the data. @Jaap already suggested using facets. Take for example this approach:
#first add a month column to your dataframe
mscdata$month <- format(mscdata$date, "%m")
#then plot it using boxplot with year on the X-axis and month as facet.
p1 <- ggplot(data = mscdata, aes(x = year, y = precip, group=year))
p1 + geom_boxplot(outlier.shape = 3) + facet_wrap(~month)
This will give you a graph per month, showing the rainfall per year next to each other. Because I use a boxplot, the peaks in rainfall show up as dots ('normal' rain events are inside the box).
Another possible approach would be to use stat_summary.
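A minimal sketch of what the stat_summary approach might look like, assuming the goal is one line per year showing mean precipitation per month. The data below are made up (random values, with the same column names date, year and precip as mscdata) so the snippet runs without the seas package; the fun argument assumes ggplot2 3.3.0 or later (older versions used fun.y).
library(ggplot2)
# Made-up daily data with the column names used above (illustrative values only)
set.seed(42)
rain <- data.frame(date = seq(as.Date("2000-01-01"), as.Date("2002-12-31"), by = "day"))
rain$year <- as.integer(format(rain$date, "%Y"))
rain$month <- as.integer(format(rain$date, "%m"))
rain$precip <- rexp(nrow(rain), rate = 0.5)
# stat_summary() collapses the daily values to one mean per month and year
p <- ggplot(rain, aes(x = month, y = precip, colour = factor(year), group = year)) +
  stat_summary(fun = mean, geom = "line") +
  scale_x_continuous(breaks = 1:12) +
  labs(colour = "year")
p
Summarising first tends to give a far less cluttered plot than drawing every daily value, at the cost of hiding individual rain events.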