I am working with the following data set:
team, time, rank1, rank2, rank3, rank4, rank5
bull, 20180102,0,0,0,0,1
corn, 20180102,0,29,0,0,1
fivfo, 20180102,23,4,0,0,1
lazy, 20180102,0,0,0,0,1
tt, 20180102,0,4,222,0,1
cheer, 20180102,23,0,34,0,1
manup, 20180102,0,13,0,0,1
bull, 20180103,0,10,0,10,1
corn, 20180103,0,59,0,0,1
fivfo, 20180103,43,4,0,0,1
lazy, 20180103,0,0,0,0,1
tt, 20180103,0,4,122,0,1
cheer, 20180103,23,0,34,0,11
manup, 20180103,0,13,10,0,11
My goal is to plot rank per team over time. I was trying to use melt() but can't figure out which variables to melt on.
I tried to use the melt as follows:
melt.s <- melt(s, id=c("team","time"))
ggplot(melt.s,aes(x=time,y=value,colour=variable,group=variable)) + geom_line()
The problem with the above is that the team name doesn't appear in the plot at all. The key takeaway I want the plot to showcase is each team and the number of times it has reached each rank.
I'm still trying to figure out the best way to plot this, but so far I'm thinking of something like the following:
rank5 |
rank4 |
rank3 | legend (team)
rank2 |
rank1 |___________________
time
Perhaps something like this:
library(dplyr); library(tidyr); library(lubridate); library(ggplot2)
gather(s, rank, rank_count, -c(team, time)) %>%
mutate(time = ymd(time)) %>%
ggplot(aes(time, rank_count)) +
geom_line() +
ggrepel::geom_text_repel(aes(label = rank_count), size = 3) +
scale_x_date(date_labels = "%b %d") +
facet_grid(rank~team)
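If you'd rather not depend on tidyr and lubridate for the reshaping step, the same long format can be built in base R. This is a minimal sketch, assuming the data are read from the CSV in the question (written here without the spaces after the commas so the column names come out clean):

```r
# Wide data from the question
s <- read.csv(text = "team,time,rank1,rank2,rank3,rank4,rank5
bull,20180102,0,0,0,0,1
corn,20180102,0,29,0,0,1
fivfo,20180102,23,4,0,0,1
lazy,20180102,0,0,0,0,1
tt,20180102,0,4,222,0,1
cheer,20180102,23,0,34,0,1
manup,20180102,0,13,0,0,1
bull,20180103,0,10,0,10,1
corn,20180103,0,59,0,0,1
fivfo,20180103,43,4,0,0,1
lazy,20180103,0,0,0,0,1
tt,20180103,0,4,122,0,1
cheer,20180103,23,0,34,0,11
manup,20180103,0,13,10,0,11")

rank_cols <- paste0("rank", 1:5)

# unlist() concatenates the rank columns one after another, so team/time are
# repeated once per rank column and each rank label once per original row
long <- data.frame(
  team       = rep(s$team, times = length(rank_cols)),
  time       = rep(s$time, times = length(rank_cols)),
  rank       = rep(rank_cols, each = nrow(s)),
  rank_count = unlist(s[rank_cols], use.names = FALSE)
)

# time is stored as an integer such as 20180102; turn it into a Date
long$time <- as.Date(as.character(long$time), format = "%Y%m%d")
```

The resulting long data frame has one row per team, day and rank, so it can go into the facet_grid(rank ~ team) plot above in place of the gather() and mutate() steps.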
I am using ggplot(). I have a dataset that has a variable called "state_crossing" which takes multiple string values - like "baja california", "sonora" and "tamaulipas". I saw some tricks on other threads on how to make multiple lines without needing to call a geom_line() for each variable (e.g., here Plotting two variables as lines using ggplot2 on the same graph).
Then, I created a new variable value = 1 for each observation, and I am using the following command:
plotting <- data.frame(
year = rep(c("2001", "2002"), times=c(4,5)),
state_crossing = c("baja california", "baja california", "sonora", "tamaulipas", "sonora", "sonora", "tamaulipas", "tamaulipas", "baja california"),
value = rep(1, 9)
)
ggplot(plotting, aes(x=year, y=value)) +
geom_line(aes(color=state_crossing, group=state_crossing), stat = "summary", fun = "sum")
This is great, but it naturally plots the sum of occurrences, whereas I wanted the fraction of observations with that value of state_crossing within each value of x = year. The function "mean" doesn't work, as the mean is equal to 1 for the variable. Any idea of a "fun" that could give me the fraction?
E.g., on my reproducible example, I'd like "baja california" to show 2/3 in 2001, and then 0 in 2002; and "sonora" to show up as "1/3" and then "1/2"
Summarizing data inside ggplot can be a bit of a headache; ggplot is great for plotting data that is already summarized. So it's easiest to summarize your data into proportions first, using something like dplyr, and then plot the result.
library(dplyr)
library(ggplot2)
plotting %>%
count(year, state_crossing) %>%
group_by(year) %>%
mutate(prop=n/sum(n)) %>%
ggplot() +
aes(x=year, y=prop, color=state_crossing, group=state_crossing) +
geom_line()
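For comparison, the same proportions can be computed in base R with prop.table(), which divides each cell of a year-by-state count table by its row total. Note that with the example data exactly as written, "baja california" comes out at 2/4 in 2001 and 1/5 in 2002, so the 2/3 and 0 mentioned in the question would need different input data. A minimal sketch:

```r
plotting <- data.frame(
  year = rep(c("2001", "2002"), times = c(4, 5)),
  state_crossing = c("baja california", "baja california", "sonora", "tamaulipas",
                     "sonora", "sonora", "tamaulipas", "tamaulipas", "baja california"),
  value = rep(1, 9)
)

# Count observations per year and state, then divide each row by its total
counts <- table(plotting$year, plotting$state_crossing)
props <- prop.table(counts, margin = 1)

# Back to a long data frame (year, state_crossing, prop) that ggplot can use
prop_df <- as.data.frame(props, responseName = "prop")
names(prop_df)[1:2] <- c("year", "state_crossing")
```

One difference from count(): table() keeps every year/state combination, so a state that is absent in some year gets a proportion of 0 instead of a missing point on the line.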
I want to show the change in job numbers within a certain time period. Ideally, I'd like to use ggplot2's geom_dotplot() and then color the dots by the column (industry) they come from for each month. One idea I have not tried yet: do I need to reshape my data with tidyr from wide to long format in order to plot this?
Example data
Month Finance Tech Construction Manufacturing
Jan 14,000 6,800 11,000 17,500
Feb 11,500 8,400 9,480 15,000
Mar 15,250 4,200 7,200 12,400
Apr 12,000 6,400 10,300 8,500
My current R code attempt is below. I know that I need to fill the dot color by industry type, and maybe the data have to be in long format to do so.
library(tidyverse)
g <- ggplot(dat, aes(x = Month)) +
geom_dotplot(stackgroups = TRUE, binwidth = 1000, binpositions = "all") +
theme_light()
g
Here's how the plot I'm trying to make could look. Ideally I'd like to bin the dots as one dot per 1000 in the column value. Is that possible?
Thank you for taking the time to help someone who is new to R and is studying in school. Much appreciated as always,
I could not get geom_dotplot() to work; the y-axis always came out wrong. Try something like this instead: first pivot to long format, then repeat each Month + category row once per 1000 jobs. Note that the solution below rounds up:
library(dplyr)
library(tidyr)
library(ggplot2)
test = pivot_longer(dat,-Month,names_to="category") %>%
group_by(Month,category) %>%
summarize(bins=ceiling(value/ 1000)) %>%
uncount(bins)
If you would prefer to round down to the nearest 1000, use floor() instead of ceiling() .
Then plot:
test$Month = factor(test$Month, levels = dat$Month) # keep the months in their original order
test %>% ggplot(aes(x=Month,y=1,col=category)) +
geom_point(position=position_stack()) +
scale_y_continuous(labels=scales::number_format(scale=1000))
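The same reshaping can be sketched in base R. One caveat: if the numbers really contain thousands separators as printed ("14,000"), they will be read as text, so the commas have to be stripped before dividing. A minimal sketch, assuming dat holds the four months from the question:

```r
# Example data from the question, with the thousands separators kept as text
dat <- data.frame(
  Month = c("Jan", "Feb", "Mar", "Apr"),
  Finance = c("14,000", "11,500", "15,250", "12,000"),
  Tech = c("6,800", "8,400", "4,200", "6,400"),
  Construction = c("11,000", "9,480", "7,200", "10,300"),
  Manufacturing = c("17,500", "15,000", "12,400", "8,500"),
  stringsAsFactors = FALSE
)

categories <- names(dat)[-1]

# Wide to long: one row per Month x category, commas removed before conversion
long <- data.frame(
  Month = rep(dat$Month, times = length(categories)),
  category = rep(categories, each = nrow(dat)),
  value = as.numeric(gsub(",", "", unlist(dat[categories], use.names = FALSE)))
)

# One dot per started 1000 (ceiling); use floor() to round down instead
long$bins <- ceiling(long$value / 1000)

# Repeat each row bins times, the base R counterpart of tidyr::uncount()
dots <- long[rep(seq_len(nrow(long)), times = long$bins), c("Month", "category")]
dots$Month <- factor(dots$Month, levels = dat$Month)
```

dots can then be plotted the same way as test above, with y = 1 and geom_point(position = position_stack()).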
I am trying to create a plot of monthly deviations from annual means for temperature data using a bar chart. I have data across many years and I want to show the seasonal behavior in temperatures between months. The bars should represent the deviation from the annual average, which is recalculated for each year. Here is an example that is similar to what I want, only it is for a single year:
My data is sensitive so I cannot share it yet, but I made a reproducible example using the txhousing dataset (it comes with ggplot2). The salesdiff column is the deviation between monthly sales (averaged across all cities) and the annual average for each year. Now the problem is plotting it.
library(ggplot2)
df <- aggregate(sales~month+year,txhousing,mean)
df2 <- aggregate(sales~year,txhousing,mean)
df2$sales2 <- df2$sales #RENAME sales
df2 <- df2[,-2] #REMOVE sales
df3<-merge(df,df2) #MERGE dataframes
df3$salesdiff <- df3$sales - df3$sales2 #FIND deviation between monthly and annual means
#plot deviations
ggplot(df3,aes(x=month,y=salesdiff)) +
geom_col()
My ggplot is not looking good at the moment:
Somehow it is stacking the columns for each month with all of the data across the years. Ideally the date would be along the x-axis spanning many years (I think the dataset is from 2000-2015...), and different colors depending on if salesdiff is higher or lower. You are all awesome, and I would welcome ANY advice!!!!
The main issue here is that month only takes the values 1 to 12, so geom_col() stacks the bars from every year on top of each other at each month. You need a real date on the x-axis instead, and geom_col() will only give bars different colours if you explicitly map fill to a variable, for example one that records whether each monthly mean is above or below its annual mean. I use the lubridate package to build the date information, which can then be passed directly to ggplot().
Note that we combine the "month" and "year" columns here and then use ymd() to obtain date values. I chose not to convert the double-valued "date" column in txhousing with something like date_decimal(), because that can sometimes confuse January and February (e.g. Feb 1 gets "rounded down" to Jan 31).
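For anyone working without lubridate, the same idea (build the date from the integer year and month columns rather than from the decimal date column) can be sketched in base R. The year/month values below are made up, and pinning every date to the first of the month is an assumption of the sketch:

```r
# Hypothetical year/month pairs, shaped like txhousing's year and month columns
year  <- c(2011, 2011, 2012)
month <- c(1, 12, 2)

# sprintf() zero-pads the month so each string is a valid ISO date
date <- as.Date(sprintf("%d-%02d-01", year, month))
```

Because the result is a real Date, it sorts chronologically and works with scale_x_date().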
I decided to plot a subset of the txhousing dataset, which is a lot more convenient to display for teaching purposes.
Code:
library("tidyverse")
library("lubridate") # provides ymd()
library("ggplot2")
# subset txhousing to just years >= 2011, and calculate nested means and dates
housing_df <- filter(txhousing, year >= 2011) %>%
group_by(year, month) %>%
summarise(monthly_mean = mean(sales, na.rm = TRUE),
date = first(date)) %>%
mutate(yearmon = paste(year, month, sep = "-"),
date = ymd(yearmon, truncated = 1), # create date column
salesdiff = monthly_mean - mean(monthly_mean), # deviation from that year's mean (summarise() left the data grouped by year)
higherlower = case_when(salesdiff >= 0 ~ "higher", # for fill aes later
salesdiff < 0 ~ "lower"))
ggplot(data = housing_df, aes(x = date, y = salesdiff, fill = as.factor(higherlower))) +
geom_col() +
scale_x_date(date_breaks = "6 months",
date_labels = "%b-%Y") +
scale_fill_manual(values = c("higher" = "blue", "lower" = "red")) +
theme_bw()+
theme(legend.position = "none") # remove legend
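The key step in salesdiff = monthly_mean - mean(monthly_mean) is that summarise() only drops the last grouping variable (month), so the data are still grouped by year and mean(monthly_mean) is each year's own average. A base R sketch of the same calculation uses ave(), which applies a function within groups and returns a vector aligned with the input; the numbers below are made up for illustration:

```r
# Toy monthly means for two years (made-up values)
monthly <- data.frame(
  year = rep(c(2011, 2012), each = 3),
  month = rep(1:3, times = 2),
  monthly_mean = c(10, 20, 30, 5, 5, 20)
)

# ave() gives each row the mean of its own year, so this is a per-year deviation
monthly$salesdiff <- monthly$monthly_mean - ave(monthly$monthly_mean, monthly$year)

# Label each bar for the fill aesthetic
monthly$higherlower <- ifelse(monthly$salesdiff >= 0, "higher", "lower")
```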
Plot:
You can see the periodic behaviour here nicely; an increase in sales appears to occur every spring, with sales decreasing during the fall and winter months. Do keep in mind that you might want to reverse the colours I assigned if you want to use this code for temperature data! This was a fun one - good luck, and happy plotting!
Something like this should work?
Basically you need to create a binary variable, called factordiff below, that lets you change the color (fill) depending on whether salesdiff is positive or negative.
Plus you needed a date variable for month and year combined.
library(ggplot2)
library(dplyr)
df3$factordiff <- ifelse(df3$salesdiff>0, 1, 0) # factor variable for colors
df3 <- df3 %>%
mutate(date = as.Date(paste0(year, "-", month, "-01"))) # builds a real Date such as 2001-01-01, so months sort chronologically
#plot deviations
ggplot(df3,aes(x=date,y=salesdiff, fill = as.factor(factordiff))) +
geom_col()
Of course this results in a hard to read plot because you have lots of dates, you can subset it and show only a restricted time:
df3 %>%
filter(date >= as.Date("2014-01-01")) %>% # keep data from 2014 onwards
ggplot(aes(x=date,y=salesdiff, fill = as.factor(factordiff))) +
geom_col() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # adds label rotation
I already asked the same question yesterday, but I haven't received any suggestions so far, so I decided to delete the old one and ask again with additional information.
So here again:
I have a dataframe like this:
Link to the original dataframe: https://megastore.uni-augsburg.de/get/JVu_V51GvQ/
Date DENI011
1 1993-01-01 9.946
2 1993-01-02 13.663
3 1993-01-03 6.502
4 1993-01-04 6.031
5 1993-01-05 15.241
6 1993-01-06 6.561
....
....
6569 2010-12-26 44.113
6570 2010-12-27 34.764
6571 2010-12-28 51.659
6572 2010-12-29 28.259
6573 2010-12-30 19.512
6574 2010-12-31 30.231
I want to create a plot that enables me to compare the monthly values in the DENI011 over the years. So I want to have something like this:
http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Seasonal%20Plot
Jan-Dec on the x-scale, values on the y-scale and the years displayed by different colored lines.
I found several similar questions here, but nothing works for me. I tried to follow the instructions on the website with the example, but the problem is that I can't create a ts object.
Then I tried it this way:
Ref_Data$MonthN <- as.numeric(format(as.Date(Ref_Data$Date),"%m")) # Month's number
Ref_Data$YearN <- as.numeric(format(as.Date(Ref_Data$Date),"%Y"))
Ref_Data$Month <- months(as.Date(Ref_Data$Date), abbreviate=TRUE) # Month's abbr.
g <- ggplot(data = Ref_Data, aes(x = MonthN, y = DENI011, group = YearN, colour=YearN)) +
geom_line() +
scale_x_discrete(breaks = Ref_Data$MonthN, labels = Ref_Data$Month)
That didn't work either; the plot looks horrible. I don't need to put all the years from 1993 to 2010 in one plot. Actually, only a few years would be OK, maybe 1998 to 2006.
Any suggestions on how to solve this?
As others have noted, in order to create a plot such as the one you used as an example, you'll have to aggregate your data first. However, it's also possible to retain daily data in a similar plot.
reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2018-02-11
library(tidyverse)
library(lubridate)
# Import the data
url <- "https://megastore.uni-augsburg.de/get/JVu_V51GvQ/"
raw <- read.table(url, stringsAsFactors = FALSE)
# Parse the dates, and use lower case names
df <- as_tibble(raw) %>%
rename_all(tolower) %>%
mutate(date = ymd(date))
One trick to achieve this would be to set the year component in your date variable to a constant, effectively collapsing the dates to a single year, and then controlling the axis labelling so that you don't include the constant year in the plot.
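In base R the same trick can be done by rewriting the year part of each date with format(). A minimal sketch; using 2000 as the constant year is an assumption, chosen because it is a leap year, so 29 February still maps to a valid date:

```r
dates <- as.Date(c("1996-02-29", "1998-07-15", "2010-12-31"))

# Keep the original year for colouring, then move every date into the year 2000
year_group <- factor(format(dates, "%Y"))
date_2000 <- as.Date(format(dates, "2000-%m-%d"))
```

With scale_x_date(date_labels = "%b") only the month is printed, so the constant year never shows up on the axis.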
# Define the plot
p <- df %>%
mutate(
year = factor(year(date)), # use year to define separate curves
date = update(date, year = 1) # use a constant year for the x-axis
) %>%
ggplot(aes(date, deni011, color = year)) +
scale_x_date(date_breaks = "1 month", date_labels = "%b")
# Raw daily data
p + geom_line()
In this case though, your daily data are quite variable, so this is a bit of a mess. You could hone in on a single year to see the daily variation a bit better.
# Hone in on a single year
p + geom_line(aes(group = year), color = "black", alpha = 0.1) +
geom_line(data = function(x) filter(x, year == 2010), size = 1)
But ultimately, if you want to look at several years at a time, it's probably a good idea to present smoothed lines rather than raw daily values, or indeed some monthly aggregate.
# Smoothed version
p + geom_smooth(se = FALSE)
#> `geom_smooth()` using method = 'loess'
#> Warning: Removed 117 rows containing non-finite values (stat_smooth).
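If you want something simpler than loess, a centred moving average computed within each year is another common smoother (a swapped-in technique, not what the answer above uses). A sketch with a 7-day window, which is an arbitrary choice; stats::filter is spelled out because dplyr's filter() masks it once the tidyverse is loaded:

```r
# Centred moving average of width k, computed separately within each group
moving_average <- function(x, group, k = 7) {
  ave(x, group, FUN = function(v) {
    as.numeric(stats::filter(v, rep(1 / k, k), sides = 2))
  })
}

# Toy example: one "year" of ten daily values
smoothed <- moving_average(as.numeric(1:10), rep("2010", 10))
```

The first and last k %/% 2 days of each group come out as NA, because the window runs past the ends of the series there.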
There are multiple values from one month, so when plotting your original data, you got multiple points in one month. Therefore, the line looks strange.
If you want to create something similar to the example you provided, you have to summarize your data by year and month. Below I calculate the mean for each year and month. In addition, you need to convert your year and month to factors if you want to plot them as discrete variables.
library(dplyr)
Ref_Data2 <- Ref_Data %>%
group_by(MonthN, YearN, Month) %>%
summarize(DENI011 = mean(DENI011)) %>%
ungroup() %>%
# Convert the Month column to factor variable with levels from Jan to Dec
# Convert the YearN column to factor
mutate(Month = factor(Month, levels = unique(Month)),
YearN = as.factor(YearN))
g <- ggplot(data = Ref_Data2,
aes(x = Month, y = DENI011, group = YearN, colour = YearN)) +
geom_line()
g
If you don't want to load dplyr, here is the base R code. It uses exactly the same strategy and gives the same results as www's answer.
dat <- read.delim("~/Downloads/df1.dat", sep = " ")
dat$Date <- as.Date(dat$Date)
dat$month <- factor(month.abb[as.integer(format(dat$Date, "%m"))], levels = month.abb) # month labels that don't depend on the system locale
dat$year <- gsub("-.*", "", dat$Date)
month_summary <- aggregate(DENI011 ~ month + year, data = dat, mean)
ggplot(month_summary, aes(month, DENI011, color = year, group = year)) +
geom_path()
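The question also mentions that only a few years (say 1998 to 2006) are needed. Below is a sketch of the same base R pipeline restricted to those years. It runs on made-up daily values (the day of the month) instead of the real DENI011 series, so the monthly means are easy to check by hand:

```r
# Made-up daily series standing in for the real data
dates <- seq(as.Date("1997-01-01"), as.Date("2007-12-31"), by = "day")
dat <- data.frame(Date = dates, DENI011 = as.numeric(format(dates, "%d")))

# Month labels from the month number, so they don't depend on the locale
dat$month <- factor(month.abb[as.integer(format(dat$Date, "%m"))], levels = month.abb)
dat$year <- format(dat$Date, "%Y")

# Keep only the years of interest before aggregating
dat <- dat[as.integer(dat$year) %in% 1998:2006, ]
month_summary <- aggregate(DENI011 ~ month + year, data = dat, FUN = mean)
```

month_summary then goes into the same ggplot(month_summary, aes(month, DENI011, color = year, group = year)) + geom_path() call, now with nine lines instead of eighteen.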
I have been trying to plot a line plot with ggplot.
My data looks something like this:
I04 F04 I05 F05 I06 F06
CAT 3 12 2 6 6 20
DOG 0 0 0 0 0 0
BIEBER 1 0 0 1 0 0
and can be found here.
Basically, we have a certain number of CATs (or other creatures) initially in a year (this is I04), and a certain number of CATs at the end of the year (this is F04). This goes on for some time.
I can plot something like this fairly simply using the code below, and get this:
This is fantastic, but doesn't work very well for me. After all, I have these starting and ending inventories for each year. So I am interested in seeing how the initial values (I04, I05, I06) change over time. So, for each animal, I would like to create two different lines, one for the initial quantity and one for the final quantity (F04, F05, F06). This seems to me like now I have to consider two factors.
This is really difficult given the way my data is set up. I'm not sure how to tell ggplot that all the I prefixed years are one factor, and all the F prefixed years are another factor. When the dataframe gets melted, it's too late. I'm not sure how to control this situation.
Any advice on how I can separate these values or perhaps another, better way to tackle this situation?
Here is the code I have:
library(ggplot2)
library(reshape2)
DF <- read.csv("mydata.csv", stringsAsFactors=FALSE)
## cleaning up, converting factors to numeric, etc
text_names <- data.frame(as.character(DF$animals))
names(text_names) <- c("animals")
numeric_cols <- DF[, -c(1)]
numeric_cols <- sapply(numeric_cols, as.numeric)
plot_me <- data.frame(cbind(text_names, numeric_cols))
plot_me$animals <- as.factor(plot_me$animals)
meltedDF <- melt(plot_me)
p <- ggplot()
p <- p + geom_line(aes(seq(1:36), meltedDF$value, group=meltedDF$animals, color=meltedDF$animals))
p
Using your original data from the link:
nd <- reshape(mydata, idvar = "animals", direction = "long", varying = names(mydata)[-1], sep = "")
ggplot(nd, aes(x = time, y = I, group = animals, colour = animals)) + geom_line() + ggtitle("Development of initial inventories")
ggplot(nd, aes(x = time, y = F, group = animals, colour = animals)) + geom_line() + ggtitle("Development of final inventories")
I think from a data analyst perspective the following approach might provide better insight.
For each animal we visualize the initial and the final quantity in a separate panel. Moreover, each subplot has its own y scale because the values of the different animal types are radically different. Like this, differences within and across animal types are easier to spot.
Given the current structure of your data, we do not need two different factors. After the gather call the indicator column includes data like I04, F04, etc. We just need to separate the first character from the rest resulting in two columns type and time. We can use type as the argument for color in the ggplot call. time provides a unified x-axis across all animal types.
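What separate(indicator, c('type', 'time'), sep = 1) does to the column names can be sketched in base R with substr() and substring(); the names below are the ones from the question:

```r
indicator <- c("I04", "F04", "I05", "F05", "I06", "F06")

# First character is the inventory type (I = initial, F = final)
type <- substr(indicator, 1, 1)
# The remaining characters are the two-digit year, converted to a number
time <- as.numeric(substring(indicator, 2))
```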
library(tidyr)
library(dplyr)
library(ggplot2)
data %>% gather(indicator, value, -animals) %>%
separate(indicator, c('type', 'time'), sep = 1) %>%
mutate(
time = as.numeric(time)
) %>% ggplot(aes(time, value, color = type)) +
geom_line() +
facet_grid(animals ~ ., scales = "free_y")
Of course, you might also do it the other way round, namely using a subplot for the initial and the final quantities like this:
data %>% gather(indicator, value, -animals) %>%
separate(indicator, c('type', 'time'), sep=1) %>%
mutate(
time = as.numeric(time)
) %>% ggplot(aes(time, value, color = animals)) +
geom_line() +
facet_grid(type ~ ., scales = "free_y")
But as described above, I would not recommend that because the y scale varies too much across animal types.