My dataset looks like this:
dat <- data.frame(ID = c(150,151,155,155,155,155,150), year = c(1995,2011,2012,2012,2013,2012,2013), Acceptance = c("no","yes","yes","yes","yes","no","no"))
I want to plot a bar chart for ID 155, with the year on the x-axis, counting only the rows where the third variable (Acceptance) is "yes".
I have tried the code below:
cl_d <- dat %>%
filter(ID==155)%>%
filter(year(Date)>2000)%>%
group_by(ID, year)%>%
summarise(count=n())
ggplot(cl_d, aes(year, count))+
geom_bar(stat='identity')
The bar plot should show the number of rows with Acceptance "yes" per year, for years after 2000, for ID 155 only.
Hey, this code should work. I always try to avoid extra packages where I can (ggplot2 is still needed for the plot). If you have any questions left, just ask!
library(ggplot2)
dat <- data.frame(c(150,151,155,155,155,155,150),
c(1995,2011,2012,2012,2013,2012,2013),
c("no","yes","yes","yes","yes","no","no"))
colnames(dat)[1] <- "ID"
colnames(dat)[2] <- "Date"
colnames(dat)[3] <- "claim_count1"
NewData <- dat[dat$ID==155 & dat$Date > 2000 & dat$claim_count1== "yes",]
ggplot(data=NewData, aes(x=Date)) + geom_bar(stat ="count")
This?
dat %>%
filter(ID==155)%>%
filter(Acceptance == "yes") %>%
filter(year>2000) %>%
group_by(year) %>%
count() %>%
ggplot(aes(year, n))+
geom_col()
It appears you want year to be in date format and the graph to also be in the date format. If this is the case see the code below:
dat <- data.frame(ID = c(150,151,155,155,155,155,150),
year = c(1995,2011,2012,2012,2013,2012,2013),
Acceptance = c("no","yes","yes","yes","yes","no","no"))
dat$year <- as.Date(ISOdate(dat$year, 1, 1))
cl_d <- dat %>% filter(ID==155) %>%
subset(year > as.Date("2000-01-01")) %>%
group_by(ID, year) %>%
summarise(count=n())
ggplot(cl_d, aes(year, count)) +
geom_bar(stat='identity') +
scale_x_date(date_labels ="%Y", date_breaks = "1 year")
Is this what you're after?
library(tidyverse);
dat %>%
filter(ID == 155 & year >= 2000 & Acceptance == "yes") %>%
count(ID, year) %>%
ggplot(aes(as.factor(year), n)) +
geom_bar(stat = "identity") +
labs(x = "Year", y = "Count")
Sample data
dat <- data.frame(
ID = c(150,151,155,155,155,155,150),
year = c(1995,2011,2012,2012,2013,2012,2013),
Acceptance = c("no","yes","yes","yes","yes","no","no"));
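As a quick sanity check on what the chart should show, here is a minimal base R sketch that counts the "yes" rows for ID 155 after 2000 without any plotting. It only uses the sample data from the question; the object names `yes_155` and `counts` are made up for illustration.

```r
# Sample data from the question
dat <- data.frame(
  ID = c(150, 151, 155, 155, 155, 155, 150),
  year = c(1995, 2011, 2012, 2012, 2013, 2012, 2013),
  Acceptance = c("no", "yes", "yes", "yes", "yes", "no", "no")
)

# Keep only ID 155, years after 2000, and accepted rows
yes_155 <- subset(dat, ID == 155 & year > 2000 & Acceptance == "yes")

# Count rows per year; these are the bar heights the plot should have
counts <- table(yes_155$year)
print(counts)
```

With this data the bars should come out as 2 for 2012 and 1 for 2013, so any of the plotting answers above can be checked against that.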
I have a simple two-column time-series dataset that looks like this:
Date Signups
22-Feb-18 601
23-Feb-18 500
24-Feb-18 6000
...
27-Apr-22 999
28-Apr-22 998
29-Apr-22 123
30-Apr-22 321
And I'm trying to make a simple line chart that shows the monthly total over time and then a point at the most recent month. But the filter within the geom_point is giving me a hard time. Here's what I have:
library(tidyverse)
library(scales)
library(lubridate)
signups %>%
mutate(Date = dmy(Date)) %>%
group_by(month(Date), year(Date)) %>%
mutate(month = paste0(month(Date),"-",year(Date))) %>%
mutate(month = my(month)) %>%
mutate(monthly_total = sum(signups)) %>%
ungroup() %>%
dplyr::filter(month >= "2018-03-01") %>%
ggplot(aes(month, monthly_total)) +
geom_line() +
geom_point(data = signups %>% dplyr::filter(month == "2022-03-01")) +
expand_limits(y = 0, x = as.Date(c("2018-03-01", "2024-03-01"))) +
scale_y_continuous(labels = comma)
If I comment out the geom_point it gives me the line chart that I'm looking for. But when the geom_point is included here it throws this error:
Error in dplyr::filter(., month == "2022-03-01") :
Caused by error in `month == "2022-03-01"`:
! comparison (1) is possible only for atomic and list types
I've tried using subset instead of filter and it didn't help. Let me know if you have any suggestions. Thanks!
The comment from Limey got us there. Here's what I needed to do:
signups <- signups %>%
mutate(Date = dmy(Date)) %>%
mutate(just_month = paste0(month(Date),"-",year(Date))) %>%
mutate(just_month = my(just_month)) %>%
group_by(month(Date), year(Date)) %>%
mutate(monthly_total = sum(signups)) %>%
ungroup()
signups %>%
dplyr::filter(just_month >= "2018-03-01") %>%
ggplot(aes(just_month, monthly_total)) +
geom_line(aes(just_month, monthly_total)) +
geom_point(data = dplyr::filter(signups, just_month == "2022-04-01")) +
expand_limits(y = 0, x = as.Date(c("2018-03-01", "2024-03-01"))) +
scale_y_continuous(labels = comma)
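For anyone hitting the same error: in the original attempt, `signups` inside `geom_point()` was still the raw data frame, which has no `month` column, so `month` resolved to the lubridate function and the comparison failed. A possible shorter variant is sketched below, assuming the dates are already parsed; it summarises to one row per month with `floor_date()` and takes the latest month from that same object. The tiny data set and the names `monthly` and `latest` are invented for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical miniature of the signups data (values invented for illustration)
signups <- tibble(
  Date = as.Date(c("2018-02-22", "2018-02-23", "2018-03-01",
                   "2018-03-15", "2018-04-02")),
  Signups = c(601, 500, 100, 200, 50)
)

# One row per month: floor_date() snaps each date to the first of its month
monthly <- signups %>%
  group_by(just_month = floor_date(Date, unit = "month")) %>%
  summarise(monthly_total = sum(Signups), .groups = "drop")

# The most recent month, taken from the summarised object (not the raw data)
latest <- monthly %>%
  filter(just_month == max(just_month))

print(monthly)
print(latest)
```

The plot would then be along the lines of `ggplot(monthly, aes(just_month, monthly_total)) + geom_line() + geom_point(data = latest)`, so the point layer always sees the `just_month` column.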
I would like to sort the panels of my ggplot facet_wrap by color.
For example, in this demo code, the color corresponds to groups A, B, C. I am looking to have all the red plots next to each other, and same for the blue and green plots.
I tried sorting my data by group but ggplot seems to switch the order when plotting.
library(tidyverse)
set.seed(42)
# Generate example data frame
id <- 1:15
data <- map(id, ~rnorm(10))
date <- map(id, ~1:10)
group <- map_chr(id, ~sample(c('a','b','c'), size=1))
df <- tibble(id=id, data=data, date=date, group=group) %>% unnest(cols = c(data, date))
# Generate plot
df %>%
arrange(group) %>%
ggplot(mapping = aes(x=date, y=data, color=group)) +
geom_line() +
geom_point() +
facet_wrap(~ id)
This could help:
library(tidyverse)
set.seed(42)
# Generate example data frame
id <- 1:15
data <- map(id, ~rnorm(10))
date <- map(id, ~1:10)
group <- map_chr(id, ~sample(c('a','b','c'), size=1))
df <- tibble(id=id, data=data, date=date, group=group) %>% unnest(cols = c(data, date))
df2 <- df %>% mutate(id=factor(id))%>%
group_by(group) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(id = fct_reorder(id, N))
# Generate plot
df2 %>%
arrange(group) %>%
ggplot(mapping = aes(x=date, y=data, color=group)) +
geom_line() +
geom_point() +
facet_wrap(~ id)
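One caveat, as far as I can tell: `fct_reorder(id, N)` orders the panels by group size, so two groups with the same number of rows could still end up interleaved. A possible alternative, sketched below in base R, sets the factor levels of `id` directly in group order. The `panels` lookup table and the column `id_f` are hypothetical names for illustration, not part of the answer above.

```r
set.seed(42)

# Hypothetical lookup table: one row per panel id with its group
panels <- data.frame(
  id = 1:15,
  group = sample(c("a", "b", "c"), size = 15, replace = TRUE)
)

# Order ids by group first, then by id within each group
ordered_ids <- panels$id[order(panels$group, panels$id)]

# A factor whose level order follows the groups; facet_wrap() lays out
# panels in level order, so facet_wrap(~ id_f) keeps each color together
panels$id_f <- factor(panels$id, levels = ordered_ids)

print(levels(panels$id_f))
```

In the full example the same idea would be `mutate(id = factor(id, levels = ordered_ids))` before plotting, with `ordered_ids` computed from the distinct id/group pairs.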
This would be a way (would have to get rid of the double title though):
df %>%
arrange(group) %>%
ggplot(mapping = aes(x=date, y=data, color=group)) +
geom_line() +
geom_point() +
facet_wrap(~ group + id)
I have a coronavirus data frame and I need to compare the Israel and UK data from the time both countries had more than 10 confirmed patients. This is my code:
library(ggplot2)
library(dplyr)
#Data frame
df.raw <- read.csv(url('https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv'))
str(df.raw)
df <- df.raw
df$Date <- as.Date(df$Date)
str(df)
df.israel <- df %>% filter(Country == 'Israel', Confirmed>10)
df.uk <- df %>% filter(Country == 'United Kingdom', Confirmed>10)
if(df.israel$Date[1] > df.uk$Date[1]){
df.uk <- df.uk %>% filter(Date >= df.israel$Date[1])
} else {
df.israel <- df.israel %>% filter(Date >= df.uk$Date[1])
}
ggplot() +
geom_point(data = df.israel, aes(Date, Confirmed), color = 'blue') +
geom_point(data = df.uk, aes(Date,Confirmed), color = 'red')
Now I need my x-axis to be numeric (1, 2, 3, etc.), but I don't know how to do it (I tried xlim and scale_x_continuous). Does anyone know how to do this?
You can use match to get numbers instead of Date. Also it is better to get data in long format instead of creating two separate dataframes.
library(dplyr)
library(ggplot2)
df %>%
filter(Country %in% c('Israel', 'United Kingdom') & Confirmed>10) %>%
tidyr::pivot_longer(cols = Country) %>%
arrange(Date) %>%
mutate(day = match(Date, unique(Date))) %>%
ggplot() + aes(day, Confirmed, color = value) + geom_point() +
scale_color_manual(values = c('blue', 'red'))
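Note that `match(Date, unique(Date))` numbers the calendar dates shared by both countries. If the goal is instead a day counter that starts at 1 on each country's own first day above 10 confirmed cases, a grouped version might look like the sketch below. That reading of the question is an assumption, and the small data set is invented so the example runs without downloading the CSV.

```r
library(dplyr)

# Hypothetical miniature of the aggregated data (values invented)
df <- tibble(
  Country = rep(c("Israel", "United Kingdom"), each = 5),
  Date = as.Date("2020-03-01") + c(0:4, 2:6),
  Confirmed = c(7, 12, 15, 21, 30, 9, 11, 14, 20, 33)
)

aligned <- df %>%
  filter(Country %in% c("Israel", "United Kingdom"), Confirmed > 10) %>%
  group_by(Country) %>%
  # Days since this country's first date above the threshold, starting at 1
  mutate(day = as.integer(Date - min(Date)) + 1L) %>%
  ungroup()

print(aligned)
```

Plotting is then `ggplot(aligned, aes(day, Confirmed, color = Country)) + geom_point()`, which gives a plain numeric x-axis.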
Hello, I need my ggplot to show dates on the x-axis in the same format as in the picture.
But my date column also has a time in it.
sentiment_bing1 <- tidy_trump_tweets %>%
inner_join(get_sentiments("bing")) %>%
count(word, created_at, sentiment) %>%
ungroup()
p <- sentiment_bing1 %>%
filter(sentiment == "positive") %>%
ggplot(aes(x = created_at, y = n)) +
geom_line(stat = "identity", position = "identity", color = "Blue") +
scale_x_date(date_breaks = '3 months', date_labels = '%b-%Y') +
stat_smooth() +
theme_gdocs() +
xlab("Date") +
ylab("Normalized Frequency of Positive Words in Trump's Tweets")
1 abound 11/30/17 13:05 positive 0.0
2 abuse 1/11/18 12:33 negative 0.0
3 abuse 10/27/17 1:18 negative 0.0
4 abuse 2/18/18 17:10 negative 0.0
This is what I have done to get the result. Now how do I achieve it like the picture? Conversion to date doesn't help as there are instances where the tweet takes place on same day but different time and that then messes the graph.
Welcome to SO!
It's hard to answer your question without seeing the data you are using and the error that your code is generating. Next time try and create a reproducible question. This will make it easier for someone to identify where your problem lies.
Based on the code and data you've provided I've created a sample data set with a (broadly) similar structure to that from the chart...
library(dplyr)
library(lubridate)
library(ggplot2)
library(ggthemes)
set.seed(100)
start_date <- mdy_hm("03-01-2017-12:00")
end_date <- mdy_hm("03-01-2018-12:00")
number_hours <- interval(start_date, end_date)/hours(1)
created_at <- start_date + hours(6:number_hours)
length(created_at)
word <- sample(c("abound", "abuse"), size = length(created_at), replace = TRUE,
prob=c(0.25, 0.75))
Your plotting code looks good. I could be wrong here, but from what I can tell your problem could lie in the way you are summarising the frequencies. In the code below, I've used the lubridate package to group your data by date (day), allowing for a daily frequency count.
test_plot <- tibble(created_at, word) %>%
mutate(sentiment =
case_when(
word == "abound" ~ "positive",
word == "abuse" ~ "negative")) %>%
filter(sentiment == "positive") %>%
mutate(created_at = date(round_date(ymd_hms(created_at), unit = "day"))) %>%
group_by(created_at) %>%
tally() %>%
ggplot() +
aes(x = created_at, y = n) +
geom_line(stat="identity", position = "identity", color = "Blue") +
geom_smooth() +
scale_x_date(date_breaks ='3 months', date_labels = '%b-%Y') +
theme_gdocs() +
xlab("Date") +
ylab("Frequency of Positive Words in Trump's Tweets")
Which gives you this...
sentiment_bing1 <- tidy_trump_tweets %>%
inner_join(get_sentiments("bing")) %>%
count(created_at, sentiment) %>%
spread(sentiment, n, fill=0) %>%
mutate(N = (negative - min(negative)) / (max(negative) - min(negative))) %>%
mutate(P = (positive - min(positive)) / (max(positive) - min(positive))) %>%
ungroup()
sentiment_bing1$created_at <- as.Date(sentiment_bing1$created_at, "%m/%d/%y")
Using spread helped to separate the positive and negative counts into their own columns, which could then be normalized to get the result I was looking for!
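The normalization used here is plain min-max scaling, (x - min) / (max - min). A small helper like the one below could avoid repeating the formula for each column. The function name `min_max` is made up for illustration, and returning zeros for a constant column is just one possible choice to avoid dividing by zero.

```r
# Min-max scaling: maps the smallest value to 0 and the largest to 1
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # A constant column has zero range; return zeros instead of dividing by 0
  if (rng[1] == rng[2]) {
    return(rep(0, length(x)))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

scaled <- min_max(c(2, 4, 6))
print(scaled)
```

Inside the pipeline this would read `mutate(N = min_max(negative), P = min_max(positive))`.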
I am trying to create a plot to compare year to year revenue, but I can't get it to work and don't understand why.
Consider my df:
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
group_by(Year, Month) %>%
ggplot(aes(x = Month, y = rev, fill = Year)) +
geom_line()
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
I don't really understand why this isn't working. What I want is two lines that go from January to October.
This should work for you:
library(tidyverse)
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year, group = Year)) +
geom_line()
It was just the grouping that went wrong because of the variable types. It might be useful to use lubridate for the dates (also a tidyverse package):
library(lubridate)
df %>%
mutate(Year = as.factor(year(date)), Month = month(date)) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year)) +
geom_line()
I think ggplot2 is confused because it doesn't recognise the format of your Month column, which is a character in this case. Try converting it to numeric:
... %>%
ggplot(aes(x = as.numeric(Month), y = rev, colour = Year)) +
....
Note that I replaced the word fill with colour, which I believe makes more sense for this chart.
Btw, I'm not sure the group_by statement is adding anything. I get the same chart with or without it.
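For context on the warning: when x is a character column, ggplot2 treats it as discrete and by default groups by the combination of all discrete variables in the plot, so every Month/Year pair becomes its own one-point group. The sketch below combines the two fixes from the answers (numeric month plus an explicit group) using only base R for the data preparation. The names `last_month` and `plot_df` are made up, and the plotting part is skipped if ggplot2 is not installed.

```r
set.seed(1)
df <- data.frame(
  date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
  rev = rnorm(22, 150, sd = 20)
)

# Year stays discrete (one line per year); Month becomes numeric so the
# x-axis is continuous and lines can connect the points
df$Year <- format(df$date, "%Y")
df$Month <- as.integer(format(df$date, "%m"))

# Keep only the months that exist in 2017 so both lines cover the same span
last_month <- max(df$Month[df$Year == "2017"])
plot_df <- df[df$Month <= last_month, ]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_df, aes(x = Month, y = rev, colour = Year, group = Year)) +
    geom_line() +
    # Label the numeric months with abbreviated month names
    scale_x_continuous(breaks = 1:last_month, labels = month.abb[1:last_month])
  print(p)
}
```

With a numeric Month the explicit `group = Year` is not strictly required, since Year is then the only discrete variable, but spelling it out makes the intent clear.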