R: use a data frame's data to make a histogram

I am trying to generate a histogram from the data below:
There are five categories (screen.out to safety.out) and, at the end, a total (a count of the 1s in each category).
This is my target plot:
I don't know how to generate it. Can I plot just the totals, with all categories in one picture as in the second attachment, or is there another method?
Thanks for reading.

datatotal %>%
select(-complaindata) %>%
gather() %>%
ggplot(aes(key, value)) +
geom_col() +
labs(x = "x_name", y = "y_name")
This should give you the plot as designed in your image.

I've made an example with your data, but next time please don't post images; copy and paste the actual code and data instead.
Fake data:
library(dplyr)
data <- tibble(
screen.out = c(rep(1, 19), 0),
voice.out = c(rep(0, 15), rep(1,5)),
cs.out = c(rep(0,10), rep(1, 10))
) # this is just some fake data
The trick is to collapse all the columns (three here) into two: one holding the values and one holding the original column names. gather() does this below.
Plot:
library(tidyr) # provides gather()
library(ggplot2)
data %>%
gather("key", "value") %>% # you can change these names
ggplot(aes(key, value, fill = key)) + # as long as you update them here accordingly
geom_col()
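As a variation on the answers above, the totals can also be computed explicitly before plotting. This is only a sketch on the same fake data: it uses pivot_longer(), which supersedes gather() in current tidyr, and the names category, value and total are made up for the illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Fake 0/1 data with the same shape as in the answer above (not the asker's real data)
data <- tibble(
  screen.out = c(rep(1, 19), 0),
  voice.out  = c(rep(0, 15), rep(1, 5)),
  cs.out     = c(rep(0, 10), rep(1, 10))
)

# Long format first, then one row per category holding the count of 1s
totals <- data %>%
  pivot_longer(everything(), names_to = "category", values_to = "value") %>%
  group_by(category) %>%
  summarise(total = sum(value), .groups = "drop")

# One bar per category, bar height = total
p <- ggplot(totals, aes(category, total, fill = category)) +
  geom_col() +
  labs(x = "Category", y = "Number of 1s")
p
```

Summarising first keeps the totals available as a small table, which is handy if they should also be printed or labelled on the bars.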

Related

Arrange weekdays starting on Sunday

How can I arrange weekdays, starting on Sunday, in R? I got the weekdays using the weekdays() function, but the days appear in a seemingly random order (image attached) and I can't find a way to sort them. I tried the arrange function, but I guess it only works with numeric values. A bar chart looks very weird starting on Friday. This is what the code looks like:
my_dataset <- my_dataset %>%
mutate(weekDay = weekdays(Date))
my_dataset %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) %>%
ggplot(aes(x = weekDay, y = mean_steps)) + # y must be the summarised column
geom_bar(stat = "identity")
Thanks!
You wrote: "I tried the arrange function, but I guess it only works with numeric values."
arrange() also sorts character columns, but ggplot ignores the row order: your weekDay vector is probably of class character, and ggplot places character values on the axis in alphabetical order. The solution is to convert this character vector into a factor.
There are several ways to get the x-axis in the order you would like to see. All of them involve converting weekDay into a factor.
To come close to your example, I first created a data frame with weekdays and some data. Both are generated randomly, so a seed is set to make the code reproducible.
One method is to create the data frame of summaries and then define weekDay in it as a factor with explicit levels.
This can also be done within the ggplot call when creating the aesthetics.
library(tidyverse)
set.seed(111)
myData <- data.frame(
weekDay = sample(weekdays(Sys.Date() + 0:6), 100, replace = TRUE),
TotalSteps = sample(1000:8000, 100)
)
myData %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) -> DF # new data.frame
# The following defines weekDay as a factor and sets the order of its levels.
# ggplot uses this order to construct the x-axis.
# The level names must match the output of weekdays() in your locale (German here).
DF$weekDay <- factor(DF$weekDay, levels = c(
"Sonntag", "Montag",
"Dienstag", "Mittwoch",
"Donnerstag", "Freitag",
"Samstag"
))
ggplot(DF, aes(x = weekDay, y = mean_steps)) +
geom_bar(stat = "identity") +
labs(x="")
# the factor can also be defined within the ggplot-call
myData %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) %>%
ggplot(aes(x = factor(weekDay, levels = c(
"Sonntag", "Montag",
"Dienstag", "Mittwoch",
"Donnerstag", "Freitag",
"Samstag"
)), y = mean_steps)) +
geom_bar(stat = "identity") +
labs(x="")
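Because weekdays() returns names in the current locale, hard-coded level names only work on a machine with the same language settings. A possible locale-independent sketch is to derive the levels from a week of dates that is known to start on a Sunday (2023-01-01 was one). The name day_levels is made up for this illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(111)

# Sunday..Saturday in whatever language the current locale uses
day_levels <- weekdays(as.Date("2023-01-01") + 0:6)

# Random example data, as in the answer above
myData <- data.frame(
  weekDay = sample(day_levels, 100, replace = TRUE),
  TotalSteps = sample(1000:8000, 100)
)

DF <- myData %>%
  group_by(weekDay) %>%
  summarise(mean_steps = mean(TotalSteps), .groups = "drop") %>%
  mutate(weekDay = factor(weekDay, levels = day_levels)) # fixes the axis order

p <- ggplot(DF, aes(x = weekDay, y = mean_steps)) +
  geom_col() +
  labs(x = "")
p
```

If lubridate is already in use, wday(Date, label = TRUE, week_start = 7) should give a comparable ordered factor directly.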

How to specify ggplot legend order when you have multiple variables that are not all part of one column?

I'm plotting the same data by different time scales (Week, Month, Quarter, etc.) using ggplot, and as a result, I'm pulling the data from different columns. However, when I see my legend, I want it to be a specific order.
I know that if all the grouping variables were in one column, I could set it as an ordered factor, as explained here, but my data are spread across multiple columns. I also tried the suggestions here about re-ordering multiple geoms, but they didn't work.
Because my actual dataset is very complex, I've reproduced a smaller version that just has week and month data. The final answer should let me specify an explicit order, not just something like rev(), because my actual dataset has 6 columns that need a specific order.
Here's code to reproduce the problem. The first three chunks build the dataset, so only the fourth chunk, which makes the plot, should be relevant to the solution. By default, R shows 'Score - Month' first in the legend; I'd like to see how to make it second.
library(dplyr)
library(ggplot2)
library(lubridate)
#Generates week data -- shouldn't be relevant to troubleshoot
by_week <- tibble(Week = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="weeks"),
Week_score = c(sample(100:200, 79)),
Month = ymd(format(Week, "%Y-%m-01")))
#Generates month data -- shouldn't be relevant to troubleshoot
by_month <- tibble(Month = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="months"),
Month_score = c(sample(150:200, 19)))
#Joins data and removes duplications of month data for easier plotting -- shouldn't be relevant to troubleshoot
all_time <- by_week %>%
full_join(by_month) %>%
mutate(helper = across(c(contains("Month")), ~paste(.))) %>%
mutate(across(c(contains("Month")), ~ifelse(duplicated(helper), NA, .)), .keep="unused") %>%
mutate(Month = as.Date(Month))
#Makes plot - this is where I want the order in the legend to be different
all_time %>%
ggplot(aes(x = Week)) +
geom_line(aes(y= Week_score, colour = "Week_score")) +
geom_line(data=all_time[!is.na(all_time$Month_score),], aes(y = Month_score, colour = "Month_score")) + #This line tells R just to focus on non-missing values for Month_score
scale_colour_discrete(labels = c("Week_score" = "Score - Week", "Month_score" = "Score - Month"))
Here's what the current legend looks like. I want the order switched, with a solution that scales to more than 2 options. Thank you!
As @stefan rightly mentioned in the comments, you should list your legend entries in the desired order in the limits argument of scale_colour_discrete. You can add more columns yourself. For example:
library(dplyr)
library(ggplot2)
library(lubridate)
#Generates week data -- shouldn't be relevant to troubleshoot
by_week <- tibble(Week = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="weeks"),
Week_score = c(sample(100:200, 79)),
Month = ymd(format(Week, "%Y-%m-01")))
#Generates month data -- shouldn't be relevant to troubleshoot
by_month <- tibble(Month = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="months"),
Month_score = c(sample(150:200, 19)))
#Joins data and removes duplications of month data for easier plotting -- shouldn't be relevant to troubleshoot
all_time <- by_week %>%
full_join(by_month) %>%
mutate(helper = across(c(contains("Month")), ~paste(.))) %>%
mutate(across(c(contains("Month")), ~ifelse(duplicated(helper), NA, .)), .keep="unused") %>%
mutate(Month = as.Date(Month))
#Makes plot - this is where I want the order in the legend to be different
all_time %>%
ggplot(aes(x = Week)) +
geom_line(aes(y= Week_score, colour = "Week_score")) +
geom_line(data=all_time[!is.na(all_time$Month_score),], aes(y = Month_score, colour = "Month_score")) + #This line tells R just to focus on non-missing values for Month_score
scale_colour_discrete(labels = c("Week_score" = "Score - Week", "Month_score" = "Score - Month"), limits = c("Week_score", "Month_score"))
Output:
As you can see the order of the labels is changed.
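An alternative that may scale better to six columns is to reshape the score columns into long format, so that a single factor controls the legend order. The following is a sketch on a small made-up data set; scores, legend_labels and series are names invented for the illustration, not part of the original question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)

# Small made-up wide data set: one score column per time scale
scores <- tibble(
  Week = seq(as.Date("2011-01-01"), by = "week", length.out = 12),
  Week_score = sample(100:200, 12),
  Month_score = c(150, NA, NA, NA, 170, NA, NA, NA, 160, NA, NA, NA)
)

# Legend order and labels live in one named vector: names are the column
# names in the desired order, values are the labels to display
legend_labels <- c(Week_score = "Score - Week", Month_score = "Score - Month")

long <- scores %>%
  pivot_longer(-Week, names_to = "series", values_to = "score") %>%
  filter(!is.na(score)) %>% # drop the gaps in the monthly series
  mutate(series = factor(series, levels = names(legend_labels)))

p <- ggplot(long, aes(Week, score, colour = series)) +
  geom_line() +
  scale_colour_discrete(labels = legend_labels)
p
```

Adding another time scale then only requires one more column in the data and one more entry in legend_labels.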

create plot in ggplot for each unique value in a row in r

I have a dataframe like this:
library(tidyverse)
my_data <- tibble(name = c("Justin", "Janet", "Marisa"),
x = c(100, 50, 75),
y = c(2, 3, 6))
Each name is unique, and I want to make a bar graph for each person without having to do it line by line. I also want to save each plot as a separate object because I'll be inserting it into a PowerPoint file using the officer package. Finally, the names won't always be the same, but each name will always be unique.
For instance, I want one plot for Janet, one plot for Justin, and one plot for Marisa. I don't want them faceted but instead as their own objects.
Any thoughts?
We can get the data in long format first and for each individual name create the plot.
library(tidyverse)
long_data <- my_data %>% tidyr::pivot_longer(cols = -name, names_to = 'col')
plots_list <- map(unique(my_data$name), ~long_data %>%
filter(name == .x) %>%
ggplot() + aes(name, value, fill = col) +
geom_bar(stat = 'identity', position = 'dodge') +
scale_fill_manual(values = c('red', 'blue')) +
ggtitle(paste0('Plot for ', .x)))
This returns a list of plots, where individual plots can be accessed via plots_list[[1]], plots_list[[2]], etc.
plots_list[[1]]
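Since the names change between data sets, it may be more convenient to look plots up by name rather than by position. A sketch of that idea, assuming the same my_data as in the question: split() returns a list named after each person, and map() keeps those names. plots_by_name is a name made up here.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

my_data <- tibble(name = c("Justin", "Janet", "Marisa"),
                  x = c(100, 50, 75),
                  y = c(2, 3, 6))

long_data <- my_data %>%
  pivot_longer(cols = -name, names_to = "col")

# One data frame per person, in a list named by person
per_person <- split(long_data, long_data$name)

# One plot per person; the list names carry over from per_person
plots_by_name <- map(per_person, function(d) {
  ggplot(d, aes(name, value, fill = col)) +
    geom_col(position = "dodge") +
    ggtitle(paste0("Plot for ", d$name[1]))
})

plots_by_name[["Janet"]]
```

names(plots_by_name) then lists who has a plot, which can be looped over when adding slides.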

How can I set my own tick labels in ggplot while plotting factor values of time series?

So, I am plotting some time series in ggplot and on the x axis I got some date/time data. Data from 2008 to 2016. The problem is that dates are not continuous and for instance the last date of 2008 is
2008/05/14 19:05:12
and the next date is for 2009 something like this
2009/03/24 10:17:54
While plotting these, the result is the following
To get rid of the empty spaces, I turn my dates into factors with
dates <- factor(dates), which gives the correct plot.
But after that I am unable to set the x tick labels; they don't change when I use
scale_x_continuous(breaks = c(1,1724,2283,5821,8906,10112,10156,14875 ),
labels = c("2008","2009","2010","2011","2012","2013","2014","2015"))
How can I change them?
This raises a few problems, and the solution really depends on what you're looking for. I'd suggest you post some sample data and your code so far to get a more precise answer, but here's a possibility in the meantime:
Your graph above does not show a continuous scale (though it may look like one); it shows a discrete scale with one level per unique date observation. Two problems come out of this:
scale_x_continuous won't work on a discrete axis, and the year breaks would not be evenly spread anyway
your data looks smoothly spread, but it isn't, which isn't a good principle for visualisation.
If what you're trying to do is show change year by year, you could sort all of your data into yearly 'bins' and plot:
library(tidyverse)
library(lubridate)
# creating random data
df <- tibble(date = as_datetime(runif(1000, as.numeric(as_datetime("2001/01/24 09:30:43")), as.numeric(as_datetime("2006/02/24 09:30:43")))))
df["val"] <- rnorm(nrow(df), 25, 5)
# use lubridate to extract year as new variable, and plot grouped years
df %>%
mutate(year = factor(year(date))) %>%
ggplot(aes(year, val)) +
geom_point(position = "jitter")
Another possibility could be to use a colour scale to note your groupings by year, keeping all the dates in order but removing the gaps (and therefore not using a continuous x-axis scale):
df %>% # begin by simulating a data 'gap'
filter(date>as_datetime("2003/07/24 09:30:43")|date<as_datetime("2002/09/24 09:30:43")) %>%
mutate(year = factor(year(date)), # 'year' to select colour
date = factor(date)) %>%
ggplot(aes(date, val, col = year)) +
geom_point() +
theme(axis.ticks.x = element_blank(), # removes all ticks and labels, as too many unique times
axis.text.x = element_blank())
If neither of those is helpful, do comment below with any clarification of what you're looking for, and I'll see if I can help!
Edit: One last idea, you could create an invisible series of points which act as the breaks for your axis ticks:
blank_labels <- tibble(date = as_datetime(c("20020101 000000",
"20030101 000000",
"20040101 000000",
"20050101 000000",
"20060101 000000")),
col = "NA", val = 0)
df2 <- df %>%
filter(date>as_datetime("2003/07/24 09:30:43")|date<as_datetime("2002/09/24 09:30:43")) %>%
mutate(col = "black") %>%
bind_rows(blank_labels) %>%
mutate(date_fac = factor(date))
tick_values <- left_join(blank_labels, df2, by = c("date", "col"))
df2 %>%
ggplot(aes(date_fac, val, col = col)) +
geom_point() +
scale_x_discrete(breaks = tick_values$date_fac, labels = c("2002", "2003", "2004", "2005", "2006")) +
scale_color_identity()
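A shorter route to year labels on the factor axis might be to use the first observation of each year as a break in scale_x_discrete, with no invisible helper points. This is a sketch on simulated data; first_per_year and date_fac are names chosen for the illustration, and the data range is made up.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

set.seed(42)

# Simulated irregular timestamps between 2008 and 2010
df <- tibble(
  date = as_datetime(runif(200,
                           as.numeric(as_datetime("2008-01-01")),
                           as.numeric(as_datetime("2010-12-31")))),
  val = rnorm(200, 25, 5)
) %>%
  arrange(date) %>%
  mutate(date_fac = factor(date), # discrete axis: one level per timestamp
         year = year(date))

# Earliest observation of each year; its factor level becomes the tick position
first_per_year <- df %>%
  group_by(year) %>%
  slice_min(date, n = 1, with_ties = FALSE) %>%
  ungroup()

p <- ggplot(df, aes(date_fac, val)) +
  geom_point() +
  scale_x_discrete(breaks = as.character(first_per_year$date_fac),
                   labels = as.character(first_per_year$year))
p
```

The breaks have to be level names of the factor, which is why numeric positions passed to scale_x_continuous had no effect in the question.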

R: using ggplot2 with a group_by data set

I can't quite figure this out. A CSV of 200+ rows assigned to data like so:
gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937
Trying to group by p1_id and plot the mean values for p1_x and p1_y:
grp <- data %>% group_by(p1_id)
Trying to plot geom_point objects like so:
geom_point(aes(mean(grp$p1_x), mean(grp$p1_y), color=grp$p1_id))
But that isn't showing unique plot points per distinct p1_id values.
What's the missing step here?
Why not calculate the mean first?
library(dplyr)
grp <- data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y))
Then plot:
library(ggplot2)
ggplot(grp, aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
Edit: As per @eipi10, you can also pipe directly into ggplot:
data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y)) %>%
ggplot(aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
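With more than a couple of coordinate columns, across() can compute all the means in one step. A sketch using only the four sample rows from the question; the .names pattern is a choice made here, giving columns mean_p1_x and mean_p1_y.

```r
library(dplyr)
library(ggplot2)

# The four sample rows from the question, read from inline text
data <- read.csv(text = "gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937")

# One row per p1_id, with the mean of every listed column
grp <- data %>%
  group_by(p1_id) %>%
  summarise(across(c(p1_x, p1_y), mean, .names = "mean_{.col}"),
            .groups = "drop")

p <- ggplot(grp, aes(x = mean_p1_x, y = mean_p1_y)) +
  geom_point(aes(color = as.factor(p1_id)))
p
```

The key point is the same as in the answer above: the means are computed per group before plotting, so each p1_id contributes exactly one point.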
