I am looking to make a bar graph of medals in R. I have 3 distinct columns (gold, silver, bronze). The gold column has a total of 8, the silver has 10, and the bronze has 13.
For the code, I started writing: ggplot(data, aes(x=?)) + geom_bar()
I am not sure how to pass all 3 medal columns to the function where it shows x=?
Thanks
For plotting purposes, it is usually easier to work with long data than with wide data. Below I converted the data you mentioned in your comment to long format and plotted it as a grouped bar chart.
library(tidyverse)
# load data
raw_data <- structure(list(Rank = c(1, 2, 3, 4, 5, 6),
`Team/Noc` = c("United States of America", "People's Republic of China", "Japan", "Great Britain", "ROC", "Australia"),
Gold = c(39, 38, 27, 22, 20, 17),
Silver = c(41,32, 14, 21, 28, 7),
Bronze = c(33, 18, 17, 22, 23, 22),
Total = c(113, 88, 58, 65, 71, 46),
`Rank by Total` = c(1, 2, 5, 4, 3, 6)),
row.names = c(NA,-6L),
class = c("tbl_df", "tbl", "data.frame"))
# convert wide data to long
long_data <- raw_data %>%
pivot_longer(cols = -`Team/Noc`, names_to = 'Medal') %>% # convert wide data to long format
filter(Medal %in% c("Gold", "Silver", "Bronze")) # only select medal columns
# plot
ggplot(long_data) +
geom_col(aes(x = `Team/Noc`,
y = value,
fill = Medal),
position = "dodge" # grouped bars
)
Hope this gets you started!
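As a variation on the reshaping step above, here is a minimal sketch that passes only the three medal columns to pivot_longer(), so no filter() is needed afterwards. The two-row `medals` table and the `Team` and `Count` column names are made up for illustration.

```r
library(tidyr)
library(ggplot2)

# hypothetical two-team excerpt of the medal table
medals <- data.frame(
  Team   = c("Japan", "Australia"),
  Gold   = c(27, 17),
  Silver = c(14, 7),
  Bronze = c(17, 22)
)

# reshape only the medal columns: one row per team and medal type
long_medals <- pivot_longer(
  medals,
  cols      = c(Gold, Silver, Bronze),
  names_to  = "Medal",
  values_to = "Count"
)

# fix the bar and legend order instead of the alphabetical default
long_medals$Medal <- factor(long_medals$Medal,
                            levels = c("Gold", "Silver", "Bronze"))

p <- ggplot(long_medals, aes(x = Team, y = Count, fill = Medal)) +
  geom_col(position = "dodge")  # grouped bars
```

Calling print(p) draws the chart. Turning Medal into a factor is optional; it only controls the order in which the bars appear.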
I've been making these awful graphs in R with the very basic code below:
mydata %>%
mutate(week = week(date)) %>%
ggplot(aes(x = week))+
geom_freqpoly()
The data contain recorded events, in the standard date format, in all four weeks of a month. But as you can see in the picture, the graph dives to the bottom between the weeks, which makes it look awful. How can I make the graph go from one point to the next without this dive?
To reconstruct the data frame:
structure(list(ID = c(82, 23, 81, 76, 56, 17, 11, 50, 69, 84),
pvm = structure(c(1295395200, 1295222400, 1295395200, 1295654400,
1294272000, 1294272000, 1293926400, 1294185600, 1294012800,
1295222400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
Not sure if this is what you're looking for, but you can use geom_point and geom_line to produce a 'better graph'. I'm not sure what the data is meant to show or why you're using geom_freqpoly.
library(ggplot2)
Data <- structure(list(ID = c(82, 23, 81, 76, 56, 17, 11, 50, 69, 84),
pvm = structure(c(1295395200, 1295222400, 1295395200, 1295654400,
1294272000, 1294272000, 1293926400, 1294185600, 1294012800,
1295222400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,-10L), class = c("tbl_df", "tbl", "data.frame"))
# group belongs inside aes(); outside it, ggplot() silently ignores it
ggplot(Data, aes(x = pvm, y = ID, group = 1)) +
geom_point() +
geom_line()
geom_point and geom_line graph
New to this answering questions game but let me know if this isn't what you're looking for.
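If the goal is events per week rather than ID against date, one possible approach is to count the events yourself and draw a line through the counts. geom_freqpoly() bins the x values (30 bins by default), so the empty bins between whole week numbers are drawn as zero, which is the likely cause of the dive. The sketch below assumes the date column is `pvm`, as in the dput() output; `events`, `weekly` and `n_events` are names chosen for illustration.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# the ten rows from the question's dput() output
events <- data.frame(
  ID  = c(82, 23, 81, 76, 56, 17, 11, 50, 69, 84),
  pvm = as.POSIXct(c(1295395200, 1295222400, 1295395200, 1295654400,
                     1294272000, 1294272000, 1293926400, 1294185600,
                     1294012800, 1295222400),
                   origin = "1970-01-01", tz = "UTC")
)

# one row per week that actually contains events
weekly <- events %>%
  mutate(week = week(pvm)) %>%
  count(week, name = "n_events")

# the line now connects the weekly counts directly
p <- ggplot(weekly, aes(x = week, y = n_events)) +
  geom_line() +
  geom_point()
```

Weeks with no events are simply absent from `weekly`, so the line skips over them instead of dropping to zero. Whether that is the right picture depends on what the plot is meant to show.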
I'm trying to conditionally replace values in multiple columns based on a string match in a different column. I'd like to do this in a single line of code using the across() function, but I keep getting errors that don't quite make sense to me. I feel like there is probably a simple solution, so if anyone could point me in the right direction, that would be fantastic!
df <- data.frame("type" = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
"total" = c(34, 56, 75, 89, 21, 56),
"group_a" = c(30, 26, 45, 60, 3, 46),
"group_b" = c(4, 30, 30, 29, 18, 10))
# working but not concise
df %>%
mutate(total = ifelse(str_detect(type, "Park"), NA, total),
group_a = ifelse(str_detect(type, "Park"), NA, group_a),
group_b = ifelse(str_detect(type, "Park"), NA, group_b))
# concise but not working
df %>% mutate(across(total, group_a, group_b), ifelse(str_detect(type, "Park"), NA, .))
Update
We got a solution that works with my dummy dataset but is not working with my real data, so I am going to share a small snippet of my real data frame with the numbers changed and organization names hidden. When I run this line of code (df %>% mutate(across(c(Attempts, Canvasses, Completes)), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .))) on these data, I get the following error message:
Error: Problem with mutate() input ..2. x Input ..2 must be a
vector, not a formula object. i Input ..2 is
~ifelse(str_detect(long_name, "park-cemetery"), NA, .).
This is a small sample of the data that produces this error:
df <- structure(list(Org = c("OrgName", "OrgName", "OrgName", "OrgName",
"OrgName", "OrgName", "OrgName", "OrgName", "OrgName", "OrgName"
), nCode = c("M34", "R36", "R46", "X29", "M31", "K39", "Q12",
"Q39", "X41", "K27"), Attempts = c(100, 100, 100, 100, 100, 100,
100, 100, 100, 100), Canvasses = c(80, 80, 80, 80, 80, 80, 80,
80, 80, 80), Completes = c(50, 50, 50, 50, 50, 50, 50, 50, 50,
50), van_nocc_id = c(999, 999, 999, 999, 999, 999, 999, 999,
999, 999), van_name = c("M-Upper West Side", "SI-Rosebank", "SI-Tottenville",
"BX-park-cemetery-etc-Bronx", "M-Stuyvesant Town-Cooper Village",
"BK-Kensington", "Q-Broad Channel", "Q-Lindenwood", "BX-Wakefield",
"BK-East New York"), boro_short = c("M", "SI", "SI", "BX", "M",
"BK", "Q", "Q", "BX", "BK"), long_name = c("Upper West Side",
"Rosebank", "Tottenville", "park-cemetery-etc-Bronx", "Stuyvesant Town-Cooper Village",
"Kensington", "Broad Channel", "Lindenwood", "Wakefield", "East New York"
)), row.names = c(NA, -10L), class = "data.frame")
Final update
The curse of the misplaced closing bracket! Thanks to everyone for your help... the correct solution was df %>% mutate(across(c(Attempts, Canvasses, Completes), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .)))
If you use the newly introduced function across (which is the correct way to approach this task), you have to specify the function you want to apply inside across itself. In this case the function ifelse(...) has to be a purrr-style lambda (so starting with ~). Check out the across documentation and look for the arguments .cols and .fns.
df %>%
mutate(across(c(total, group_a, group_b), ~ifelse(str_detect(type, "Park"), NA, .)))
Output
# type total group_a group_b
# 1 Park NA NA NA
# 2 Neighborhood 56 26 30
# 3 Airport 75 45 30
# 4 Park NA NA NA
# 5 Neighborhood 21 3 18
# 6 Neighborhood 56 46 10
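To make the roles of .cols and .fns explicit, here is a minimal sketch of the same replacement with both arguments named. It assumes every numeric column should be blanked, hence where(is.numeric), and uses if_else() with NA_real_ so the column type stays double. The four-row `df` is a shortened copy of the question's dummy data, and `result` is a name chosen for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  type    = c("Park", "Neighborhood", "Airport", "Park"),
  total   = c(34, 56, 75, 89),
  group_a = c(30, 26, 45, 60),
  group_b = c(4, 30, 30, 29)
)

result <- df %>%
  mutate(across(
    .cols = where(is.numeric),                                  # which columns to touch
    .fns  = ~ if_else(str_detect(type, "Park"), NA_real_, .x)   # what to do to each one
  ))
```

Both arguments sit inside the across() call. Closing across() before the lambda is what triggers the "must be a vector, not a formula object" error quoted in the question.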
Here is a data.table solution.
require(data.table)
df <- data.frame("type" = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
"total" = c(34, 56, 75, 89, 21, 56),
"group_a" = c(30, 26, 45, 60, 3, 46),
"group_b" = c(4, 30, 30, 29, 18, 10))
setDT(df)
df[type == "Park", c("total", "group_a", "group_b") := NA]
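The exact match type == "Park" works for the dummy data. For the real data, where the string appears inside a longer name, a pattern match is needed. A possible sketch using data.table's %like% operator follows; the three-row `dt` is a made-up excerpt of the question's second data set.

```r
library(data.table)

# hypothetical excerpt of the real data from the question
dt <- data.table(
  long_name = c("Upper West Side", "park-cemetery-etc-Bronx", "Rosebank"),
  Attempts  = c(100, 100, 100),
  Canvasses = c(80, 80, 80),
  Completes = c(50, 50, 50)
)

cols <- c("Attempts", "Canvasses", "Completes")

# %like% does a regular-expression match, so partial names are caught;
# (cols) tells data.table to assign to the columns named in the vector
dt[long_name %like% "park-cemetery", (cols) := NA_real_]
```

The assignment happens by reference, so `dt` itself is modified and no reassignment is required.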
Update: that didn't take long to figure out! I just needed to place the columns in a vector and pass the function, as a ~ lambda, inside across() itself:
# concise AND working!
df %>% mutate(across(c(total, group_a, group_b), ~ifelse(str_detect(type, "Park"), NA, .)))
I had tried this initially but placed the columns in quotes... don't do that :)
I have a data frame with which I am learning tidyverse methods in R that looks like this:
> glimpse(data)
Observations: 16
Variables: 6
$ True.species <fct> Badger, Blackbird, Brown hare, Domestic cat, Domestic d...
$ misidentified <dbl> 17, 16, 59, 20, 12, 24, 28, 6, 3, 7, 191, 19, 110, 21, ...
$ missed <dbl> 61, 106, 7, 24, 16, 160, 110, 12, 15, 37, 200, 58, 259,...
$ Total <dbl> 78, 122, 66, 44, 28, 184, 138, 18, 18, 44, 391, 77, 369...
$ PrMissed <dbl> 0.7820513, 0.8688525, 0.1060606, 0.5454545, 0.5714286, ...
$ PrMisID <dbl> 0.21794872, 0.13114754, 0.89393939, 0.45454545, 0.42857...
Here is the dput():
data <- structure(list(True.species = structure(c(1L, 2L, 3L, 5L, 6L,
7L, 8L, 9L, 13L, 16L, 17L, 18L, 20L, 21L, 22L, 23L), .Label = c("Badger",
"Blackbird", "Brown hare", "Crow", "Domestic cat", "Domestic dog",
"Grey squirrel", "Hedgehog", "Horse", "Human", "Jackdaw", "Livestock",
"Magpie", "Muntjac", "Nothing", "Pheasant", "Rabbit", "Red fox",
"Red squirrel", "Roe Deer", "Small rodent", "Stoat or Weasel",
"Woodpigeon"), class = "factor"), misidentified = c(17, 16, 59,
20, 12, 24, 28, 6, 3, 7, 191, 19, 110, 21, 5, 13), missed = c(61,
106, 7, 24, 16, 160, 110, 12, 15, 37, 200, 58, 259, 473, 9, 17
), Total = c(78, 122, 66, 44, 28, 184, 138, 18, 18, 44, 391,
77, 369, 494, 14, 30), PrMissed = c(0.782051282051282, 0.868852459016393,
0.106060606060606, 0.545454545454545, 0.571428571428571, 0.869565217391304,
0.797101449275362, 0.666666666666667, 0.833333333333333, 0.840909090909091,
0.51150895140665, 0.753246753246753, 0.70189701897019, 0.95748987854251,
0.642857142857143, 0.566666666666667), PrMisID = c(0.217948717948718,
0.131147540983607, 0.893939393939394, 0.454545454545455, 0.428571428571429,
0.130434782608696, 0.202898550724638, 0.333333333333333, 0.166666666666667,
0.159090909090909, 0.48849104859335, 0.246753246753247, 0.29810298102981,
0.0425101214574899, 0.357142857142857, 0.433333333333333)), row.names = c(NA,
-16L), class = "data.frame")
I managed to make a rudimentary plot of what I want with ggplot() as follows:
ggplot(data = data, aes(x = True.species, y = PrMissed)) + geom_bar(stat = "identity")
But there are three things I can't figure out how to do:
I want a stacked bar chart where the variables PrMissed and PrMisID are on top of each other. Note that PrMissed + PrMisID == 1 for each row in the data frame, so the final plot would have equally high stacks but each containing two colors (how do I specify them?), one for PrMissed and another for PrMisID.
I want the order of the bars to be in ascending order of the PrMissed variable so that Brown hare would be on one end and Small rodent on the other.
I prefer this plot to be "flipped" on its side so that the labels (the animal names like "Brown hare") are on the left side and easier to read. An added complexity is that rather than the labels simply saying the animal name, I want them to say the corresponding Total value, so for example Brown hare would get a corresponding axis label like "Brown hare (total = 66)".
I've been trying for a long time and for the life of me couldn't figure out an idiomatic way to do this with ggplot(). I know the answer might be simple, so please excuse my ignorance. Can anyone help? Thanks in advance.
Here's my answer, which does not require data.table and relies on reshape2 together with tidyverse packages:
library(ggplot2)
library(reshape2)
library(magrittr)
library(dplyr)
# order Species by PrMissed value
data$True.species <- factor(data$True.species,
levels = data[order(data$PrMissed, decreasing = FALSE),"True.species"])
# reshape to have the stackable values and plot
melt(data,
id.vars = c("True.species", "misidentified", "missed", "Total"),
measure.vars = c("PrMissed", "PrMisID")) %>%
# build the axis labels and keep them in the same order as True.species
mutate(x_axis_text = paste0(True.species, " (Total = ", Total, ")"),
x_axis_text = reorder(x_axis_text, as.integer(True.species))) %>%
ggplot(aes(x = x_axis_text, y = value, fill = variable) ) +
geom_bar(stat = "identity") +
coord_flip()
Which would result in a plot like this
Break down of the code:
Your individual points are done like this.
1) To have stackable values, they need to be all in one column, so using melt from the reshape2 package we tidy the data and create 2 new columns. One is value, containing the values from 0 to 1, and the other is variable, indicating whether that number is associated with PrMissed or PrMisID.
2) Before melting the data we convert the True.species values into a factor ordered by the PrMissed values. Use decreasing = T to invert the order if you wish.
3) coord_flip() flips the x and y axes so that the species are on the y axis instead of the x axis and you can easily read them on the left side.
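The three points above can also be sketched with tidyverse packages only, including the colour choice that the question asks about. This is one possible version, not the only way: the three-row `animals` table is a made-up excerpt of the question's data, the `label` column name is chosen for illustration, and the two colours are arbitrary.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# three species taken from the question's data
animals <- data.frame(
  True.species  = c("Badger", "Brown hare", "Small rodent"),
  misidentified = c(17, 59, 21),
  missed        = c(61, 7, 473),
  Total         = c(78, 66, 494)
) %>%
  mutate(PrMissed = missed / Total,
         PrMisID  = misidentified / Total)

long <- animals %>%
  # axis label with the total, ordered by ascending PrMissed
  mutate(label = paste0(True.species, " (total = ", Total, ")"),
         label = reorder(label, PrMissed)) %>%
  # one row per species and proportion type
  pivot_longer(c(PrMissed, PrMisID),
               names_to = "variable", values_to = "value")

p <- ggplot(long, aes(x = label, y = value, fill = variable)) +
  geom_col() +                       # geom_col stacks by default
  scale_fill_manual(values = c(PrMissed = "firebrick",
                               PrMisID  = "steelblue")) +
  coord_flip() +
  labs(x = NULL, y = "Proportion")
```

Because the label is built before reshaping and turned into a factor by reorder(), the bar order survives the pivot. A plain character label would be sorted alphabetically by ggplot.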
I can help with a data.table and ggplot2 solution:
First, you'll need to make your wide table a long one with melt. Then, you're looking for the position = "stack" argument to geom_bar:
Also, please note that naming a table data is a bad idea, as there's a function called data().
require(data.table)
library(ggplot2)
# the question's data frame, converted to a data.table under the name df
df <- as.data.table(data)
ggplot(melt(df[, .(True.species, PrMissed, PrMisID)],
id.vars="True.species"),
aes(x = True.species, y = value, fill = variable))+
geom_bar(position = "stack", stat = "identity")
I forgot about the sorting... (and rotation of the text, so it is readable):
ggplot(melt(df[, .(True.species, PrMissed, PrMisID)],
id.vars="True.species"),
aes(x = True.species, y = value,
fill = variable))+
geom_bar(position = "stack", stat = "identity")+
theme(axis.text.x = element_text(angle = 90))+
# order the bars by ascending PrMissed, as the question asks
scale_x_discrete(limits = as.character(df[order(PrMissed), True.species]))
I have the chunk of code below where I am trying to fill the missing minutes in my data df_stuff by joining it to a time series which has all minutes for an entire year. I would actually like to aggregate this data at 15-minute intervals instead of by minute. Does anyone know a simple way of doing this? I was looking at to.minutes15 from the xts package but it seems to have problems with my POSIXct format time series.
Code:
library("sqldf")
##Filling Gaps in time by minute
myTZ <- "America/Los_Angeles"
tseries <- seq(as.POSIXct("2015-01-01 00:00:00", tz=myTZ),
as.POSIXct("2015-12-31 23:59:00", tz=myTZ), by="min")
df2 <- data.frame(SeqDateTime=tseries)
finaldf <- sqldf("select df2.SeqDateTime,
median(df_stuff.brooms) as broomsTot
from df2
left outer join df_stuff on df2.SeqDateTime = df_stuff.broomTime
group by df2.SeqDateTime
order by df2.SeqDateTime asc")
Data:
df_stuff <- structure(list(brooms = c(27, 53, 10, 55, 14, 49, 26,
13, 12, NA, NA, 23, 28, 31, NA, 46, NA, 13, NA, 33, 12, 4, 28,
34, 0, 24, 7, 31, 33, 37, 56, 41, 50, 55, 41, 15, 23, 26, 14,
27, 22, 41, 48, 19, 28, 11, 11, NA, 49, NA), broomTime = structure(c(1423970100,
1424122200, 1424136180, 1424035260, 1424141580, 1424122440, 1423274580,
1424129580, 1424146320, 1429129320, 1429032060, 1429142940, 1428705000,
1429142460, 1429128720, 1429204560, 1422909480, 1424137200, 1424042100,
1424149620, 1424131920, 1424108940, 1424144820, 1424040600, 1424119620,
1424148660, 1443593040, 1443657120, 1424125860, 1424223120, 1424235240,
1424232720, 1424234940, 1424234640, 1424230440, 1424115300, 1429208280,
1429131720, 1429148460, 1429151040, 1424129760, 1424125380, 1424123220,
1424137380, 1424115780, 1424219340, 1424131560, 1424233560, 1424224920,
1443640800), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("brooms",
"broomTime"), row.names = c(NA, 50L), class = "data.frame")
You can summarize by any time interval by using cut within the group_by function in dplyr.
library(dplyr)
ans <- finaldf %>%
group_by(SeqDateTime = cut(SeqDateTime, breaks = "15 min")) %>%
summarize(broomsTot = sum(as.numeric(broomsTot), na.rm = TRUE))
head(ans)
Source: local data frame [6 x 2]
SeqDateTime broomsTot
(fctr) (dbl)
1 2015-01-01 02:00:00 0
2 2015-01-01 02:15:00 0
3 2015-01-01 02:30:00 0
4 2015-01-01 02:45:00 0
5 2015-01-01 03:00:00 0
6 2015-01-01 03:15:00 0
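As an alternative to cut(), a minimal sketch that rounds each timestamp down to its quarter hour with lubridate's floor_date() and takes the median, as the SQL in the question does, instead of the sum. The four-row `readings` table is made up for illustration; on the real data the same pipeline would presumably be applied to finaldf.

```r
library(dplyr)
library(lubridate)

# hypothetical readings at irregular minutes
readings <- data.frame(
  broomTime = as.POSIXct(c("2015-01-01 00:01:00", "2015-01-01 00:07:00",
                           "2015-01-01 00:16:00", "2015-01-01 00:44:00"),
                         tz = "UTC"),
  brooms = c(10, 20, 30, 40)
)

by_quarter_hour <- readings %>%
  # 00:01 and 00:07 fall into 00:00, 00:16 into 00:15, 00:44 into 00:30
  group_by(interval = floor_date(broomTime, "15 minutes")) %>%
  summarize(broomsTot = median(brooms, na.rm = TRUE))
```

Here the grouping column stays a POSIXct value rather than becoming a factor, which can be convenient for later joins or plotting.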
I can assure you that xts does not have a problem with your POSIXct time series. xts uses POSIXct for its internal time index.
Here's how to join df_stuff with a 1-minute series and then aggregate that result to a 15-minute series.
library(xts)
# create xts object
xts_stuff <- with(df_stuff, xts(brooms, broomTime))
# merge with empty xts object that contains a regular 1-minute index
xts_stuff_1min <- merge(xts_stuff, xts(,tseries))
# aggregate to 15-minutes
ep15 <- endpoints(xts_stuff_1min, "minutes", 15)
final_df <- period.apply(xts_stuff_1min, ep15, median, na.rm=TRUE)