Subset Data.Frame With Multiple Conditions - r

End Goal:
Create a plot for each region of StressCumulative, BaseCumulative, StressQoQ, and BaseQoQ over the date range from rows 1:167.
Problem:
I'm having difficulty subsetting my data.frame. My issue is that the condition by which I'm subsetting is logical, and thus will only return the first element after the condition.
subset_region_1 <- subset.data.frame(HPF, HPF$region == 1, select = BaseCumulative, HPF$StressCumulative, StressQoQ, BaseQoQ)
Warning messages:
1: In if (drop) warningc("drop ignored") :
the condition has length > 1 and only the first element will be used
2: drop ignored
This returns only the first column, BaseCumulative.
Data:
Here you get a glimpse of what I'm working with. This is the table I am looking to subset from. My data.frame is in a tall format.
I would like to create a subset in order to graph the BaseCumulative, StressCumulative, BaseQoQ, and StressQoQ variables over the range of dates from rows 1:167. The date column uses the same dates for all 100 regions. My issue is that when I go to plot in ggplot, I get an error that my aes mappings are not of the same size. In the full table the date column is 18370 rows long, but its values repeat every 167 rows (once per region). The BaseCumulative variable is also 18370 rows long, but its values are unique to each region, i.e. to each block of 167 rows. I want to know how I can subset by region while obtaining the correct number of rows for the variables I am interested in measuring.
Data Pts:
#Rows 1-3 (Region 1 Sample):
dput(head(HPF[1:3, ]))
structure(list(region = c(1, 1, 1), path = c(1, 1, 1), date = c(20140215,
20140515, 20140815), index_value = c(1, 1.033852765, 1.041697122
), index = 0:2, counter = 1:3, BaseQoQ = c(NA, 0.033852765, 0.00758749917354029
), BaseCumulative = c(100, 103.3852765, 104.1697122), StressCumulative = c(110,
113.3852765, 114.1697122), StressQoQ = c(NA, 0.0307752409090909,
0.00691832065162346)), .Names = c("region", "path", "date", "index_value",
"index", "counter", "BaseQoQ", "BaseCumulative", "StressCumulative",
"StressQoQ"), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"))
#Rows 168:200 (Region 2 Sample):
dput(head(HPF[168:200, ]))
structure(list(region = c(2, 2, 2, 2, 2, 2), path = c(1, 1, 1,
1, 1, 1), date = c(20140215, 20140515, 20140815, 20141115, 20150215,
20150515), index_value = c(1, 1.014162265, 1.01964828, 1.009372314,
1.007210703, 1.018695493), index = 0:5, counter = 1:6, BaseQoQ = c(NA,
0.014162265, 0.00540940556489744, -0.0100779515854232, -0.0021415398163972,
0.0114025694582001), BaseCumulative = c(100, 101.4162265, 101.964828,
100.9372314, 100.7210703, 101.8695493), StressCumulative = c(110,
111.4162265, 111.964828, 110.9372314, 110.7210703, 101.8695493
), StressQoQ = c(NA, 0.0128747863636363, 0.00492389230216839,
-0.00917785181610786, -0.00194849914020834, -0.0799443229370588
)), .Names = c("region", "path", "date", "index_value", "index",
"counter", "BaseQoQ", "BaseCumulative", "StressCumulative", "StressQoQ"
), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
Question:
How do I subset other columns in addition to specifying region == #? I have tried the following but then the issue is that values recycle for the dates and my charts are incorrect:
ggplot(HPF, aes(x = date, y= BaseCumulative, linetype = factor(region == 1))) +
geom_line() +
theme_light()
Further, I am also unsuccessful if I try to subset within the ggplot such as:
ggplot(HPF[HPF$region == 1, ], aes(x = HPF$date[1:167, ], y= HPF$BaseCumulative[1:167, ], linetype = factor(region == 1))) +
geom_line() +
theme_light()
Any help is appreciated.

I'm not entirely sure what you're trying to show in your plot; is this what you're after?
library(tidyverse);
df %>%
gather(what, value, 7:10) %>%
ggplot(aes(date, value, colour = what)) + geom_line() + theme_light()
Explanation: Convert your data from wide to long format, then pass what as a colour (or linetype) aesthetic to get different line plots for columns 7, 8, 9, 10 in one plot.
If you want separate plots for region, you could add + facet_wrap(~ as.factor(region)), e.g.
df %>%
gather(what, value, 7:10) %>%
ggplot(aes(date, value, colour = what)) + geom_line() + theme_light() + facet_wrap(~ as.factor(region))
Sample data
df1 <- structure(list(region = c(1, 1, 1), path = c(1, 1, 1), date = c(20140215,
20140515, 20140815), index_value = c(1, 1.033852765, 1.041697122
), index = 0:2, counter = 1:3, BaseQoQ = c(NA, 0.033852765, 0.00758749917354029
), BaseCumulative = c(100, 103.3852765, 104.1697122), StressCumulative = c(110,
113.3852765, 114.1697122), StressQoQ = c(NA, 0.0307752409090909,
0.00691832065162346)), .Names = c("region", "path", "date", "index_value",
"index", "counter", "BaseQoQ", "BaseCumulative", "StressCumulative",
"StressQoQ"), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"));
df2 <- structure(list(region = c(2, 2, 2, 2, 2, 2), path = c(1, 1, 1,
1, 1, 1), date = c(20140215, 20140515, 20140815, 20141115, 20150215,
20150515), index_value = c(1, 1.014162265, 1.01964828, 1.009372314,
1.007210703, 1.018695493), index = 0:5, counter = 1:6, BaseQoQ = c(NA,
0.014162265, 0.00540940556489744, -0.0100779515854232, -0.0021415398163972,
0.0114025694582001), BaseCumulative = c(100, 101.4162265, 101.964828,
100.9372314, 100.7210703, 101.8695493), StressCumulative = c(110,
111.4162265, 111.964828, 110.9372314, 110.7210703, 101.8695493
), StressQoQ = c(NA, 0.0128747863636363, 0.00492389230216839,
-0.00917785181610786, -0.00194849914020834, -0.0799443229370588
)), .Names = c("region", "path", "date", "index_value", "index",
"counter", "BaseQoQ", "BaseCumulative", "StressCumulative", "StressQoQ"
), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
df <- rbind.data.frame(df1, df2);
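For the subsetting question itself, here is a minimal base R sketch. It assumes a toy stand-in for HPF with made-up values. The select argument of subset takes a single vector of column names, so wrapping the names in c() keeps all of them. In the question's call, the extra unnamed arguments appear to have been matched to drop, which would explain the warning.

```r
# Toy stand-in for HPF (hypothetical values): two regions, three dates each
HPF <- data.frame(
  region = rep(1:2, each = 3),
  date = rep(c(20140215, 20140515, 20140815), times = 2),
  BaseCumulative = c(100, 103.4, 104.2, 100, 101.4, 102.0),
  StressCumulative = c(110, 113.4, 114.2, 110, 111.4, 112.0)
)

# Rows for region 1 only; all wanted columns go into ONE select vector
region_1 <- subset(HPF, region == 1,
                   select = c(date, BaseCumulative, StressCumulative))

# Equivalent bracket form with a character vector of column names
region_1b <- HPF[HPF$region == 1,
                 c("date", "BaseCumulative", "StressCumulative")]

# Optional: turn the yyyymmdd numbers into Date values for a proper time axis
region_1$date <- as.Date(as.character(region_1$date), format = "%Y%m%d")
```

Because the rows and columns are filtered together, every column in region_1 has the same length, which should avoid the mismatched aes lengths described in the question.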

Related

Making 2-way graph (ggplot2) out of a tabyl table changing values

male FALSE TRUE
0 50.0% 66.7%
1 50.0% 33.3%
structure(list(male = 0:1, `FALSE` = c("50.0%", "50.0%"), `TRUE` = c("66.7%",
"33.3%")), row.names = c(NA, -2L), core = structure(list(male = 0:1,
`FALSE` = c(1, 1), `TRUE` = c(4, 2)), class = "data.frame", row.names = c(NA,
-2L)), tabyl_type = "two_way", var_names = list(row = "male",
col = "dummy"), class = c("tabyl", "data.frame"))
How can I make a plot using ggplot2 of this table constructed with janitor? The thing is that I would like two plots side-by-side: one for dummy=TRUE and the other for dummy=FALSE (but changing the labels such that TRUE is replaced by a and FALSE by b -- i am having difficulties with this because TRUE and FALSE are logical). I would also like to replace the values 0 and 1 for c and d respectively.
You can try the tidyverse. The trick is to transform the data from wide to long, since this is the preferred input for ggplot. Here I used pivot_longer, but you can also use reshape or melt.
library(tidyverse)
df %>%
pivot_longer(-1) %>%
mutate(name = ifelse(name, "a", "b")) %>%
ggplot( aes(factor(male), value, fill =name)) +
geom_col(position = position_dodge())
Using base R you can try
# transform percentages to numerics
df$a <- as.numeric(gsub("%", "", df$`TRUE`))
df$b <- as.numeric(gsub("%", "", df$`FALSE`))
barplot(cbind(a, b) ~ male, df, beside=T,legend.text = TRUE)
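The question also asks for side-by-side panels and relabelled values. A rough sketch of one way to get there, assuming a plain data frame (called tab here, a hypothetical name) that mirrors the printed table: stack the two columns, parse the percentages into numbers, relabel with factor(), and facet on the relabelled column.

```r
library(ggplot2)

# Plain data frame mirroring the printed tabyl (percentages stored as text)
tab <- data.frame(male = 0:1,
                  `FALSE` = c("50.0%", "50.0%"),
                  `TRUE` = c("66.7%", "33.3%"),
                  check.names = FALSE)

# Stack the two dummy columns into long format and parse the percentages
long <- data.frame(
  male = rep(tab$male, times = 2),
  dummy = rep(c("FALSE", "TRUE"), each = nrow(tab)),
  pct = as.numeric(sub("%", "", c(tab[["FALSE"]], tab[["TRUE"]])))
)

# Relabel: TRUE -> a, FALSE -> b; 0 -> c, 1 -> d
long$dummy <- factor(long$dummy, levels = c("TRUE", "FALSE"), labels = c("a", "b"))
long$male <- factor(long$male, levels = c(0, 1), labels = c("c", "d"))

# One panel per dummy level, placed side by side
p <- ggplot(long, aes(male, pct)) +
  geom_col() +
  facet_wrap(~ dummy) +
  labs(y = "Percent")
```

After pivoting, the column names are character strings rather than logical values, so factor() with explicit levels and labels is enough to rename them.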

how to put a plot and a table together using grid.arrange

I have a plot and a table, and I would like to combine them into a single figure. How should I do that?
Here is my code:
df<-structure(list(AEDECOD = c("Hypoxia", "Malignant pleural effusion",
"Decubitus ulcer", "Nausea"), ADY = c(13, 13, 13, 14)), row.names = c(NA,
-4L), class = "data.frame")
tbl <-structure(list(`Analysis Relative Day` = 13, `AE Type` = "SER",
`Adverse Event` = "Hypoxia/Malignant pleural effusion"), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"))
p1<- ggplot(data =df, aes(x = ADY, y = AEDECOD))+ geom_point()
p2 <-grid.arrange(p1, tbl,
nrow = 2,as.table = TRUE)
print(p2)
I got this error:
Error: Input must be a vector, not a <viewport> object.
If you know any other way to do the same thing, I would like to learn that as well.
We may use ggarrange after converting the tibble to ggtexttable
library(ggpubr)
ggarrange(p1, ggtexttable(tbl), nrow = 2)
Or using the OP's code
library(gridExtra)
grid.arrange(p1, ggtexttable(tbl),
nrow = 2,as.table = TRUE)
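Another option, sketched here under the assumption that gridExtra is installed: grid.arrange lays out grobs and ggplot objects, and a tibble is neither, so the table has to be converted first. gridExtra's own tableGrob does that conversion. The sketch uses plain data frames with the question's values and arrangeGrob, which builds the layout without drawing it.

```r
library(ggplot2)
library(gridExtra)

df <- data.frame(
  AEDECOD = c("Hypoxia", "Malignant pleural effusion", "Decubitus ulcer", "Nausea"),
  ADY = c(13, 13, 13, 14)
)
tbl <- data.frame(`Analysis Relative Day` = 13,
                  `AE Type` = "SER",
                  `Adverse Event` = "Hypoxia/Malignant pleural effusion",
                  check.names = FALSE)

p1 <- ggplot(df, aes(x = ADY, y = AEDECOD)) + geom_point()

# Convert the table into a grob; rows = NULL drops the row-name column
tbl_grob <- tableGrob(tbl, rows = NULL)

# Build the two-row layout without drawing it
combined <- arrangeGrob(p1, tbl_grob, nrow = 2)

# To draw: grid.arrange(p1, tbl_grob, nrow = 2)
```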

execute different functions considering output in r

Let's say I have 2 different functions to apply, for example max and min. After applying a bunch of functions, I get the outputs below. I want to assign a function to each output.
Here is my data and its structure.
data<-structure(list(Apr = structure(list(`a1` = structure(list(
date = c("04-01-2036", "04-02-2036", "04-03-2036"), value = c(0,
3.13, 20.64)), .Names = c("date", "value"), row.names = 92:94, class = "data.frame"),
`a2` = structure(list(date = c("04-01-2037", "04-02-2037",
"04-03-2037"), value = c(5.32, 82.47, 15.56)), .Names = c("date",
"value"), row.names = 457:459, class = "data.frame")), .Names = c("a1",
"a2")), Dec = structure(list(`d1` = structure(list(
date = c("12-01-2039", "12-02-2039", "12-03-2039"), value = c(3,
0, 11)), .Names = c("date", "value"), row.names = 1431:1433, class = "data.frame"),
`d2` = structure(list(date = c("12-01-2064", "12-02-2064",
"12-03-2064"), value = c(0, 5, 0)), .Names = c("date", "value"
), row.names = 10563:10565, class = "data.frame")), .Names = c("d1",
"d2"))), .Names = c("Apr", "Dec"))
I applied these functions:
drop<-function(y){
lapply(y, function(x)(x[!(names(x) %in% c("date"))]))
}
q1<-lapply(data, drop)
q2<-lapply(q1, function(x) unlist(x,recursive = FALSE))
daily_max<-lapply(q2, function(x) lapply(x, max))
dailymax <- data.frame(matrix(unlist(daily_max), nrow=length(daily_max), byrow=TRUE))
row.names(dailymax)<-names(daily_max)
max_value <- apply(dailymax, 1, which.max)
And I'm getting
Apr Dec
2 1
And I am applying any random function to both Apr[2] and Dec[1] like:
Map(function(x, y) sum(x[[y]]), q2, max_value)
So the function is executed according to the outputs: on Apr's second element, which is a2, and on Dec's first element, which is d1. As you can see, the outputs are the numbers 1 and 2.
What I want
I want to assign a specific function to each of 1 and 2. If the output is 1, the max function should be executed; if it is 2, the min function. So the min function will be applied to Apr[2] and the max function to Dec[1].
I will get this:
min(q2$Apr$a2.value)
[1] 5.32
max(q2$Dec$d1.value)
[1] 11
How can I achieve this automatically for all my functions?
You can use switch here to apply a function based on the number in max_value.
apply_function <- function(x, num) switch(num, `1` = max, `2` = min)(x)
Map(function(x, y) apply_function(x[[y]], y), q2, max_value)
#$Apr
#[1] 5.32
#$Dec
#[1] 11
Map returns a list; if you want a vector output, use mapply.
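As a variation on the same idea, the functions can be kept in a list and picked by position, which may be easier to extend than switch when there are more than two. This is only a sketch: fns is a hypothetical name, and q2 and max_value are rebuilt by hand from the question's data so the snippet runs on its own.

```r
# Per-month value vectors, as q2 holds them after the date column is dropped
q2 <- list(
  Apr = list(a1.value = c(0, 3.13, 20.64), a2.value = c(5.32, 82.47, 15.56)),
  Dec = list(d1.value = c(3, 0, 11), d2.value = c(0, 5, 0))
)

# Index of the element with the largest maximum in each month
max_value <- c(Apr = 2L, Dec = 1L)

# Position in this list decides the function: 1 -> max, 2 -> min
fns <- list(max, min)

# mapply simplifies the result to a named vector
result <- mapply(function(x, y) fns[[y]](x[[y]]), q2, max_value)
```

Here result holds one value per month: the minimum of a2 for Apr and the maximum of d1 for Dec.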

Remove specific markers from legend

Sorry if this question has already been answered, but I could not find the solution to what I am after. I have a plot that uses both geom_line and geom_point. As a result, every legend entry shows both a line and a point when each should have one or the other. I want to keep the circles for the data tg1 and tg2 and remove the line, and do the opposite for the data full, i.e. keep the line but remove the circle. I have seen that something like this works when you want to remove the dots from all of the legend entries (Removing ggplot2's geom_point icons from the legend), but nothing that does it only for specific entries. Can anyone help? Thanks.
#code for plot
library(ggplot2)
library(tidypaleo)
ggplot(LGRSL, aes(x =mmsl , y = Age))+
coord_flip()+
theme_classic(12)+
geom_point(data=tg1,aes(x=mmslc,y=Year,col="Fort Denison 1"),pch=1,size=2)+
geom_point(data=tg2,aes(x=mmslc,y=Year,col="Fort Denison 2"),pch=1,size=2)+
geom_lineh(data = full, aes(x=Lutregalammslc,y=Year,col="Full budget"))+
scale_colour_manual(values=c("grey15","grey50","black"))
## data
## tg1
structure(list(Year = 1886:1891, SLR = c(6919L, 6935L, 6923L,
6955L, 6956L, 6957L), mmsl = c(-0.158, -0.142, -0.154, -0.122,
-0.121, -0.12), m = c(6.919, 6.935, 6.923, 6.955, 6.956, 6.957
), GIA.correction = c(-0.02814, -0.02793, -0.02772, -0.02751,
-0.0273, -0.02709), SLRc = c(6.89086, 6.90707, 6.89528, 6.92749,
6.9287, 6.92991), mmslc = c(-0.19667, -0.18046, -0.19225, -0.16004,
-0.15883, -0.15762)), row.names = c(NA, 6L), class = "data.frame")
##tg2
structure(list(Year = 1915:1920, SLR = c(7011L, 6929L, 6987L,
6945L, 6959L, 6951L), mmsl = c(-0.066, -0.148, -0.09, -0.132,
-0.118, -0.126), m = c(7.011, 6.929, 6.987, 6.945, 6.959, 6.951
), GIA.correction = c(-0.02205, -0.02184, -0.02163, -0.02142,
-0.02121, -0.021), SLRc = c(6.98895, 6.90716, 6.96537, 6.92358,
6.93779, 6.93), mmslc = c(-0.09858, -0.18037, -0.12216, -0.16395,
-0.14974, -0.15753)), row.names = c(NA, 6L), class = "data.frame")
##full
structure(list(Year = 1900:1905, Lutregala = c(-0.103609677,
-0.118603251, -0.134550791, -0.105553735, -0.103983082, -0.121731984
), Wapengo = c(-0.095213147, -0.096005337, -0.115700625, -0.097696891,
-0.084444784, -0.109161066), Tarra = c(-0.106672829, -0.109537943,
-0.135256365, -0.101357772, -0.089716518, -0.104258351), Lutregalammsl = c(-0.292863465,
-0.307857039, -0.323804579, -0.294807523, -0.29323687, -0.310985772
), Wapengommsl = c(-0.257028279, -0.257820469, -0.277515756,
-0.259512023, -0.246259916, -0.270976198), Tarrammsl = c(-0.30925682,
-0.312121933, -0.337840355, -0.303941762, -0.292300508, -0.306842342
), LgGIAc = c(-0.01921, -0.01904, -0.01887, -0.0187, -0.01853,
-0.01836), WapGIAc = c(-0.02486, -0.02464, -0.02442, -0.0242,
-0.02398, -0.02376), TarGIAc = c(-0.02373, -0.02352, -0.02331,
-0.0231, -0.02289, -0.02268), Lutregalammslc = c(-0.312073465,
-0.326897039, -0.342674579, -0.313507523, -0.31176687, -0.329345772
), Wapmmslc = c(-0.281888279, -0.282460469, -0.301935756, -0.283712023,
-0.270239916, -0.294736198), Tarmmslc = c(-0.33298682, -0.335641933,
-0.361150355, -0.327041762, -0.315190508, -0.329522342)), row.names = c(NA,
6L), class = "data.frame")
##LGRSL
structure(list(depths = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5), RSL = c(0.047746907,
0.025564293, 0.021733558, 0.007855661, -0.004909879, 0.01747051
), RSLerror = c(0.058158556, 0.057902654, 0.057988654, 0.057957388,
0.057905405, 0.057226072), Age = c(2017.456716, 2013.594255,
2006.92838, 1999.675523, 1994.729181, 1990.518154), Ageerror = c(0.373138707,
0.77640096, 1.430582242, 1.627131115, 3.222393394, 3.239674718
), mmsl = c(0.01993169, -0.002250924, -0.006081659, -0.019959556,
-0.032725096, -0.010344707)), row.names = c(NA, 6L), class = "data.frame")
##LGRSLgp
structure(list(Age = 1892:1897, mean = c(-0.298147401, -0.304630597,
-0.31023294, -0.315506983, -0.321225142, -0.327190675), error = c(0.051858047,
0.04985084, 0.047760525, 0.045624121, 0.043505044, 0.041477551
), min = c(-0.246289354, -0.254779758, -0.262472416, -0.269882862,
-0.277720098, -0.285713124), max = c(-0.350005447, -0.354481437,
-0.357993465, -0.361131103, -0.364730186, -0.368668226), x = c(-0.02125,
-0.02108, -0.02091, -0.02074, -0.02057, -0.0204), meangia = c(-0.276897401,
-0.283550597, -0.28932294, -0.294766983, -0.300655142, -0.306790675
), rate = c(NA, -4.967327, -4.946326, -4.964493, -4.977451, -4.911859
), raterror = c(NA, 3.581013, 3.796417, 4.022157, 4.226762, 4.255126
), mmsl = c(-0.325962618, -0.332445814, -0.338048157, -0.3433222,
-0.349040359, -0.355005892)), row.names = c(NA, 6L), class = "data.frame")
Here is a way.
Override the guide legend with a list of vectors of values for each of the aesthetics involved, shape and linetype. Note the different ways to specify what is to be removed.
I have also simplified the code a bit.
library(ggplot2)
library(dplyr)
library(tidypaleo) # provides geom_lineh(), as loaded in the question
colrs <- c("Fort Denison 1" = "grey15",
"Fort Denison 2" = "grey50",
"Full budget" = "black")
legnd <- list(shape = c(1, 1, NA),
linetype = c("blank", "blank", "solid"))
bind_rows(
tg1 %>% mutate(col = "Fort Denison 1"),
tg2 %>% mutate(col = "Fort Denison 2")
) %>%
ggplot(aes(x = mmslc, y = Year, colour = col)) +
geom_point(pch = 1, size = 2) +
geom_lineh(data = full, aes(x = Lutregalammslc, col = "Full budget"))+
scale_colour_manual(values = colrs,
guide = guide_legend(override.aes = legnd)) +
coord_flip() +
theme_classic(base_size = 12)
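The same override can be tried without the question's data or tidypaleo. The sketch below uses made-up data frames (pts and ln are hypothetical names) and the ordinary geom_line. The vectors in override.aes follow the order of the legend keys, which here is alphabetical.

```r
library(ggplot2)

# Made-up data: two point series and one line series
pts <- data.frame(x = c(1, 2, 3, 1, 2, 3),
                  y = c(1, 2, 1.5, 2, 3, 2.5),
                  series = rep(c("Gauge 1", "Gauge 2"), each = 3))
ln <- data.frame(x = 1:3, y = c(0.5, 1, 1.5), series = "Model")

p <- ggplot(mapping = aes(x, y, colour = series)) +
  geom_point(data = pts, shape = 1, size = 2) +
  geom_line(data = ln) +
  scale_colour_manual(
    values = c("Gauge 1" = "grey15", "Gauge 2" = "grey50", "Model" = "black"),
    # One value per legend key: points only for the gauges, line only for the model
    guide = guide_legend(override.aes = list(
      shape = c(1, 1, NA),
      linetype = c("blank", "blank", "solid")
    ))
  )
```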

Adding parameters to a ggplot produced plot in a function

Let's say I have a saved plot named my_plot, produced with ggplot. Also, let's say that the column in the my_plot[[1]] data frame used for the horizontal axis is named my_dates.
Now, I want to add some vertical lines to the plot, which, of course, can be done with something like this:
my_plot +
geom_vline(aes(xintercept = my_dates[c(3, 8)]))
Since I perform this task on a regular basis, I want to write a function for it, something like this:
ggplot.add_lines <- function(given_plot, given_points) {
finale <- given_plot +
geom_vline(aes(xintercept = given_plot[[1]]$my_dates[given_points]))
return(finale)
}
Which, as it's probably obvious to everyone, doesn't work:
> ggplot.add_lines(my_plot, c(3, 5))
Error in eval(expr, envir, enclos) : object 'given_plot' not found
So, my question would be what am I doing wrong, and how can it be fixed? Below is some data for a reproducible example:
> dput(my_plot)
structure(list(data = structure(list(my_dates = c(1, 2, 3, 4,
5, 6, 7, 8, 9, 10), my_points = c(-2.20176409422924, -1.12872396340683,
-0.259703895194354, 0.634233385649338, -0.678983982973015, -1.83157126614836,
1.33360095418957, -0.120455389285709, -0.969431974863616, -1.20451262626184
)), .Names = c("my_dates", "my_points"), row.names = c(NA, -10L
), class = "data.frame"), layers = list(<environment>), scales = <S4 object of class structure("Scales", package = "ggplot2")>,
mapping = structure(list(x = my_dates, y = my_points), .Names = c("x",
"y"), class = "uneval"), theme = list(), coordinates = structure(list(
limits = structure(list(x = NULL, y = NULL), .Names = c("x",
"y"))), .Names = "limits", class = c("cartesian", "coord"
)), facet = structure(list(shrink = TRUE), .Names = "shrink", class = c("null",
"facet")), plot_env = <environment>, labels = structure(list(
x = "my_dates", y = "my_points"), .Names = c("x", "y"
))), .Names = c("data", "layers", "scales", "mapping", "theme",
"coordinates", "facet", "plot_env", "labels"), class = c("gg",
"ggplot"))
According to this post, below is my solution to this problem. The environment issue in the **ply and ggplot is annoying.
ggplot.add_lines <- function(given_plot, given_points) {
finale <- eval(substitute( expr = {given_plot +
geom_vline(aes(xintercept = my_dates[given_points]))}, env = list(given_points = given_points)))
return(finale)
}
The following code runs well on my machine. (I cannot make your reproducible example work on my machine...)
df <- data.frame(my_dates = 1:10, val = 1:10)
my_plot <- ggplot(df, aes(x = my_dates, y = val)) + geom_line()
my_plot <- ggplot.add_lines(my_plot, c(3, 5))
print(my_plot)
Update: The above solution fails when more than two points are used.
It seems that we can easily solve this problem by leaving out aes (subsetting inside aes causes problems):
ggplot.add_lines <- function(given_plot, given_points) {
finale <- given_plot + geom_vline(xintercept = given_plot[[1]]$my_dates[given_points])
return(finale)
}
I would take the following approach: extract the data.frame of interest, and pass it to the new layer,
df <- data.frame(my_dates = 1:10, val = rnorm(10))
my_plot <- ggplot(df, aes(x = my_dates, y = val)) + geom_line()
add_lines <- function(p, given_points=c(3,5), ...){
d <- p[["data"]][given_points,]
p + geom_vline(data = d, aes(xintercept = my_dates), ...)
}
add_lines(my_plot, c(3,5), lty=2)
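The likely reason the first attempt failed is that aes() stores its expressions and evaluates them later against the layer's data, where a function-local name such as given_plot is not visible. Computing the positions up front and passing them as a plain vector avoids this. The sketch below generalises the idea; add_vlines and its column argument are hypothetical names, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper: `column` names the x variable stored in the plot's data
add_vlines <- function(p, column, idx, ...) {
  # Evaluate the positions now, inside the function, not later inside aes()
  x_values <- p$data[[column]][idx]
  p + geom_vline(xintercept = x_values, ...)
}

df <- data.frame(my_dates = 1:10, val = (1:10)^2)
my_plot <- ggplot(df, aes(x = my_dates, y = val)) + geom_line()

# Works for any number of positions
with_lines <- add_vlines(my_plot, "my_dates", c(3, 5, 8), linetype = 2)
```

Extra arguments such as linetype are forwarded to geom_vline through the dots.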
