(R) Line in qplot: mean of y for every x

I am trying to make a qplot in R. I know I could reformat my data, but I would like to do it within the qplot itself, since I plan to connect it to Shiny at a later date.
Before the problem, this is my data:
Date | Price | Postnr
2016-08-01 5000 101
2016-08-01 4300 103
2016-07-01 7000 105
2016-07-01 4500 105
2016-07-01 3000 103
2016-06-01 3900 101
2016-06-01 2700 103
2016-06-01 2900 105
2016-05-01 7100 101
I am trying to create a line plot, grouped by Postnr.
My problem is:
I want Date on the x-axis and Price on the y-axis, with each plotted point being the average Price for that day, but I have no idea how to do this within the qplot itself.
-Edit-
Included reproducible data
mydata <- structure(list(Date = structure(c(4L, 4L, 3L, 3L, 3L, 2L,
2L, 2L, 1L), .Label = c("2016-05-01", "2016-06-01", "2016-07-01",
"2016-08-01"), class = "factor"), Price = c(5000L, 4300L, 7000L,
4500L, 3000L, 3900L, 2700L, 2900L, 7100L), Postnr = c(101L, 103L,
105L, 105L, 103L, 101L, 103L, 105L, 101L)), .Names = c("Date",
"Price", "Postnr"), row.names = c(NA, 9L), class = "data.frame")

After Ian Fellows got me on the right path, I finally found what I was looking for:
library(ggplot2)
# 'fun' replaces the 'fun.y' argument, which is deprecated since ggplot2 3.3.0
ggplot(data = mydata,
       aes(x = Date, y = Price, colour = Postnr, group = Postnr)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line")
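To check what stat_summary() is plotting, the same per-date, per-Postnr means can be computed by hand. This is only a sketch in base R using the data from the question; the name `means` is made up for illustration, and Date is kept as plain text here rather than as a factor.

```r
# Data from the question (Date kept as character for simplicity)
mydata <- data.frame(
  Date   = c("2016-08-01", "2016-08-01", "2016-07-01", "2016-07-01", "2016-07-01",
             "2016-06-01", "2016-06-01", "2016-06-01", "2016-05-01"),
  Price  = c(5000, 4300, 7000, 4500, 3000, 3900, 2700, 2900, 7100),
  Postnr = c(101, 103, 105, 105, 103, 101, 103, 105, 101)
)

# Mean Price for every Date/Postnr combination that occurs in the data
means <- aggregate(Price ~ Date + Postnr, data = mydata, FUN = mean)
print(means)
```

The two 2016-07-01 rows for Postnr 105 (7000 and 4500) collapse into a single mean of 5750, which is the value the plot draws for that date and group.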

Is this the idea you are looking for, @Atius?
library(ggplot2)
# random example data
date = runif(100,0,10)+as.Date("1980-01-01")
Price = runif(100,0,5000)
Postnr = runif(100,101,105)
dataFrame =data.frame(date=date, Price=Price, Postnr=Postnr)
d <- ggplot(dataFrame, aes(date, Price))
d + geom_point()
# 'fun' replaces the 'fun.y' argument, which is deprecated since ggplot2 3.3.0
d + stat_summary_bin(aes(y = Postnr), fun = "mean", geom = "point")

Related

How do I put columns on the x-axis?

Please help. I am trying to put all the columns on the x-axis and then make side-by-side bars by date.
This is my data. I really tried, but to no avail.
dateVisited hh_visited hh_ind_confirmed new_in_mig out_mig deaths HOH_death Preg_Obs Preg_Outcome child_forms
102 2020-07-21 292 1170 131 86 18 7 3 14 79
103 2020-07-22 400 1553 115 100 25 10 11 18 107
104 2020-07-23 381 1458 103 67 21 9 5 23 87
105 2020-07-24 345 1379 90 98 12 4 3 20 89
106 2020-07-25 436 1585 131 119 13 2 7 20 117
107 2020-07-26 0 0 0 0 0 0 0 0 0
I think you're looking for something like this:
library(tidyr)
library(ggplot2)
df %>%
  pivot_longer(cols = -1) %>%
  ggplot(aes(name, value)) +
  geom_col(aes(fill = dateVisited), width = 0.6,
           position = position_dodge(width = 0.8)) +
  guides(x = guide_axis(angle = 45))
Reproducible Data from question
df <- structure(list(dateVisited = structure(1:6, .Label = c("2020-07-21",
"2020-07-22", "2020-07-23", "2020-07-24", "2020-07-25", "2020-07-26"
), class = "factor"), hh_visited = c(292L, 400L, 381L, 345L,
436L, 0L), hh_ind_confirmed = c(1170L, 1553L, 1458L, 1379L, 1585L,
0L), new_in_mig = c(131L, 115L, 103L, 90L, 131L, 0L), out_mig = c(86L,
100L, 67L, 98L, 119L, 0L), deaths = c(18L, 25L, 21L, 12L, 13L,
0L), HOH_death = c(7L, 10L, 9L, 4L, 2L, 0L), Preg_Obs = c(3L,
11L, 5L, 3L, 7L, 0L), Preg_Outcome = c(14L, 18L, 23L, 20L, 20L,
0L), child_forms = c(79L, 107L, 87L, 89L, 117L, 0L)), class = "data.frame",
row.names = c("102", "103", "104", "105", "106", "107"))
Your data cannot be used easily, since it takes time to format it into something that can be ingested by R. Here is something to get you started. I made up a hypothetical data frame of 4 columns that resembles your data, used the melt function from the reshape2 package to reshape it into the long format that ggplot2 expects, and used ggplot2 to generate a bar plot.
library(ggplot2)
df <- data.frame(dateVisited = seq(as.Date('2019-01-01'), as.Date('2019-12-31'), 30),
                 hh_visited = runif(13, 0, 436),
                 hh_ind_confirmed = runif(13, 0, 1585),
                 new_in_mig = runif(13, 0, 131))
# one row per date/variable pair
df <- reshape2::melt(df, id.vars = 'dateVisited')
ggplot(data = df, aes(x = dateVisited, y = value, fill = variable)) +
  geom_col(position = 'dodge')
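For anyone unsure what "long format" means here, the reshaping step can be sketched in base R on two rows of the question's data. The column names `variable` and `value` are chosen to mimic melt's defaults, and `wide`, `long` and `measure_cols` are just illustrative names, not anything from the packages.

```r
# Two rows and two measure columns taken from the question's data
wide <- data.frame(
  dateVisited = as.Date(c("2020-07-21", "2020-07-22")),
  hh_visited  = c(292L, 400L),
  deaths      = c(18L, 25L)
)

# Every column except the id column becomes a (variable, value) pair
measure_cols <- setdiff(names(wide), "dateVisited")

long <- data.frame(
  dateVisited = rep(wide$dateVisited, times = length(measure_cols)),
  variable    = rep(measure_cols, each = nrow(wide)),
  value       = unlist(wide[measure_cols], use.names = FALSE)
)
print(long)
```

Each date now appears once per measure, so a single column (`variable`) can be mapped to x or fill while `value` supplies the bar height.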

How would I color values in a scatterplot in ggplot2 IF the variable is defining how it is plotted?

I have the following dataset:
Species Country IUCN_Area IUCN.Estimate Estimate.year
1 Reticulated Kenya Embu 0 2018
2 Reticulated Kenya Laikipia_Isiolo_Samburu 3043 2018
3 Reticulated Kenya Marsabit 625 2018
4 Reticulated Kenya Meru 999 2018
5 Reticulated Kenya Turkana 0 2018
6 Reticulated Kenya West Pokot 0 2018
GEC_Stratum_Detect_Estimate UpperCI_detect LowerCI_detect
1 130 277 -17
2 16414 19919 12910
3 57 347 -233
4 4143 6232 2054
5 0 0 0
6 0 0 0
I want to create a scatterplot which has on the x-axis "IUCN Estimate", and on the y-axis the "GEC_Stratum_Detect_Estimate". I then want to color the dots by type, i.e. "IUCN" and "GEC". However, how would I color the dots by their type, if the variables are defining the axes? I'm pretty sure there must be a simple code to layer on, but it's been stumping me so far. I've also tried rejigging the dataset but haven't managed to get anywhere. Here's the plot code:
ggplot(df, aes(x=IUCN.Estimate, y=GEC_Stratum_Detect_Estimate, shape=Species)) +
geom_point() +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
theme_classic()
And here is the data:
structure(list(Species = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Maasai",
"Reticulated", "Southern"), class = "factor"), Country = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Botswana", "Kenya", "Tanzania"
), class = "factor"), IUCN_Area = structure(c(4L, 10L, 12L, 13L,
23L, 25L), .Label = c("Burigi-Biharamulo", "Central District",
"Chobe", "Embu", "Kajiado", "Katavi-Rukwa", "Kilifi", "Kitui",
"Kwale", "Laikipia_Isiolo_Samburu", "Makueni/ Machakos", "Marsabit",
"Meru", "Moremi GR", "Narok", "Ngamiland", "No IUCN Estimate",
"Nxai and Makgadikgadi", "Ruahu-Rungwa-Kisigo", "Selous-Mikumi",
"Taita Taveta", "Tana River", "Turkana", "Ugalla GR", "West Pokot"
), class = "factor"), IUCN.Estimate = c(0L, 3043L, 625L, 999L,
0L, 0L), Estimate.year = c(2018L, 2018L, 2018L, 2018L, 2018L,
2018L), GEC_Stratum_Detect_Estimate = c(130L, 16414L, 57L, 4143L,
0L, 0L), UpperCI_detect = c(277L, 19919L, 347L, 6232L, 0L, 0L
), LowerCI_detect = c(-17L, 12910L, -233L, 2054L, 0L, 0L)), row.names = c(NA,
6L), class = "data.frame")
Thank you in advance.
The following will color the dots by the IUCN_Area:
library(ggplot2)
ggplot(df, aes(x=IUCN.Estimate, y=GEC_Stratum_Detect_Estimate, shape=Species)) +
geom_point(aes(color=IUCN_Area)) +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
theme_classic()
And the following colors them by IUCN.Estimate. Because IUCN.Estimate is numeric, ggplot2 uses a continuous color gradient by default, whereas the factor IUCN_Area above gets discrete colors.
library(ggplot2)
ggplot(df, aes(x=IUCN.Estimate, y=GEC_Stratum_Detect_Estimate, shape=Species)) +
geom_point(aes(color=IUCN.Estimate)) +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
theme_classic()
As the OP asked to color by both IUCN and GEC, the following does that, though how to interpret the result may be another matter. Any value can be mapped to color; here, I've added the two estimates together and wrapped the sum in as.factor(). Presumably, in large datasets, the sum might identify a group of note.
library(ggplot2)
ggplot(df, aes(x=IUCN.Estimate, y=GEC_Stratum_Detect_Estimate, shape=Species)) +
geom_point(aes(color=as.factor(IUCN.Estimate+GEC_Stratum_Detect_Estimate))) +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
theme_classic()
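Another way to read "color by type" is to stack the two estimates into one column, so that the source (IUCN or GEC) becomes a variable of its own. The sketch below does that in base R on the first four areas from the question; `Source`, `Estimate` and `long` are made-up names. Note that this leads to a different plot from the one asked about (for example, area on the x-axis and `Estimate` on the y-axis, colored by `Source`), since a single point cannot belong to both types when the two estimates are its x and y coordinates.

```r
# First four areas from the question's data
df <- data.frame(
  IUCN_Area = c("Embu", "Laikipia_Isiolo_Samburu", "Marsabit", "Meru"),
  IUCN.Estimate = c(0L, 3043L, 625L, 999L),
  GEC_Stratum_Detect_Estimate = c(130L, 16414L, 57L, 4143L)
)

# Stack the two estimate columns and label each row with its source
long <- rbind(
  data.frame(IUCN_Area = df$IUCN_Area, Source = "IUCN",
             Estimate = df$IUCN.Estimate),
  data.frame(IUCN_Area = df$IUCN_Area, Source = "GEC",
             Estimate = df$GEC_Stratum_Detect_Estimate)
)
print(long)
```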

Multiple entries with same patient id: how to make it one entry

I have multiple entries in my data from the same patient ID, and I want to make them one entry. What are my options? Here is the data:
PtID WorryHighBGNow
40 5
40 1
40 2
70 3
101 4
263 2
263 5
263 3
143 4
245 4
137 3
219 2
219 3
219 4
3 3
264 3
264 3
98 1
200 3
105 3
111 4
149 3
I want to create a visualization like the one below from this data, where the y-axis shows the columns of my table and the x-axis shows the rankings 1, 2, 3, 4, 5.
If x is your data frame, you can try this:
library(data.table)
d <- setDT(x)[, list(WorryHighBGNow = paste(WorryHighBGNow, collapse = ', ')), by = c('PtID')]
It will give a result like:
PtID WorryHighBGNow
40 5, 1, 2
70 3
101 4
263 2, 5, 3
And so on.
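If a single number per patient is more useful than a pasted string, another option is to summarise the repeated entries. Whether the mean (or the max, or the latest value) is the right summary is an assumption that depends on the study; the sketch below uses the mean and base R only, with `per_patient` as an illustrative name.

```r
# First few patients from the question's data
x <- data.frame(
  PtID           = c(40L, 40L, 40L, 70L, 101L, 263L, 263L, 263L),
  WorryHighBGNow = c(5L, 1L, 2L, 3L, 4L, 2L, 5L, 3L)
)

# One row per patient, holding the mean of that patient's scores
per_patient <- aggregate(WorryHighBGNow ~ PtID, data = x, FUN = mean)
print(per_patient)
```

Swapping `FUN = mean` for `FUN = max` or `FUN = median` changes the summary without changing the shape of the result.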
I'm not really sure this is what you need. I've just tried to mimic the visualization you linked in the question as closely as possible.
library(tidyverse)
dat %>%
mutate_all(factor) %>%
count(WorryHighBGNow) %>%
mutate(percentage = round(n / sum(n) * 100, 1)) %>%
mutate(WorryHighBGNow = reorder(WorryHighBGNow, n)) %>%
ggplot(aes(x = WorryHighBGNow, y = percentage,
fill = WorryHighBGNow, label = paste(percentage, '%'))) +
geom_col() +
geom_text(hjust = -.1, fontface = 'bold') +
scale_fill_brewer(type = 'qual', breaks = 1:5) +
coord_flip() +
expand_limits(y = 50) +
theme_void() +
theme(legend.position = 'bottom')
Data:
dat <- structure(
list(
PtID = c(40L, 40L, 40L, 70L, 101L, 263L, 263L, 263L, 143L, 245L, 137L, 219L,
219L, 219L, 3L, 264L, 264L, 98L, 200L, 105L, 111L, 149L),
WorryHighBGNow = c(5L, 1L, 2L, 3L, 4L, 2L, 5L, 3L, 4L, 4L, 3L, 2L, 3L, 4L,
3L, 3L, 3L, 1L, 3L, 3L, 4L, 3L)
),
class = "data.frame", row.names = c(NA, -22L)
)

Interaction plot in R

I want from the following dataset:
ID Result Days Position
1 70 0 1
1 80 23 1
2 90 15 2
2 89 30 2
2 99 40 2
3 23 24 1
etc...
I want to make 2 spaghetti plots from it: one for those in position 1 and one for those in position 2. I tried a "for & if" loop, but I just got the mixed plot many times. I am also using ggplot.
dfPr <- df[df$Progress == 1, ]  # the comma makes this select rows, not columns
x11()
ggplot(dfPr, aes(x=OrderToFirstBx, y=result.num, color=factor(MRN))) +
geom_line() + theme_bw() + xlab("Time in Days") + ylab("ALT")
This worked! But if you have another solution please tell me.
Thank you.
You gave very limited example data, and your sample code doesn't match the variable names in your sample data, which makes it very hard to tell exactly what you want.
If you want two separate plots, using facets might be the easiest. Try
#sample data
dfPr <- structure(list(ID = c(1L, 1L, 2L, 2L, 2L, 3L), Result = c(70L,
80L, 90L, 89L, 99L, 23L), Days = c(0L, 23L, 15L, 30L, 40L, 24L
), Position = c(1L, 1L, 2L, 2L, 2L, 1L)), .Names = c("ID", "Result",
"Days", "Position"), class = "data.frame", row.names = c(NA,
-6L))
ggplot(dfPr, aes(x=Days, y=Result, group=ID)) +
geom_line() + facet_wrap(~Position)

Plot Bar Chart in R

I have some data with, for example, two columns. The first column is continuous and the second is a binary value (t|f). I want to plot this as a bar chart in R. I want to group the numbers in the first column into categories like 0-100, 101-200, ..., and then plot the number of t's on the y-axis. I am using ggplot2, but I am not clear on how to group the x-axis data.
1 123 t
2 145 t
3 222 t
4 345 f
5 455 t
6 567 t
7 245 t
8 300 t
9 150 t
10 600 t
11 333 t
First, here's your sample data in a data.frame
dd<-structure(list(V1 = 1:11, V2 = c(123L, 145L, 222L, 345L, 455L,
567L, 245L, 300L, 150L, 600L, 333L), V3 = structure(c(2L, 2L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
.Label = c("f", "t"), class = "factor")), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -11L))
Here's a strategy for plotting
library(ggplot2)
# geom_bar() counts rows per category by default and honours the weight aesthetic
ggplot(dd, aes(x=cut(V2, breaks=c(0,1:9*100)), weight=as.numeric(V3=="t"))) +
  geom_bar() + xlab("value")
We define x and weight in the aes(). We use cut() to break up the numbers into ranges. Then we use weight to turn each row into a zero/one value, and these are added together within each bin.
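To see the numbers behind the bars, the same binning can be done without any plotting. This is a base R sketch on the sample data; `bins` and `t_counts` are illustrative names. Note that cut() builds intervals that are open on the left and closed on the right by default, so 300 falls in (200,300].

```r
# Sample data from the question
dd <- data.frame(
  V1 = 1:11,
  V2 = c(123L, 145L, 222L, 345L, 455L, 567L, 245L, 300L, 150L, 600L, 333L),
  V3 = factor(c("t", "t", "t", "f", "t", "t", "t", "t", "t", "t", "t"))
)

# Assign every value to a 100-wide bin: (0,100], (100,200], ...
bins <- cut(dd$V2, breaks = c(0, 1:9 * 100))

# Count only the rows flagged "t" in each bin (empty bins show up as 0)
t_counts <- table(bins[dd$V3 == "t"])
print(t_counts)
```

The row with 345 is flagged "f", so the (300,400] bin counts only the 333 entry.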

Resources