.drc plot and ggplot function - r

I am trying to plot a graph with ggplot. Currently, I can only plot my drc results with the plot function in R, not with ggplot. I want to use ggplot because I already have working code for it and it is more customizable than the plot function. My code using the drm function is:
try<-drm(X..bound~Dilution,data=Titration.8.31,Sample,robust="mean",fct=LL.4())
which fits my drc model. I can plot this with the plot function, but I would rather not, since I haven't been able to change the labels or much else. My existing ggplot code uses loess lines of best fit, which I want to remove (they drop below zero) and replace with the drc curves:
ggplot(Titration.8.31, aes(x = Dilution, y = `X..bound`)) +
geom_point(size=5,aes(color=Sample,shape=Sample)) +
scale_shape_manual(values=c(0,2,5,8,13,15,16,17,18,19,20,10,9,3)) +
scale_x_continuous(trans = "log10",breaks = trans_breaks("log10", function(x) 10^x),labels = trans_format("log10", math_format(10^.x)), minor_breaks = 10^(seq(0, 7, by = 0.25))) +
labs(x="Antibody Dilution",y="% Cell Binding") +
theme_minimal() +
theme(axis.title.x=element_text(size=22)) +
theme(axis.title.y=element_text(size=22)) +
theme(axis.text=element_text(size=18)) +
scale_color_manual(values=c("dodgerblue2", "#E31A1C", "green4",
"#6A3D9A", "#FF7F00", "black", "gold1", "skyblue2", "palegreen2", "#FDBF6F", "gray70", "maroon", "orchid1", "darkturquoise", "darkorange4", "brown")) +
coord_cartesian(ylim=c(0,100)) +
theme(legend.key.size =unit(1,"in")) +
theme(legend.text=element_text(size=11))
How do I change this code so that my drc curves become the new lines of best fit? If I can't use ggplot, how do I change the x-axis label in the plot function (which might be easier)?
The data is dput(Titration.8.31):
structure(list(Dilution = c(300L, 900L, 2700L, 8100L, 24300L,
72900L, 218700L, 300L, 900L, 2700L, 8100L, 24300L, 72900L, 218700L,
300L, 900L, 2700L, 8100L, 24300L, 72900L, 218700L, 300L, 900L,
2700L, 8100L, 24300L, 72900L, 218700L, 300L, 900L, 2700L, 8100L,
24300L, 72900L, 218700L, 300L, 900L, 2700L, 8100L, 24300L, 72900L,
218700L, 300L, 900L, 2700L, 8100L, 24300L, 72900L, 218700L),
X..bound = c(92.43, 92.95, 92.26, 86.55, 67.49, 21.86, 0.72,
89.57, 87.84, 82.35, 65.84, 24.18, 3.56, 0.32, 91.63, 90.57,
87.22, 77.03, 39.52, 5.39, 1.24, 93.51, 93.56, 90.33, 80.49,
38.97, 4.7, 0.93, 95.37, 94.44, 91.24, 77.74, 28.76, 2.14,
0.15, 0.01, 0, 0, 0, 0, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0),
Sample = c("CoV77-39 1mer 0DA", "CoV77-39 1mer 0DA", "CoV77-39 1mer 0DA",
"CoV77-39 1mer 0DA", "CoV77-39 1mer 0DA", "CoV77-39 1mer 0DA",
"CoV77-39 1mer 0DA", "CoV77-39 5mer 0DA", "CoV77-39 5mer 0DA",
"CoV77-39 5mer 0DA", "CoV77-39 5mer 0DA", "CoV77-39 5mer 0DA",
"CoV77-39 5mer 0DA", "CoV77-39 5mer 0DA", "CoV77-39 5mer 2DA GGG",
"CoV77-39 5mer 2DA GGG", "CoV77-39 5mer 2DA GGG", "CoV77-39 5mer 2DA GGG",
"CoV77-39 5mer 2DA GGG", "CoV77-39 5mer 2DA GGG", "CoV77-39 5mer 2DA GGG",
"CoV77-39 5mer 2DA GDGDG", "CoV77-39 5mer 2DA GDGDG", "CoV77-39 5mer 2DA GDGDG",
"CoV77-39 5mer 2DA GDGDG", "CoV77-39 5mer 2DA GDGDG", "CoV77-39 5mer 2DA GDGDG",
"CoV77-39 5mer 2DA GDGDG", "CoV77-39 5mer 2DA GDG", "CoV77-39 5mer 2DA GDG",
"CoV77-39 5mer 2DA GDG", "CoV77-39 5mer 2DA GDG", "CoV77-39 5mer 2DA GDG",
"CoV77-39 5mer 2DA GDG", "CoV77-39 5mer 2DA GDG", "CoV77-39 HA",
"CoV77-39 HA", "CoV77-39 HA", "CoV77-39 HA", "CoV77-39 HA",
"CoV77-39 HA", "CoV77-39 HA", "CoV77-39 WT", "CoV77-39 WT",
"CoV77-39 WT", "CoV77-39 WT", "CoV77-39 WT", "CoV77-39 WT",
"CoV77-39 WT")), class = "data.frame", row.names = c(NA,
-49L))
Any help is appreciated and very welcome, as I am very new to coding :) Thank you in advance for your time! I really am stuck.
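One possible way to get drc-style curves into ggplot, sketched under some assumptions: geom_smooth() accepts a model-fitting function, so drm from the drc package can be passed as method. Because the x-axis is log10-transformed, the smoother sees log10(Dilution) rather than the raw dilution, so this sketch fits the four-parameter logistic L.4() on that scale, which describes the same family of curves as LL.4() on the raw dilution. The data frame name titration is hypothetical and holds only the first two samples from the dput above. se = FALSE keeps ggplot from asking the model for confidence bands. Samples with a flat response (such as the all-zero controls) may fail to fit and be dropped with a warning.

```r
library(ggplot2)
library(drc)     # provides drm() and L.4()
library(scales)  # provides trans_breaks(), trans_format(), math_format()

# Hypothetical subset: the first two samples from the question's data
titration <- data.frame(
  Dilution = rep(c(300, 900, 2700, 8100, 24300, 72900, 218700), times = 2),
  X..bound = c(92.43, 92.95, 92.26, 86.55, 67.49, 21.86, 0.72,
               89.57, 87.84, 82.35, 65.84, 24.18, 3.56, 0.32),
  Sample   = rep(c("CoV77-39 1mer 0DA", "CoV77-39 5mer 0DA"), each = 7)
)

p <- ggplot(titration, aes(x = Dilution, y = X..bound, color = Sample)) +
  geom_point(aes(shape = Sample), size = 5) +
  # One logistic fit per Sample; the model is fitted on the log10-transformed x
  geom_smooth(formula = y ~ x, method = drm,
              method.args = list(fct = L.4()), se = FALSE) +
  scale_x_continuous(trans = "log10",
                     breaks = trans_breaks("log10", function(x) 10^x),
                     labels = trans_format("log10", math_format(10^.x))) +
  labs(x = "Antibody Dilution", y = "% Cell Binding") +
  coord_cartesian(ylim = c(0, 100)) +
  theme_minimal()

print(p)
```

The remaining theme and scale_*_manual layers from the original code should be able to be added back unchanged, since only the smoothing layer differs.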

Related

Drm function for dose response curve

I am trying to make a dose-response curve (i.e., a titration curve). The data is
structure(list(Dilution = c(300L, 900L, 2700L, 8100L, 24300L,
72900L, 218700L, 300L, 900L, 2700L, 8100L, 24300L, 72900L, 218700L,
300L, 900L, 2700L, 8100L, 24300L, 72900L, 218700L, 300L, 900L,
2700L, 8100L, 24300L, 72900L, 218700L, 300L, 900L, 2700L, 8100L,
24300L, 72900L, 218700L, 300L, 900L, 2700L, 8100L, 24300L, 72900L,
218700L, 300L, 900L, 2700L, 8100L, 24300L, 72900L, 218700L),
X..bound = c(92.43, 92.95, 92.26, 86.55, 67.49, 21.86, 0.72,
89.57, 87.84, 82.35, 65.84, 24.18, 3.56, 0.32, 91.63, 90.57,
87.22, 77.03, 39.52, 5.39, 1.24, 93.51, 93.56, 90.33, 80.49,
38.97, 4.7, 0.93, 95.37, 94.44, 91.24, 77.74, 28.76, 2.14,
0.15, 0.01, 0, 0, 0, 0, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0),
Sample = c("CoV77-39 1mer 0DA", "CoV77-39 1mer 0DA", "CoV77-39 1mer 0DA",
"CoV77-39 1mer 0DA", "CoV77-39 1mer 0DA", "CoV77-39 1mer 0DA",
"CoV77-39 1mer 0DA", "CoV77-39 5mer 0DA", "CoV77-39 5mer 0DA",
"CoV77-39 5mer 0DA", "CoV77-39 5mer 0DA", "CoV77-39 5mer 0DA",
"CoV77-39 5mer 0DA", "CoV77-39 5mer 0DA", "CoV77-39 5mer 2DA GGG",
"CoV77-39 5mer 2DA GGG", "CoV77-39 5mer 2DA GGG", "CoV77-39 5mer 2DA GGG",
"CoV77-39 5mer 2DA GGG", "CoV77-39 5mer 2DA GGG", "CoV77-39 5mer 2DA GGG",
"CoV77-39 5mer 2DA GDGDG", "CoV77-39 5mer 2DA GDGDG", "CoV77-39 5mer 2DA GDGDG",
"CoV77-39 5mer 2DA GDGDG", "CoV77-39 5mer 2DA GDGDG", "CoV77-39 5mer 2DA GDGDG",
"CoV77-39 5mer 2DA GDGDG", "CoV77-39 5mer 2DA GDG", "CoV77-39 5mer 2DA GDG",
"CoV77-39 5mer 2DA GDG", "CoV77-39 5mer 2DA GDG", "CoV77-39 5mer 2DA GDG",
"CoV77-39 5mer 2DA GDG", "CoV77-39 5mer 2DA GDG", "CoV77-39 HA",
"CoV77-39 HA", "CoV77-39 HA", "CoV77-39 HA", "CoV77-39 HA",
"CoV77-39 HA", "CoV77-39 HA", "CoV77-39 WT", "CoV77-39 WT",
"CoV77-39 WT", "CoV77-39 WT", "CoV77-39 WT", "CoV77-39 WT",
"CoV77-39 WT")), class = "data.frame", row.names = c(NA,
-49L))
I then run try<-drm(X..bound~Dilution,data=Titration.8.31,Sample,robust="mean",fct=LL.4()) and then this generates a curve after I run
plot(try,col=c("dodgerblue2", "#E31A1C", "green4", "#6A3D9A", "#FF7F00", "black", "gold1", "skyblue2", "palegreen2", "#FDBF6F", "gray70", "maroon", "orchid1", "darkturquoise", "darkorange4", "brown"),lty=c(1,1,1,1,1,5,5))
which looks great because the lines of best fit don't go below 0. However, I am not sure how to change the numerical axis, specifically the x-axis. I am trying to label the x-axis as powers of ten (10^x). I know how to do this in ggplot by using
+scale_x_continuous(trans = "log10",breaks = trans_breaks("log10", function(x) 10^x),labels = trans_format("log10", math_format(10^.x)), minor_breaks = 10^(seq(0, 7, by = 0.25)))
but this won't work with the plot function. Is there a way to label my x-axis as 10^x instead of the odd labeling/spacing it has now in the attached image? Or is there a way to do this in ggplot (preferred)?
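If staying with the base plot method is acceptable, one possible sketch is to suppress the default axes with axes = FALSE and draw them by hand with axis(), using plotmath expressions for the 10^x labels. This assumes the drc plot method draws on a true log-scaled x-axis (its default). The object name fit, the reduced two-sample data frame titration and the tick positions are illustrative choices, not part of the original code.

```r
library(drc)

# Hypothetical subset: the first two samples from the question's data
titration <- data.frame(
  Dilution = rep(c(300, 900, 2700, 8100, 24300, 72900, 218700), times = 2),
  X..bound = c(92.43, 92.95, 92.26, 86.55, 67.49, 21.86, 0.72,
               89.57, 87.84, 82.35, 65.84, 24.18, 3.56, 0.32),
  Sample   = rep(c("CoV77-39 1mer 0DA", "CoV77-39 5mer 0DA"), each = 7)
)

# Four-parameter log-logistic fit, one curve per Sample
fit <- drm(X..bound ~ Dilution, curveid = Sample, data = titration,
           fct = LL.4())

# Draw the curves without axes, then add custom axes by hand
plot(fit, axes = FALSE,
     xlab = "Antibody Dilution", ylab = "% Cell Binding",
     col = c("dodgerblue2", "#E31A1C"))

ticks <- 10^(3:5)                                  # tick positions inside the data range
axis(1, at = ticks, labels = parse(text = paste0("10^", 3:5)))  # plotmath superscripts
axis(2)
box()
```

The xlab and ylab arguments set the axis titles directly, which may already be enough if only the label text needs changing.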

Making lines of best fits in R for multiple lines using ggplot2

I have a titration curve produced by this code
ggplot(Titration.Aug.9, aes(x = Dilution, y = `X..bound`)) +
geom_line(aes(color = Sample)) +
geom_point() +
geom_smooth(formula = y ~ x, method = "loess", se = FALSE, linetype = "dashed") +
scale_x_continuous(trans = "log10",
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)),
minor_breaks = 10^(seq(0, 7, by = 0.25))) +
scale_color_brewer(type = "Sample", palette = "Set1") +
labs(x = "Antibody Dilution", y = "% Cell Binding") +
theme_minimal()
and I have generated a plot that looks pretty nice. However, I need a line of best fit for each sample. I tried to do this with loess, but I need each line to be a different color.
(graph image) This is the graph I had, but I need the lines to be best fits for each sample rather than lines connecting the plotted points.
(graph image 2) Something like this one, generated with this code:
ggplot(Titration.Aug.9, aes(x = Dilution, y = `X..bound`)) +geom_line(aes(color = Sample)) +geom_point() +geom_smooth(formula = y ~ x, method = "loess", se = FALSE,aes(Color=Sample)) +scale_x_continuous(trans = "log10",breaks = trans_breaks("log10", function(x) 10^x),labels = trans_format("log10", math_format(10^.x)),minor_breaks = 10^(seq(0, 7, by = 0.25))) +scale_color_brewer(type="Sample",palette="Set1") +labs(x="Antibody Dilution",y="% Cell Binding") +theme_minimal()
However, I need each of these lines to be colored by sample, as in the first image, but with the fitted-line style of the second image. My data is:
dput(Titration.Aug.9)
structure(list(Dilution = c(300L, 900L, 2700L, 8100L, 24300L,
72900L, 218700L, 300L, 900L, 2700L, 8100L, 24300L, 72900L, 218700L,
300L, 900L, 2700L, 8100L, 24300L, 72900L, 218700L, 300L, 900L,
2700L, 8100L, 24300L, 72900L, 218700L, 300L, 900L, 2700L, 8100L,
24300L, 72900L, 218700L, 300L, 900L, 2700L, 8100L, 24300L, 72900L,
218700L, 300L, 900L, 2700L, 8100L, 24300L, 72900L, 218700L, 300L,
900L, 2700L, 8100L, 24300L, 72900L, 218700L, 300L, 900L, 2700L,
8100L, 24300L, 72900L, 218700L), X..bound = c(52.74, 40.31, 30.63,
18.89, 7.57, 0.8, 0.01, 20.23, 11.29, 7.55, 3.24, 0.54, 0.12,
0.03, 53.27, 46.82, 38.17, 26.77, 11.59, 2.23, 0.07, 69.25, 63.55,
56.34, 40.95, 19.35, 2.4, 0.05, 75.8, 68.21, 62.82, 40.33, 11.73,
0.82, 0.04, 85.75, 82.82, 74.29, 46.63, 9.36, 0.24, 0.05, 71.65,
66.54, 56.63, 33.96, 6.33, 0.19, 0.03, 85.43, 86.49, 75.73, 51.62,
15.16, 1.05, 0.01, 92.44, 90.13, 85.92, 72.06, 30.08, 3.15, 0.12
), Sample = c("1mer 0DA", "1mer 0DA", "1mer 0DA", "1mer 0DA",
"1mer 0DA", "1mer 0DA", "1mer 0DA", "1mer 2DA", "1mer 2DA", "1mer 2DA",
"1mer 2DA", "1mer 2DA", "1mer 2DA", "1mer 2DA", "1mer 3DA", "1mer 3DA",
"1mer 3DA", "1mer 3DA", "1mer 3DA", "1mer 3DA", "1mer 3DA", "1mer 4DA",
"1mer 4DA", "1mer 4DA", "1mer 4DA", "1mer 4DA", "1mer 4DA", "1mer 4DA",
"5mer 0DA", "5mer 0DA", "5mer 0DA", "5mer 0DA", "5mer 0DA", "5mer 0DA",
"5mer 0DA", "5mer 2DA", "5mer 2DA", "5mer 2DA", "5mer 2DA", "5mer 2DA",
"5mer 2DA", "5mer 2DA", "5mer 4DA", "5mer 4DA", "5mer 4DA", "5mer 4DA",
"5mer 4DA", "5mer 4DA", "5mer 4DA", "5mer 2DA GDG", "5mer 2DA GDG",
"5mer 2DA GDG", "5mer 2DA GDG", "5mer 2DA GDG", "5mer 2DA GDG",
"5mer 2DA GDG", "5mer 2DA GDGDG", "5mer 2DA GDGDG", "5mer 2DA GDGDG",
"5mer 2DA GDGDG", "5mer 2DA GDGDG", "5mer 2DA GDGDG", "5mer 2DA GDGDG"
)), class = "data.frame", row.names = c(NA, -63L))
Any help is appreciated!! Thank you in advance for your time!! :)
You simply need to map Sample to the color aesthetic inside geom_smooth. I have removed the original geom_line to make the result clearer, but this could be added back in.
ggplot(Titration.Aug.9, aes(x = Dilution, y = `X..bound`)) +
geom_point() +
geom_smooth(formula = y ~ x, method = "loess", se = FALSE,
linetype="dashed", aes(color = Sample)) +
scale_x_continuous(trans = "log10",
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)),
minor_breaks = 10^(seq(0, 7, by = 0.25))) +
scale_color_brewer(type="qual",palette="Set1") +
labs(x="Antibody Dilution",y="% Cell Binding") +
theme_minimal()

How to Plot Time-series data on Horizontal bar in R?

How can I plot a single horizontal bar in R that shows the different stages of continuous time-series data, from the start date to the present date, with a horizontal scrollbar for navigating through time?
This is my data:
var_events  time_date         event_duration  veh_id
LD          17-06-2018 13:25  6.52            B33
WL          17-06-2018 13:25  14.52           B31
TL          17-06-2018 13:26  0.32            B32
TE          17-06-2018 13:26  4.58            B13
UL          17-06-2018 13:26  3.45            B12
WT          17-06-2018 13:26  5.46            B25
UL          17-06-2018 13:26  1.56            B17
TL          17-06-2018 13:26  13.6            B33
SL          17-06-2018 13:26  0.05            B32
Here is an example line chart from my previous code:
require(ggplot2)
require(dplyr)
df = structure(list(Event_stage = c("SE", "MN", "MN", "TE", "TE", "TE", "TE", "TE", "TE", "TE", "TE", "WL", "TE", "TE", "SE", "TE", "TE", "WL", "WT", "MN", "WL", "TE", "WL", "WL", "WT", "WL", "LD", "WT", "WL", "WT", "WT", "TE", "WL", "LD", "WT", "LD", "MN", "TL", "TE", "WL", "TL", "TL", "WT", "TE", "TE", "LD", "WT", "TL", "LD" ), event_date = structure(c(1529573704, 1529573710, 1529573713, 1529573724, 1529573855, 1529573874, 1529573880, 1529573895, 1529573906, 1529573918, 1529573925, 1529573931, 1529573931, 1529573941, 1529573947, 1529573969, 1529574006, 1529574054, 1529574088, 1529574114, 1529574120, 1529574123, 1529574134, 1529574137, 1529574148, 1529574163, 1529574164, 1529574148, 1529574169, 1529574170, 1529574178, 1529574188, 1529574189, 1529574196, 1529574178, 1529574188, 1529574203, 1529574213, 1529574214, 1529574214, 1529574215, 1529574227, 1529574231, 1529574242, 1529574244, 1529574245, 1529574248, 1529574260, 1529574262), class = c("POSIXct", "POSIXt"), tzone = "UTC"), stage_duration = c(3.78, 3.47, 2.78, 3.45, 3.32, 4.93, 4.23, 4.22, 3.85, 3.37, 5.88, 5.92, 3.97, 3.7, NA, 4.08, 3.05, 0.57, 11.18, 12.08, 2.6, 3.3, 0.23, 0.85, 0.27, 0.25, 0.82, 10.42, 0.15, 0.43, 1.4, 0.25, 0.7, 0.52, 1.12, 0.45, 12.87, 12.18, 2.92, 0.57, 14.07, 12.72, 17.12, 4.13, 3.13, 0.25, 0.33, 18.98, 1.05), veh_id = c("B35", "B05", "B04", "B08", "B14", "B13", "B04", "B17", "B41", "B05", "B26", "B08", "B35", "B19a", "B10a", "B01a", "B28", "B14", "B14", "B18", "B05", "B37", "B04", "B41", "B04", "B19a", "B04", "B17", "B35", "B13", "B35", "B02b", "B28", "B13", "B19a", "B41", "B02b", "B04", "B15", "B01a", "B41", "B13", "B28", "B27", "B33", "B19a", "B01a", "B19a", "B35")), .Names = c("Event_stage", "event_date", "stage_duration", "veh_id"), row.names = c(NA, -49L), class = c("tbl_df", "tbl", "data.frame"))
# create ggplot
ggplot(data = df %>% filter(veh_id == "B35"), aes(x = event_date,
y = stage_duration)) +
geom_point(aes(color = Event_stage), size= 3) +
geom_line(alpha = 1/2)+
labs(x = "Event date", y = "Stage duration")
enter image description here
This is a sample bar plot. Everything is the same as in the line chart above, but instead of a line with spikes I want a horizontal line, or just a single bar, that is interactive, with a slider/scrollbar to navigate time:
enter image description here
Something resembling this plot, but only a single horizontal bar with a scrollbar from the start time to the present time:
enter image description here
df %>% filter(veh_id == "B35") %>%
ggplot(
aes(
x = event_date,
y = stage_duration)
) +
geom_bar(stat = "identity") +
labs(x = "Event date", y = "Stage duration") +
coord_flip()

How can I plot this data in R?

I have 4 columns: date & time, stage_duration, various_stages, Vehicle_ID. I want to plot date and time (in minutes) on the x-axis and the ID and stage_duration on the y-axis, filled by the various stages, as a line or bar chart.
Something like this would be good:
Here is my data:
var_events  time_date         event_duration  veh_id
LD          17-06-2018 13:25  6.52            B33
WL          17-06-2018 13:25  14.52           B31
TL          17-06-2018 13:26  0.32            B32
TE          17-06-2018 13:26  4.58            B13
UL          17-06-2018 13:26  3.45            B12
WT          17-06-2018 13:26  5.46            B25
UL          17-06-2018 13:26  1.56            B17
TL          17-06-2018 13:26  13.6            B33
SL          17-06-2018 13:26  0.05            B32
Here is a minimal example that creates the plot
# load ggplot2 (it provides the presidential and economics data sets)
library(ggplot2)
data(presidential)
data(economics)
# events of interest
events <- presidential[-(1:3),]
# strip year from economics and events data frames
economics$year = as.numeric(format(economics$date, format = "%Y"))
# use dplyr to summarise data by year
#install.packages("dplyr")
library(dplyr)
econonomics_mean <- economics %>%
group_by(year) %>%
summarise(mean_unemployment = mean(unemploy))
# add president terms to summarized data frame as a factor
president <- c(rep(NA,14), rep("Reagan", 8), rep("Bush", 4), rep("Clinton", 8), rep("Bush", 8), rep("Obama", 7))
econonomics_mean$president <- president
# create ggplot
p <- ggplot(data = econonomics_mean, aes(x = year, y = mean_unemployment)) +
geom_point(aes(color = president)) +
geom_line(alpha = 1/3)
Update
This is the output:
structure(list(Event_stage = c("SE", "MN", "MN", "TE", "TE",
"TE", "TE", "TE", "TE", "TE", "TE", "WL", "TE", "TE", "SE", "TE",
"TE", "WL", "WT", "MN", "WL", "TE", "WL", "WL", "WT", "WL", "LD",
"WT", "WL", "WT", "WT", "TE", "WL", "LD", "WT", "LD", "MN", "TL",
"TE", "WL", "TL", "TL", "WT", "TE", "TE", "LD", "WT", "TL", "LD"),
event_date = structure(c(1529573704, 1529573710, 1529573713,
1529573724, 1529573855, 1529573874, 1529573880, 1529573895, 1529573906,
1529573918, 1529573925, 1529573931, 1529573931, 1529573941, 1529573947,
1529573969, 1529574006, 1529574054, 1529574088, 1529574114, 1529574120,
1529574123, 1529574134, 1529574137, 1529574148, 1529574163, 1529574164,
1529574148, 1529574169, 1529574170, 1529574178, 1529574188, 1529574189,
1529574196, 1529574178, 1529574188, 1529574203, 1529574213, 1529574214,
1529574214, 1529574215, 1529574227, 1529574231, 1529574242, 1529574244,
1529574245, 1529574248, 1529574260, 1529574262), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), stage_duration = c(3.78, 3.47, 2.78,
3.45, 3.32, 4.93, 4.23, 4.22, 3.85, 3.37, 5.88, 5.92, 3.97, 3.7,
NA, 4.08, 3.05, 0.57, 11.18, 12.08, 2.6, 3.3, 0.23, 0.85, 0.27,
0.25, 0.82, 10.42, 0.15, 0.43, 1.4, 0.25, 0.7, 0.52, 1.12, 0.45,
12.87, 12.18, 2.92, 0.57, 14.07, 12.72, 17.12, 4.13, 3.13, 0.25,
0.33, 18.98, 1.05), veh_id = c("B35", "B05", "B04", "B08", "B14",
"B13", "B04", "B17", "B41", "B05", "B26", "B08", "B35", "B19a",
"B10a", "B01a", "B28", "B14", "B14", "B18", "B05", "B37", "B04",
"B41", "B04", "B19a", "B04", "B17", "B35", "B13", "B35", "B02b",
"B28", "B13", "B19a", "B41", "B02b", "B04", "B15", "B01a", "B41",
"B13", "B28", "B27", "B33", "B19a", "B01a", "B19a", "B35")),
.Names = c("Event_stage", "event_date", "stage_duration", "veh_id"),
row.names = c(NA, -49L), class = c("tbl_df", "tbl", "data.frame"))
require(ggplot2)
require(dplyr)
df = structure(list(Event_stage = c("SE", "MN", "MN", "TE", "TE", "TE", "TE", "TE", "TE", "TE", "TE", "WL", "TE", "TE", "SE", "TE", "TE", "WL", "WT", "MN", "WL", "TE", "WL", "WL", "WT", "WL", "LD", "WT", "WL", "WT", "WT", "TE", "WL", "LD", "WT", "LD", "MN", "TL", "TE", "WL", "TL", "TL", "WT", "TE", "TE", "LD", "WT", "TL", "LD" ), event_date = structure(c(1529573704, 1529573710, 1529573713, 1529573724, 1529573855, 1529573874, 1529573880, 1529573895, 1529573906, 1529573918, 1529573925, 1529573931, 1529573931, 1529573941, 1529573947, 1529573969, 1529574006, 1529574054, 1529574088, 1529574114, 1529574120, 1529574123, 1529574134, 1529574137, 1529574148, 1529574163, 1529574164, 1529574148, 1529574169, 1529574170, 1529574178, 1529574188, 1529574189, 1529574196, 1529574178, 1529574188, 1529574203, 1529574213, 1529574214, 1529574214, 1529574215, 1529574227, 1529574231, 1529574242, 1529574244, 1529574245, 1529574248, 1529574260, 1529574262), class = c("POSIXct", "POSIXt"), tzone = "UTC"), stage_duration = c(3.78, 3.47, 2.78, 3.45, 3.32, 4.93, 4.23, 4.22, 3.85, 3.37, 5.88, 5.92, 3.97, 3.7, NA, 4.08, 3.05, 0.57, 11.18, 12.08, 2.6, 3.3, 0.23, 0.85, 0.27, 0.25, 0.82, 10.42, 0.15, 0.43, 1.4, 0.25, 0.7, 0.52, 1.12, 0.45, 12.87, 12.18, 2.92, 0.57, 14.07, 12.72, 17.12, 4.13, 3.13, 0.25, 0.33, 18.98, 1.05), veh_id = c("B35", "B05", "B04", "B08", "B14", "B13", "B04", "B17", "B41", "B05", "B26", "B08", "B35", "B19a", "B10a", "B01a", "B28", "B14", "B14", "B18", "B05", "B37", "B04", "B41", "B04", "B19a", "B04", "B17", "B35", "B13", "B35", "B02b", "B28", "B13", "B19a", "B41", "B02b", "B04", "B15", "B01a", "B41", "B13", "B28", "B27", "B33", "B19a", "B01a", "B19a", "B35")), .Names = c("Event_stage", "event_date", "stage_duration", "veh_id"), row.names = c(NA, -49L), class = c("tbl_df", "tbl", "data.frame"))
# create ggplot
ggplot(data = df, aes(x = event_date,
y = stage_duration)) +
geom_point(aes(color = Event_stage), size= 3) +
geom_line(alpha = 1/2)+
facet_wrap(~veh_id, nrow = 4) +
labs(x = "Event date", y = "Stage duration")

In R how to run Correlation or simple linear Regression between two variables of unequal lengths from different data frames

In R I'd like to run a correlation or simple linear regression lm(userScoreDF$Score ~ Stock$Adj.Close) between two variables from different data frames, but I'm getting an error because they're of unequal length. I have not combined the data because I'm unsure how to combine them in a way that matches the two variables by date.
Is there a way to run a correlation or simple linear regression with two variables of unequal lengths from different data frames? Is there a way to combine the variables into a data frame so that the two variables are matched by date? Here's my data:
dput(userScoreDF)
structure(list(Group.date = structure(c(15737, 15746, 15747,
15748, 15749, 15750, 15751, 15752, 15753, 15754, 15755, 15738,
15756, 15757, 15758, 15759, 15760, 15761, 15762, 15763, 15764,
15739, 15740, 15741, 15742, 15743, 15744, 15745, 15765, 15774,
15775, 15776, 15777, 15778, 15779, 15780, 15781, 15782, 15783,
15766, 15784, 15785, 15786, 15787, 15788, 15789, 15790, 15791,
15792, 15793, 15767, 15794, 15795, 15768, 15769, 15770, 15771,
15772, 15773, 15796, 15805, 15806, 15807, 15808, 15809, 15810,
15811, 15812, 15813, 15814, 15797, 15815, 15816, 15817, 15818,
15819, 15820, 15821, 15822, 15823, 15824, 15798, 15825, 15799,
15800, 15801, 15802, 15803, 15804, 15826, 15835, 15836, 15837,
15838, 15839, 15840, 15841, 15842, 15843, 15844, 15827, 15845,
15846, 15847, 15848, 15849, 15850, 15851, 15852, 15853, 15854,
15828, 15855, 15856, 15829, 15830, 15831, 15832, 15833, 15834,
15857, 15866, 15867, 15868, 15869, 15870, 15871, 15872, 15873,
15874, 15875, 15858, 15876, 15877, 15878, 15879, 15880, 15881,
15882, 15883, 15884, 15885, 15859, 15886, 15860, 15861, 15862,
15863, 15864, 15865, 15887, 15896, 15897, 15898, 15899, 15900,
15901, 15902, 15903, 15904, 15905, 15888, 15906, 15907, 15908,
15909, 15910, 15911, 15912, 15913, 15914, 15915, 15889, 15916,
15917, 15890, 15891, 15892, 15893, 15894, 15895, 15918, 15919,
15920), class = "Date"), Score = c(-1.13, -0.93, -1.14, -1.04,
-0.81, -0.64, -1.12, -1.01, -0.6, -0.82, -1.05, -1.34, -0.86,
-0.93, -0.99, -0.9, -0.76, -0.91, -1.03, -0.95, -1.22, -0.74,
-0.95, -0.98, -0.96, -0.97, -0.95, -0.79, -1.27, -0.72, -1.06,
-0.95, -1.05, -1.02, -0.67, -0.9, -0.7, -1.1, -0.95, -1.14, -1.07,
-1.02, -0.88, -0.79, -1.05, -0.97, -0.9, -1.13, -1.05, -0.8,
-0.84, -0.82, -0.53, -0.96, -0.84, -0.95, -0.99, -1.06, -0.98,
-0.91, -0.94, -0.98, -1.03, -0.77, -0.75, -1.17, -1.02, -0.96,
-0.95, -0.81, -0.96, -1.32, -0.9, -1.11, -1.05, -1.08, -0.8,
-1.14, -0.82, -0.92, -0.96, -1.14, -1, -0.96, -1.14, -0.84, -0.83,
-1.13, -1.11, -0.96, -1.06, -0.94, -0.85, -1.21, -0.95, -0.98,
-0.99, -1.15, -1.18, -0.86, -0.9, -1.09, -1.04, -1.05, -1.07,
-1.11, -1.18, -1.07, -0.99, -1.43, -1.02, -0.96, -1.18, -1.05,
-0.88, -0.84, -1.11, -1.15, -1.18, -1.14, -1.4, -1.6, -1.16,
-1.28, -1.33, -1.07, -0.98, -1.24, -0.81, -1.23, -1.05, -0.99,
-1.53, -1.06, -1.26, -1.18, -1.46, -1.25, -1.31, -1.12, -0.98,
-1.08, -1.13, -1.24, -1, -1.3, -1.04, -1.02, -1.19, -1.09, -1.21,
-0.99, -1.07, -1.21, -1.06, -0.96, -1.05, -1.47, -1.52, -1.36,
-1.22, -1.33, -1.36, -1.27, -1.16, -1.36, -1.25, -1.27, -1.3,
-1.04, -0.71, -1.34, -1.19, -1.26, -1.55, -1.53, -1.59, -1.17,
-1, -1.26, -1.14, -1.19, -1.17, -1.12)), .Names = c("Group.date",
"Score"), row.names = c(NA, -184L), class = "data.frame")
dput(Stock)
structure(list(Date = structure(c(15737, 15740, 15741, 15742,
15743, 15744, 15747, 15748, 15749, 15750, 15751, 15755, 15756,
15757, 15758, 15761, 15762, 15763, 15764, 15765, 15768, 15769,
15770, 15771, 15772, 15775, 15776, 15777, 15778, 15779, 15782,
15783, 15784, 15785, 15786, 15789, 15790, 15791, 15792, 15796,
15797, 15798, 15799, 15800, 15803, 15804, 15805, 15806, 15807,
15810, 15811, 15812, 15813, 15814, 15817, 15818, 15819, 15820,
15821, 15824, 15825, 15826, 15827, 15828, 15831, 15832, 15833,
15834, 15835, 15838, 15839, 15840, 15841, 15842, 15845, 15846,
15847, 15848, 15849, 15853, 15854, 15855, 15856, 15859, 15860,
15861, 15862, 15863, 15866, 15867, 15868, 15869, 15870, 15873,
15874, 15875, 15876, 15877, 15880, 15881, 15882, 15883, 15884,
15887, 15888, 15889, 15891, 15894, 15895, 15896, 15897, 15898,
15901, 15902, 15903, 15904, 15905, 15908, 15909, 15910, 15911,
15912, 15915, 15916, 15917, 15918, 15919), class = "Date"), Adj.Close = c(5.69,
5.74, 5.71, 5.77, 5.74, 5.77, 5.79, 5.91, 5.86, 5.87, 5.91, 5.9,
5.79, 5.79, 5.82, 5.73, 5.78, 5.86, 5.8, 5.8, 5.83, 5.87, 5.87,
5.85, 5.88, 5.86, 5.92, 5.88, 5.86, 5.81, 5.87, 6.03, 6.03, 6.06,
6.14, 6.03, 6.05, 6.04, 6.21, 6.25, 6.23, 6.16, 6.21, 6.23, 6.3,
6.28, 6.25, 6.26, 6.22, 7.06, 7.2, 7.09, 7.19, 7.17, 7.17, 7.1,
7.09, 7.14, 7.12, 7.12, 7.05, 7.06, 7.1, 7.15, 7.2, 7.22, 7.32,
7.35, 7.36, 7.18, 7.26, 7.25, 7.28, 7.32, 7.29, 7.39, 7.3, 7.31,
7.33, 7.27, 7.28, 7.34, 7.3, 7.22, 7.26, 7.2, 7.34, 7.24, 7.18,
7.35, 7.35, 7.32, 7.32, 7.22, 7.32, 7, 7.07, 6.97, 6.86, 6.88,
6.97, 6.98, 7.02, 7.07, 7.15, 7.19, 7.16, 7.07, 7.06, 7.18, 6.28,
6.45, 6.72, 6.48, 6.25, 6.05, 6.07, 5.92, 5.85, 5.77, 5.82, 5.74,
5.74, 6.16, 5.96, 6.38, 6.67)), .Names = c("Date", "Adj.Close"
), row.names = c(NA, 127L), class = "data.frame", na.action = structure(128:231, .Names = c("128",
"129", "130", "131", "132", "133", "134", "135", "136", "137",
"138", "139", "140", "141", "142", "143", "144", "145", "146",
"147", "148", "149", "150", "151", "152", "153", "154", "155",
"156", "157", "158", "159", "160", "161", "162", "163", "164",
"165", "166", "167", "168", "169", "170", "171", "172", "173",
"174", "175", "176", "177", "178", "179", "180", "181", "182",
"183", "184", "185", "186", "187", "188", "189", "190", "191",
"192", "193", "194", "195", "196", "197", "198", "199", "200",
"201", "202", "203", "204", "205", "206", "207", "208", "209",
"210", "211", "212", "213", "214", "215", "216", "217", "218",
"219", "220", "221", "222", "223", "224", "225", "226", "227",
"228", "229", "230", "231"), class = "omit"))
Merge the data frames along their respective dates and perform the regression:
M <- merge(Stock, userScoreDF, by = 1)
lm(Score ~ Adj.Close, M)
or to calculate the correlation:
with(M, cor(Score, Adj.Close))
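To see what the merge does, here is a tiny sketch with made-up data. The frames scores and prices and all their values are hypothetical stand-ins for userScoreDF and Stock. By default merge() keeps only the rows whose key appears in both frames, so dates without a stock price (weekends, for example) drop out and the two columns end up the same length.

```r
# Hypothetical stand-ins for userScoreDF and Stock (values are made up)
scores <- data.frame(
  Group.date = as.Date(c("2013-02-01", "2013-02-02", "2013-02-03", "2013-02-04")),
  Score      = c(-1.1, -0.9, -1.0, -1.2)
)
prices <- data.frame(
  Date      = as.Date(c("2013-02-01", "2013-02-03", "2013-02-04", "2013-02-05")),
  Adj.Close = c(5.7, 5.8, 5.9, 6.0)
)

# Inner join on the date columns, which have different names in each frame
M <- merge(prices, scores, by.x = "Date", by.y = "Group.date")

# Only the three dates present in both frames survive
print(M)

# Both variables now live in one data frame, matched row by row
fit <- lm(Score ~ Adj.Close, data = M)
r   <- with(M, cor(Score, Adj.Close))
```

Passing all = TRUE to merge() would keep the unmatched dates instead and fill the gaps with NA, which lm() drops by default.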
Based on your description, I'd normally say this is a terrible idea. But you just neglected to specify that they have overlapping dates. You just need to merge them.
Here, I name your first df x and your second df y.
x2 <- merge(x[which(x$Group.date %in% y$Date),], y, by.x= "Group.date", by.y= "Date")
lm(Score ~ Adj.Close, data= x2)
Of course, a better question might be why are you using lm on time series data (ie correlated error structure)? That is to say that you're doing it wrong. But, hey, you didn't ask about the statistical validity of your approach.
