Plotting two longitudinal variables against time in R

Say I have a data set that includes two longitudinal variables (x1, x2), where t is time (years) and type is a class label:
set.seed(20)
x1 = rnorm(20,5,1)
x2 = (x1 + rnorm(20))
t = rep(c(0,1,2,3), 5)
id = rep(1:5,each = 4)
type = as.factor(c(rep(0,8), rep(1,12)))
df = data.frame(id, t, x1, x2, type)
Is it possible to plot x1 and x2 against t in one plot? Actually, I am trying to see the relationship between x1 and x2 (here I use rnorm to keep it simple) by modifying the correlation matrix.

Not sure how you want to treat the ID variable, but maybe try this?
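library(reshape)
library(ggplot2)
# Stack x1 and x2 into long format: one row per (id, t, variable)
df <- reshape::melt(df, id.vars = c('id', 't', 'type'))
# One panel per subject, one coloured line per variable
ggplot(df, aes(x = t, y = value, color = variable)) +
  geom_line() +
  facet_wrap(~id)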
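If the reshape package is not available, the same wide-to-long step can be sketched with tidyr::pivot_longer(). This is only an illustration: it assumes tidyr and ggplot2 are installed, starts again from the wide df in the question, and the names long_df and p are made up here. Instead of faceting by id, it draws every subject in a single panel, using colour for the variable and line type for type.

```r
library(tidyr)
library(ggplot2)

# Rebuild the wide data from the question
set.seed(20)
x1 <- rnorm(20, 5, 1)
x2 <- x1 + rnorm(20)
df <- data.frame(id = rep(1:5, each = 4), t = rep(0:3, 5), x1, x2,
                 type = as.factor(c(rep(0, 8), rep(1, 12))))

# Long format: one row per (id, t, variable)
long_df <- pivot_longer(df, cols = c(x1, x2),
                        names_to = "variable", values_to = "value")

# One line per subject and variable, all in one panel
p <- ggplot(long_df, aes(x = t, y = value, colour = variable,
                         linetype = type,
                         group = interaction(id, variable))) +
  geom_line()
print(p)
```

With only five subjects a single panel stays readable; with many more, faceting as in the answer above is probably the safer choice.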

Related

How to write a function to loop through variables and plot using ggplot

I'm having problems figuring out how to loop through variables in a data frame and plot them using ggplot.
An example of my data is below:
head(myData,2)
x1 x2 yhat x11 x3 yhat1 x12
1 -0.8523122 -2.737223 -6.562228 -0.8523122 -1.450288 0.464739 -0.8523122
2 -0.5649950 -2.737223 -6.562228 -0.5649950 -1.450288 0.464739 -0.5649950
x4 yhat2 x21 x31 yhat3
1 -1.267759 -4.624147 -2.737223 -1.450288 -0.6858007
2 -1.267759 -4.624147 -2.267001 -1.450288 -0.6858007
What I'm trying to do is to use geom_raster to plot each pair of variables (i.e., [x1,x2],[x11,x3],etc) and use the corresponding yhat as the fill value.
For example, if I were to plot everything manually I'd do something like:
p<-ggplot(myData, aes(x = x1, y = x2)) + geom_raster(aes(fill = yhat))
pp<-ggplot(myData, aes(x = x11, y = x3)) + geom_raster(aes(fill = yhat1))
ppp<-ggplot(myData, aes(x = x12, y = x4)) + geom_raster(aes(fill = yhat2))
pppp<-ggplot(myData, aes(x = x21, y = x31)) + geom_raster(aes(fill = yhat3))
grid.arrange(p, pp, ppp, pppp, ncol = 2)
But I'm trying to write a function that will loop through the data frame and plot the graphs. I tried to adapt the code from a different question here but I can't make it work for me.
Any suggestions as to how I would achieve this for my data?
One way would be to split the data into groups of 3 consecutive columns and apply the plotting code to each group.
library(gridExtra)
library(tidyverse)
library(rlang)
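# Split the columns of myData into consecutive groups of three (x, y, fill)
temp <- split.default(myData, gl(ncol(myData)/3, 3)) %>%
  map(~{
    # Turn the three column names of this group into symbols for aes()
    x <- syms(names(.))
    ggplot(., aes(x = !!x[[1]], y = !!x[[2]])) + geom_raster(aes(fill = !!x[[3]]))
  })
grid.arrange(grobs = temp)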
data
Applied this on limited data of 2 rows.
myData <- structure(list(x1 = c(-0.8523122, -0.564995), x2 = c(-2.737223,
-2.737223), yhat = c(-6.562228, -6.562228), x11 = c(-0.8523122,
-0.564995), x3 = c(-1.450288, -1.450288), yhat1 = c(0.464739,
0.464739), x12 = c(-0.8523122, -0.564995), x4 = c(-1.267759,
-1.267759), yhat2 = c(-4.624147, -4.624147), x21 = c(-2.737223,
-2.267001), x31 = c(-1.450288, -1.450288), yhat3 = c(-0.6858007,
-0.6858007)), class = "data.frame", row.names = c("1", "2"))
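When the columns are not laid out in neat groups of three, the same idea can be sketched by passing the column names explicitly. The helper plot_triples() and the toy data frame below are made up for illustration; the sketch assumes ggplot2 (3.0 or later, for the .data pronoun) and gridExtra are installed.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: one raster plot per (x, y, fill) triple of column names
plot_triples <- function(data, xs, ys, fills) {
  Map(function(x, y, f) {
    ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
      geom_raster(aes(fill = .data[[f]])) +
      labs(x = x, y = y, fill = f)
  }, xs, ys, fills)
}

# Toy grid data (made-up values) with two (x, y, fill) triples
toy <- expand.grid(x1 = 1:3, x2 = 1:3)
toy$yhat  <- toy$x1 + toy$x2
toy$x11   <- toy$x1
toy$x3    <- toy$x2
toy$yhat1 <- toy$x1 * toy$x2

plots <- plot_triples(toy,
                      xs    = c("x1", "x11"),
                      ys    = c("x2", "x3"),
                      fills = c("yhat", "yhat1"))
grid.arrange(grobs = unname(plots), ncol = 2)
```

For the data in the question, the call would take xs = c("x1", "x11", "x12", "x21") and the matching y and fill names.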

How to plot many x variables against one y variable using ggplot function in

I have an excel file with multiple columns with titles as x, x1, x2, x3, x4 etc. I am using ggplot function in R to plot x against x1. The code is
data %>%
ggplot(aes(x = x1, y = x)) +
geom_point(colour = "red") +
geom_smooth(method = "lm", fill = NA)
How can I modify this code to plot x against x1, x against x2, x against x3, and x against x4 in the same ggplot call?
You should change the way your data.frame is formatted to do this easily with ggplot2 syntax.
Instead of having 5 columns (x, x1, x2, x3, x4), you may want a data.frame with 3 columns: x, y and type, where type is a categorical variable indicating which column each y value came from (x1, x2, x3 or x4).
That would be something like this:
df <- data.frame(x = rep(data$x, 4),
                 y = c(data$x1, data$x2, data$x3, data$x4),
                 type = rep(c("x1", "x2", "x3", "x4"), each = nrow(data)))
Then, with this data.frame, you can set the aes in order to plot x according to y for each category of your variable type thanks to the color argument.
ggplot(df, aes(x = x, y = y, color = type)) + geom_point() + geom_smooth(method = "lm", fill = NA)
You should check http://www.sthda.com/english/wiki/ggplot2-scatter-plots-quick-start-guide-r-software-and-data-visualization for detailed explanations and customizations.
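The reshaping step can also be written with tidyr::pivot_longer(), which avoids listing every column by hand. The sketch below uses made-up toy values in place of the Excel data and assumes tidyr and ggplot2 are installed; the column name predictor is chosen here for illustration. It keeps the orientation of the original question (x on the vertical axis).

```r
library(tidyr)
library(ggplot2)

# Made-up stand-in for the spreadsheet: x plus four columns x1..x4
set.seed(1)
data <- data.frame(x = rnorm(30))
for (nm in c("x1", "x2", "x3", "x4")) data[[nm]] <- data$x + rnorm(30)

# Long format: every column except x is stacked into (type, predictor)
long <- pivot_longer(data, cols = -x,
                     names_to = "type", values_to = "predictor")

# One colour and one fitted line per original column
p <- ggplot(long, aes(x = predictor, y = x, colour = type)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p)
```

Adding facet_wrap(~ type) would give one panel per column instead of overlaying them.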

How do I create a regression line with various variables in R

I have already created the actual regression code but I am trying to get the regression line and a predicted line onto a plot but I can't seem to figure it out.
m1 <- lm(variable1 ~ 2 + 3 + 4 + 5 + 6 + 7 + 8, data = prog)
summary(m1)
and then I want to create the plot on the basis of hyp.data but I am still a bit lost.
Consider two (not 7!) predictor variables; one is numeric, the other categorical (i.e. a factor).
# Simulate data
set.seed(2017);
x1 <- 1:10;
x2 <- as.factor(sample(c("treated", "not_treated"), 10, replace = TRUE));
df <- cbind.data.frame(
y = 2 * x1 + as.numeric(x2) - 1 + rnorm(10),
x1 = x1,
x2 = x2);
In that case you can do the following:
# Fit the linear model
m1 <- lm(y ~ x1 + x2, data = df);
# Get predictions
df$pred <- predict(m1);
# Plot data
library(ggplot2);
ggplot(df, aes(x = x1, y = y)) +
geom_point() +
facet_wrap(~ x2, scales = "free") +
geom_line(aes(x = x1, y = pred), col = "red");
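If the predicted line should come from a separate grid of hypothetical values (the question mentions hyp.data), predict() accepts a newdata argument. The following is a rough sketch under that assumption: the toy data are rebuilt with a deterministic factor so both levels are guaranteed to occur, and hyp_data is a name invented here.

```r
library(ggplot2)

# Toy data in the spirit of the example above (deterministic factor levels)
set.seed(2017)
df <- data.frame(x1 = 1:10,
                 x2 = factor(rep(c("treated", "not_treated"), times = 5)))
df$y <- 2 * df$x1 + as.numeric(df$x2) - 1 + rnorm(10)

m1 <- lm(y ~ x1 + x2, data = df)

# Hypothetical grid: every combination of x1 values and factor levels
hyp_data <- expand.grid(x1 = seq(1, 10, by = 0.5), x2 = levels(df$x2))
hyp_data$pred <- predict(m1, newdata = hyp_data)

# Observed points plus one predicted line per factor level
p <- ggplot(df, aes(x = x1, y = y, colour = x2)) +
  geom_point() +
  geom_line(data = hyp_data, aes(y = pred))
print(p)
```

With more predictors, the columns not shown on the plot would need to be fixed at some chosen value (for example their mean) in the grid.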

How to make a loop to plot several graphs using ggplot

This is my dataframe:
x1 <- c(1,2,3,4)
x2 <- c(3,4,5,6)
x3 <- c(5,6,7,8)
x4 <- c(7,8,9,10)
x5 <- c(8,7,6,5)
df <- c(x1,x2,x3,x4,x5)
I choose 3 variables from my dataframe to plot 3 separate scatterplots each against x1 and store these in a character vector:
varlist <- c("x2","x4","x5")
So I want to create a function to make 3 independent scatterplots of x1 with x2, x1 with x4 and x1 with x5, using ggplot, where xx and yy will be the different pairs of variables to plot:
ggplot(data = df) +
geom_point(mapping = aes(x = xx, y = yy)) +
geom_smooth(mapping = aes(x = xx, y = yy))
You could do:
mapply(function(y) print(ggplot(data = df) +
geom_point(aes_string(x = "x1", y = y)) +
geom_smooth(aes_string(x = "x1", y = y))), y=c("x2","x4","x5"))
Note: I used df <- data.frame(x1,x2,x3,x4,x5) instead of df <- c(x1,x2,x3,x4,x5), because c() returns a plain vector rather than a data frame.
Here x is fixed to x1, and mapply loops over y, which holds the names of the variables to plot against x1.
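aes_string() is deprecated in recent ggplot2 releases, so a variant using the .data pronoun may age better. This sketch assumes ggplot2 3.0 or later; it stores the plots in a list instead of printing inside the loop, and it swaps in method = "lm" for the smoother because the default loess fit tends to complain with only four points.

```r
library(ggplot2)

df <- data.frame(x1 = c(1, 2, 3, 4),
                 x2 = c(3, 4, 5, 6),
                 x3 = c(5, 6, 7, 8),
                 x4 = c(7, 8, 9, 10),
                 x5 = c(8, 7, 6, 5))
varlist <- c("x2", "x4", "x5")

# One scatterplot of x1 against each variable named in varlist
plots <- lapply(varlist, function(v) {
  ggplot(df, aes(x = x1, y = .data[[v]])) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    labs(y = v)
})
names(plots) <- varlist

# Draw them one after another
for (p in plots) print(p)
```

Keeping the plots in a named list makes it easy to pick one later, for example plots[["x4"]], or to hand them all to gridExtra::grid.arrange(grobs = plots).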

Plotting model comparison statistics in R

I combined several data-frames into a data-frame dfc with a fifth column called model specifying which model was used for imputation. I want to plot the distributions by grouping them by model.
dfc looks something like: (1000 rows, 5 columns)
X1 X2 X3 X4 model
1500000 400000 0.542 7.521 actual
250000 32000 2.623 11.423 missForest
...
I use the lines below to plot:
library(lattice)
densityplot(X1 + X2 + X3 + X4, group = dfc$model)
giving:
Note that X1 <- dfc$X1 (and likewise)
My questions are:
How can I add a legend to this plot? (this plot is useless if one can't tell which colour belongs to which model)
Is there, perhaps, a more visually appealing way to plot this? Using ggplot, perhaps?
Is there a better way to compare these models? For example, I could plot for each column separately.
A fast density plot using ggplot.
library(ggplot2)
library(reshape2)
a <- rnorm(50)
b <- runif(50, min = -5, max = 5)
c <- rchisq(50, 2)
data <- data.frame(rnorm = a, runif = b, rchisq = c)
data <- melt(data) #from reshape2 package
ggplot(data) + geom_density(aes(value, color = variable)) +
geom_jitter(aes(value, 0, color = variable), alpha = 0.5, height = 0.02 )
Remark: I added the reshape2 package because ggplot likes "long" data and I think yours are "wide".
Plotting each column separately would work like this (note that each + must end a line, not start one):
ggplot(data) + geom_density(aes(value, color = variable)) +
  geom_point(aes(value, 0, color = variable)) +
  facet_grid(. ~ variable)
Here the color might be redundant but you can just remove the color argument.
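For a data frame shaped like dfc in the question (four numeric columns plus a model column), one possible arrangement is to colour by model and give each column its own panel. The values below are simulated stand-ins, not the asker's data, and the sketch assumes tidyr and ggplot2 are installed; long_dfc and the column name column are invented here.

```r
library(tidyr)
library(ggplot2)

# Simulated stand-in for dfc (made-up values, two models only)
set.seed(42)
dfc <- data.frame(X1 = rlnorm(200, 12),
                  X2 = rlnorm(200, 10),
                  X3 = runif(200, 0, 3),
                  X4 = rnorm(200, 9, 2),
                  model = rep(c("actual", "missForest"), each = 100))

# Long format: one row per (model, column, value)
long_dfc <- pivot_longer(dfc, cols = X1:X4,
                         names_to = "column", values_to = "value")

# One panel per column, one density curve per model, legend added automatically
p <- ggplot(long_dfc, aes(x = value, colour = model)) +
  geom_density() +
  facet_wrap(~ column, scales = "free")
print(p)
```

Free scales matter here because the columns live on very different ranges; with a shared axis the smaller ones would collapse into a spike.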
All I had to do was set an argument:
densityplot(X1 + X2 + X3 + X4, group = dfc$model, auto.key = TRUE) gives the desired plot
The problem was that I couldn't figure out which densityplot() R was using.
The other parts of the question remain open.
Data copied from @alex
library(ggplot2)
library(reshape2)
a <- rnorm(50)
b <- runif(50, min = -5, max = 5)
c <- rchisq(50, 2)
dat <- data.frame(Hmisc = a, MICE = b, missForest = c)
dat <- melt(dat)
library(lattice) # using lattice package
densityplot(~value,dat,groups = variable,auto.key = T)
individual plots
densityplot(~value|variable,dat,groups = variable,auto.key = T,scales=list(relation="free"))
