Create a graph from a binary column in a dataframe - R

I need to create a scatter plot with the ggplot2 library based on a binary column of a data frame.
df <- c(1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1)
A point should be drawn every time the value 1 appears in the column, with all points on the same graph. Thanks.

If the binary column you mention is associated with other variables, then I think this might work.
(I've created some random x and y values of the same length as the binary 0/1 vector you provided.)
x <- rnorm(22)
y <- x^2 + rnorm(22, sd = 0.3)
df <- data.frame("x" = x, "y" = y,
"binary" = c(1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1))
library(ggplot2)
# this is the plot with all the points
ggplot(data = df, mapping = aes(x = x, y = y)) + geom_point()
# this is the plot with only the points for which the "binary" variable is 1
ggplot(data = subset(df, binary == 1), mapping = aes(x = x, y = y)) + geom_point()
# this is the plot with all points where they are coloured by whether "binary" is 0 or 1
ggplot(data = df, mapping = aes(x = x, y = y, colour = as.factor(binary))) + geom_point()

Something like this?
library(ggplot2)
y <- df
is.na(y) <- y == 0
ggplot(data = data.frame(x = seq_along(y), y), mapping = aes(x, y)) +
geom_point() +
scale_y_continuous(breaks = c(0, 1),
labels = c("0" = "0", "1" = "1"),
limits = c(0, 1))
This plots points only where df == 1, not the zeros. If you also want the zeros, skip the line that starts with is.na(y).
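If the column stands alone, as in the question, another option is to plot the positions of the ones directly. The following is a minimal sketch and is not taken from any of the answers; the names binary and ones are made up for illustration, and it assumes the row index is a sensible x-axis.

```r
library(ggplot2)

# The vector from the question (named `binary` here instead of `df`).
binary <- c(1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1)

# which() returns the positions at which the condition holds.
ones <- data.frame(index = which(binary == 1), value = 1)

# One point per 1; the x limits keep the full length of the column visible.
p <- ggplot(ones, aes(x = index, y = value)) +
  geom_point() +
  scale_x_continuous(limits = c(1, length(binary)))
p
```

Because the zeros are dropped before plotting, ggplot2 has no missing values to warn about.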

I'm not sure exactly what you are asking, but here are a few options. Since your data structure is a vector rather than a data frame, I've renamed it test. First, a dot plot with ggplot:
library(ggplot2)
test <- c(1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1)
ggplot(as.data.frame(test), aes(x=test)) + geom_dotplot()
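Or you could do the same thing as a bar chart:
# qplot(test, geom="bar") also works, but qplot() is deprecated in recent ggplot2 releases
ggplot(as.data.frame(test), aes(x=test)) + geom_bar()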
Or, a primitive base R quick look:
plot(test, pch=16, cex=3)

Related

How plot new point in ggplot with older color data?

I know similar questions have been asked before, but mine is different. Consider data points data1 whose colours depend on their x and y coordinates; I plot them with ggplot:
x = 1:100
y = 1:100
d = expand.grid(x,y)
data1 <- data.frame(
xval = d$Var1,
yval = d$Var2,
col = d$Var1+d$Var2)
data2 <- data.frame(
xnew = c(1.5, 90.5),
ynew = c(95.5, 4))
ggplot(data1, aes(xval, yval, colour = col)) + geom_point()
But I don't want the last line to plot anything from data1; I want to plot only the data2 points, coloured according to the colour scale of data1. (I drew a sketch of what I want to plot for data2.)
I changed the last line to:
ggplot(data1, aes(xval, yval, colour = col)) +
geom_point(data = data2, aes(x = xnew, y = ynew))
Now I expect ggplot to draw just the 2 points of data2, but I get an error:
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Error: Column colour must be a 1d atomic vector or a list

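The problem is that the new layer inherits colour = col from ggplot(), but data2 has no col column, so there is nothing to map (col then resolves to the base R function col(), hence "object of type function").
Please try the following, which maps colour to xnew and pins the gradient to the x range of data1:
ggplot(data2, aes(x = xnew, y = ynew, colour = xnew)) + geom_point() +
# the aesthetic is colour, so the colour scale (not the fill scale) is needed
scale_colour_gradientn(colours = c("red", "black"),
values = range(data1$xval),
rescaler = function(x,...) x,
oob = identity)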
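Another option, sketched here under the assumption that the colours should follow the same rule used to build data1 (col = x + y), is to compute col for data2 and pin the colour scale to the range of data1$col. The limits and coord_cartesian() calls are illustrative choices and are not part of the original answer.

```r
library(ggplot2)

# Rebuild the data from the question.
d <- expand.grid(1:100, 1:100)
data1 <- data.frame(xval = d$Var1, yval = d$Var2, col = d$Var1 + d$Var2)
data2 <- data.frame(xnew = c(1.5, 90.5), ynew = c(95.5, 4))

# Assumption: the new points are coloured by the same rule as data1.
data2$col <- data2$xnew + data2$ynew

# Only data2 is drawn; the scale limits and axes are borrowed from data1.
p <- ggplot(data2, aes(x = xnew, y = ynew, colour = col)) +
  geom_point(size = 3) +
  scale_colour_gradient(limits = range(data1$col)) +
  coord_cartesian(xlim = range(data1$xval), ylim = range(data1$yval))
p
```

With the limits fixed, a given col value gets the same colour it would have had in the data1 plot.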

ggplot linetype causing line to be invisible

I want to plot some lines and have them differentiated on color and linetype. If I differentiate on color alone, it's fine. If I add linetype, the line for group 14 vanishes.
(ggplot2_2.1.0, R version 3.3.1)
library(ggplot2)
g <- paste("Group", 1:14)
group <- factor(rep(g, each=14), levels=g)
x <- rep(1:14, length.out=14*14)
y <- c(runif(14*13), rep(0.55, 14))
d <- data.frame(group, x, y, stringsAsFactors=FALSE)
# Next 2 lines work fine - check the legend for Group 14
ggplot(d, aes(x, y, color=group)) +
geom_line()
# Add linetype
ggplot(d, aes(x, y, color=group, linetype=group)) +
geom_line()
# Group 14 is invisible!
What's going on?

You can solve it by defining the pattern of each line manually with hex strings (see ?linetype).
The help page says:
..., the string "33" specifies three units on followed by three off
and "3313" specifies three units on followed by three off followed by
one on and finally three off.
HEX <- c(1:9, letters[1:6]) # can't use 0
## build each linetype from a (two- or) four-digit hex string
# in this example the digits are chosen at random
set.seed(1)
HEXs <- matrix(sample(HEX, 4*14, replace = TRUE), ncol = 4)
my_val <- apply(HEXs, 1, paste, collapse = "")
# example data
group <- factor(rep(paste("Group", 1:14), each = 20), levels=paste("Group", 1:14))
data <- data.frame(x = 1:20, y = rep(1:14, each=20), group = group)
ggplot(data, aes(x = x, y = y, colour = group, linetype = group)) +
geom_line() +
scale_linetype_manual(values = my_val)
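A likely explanation, offered as an assumption rather than something the answer states, is that the default linetype palette has fewer patterns than there are groups, so the last group is left without a linetype. The sketch below checks this with scales::linetype_pal() and then shows a deterministic alternative to random hex strings: recycling six named linetypes. The names default_types, base_types, and my_types are illustrative.

```r
library(ggplot2)

# Ask the default linetype palette for 14 patterns; scales warns and pads
# with NA once its built-in patterns run out.
default_types <- suppressWarnings(scales::linetype_pal()(14))
n_available <- sum(!is.na(default_types))

# Recycle six named linetypes so that every group receives a valid pattern.
base_types <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
my_types <- rep(base_types, length.out = 14)

groups <- paste("Group", 1:14)
d <- data.frame(
  group = factor(rep(groups, each = 14), levels = groups),
  x = rep(1:14, times = 14),
  y = rep(1:14, each = 14)
)

ggplot(d, aes(x, y, colour = group, linetype = group)) +
  geom_line() +
  scale_linetype_manual(values = my_types)
```

Linetypes repeat after six groups here, but each colour and linetype pair stays distinct because the colours do not repeat in step with them.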

R ggplot2::geom_density with a constant variable

I recently came across a problem with ggplot2::geom_density that I am not able to solve. I am trying to visualise the density of a variable and compare it to a constant. To plot the density, I am using ggplot2::geom_density. The variable whose density I am plotting, however, happens to be a constant (this time):
df <- data.frame(matrix(1,ncol = 1, nrow = 100))
colnames(df) <- "dummy"
dfV <- data.frame(matrix(5,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.2, position = "identity") +
geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)
This is OK and something I would expect. But, when I shift this distribution to the far right, I get a plot like this:
df <- data.frame(matrix(71,ncol = 1, nrow = 100))
colnames(df) <- "dummy"
dfV <- data.frame(matrix(75,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.2, position = "identity") +
geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)
which probably means that the kernel estimation is still taking 0 as the centre of the distribution (right?).
Is there any way to circumvent this? I would like to see a plot like the one above, only with the centre of the kernel density at 71 and the vline at 75.
Thanks

Well, I am not sure exactly what the code does internally, but I suspect geom_density was not designed for the case where all values are the same, and it is making assumptions about the distribution that are not what you expect. Here is some code and a plot that shed some light:
# Generate 10 data sets with 100 constant values from 0 to 90
# and then merge them into a single dataframe
dfs <- list()
for (i in 1:10){
v <- 10*(i-1)
dfs[[i]] <- data.frame(dummy=rep(v,100),facet=v)
}
df <- do.call(rbind,dfs)
# facet plot them
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.5, position = "identity") +
facet_wrap( ~ facet,ncol=5 )
Yielding:
So it is not doing what you thought it was, but it is probably not doing what you want either. You could make it (almost) translation-invariant by adding some noise, for example:
set.seed(1234)
noise <- rnorm(100,0,1e-3)
dfs <- list()
for (i in 1:10){
v <- 10*(i-1)
dfs[[i]] <- data.frame(dummy=rep(v,100)+noise,facet=v)
}
df <- do.call(rbind,dfs)
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.5, position = "identity") +
facet_wrap( ~ facet,ncol=5 )
Yielding:
Note that the estimated density looks a bit different in each panel. There is apparently a random component to geom_density, and I can't see how to set the seed before each instance.
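One explanation that fits the first set of plots is the default bandwidth rule. geom_density uses bw = "nrd0" by default, and stats::bw.nrd0() falls back to the magnitude of the first value when the standard deviation and IQR are both zero, so the bandwidth grows with the constant. The sketch below illustrates this and, as one possible workaround rather than a confirmed fix, passes an explicit bw; the names bw_at_1 and bw_at_71 are illustrative.

```r
library(ggplot2)

# bw.nrd0() is the default bandwidth rule of density() and geom_density().
# For constant data sd and IQR are zero, so it falls back to abs(x[1]).
bw_at_1  <- bw.nrd0(rep(1, 100))
bw_at_71 <- bw.nrd0(rep(71, 100))
bw_at_71 / bw_at_1   # the bandwidth scales with the constant

df  <- data.frame(dummy = rep(71, 100))
dfV <- data.frame(latent = 75)

# Reuse the bandwidth from the constant-1 case to get the same shape at 71.
ggplot() +
  geom_density(data = df, aes(x = dummy, colour = "s"),
               fill = "#FF6666", alpha = 0.2, bw = bw_at_1) +
  geom_vline(data = dfV, aes(xintercept = latent, colour = "ls"))
```

Fixing bw makes the curve independent of where the constant sits, which is what the question asks for.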

ggplot creates artefacts in plot with geom_line()

I have two vectors x,y and want to create a line plot for y over x.
y = c(-0.0400785, -0.0304795, -0.0208800, -0.0112805, -0.0016810, 0.0079185, 0.0175180, 0.0271175,
0.0367170, 0.0463160, 0.0559155, 0.0655150, 0.0751145, 0.0847140, 0.0943135, 0.1039130,
0.1135125, 0.1231120, 0.1327110, 0.1423105, 0.1519100)
x = c(-5.867304, -5.879089, -5.987021, -6.309500, -6.770748, -7.189354, -7.455675, -7.545589, -7.463138,
-7.371971, -7.407384, -7.461245, -7.398057, -7.192540, -7.010408, -6.961792, -6.994748, -6.971052,
-6.779542, -6.536575, -6.301766)
If I use the base plot function everything is fine.
plot(x,y, type = "l")
If I create a line plot using ggplot
library(ggplot2)
df = data.frame("y" = y, "x" = x)
ggplot(data = df) + geom_line(aes(x = x, y = y))
It introduces strange artefacts, as if the wrong x values were paired with the wrong y values. Has anyone had the same experience or, even better, found a solution?

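You have to use geom_path. geom_line connects the observations in order of their x values, whereas geom_path connects them in the order in which they appear in the data: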
ggplot(data = df) +
geom_path(aes(x = x, y = y))
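The difference can be seen on a small example. This is a sketch with hypothetical toy data (toy, line_order, and path_order are made-up names), chosen so that x is not monotonic, as in the question.

```r
library(ggplot2)

# Hypothetical toy data in which x goes back on itself.
toy <- data.frame(x = c(1, 3, 2, 4), y = c(1, 2, 3, 4))

# geom_line() connects the rows sorted by x ...
line_order <- order(toy$x)
# ... whereas geom_path() connects them in the order they are stored.
path_order <- seq_len(nrow(toy))

p_line <- ggplot(toy, aes(x, y)) + geom_line()   # connects rows 1, 3, 2, 4
p_path <- ggplot(toy, aes(x, y)) + geom_path()   # connects rows 1, 2, 3, 4
p_line
p_path
```

Base plot(x, y, type = "l") also draws in data order, which is why it matches geom_path rather than geom_line.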

How to add ggplot legend of two different lines R?

I need to add a legend of the two lines (best fit line and 45 degree line) on TOP of my two plots. Sorry I don't know how to add plots! Please please please help me, I really appreciate it!!!!
Here is an example
type=factor(rep(c("A","B","C"),5))
xvariable=seq(1,15)
yvariable=2*xvariable+rnorm(15,0,2)
newdata=data.frame(type,xvariable,yvariable)
p = ggplot(newdata,aes(x=xvariable,y=yvariable))
p+geom_point(size=3)+ facet_wrap(~ type) +
geom_abline(intercept =0, slope =1,color="red",size=1)+
stat_smooth(method="lm", se=FALSE,size=1)

Here is another approach, which maps aesthetics to string constants to identify the different groups and create a legend.
First, an alternate way to create your test data (naming it DF instead of newdata):
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
xvariable = 1:15,
yvariable = 2 * (1:15) + rnorm(15, 0, 2))
Now the ggplot code. Note that for both geom_abline and stat_smooth, the colour is set inside an aes call, which means each of the two values used will be mapped to a different colour and a guide (legend) will be created for that mapping. The intercept and slope are given inside aes() as well, because recent ggplot2 versions ignore the mapping when they are passed as plain arguments.
ggplot(DF, aes(x = xvariable, y = yvariable)) +
geom_point(size = 3) +
geom_abline(aes(intercept = 0, slope = 1, colour = "one-to-one"), size = 1) +
stat_smooth(aes(colour="best fit"), method = "lm", se = FALSE, size = 1) +
facet_wrap(~ type) +
scale_colour_discrete("")
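Since the question asks for the legend on top of the plots, the same idea can be combined with a manual colour scale and theme(legend.position = "top"). This is a minimal sketch; the seed, the line_colours vector, and the specific colours are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the example is reproducible
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
                 xvariable = 1:15,
                 yvariable = 2 * (1:15) + rnorm(15, 0, 2))

# A named vector ties each legend label to a fixed colour.
line_colours <- c("best fit" = "blue", "one-to-one" = "red")

p <- ggplot(DF, aes(x = xvariable, y = yvariable)) +
  geom_point(size = 3) +
  geom_abline(aes(intercept = 0, slope = 1, colour = "one-to-one")) +
  stat_smooth(aes(colour = "best fit"), method = "lm",
              formula = y ~ x, se = FALSE) +
  facet_wrap(~ type) +
  scale_colour_manual(name = NULL, values = line_colours) +
  theme(legend.position = "top")
p
```

Naming the colour vector keeps the labels and colours paired regardless of the order in which the layers are added.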

Try this:
# original data
type <- factor(rep(c("A", "B", "C"), 5))
x <- 1:15
y <- 2 * x + rnorm(15, 0, 2)
df <- data.frame(type, x, y)
# create a copy of original data, but set y = x
# this data will be used for the one-to-one line
df2 <- data.frame(type, x, y = x)
# bind original and 'one-to-one data' together
df3 <- rbind.data.frame(df, df2)
# create a grouping variable to separate stat_smoothers based on original and one-to-one data
df3$grp <- as.factor(rep(1:2, each = nrow(df)))
# plot
# use original data for points
# use 'double data' for the best-fit line and the one-to-one line, set colours by group
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 3) +
facet_wrap(~ type) +
stat_smooth(data = df3, aes(colour = grp), method = "lm", se = FALSE, size = 1) +
scale_colour_manual(values = c("red","blue"),
labels = c("best fit", "one-to-one"),
name = "") +
theme(legend.position = "top")
# If you would rather stack the two keys in the legend, you can add:
# guide = guide_legend(direction = "vertical")
#...as argument in scale_colour_manual
Please note that this solution does not extrapolate the one-to-one line outside the range of your data, which seemed to be the case for the original geom_abline.
