Question about zero values in grouped_ggbetweenstats (R)

Does anyone know if it's possible to plot a grouped ggbetweenstats plot (using grouped_ggbetweenstats) when some variables on my x-axis hold only zero values for some of the groupings? At the moment these cannot be plotted, but I'd like them either to be left blank or to get a boxplot/point at the zero mark. And if so, how do I do this?
I've tried googling it, but found no answers so far.

This is a relatively complex question to ask without giving any sample data, and if your data is exactly as you describe it, then it is not clear what your problem is.
Suppose we simulate some data for demonstration purposes:
library(ggstatsplot)
set.seed(1)
df <- data.frame(x = rep(paste("Class", LETTERS[1:3]), each = 20),
                 y = rnorm(60, rep(1:3, each = 20)),
                 group = rep(paste("Group", 1:2), times = 30))
This gives us 10 random values for each combination of the two grouping variables: x, which we plot on the x-axis, and group, which we use as the grouping variable. When we plot it, it looks like this:
grouped_ggbetweenstats(df, x, y, grouping.var = group)
Suppose now that Class B only contains 0 values, which from your description is how your own data is structured.
df$y[df$x == "Class B"] <- 0
But we can still plot the results:
grouped_ggbetweenstats(df, x, y, grouping.var = group)
And the zero-only class is still plotted at a value of zero, as desired.
Is there some assumption that I have made wrongly?
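Before plotting, it can help to confirm which class/group combinations hold nothing but zeros. A minimal base R sketch that rebuilds the simulated `df` from above; `zero_only` is an illustrative name, not part of ggstatsplot:

```r
set.seed(1)
df <- data.frame(x = rep(paste("Class", LETTERS[1:3]), each = 20),
                 y = rnorm(60, rep(1:3, each = 20)),
                 group = rep(paste("Group", 1:2), times = 30))
df$y[df$x == "Class B"] <- 0

# Matrix with one row per class and one column per group:
# TRUE where every y value in that cell is exactly zero
zero_only <- tapply(df$y == 0, list(df$x, df$group), all)
print(zero_only)
```

If a cell you expect to be plotted shows TRUE here, that combination is the zero-only case the question describes.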

Related

Plot multiple traces in R

I started learning R for data analysis and, most importantly, for data visualisation.
Since I am still in the switching process, I am trying to reproduce in R the work I used to do with GraphPad Prism or Origin Pro. In most cases everything went smoothly, but I could not find a smart solution for plotting multiple y columns in a single graph.
What I usually get from the software I use for data visualisation looks like this:
Each black trace is a single measurement, and I would like to obtain the same plot in R. In Prism or Origin, this takes a single copy-paste into an XY graph.
I exported the matrix of data (one X, which indicates the time, and multiple Y values, which are the traces you see in the image).
I imported my data in R with the following commands:
library(ggplot2) #loaded ggplot2
Data <- read.csv("Directory/File.txt", header=F, sep="") #imported data
DF <- data.frame(Data) #transformed data into data frame
If I look at my data now, I see a series of columns, where the first one (called V1) is the x-axis and all the others (V2 to V140) are the traces I want to put on the same graph.
To plot the data, I tried different solutions:
ggplot(data=DF, aes(x=DF$V1, y=DF[V2:V140]))+geom_line()+theme_bw() #did not work
plot(DF, xy.coords(x=DF$V1, y=DF$V2:V140)) #gives me an error
plot(DF, xy.coords(x=V1, y=c(V2:V10))) #gives me an error
I tried matplot(), without success, following the EZH guide:
The code I used is the following: matplot(x=DF$V1, type="l", lty = 2:100)
The only solution I found is to write a separate plotting command for each column, which is impractical: the number of columns varies among my data sets, and manually entering commands for 140 columns is insane.
What would you suggest?
Thank you in advance.
I've also attached some data here: Data: single X, multiple Y
I tried using matplot() with some sample data that has no trend at all, so the output from my code will look terrible, but my main focus is on the code. Since you have already tried matplot(), just check against the solution below whether you had done it right!
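set.seed(100)
# 500 rows x 100 columns of random integers: one x column plus 99 y columns
df <- matrix(sample(1:685765, 50000, replace = TRUE), ncol = 100)
colnames(df) <- c("x", paste0("y", 1:99))
dt <- as.data.frame(df)
# x is random and unsorted here, so the lines zigzag; ordered x values (e.g. time) won't
matplot(dt[["x"]], y = dt[, paste0("y", 1:99)], type = "l")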
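Applied to the layout described in the question (first column `V1` is time, the remaining columns are traces), one `matplot()` call can draw every trace. This is a sketch with simulated stand-in data, since the real file isn't available; `lty = 1` and `col = "black"` are chosen to mimic the all-black traces of the Prism/Origin figure:

```r
# Simulated stand-in for the imported data: V1 = time, V2..V11 = traces
set.seed(42)
n_time   <- 200
n_traces <- 10
time <- seq(0, 10, length.out = n_time)
traces <- sapply(seq_len(n_traces),
                 function(i) sin(time + i / 5) + rnorm(n_time, sd = 0.1))
colnames(traces) <- paste0("V", 2:(n_traces + 1))
DF <- data.frame(V1 = time, traces)

# x = first column, y = every other column; one solid black line per column
matplot(DF$V1, as.matrix(DF[, -1]), type = "l", lty = 1, col = "black",
        xlab = "Time", ylab = "Signal")
```

Because `DF[, -1]` selects every column except the first, the same call works whether the file has 10 or 140 traces.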
If you want to plot in base R without matplot(), you can make a plot and add the lines one at a time, which isn't hard to do.
We start by making some sample data. Since the data in the link all seemed to be on the same scale, I will assume your data frame only has y values and the x values are stored separately.
plotData <- as.data.frame(matrix(sort(rnorm(500)),ncol = 5))
xval <- sort(sample(200, 100))
Now we can initialize a plot with the first column.
plot(xval, plotData[[1]], type = "l",
ylim = c(min(plotData), max(plotData)))
type = "l" makes a line plot instead of a scatter plot
ylim = c(min(plotData), max(plotData)) makes sure the y-axis will fit all the data.
Now we can add the rest of the values.
apply(plotData[-1], 2, lines, x = xval)
plotData[-1] removes the column we already plotted.
apply() with 2 as its second argument runs a function on every column.
lines is the function we apply to each column; it adds a new line to the current plot.
x = xval passes an extra argument (x) to lines.
If you want to plot the data using ggplot2, the data should be transformed to long format:
library(ggplot2)
library(reshape2)
dat <- read.delim('AP.txt', header = FALSE)
# plotting only the first 9 traces (columns V2 to V10);
# my RStudio crashes if I plot the full data
df <- melt(dat[1:10], id.vars = 'V1')
ggplot(df, aes(x = V1, y = value, color = variable)) + geom_line()
# if you want all traces to be in same colour, you can use
ggplot(df, aes(x = V1, y = value, group = variable)) + geom_line()
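If you'd rather avoid the reshape2 dependency (the package is retired), base R's `stack()` can do the same wide-to-long step. A sketch assuming the same layout (`V1` = time, remaining columns = traces), with small made-up data standing in for `AP.txt`:

```r
library(ggplot2)

# Made-up stand-in for AP.txt: V1 is time, V2..V4 are traces
dat <- data.frame(V1 = 1:5,
                  V2 = c(0, 1, 2, 1, 0),
                  V3 = c(1, 2, 3, 2, 1),
                  V4 = c(2, 3, 4, 3, 2))

# stack() returns two columns: 'values' and 'ind' (a factor of source column names)
long <- stack(dat[-1])
# Repeat the time column once per trace so each value keeps its x position
long$V1 <- rep(dat$V1, times = ncol(dat) - 1)

p <- ggplot(long, aes(x = V1, y = values, group = ind)) + geom_line()
print(p)
```

The `rep(..., times = ...)` step relies on `stack()` stacking the columns one after another in their original order.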

Changing color of a line in plot when certain condition is met

I have a plot of the probability of an event, which changes over time (the x-axis is time and the y-axis is the probability). Besides that, my data has a variable that can take the value 0 or 1. What I'm looking for is for the probability line to be black where the binary value is 0 and red where it is 1. At the moment I only have the plot of the probability, and I don't know how to incorporate the binary variable.
plot_ly(df,x=df$time,y=df$probab,type="line")
To make things easy I made an example with multiple steps:
df <- data.frame(time = 1:11
, probab = seq(0.1,0.2, by=0.01)
,bin = rep(1,11))
linecol <- c("black","red")[max(df$bin)+1]
plot_ly(df,x=df$time,y=df$probab,type="scatter", mode="lines"
, line = list(color=linecol))
I assumed the binary value was in the data frame, but of course another source can be used as well.
Your comment on the question makes this a bit more complex. I suggest dividing the line into separate segments. This means duplicating some of the data points to build the segments.
library(plotly)
library(dplyr) # for group_by() and %>%
df <- data.frame(time = 1:11
, probab = seq(0.1,0.2, by=0.01)
,bin = c(rep(0:1,5),1))
df2 <- df[c(1,rep(2:10,each=2),11),]
df2$section = rep(1:10, each=2)
linecol <- c("black","red")[df2$bin[seq(1,20,by=2)]+1]
linecol <- rep(linecol,each=2)
p <- plot_ly(df2 %>% group_by(section),x=~time,y=~probab)
add_trace(p, mode = "markers+lines",color = linecol)
For some reason, add_trace did not recognise the "red" and "black" strings as colours. I'm not sure how to fix this.
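As a workaround outside plotly (this swaps in base graphics and is not a fix for the `add_trace` call), `segments()` can draw each step of the line with its own colour, taken from the binary value at the step's starting point. A sketch using the same example data; `seg_col` is an illustrative name:

```r
df <- data.frame(time = 1:11,
                 probab = seq(0.1, 0.2, by = 0.01),
                 bin = c(rep(0:1, 5), 1))

n <- nrow(df)
# Segment i runs from point i to point i + 1; its colour follows bin at point i
# (0 -> "black", 1 -> "red")
seg_col <- c("black", "red")[df$bin[-n] + 1]

# Empty plot with the right axis ranges, then draw the coloured segments
plot(df$time, df$probab, type = "n", xlab = "time", ylab = "probab")
segments(x0 = df$time[-n], y0 = df$probab[-n],
         x1 = df$time[-1], y1 = df$probab[-1],
         col = seg_col, lwd = 2)
```

Because consecutive segments share their end points, no data duplication is needed here, unlike the plotly approach above.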

R-Programming: Chart the Z distribution of a factor's frequency

I have reviewed a number of posts about histograms/bar charts from categorical data, but I still can't seem to make progress. I have a data set of names (a single column), and each name occurs anywhere from once to 8,000 times. I can create a table with variable and frequency, and I can move that table to a data frame, but no matter what I try, I can't even get a bar plot, much less a histogram, with the variable on the x-axis and the frequency on the y-axis.
Ultimately, I want to use the table or dataframe with name and frequency to calculate the Z score for each name and then graph the distribution. I can do this easily with a series of numbers but doing it with a categorical variable has me stumped.
thanks,
rms
Is this what you're looking for?
example_data <- data.frame(Name = sample(paste0("Name", 1:15), size = 8000, replace=TRUE, prob = (1:15)/sum(1:15)))
counts <- as.data.frame(table(example_data))
colnames(counts) <- c("Name", "Freq")
library(ggplot2)
ggplot(data = counts, aes(x = Name, y = Freq)) + geom_bar(stat="identity")
For future reference, it's a little easier to answer if you provide a reproducible example, or go into more detail about what you've tried already. Hope this helps!
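The question's second step, turning each name's frequency into a Z score and plotting their distribution, can be sketched by standardising the `Freq` column. This treats each name's count as one observation, which is an assumption about what "Z score for each name" means; `z` is an illustrative column name:

```r
set.seed(1)
example_data <- data.frame(Name = sample(paste0("Name", 1:15), size = 8000,
                                         replace = TRUE, prob = (1:15) / sum(1:15)))
counts <- as.data.frame(table(example_data$Name))
colnames(counts) <- c("Name", "Freq")

# Z score of each name's frequency: (count - mean count) / sd of counts
counts$z <- (counts$Freq - mean(counts$Freq)) / sd(counts$Freq)

# Distribution of the Z scores
hist(counts$z, main = "", xlab = "Z score of name frequency")
```

`as.numeric(scale(counts$Freq))` gives the same values, since `scale()` centres by the mean and divides by the standard deviation by default.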

Get a histogram plot of factor frequencies (summary)

I've got a factor with many different values. If you execute summary(factor), the output is a list of the different values and their frequencies, like so:
A B C D
3 3 1 5
I'd like to make a histogram of the frequency values, i.e. X-axis contains the different frequencies that occur, Y-axis the number of factors that have this particular frequency. What's the best way to accomplish something like that?
Edit: thanks to the answer below, I figured out that I can take the frequencies out of the table, turn them into a factor, and plot that, which looks like this (where f is the factor):
plot(factor(table(f)))
Update in light of the clarified question
set.seed(1)
dat2 <- data.frame(fac = factor(sample(LETTERS, 100, replace = TRUE)))
hist(table(dat2), xlab = "Frequency of Level Occurrence", main = "")
gives:
Here we just apply hist() directly to the result of table(dat2). table(dat2) provides the frequencies per level of the factor, and hist() produces the histogram of these data.
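If you'd rather have one bar per distinct frequency than hist()'s automatic bins, tabulating the table gives the number of levels at each frequency directly. A sketch using the four-level example from the question (A = 3, B = 3, C = 1, D = 5):

```r
f <- factor(rep(LETTERS[1:4], times = c(3, 3, 1, 5)))

# Inner table: frequency of each level; outer table: how many levels share each frequency
freq_of_freq <- table(table(f))
print(freq_of_freq)

barplot(freq_of_freq, xlab = "Frequency of level occurrence",
        ylab = "Number of levels")
```

This should show the same counts as the `plot(factor(table(f)))` approach in the question's edit: one level occurs once, two occur three times, and one occurs five times.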
Original
There are several possibilities. Your data:
dat <- data.frame(fac = rep(LETTERS[1:4], times = c(3,3,1,5)))
Here are three, from column one, top to bottom:
The default plot method for class "table", which plots the data as histogram-like bars.
A bar plot, which is probably what you meant by histogram. Notice the low ink-to-information ratio here.
A dot plot or dot chart, which shows the same information as the other plots but uses far less ink per unit of information. Preferred.
Code to produce them:
layout(matrix(1:4, ncol = 2))
plot(table(dat), main = "plot method for class \"table\"")
barplot(table(dat), main = "barplot")
tab <- as.numeric(table(dat))
names(tab) <- names(table(dat))
dotchart(tab, main = "dotchart or dotplot")
## or just this
## dotchart(table(dat))
## and ignore the warning
layout(1)
this produces:
If you just have your data in a variable called factor (a bad choice of name, by the way), then table(factor) can be used instead of table(dat) or table(dat$fac) in my code examples.
For completeness, package lattice is more flexible when it comes to producing the dot plot as we can get the orientation you want:
require(lattice)
with(dat, dotplot(fac, horizontal = FALSE))
giving:
And a ggplot2 version:
require(ggplot2)
p <- ggplot(data.frame(Freq = tab, fac = names(tab)), aes(fac, Freq)) +
geom_point()
p
giving:

Subset of data included in more than one ggplot facet

I have a population and a sample of that population. I've made a few plots comparing them using ggplot2 and its faceting option, but it occurred to me that having the sample in its own facet will distort the population plots (however slightly). Is there a way to facet the plots so that all records are in the population plot, and just the sampled records in the second plot?
Matt,
If I understood your question properly - you want to have a faceted plot where one panel contains all of your data, and the subsequent facets contain only a subset of that first plot?
There's probably a cleaner way to do this, but you can create a new data.frame object with the appropriate faceting variable that corresponds to each subset. Consider:
library(ggplot2)
df <- data.frame(x = rnorm(100), y = rnorm(100), sub = sample(letters[1:5], 100, TRUE))
df2 <- rbind(
cbind(df, faceter = "Whole Sample")
, cbind(df[df$sub == "a" ,], faceter = "Subset A")
#other subsets go here...
)
# qplot() is deprecated in recent ggplot2 versions, so use ggplot() directly
ggplot(df2, aes(x, y)) + geom_point() + facet_wrap(~ faceter)
Let me know if I've misunderstood your question.
-Chase
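The same duplication trick can be generalised to a "Whole Sample" panel plus one panel per subset, without writing each `cbind()` by hand. A sketch using `lapply()` over the levels of `sub`; `panels` is an illustrative name:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100),
                 sub = sample(letters[1:5], 100, TRUE))

# One copy of all rows, then one copy of each subset's rows,
# each tagged with the panel it should appear in
panels <- rbind(
  cbind(df, faceter = "Whole Sample"),
  do.call(rbind, lapply(sort(unique(df$sub)), function(s)
    cbind(df[df$sub == s, ], faceter = paste("Subset", toupper(s)))))
)

p <- ggplot(panels, aes(x, y)) + geom_point() + facet_wrap(~ faceter)
print(p)
```

Every original row appears exactly twice: once in the whole-sample panel and once in its own subset panel, so the whole-sample panel is never missing records.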
