I'd like to do the following in R.
I have a group of individuals (1 - 50) with two datasets each. Each dataset (A & B) has values that can be in two categories (Gains, shown in blue; Losses, shown in red). I'd like to show those two datasets together, as below. The frequency of Gains/Losses would be in the y axis, where Dataset A would go upward from the x axis, and Dataset B would go downward from the x axis. I'd like to be able to cluster the barplot either by Individual (as shown below) OR by Gains or Losses (All gains together, then all losses together).
I know how to make clustered barplots in ggplot, but can't figure out how to combine the two datasets as in my image (with dataset A going up and dataset B going down).
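We can do something similar to age pyramids, only without flipping the coordinates:
library(ggplot2)
# Simulated data: dataset A is mostly category 2, dataset B mostly category 1
testA <- data.frame(v = factor(sample(1:2, 1000, replace = TRUE, prob = c(1, 5))), dataset = "A")
testB <- data.frame(v = factor(sample(1:2, 1000, replace = TRUE, prob = c(5, 1))), dataset = "B")
both <- rbind(testA, testB)
# The subset= layer argument was removed in ggplot2 2.0.0, so give each layer its own data.
# after_stat() (ggplot2 >= 3.3.0) replaces ..count..; negating it draws dataset B downward.
ggplot(both, aes(x = v, fill = v)) +
  geom_bar(data = subset(both, dataset == "A")) +
  geom_bar(data = subset(both, dataset == "B"), aes(y = after_stat(count) * -1))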
Image of excel data set
I have a table in excel with 100 columns and 100 rows. The column starts at 0% and works up to 100%. Same for the row, starts at 0% and goes up to 100%. It is a 2-way sensitivity analysis, i.e. which drug would be optimal if x(variable in column)=10% and y(variable in row)=30%.
I have 100 by 100 table, with the name of four different drugs scattered across the table. I want to take this data into R and create a scatter plot, essentially a square with 10000 smaller squares. I then want R to colour each square based on the drug which is most optimal for that combination of X and Y.
I've attached an image of dummy data showing the same example in a 10 by 10 table.
Hope you can help!
You'll need to start by prepping the data -- reading it in, using something like the pivot_longer() function from the tidyr package (part of the tidyverse) to turn the columns into rows, and then likely doing some cleanup on the percentages.
After that, the plot (using ggplot2) itself may be pretty straightforward. The geom_tile() function is the one that creates the squares.
library(tidyverse)
# Create test data
df <- expand_grid(x = 1:100, y = 1:100) %>%
mutate(drug = sample(LETTERS[1:4], size = 10000, replace = TRUE))
# Make the plot
df %>%
ggplot(aes(x, y, fill = drug)) +
geom_tile()
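The data-prep step mentioned above might look roughly like the sketch below. The small `wide` table is a hypothetical stand-in for the Excel sheet (assumed layout: the first column holds the row percentage and the remaining columns are named by the column percentage); the real file would first need to be read in, and its layout may differ.

```r
library(tidyr)
library(dplyr)

# Hypothetical 3 x 3 stand-in for the 100 x 100 Excel table
wide <- data.frame(
  y = c("0%", "10%", "20%"),
  `0%` = c("A", "A", "B"),
  `10%` = c("A", "C", "B"),
  `20%` = c("D", "C", "B"),
  check.names = FALSE
)

# One row per (x, y) cell, with the percentage labels converted to numbers
long <- wide %>%
  pivot_longer(cols = -y, names_to = "x", values_to = "drug") %>%
  mutate(x = as.numeric(sub("%", "", x)),
         y = as.numeric(sub("%", "", y)))
```

The resulting `long` data frame has the same shape as the test data above, so it can be passed to the same ggplot() call with geom_tile().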
I work with a massive 4D nifti file (x - y - z - subject; MRI data) and due to the size I can't convert to a csv file and open in R. I would like to get a series of overlaying density plots (classic example here) one for each subject with the idea to just visualise that there is not much variance in density distributions across the sample.
I could however, extract summary statistics for each subject (mean, median, SD, range etc. of the variable of interest) and use these to create the density plots (at least for the variables that are normally distributed). Something like this would be fantastic but I am not sure how to do it for density plots.
Your help will be much appreciated.
So these really aren't density plots per se - they are plots of densities of normal distributions with given means and standard deviations.
That can be done in ggplot2, but you need to expand your table of subjects and summaries into grids of points and normal densities at those points.
Here's an example. First, make up some data, consisting of subject IDs and some simulated sample averages and sample standard deviations.
library(tidyverse)
set.seed(1)
# tibble() replaces the deprecated data_frame()
foo <- tibble(Subject = LETTERS[1:10], avg = runif(10, 10, 20), stdev = runif(10, 1, 2))
Now, for each subject we need to obtain a suitable grid of "x" values along with the normal density (for that subject's avg and stdev) evaluated at those "x" values. I've chosen plus/minus 4 standard deviations. This can be done using do. But that produces a funny data frame with a column consisting of data frames. I use unnest to explode out the data frame.
bar <- foo %>%
group_by(Subject) %>%
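  # One 50-point grid per subject, spanning avg +/- 4 standard deviations
  do(densities = tibble(x = seq(.$avg - 4 * .$stdev, .$avg + 4 * .$stdev, length.out = 50),
                        density = dnorm(x, .$avg, .$stdev))) %>%
  # Recent tidyr versions expect the list-column to be named explicitly
  unnest(densities)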
Have a look at bar to see what happened. Now we can use ggplot2 to put all these normal densities on the same plot. I'm guessing with lots of subjects you wouldn't want a legend for the plot.
bar %>%
ggplot(aes(x=x, y=density, color=Subject)) +
geom_line(show.legend = FALSE)
I'm (a newbie to R) analyzing a randomized study on the effect of two treatments on gene expression. We evaluated 5 different genes at baseline and after 1 year. The gene fold is calculated as the value at 1 year divided by the baseline value.
Example gene:
IL10_BL
IL10_1Y
IL10_fold
Gene expression is measured as a continuous variable, typically ranging from 0.1 to 5.0.
100 patients have been randomized to either a statin or diet regime.
I would like to do the following plot:
- Y axis should display the mean gene expression with 95% confidence limit
- X axis should be categorical, with the baseline, 1 year and fold value for each of the 5 genes, grouped by treatment. So, 5 genes with 3 values for each gene in two groups would mean 30 categories on the X axis. It would be really nice if the dots for the same gene were connected with a line.
I have tried to do this myself (using ggplot2) without any success. I've tried to do it directly from the crude data, which looks like this (first 6 observations and 2 different genes):
genes <- read.table(header=TRUE, sep=";", text =
"treatment;IL10_BL;IL10_1Y;IL10_fold;IL6_BL;IL6_1Y;IL6_fold;
diet;1.1;1.5;1.4;1.4;1.4;1.1;
statin;2.5;3.3;1.3;2.7;3.1;1.1;
statin;3.2;4.0;1.3;1.5;1.6;1.1;
diet;3.8;4.4;1.2;3.0;2.9;0.9;
statin;1.1;3.1;2.8;1.0;1.0;1.0;
diet;3.0;6.0;2.0;2.0;1.0;0.5;")
I would greatly appreciate any help (or link to a similar thread) to do this.
First, you need to melt your data into a long format, so that one column (your X column) contains a categorical variable indicating whether an observation is BL, 1Y, or fold.
(The trailing semicolons in your read.table() call create an empty column named X, which you might need to get rid of first: genes$X = NULL)
library(reshape2)
genes.long = melt(genes, id.vars='treatment', value.name='expression')
Then you need the gene and measurement (baseline, 1-year, fold) in different columns (from this question).
genes.long$gene = as.character(lapply(strsplit(as.character(genes.long$variable), split='_'), '[', 1))
genes.long$measurement = as.character(lapply(strsplit(as.character(genes.long$variable), split='_'), '[', 2))
And put the measurement in the order that you expect:
genes.long$measurement = factor(genes.long$measurement, levels=c('BL', '1Y', 'fold'))
Then you can plot using stat_summary() calls for the mean and confidence intervals. Use facets to separate the groups (treatment and gene combinations).
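library(ggplot2)
# fun= replaces the deprecated fun.y= (ggplot2 >= 3.3.0);
# mean_cl_boot gives a bootstrap 95% CI and needs the Hmisc package installed
ggplot(genes.long, aes(measurement, expression)) +
  stat_summary(fun = mean, geom = 'point') +
  stat_summary(fun.data = 'mean_cl_boot', geom = 'errorbar', width = .25) +
  facet_grid(. ~ treatment + gene)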
You can reverse the order to facet_grid(.~gene+treatment) if you want the top level to be gene instead of treatment.
I want to visualize many time series at once. I am new at R, and have spent about 6 hours searching the web and reading about how to tackle this relatively simple problem. My dataset has five time points arranged as rows, and 100 columns. I can easily plot any column against the time points with qplot(time, var2, geom="line"). But I want to learn how to do this for a flexible number of columns, and how to print 6 to 12 of the individual graphs on one page.
Here I learned about the multiplot function, got that to work in terms of layout.
What I am stuck on is how to get the list of variables into a FOR statement so I can have one statement to plot all the variables against the same five time points.
This is what I am playing with. It makes 9 plots, 3 columns wide, but I do not know how to get all my variables into the array for yvars.
for (i in 1:9) {
  p1 = qplot(symbol, yvar, geom = "smooth", main = i)
plots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = plots, cols = 3)
Stupidly on my part right now it makes 9 identical plots. So how do I create the list so the above will cycle through all my columns and make those plots?
First, melt all your data using the reshape2 package:
library(reshape2)
datm <- melt(your.original.data.frame, id = "time")
Now plot it using facets:
qplot(time, value, data = datm, facets= variable ~ ., geom="point")
Let me know if this works. If you could, please upload your data, it would help tremendously.
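Since no data was posted, here is a minimal self-contained sketch of the same melt-then-facet idea. The data frame `dat` is made up (5 time points, 9 series), and it uses ggplot() with facet_wrap() instead of qplot() so the panels come out 3 columns wide; treat it as an illustration under those assumptions rather than a drop-in solution.

```r
library(reshape2)
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for the real data: 5 time points (rows), 9 series (columns X1..X9)
dat <- data.frame(time = 1:5, matrix(rnorm(45), nrow = 5))

# Long format: one row per (time, series) pair
datm <- melt(dat, id.vars = "time")

# One panel per series, laid out 3 columns wide
p <- ggplot(datm, aes(time, value)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 3)
print(p)
```

Because the faceting is driven by the variable column, the same code works for any number of series without a loop.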
I have data from some psychophysical experiments that I'd like to plot. My dataframe contains multiple observations from multiple participants in three paradigms of an experiment.
In other words, each participant took part in three psychophysical experiments and I'd like to plot the data on a single graph.
At present, my plot looks like this:
The data on the right of the plot are from one of the experiments (1), whilst the mass of data on the left are from the two other experiments (2 & 3). Essentially, I'm trying to show graphically that experiment 1 yields very different results to experiments 2 & 3.
This plot is of two parameters, 'probability_seen' and 'visual_acuity'. My dataframe also contains two other columns: subject_initials and experiment_type. As you can see, I'm separating out the subjects by colour. I'd also like to join the lines up for each of the experiments (the above plot actually contains three curves for each subject), but if I add geom_line() to my plot, I get this:
Obviously, I haven't asked ggplot2 to respect the state of 'experiment_type'. How do I do this?
n.b. I currently call the plot with the following code:
qplot(visual_acuity, probability_seen, data = dframe1, colour = subject_initials,
xlab = "Visual acuity", ylab = "Probability seen") + geom_line()
As @Baptiste has stated, the solution is to add group = experiment_type to the qplot call.
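A minimal sketch of the grouping idea, using made-up data with the column names from the question. One assumption goes beyond the answer above: since each subject has a separate curve in each experiment, the sketch groups by the combination of subject and experiment via interaction(), so that lines do not join points from different subjects. If group = experiment_type alone gives the plot you want, that is the simpler choice.

```r
library(ggplot2)

# Hypothetical data: 2 subjects x 3 experiments x 5 acuity levels
dframe1 <- expand.grid(
  subject_initials = c("AB", "CD"),
  experiment_type = factor(1:3),
  visual_acuity = seq(0.1, 1, length.out = 5)
)
# Made-up response curve, shifted by experiment
dframe1$probability_seen <-
  plogis(6 * dframe1$visual_acuity - as.integer(dframe1$experiment_type))

# Colour by subject, but draw one line per subject-and-experiment combination
p <- ggplot(dframe1, aes(visual_acuity, probability_seen,
                         colour = subject_initials,
                         group = interaction(subject_initials, experiment_type))) +
  geom_point() +
  geom_line() +
  labs(x = "Visual acuity", y = "Probability seen")
print(p)
```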