I have data from some psychophysical experiments that I'd like to plot. My dataframe contains multiple observations from multiple participants in three paradigms of an experiment.
In other words, each participant took part in three psychophysical experiments and I'd like to plot the data on a single graph.
At present, my plot looks like this:
The data on the right of the plot are from one of the experiments (1), whilst the mass of data on the left are from the two other experiments (2 & 3). Essentially, I'm trying to show graphically that experiment 1 yields very different results to experiments 2 & 3.
This plot is of two parameters, 'probability_seen' and 'visual_acuity'. My dataframe also contains two other columns: subject_initials and experiment_type. As you can see, I'm separating out the subjects by colour. I'd also like to join the lines up for each of the experiments (the above plot actually contains three curves for each subject), but if I add geom_line() to my plot, I get this:
Obviously, I haven't asked ggplot2 to respect the state of 'experiment_type'. How do I do this?
n.b. I currently call the plot with the following code:
qplot(visual_acuity, probability_seen, data = dframe1, colour = subject_initials,
xlab = "Visual acuity", ylab = "Probability seen") + geom_line()
As @Baptiste has stated, the solution is to add group = experiment_type to the qplot call.
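For illustration, here is a minimal sketch of the grouping idea on made-up data. The data frame below is a hypothetical stand-in for dframe1 (only the column names come from the question), and it uses ggplot() rather than qplot(). One assumption beyond the answer above: because an explicit group replaces ggplot2's default grouping, the sketch groups by the combination of subject and experiment so that each subject keeps a separate curve per experiment.

```r
library(ggplot2)

# Hypothetical stand-in for dframe1: 2 subjects x 3 experiments x 4 acuity levels
dframe1 <- expand.grid(
  visual_acuity    = c(0.2, 0.4, 0.6, 0.8),
  experiment_type  = factor(1:3),
  subject_initials = c("AB", "CD")
)
# Made-up response values, only so that there is something to draw
dframe1$probability_seen <- plogis(
  6 * dframe1$visual_acuity - as.integer(dframe1$experiment_type) - 1
)

# Colour by subject, but draw one line per subject/experiment combination
p <- ggplot(dframe1, aes(visual_acuity, probability_seen,
                         colour = subject_initials,
                         group = interaction(subject_initials, experiment_type))) +
  geom_point() +
  geom_line() +
  labs(x = "Visual acuity", y = "Probability seen")
print(p)
```

If one line per experiment across all subjects is what you want, group = experiment_type alone should do it.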
My data frame has 621 rows and each column describes something about it. I'm trying to do an exploratory data analysis where I plot out all the data in a bar plot.
I have a factor column called phenotype, which has 86 levels which describe the main condition in my cohort. I want to plot this out as 86 separate bar plots, each with the total number of people who have that condition on ggplot.
I've attached a screenshot of my data below. I want the x axis to show the condition names ('Bardet-Biedl Syndrome', 'Classic Ehlers Danlos Syndrome', etc.) and the y axis to show the number of people who have each condition (3, 4, 5, etc., as displayed below). I got the data below by running
table(data.frame$Phenotype)
I'm using the below code to generate my ggplot
ggplot(tiering, aes(x = Phenotype, y = count(tiering$Phenotype))) +
theme_bw() +
geom_bar(stat = "identity")
I'm sure the answer is out there, but I've looked on the R help websites and I can't seem to figure this out, so would be very grateful for the help.
EDIT: I got to a bar plot with the help of the below code. I'm now trying to reorder the bars in decreasing order and tried this method, but it hasn't worked. Would anyone have any suggestions?
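One possible way to get bars in decreasing order is sketched below, assuming the raw data has one row per person. geom_bar() counts the rows itself when no y aesthetic is given, and the bar order follows the factor levels, so re-levelling Phenotype by frequency is enough. The small tiering data frame here is made up for illustration; the third phenotype label is hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for the `tiering` data frame (one row per person)
tiering <- data.frame(
  Phenotype = c("Bardet-Biedl Syndrome",
                "Classic Ehlers Danlos Syndrome",
                "Classic Ehlers Danlos Syndrome",
                "Other condition", "Other condition", "Other condition")
)

# Count each phenotype and sort the counts from largest to smallest
counts <- sort(table(tiering$Phenotype), decreasing = TRUE)

# Re-level the factor so that the most frequent phenotype comes first
tiering$Phenotype <- factor(tiering$Phenotype, levels = names(counts))

# With no y aesthetic, geom_bar() uses the row count per phenotype as bar height
p <- ggplot(tiering, aes(x = Phenotype)) +
  geom_bar() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
```

With 86 levels the rotated axis labels will still be crowded, so adding coord_flip() may make them easier to read.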
I'm rather new to R and have been trying to analyze some of my proteomic data with it. In particular, I'm trying to make a PCA plot so that I can see some of the similarities/differences between my different treatments. I essentially have 5 treatments, each in triplicates, so 15 columns total. My 95 rows are each represented by a different Protein ID. My actual data are normalized LFQ intensities, which is only to say that it's numeric and I know a PCA analysis/plot is appropriate.
I was able to successfully create a PCA plot with this code:
PlantPCA <- prcomp(Plant_Num_Named, center = TRUE, scale = TRUE)
summary(PlantPCA)
#create a quick plot
plot(PlantPCA$x[,1],PlantPCA$x[,2], xlab="PC1 (69.4%)", ylab = "PC2 (9.2%)", main = "PC1 / PC2 - plot")
#full plot
fviz_pca_ind(PlantPCA, geom.ind = "point", pointshape = 21,
pointsize = 2) +
ggtitle("2D PCA-plot from 15 feature dataset") +
theme(plot.title = element_text(hjust = 0.5))
Which gives me this:
My PCA Plot
I know it's basic; I just wanted to get it to work first. However, now I want to add ellipses to surround my 5 different treatments, disregarding the different triplicates, and I'm not sure how to do so. I've seen people do this when what they want to color by is another column of categorical data, but that isn't the case here.
Ideally, I would have 5 different colors for my treatments (which again are currently different columns) and the ellipses would match the color.
I want something similar to this:
Example Plot
From this tutorial website: https://towardsdatascience.com/principal-component-analysis-pca-101-using-r-361f4c53a9ff
Is this something that is attainable? I just need a little direction. Any and all advice is welcome!
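A rough sketch of one way to approach this, under several assumptions: the matrix below is random data standing in for Plant_Num_Named, and its 15 columns are assumed to be ordered as 5 treatments in blocks of three replicates. Transposing the data makes each sample (rather than each protein) a point in the PCA, and a treatment label can then be built by hand. With only three points per treatment, statistical ellipses may be unreliable or may not draw at all, so the sketch outlines each group with a convex hull instead.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for Plant_Num_Named: 95 proteins (rows) x 15 samples (columns)
treatments <- rep(paste0("T", 1:5), each = 3)  # assumed column order: T1 x3, T2 x3, ...
Plant_Num_Named <- matrix(
  rnorm(95 * 15), nrow = 95,
  dimnames = list(paste0("P", 1:95), paste0(treatments, "_", 1:3))
)

# Transpose so that each sample is one observation (row) in the PCA
SamplePCA <- prcomp(t(Plant_Num_Named), center = TRUE, scale. = TRUE)
scores <- data.frame(SamplePCA$x[, 1:2], treatment = treatments)

# Convex hull of the replicates within each treatment
hulls <- do.call(rbind, lapply(split(scores, scores$treatment), function(d) {
  d[chull(d$PC1, d$PC2), ]
}))

# Points and hulls share the same colour per treatment
p <- ggplot(scores, aes(PC1, PC2, colour = treatment, fill = treatment)) +
  geom_polygon(data = hulls, alpha = 0.2) +
  geom_point(size = 2)
print(p)
```

If you prefer to stay with fviz_pca_ind(), its habillage argument accepts a grouping factor like treatments and addEllipses = TRUE draws a shape per group; that would also need the transposed PCA object.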
I have neuroscientific data where we count synapses/cells in the cochlea and quantify these per frequency. We do this for animals of different ages. What I thus ideally want is the frequencies (5,10,20,30,40) in the x-axis and the amount of synapses/cells plotted on the y-axis (usually a numerical value from 10 - 20). The graph then will contain 5 lines of the different ages (6 weeks, 17 weeks, 43 weeks, 69 weeks and 96 weeks).
I try this with ggplot and first just want to plot one age. When I use the following command:
ggplot(mydata, aes(x=Frequency, y=puncta6)) + geom_line()
I get a graph, but no line and the following error: 'geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?'
So I found I have to adjust the code to:
ggplot(mydata, aes(x=Frequency, y=puncta6, group = 1)) + geom_line()
This works, except that my first data point (5 kHz) is now plotted after my last data point (40 kHz). (This also happens without the 'group = 1' addition.) How do I solve this, or is there an easier way to plot this kind of data?
I couldn't add a file, so I added a photo of my code and graph, with the 5 kHz data point oddly located, and a photo of my data in Excel.
example data
example code and graph
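A possible explanation, assuming Frequency was imported as text or as a factor (the data itself isn't shown, so this is a guess): discrete values are ordered alphabetically, which puts "5" after "40", and geom_line() then also needs group = 1. Converting the column to numeric should fix both. A minimal sketch with made-up values:

```r
library(ggplot2)

# Hypothetical stand-in for mydata, with Frequency read in as text
mydata <- data.frame(
  Frequency = c("5", "10", "20", "30", "40"),
  puncta6   = c(12, 15, 18, 16, 13)
)

# As text, "5" sorts after "40", which would explain the misplaced point
print(sort(mydata$Frequency))

# Convert to numbers (as.character() first, in case the column is a factor)
mydata$Frequency <- as.numeric(as.character(mydata$Frequency))

# With a numeric x axis, geom_line() no longer needs group = 1
p <- ggplot(mydata, aes(x = Frequency, y = puncta6)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = mydata$Frequency)
print(p)
```

If you would rather keep evenly spaced categories on the x axis, setting the factor levels explicitly in the order 5, 10, 20, 30, 40 and keeping group = 1 is another option.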
I am analyzing US election search volume data from Google Trends. I typed the command below in RStudio.
The poliData dataframe contains the SearchVolume for all months for three Politicians.
ggplot(data = poliData, aes(x=Date, group=Politician, colour=Politician)) +
geom_density()
But the above command gives me a density line (blue) for only one politician. See the attached picture. Can you please help?
I guess you got three lines on top of each other because the Date values are the same for all three politicians. My understanding of your analysis is something like this:
ggplot(data = poliData,
aes(x=Date, colour=Politician,
weight = SearchVolume/sum(SearchVolume))) +
geom_density()
Adding weight should produce distinct lines for different politicians. If this is not what you wanted, please dput your data for others to work out a solution for you. Also, as I do not have the data, I have not tested the above code yet. Please let me know if it does not work.
I am trying to get my head around ggplot2, which creates beautiful graphs, as you probably all know :)
I have a dataset with some transactions of sold houses in it (courtesy of: http://support.spatialkey.com/spatialkey-sample-csv-data/ )
I would like to have a line chart that plots the cities on the x axis and 4 lines showing the number of transactions in my datafile per city for each of the 4 home types. Doesn't sound too hard, so I found two ways to do this.
1. using an intermediate table doing the counts and geom_line() to plot the results
2. using geom_freqpoly() on my raw dataframe
The basic charts look the same; however, chart nr. 2 seems to be missing plots for all the 0 values of the counts (e.g. for the cities right of SACRAMENTO, there is no data for Condo, Multi-Family or Unknown (which seems to be missing completely in this graph)).
I personally like the syntax of method number 2 more than that of number 1 (it's a personal thing probably).
So my question is: Am I doing something wrong or is there a method to have the 0 counts also plotted in method 2?
# line chart example
# setup the libraries
library(RCurl) # so we can download a dataset
library(ggplot2) # so we can make nice plots
library(gridExtra) # so we can put plots on a grid
# get the data in from the web straight into a dataframe (all data is from: http://support.spatialkey.com/spatialkey-sample-csv-data/)
data <- read.csv(text=getURL('http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'))
# create a data frame that counts the number of trx per city/type combination
df_city_type<-data.frame(table(data$city,data$type))
# correct the column names in the dataframe
names(df_city_type)<-c('city','type','qty')
# alternative 1: create a ggplot with a geom_line on the calculated values - to show the nr. trx per city (on the x axis) with a different colored line for each type
cline1<-ggplot(df_city_type,aes(x=city,y=qty,group=type,color=type)) + geom_line() + theme(axis.text.x=element_text(angle=90,hjust=0))
# alternative 2: create a ggplot with a geom_freqpoly on the source data - to show the nr. trx per city (on the x axis) with a different colored line for each type
c_line <- ggplot(na.omit(data),aes(city,group=type,color=type))
cline2<- c_line + geom_freqpoly() + theme(axis.text.x=element_text(angle=90,hjust=0))
# plot the two graphs in rows to compare, see that right of SACRAMENTO we miss two lines in plot 2, while they are in plot 1 (and we want them)
myplot<-grid.arrange(cline1,cline2)
As @joran pointed out, this gives a "similar" plot, when using "continuous" values:
ggplot(data, aes(x=as.numeric(factor(city)), group=type, colour=type)) +
geom_freqpoly(binwidth=1)
However, this is not exactly the same (compare the start of the graph), as the breaks are screwed up. Instead of binning from 1 to 39 with a binwidth of 1, it for some reason starts at 0.5 and goes until 39.5.