Plotting proportions of choices of each participant separately - r

I'd like to find an efficient way to plot, for each participant ($participant_num), the proportion of responses ($resp) in every block of 10 trials ($trial; 200 trials per participant).
When I did this for a subset of my sample (only 30 participants), I used very rudimentary code. I first created a separate data frame for each subject:
whichSubject<-6 # Which subject do you want to analyse?
sData<-filter(banditData,subject==whichSubject)
Then I computed the proportions for each block of 10 trials and put them in separate columns:
sData$newcolumn <- NULL
sData$newcolumn1_10<- table(sData[1:10,]$resp)/length(sData[1:10,]$resp)
sData$newcolumn11_20<- table(sData[11:20,]$resp)/length(sData[11:20,]$resp)
sData$newcolumn21_30<- table(sData[21:30,]$resp)/length(sData[21:30,]$resp)
and so on for all 200 trials, separately for each subject. Then I reshaped the data frame to long format and plotted it with the following script:
ggplot()+
geom_line(data=rewardDF,aes(x=Trial,y=pHappy,colour=Bandit), linetype="dashed", size=1.03)+
geom_point(data=longdf,aes(x=trial, y=resp_prop,colour=bandit,shape=bandit),size=3)+
geom_line(data=longdf,aes(x=trial, y=resp_prop,colour=bandit),size=1)+
scale_shape_manual(values=SymTypes)+
scale_colour_manual(values=cbPalette)+
labs(col='bandit',y='p(choice)',x='trials')+
scale_x_continuous(breaks = seq(0,200,by=10), limits=c(0,203), expand=(c(0,0)))+
scale_y_continuous(breaks = seq(0,1,by=0.1), limits=c(0,1.03), expand=(c(0.02,0)))+
theme_bw()
# ggsave() is a separate call, not a layer; by default it saves the last plot
ggsave(paste(c("data/S",whichSubject,"p(choice_absorangeblue).png"),collapse=""), scale=2,dpi = 300)
The output was something like this. Each dot represents the proportion of times a participant selected left (resp=0) or right (resp=1) within a block of 10 trials. The task asked participants to choose between two arms, with left corresponding to arm 1. For example, if the participant selected left 3 times out of 10, the dot for left would sit at 0.3 on the y axis and the dot for right at 0.7.
However, I now have over 200 participants, and this approach is far too time-consuming!
I was thinking of adding facet_grid(participant_num ~ .)+ to my ggplot code so that each participant gets a separate panel without any subsetting. However, I haven't found a way to plot the proportion of choices without first calculating them separately for each participant. Do you have any tips on how I could do this within ggplot?
Many thanks in advance for your help!!
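One possible approach, sketched under assumptions: assign every trial to a block of 10, take the mean of resp per participant and block (with resp coded 0/1, the mean is the proportion of "right" choices), and let facet_wrap() draw one panel per participant. The data below are simulated stand-ins; the column names participant_num, trial and resp follow the question, while block, propRight and the simulated values are hypothetical.

```r
# Simulated stand-in for the real data (hypothetical values):
# 3 participants x 200 trials, resp coded 0 = left, 1 = right.
set.seed(1)
banditData <- data.frame(
  participant_num = rep(1:3, each = 200),
  trial           = rep(1:200, times = 3),
  resp            = rbinom(600, size = 1, prob = 0.6)
)

# Assign each trial to a block of 10: trials 1-10 -> block 1, 11-20 -> block 2, ...
banditData$block <- ceiling(banditData$trial / 10)

# mean(resp) within a block is the proportion of "right" (resp == 1) choices
propRight <- aggregate(resp ~ participant_num + block, data = banditData, FUN = mean)

# Long format: one row per participant x block x bandit
longdf <- rbind(
  data.frame(propRight[c("participant_num", "block")],
             bandit = "right", resp_prop = propRight$resp),
  data.frame(propRight[c("participant_num", "block")],
             bandit = "left", resp_prop = 1 - propRight$resp)
)
longdf$trial <- longdf$block * 10  # place each block at its last trial

# Plotting step (needs ggplot2): one panel per participant
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(longdf, aes(x = trial, y = resp_prop, colour = bandit, shape = bandit)) +
    geom_line() +
    geom_point() +
    facet_wrap(~ participant_num) +
    labs(x = "trials", y = "p(choice)") +
    theme_bw()
  print(p)
}
```

With 200 participants a single faceted figure will be crowded, so it may be worth plotting subsets of participants per figure.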

Related

plotting multiple lines in ggplot R

I have neuroscientific data where we count synapses/cells in the cochlea and quantify these per frequency. We do this for animals of different ages. Ideally, I want the frequencies (5, 10, 20, 30, 40) on the x-axis and the number of synapses/cells on the y-axis (usually a numerical value from 10 to 20). The graph will then contain 5 lines, one per age (6 weeks, 17 weeks, 43 weeks, 69 weeks and 96 weeks).
I try this with ggplot and first just want to plot one age. When I use the following command:
ggplot(mydata, aes(x=Frequency, y=puncta6)) + geom_line()
I get a graph, but no line and the following error: 'geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?'
So I found I have to adjust the code to:
ggplot(mydata, aes(x=Frequency, y=puncta6, group = 1)) + geom_line()
This works, except that my first data point (5 kHz) is now plotted after my last data point (40 kHz). (This also happens without the 'group = 1' addition.) How do I solve this, or is there an easier way to plot this kind of data?
I couldn't add a file, so I added a photo of my code and graph with the 5 kHz data point oddly located, and a photo of my data in Excel.
example data
example code and graph
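A likely cause, assuming Frequency was read in as text rather than as numbers: text values are ordered alphabetically, so "5" lands after "40", and geom_line() also needs the group hint because a discrete x puts every row in its own group. The sketch below uses made-up values; puncta17 and the long data frame are hypothetical names added to show how several ages could share one plot.

```r
# Made-up example data; Frequency is stored as text (assumed to mirror the import)
mydata <- data.frame(
  Frequency = c("5", "10", "20", "30", "40"),
  puncta6   = c(12.1, 15.3, 17.8, 16.2, 13.4),
  puncta17  = c(11.0, 14.2, 16.9, 15.1, 12.6)   # hypothetical second age
)

sort(mydata$Frequency)   # text sort: "10" "20" "30" "40" "5"

# Converting to numeric restores the intended order on the x-axis
mydata$Frequency <- as.numeric(mydata$Frequency)

# Long format: one row per frequency x age, so each age becomes one line
long <- data.frame(
  Frequency = rep(mydata$Frequency, times = 2),
  age       = rep(c("6 weeks", "17 weeks"), each = nrow(mydata)),
  puncta    = c(mydata$puncta6, mydata$puncta17)
)

# Plotting step (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = Frequency, y = puncta, colour = age)) +
    geom_line() +
    geom_point()
  print(p)
}
```

With a numeric x and colour mapped to age, no explicit group = 1 should be needed.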

Label stacked bar chart with variable other than plotted Y

I'm working on some fish electroshocking data and looking at fish species abundance per transect in a river. Essentially, I have an abundance of different species per transect that I'm plotting in a stacked bar chart. What I would like to do is label the top of each bar, or the area underneath the x-axis tick mark, with N = Total Preds for that particular transect. The abundance being plotted is the number of that particular species divided by the total number of fish (preds) that were caught at that transect. I am having trouble figuring out a way to do this, since I don't want to label the plot with the actual y-value that is being plotted.
Excuse the crude code. I am newer to R and not super familiar with generating random datasets. The following is what I came up with. Obviously in my real data the abundance % per transect always adds up to 100 %, but the idea is to be able to label the graph with TotalPreds for a transect.
#random data
Transect<-c(1:20)
Habitat<-c("Sand","Gravel")
Species<-c("Smallmouth","Darter","Rock Bass","Chub")
Abund<-runif(20,0.0,100.0)
TotalPreds<-sample(1:139,20,replace=TRUE)
data<-data.frame(Transect,Habitat,Species,Abund,TotalPreds)
#Generate plot
AbundChart<-ggplot(data=data,aes(x=Transect,y=Abund,fill=Species))
AbundChart+labs(title="Shocking Fish Abundance")+theme_bw()+
scale_y_continuous("Relative Abundance (%)",expand=c(0.02,0),
breaks=seq(0,100,by=20),labels=seq(0,100,by=20))+
scale_x_discrete("Transect",expand=c(0.03,0))+
theme(plot.title=element_text(face='bold',vjust=2,size=25))+
theme(legend.title=element_text(vjust=5,size=15))+
geom_bar(stat="identity",colour="black")+
facet_grid(~Habitat,labeller=label_both,scales="free_x")
I get this plot that I would like to label with TotalPreds as described previously.
Again my plot would have bars that reached 100% for abundance, and in my real data transects 1-10 are gravel and 11-20 are sand. Excuse my poor sample dataset.
*Update
My actual data looks like this:
Variable in this case is the fish species and value is the abundance of that species at that particular electroshocking transect. Total_Preds is repeated when the data moves to a new species, because total preds is indicative of the total preds caught at that particular transect (i.e. each transect only has 1 total preds value). Maybe the melt function wasn't the right way to analyze this, but I have like 17 fish species that were caught at different rates across these 20 transects. I guess habitat type is singular to a transect as well, with 1-10 being gravel and 11-20 being sand, and that is repeated in my dataset across fish species as well.
Edited in response to the update: you should be able to create a new data frame containing the TotalPreds data (not repeated) and use that in geom_text. I can't test this without data, but maybe:
# select non-repeated half of melted data for use in geom_text
textlabels <- data[c(1:19),]
#Generate plot
AbundChart<-ggplot(data=data,aes(x=Transect,y=Abund,fill=Species))
AbundChart+labs(title="Shocking Fish Abundance")+theme_bw()+
scale_y_continuous("Relative Abundance (%)",expand=c(0.02,0),breaks=seq(0,100,by=20),labels=seq(0,100,by=20))+
scale_x_discrete("Transect",expand=c(0.03,0))+
theme(plot.title=element_text(face='bold',vjust=2,size=25))+
theme(legend.title=element_text(vjust=5,size=15))+
geom_bar(stat="identity",colour="black")+
facet_grid(~Habitat,labeller=label_both,scales="free_x") +
geom_text(data = textlabels, aes(x = Transect_ID, y = value, vjust = -0.5,label = TotalPreds))
You might have to play around with different values for vjust to get the labels where you want them.
See the geom_text help page for more info.
Hope that edit works with your data.
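A small self-contained sketch of the same idea, using made-up data: keep one row per transect in a separate data frame and pass it to geom_text(), so each N is drawn once above its bar. The data frame fish, its values and the label column are hypothetical; the column names Transect, Habitat, Species, Abund and TotalPreds follow the sample code in the question.

```r
# Made-up long data: 4 transects x 2 species, percentages sum to 100 per transect
fish <- data.frame(
  Transect   = factor(rep(1:4, each = 2)),
  Habitat    = rep(c("Gravel", "Sand"), each = 4),
  Species    = rep(c("Darter", "Chub"), times = 4),
  Abund      = c(60, 40, 25, 75, 50, 50, 10, 90),
  TotalPreds = rep(c(12, 57, 33, 101), each = 2)
)

# One row per transect, so each label is drawn only once
textlabels <- unique(fish[c("Transect", "Habitat", "TotalPreds")])
textlabels$label <- paste("N =", textlabels$TotalPreds)

# Plotting step (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(fish, aes(x = Transect, y = Abund)) +
    geom_col(aes(fill = Species), colour = "black") +
    # y = 100 puts the label at the top of a full bar; vjust nudges it above
    geom_text(data = textlabels, aes(y = 100, label = label), vjust = -0.5) +
    facet_grid(~ Habitat, labeller = label_both, scales = "free_x") +
    scale_y_continuous("Relative Abundance (%)", limits = c(0, 110),
                       breaks = seq(0, 100, by = 20)) +
    theme_bw()
  print(p)
}
```

Mapping fill only inside geom_col() keeps the text layer from needing a Species column.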

Why do geom_line() and geom_freqpoly() give back different graphs?

I am trying to get my head around ggplot2, which creates beautiful graphs, as you probably all know :)
I have a dataset with some transactions of sold houses in it (courtesy of: http://support.spatialkey.com/spatialkey-sample-csv-data/ )
I would like to have a line chart that plots the cities on the x axis and 4 lines showing the number of transactions in my datafile per city for each of the 4 home types. Doesn't sound too hard, so I found two ways to do this.
1. using an intermediate table doing the counts and geom_line() to plot the results
2. using geom_freqpoly() on my raw dataframe
The basic charts look the same; however, chart nr. 2 seems to be missing points for all the 0 values of the counts (e.g. for the cities right of SACRAMENTO, there is no data for Condo, Multi-Family or Unknown, which seems to be missing completely in this graph).
I personally like the syntax of method number 2 more than that of number 1 (it's a personal thing probably).
So my question is: Am I doing something wrong or is there a method to have the 0 counts also plotted in method 2?
# line chart example
# setup the libraries
library(RCurl) # so we can download a dataset
library(ggplot2) # so we can make nice plots
library(gridExtra) # so we can put plots on a grid
# get the data in from the web straight into a dataframe (all data is from: http://support.spatialkey.com/spatialkey-sample-csv-data/)
data <- read.csv(text=getURL('http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'))
# create a data frame that counts the number of trx per city/type combination
df_city_type<-data.frame(table(data$city,data$type))
# correct the column names in the dataframe
names(df_city_type)<-c('city','type','qty')
# alternative 1: create a ggplot with a geom_line on the calculated values - to show the nr. trx per city (on the x axis) with a differently colored line for each type
cline1<-ggplot(df_city_type,aes(x=city,y=qty,group=type,color=type)) + geom_line() + theme(axis.text.x=element_text(angle=90,hjust=0))
# alternative 2: create a ggplot with a geom_freqpoly on the source data - to show the nr. trx per city (on the x axis) with a differently colored line for each type
c_line <- ggplot(na.omit(data),aes(city,group=type,color=type))
cline2<- c_line + geom_freqpoly() + theme(axis.text.x=element_text(angle=90,hjust=0))
# plot the two graphs in rows to compare, see that right of SACRAMENTO we miss two lines in plot 2, while they are in plot 1 (and we want them)
myplot<-grid.arrange(cline1,cline2)
As @joran pointed out, this gives a "similar" plot, when using "continuous" values:
ggplot(data, aes(x=as.numeric(factor(city)), group=type, colour=type)) +
geom_freqpoly(binwidth=1)
However, this is not exactly the same (compare the start of the graph), as the breaks are off. Instead of binning from 1 to 39 with a binwidth of 1, it for some reason starts at 0.5 and goes to 39.5.
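To illustrate why the first method shows the zero counts, here is a minimal sketch with made-up data: table() cross-tabulates every city/type combination, including those that never occur, so the resulting data frame carries explicit rows with a count of 0 that geom_line() can draw. The sales data frame and its values are hypothetical.

```r
# Made-up transactions: city B has no Condo, city C has no Residential
sales <- data.frame(
  city = c("A", "A", "B", "B", "C"),
  type = c("Condo", "Residential", "Residential", "Residential", "Condo")
)

# table() keeps every city x type cell, so absent combinations get qty = 0
counts <- as.data.frame(table(city = sales$city, type = sales$type),
                        responseName = "qty")

# Plotting step (needs ggplot2): lines drop to 0 where a type is absent
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = city, y = qty, group = type, colour = type)) +
    geom_line()
  print(p)
}
```

Counting directly from the raw rows has nothing to count for an absent combination, which is consistent with the gaps seen in the second chart.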

Plotting multiple lines with ggplot2 in R

I have data from some psychophysical experiments that I'd like to plot. My dataframe contains multiple observations from multiple participants in three paradigms of an experiment.
In other words, each participant took part in three psychophysical experiments and I'd like to plot the data on a single graph.
At present, my plot looks like this:
The data on the right of the plot are from one of the experiments (1), whilst the mass of data on the left are from the two other experiments (2 & 3). Essentially, I'm trying to show graphically that experiment 1 yields very different results to experiments 2 & 3.
This plot is of two parameters, 'probability_seen' and 'visual_acuity'. My dataframe also contains two other columns: subject_initials and experiment_type. As you can see, I'm separating out the subjects by colour. I'd also like to join the lines up for each of the experiments (the above plot actually contains three curves for each subject), but if I add geom_line() to my plot, I get this:
Obviously, I haven't asked ggplot2 to respect the state of 'experiment_type'. How do I do this?
n.b. I currently call the plot with the following code:
qplot(visual_acuity, probability_seen, data = dframe1, colour = subject_initials,
xlab = "Visual acuity", ylab = "Probability seen") + geom_line()
As @Baptiste has stated, the solution is to add group = experiment_type to the qplot call.
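A sketch of the grouping idea with simulated data, written with ggplot() rather than qplot(). It assumes that several subjects share the same experiment types; in that case grouping on the combination of subject and experiment gives one line per subject per experiment. The column names follow the question, while line_id and the simulated values are hypothetical.

```r
# Simulated data: 2 subjects x 3 experiments x 4 acuity levels (made-up values)
dframe1 <- expand.grid(
  visual_acuity    = c(0.1, 0.2, 0.4, 0.8),
  subject_initials = c("AB", "CD"),
  experiment_type  = c("exp1", "exp2", "exp3")
)
set.seed(1)
dframe1$probability_seen <- runif(nrow(dframe1))

# One group per subject x experiment, so lines never jump between experiments
dframe1$line_id <- interaction(dframe1$subject_initials, dframe1$experiment_type)

# Plotting step (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dframe1, aes(x = visual_acuity, y = probability_seen,
                           colour = subject_initials, group = line_id)) +
    geom_point() +
    geom_line() +
    labs(x = "Visual acuity", y = "Probability seen")
  print(p)
}
```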

3 by 2 plot in ggplot2

I have a dataset with the following variables:
condition: 1,2,3
type: friend, foe
Proportion_Choosing_Message: represents the number of participants choosing a particular response
Optimal: the optimal probability of choosing each case
I would like to create a 3 by 2 plot, where the two columns represent type and the rows represent condition.
So I would like to have separate plots for:
type:friend & condition 1, type:friend&condition2, type:friend&condition3
type:foe & condition1, type:foe&condition2, type:foe&condition3
The values to be plotted are Proportion_Choosing_Message and Optimal
Here's the dataset: http://dl.dropbox.com/u/22681355/ggplot.csv
Have you looked at the documentation and examples on Hadley's site? Have you read through the first few chapters of his book? I ask because this is a very basic question that is easily answered with even a minimal amount of effort with the documentation.
Here's some code for your example, but in the future, I suggest you do more research before turning to SO for help.
dat <- read.csv("ggplot.csv")
ggplot(dat, aes(x = Optimal, y = Proportion_Choosing_Message)) +
facet_grid(condition~type) +
geom_point()
