3 by 2 plot in ggplot2 - r

I have a dataset with the following variables:
condition: 1,2,3
type: friend, foe
Proportion_Choosing_Message: represents the number of participants choosing a particular response
Optimal: the optimal probability of choosing each case
I would like to create a 3 by 2 plot, where the two columns represent type and the rows represent condition.
So I would like to have separate plots for:
type:friend & condition 1, type:friend&condition2, type:friend&condition3
type:foe & condition1, type:foe&condition2, type:foe&condition3
The values to be plotted are Proportion_Choosing_Message and Optimal
Here's the dataset: http://dl.dropbox.com/u/22681355/ggplot.csv

Have you looked at the documentation and examples on Hadley's site? Have you read through the first few chapters of his book? I ask because this is a very basic question that is easily answered with even a minimal amount of effort with the documentation.
Here's some code for your example, but in the future, I suggest you do more research before turning to SO for help.
library(ggplot2)
dat <- read.csv("ggplot.csv")
ggplot(dat, aes(x = Optimal, y = Proportion_Choosing_Message)) +
  facet_grid(condition ~ type) +
  geom_point()
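To check the layout without the CSV at hand, here is a minimal sketch on made-up data. The data frame below is a hypothetical stand-in (the real file's contents are not shown here); only the column names are taken from the question. In facet_grid(condition ~ type), the left-hand side gives the rows and the right-hand side gives the columns.

```r
library(ggplot2)

# Hypothetical stand-in for ggplot.csv; values are random, only the column names match
set.seed(42)
dat <- expand.grid(condition = 1:3, type = c("friend", "foe"), case = 1:4)
dat$Optimal <- runif(nrow(dat))
dat$Proportion_Choosing_Message <-
  pmin(pmax(dat$Optimal + rnorm(nrow(dat), sd = 0.1), 0), 1)

# Rows = condition (3 levels), columns = type (2 levels), so 3 x 2 panels
p <- ggplot(dat, aes(x = Optimal, y = Proportion_Choosing_Message)) +
  geom_point() +
  facet_grid(condition ~ type)
print(p)
```

Swapping the formula to type ~ condition would give a 2 by 3 grid instead.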

Related

Plotting proportions of choices of each participant separately

I'd like to find an efficient way to plot, for each participant ($participant_num), the proportion of responses ($resp) every 10 trials ($trial, out of 200 trials per participant).
When I did it for a subset of my sample (only 30 participants) I used very rudimentary code, for which I had first created a separate dataframe for each subject:
whichSubject<-6 # Which subject do you want to analyse?
sData<-filter(banditData,subject==whichSubject)
and then I tried to get proportions for each 10 trials and put them in a separate column
sData$newcolumn <- NULL
sData$newcolumn1_10<- table(sData[1:10,]$resp)/length(sData[1:10,]$resp)
sData$newcolumn11_20<- table(sData[11:20,]$resp)/length(sData[11:20,]$resp)
sData$newcolumn21_30<- table(sData[21:30,]$resp)/length(sData[21:30,]$resp)
and so on for all 200 trials, separately for each subject. Then I reshaped the dataframe to long format and plotted it with the following script:
ggplot()+
geom_line(data=rewardDF,aes(x=Trial,y=pHappy,colour=Bandit), linetype="dashed", size=1.03)+
geom_point(data=longdf,aes(x=trial, y=resp_prop,colour=bandit,shape=bandit),size=3)+
geom_line(data=longdf,aes(x=trial, y=resp_prop,colour=bandit),size=1)+
scale_shape_manual(values=SymTypes)+
scale_colour_manual(values=cbPalette)+
labs(col='bandit',y='p(choice)',x='trials')+
scale_x_continuous(breaks = seq(0,200,by=10), limits=c(0,203), expand=(c(0,0)))+
scale_y_continuous(breaks = seq(0,1,by=0.1), limits=c(0,1.03), expand=(c(0.02,0)))+
theme_bw()
ggsave(paste(c("data/S",whichSubject,"p(choice_absorangeblue).png"),collapse=""), scale=2,dpi = 300)
The output was something like this. Each dot shows how often a participant selected left (resp=0) versus right (resp=1) within a block of 10 trials. For example, if the participant selected left (arm 1 of the two-armed task) 3 times out of 10, the dot for left sits at 0.3 on the y-axis and, conversely, the dot for right sits at 0.7.
However, now I have over 200 participants and it is definitely too time consuming using this approach!
I was thinking of adding facet_grid(participant_num ~ .) to my ggplot code in order to plot each participant separately without subsetting first. However, I haven't found a way to plot the proportion of choices without calculating them separately. Do you have any tips on how I could do this within ggplot?
Many thanks in advance for your help!!
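One possible approach, sketched under assumptions: compute the per-block proportions for all participants in a single grouped summary, then facet by participant. The banditData below is simulated (3 participants, random responses) purely so the snippet runs on its own; the column names participant_num, trial and resp are taken from the question, while block_end, bandit and resp_prop are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Simulated stand-in for banditData (hypothetical): 3 participants x 200 trials
set.seed(1)
banditData <- expand.grid(trial = 1:200, participant_num = 1:3)
banditData$resp <- rbinom(nrow(banditData), 1, 0.5)

# Assign each trial to a block of 10 (labelled by its last trial: 10, 20, ...),
# then take the share of left (resp == 0) and right (resp == 1) choices per block
propDF <- banditData %>%
  mutate(block_end = ceiling(trial / 10) * 10) %>%
  group_by(participant_num, block_end) %>%
  summarise(left = mean(resp == 0), right = mean(resp == 1), .groups = "drop") %>%
  pivot_longer(c(left, right), names_to = "bandit", values_to = "resp_prop")

# One panel per participant instead of one data frame per participant
p <- ggplot(propDF, aes(x = block_end, y = resp_prop,
                        colour = bandit, shape = bandit)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ participant_num) +
  labs(x = "trials", y = "p(choice)")
print(p)
```

With more than 200 participants a single faceted figure will be crowded, so it may be more practical to loop over chunks of participants and save one faceted figure per chunk.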

How do you create a vertical bar plot that contains a conditional if statement in R?

I have a data set that is measuring emotions of respondents as they are shown different stimuli. Here is an example:
Sample Attribute Score Rank
A Delighted 180 High
A Happy 200 High
A Tired 130 Medium
B Delighted 160 Medium
B Happy 128 Low
B Tired 115 Low
I am fairly new to R, and I'm having issues actually making a bar chart that only shows sample A. This is what I tried doing:
ggplot(data =
filter(DATA, Category == "A"),
mapping = aes(x = Score, y = Attribute)) +
geom_
But R gives me this error:
Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class mts/ts.
Ideally, I am trying to get a bar chart that has the attributes listed on the vertical axis, the scores on the x-axis, and only shows sample A, with the bars color coded by Rank. Any help would be great!
Welcome to SO. Starting with the ggplot2 reference site and working through the examples there will help you get oriented to working with your data and plotting it in ggplot2. That said, here are some suggestions to get you started.
First, as indicated by the error, you need a dataframe, rather than whatever format your data is currently stored in. Based on what you gave me, this is how I got a data.frame, but you might use something else.
txt <- "Sample Attribute Score Rank
A Delighted 180 High
A Happy 200 High
A Tired 130 Medium
B Delighted 160 Medium
B Happy 128 Low
B Tired 115 Low"
df <- read.table(text = txt, header = TRUE)
## this might be more relevant for your current situation
## (uncomment if your data lives in an object called DATA)
# df <- as.data.frame(DATA)
Try not to use "data" as an object name, as it is already a named function.
Second, if you're learning R, it's probably wise to separate steps in your analysis to ensure you understand what each part is doing. So next do the subset to get just Sample A.
library(tidyverse)
df.sub <- df %>%
filter(Sample=="A")
Now it'll be easier to do the plotting. Your plotting code looked like it was on the right track, but you didn't complete the line.
library(ggplot2)
ggplot(df.sub, aes(x=Score, y=Attribute, fill=Rank)) +
geom_bar(stat="identity")
You'll need to specify stat="identity" to tell ggplot that you want it to recognize the values you provide as data, rather than generating counts (as for a histogram).
This should get you what you are looking for.
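As a side note, geom_col() is the ggplot2 shorthand for geom_bar(stat="identity"). A minimal self-contained sketch, rebuilding the example table by hand and using base subset() so that no extra packages are needed:

```r
library(ggplot2)

# Same example values as in the question, typed in directly
df <- data.frame(
  Sample    = c("A", "A", "A", "B", "B", "B"),
  Attribute = c("Delighted", "Happy", "Tired", "Delighted", "Happy", "Tired"),
  Score     = c(180, 200, 130, 160, 128, 115),
  Rank      = c("High", "High", "Medium", "Medium", "Low", "Low")
)

# Keep only sample A
df.sub <- subset(df, Sample == "A")

# geom_col() uses the Score values as bar lengths; bars are coloured by Rank
p <- ggplot(df.sub, aes(x = Score, y = Attribute, fill = Rank)) +
  geom_col()
print(p)
```

Horizontal bars from a continuous x and a discrete y rely on the orientation detection in reasonably recent ggplot2 releases; on older versions, mapping x = Attribute, y = Score and adding coord_flip() should give the same picture.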

Wrong Number of Groups in Choropleth Using sf, ggplot, and cut_interval()

I am trying to control the number of categories in a choropleth using sf, ggplot2 and cut_interval() within ggplot2. Sometimes it works, but with some datasets the number of categories is off by one. Below is my code and the input dataset (7Kb) is here:
ggplot-test-04.geojson
library(sf)
library(ggplot2)
lga.sf <- st_read("ggplot-test-04.geojson")
ggplot() +
geom_sf(data = lga.sf,
aes(fill = cut_interval(value,5))) +
scale_fill_brewer(palette = "RdYlBu",
name = "Legend" )
I am trying to get 5 groups but the result has 4:
In some datasets this code works fine. Sometimes I can work around the issue by choosing say n=6 in cut_interval() to get 5 groups. However, I find that frequently I cannot control the number of groups in the choropleth, which is critical for me. So far I cannot tell if my data has a problem, my code, or there is a software bug.
In this case, cut_interval() is performing correctly, but there are zero observations in one of the cuts, so ggplot() ignores it.
(It happens to be the middle interval - you can actually see the gap in coverage between the second and third legend items.)
You can verify this by looking at cut_interval() with table():
table(cut_interval(lga.sf$value, 5))
[1.44,1.61] (1.61,1.78] (1.78,1.95] (1.95,2.11] (2.11,2.28]
          3           7           0           1           1
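If the empty class should still appear in the legend, one option is to tell the fill scale not to drop unused factor levels. The sketch below is an illustration under assumptions: the values are made up so that the middle bin is empty, and plain tiles stand in for the sf polygons because the geojson file is not available here. In recent ggplot2 versions the layer may also need show.legend = TRUE before the key for an unused level is drawn.

```r
library(ggplot2)

# Hypothetical values chosen so that the middle of five equal-width bins is empty
value <- c(1.44, 1.50, 1.55, 1.62, 1.70, 1.75, 1.96, 2.28)
bins <- cut_interval(value, 5)
print(table(bins))  # five levels, one of them with a count of zero

# Tiles stand in for the map polygons; the fill scale is the relevant part
toy <- data.frame(id = seq_along(value), bins = bins)
p <- ggplot(toy, aes(x = id, y = 1, fill = bins)) +
  geom_tile(show.legend = TRUE) +
  scale_fill_brewer(palette = "RdYlBu", name = "Legend", drop = FALSE)
print(p)
```

The same drop = FALSE argument can be added to the scale_fill_brewer() call in the original geom_sf() plot.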

boxplot single scalar variable "by" multiple true/false variables in r data

I've been limping my way around r data for a few months now. Sorry if any of this seems basic. I've been finding all kinds of close problems and solutions, but somehow I can't seem to adapt them to my situation. Starting to wonder if it's something I should be trying to do at all, but I suppose it can't hurt to ask.
I have a data frame that has a single scalar variable and multiple T/F (yes/no; 1/0; 1/2) variables, like this:
scal var1 var2 var3
25 0 1 0
21 0 1 1
14 1 1 0
30 1 0 1
I know I can make a boxplot which separates the scalar variable column into categories using "by" for a single variable, like so:
boxplot(df$scal~df$var1)
I also know that I can make box plots for multiple scalar variables at once. I'd like to combine the two somehow to make a boxplot which can plot the dependent variable of each "true" subset and "false" subset of each variable next to one another. In my world, one solution should look something like "boxplot(df$scal~df$var1, df$scal~df$var2, df$scal~df$var3)", but R doesn't agree with me. Something about not being able to force a datatype.
I could also write a rough loop to go through each of the variables and generate all the plots separately, but I'd like to compare them side-by-side.
I've also thought of rearranging the dataset such that the "true" and "false" sets are in different columns (using subset(df$var1, df$var1==1) etc.), then making multiple boxplots as described before (though this is quite tedious).
var1t var1f var2t var2f var3t var3f
14    25    25    30    21    25
30    21    21          30    14
            14
boxplot(df2$var1t, df2$var1f, df2$var2t, df2$var2f, df2$var3t, df2$var3f)
However, the different lengths (numbers of rows) of the columns are giving me fits when creating the new dataset. I know that I can make a dataset in another program (saved as .csv, .xls, etc.) then import it. The null values would remain intact, but I'd really rather not do this manually. As one might imagine, this becomes quite tedious and prone to errors on larger scales.
Help with either approach would be most welcome.
Learning how to manipulate data in R can be hard when you're starting out. I agree with @jentjr that learning ggplot2 would be helpful, and Hadley's book provides great tips for working with data in addition to covering ggplot2.
To start off, I would suggest using the reshape2 Package to melt your data:
(I created a dummy set so it would be easier for other people to follow along)
library(reshape2)
nObs = 10
df = data.frame(
scal = rnorm(nObs),
var1 = rbinom(nObs, 1, 0.5),
var2 = rbinom(nObs, 1, 0.5),
var3 = rbinom(nObs, 1, 0.5))
Then `melt' the data into long form from wide form.
df2 = melt(df, id.vars = c('scal'),
variable.name ='myVars', value.name = "zeroOne")
Now you may create your desired boxplot using base R:
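The base R call itself is not shown here, so the following is only a sketch of one way it could look, assuming the melted df2 from the steps above. The data are regenerated (with a seed and a larger nObs, both choices made here) so the snippet runs on its own. The formula scal ~ zeroOne + myVars asks boxplot() for one box per combination of the two grouping columns.

```r
library(reshape2)

# Dummy data in the same shape as above (seed added for repeatability)
set.seed(1)
nObs <- 40
df <- data.frame(
  scal = rnorm(nObs),
  var1 = rbinom(nObs, 1, 0.5),
  var2 = rbinom(nObs, 1, 0.5),
  var3 = rbinom(nObs, 1, 0.5))

# Wide to long: one row per (observation, variable) pair
df2 <- melt(df, id.vars = "scal",
            variable.name = "myVars", value.name = "zeroOne")

# Six boxes side by side: 0/1 for each of var1, var2, var3
bp <- boxplot(scal ~ zeroOne + myVars, data = df2,
              xlab = "zeroOne.myVars", ylab = "scal")
```

The box labels take the form 0.var1, 1.var1, and so on, so the true and false subsets of each variable sit next to each other.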
However, investing the time to learn ggplot2 would allow you to create figures such as this one:
Using code such as this:
library(ggplot2)
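# zeroOne is numeric after melt(); convert it to a factor so each 0/1 group gets its own box
ggplot(data = df2, aes(x = factor(zeroOne), y = scal)) +
  geom_boxplot(aes(fill = myVars))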
Note that ggplot2 can make much fancier plots than this (and do so more easily than base R!), and I would encourage you to browse the ggplot2 webpage to see more examples. You may also wish to experiment with swapping zeroOne and myVars, because it changes the plot groupings.
Plotluck is a library based on ggplot2 that aims at automating the choice of plot type based on characteristics of 1-3 variables. Here is an example with the resulting plot:
library(plotluck)
nObs = 100
df = data.frame(
scal = rnorm(nObs),
var1 = rbinom(nObs, 1, 0.5),
var2 = rbinom(nObs, 1, 0.5),
var3 = rbinom(nObs, 1, 0.5))
plotluck.multi(df, y=scal, opts=plotluck.options(use.geom.violin=F))
This command means: plot column scal (on the y-axis) against each other column in df (on the x-axis), including itself, which results in a density plot or histogram. We specify use.geom.violin=F to enforce a box plot, since the default is a violin plot, which often conveys the shape of the distribution better. If the number of rows is very low, individual points are plotted instead.

How can I create a (100%) stacked histogram in R?

My dataset:
I have data in the following format (here, imported from a CSV file). You can find an example dataset as CSV here.
PAIR PREFERENCE
1 5
1 3
1 2
2 4
2 1
2 3
… and so on. In total, there are 19 pairs, and the PREFERENCE ranges from 1 to 5, as discrete values.
What I'm trying to achieve:
What I need is a stacked histogram, e.g. a 100% high column, for each pair, indicating the distribution of the PREFERENCE values.
Something similar to the "100% stacked columns" in Excel, or (although not quite the same) a so-called "mosaic plot":
What I tried:
I figured it'd be easiest using ggplot2, but I don't even know where to start. I know I can create a simple bar chart with something like:
ggplot(d, aes(x=factor(PAIR), y=factor(PREFERENCE))) + geom_bar(position="fill")
… that however doesn't get me very far. So I tried this, and it gets me somewhat closer to what I'm trying to achieve, but it still uses the count of PREFERENCE, I suppose? Note the ylab being "count" here, and the values ranging to 19.
qplot(factor(PAIR), data=d, geom="bar", fill=factor(PREFERENCE_FIXED))
Results in:
So, what do I have to do to get the stacked bars to represent a histogram?
Or do they actually do this already?
If so, what do I have to change to get the labels right (e.g. have percentages instead of the "count")?
By the way, this is not really related to this question, and only marginally related to this (i.e. probably same idea, but not continuous values, instead grouped into bars).
Maybe you want something like this:
ggplot() +
geom_bar(data = dat,
aes(x = factor(PAIR),fill = factor(PREFERENCE)),
position = "fill")
where I've read your data into dat. This outputs something like this:
The y label is still "count", but you can change that manually by adding:
+ scale_x_discrete("Pairs") + scale_y_continuous("Votes")
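To have the y-axis read as percentages rather than fractions, a label formatter from the scales package can be passed to the y scale. The sketch below uses randomly generated data as a hypothetical stand-in for the CSV (19 pairs, preferences 1 to 5); the axis titles are placeholders.

```r
library(ggplot2)

# Hypothetical data in the same shape as the question: 19 pairs, preferences 1 to 5
set.seed(1)
dat <- data.frame(
  PAIR = rep(1:19, each = 10),
  PREFERENCE = sample(1:5, 190, replace = TRUE))

# position = "fill" rescales each bar to a total height of 1;
# label_percent() then prints the axis breaks as 0%, 25%, ...
p <- ggplot(dat, aes(x = factor(PAIR), fill = factor(PREFERENCE))) +
  geom_bar(position = "fill") +
  scale_y_continuous("Share of votes", labels = scales::label_percent()) +
  labs(x = "Pairs", fill = "Preference")
print(p)
```

Each bar then shows the distribution of PREFERENCE values within one pair, which is what a 100% stacked column chart in Excel displays.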
