I am new to R/ggplot2 and am trying to create a line graph of counts (or percentages, either is fine) of responses to 6 stimuli. The stimuli should go along the x-axis and the count on the y-axis. One line will represent the number of participants who responded with a preposition, and the other line will represent the number of participants who responded with a number.
I believe that ggplot with geom_line() requires an x and y (where y is the count or percentage).
Should I create a new data frame of counts so that I can use ggplot? If so, a subquestion: how do I count responses per stimulus, i.e. count the values in one column grouped by another column (how many preposition responses for stimulus 1, how many number responses for stimulus 1, how many preposition responses for stimulus 2, etc.)? Maybe with some kind of if statement?
Or
Is there a way to automatically produce these counts in ggplot?
Of course, it's entirely possible that I'm going about this the wrong way entirely.
I've tried to search this, but cannot find anything. Thank you so much.
As I said in my comment, I ended up creating a frequency table and using ggplot to plot the resulting data frame. Here's the code below!
library(ggplot2)
# creates data frame
resp <- c("number", "number", "preposition", "number")
sound <- c(1, 1, 2, 2)
df <- data.frame(resp, sound)
# creates frequency table (proportions within each sound)
freq.table <- prop.table(xtabs(~resp+sound, data=df), 2)
freq.table.df <- as.data.frame(freq.table)
# plots lines based on frequency
ggplot(freq.table.df, aes(sound, Freq, group=resp, color=resp)) +
  geom_line()
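On the other half of the question (whether ggplot can produce the counts itself): a minimal sketch, assuming the same toy df as in the code above, is to give geom_line() stat = "count" so that no frequency table is needed. Wrapping sound in factor() and adding geom_point() are my own additions for illustration, not part of the original code.

```r
library(ggplot2)

# same toy data as in the answer above
resp <- c("number", "number", "preposition", "number")
sound <- c(1, 1, 2, 2)
df <- data.frame(resp, sound)

# stat = "count" tallies the rows for each sound/resp combination,
# so no y aesthetic has to be mapped
p <- ggplot(df, aes(factor(sound), group = resp, color = resp)) +
  geom_line(stat = "count") +
  geom_point(stat = "count") +
  labs(x = "sound", y = "count")
p

# the counts that ggplot computed for the line layer
counts <- layer_data(p, 1)
```

One caveat: stat = "count" only produces rows for combinations that occur in the data, so a response that never appears for a stimulus gets no point (rather than a zero) there. The frequency-table approach above does give explicit zeros.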
What I want to do
My dataset consists of several cases (id) with different outcomes (outcome) for a given number of repeated measures (cycle). Each cycle should be counted as 1 (val), i.e. be drawn with equal length.
The plot I want to end up with is a stacked bar chart in which each cycle of each case has the same length. The sequence of cycles must be continuous, and the sequence of the outcomes must follow the corresponding cycles.
My Problem
The sample code below produces a bar chart that sums up the cycles (even though cycle is a factor). However, using the val column instead of cycle changes the sequence of the outcomes, which must not happen.
# setup
library(ggplot2)
library(dplyr)
set.seed(0)
# test data
data.frame(
cycle=factor(rep(1:8,2),levels=1:8),
val=1,
id=factor(rep(1:2,each=8)),
outcome=factor(paste("Outcome",sample(1:8,16,T)),levels=paste("Outcome",1:8))) %>%
# plot
ggplot(.,aes(id,cycle,fill=outcome))+
geom_bar(stat="identity",position=position_stack(reverse=T),width=0.99)+
coord_flip()
My Question
Is it possible to make cycles count as 1 for each id, keeping the outcome sequence?
Thank you in advance!
The Plots
This is what I get when using the above code:
This is what I get, when using val instead of cycle:
The goal is to keep the outcome sequence, while counting each cycle as 1 or making them appear of the same length for each id.
If I understand correctly, you could achieve your desired result using geom_tile:
library(ggplot2)
set.seed(0)
dat <- data.frame(
  cycle = factor(rep(1:8, 2), levels = 1:8),
  val = 1,
  id = factor(rep(1:2, each = 8)),
  outcome = factor(paste("Outcome", sample(1:8, 16, TRUE)), levels = paste("Outcome", 1:8))
)
# one equally sized tile per id and cycle, filled by outcome
ggplot(dat, aes(cycle, id, fill = outcome)) +
  geom_tile()
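If the stacked bars themselves should be kept, another option might be to map val to the bar length and set group = cycle, since position_stack() stacks the pieces in group order rather than by the fill variable. This is only a sketch on the same test data; the group = cycle mapping is my assumption about what the question needs, not something from the original code.

```r
library(ggplot2)
set.seed(0)

# same test data as in the question
dat <- data.frame(
  cycle = factor(rep(1:8, 2), levels = 1:8),
  val = 1,
  id = factor(rep(1:2, each = 8)),
  outcome = factor(paste("Outcome", sample(1:8, 16, TRUE)), levels = paste("Outcome", 1:8))
)

# every cycle contributes a segment of length val = 1;
# group = cycle makes the stacking follow the cycles, not the outcomes
p <- ggplot(dat, aes(id, val, fill = outcome, group = cycle)) +
  geom_col(position = position_stack(reverse = TRUE), width = 0.99) +
  coord_flip()
p

# stacked segment boundaries, one row per id and cycle
segments <- layer_data(p)
```

Each id then gets 8 segments of length 1, coloured by the outcome of the corresponding cycle.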
I'd like to find an efficient way to plot, for each participant ($participant_num), the proportion of responses ($resp) in every block of 10 trials ($trial, out of 200 trials per participant).
When I did it for a subset of my sample (only 30 participants) I used very rudimentary code, for which I first created a separate data frame for each subject:
whichSubject<-6 # Which subject do you want to analyse?
sData<-filter(banditData,subject==whichSubject)
and then I tried to get the proportions for each block of 10 trials and put them in separate columns
sData$newcolumn <- NULL
sData$newcolumn1_10<- table(sData[1:10,]$resp)/length(sData[1:10,]$resp)
sData$newcolumn11_20<- table(sData[11:20,]$resp)/length(sData[11:20,]$resp)
sData$newcolumn21_30<- table(sData[21:30,]$resp)/length(sData[21:30,]$resp)
and so on for all 200 trials, separately for each subject. Then I reshaped the data frame to long format and plotted it with the following script:
ggplot()+
geom_line(data=rewardDF,aes(x=Trial,y=pHappy,colour=Bandit), linetype="dashed", size=1.03)+
geom_point(data=longdf,aes(x=trial, y=resp_prop,colour=bandit,shape=bandit),size=3)+
geom_line(data=longdf,aes(x=trial, y=resp_prop,colour=bandit),size=1)+
scale_shape_manual(values=SymTypes)+
scale_colour_manual(values=cbPalette)+
labs(col='bandit',y='p(choice)',x='trials')+
scale_x_continuous(breaks = seq(0,200,by=10), limits=c(0,203), expand=(c(0,0)))+
scale_y_continuous(breaks = seq(0,1,by=0.1), limits=c(0,1.03), expand=(c(0.02,0)))+
theme_bw()
# ggsave() is called on its own (it saves the last plot); it cannot be added to a plot with +
ggsave(paste(c("data/S",whichSubject,"p(choice_absorangeblue).png"),collapse=""), scale=2,dpi = 300)
The output was something like this. Each dot represents how often a participant selected left (resp=0) vs right (resp=1) within 10 trials. For example, if the participant selected left (arm 1 in a task with a choice between two arms) 3 times out of 10, the dot for left is drawn at 0.3 on the y-axis and, conversely, the dot for right at 0.7.
However, I now have over 200 participants, and this approach is far too time-consuming!
I was thinking of adding something like facet_grid(participant_num ~ .) to my ggplot code in order to plot each participant separately without subsetting first. However, I haven't found a way to plot the proportions of choices without calculating them separately. Do you have any tips on how I could do this within ggplot?
Many thanks in advance for your help!!
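One possible way to avoid the per-subject data frames, as a rough sketch: compute the block proportions for all participants in one aggregate() call and then facet by participant. The simulated banditData below is a hypothetical stand-in for the real data, and block_end, p_left and p_right are names I made up for illustration; only participant_num, trial and resp come from the question.

```r
library(ggplot2)

# hypothetical stand-in for the real data: 3 participants x 200 trials,
# resp is 0 (left) or 1 (right)
set.seed(1)
banditData <- data.frame(
  participant_num = rep(1:3, each = 200),
  trial = rep(1:200, times = 3),
  resp = rbinom(600, size = 1, prob = 0.5)
)

# label each trial with the last trial of its 10-trial block (10, 20, ..., 200)
banditData$block_end <- ceiling(banditData$trial / 10) * 10

# the mean of a 0/1 variable is the proportion of resp == 1 in each block
props <- aggregate(resp ~ participant_num + block_end, data = banditData, FUN = mean)
names(props)[names(props) == "resp"] <- "p_right"
props$p_left <- 1 - props$p_right

# long format: one row per participant, block and arm
keys <- props[c("participant_num", "block_end")]
longdf <- rbind(
  data.frame(keys, bandit = "left", resp_prop = props$p_left),
  data.frame(keys, bandit = "right", resp_prop = props$p_right)
)

# one panel per participant instead of one script run per participant
ggplot(longdf, aes(block_end, resp_prop, colour = bandit, shape = bandit)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ participant_num) +
  labs(x = "trials", y = "p(choice)")
```

With more than 200 participants, facet_wrap() is probably easier to read than facet_grid(participant_num ~ .), and the panels may still need to be split over several pages.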
I'm struggling to get the exact output needed for a ggplot line graph. As an example, see the code below. Overall, I have two conditions (A/B), and two treatments (C/D). So four total series, but in a factorial way. The lines can be viewed as a time series but with ordinal markings (rather than numeric).
I'd like to generate a connected line graph for the four types, where the color depends on the condition, and the line type depends on the treatment. Thus two different colors and two line types. To make things a bit more complicated, one condition (B) does not have data for the third time period.
I cannot seem to generate the graph needed for these constraints. The closest I got is shown below. What am I doing wrong? I tried removing the group=condition code, but that doesn't help either.
library(ggplot2)
set.seed(1)
example_df <- data.frame(time = c('time1','time2','time3','time1','time2','time3','time1','time2','time1','time2'),
time_order = c(1,2,3,1,2,3,1,2,1,2),
condition = c('A','A','A','A','A','A','B','B','B','B'),
treatment = c('C','C','C','D','D','D','C','C','D','D'),
value = runif(10))
ggplot(example_df, aes(x=reorder(time,time_order), y=value, color=condition , line_type=treatment, group=condition)) +
geom_line()
You've got 3 problems, from what I can tell.
1. linetype doesn't have an underscore in it.
2. With a categorical axis, you need to use the group aesthetic to set which lines get connected. You've made a start with group = condition, but this would imply one line for each condition type (2 lines), but you want one line for each condition:treatment interaction (2 * 2 = 4 lines), so you need group = interaction(condition, treatment).
3. Your sample data doesn't quite make sense. Your condition B values have two treatment Cs at time 1 and two Ds at time 2, so there is no connection between times 1 and 2. This doesn't much matter, and your real data is probably fine.
This should work:
ggplot(
  example_df,
  aes(
    x = reorder(time, time_order),
    y = value,
    color = condition,
    linetype = treatment,
    group = interaction(condition, treatment)
  )
) +
  geom_line()
I have a huge data frame consisting of binary values (extract):
id,topic,w_hello,w_apple,w_tomato
1,politics,1,1,0
2,sport,0,1,0
3,politics,1,0,1
With:
barplot(col_prefix_matrix)
I plot the number of their occurrences:
As there are many columns, the plot looks very confusing.
Would it be possible to plot only those columns whose count reaches a specific threshold, say 5, to make the plot clearer?
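A minimal sketch of one way this could work, assuming the word columns all start with w_ as in the extract: sum each column, keep only the sums that reach the threshold, and pass those to barplot(). The names word_cols, counts, threshold and kept are hypothetical, and the threshold is set to 2 here only because the extract has just three rows.

```r
# the three-row extract from the question
df <- data.frame(
  id = 1:3,
  topic = c("politics", "sport", "politics"),
  w_hello = c(1, 0, 1),
  w_apple = c(1, 1, 0),
  w_tomato = c(0, 0, 1)
)

# pick the binary word columns by their prefix and count the occurrences
word_cols <- grep("^w_", names(df), value = TRUE)
counts <- colSums(df[word_cols])

# keep only the words that occur at least `threshold` times
threshold <- 2
kept <- counts[counts >= threshold]

# las = 2 turns the axis labels perpendicular, which helps with many columns
barplot(kept, las = 2)
```

On the real data, threshold would be set to 5 (or whatever cut-off makes the plot readable).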
I want to visualize many time series at once. I am new at R, and have spent about 6 hours searching the web and reading about how to tackle this relatively simple problem. My dataset has five time points arranged as rows, and 100 columns. I can easily plot any column against the time points with qplot(time, var2, geom="line"). But I want to learn how to do this for a flexible number of columns, and how to print 6 to 12 of the individual graphs on one page.
Here I learned about the multiplot function, got that to work in terms of layout.
What I am stuck on is how to get the list of variables into a for loop so that one statement plots all the variables against the same five time points.
This is what I am playing with. It makes 9 plots, 3 columns wide, but I do not know how to get all my variables into the array for yvars.
plots <- list() # initialise the list that collects the plots
for (i in 1:9) {
  p1 = qplot(symbol, yvar, geom = "smooth", main = i)
  plots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = plots, cols = 3)
Stupidly on my part right now it makes 9 identical plots. So how do I create the list so the above will cycle through all my columns and make those plots?
First melt all your data using the reshape2 package:
library(reshape2)
library(ggplot2)
datm <- melt(your.original.data.frame, id = "time")
Now plot it using facets:
qplot(time, value, data = datm, facets= variable ~ ., geom="point")
Let me know if this works. If you could, please upload your data, it would help tremendously.
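If separate plot objects are still wanted for multiplot, the loop from the question could be sketched roughly as below. The small dat data frame and the column names var1 to var3 are hypothetical stand-ins for the real 100-column data; the idea is to loop over the column names and look each one up with .data[[v]] inside aes().

```r
library(ggplot2)

# hypothetical stand-in: five time points and a few series columns
dat <- data.frame(
  time = 1:5,
  var1 = c(2, 4, 3, 5, 6),
  var2 = c(1, 1, 2, 3, 5),
  var3 = c(9, 7, 8, 6, 5)
)

# every column except time is a series to plot
yvars <- setdiff(names(dat), "time")

# build one plot per column; .data[[v]] selects the column named by the string v
plots <- lapply(yvars, function(v) {
  ggplot(dat, aes(x = time, y = .data[[v]])) +
    geom_line() +
    labs(title = v, y = v)
})
names(plots) <- yvars

# the list can then go to the helper already used in the question:
# multiplot(plotlist = plots, cols = 3)
```

To print only 6 to 12 graphs per page, yvars can be subset first, e.g. yvars[1:9].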