I have the following data (sample below):
Participant Group Choice
1 Control 0
2 Control 0
3 Control 0
4 Stress 1
5 Stress 1
6 Stress 1
I want to create a bar graph depicting the frequencies of Choice (0 or 1) for each Group (Stress vs. Control).
Make a table and pass it to barplot, which comes with base R.
barplot(with(dat, table(Choice, Group)), main="My plot", beside=TRUE, col=2:3)
Data:
(Forgive me that I chose slightly more interesting data :)
dat <- structure(list(Participant = 1:6, Group = c("Control", "Control",
"Control", "Stress", "Stress", "Stress"), Choice = c(0L, 1L,
0L, 0L, 1L, 1L)), class = "data.frame", row.names = c(NA, -6L
))
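To make the intermediate step visible, here is a minimal sketch that prints the contingency table before plotting it and adds a legend. It uses only base R and the same six rows as above; the axis labels and the legend title are my own choices, not part of the original answer.

```r
# Same six rows as the dput() output above, written out as a plain data.frame
dat <- data.frame(
  Participant = 1:6,
  Group  = c("Control", "Control", "Control", "Stress", "Stress", "Stress"),
  Choice = c(0L, 1L, 0L, 0L, 1L, 1L)
)

# Rows are Choice (0/1), columns are Group; barplot() draws one cluster per column
tab <- with(dat, table(Choice, Group))
print(tab)

# legend.text = TRUE labels the legend with the row names of the table ("0", "1")
barplot(tab, beside = TRUE, col = 2:3, main = "Choice by group",
        xlab = "Group", ylab = "Frequency",
        legend.text = TRUE, args.legend = list(title = "Choice"))
```

With beside = TRUE each group gets two adjacent bars, one per value of Choice.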
You can use count() to compute the frequencies, convert the variables to factors, and plot.
library(dplyr)
library(ggplot2)
df %>%
count(Group, Choice) %>%
mutate(Choice = factor(Choice), Group = factor(Group)) %>%
ggplot() + aes(Group, n, fill = Choice) + geom_col()
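Note that geom_col() stacks the bars by default. If side-by-side bars are preferred, a possible variation (a sketch, assuming the dplyr and ggplot2 packages are installed and reusing the sample rows from the other answer) is to pass position = "dodge":

```r
library(dplyr)
library(ggplot2)

# Sample rows from the base R answer, written out as a plain data.frame
dat <- data.frame(
  Participant = 1:6,
  Group  = c("Control", "Control", "Control", "Stress", "Stress", "Stress"),
  Choice = c(0L, 1L, 0L, 0L, 1L, 1L)
)

# One row per Group/Choice combination, with the frequency in column n
counts <- dat %>%
  count(Group, Choice) %>%
  mutate(Choice = factor(Choice))
print(counts)

# position = "dodge" places the Choice bars next to each other instead of stacking
p <- ggplot(counts, aes(Group, n, fill = Choice)) +
  geom_col(position = "dodge")
print(p)
```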
Related
Hi, I am relatively new to R. I am struggling with what seems like it should be a relatively simple task: I am trying to make a frequency histogram using ggplot2 from a subset of data from a longer dataframe.
Here is an example of the data structure, as shown in the attached picture:
https://i.stack.imgur.com/HIwQv.png
The data is from a survey where 0 means not selected and 1 means selected. The columns are numeric in the original dataset. I want a histogram of the frequency with which each variable was selected, with the column variables on the x-axis and the frequency counts on the y-axis. I have various subsets like this within a dataframe, and I would like each subset to have its own graph.
I first subset the columns of interest
new_dataset <- subset(df, select = c(WAB_R, WAB_B, BDAE, PNT))
When I checked the class, it was data.frame and no longer numeric.
I tried to use as.numeric to convert it back to numeric, but with no luck.
I could use some guidance on how to structure the data to then obtain a histogram.
Thanks Carla
Maybe try this approach using tidyverse functions. You have to select the desired variables and reshape them to long format. Here is the code, using ggplot2 for the final plot:
library(tidyverse)
#Code 1
df %>% select(c(WAB_R, WAB_B, BDAE, PNT)) %>%
pivot_longer(everything()) %>%
ggplot(aes(x=value))+
geom_histogram(stat = 'count',aes(fill=name),
position = position_dodge2(0.9,preserve = 'single'))+
labs(fill='Variable')
Or this:
#Code 2
df %>% select(c(WAB_R, WAB_B, BDAE, PNT)) %>%
pivot_longer(everything()) %>%
ggplot(aes(x=factor(value)))+
geom_histogram(stat = 'count',aes(fill=name),
position = position_dodge2(0.9,preserve = 'single'))+
labs(fill='Variable')+xlab('value')
Some data used:
#Data
df <- structure(list(ID = 1:4, WAB_R = c(0L, 1L, 0L, 1L), WAB_B = c(0L,
1L, 0L, 0L), BDAE = c(0L, 0L, 0L, 1L), PNT = c(0L, 0L, 0L, 0L
)), class = "data.frame", row.names = c(NA, -4L))
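Because the columns are 0/1 indicators, the number of times each item was selected is simply the column sum. If the goal is one bar per variable (variables on the x-axis, selection counts on the y-axis), a base R sketch could look like this. It is an alternative to the reshaping approach above, not a replacement for it, and the axis labels are my own.

```r
# Sample survey data from above, written out as a plain data.frame
df <- data.frame(
  ID    = 1:4,
  WAB_R = c(0L, 1L, 0L, 1L),
  WAB_B = c(0L, 1L, 0L, 0L),
  BDAE  = c(0L, 0L, 0L, 1L),
  PNT   = c(0L, 0L, 0L, 0L)
)

# Summing a 0/1 column counts how often that item was selected
selected <- colSums(df[, c("WAB_R", "WAB_B", "BDAE", "PNT")])
print(selected)

# One bar per variable; the names of the vector become the x-axis labels
barplot(selected, xlab = "Variable", ylab = "Times selected")
```

For several subsets of columns, the same two lines can be repeated with a different vector of column names.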
I'm trying to graph multiple dataframe columns in R.
(like this-> Graphing multiple variables in R)
bid ask date
1 20.12 20.14 2014-10-31
2 20.09 20.12 2014-11-03
3 20.03 20.06 2014-11-04
4 19.86 19.89 2014-11-05
This is my data.
I can make a graph with a single line like this:
`data%>% select(bid,ask,date) %>% hchart(type='line', hcaes(x='date', y='bid'))`
I want to add a line for ask to the same graph.
One way is to reshape (gather) the values to plot and then add a group aesthetic to the hchart function:
library(tidyr)
data %>% select(bid,ask,date) %>%
gather("key", "value", bid, ask) %>%
hchart(type='line', hcaes(x='date', y='value', group='key'))
P.S. Don't forget to load all the necessary libraries (here, highcharter and dplyr in addition to tidyr).
You can use the following code
library(reshape2)
library(highcharter)
df_m <- melt(df, id="date")
hchart(df_m, "line", hcaes(x = date, y = value, group = variable))
Here is the data
df = structure(list(bid = c(20.12, 20.09, 20.03, 19.86), ask = c(20.14,
20.12, 20.06, 19.89), date = structure(c(4L, 1L, 2L, 3L), .Label = c("03/11/2014",
"04/11/2014", "05/11/2014", "31/10/2014"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
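One thing to watch with this version of the data: date is a factor of dd/mm/yyyy strings, so its levels sort alphabetically ("03/11/2014" before "31/10/2014") rather than chronologically. The following base R sketch converts the column to Date and builds the long format by hand; the column names key and value are my own choice and mirror the gather() answer above. The resulting frame should be usable in the hchart() calls shown earlier, though I have not verified that here.

```r
# Same values as the dput() output above
df <- data.frame(
  bid  = c(20.12, 20.09, 20.03, 19.86),
  ask  = c(20.14, 20.12, 20.06, 19.89),
  date = factor(c("31/10/2014", "03/11/2014", "04/11/2014", "05/11/2014"))
)

# Convert the factor to a real Date so the x-axis orders chronologically
df$date <- as.Date(as.character(df$date), format = "%d/%m/%Y")
df <- df[order(df$date), ]

# Long format: one row per date and series (bid or ask)
df_long <- data.frame(
  date  = rep(df$date, times = 2),
  key   = rep(c("bid", "ask"), each = nrow(df)),
  value = c(df$bid, df$ask)
)
print(df_long)
```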
I have a dataframe comprising two columns, 'host', and 'date'; which describes a series of cyber attacks against a number of different servers on specific dates over a seven month period.
Here's what the data looks like,
> china_atks %>% head(100)
host date
1 groucho-oregon 2013-03-03
2 groucho-oregon 2013-03-03
...
46 groucho-singapore 2013-03-03
48 groucho-singapore 2013-03-04
...
Where 'groucho-oregon', 'groucho-singapore', etc., is the hostname of the server targeted by an attack.
There are around 190,000 records, spanning 03/03/2013 to 08/09/2013, e.g.
> unique(china_atks$date)
[1] "2013-03-03" "2013-03-04" "2013-03-05" "2013-03-06" "2013-03-07" "2013-03-08" "2013-03-09"
[8] "2013-03-10" "2013-03-11" "2013-03-12" "2013-03-13" "2013-03-14" "2013-03-15" "2013-03-16"
[15] "2013-03-17" "2013-03-18" "2013-03-19" "2013-03-20" "2013-03-21" "2013-03-22" "2013-03-23"
...
I'd like to create a multi-line time series chart that visualises how many attacks each individual server received each day over the range of dates, but I can't figure out how to pass the data to ggplot to achieve this. There are nine unique hostnames, and so the chart would show nine lines.
Thanks!
Here's one way to do this.
First, summarize the counts by host and date.
library(plyr)
df <- plyr::count(da,c("host", "date"))
Then do the plotting, grouping by host so that each server gets its own line.
library(ggplot2)
ggplot(data=df, aes(x=date, y=freq, group=host)) +
geom_line(aes(color = host))
Data
da <- structure(list(host = structure(1:4, .Label = c("groucho-eu",
"groucho-oregon", "groucho-singapore", "groucho-tokyo"), class = "factor"),
date = structure(c(1L, 1L, 1L, 1L), .Label = "2013-03-03", class = "factor"),
freq = c(1L, 4L, 2L, 1L)), .Names = c("host", "date", "freq"
), row.names = c(NA, -4L), class = "data.frame")
The ggplot2 library can compute statistics itself, so one option is to let ggplot handle the counting (here df holds the raw attack records, one row per attack). This should draw multiple lines, one for each host:
ggplot(df, aes(x=date, colour = host, group = host)) + geom_line(stat = "count")
Note: Make sure host is converted to a factor so that the lines get discrete colors.
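Putting the pieces together, here is a small end-to-end sketch. The seven attack records are made up purely for illustration (only the two hostnames come from the question), and ggplot2 is assumed to be installed. The counting is done with base R's table(), which also keeps host/date combinations with zero attacks, and the date column is converted to Date so that the x-axis is a real time scale.

```r
library(ggplot2)

# Made-up toy records, one row per attack (illustrative only)
atks <- data.frame(
  host = c("groucho-oregon", "groucho-oregon", "groucho-oregon",
           "groucho-singapore", "groucho-singapore",
           "groucho-singapore", "groucho-singapore"),
  date = as.Date(c("2013-03-03", "2013-03-03", "2013-03-04",
                   "2013-03-03", "2013-03-04", "2013-03-04", "2013-03-04"))
)

# Count attacks per host and day; zero-count combinations are kept
daily <- as.data.frame(table(host = atks$host, date = atks$date),
                       responseName = "n")
# table() turns the dates into factor labels, so convert them back to Date
daily$date <- as.Date(as.character(daily$date))
print(daily)

# group = host gives one line per server
p <- ggplot(daily, aes(x = date, y = n, colour = host, group = host)) +
  geom_line()
print(p)
```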
I have a simple trellis scatterplot. Two panels - male/female. ID is a unique number for each participant. The var1 is a total test time. Mean.values is a vector of two numbers (the means for gender).
No point including a best fit line so what I want is to plot a trend line of the mean in each panel. The two panels have different means, say male = 1 minute, female = 2 minutes.
xyplot(var1 ~ ID|Gender, data=DF,
group = Gender,
panel=function(...) {
panel.xyplot(...)
panel.abline(h=mean.values)
})
At the minute the graph is coming out so that both trendlines appear in each panel. I want only one trendline in each.
Does anyone have the way to do this?
I have tried a number of different ways, including the long code for the function addLine, which just doesn't work for me. I just want to define which panel I'm looking at. I've looked at ?panel.number, but I'm not sure how that works, as it complains that I don't have a current row (current.row(prefix)).
There must be a simple way of doing this?
[EDIT - Here's the actual data i'm using]
I've tried to simplify the DF
library(lattice)
dput(head(DF))
structure(list(ID = 1:6, Var1 = c(2333858, 4220644,
2941774, 2368496, 3165740, 3630300), mean = c(2412976, 2412976,
2412976, 2412976, 2412976, 2412976), Gender = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), .Names = c("ID",
"Var1", "mean", "Gender"), row.names = c(NA, 6L), class = "data.frame")
dput(tail(DF))
structure(list(ID = 161:166, Var1= c(2825246, 3552170,
3688882, 2487760, 3849108, 3085342), mean = c(3689805, 3689805,
3689805, 3689805, 3689805, 3689805), Gender = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), .Names = c("ID",
"Var1", "mean", "Gender"), row.names = 109:114, class = "data.frame")
plot i'm using:
xyplot((Var1/1000) ~ ID|Gender, data=DF,
group = Gender,scales=list(x=list(at=NULL)),
panel=function(...) {
panel.xyplot(...)
panel.abline(h=mean.values) })
causes 2 lines.
[EDIT - This is the code which includes the function Addline & is everywhere on all the posts and doesn't seem to work for me]
addLine <- function(a=NULL, b=NULL, v=NULL, h=NULL, ..., once=F) {
  tcL <- trellis.currentLayout()
  k <- 0
  for(i in 1:nrow(tcL))
    for(j in 1:ncol(tcL))
      if (tcL[i,j] > 0) {
        k <- k+1
        trellis.focus("panel", j, i, highlight = FALSE)
        if (once) panel.abline(a=a[k], b=b[k], v=v[k], h=h[k], ...)
        else panel.abline(a=a, b=b, v=v, h=h, ...)
        trellis.unfocus()
      }
}
then writing after the trellis plot (mean.values being a vector of two numbers, mean for female, mean for male)
addLine(v=(mean.values), once=TRUE)
Update - I managed to do it in ggplot2.
Make the ggplot using facet_wrap then -
hline.data <- data.frame(z = c(2413, 3690), Gender = c("Female","Male"))
This creates a DF of the two means and the Gender, 2x2 DF
myplot <- myplot + geom_hline(aes(yintercept = z), hline.data)
This adds the lines to the ggplot.
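For completeness, a self-contained sketch of that ggplot2 approach. The ten-row DF here is a made-up stand-in for the real data, and the means are computed with aggregate() rather than typed in by hand; ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Made-up stand-in for the real data (illustrative only)
DF <- data.frame(
  ID     = 1:10,
  Gender = rep(c("M", "F"), each = 5),
  Var1   = c(5, 6, 7, 6, 5, 8, 9, 10, 8, 9)
)

# One row per Gender holding that group's mean; rename the mean column to z
hline.data <- aggregate(Var1 ~ Gender, data = DF, FUN = mean)
names(hline.data)[2] <- "z"
print(hline.data)

# Because hline.data has a Gender column, each facet only gets its own line
myplot <- ggplot(DF, aes(ID, Var1)) +
  geom_point() +
  facet_wrap(~ Gender) +
  geom_hline(data = hline.data, aes(yintercept = z), linetype = "dashed")
print(myplot)
```

The key point is that the data frame passed to geom_hline() contains the faceting variable, so ggplot2 can route each mean to the matching panel.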
If you just want to plot the mean of the values you are already drawing in each panel, you can skip the mean.values variable and just do
xyplot(Var1 ~ ID|Gender, data=DF,
group = Gender,
panel=function(x,y,...) {
panel.xyplot(x,y,...)
panel.abline(h=mean(y))
}
)
With the sample data
DF<-data.frame(
ID=1:10,
Gender=rep(c("M","F"), each=5),
Var1=c(5,6,7,6,5,8,9,10,8,9)
)
this produces
I believe lattice has a specific panel function for this, panel.average().
Try replacing panel.abline(h=mean.values) with panel.average(...).
If that doesn't solve the problem, we might need more information; try using dput() on your data (e.g., dput(DF), or some representative subset).
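If the means have to come from a precomputed vector such as mean.values, as in the question, one possible sketch is to index that vector inside the panel function with lattice's packet.number(). This assumes mean.values is ordered like the levels of the conditioning factor, which tapply() guarantees here; the data frame is a made-up example.

```r
library(lattice)

# Made-up example data (illustrative only)
DF <- data.frame(
  ID     = 1:10,
  Gender = factor(rep(c("M", "F"), each = 5)),
  Var1   = c(5, 6, 7, 6, 5, 8, 9, 10, 8, 9)
)

# Means in factor-level order (F, then M), matching the packet order
mean.values <- tapply(DF$Var1, DF$Gender, mean)
print(mean.values)

p <- xyplot(Var1 ~ ID | Gender, data = DF,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)
              # packet.number() identifies the level being drawn in this panel
              panel.abline(h = mean.values[packet.number()], lty = 2)
            })
print(p)
```

Passing the whole vector to panel.abline(h = ...) is what puts both lines in every panel; picking a single element per panel avoids that.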
my data frame:
dput(head(x,3))
structure(list(Programs = structure(1:3, .Label = c("400.perlbench",
"401.bzip2", "403.gcc", "429.mcf", "445.gobmk", "456.hmmer",
"458.sjeng", "462.libquantum", "464.h264ref", "471.omnetpp",
"473.astar", "483.xalancbmk"), class = "factor"), Base_Run_Time = c(988.746037,
1401.357446, 821.134215), Base_Rate = c(790.49624, 550.8944,
784.28104), Base_Geo_Mean = c(837.6709, 837.6709, 837.6709),
Bench_Mark_Run_Time = c(827.236707, 1329.663649, 818.863431
), Peak_Rate = c(944.83232, 580.59792, 786.45592), Chip = structure(c(2L,
2L, 2L), .Label = c("E5_2660", "E7_4860", "ultrasparc"), class = "factor"),
Bench_Geo_Mean = c(790.4498, 790.4498, 790.4498), Percent_Difference = c(0.06,
0.06, 0.06)), .Names = c("Programs", "Base_Run_Time", "Base_Rate",
"Base_Geo_Mean", "Bench_Mark_Run_Time", "Peak_Rate", "Chip",
"Bench_Geo_Mean", "Percent_Difference"), row.names = c(NA, 3L
), class = "data.frame")
I need to create a bar chart where, for each group, Base_Run_Time and Bench_Mark_Run_Time appear side by side, one in orange and the other in blue.
I have tried something like this, but it gives me a stacked chart. I need a separate bar for each of Base_Run_Time and Bench_Mark_Run_Time:
ggplot(x)+geom_bar(data=x, aes(Programs, Base_Run_Time, colour="orange", group=Chip), stat="identity") + geom_bar(data=x, aes(Programs,Bench_Mark_Run_Time, colour="blue", fill="green", group=Chip), stat="identity")
any ideas?
You should melt your data from wide to long format and then plot it. With position="dodge" the bars are placed side by side. From the melted data frame, use value for the y values and variable for the fill. With scale_fill_manual() you can set the desired colors.
library(reshape2)
xx <- melt(x, id = "Programs",measure.vars = c("Base_Run_Time", "Bench_Mark_Run_Time"))
Programs variable value
1 400.perlbench Base_Run_Time 988.7460
2 401.bzip2 Base_Run_Time 1401.3574
3 403.gcc Base_Run_Time 821.1342
4 400.perlbench Bench_Mark_Run_Time 827.2367
5 401.bzip2 Bench_Mark_Run_Time 1329.6636
6 403.gcc Bench_Mark_Run_Time 818.8634
ggplot(xx,aes(Programs,value,fill=variable))+
geom_bar(stat="identity",position="dodge")+
scale_fill_manual(values=c("orange","blue"))
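As a possible alternative to reshape2, the same reshaping can be sketched with tidyr::pivot_longer(). This assumes tidyr and ggplot2 are installed and uses only the three programs and two timing columns shown in the question; naming the colors explicitly ties each one to its series.

```r
library(tidyr)
library(ggplot2)

# The two timing columns for the three programs shown in the question
x <- data.frame(
  Programs            = c("400.perlbench", "401.bzip2", "403.gcc"),
  Base_Run_Time       = c(988.746037, 1401.357446, 821.134215),
  Bench_Mark_Run_Time = c(827.236707, 1329.663649, 818.863431)
)

# Wide to long: one row per program and timing variable
xx <- pivot_longer(x,
                   cols = c(Base_Run_Time, Bench_Mark_Run_Time),
                   names_to = "variable", values_to = "value")
print(xx)

# geom_col() is the stat = "identity" form of geom_bar(); dodge puts bars side by side
p <- ggplot(xx, aes(Programs, value, fill = variable)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c(Base_Run_Time = "orange",
                               Bench_Mark_Run_Time = "blue"))
print(p)
```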