Plot Bar Chart in R

I have some data with two columns. The first column is continuous and the second is a binary value (t/f). I want to plot this as a bar chart in R. I want to group the numbers in the first column into categories such as 0-100, 101-200, ..., and then plot the number of t's on the y axis. I am using ggplot2, but I am not clear on how to group the x-axis data.
1 123 t
2 145 t
3 222 t
4 345 f
5 455 t
6 567 t
7 245 t
8 300 t
9 150 t
10 600 t
11 333 t

First, here's your sample data in a data.frame
dd<-structure(list(V1 = 1:11, V2 = c(123L, 145L, 222L, 345L, 455L,
567L, 245L, 300L, 150L, 600L, 333L), V3 = structure(c(2L, 2L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
.Label = c("f", "t"), class = "factor")), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -11L))
Here's a strategy for plotting
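library(ggplot2)
# geom_bar() defaults to stat="count", which sums the weight aesthetic per category;
# stat="bin" needs a continuous x and fails on the factor returned by cut()
ggplot(dd, aes(x=cut(V2, breaks=c(0,1:9*100)), weight=as.numeric(V3=="t"))) +
geom_bar() + xlab("value")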
We define x and weight in aes(). cut() breaks the numbers into ranges, and weight turns each value into a zero/one value. These are summed within each bin, which gives the number of t's per range.
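To check what the bars should show, the same binning can be reproduced without ggplot2. This is only a sketch in base R: the data frame is retyped from the question's sample, and `bins` and `t_counts` are names made up for the illustration.

```r
# Sample data retyped from the question (V2 = value, V3 = t/f flag)
dd <- data.frame(
  V2 = c(123, 145, 222, 345, 455, 567, 245, 300, 150, 600, 333),
  V3 = c("t", "t", "t", "f", "t", "t", "t", "t", "t", "t", "t")
)

# cut() assigns each value to a right-closed interval: (0,100], (100,200], ...
bins <- cut(dd$V2, breaks = seq(0, 900, by = 100))

# Count only the rows flagged "t"; empty bins are kept with a count of 0
t_counts <- table(bins[dd$V3 == "t"])
print(t_counts)
```

Note that cut() uses right-closed intervals by default, so 300 lands in (200,300] rather than in the next bin.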

Related

How to create a new dataset with aggregated values by month in r? [duplicate]

My data set "data1" looks something like this:
Price class
243 1
32 2
45 3
245 1
67 2
343 3
567 1
.
.
and so on. In the class column, 1, 2, 3 repeats continuously until the end of the data (298 observations).
I want to aggregate it so that I get the mean price of each class. The result should be stored in a new dataset "classdata" and look like this:
class column_name
1 mean of all class 1 prices
2 mean of all class 2 prices
3 mean of all class 3 prices
I tried this code
classdata = aggregate(x=data1$Price, by=list(data1$class), FUN="mean")
But I am not getting the desired result. Please help.
You probably want proper column names. To get them, also wrap x= in a list and name the list elements in both arguments.
aggregate(x=list(column_name=data1$Price), by=list(class=data1$class), FUN="mean")
# class column_name
# 1 1 351.6667
# 2 2 49.5000
# 3 3 194.0000
Data:
data1 <- structure(list(Price = c(243L, 32L, 45L, 245L, 67L, 343L, 567L
), class = c(1L, 2L, 3L, 1L, 2L, 3L, 1L)), class = "data.frame", row.names = c(NA,
-7L))
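The formula interface of aggregate() is another way to get readable column names. The following is a sketch using the same seven sample rows; renaming the result column to "column_name" is only there to match the layout asked for in the question.

```r
# Same seven sample rows as above, typed out directly
data1 <- data.frame(
  Price = c(243, 32, 45, 245, 67, 343, 567),
  class = c(1, 2, 3, 1, 2, 3, 1)
)

# The formula "Price ~ class" reads as: summarise Price within each class
classdata <- aggregate(Price ~ class, data = data1, FUN = mean)

# The summary column is called "Price" by default; rename it if needed
names(classdata)[names(classdata) == "Price"] <- "column_name"
print(classdata)
```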
Welcome to Stack Overflow. Another option is to use the tidyverse data processing model:
# use the data jay.sf made
data1 <- structure(list(Price = c(243L, 32L, 45L, 245L, 67L, 343L, 567L),
class = c(1L, 2L, 3L, 1L, 2L, 3L, 1L)),
class = "data.frame", row.names = c(NA, -7L))
library(tidyverse)
data1 %>% # start with sample data and pipe it to the next line
group_by(class) %>% # group the data by class and pipe it to the next line
summarise(`The Mean Price` = mean(Price)) # Make a variable called "The
# Mean Price" holding the mean of
# the price variable.

How to aggregate a data frame by columns and rows?

I have the following data set:
Class Total AC Final_Coverage
A 1000 1 55
A 1000 2 66
B 1000 1 77
A 1000 3 88
B 1000 2 99
C 1000 1 11
B 1000 3 12
B 1000 4 13
B 1000 5 22
C 1000 2 33
C 1000 3 44
C 1000 4 55
C 1000 5 102
A 1000 4 105
A 1000 5 109
I would like to get the average of the AC and the Final_Coverage for the first three rows of each class. Then, I want to store the average values along with the class name in a new dataframe. To do that, I did the following:
dataset <- read_csv("/home/ad/Desktop/testt.csv")
classes <- unique(dataset$Class)
new_data <- data.frame(Class = character(0), AC = numeric(0), Coverage = numeric(0))
for(class in classes){
new_data$Class <- class
dataClass <- subset(dataset, Class == class)
tenRows <- dataClass[1:3,]
coverageMean <- mean(tenRows$Final_Coverage)
acMean <- mean(tenRows$AC)
new_data$Coverage <- coverageMean
new_data$AC <- acMean
}
Everything works fine except entering the average value into the new_data frame. I get the following error:
Error in `$<-.data.frame`(`*tmp*`, "Class", value = "A") :
replacement has 1 row, data has 0
Do you know how to solve this?
This should get you the new dataframe by using dplyr.
dataset %>% group_by(Class) %>% slice(1:3) %>% summarise(AC= mean(AC),
Coverage= mean(Final_Coverage))
In your method, the error arises because you initialise the new data frame with 0 rows and then try to assign a single value to one of its columns. That is what the error message says: the replacement has 1 row, but the data has 0. This would work, though:
new_data <- data.frame(Class = classes, AC = NA, Coverage = NA)
for(class in classes){
# Class is already filled in above, so it must not be overwritten here
dataClass <- subset(dataset, Class == class)
tenRows <- dataClass[1:3,]
coverageMean <- mean(tenRows$Final_Coverage)
acMean <- mean(tenRows$AC)
new_data$Coverage[classes == class] <- coverageMean
new_data$AC[classes == class] <- acMean
}
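If you would rather not pre-allocate and fill a data frame inside a loop, a split/lapply/rbind pattern does the same job in base R. This is a sketch, not the original poster's code: the data is retyped from the question, and `per_class` and `first3` are illustrative names.

```r
# Data retyped from the question (the Total column is left out, it is not needed)
dataset <- data.frame(
  Class = c("A", "A", "B", "A", "B", "C", "B", "B", "B", "C", "C", "C", "C", "A", "A"),
  AC = c(1, 2, 1, 3, 2, 1, 3, 4, 5, 2, 3, 4, 5, 4, 5),
  Final_Coverage = c(55, 66, 77, 88, 99, 11, 12, 13, 22, 33, 44, 55, 102, 105, 109)
)

# One small data frame per class, built from that class's first three rows
per_class <- lapply(split(dataset, dataset$Class), function(d) {
  first3 <- head(d, 3)
  data.frame(
    Class = first3$Class[1],
    AC = mean(first3$AC),
    Coverage = mean(first3$Final_Coverage)
  )
})

# Stack the per-class rows into a single result
new_data <- do.call(rbind, per_class)
rownames(new_data) <- NULL
print(new_data)
```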
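You could look into aggregate(). Note that this filters on AC <= 3, which happens to match the first three rows of each class in this data.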
> aggregate(df1[df1$AC <= 3, 3:4], by=list(Class=df1[df1$AC <= 3, 1]), FUN=mean)
Class AC Final_Coverage
1 A 2 69.66667
2 B 2 62.66667
3 C 2 29.33333
DATA
df1 <- structure(list(Class = structure(c(1L, 1L, 2L, 1L, 2L, 3L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"),
Total = c(1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L,
1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L),
AC = c(1L, 2L, 1L, 3L, 2L, 1L, 3L, 4L, 5L, 2L, 3L, 4L, 5L,
4L, 5L), Final_Coverage = c(55L, 66L, 77L, 88L, 99L, 11L,
12L, 13L, 22L, 33L, 44L, 55L, 102L, 105L, 109L)), class = "data.frame", row.names = c(NA,
-15L))

Replacing loop in dplyr R

I am trying to write a function with dplyr, without a loop, and there is one step I do not know how to do.
Say we have TV stations (x, y, z) and months (2, 3). If I group by these and summarise a numeric value, we get this output:
TV months value
x 2 52
y 2 87
z 2 65
x 3 180
y 3 36
z 3 99
This is for the evaluated brand.
Then I have many brands that I need to filter, keeping only those whose value is >= 0.8 * the value of the evaluated brand and <= 1.2 * the value of the evaluated brand.
So, for example, from the table below I would want to keep only the first two rows, and this should be done for all month and TV combinations.
brand TV MONTH value
sdg x 2 60
sdfg x 2 55
shs x 2 120
sdg x 2 11
sdga x 2 5000
As @akrun said, you need to use a combination of merging and subsetting. Here's a base R solution.
m <- merge(df, data, by.x=c("TV", "MONTH"), by.y=c("TV", "months"))
m[m$value.x >= m$value.y*0.8 & m$value.x <= m$value.y*1.2,][,-5]
# TV MONTH brand value.x
#1 x 2 sdg 60
#2 x 2 sdfg 55
Data
data <- structure(list(TV = structure(c(1L, 2L, 3L, 1L, 2L, 3L), .Label = c("x",
"y", "z"), class = "factor"), months = c(2L, 2L, 2L, 3L, 3L,
3L), value = c(52L, 87L, 65L, 180L, 36L, 99L)), .Names = c("TV",
"months", "value"), class = "data.frame", row.names = c(NA, -6L
))
df <- structure(list(brand = structure(c(2L, 1L, 4L, 2L, 3L), .Label = c("sdfg",
"sdg", "sdga", "shs"), class = "factor"), TV = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "x", class = "factor"), MONTH = c(2L,
2L, 2L, 2L, 2L), value = c(60L, 55L, 120L, 11L, 5000L)), .Names = c("brand",
"TV", "MONTH", "value"), class = "data.frame", row.names = c(NA,
-5L))
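Since the question asks about dplyr, the same merge-then-subset idea can be sketched with inner_join() and filter(). This assumes dplyr is installed; the data is retyped with character columns instead of factors, and `reference`, `brands`, `ref_value` and `kept` are names chosen for the illustration.

```r
library(dplyr)

# Summarised values of the evaluated brand, one row per TV/month
reference <- data.frame(
  TV = c("x", "y", "z", "x", "y", "z"),
  months = c(2, 2, 2, 3, 3, 3),
  value = c(52, 87, 65, 180, 36, 99),
  stringsAsFactors = FALSE
)

# Candidate brands to be filtered
brands <- data.frame(
  brand = c("sdg", "sdfg", "shs", "sdg", "sdga"),
  TV = "x",
  MONTH = 2,
  value = c(60, 55, 120, 11, 5000),
  stringsAsFactors = FALSE
)

kept <- brands %>%
  # attach the reference value for the matching TV/month combination
  inner_join(rename(reference, ref_value = value),
             by = c("TV", "MONTH" = "months")) %>%
  # keep rows within 80% to 120% of the reference value
  filter(value >= 0.8 * ref_value, value <= 1.2 * ref_value) %>%
  select(brand, TV, MONTH, value)

print(kept)
```

Because the join matches on both TV and month, the filter is applied to every combination at once and no loop is needed.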

Interaction plot in R

I have the following dataset:
ID Result Days Position
1 70 0 1
1 80 23 1
2 90 15 2
2 89 30 2
2 99 40 2
3 23 24 1
etc...
From it, I want to make 2 spaghetti plots: one for those who are in position 1 and one for those in position 2. I tried a "for & if" loop, but I just got the mixed plot many times. I am using ggplot.
dfPr <- df[df$Progress==1, ]
x11()
ggplot(dfPr, aes(x=OrderToFirstBx, y=result.num, color=factor(MRN))) +
geom_line() + theme_bw() + xlab("Time in Days") + ylab("ALT")
This worked! But if you have another solution please tell me.
Thank you.
You gave very limited example data, and your sample code doesn't seem to match the variable names in your sample data, which makes it very hard to tell exactly what you wanted.
If you want two separate plots, using facets might be the easiest. Try
#sample data
dfPr <- structure(list(ID = c(1L, 1L, 2L, 2L, 2L, 3L), Result = c(70L,
80L, 90L, 89L, 99L, 23L), Days = c(0L, 23L, 15L, 30L, 40L, 24L
), Position = c(1L, 1L, 2L, 2L, 2L, 1L)), .Names = c("ID", "Result",
"Days", "Position"), class = "data.frame", row.names = c(NA,
-6L))
ggplot(dfPr, aes(x=Days, y=Result, group=ID)) +
geom_line() + facet_wrap(~Position)
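If you really do need two separate plot objects rather than facets, one option is to split the data by Position and build one plot per subset. This is a sketch assuming ggplot2 is installed; the data is retyped from the sample and `plots` is an illustrative name. Keep in mind that ggplot objects created inside a for loop or function are not drawn unless they are explicitly print()ed.

```r
library(ggplot2)

# Sample data retyped from the question
dfPr <- data.frame(
  ID = c(1, 1, 2, 2, 2, 3),
  Result = c(70, 80, 90, 89, 99, 23),
  Days = c(0, 23, 15, 30, 40, 24),
  Position = c(1, 1, 2, 2, 2, 1)
)

# One spaghetti plot per Position value, stored in a named list
plots <- lapply(split(dfPr, dfPr$Position), function(d) {
  ggplot(d, aes(x = Days, y = Result, group = ID, colour = factor(ID))) +
    geom_line() +
    theme_bw() +
    labs(x = "Time in Days", y = "Result", colour = "ID",
         title = paste("Position", d$Position[1]))
})

# Draw them one at a time, for example:
# print(plots[["1"]])
# print(plots[["2"]])
```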

starred bar chart

I'm trying to make a simple bar chart that first distinguishes between two groups, say male or female, and then shows, for each sample/individual, whether its P-value is significant or not. I know how to color code the bars between male and female, but I want R to automatically put a star above each sample/individual with a P-value less than, say, 0.05.
I'm currently just using the simple barplot(x) function.
I've tried to look around for answers but haven't found anything for this yet.
Below is a link to my example data set:
DivShare File - test.csv: http://www.divshare.com/download/22797284-187
I'd like to put the time on the y axis, color code the bars to distinguish between Male and Female, and then, for individuals in either group who have a 1 under significance, put a star above their corresponding bar.
Thanks for any suggestions in advance.
I messed with your data a bit to make it friendlier:
## dput(read.csv("barcharttest.csv"))
x <- structure(list(ID = 1:7,
sex = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 2L), .Label = c("female", "male"),
class = "factor"),
val = c(309L, 192L, 384L, 27L, 28L, 245L, 183L),
stat = structure(c(1L, 2L, 2L, 1L, 2L, 1L, 1L), .Label = c("NS", "sig"),
class = "factor")),
.Names = c("ID", "sex", "val", "stat"),
class = "data.frame", row.names = c(NA, -7L))
Which looks like this:
ID sex val stat
1 1 female 309 NS
2 2 female 192 sig
3 3 female 384 sig
4 4 male 27 NS
5 5 male 28 sig
6 6 female 245 NS
7 7 male 183 NS
Now the plot:
sexcols <- c("pink","blue")
## png("barplot.png") ## for output graph
par(las=1,bty="l") ## I prefer these settings; see ?par
b <- with(x,barplot(val,col=sexcols[sex])) ## b saves x coords of bars
legend("topright",levels(x$sex),fill=sexcols,bty="n")
## use xpd=NA to make sure that star on tallest bar doesn't get clipped;
## pos=3 puts the text above the (x,y) location specified
text(b,x$val,ifelse(x$stat=="sig","*",""),pos=3,cex=2,xpd=NA)
axis(side=1,at=b,labels=x$ID)
## dev.off()
I should also add "Time" and "ID" labels on the relevant axes.
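A possible way to add those axis labels is to pass xlab and ylab straight to barplot() and let names.arg place the IDs under the bars. This is a sketch with the data retyped from above; the extra ylim headroom is an assumption made here so the star on the tallest bar has room, and `stars` is an illustrative name.

```r
# Data retyped from the table above
x <- data.frame(
  ID = 1:7,
  sex = c("female", "female", "female", "male", "male", "female", "male"),
  val = c(309, 192, 384, 27, 28, 245, 183),
  stat = c("NS", "sig", "sig", "NS", "sig", "NS", "NS")
)

# Named colour lookup, indexed by the sex of each individual
sexcols <- c(female = "pink", male = "blue")

# barplot() returns the x coordinates of the bar midpoints
b <- barplot(x$val,
             col = sexcols[x$sex],
             names.arg = x$ID,
             xlab = "ID", ylab = "Time",
             ylim = c(0, max(x$val) * 1.15))

# A star above each significant bar, an empty label otherwise
stars <- ifelse(x$stat == "sig", "*", "")
text(b, x$val, stars, pos = 3, cex = 2, xpd = NA)
legend("topright", legend = names(sexcols), fill = sexcols, bty = "n")
```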
