I'm trying to make a barplot with the following data:
          Dept
Admit        A   B   C   D   E   F
  Admitted 601 370 322 269 147  46
  Rejected 332 215 596 523 437 668
and I have tried the following code:
admission_department <- barplot(biasUCB_d, main="Admit by department",
xlab="biasUCB_d[['Dept']]",
col=c("darkblue","red"),
legend = rownames(biasUCB_d[['Dept']]),
beside=TRUE)
The code used to create the dataset is:
biasUCB_d <- margin.table(UCBAdmissions, c(1,3))
What am I doing wrong?
Assuming that Dept is an element of a list this should work:
Data:
biasUCB_d <- list(Dept = read.table(header=T, text='
Admit A B C D E F
Admitted 601 370 322 269 147 46
Rejected 332 215 596 523 437 668'))
Solution:
admission_department <- barplot(as.matrix(biasUCB_d$Dept[2:7]), main="Admit by department",
xlab="biasUCB_d[['Dept']]",
col=c("darkblue","red"),
legend = biasUCB_d$Dept$Admit,
beside=TRUE)
Try:
admission_department <- barplot(biasUCB_d, main="Admit by department",
xlab="Department",
col=c("darkblue","red"),
legend.text = rownames(biasUCB_d),
beside=TRUE)
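To see why this version works, it may help to inspect the object first. The sketch below assumes biasUCB_d was built exactly as in the question. In that case "Dept" is the name of a dimension of a two-way table, not an element you can extract with [['Dept']], so the row names have to come from the table itself. The pdf(NULL) line is only there so the snippet can run as a script without writing a plot file.

```r
# Build the two-way table exactly as in the question
biasUCB_d <- margin.table(UCBAdmissions, c(1, 3))

# "Admit" and "Dept" are the names of the table's dimensions
print(class(biasUCB_d))
print(dimnames(biasUCB_d))

# When run non-interactively, draw on a null device instead of creating Rplots.pdf
if (!interactive()) pdf(NULL)

# A 2 x 6 table can be passed straight to barplot(); rows become the bars in each group
mids <- barplot(biasUCB_d,
                main = "Admit by department",
                xlab = names(dimnames(biasUCB_d))[2],  # "Dept"
                col = c("darkblue", "red"),
                legend.text = rownames(biasUCB_d),     # "Admitted", "Rejected"
                beside = TRUE)
```

barplot() returns the bar midpoints (here stored in mids), which is handy if you later want to add text above the bars.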
How can I pseudo-table() two variables but fill with values from third column/ separate list?
Example:
library(ggplot2) # diamonds data
data(diamonds)
T.matrix <- with(diamonds, table(color, clarity))
Produces:
     clarity
color   I1  SI2  SI1  VS2  VS1 VVS2 VVS1   IF
    D   42 1370 2083 1697  705  553  252   73
    E  102 1713 2426 2470 1281  991  656  158
    F  143 1609 2131 2201 1364  975  734  385
    G  150 1548 1976 2347 2148 1443  999  681
    H  162 1563 2275 1643 1169  608  585  299
    I   92  912 1424 1169  962  365  355  143
    J   50  479  750  731  542  131   74   51
I want a similar table with color by clarity except with fill = reference$value instead of table()'s count
reference <- expand.grid(clarity = c("I1", "SI2", "SI1", "VS2", "VS1","VVS2", "VVS1", "IF"),
color = c("D", "E", "F", "G", "H", "I", "J"))
reference$value <- 1:56
So: [D, I1] would have a value of 1, [D, SI2] = 2, [H, VS2] = 36, etc.
Try tapply:
tapply(diamonds$price, list(diamonds$color, diamonds$clarity), mean)
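tapply takes the variable you want to summarise, splits it by the grouping variables given in the list, then applies the function in the last argument to each group. The result is a matrix with one cell per combination, which may or may not be convenient depending on what you want to do with it.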
If you want your data in a more usable format, you might want to use dplyr:
library(dplyr)
diamonds %>% group_by(clarity, color) %>%
summarise(mean(price))
Edit: The same approach works for your reference data:
tapply(reference$value, list(reference$color, reference$clarity), FUN = sum)
You need to supply FUN. Without it, tapply returns a vector of group indices instead of the table.
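For completeness, here is a self-contained sketch using the reference data from the question. Naming the list elements passed to tapply labels the dimensions of the result. xtabs is shown as an alternative that sums value within each cell; since every color/clarity pair occurs exactly once here, both give the same numbers. The object names value_table and value_xtabs are just illustrative.

```r
# Reference data as defined in the question
reference <- expand.grid(
  clarity = c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"),
  color   = c("D", "E", "F", "G", "H", "I", "J")
)
reference$value <- 1:56

# Option 1: tapply with a named grouping list (names become dimension labels)
value_table <- tapply(reference$value,
                      list(color = reference$color, clarity = reference$clarity),
                      FUN = sum)

# Option 2: xtabs sums the left-hand side within each color/clarity cell
value_xtabs <- xtabs(value ~ color + clarity, data = reference)

print(value_table)
```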
I want to build this matrix
What I tried
table <- matrix(c(163,224,312,314,303,175,119,662,933,909,871,702,522,307,1513,2400,2164,2299,1824,1204,678,1603,2337,2331,2924,2360,1428,808,2834,3903,3826,4884,3115,2093,89), nrow=5, ncol=7, byrow=T)
rownames(table) <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
colnames(table) <- c("D", "E", "F", "G", "H", "I", "J")
but the result is this:
My question is how to add the "color" and "cut" labels.
Here, dimnames(table) is a 'list'. In the original matrix 'table', the list elements are not named. We can use names to change the names of the list from 'NULL' to the preferred one.
names(dimnames(table)) <- c('cut', 'color')
table
# color
# cut D E F G H I J
# Fair 163 224 312 314 303 175 119
# Good 662 933 909 871 702 522 307
# Very Good 1513 2400 2164 2299 1824 1204 678
# Premium 1603 2337 2331 2924 2360 1428 808
# Ideal 2834 3903 3826 4884 3115 2093 89
NOTE: table is an R function, so it is better to name the object a different name.
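If you are building the matrix from scratch anyway, you can also pass a named list to the dimnames argument of matrix() and get the labels in one step. A sketch using the values from the question, with the object called cut_by_color (an arbitrary name) so that it does not mask the table function:

```r
# Values copied from the question, filled row by row
counts <- c(163, 224, 312, 314, 303, 175, 119,
            662, 933, 909, 871, 702, 522, 307,
            1513, 2400, 2164, 2299, 1824, 1204, 678,
            1603, 2337, 2331, 2924, 2360, 1428, 808,
            2834, 3903, 3826, 4884, 3115, 2093, 89)

# A named list for dimnames sets the row/column names and the dimension labels together
cut_by_color <- matrix(
  counts, nrow = 5, ncol = 7, byrow = TRUE,
  dimnames = list(
    cut   = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
    color = c("D", "E", "F", "G", "H", "I", "J")
  )
)

print(cut_by_color)
```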
I'm setting up a script to extract the thickness and voltages from a single-column text file and fit a Weibull distribution to them. When I try to use fitdistr() I get an error stating "'x' must be a non-empty numeric vector". R is supposed to interpret numbers in text files as numeric, but that doesn't seem to be happening. Any thoughts?
filename <- "SampleBreakdownSet.txt"
d <- read.table(filename, header = FALSE, sep = "")
#Extract thickness from the dataset; set to variable t
t = d[1,1]
#Extract the breakdown voltages and toss into dataset, BDV
BDV = tail(d,(nrow(d)-1))
#Calculates the breakdown field from the thickness and BDV
BDF = (BDV*10000) / t
#Calculates the Weibull parameters from the input breakdown voltages.
fitdistr(BDF, densfun ="weibull", lower = 0)
fitdistr(BDF, densfun ="weibull", lower = 0)
Error in fitdistr(BDF, densfun = "weibull", lower = 0) :
'x' must be a non-empty numeric vector
Sample data I'm using:
2
200
250
450
320
100
400
200
403
502
203
420
120
342
304
253
423
534
534
243
253
423
123
433
534
234
633
432
342
543
532
123
453
231
532
342
213
243
You are passing a data.frame to fitdistr, but you should be passing the vector itself.
Try this:
d <- read.table(text='200
250
450
320
100
400
200
403
502
203
420
120
342
304
253
423
534
534
243
253
423
123
433
534
234
633
432
342
543
532
123
453
231
532
342
213
243', header=FALSE)
t <- d[1,1]
#Extract the breakdown voltages and toss into dataset, BDV
BDV <- d[-1, 1]
BDF <- (BDV*10000) / t
library(MASS)
fitdistr(BDF, densfun ="weibull", lower = 0)
Alternatively, if you keep BDF as a data.frame (as in your original code), refer to the relevant column when calling fitdistr, e.g.:
fitdistr(BDF$V1, densfun ="weibull", lower = 0)
# shape scale
# 2.745485e+00 1.997509e+04
# (3.716797e-01) (1.283667e+03)
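A quick way to confirm the diagnosis is to check the class of each intermediate object. The sketch below uses only the first few values from the question's sample data and the name thickness in place of t (both choices are mine, just to keep it short and to avoid masking the transpose function). The numbers were read as numeric all along; the problem is that tail() on a data.frame returns a data.frame.

```r
# First few lines of the sample file: thickness first, then breakdown voltages
d <- read.table(text = "2
200
250
450
320
100", header = FALSE)

thickness <- d[1, 1]

# tail() on a data.frame keeps the data.frame structure
BDV_df <- tail(d, nrow(d) - 1)
print(class(BDV_df))    # "data.frame"

# Indexing a single column drops to a plain vector
BDV_vec <- d[-1, 1]
print(class(BDV_vec))   # "integer", which counts as numeric

# Breakdown field as a numeric vector, suitable for
# MASS::fitdistr(BDF, densfun = "weibull", lower = 0)
BDF <- (BDV_vec * 10000) / thickness
print(is.numeric(BDF))
```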
Although I am getting used to R, I am still new to it, and I hope someone can help me with this task. I looked through previous topics but couldn't find what I was looking for, so here I am hoping for some help.
I am trying to draw a bar plot but am not having much luck with some of the settings. I am using R 3.1.1 on Mac OS 10.9.4.
My table looks like this:
family area1 area2 area3 area4 area5 area6
A         15    20   500   200    17    26
B        170   520    26    13   100    70
C         35   250   358   128    88    26
D         95   375   289   156   169   356
E        425   177   136   144   285    70
Since I have the file saved as a CSV, I do the following:
fam <- read.csv ("family_per_area_count.csv", sep =";", header = T)
I convert the data to a matrix:
fam.mat <- as.matrix(fam_1, ncol = 6, byrow = T)
Then I assign row and column names:
rownames(fam.mat) <- c("A", "B", "C", "D", "E")
colnames(fam.mat) <- c("area1", "area2", "area3", "area4", "area5", "area6")
Then I simply run the barplot command:
barplot(fam.mat, beside = T, col = rainbow(ncol(fam.mat)))
but most of the x-axis labels are missing and the plot seems squashed together.
I also tried to draw a stacked bar plot using these commands:
par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)
prop <- prop.table(data_mat, margin = 2)
barplot(data_mat, col = rainbow(length(rownames(data_mat))), width = 3)
legend("topright", inset = c(-0.25, 0), fill = rainbow(length(rownames(data_mat))),
legend = rownames(data_mat))
but the legend colours don't match the data and again my x-axis seems off-centre. I have tried transposing the matrix, but still no luck.
Can anyone make any suggestions?
Thank you so much in advance
F.
Here is a start:
DF <- read.table(text="family area1 area2 area3 area4 area5 area6
A 15 20 500 200 17 26
B 170 520 26 13 100 70
C 35 250 358 128 88 26
D 95 375 289 156 169 356
E 425 177 136 144 285 70", header=TRUE)
library(reshape2)
DF <- melt(DF, id.var="family")
library(ggplot2)
ggplot(DF, aes(x=family, y=value, fill=variable)) +
geom_bar(stat="identity", position="dodge")
Study ggplot2 documentation and tutorials to learn how to customise the plot.
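If you would rather stay with base graphics, here is a rough sketch along the same lines. It assumes the data look like the table in the question. Two things seem worth checking in the original attempt: the family column has to be dropped before as.matrix() (otherwise the whole matrix becomes character), and with beside = TRUE the colours map to rows, so you need nrow() colours rather than ncol(). The pdf(NULL) line is only there so the snippet can run as a script without writing a plot file.

```r
fam <- read.table(text = "family area1 area2 area3 area4 area5 area6
A 15 20 500 200 17 26
B 170 520 26 13 100 70
C 35 250 358 128 88 26
D 95 375 289 156 169 356
E 425 177 136 144 285 70", header = TRUE)

# Keep only the numeric columns; the family labels become row names
fam.mat <- as.matrix(fam[, -1])
rownames(fam.mat) <- as.character(fam$family)

# When run non-interactively, draw on a null device instead of creating Rplots.pdf
if (!interactive()) pdf(NULL)

# One colour per family (row), reused for the legend so the two always match
cols <- rainbow(nrow(fam.mat))
mids <- barplot(fam.mat, beside = TRUE, col = cols,
                las = 2,                          # rotate axis labels so they all fit
                legend.text = rownames(fam.mat),
                args.legend = list(x = "topright", bty = "n"))
```

Dropping beside = TRUE gives the stacked version with the same colour-to-family mapping.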
I have a slightly complicated plotting task. I am halfway there but not quite sure how to finish it. I have a dataset of the form below, with multiple subjects, each in either Treatgroup 0 or Treatgroup 1, and each subject contributing several rows of data. Each row corresponds to a single timepoint at which there are values in the columns "count1", "count2", "weirdname3", etc.
Task 1. I need to calculate "Days", which is just the visitdate - the startdate, for each row. Should be an apply type function, I guess.
Task 2. I have to make a multiplot figure with one scatterplot for each of the count variables (a plot for count1, one for count2, etc). In each scatterplot, I need to plot the value of the count (y axis) against "Days" (x-axis) and connect the dots for each subject. Subjects in Treatgroup 0 are one color, subjects in treatgroup 1 are another color. Each scatterplot should be labeled with count1, count2 etc as appropriate.
I am trying to use the base plotting function, and have taken the approach of writing a plotting function to call later. I think this can work but need some help with syntax.
#Enter example data
tC <- textConnection("
ID StartDate VisitDate Treatstarted count1 count2 count3 Treatgroup
C0098 13-Jan-07 12-Feb-10 NA 457 343 957 0
C0098 13-Jan-06 2-Jul-10 NA 467 345 56 0
C0098 13-Jan-06 7-Oct-10 NA 420 234 435 0
C0098 13-Jan-05 3-Feb-11 NA 357 243 345 0
C0098 14-Jan-06 8-Jun-11 NA 209 567 254 0
C0098 13-Jan-06 9-Jul-11 NA 223 235 54 0
C0098 13-Jan-06 12-Oct-11 NA 309 245 642 0
C0110 13-Jan-06 23-Jun-10 30-Oct-10 629 2436 45 1
C0110 13-Jan-07 30-Sep-10 30-Oct-10 461 467 453 1
C0110 13-Jan-06 15-Feb-11 30-Oct-10 270 365 234 1
C0110 13-Jan-06 22-Jun-11 30-Oct-10 236 245 23 1
C0151 13-Jan-08 2-Feb-10 30-Oct-10 199 653 456 1
C0151 13-Jan-06 24-Mar-10 3-Apr-10 936 25 654 1
C0151 13-Jan-06 7-Jul-10 3-Apr-10 1147 254 666 1
C0151 13-Jan-06 9-Mar-11 3-Apr-10 1192 254 777 1
")
data1 <- read.table(header=TRUE, tC)
close.connection(tC)
# format date
data1$VisitDate <- with(data1,as.Date(VisitDate,format="%d-%b-%y"))
# stuck: need to define days as VisitDate - StartDate for each row of dataframe (I know I need an apply family fxn here)
data1$Days <- [applyfunction of some kind ](VisitDate,ID,function(x){x-data1$StartDate})))
# Unsure here. Need to define plot function
plot_one <- function(d){
with(d, plot(Days, Count, t="n", tck=1, cex.main = 0.8, ylab = "", yaxt = 'n', xlab = "", xaxt="n", xlim=c(0,1000), ylim=c(0,1200))) # set limits
grid(lwd = 0.3, lty = 7)
with(d[d$Treatgroup == 0,], points(Days, Count1, col = 1))
with(d[d$Treatgroup == 1,], points(Days, Count1, col = 2))
}
#Create multiple plot figure
par(mfrow=c(2,2), oma = c(0.5,0.5,0.5,0.5), mar = c(0.5,0.5,0.5,0.5))
#trouble here. I need to call the column names somehow, with; plyr::d_ply(data1, ???, plot_one)
Task 1:
data1$days <- as.numeric(as.Date(data1$VisitDate,format="%d-%b-%y")
-as.Date(data1$StartDate,format="%d-%b-%y"))
Task 2:
par(mfrow=c(3,1), oma = c(2,0.5,1,0.5), mar = c(2,0.5,1,0.5))
plot(data1$days, data1$count1, col=as.factor(data1$Treatgroup), main="count1")
plot(data1$days, data1$count2, col=as.factor(data1$Treatgroup), main="count2")
plot(data1$days, data1$count3, col=as.factor(data1$Treatgroup), main="count3")
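To also connect the dots for each subject and avoid repeating the plot call per column, one possible sketch is a small helper that is called once per count column. It uses a handful of rows from the question's data, picks up the count columns by the "^count" name pattern (adjust this for columns like weirdname3), and assumes English month abbreviations in the current locale. The plot_one signature and the count_vars name are my own choices. The pdf(NULL) line is only there so the snippet can run as a script without writing a plot file.

```r
# A few rows taken from the question's data
data1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID StartDate VisitDate count1 count2 count3 Treatgroup
C0098 13-Jan-06 2-Jul-10 467 345 56 0
C0098 13-Jan-06 7-Oct-10 420 234 435 0
C0098 13-Jan-06 9-Jul-11 223 235 54 0
C0110 13-Jan-06 23-Jun-10 629 2436 45 1
C0110 13-Jan-06 15-Feb-11 270 365 234 1
C0110 13-Jan-06 22-Jun-11 236 245 23 1
")

# Task 1: subtraction of Date objects is vectorised, so no apply function is needed
data1$days <- as.numeric(as.Date(data1$VisitDate, format = "%d-%b-%y") -
                         as.Date(data1$StartDate, format = "%d-%b-%y"))

# Task 2: draw one panel for the column named in yvar, one line per subject
plot_one <- function(d, yvar) {
  plot(d$days, d[[yvar]], type = "n", main = yvar, xlab = "Days", ylab = yvar)
  for (id in unique(d$ID)) {
    s <- d[d$ID == id, ]
    s <- s[order(s$days), ]                      # connect points in time order
    lines(s$days, s[[yvar]], type = "b",
          col = s$Treatgroup[1] + 1)             # group 0 -> black, group 1 -> red
  }
}

# When run non-interactively, draw on a null device instead of creating Rplots.pdf
if (!interactive()) pdf(NULL)

count_vars <- grep("^count", names(data1), value = TRUE)
par(mfrow = c(length(count_vars), 1), mar = c(4, 4, 2, 1))
for (v in count_vars) plot_one(data1, v)
```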