Adding information on a graph using R - r

I would like to add some information on my graph which was plotted from this data set:
EDITED:
#data set:
day <- c(0:28)
ndied <- c(342,335,240,122,74,64,49,60,51,44,35,48,41,34,38,27,29,23,20,15,20,16,17,17,14,10,4,1,2)
pdied <- c(19.1,18.7,13.4,6.8,4.1,3.6,2.7,3.3,2.8,2.5,2.0,2.7,2.3,1.9,2.1,1.5,1.6,1.3,1.1,0.8,1.1,0.9,0.9,0.9,0.8,0.6,0.2,0.1,0.1)
pmort <- data.frame(day,ndied,pdied)
> pmort
day ndied pdied
1 0 342 19.1
2 1 335 18.7
3 2 240 13.4
4 3 122 6.8
5 4 74 4.1
6 5 64 3.6
7 6 49 2.7
8 7 60 3.3
9 8 51 2.8
10 9 44 2.5
11 10 35 2.0
12 11 48 2.7
13 12 41 2.3
14 13 34 1.9
15 14 38 2.1
16 15 27 1.5
17 16 29 1.6
18 17 23 1.3
19 18 20 1.1
20 19 15 0.8
21 20 20 1.1
22 21 16 0.9
23 22 17 0.9
24 23 17 0.9
25 24 14 0.8
26 25 10 0.6
27 26 4 0.2
28 27 1 0.1
29 28 2 0.1
I have put together this script and am still trying to improve it so that the rest of the information can be added:
> barplot(pmort$pdied,xlab="Age(days)",ylab="Percent",xlim=c(0,28),ylim=c(0,20),legend="Mortality")
I am trying to put the numbers 0 to 28 (age in days) on the x-axis but have not managed to, although I suspect it only needs a simple change. Secondly, I would like to add the number that died, ndied (342 to 2), below each day (0 to 28) along the x-axis.
Example:
0 1 2 3 4 5 and so on...
(N=342) (N=335) (N=240) (N=122) (N=74) (N=64)
Any help would be appreciated.
Baz

I gave you two ways to plot the info: one above the bars and one below. You can tweak it to meet your needs.
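# Option 1: counts printed just above each bar.
# barplot() returns the bar midpoints, which text() uses as x positions.
# xlim is left at its default: with the default bar width and spacing,
# 29 bars extend past x = 28.
barX <- barplot(pmort$pdied, xlab="Age(days)",
                ylab="Percent", names.arg=pmort$day,
                ylim=c(0,20), legend.text="Mortality")
text(cex=.5, x=barX, y=pmort$pdied+par("cxy")[2]/2, pmort$ndied, xpd=TRUE)
# Option 2: counts printed just below the baseline.
barX <- barplot(pmort$pdied, xlab="Age(days)",
                ylab="Percent", names.arg=pmort$day,
                ylim=c(0,20), legend.text="Mortality")
text(cex=.5, x=barX, y=-.5, pmort$ndied, xpd=TRUE)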
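If the goal is the layout sketched in the question, with the day and an (N=...) count stacked under each bar, one possible approach is to keep the days as bar names and write the counts one margin line lower with mtext(). This is only a sketch: the label format, the margin line and the cex values are my own choices and will probably need tuning for your device size.

```r
# Data from the question
day   <- 0:28
ndied <- c(342,335,240,122,74,64,49,60,51,44,35,48,41,34,38,27,29,23,20,15,
           20,16,17,17,14,10,4,1,2)
pdied <- c(19.1,18.7,13.4,6.8,4.1,3.6,2.7,3.3,2.8,2.5,2.0,2.7,2.3,1.9,2.1,
           1.5,1.6,1.3,1.1,0.8,1.1,0.9,0.9,0.9,0.8,0.6,0.2,0.1,0.1)

# Days as bar names; barplot() returns the bar midpoints
barX <- barplot(pdied, names.arg = day, cex.names = 0.6,
                xlab = "Age (days)", ylab = "Percent", ylim = c(0, 20))

# Counts in the bottom margin, one line below the day labels
n_labels <- paste0("(N=", ndied, ")")
mtext(n_labels, side = 1, at = barX, line = 2, cex = 0.4)
```

Using at = barX is what lines each count up with its bar, since the bar midpoints are not at 0, 1, 2, and so on.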

Related

How to transfer a column from a dataset sharing the same one with another one

I have two versions of datasets sharing the same columns (more or less). Let's take as an example
db = airquality
db1 = airquality[,-c(6)]
db1$Ozone[db1$Ozone < 30] <- 24
db1$Month[db1$Month == 5] <- 24
db
db1
I would like to transfer the two columns 'Ozone' and 'Wind' from the dataset 'db1' to the dataset 'db', using the pipe operator %>% or some other approach. What code would you suggest?
Thanks
You can do:
library(dplyr)
db1 %>%
select(Ozone, Wind) %>%
bind_cols(db)
Note that in this example, since some column names will be duplicated in the final result, dplyr will automatically rename the duplicates by appending numbers to the end of the column names.
Base R:
cbind(db, db1[,c(1,3)])
Ozone Solar.R Wind Temp Month Day Ozone Wind
1 41 190 7.4 67 5 1 41 7.4
2 36 118 8.0 72 5 2 36 8.0
3 12 149 12.6 74 5 3 24 12.6
4 18 313 11.5 62 5 4 24 11.5
5 NA NA 14.3 56 5 5 NA 14.3
6 28 NA 14.9 66 5 6 24 14.9
7 23 299 8.6 65 5 7 24 8.6
8 19 99 13.8 59 5 8 24 13.8
9 8 19 20.1 61 5 9 24 20.1
10 NA 194 8.6 69 5 10 NA 8.6
11 7 NA 6.9 74 5 11 24 6.9
12 16 256 9.7 69 5 12 24 9.7
.
.
.
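If "transfer" means overwriting the Ozone and Wind columns of db with the db1 versions, rather than appending duplicate columns, a base R sketch could look like the following. That reading of the question is an assumption, and the approach relies on both data frames having the same rows in the same order.

```r
# Setup copied from the question
db  <- airquality
db1 <- airquality[, -c(6)]
db1$Ozone[db1$Ozone < 30] <- 24
db1$Month[db1$Month == 5] <- 24

cols <- c("Ozone", "Wind")   # columns to copy across
db[cols] <- db1[cols]        # overwrite in place; other columns are untouched

head(db)
```

Because the assignment is by column name, no renamed duplicates are created.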

Changing specific values in a data frame

I have the following data (in a data frame), they are grouped by every 4 rows.
x y
1 1.495 0.0
2 1.500 30.0
3 2.500 30.0
4 2.505 0.0
5 8.495 0.0
6 8.500 30.0
7 9.500 30.0
8 9.505 0.0
9 10.495 0.0
10 10.500 30.0
11 11.500 30.0
12 11.505 0.0
13 16.495 0.0 ##From here
14 16.500 30.0
15 17.500 30.0
16 17.505 0.0
17 17.495 0.0
18 17.500 30.0
19 18.500 30.0
20 18.505 0.0 ## End here
21 19.495 0.0
22 19.500 30.0
23 20.500 30.0
24 20.505 0.0
25 23.495 0.0
26 23.500 30.0
27 24.500 30.0
28 24.505 0.0
.
.
.
I am trying to change the y-values of rows whose groups overlap (according to their x-values). For example, rows 13 to 16 overlap with rows 17 to 20.
x-values of row 13-16: 16.495 16.500 -------- 17.500 17.505
x-values of row 17-20: ------------------ 17.495 17.500 ----------18.500 18.505
There is an overlap from 17.495 to 17.505.
I would like to make the "in between" rows into something like:
13 16.495 0.0 ##From here
14 16.500 30.0
15 17.500 30.0
16 17.505 30.0
17 17.495 30.0
18 17.500 30.0
19 18.500 30.0
20 18.505 0.0 ## End here
Any idea how to do this?
Judging from the sample data, you want to identify rows where the previous value of x is larger than the current one; here that is row 17. Similarly, you want rows where the value of x is larger than the one that follows; here that is row 16. I collected the row numbers for both cases in the following way. Note that your data is called mydf here, and that lag() and lead() are the dplyr functions.
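library(dplyr)  # lag() and lead() below are the dplyr versions
# Rows where x is smaller than the previous value, and rows where the next value is smaller
ind <- c(which(lag(mydf$x) > mydf$x), which(lead(mydf$x) < mydf$x))
# Overwrite those elements in y
mydf$y[ind] <- 30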
Here is the result for the part you specified. I hope this will help you.
#13 16.495 0
#14 16.500 30
#15 17.500 30
#16 17.505 30
#17 17.495 30
#18 17.500 30
#19 18.500 30
#20 18.505 0
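The same idea can be sketched in base R without dplyr, using diff() to find the places where x decreases from one row to the next. The data frame below is just rows 13 to 20 from the question, and drop_at is a name I made up for the illustration.

```r
# Rows 13 to 20 from the question
mydf <- data.frame(
  x = c(16.495, 16.500, 17.500, 17.505, 17.495, 17.500, 18.500, 18.505),
  y = c(0, 30, 30, 0, 0, 30, 30, 0)
)

# diff(x) < 0 marks a row whose successor has a smaller x
drop_at <- which(diff(mydf$x) < 0)

# Set y to 30 on both sides of each drop
mydf$y[c(drop_at, drop_at + 1)] <- 30
mydf
```

Like the lag()/lead() version, this only looks at neighbouring rows, so it assumes an overlap always shows up as a decrease in x between the last row of one group and the first row of the next.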
Using a for loop, you can do the following (assuming your dataframe is called df):
# defining start and end values to process data by group of 4
start = seq(1,length(df$x),by = 4)
end = seq(4,length(df$x),by = 4)
# loop over adjacent groups of 4 and set y to 30 on both sides of an overlap
for(i in seq_len(length(start)-1))
{
  if(max(df[start[i]:end[i],"x"]) > min(df[start[i+1]:end[i+1],"x"]))
  {
    df[end[i],"y"] = 30.0
    df[start[i+1],"y"] = 30.0
  }
}
And you get the following dataframe:
> df
x y
1 1.495 0
2 1.500 30
3 2.500 30
4 2.505 0
5 8.495 0
6 8.500 30
7 9.500 30
8 9.505 0
9 10.495 0
10 10.500 30
11 11.500 30
12 11.505 0
13 16.495 0
14 16.500 30
15 17.500 30
16 17.505 30
17 17.495 30
18 17.500 30
19 18.500 30
20 18.505 0
21 19.495 0
22 19.500 30
23 20.500 30
24 20.505 0
25 23.495 0
26 23.500 30
27 24.500 30
28 24.505 0

Wrong Fit using nls function

When I try to fit an exponential decay and my x axis has decimal numbers, the fit is never correct. Here is my data:
exp.decay = data.frame(time,counts)
time counts
1 0.4 4458
2 0.6 2446
3 0.8 1327
4 1.0 814
5 1.2 549
6 1.4 401
7 1.6 266
8 1.8 182
9 2.0 140
10 2.2 109
11 2.4 83
12 2.6 78
13 2.8 57
14 3.0 50
15 3.2 31
16 3.4 22
17 3.6 23
18 3.8 20
19 4.0 19
20 4.2 9
21 4.4 7
22 4.6 4
23 4.8 6
24 5.0 4
25 5.2 6
26 5.4 2
27 5.6 7
28 5.8 2
29 6.0 0
30 6.2 3
31 6.4 1
32 6.6 1
33 6.8 2
34 7.0 1
35 7.2 2
36 7.4 1
37 7.6 1
38 7.8 0
39 8.0 0
40 8.2 0
41 8.4 0
42 8.6 1
43 8.8 0
44 9.0 0
45 9.2 0
46 9.4 1
47 9.6 0
48 9.8 0
49 10.0 1
fit.one.exp <- nls(counts ~ A*exp(-k*time),data=exp.decay, start=c(A=max(counts),k=0.1))
plot(exp.decay, col='darkblue',xlab = 'Track Duration (seconds)',ylab = 'Number of Particles', main = 'Exponential Fit')
lines(predict(fit.one.exp), col = 'red', lty=2, lwd=2)
I always get this weird fit. Seems to me that the fit is not recognizing the right x axis, because when I use a different set of data, with only integers in the x axis (time) the fit works! I don't understand why it's different with different units.
You need one small modification:
lines(predict(fit.one.exp), col = 'red', lty=2, lwd=2)
should be
lines(exp.decay$time, predict(fit.one.exp), col = 'red', lty=2, lwd=2)
This way you make sure to plot against the desired values on your abscissa.
I tested it like this:
data = read.csv('exp_fit_r.csv')
A0 <- max(data$count)
k0 <- 0.1
# refer to the columns by name in the formula; data= supplies them
fit <- nls(count ~ A*exp(-k*time), start=list(A=A0, k=k0), data=data)
plot(data)
lines(data$time, predict(fit), col='red')
which gives me the following output:
As you can see, the fit describes the actual data very well, it was just a matter of plotting against the correct abscissa values.
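The reason the original call misbehaves is that lines(), given a single numeric vector, treats it as y and uses the index 1, 2, ..., n as x. A small sketch of that behaviour follows, using a made-up decay curve as a stand-in for predict(fit); the values 4000 and 3 are arbitrary assumptions, not fitted parameters.

```r
time <- seq(0.4, 10, by = 0.2)
fitted_counts <- 4000 * exp(-3 * time)   # stand-in for predict(fit)

# xy.coords() is what plot() and lines() use to interpret their arguments
implicit_x <- xy.coords(fitted_counts)$x        # index positions 1, 2, ..., n
explicit_x <- xy.coords(time, fitted_counts)$x  # the actual time values

range(implicit_x)
range(explicit_x)
```

When the times happen to be the integers 1, 2, 3, ..., the index and the time values coincide, which is presumably why the integer data set appeared to work.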

Does ggplot2 exclude some data?

I want to create some basic grouped barplots with ggplot2 but it seems to exclude some data. If I review my input data everything is there, but some bars are missing and the error bars are affected as well. I tried converting to several variable types, regrouping, reloading, and saving everything as .csv and loading it afresh... I just don't know what is wrong.
Here is my code:
library(ggplot2)
limits <- aes(ymax = DataCm$mean + DataCm$sd,
ymin = DataCm$mean - DataCm$sd)
p <- ggplot(data = DataCm, aes(x = factor(DataCm$Zeit), y = factor(DataCm$mean)
) )
p + geom_bar(stat = "identity",
position = position_dodge(0.9),fill =DataCm$group) +
geom_errorbar(limits, position = position_dodge(0.9),
width = 0.25) +
labs(x = "Time [min]", y = "Individuals per foodsource")
This is DataCm:
Zeit mean sd group
1 30 0.1 0.3162278 1
2 60 0.0 0.0000000 2
3 90 0.1 0.3162278 3
4 120 0.0 0.0000000 4
5 150 0.1 0.3162278 5
6 180 0.1 0.3162278 6
7 240 0.3 0.6749486 1
8 300 0.3 0.6749486 2
9 360 0.3 0.6749486 3
10 30 0.1 0.3162278 4
11 60 0.1 0.3162278 5
12 90 0.2 0.4216370 6
13 120 0.3 0.4830459 1
14 150 0.3 0.4830459 2
15 180 0.4 0.5163978 3
16 240 0.3 0.4830459 4
17 300 0.4 0.5163978 5
18 360 0.4 0.5163978 6
19 30 1.2 1.1352924 1
20 60 1.8 1.6865481 2
21 90 2.2 2.0976177 3
22 120 2.2 2.0976177 4
23 150 2.0 1.8856181 5
24 180 2.3 1.9465068 6
25 240 2.4 2.0655911 1
26 300 2.1 1.8529256 2
27 360 2.0 2.1602469 3
28 30 0.2 0.4216370 4
29 60 0.1 0.3162278 5
30 90 0.1 0.3162278 6
31 120 0.1 0.3162278 1
32 150 0.0 0.0000000 2
33 180 0.1 0.3162278 3
34 240 0.1 0.3162278 4
35 300 0.1 0.3162278 5
36 360 0.1 0.3162278 6
37 30 1.3 1.5670212 1
38 60 1.5 1.5811388 2
39 90 1.5 1.7159384 3
40 120 1.5 1.9002924 4
41 150 1.9 2.1317703 5
42 180 1.9 2.1317703 6
43 240 2.2 2.3475756 1
44 300 2.4 2.3190036 2
45 360 2.2 2.1499354 3
46 30 2.1 2.1317703 4
47 60 3.0 2.2110832 5
48 90 3.3 2.1628171 6
49 120 3.2 2.1499354 1
50 150 3.4 2.6331224 2
51 180 3.5 2.4152295 3
52 240 3.7 2.6267851 4
53 300 3.7 2.4060110 5
54 360 3.8 2.6583203 6
The output is:
Maybe you can help me. Thanks in advance!
Best wishes,
Benjamin
Solved it:
I reshaped everything in Excel and exported it another way. The group variable was also not the way I wanted it. Now it is fixed, but I can't really tell you why.
Your data looks malformed. I guess you wanted to have 6 different group values for each time point, but now the group variable just loops over, and you have:
1 30 0.1 0.3162278 1
...
10 30 0.1 0.3162278 4
...
19 30 1.2 1.1352924 1
...
28 30 0.2 0.4216370 4
geom_bar then probably omits rows that have identical mean and time. Although I am not sure why it chooses to do so, you should solve the group problem first anyway.
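Two things in the question's code are worth checking independently of the data layout: y is wrapped in factor(), which makes the y axis discrete, and fill is set outside aes() using DataCm$ references. A hedged sketch of a grouped bar chart with error bars follows. It uses a small made-up data frame (demo and its values are hypothetical) with exactly one row per time and group, which is the layout the answer above suggests you were aiming for.

```r
library(ggplot2)

# Hypothetical data: one row per (Zeit, group) combination
demo <- data.frame(
  Zeit  = rep(c(30, 60, 90), each = 2),
  group = rep(c("A", "B"), times = 3),
  mean  = c(0.1, 1.2, 0.0, 1.8, 0.1, 2.2),
  sd    = c(0.3, 1.1, 0.0, 1.7, 0.3, 2.1)
)

# Map bare column names inside aes(); keep y numeric
p <- ggplot(demo, aes(x = factor(Zeit), y = mean, fill = group, group = group)) +
  geom_col(position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = position_dodge(0.9), width = 0.25) +
  labs(x = "Time [min]", y = "Individuals per foodsource")
print(p)
```

With fill and group mapped inside aes(), position_dodge() has a grouping to dodge by, so bars that share a time point are drawn side by side instead of on top of each other.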

Subset Duplicated Values >10

I am looking at a data frame and trying to subset rows that have the same pressure value for more than 5 rows or delete rows that do not have 5 duplicate pressure values...
File Turbidity Pressure
1 3.2 46
2 3.4 46
3 5.4 46
4 3.2 46
5 3.1 46
6 2.3 46
7 2.3 45
8 4.5 45
9 2.3 45
10 3.2 44
11 4.5 44
12 6.5 43
13 3.2 42
14 3.1 41
15 1.2 41
16 2.3 41
17 2.4 41
18 2.1 41
19 1.4 41
25 1.3 41
So basically trying to keep rows that have a pressure of 46 and 41 and delete rows in-between. This is a small portion of my dataset and just need code that will basically keep rows with 5 or more duplicate pressure values and delete others.
Try
library(dplyr)
df %>% group_by(Pressure) %>% filter(n() >= 5)
Which gives:
#Source: local data frame [13 x 3]
#Groups: Pressure
#
# File Turbidity Pressure
#1 1 3.2 46
#2 2 3.4 46
#3 3 5.4 46
#4 4 3.2 46
#5 5 3.1 46
#6 6 2.3 46
#7 14 3.1 41
#8 15 1.2 41
#9 16 2.3 41
#10 17 2.4 41
#11 18 2.1 41
#12 19 1.4 41
#13 25 1.3 41
Here's a data.table solution (relies crucially on Pressure not repeating itself later on):
library(data.table)
setDT(df)[,if(.N>=5) .SD,by=Pressure]
Addendum:
If you expect Pressure values to repeat later on, e.g.:
df<-data.frame(File=c(1:19,25:28),
Pressure=rep(c(46:41,46),c(6,3,2,1,1,7,3)))
Then you'll need to use rleid in order to keep only groups of at least 5 in a row (no gaps):
setDT(df)[,ct:=rleid(Pressure)][,if (.N>=5) .SD,by=ct]
Here is a solution using base R:
df <- data.frame(id=1:10, Pressure=c(rep(1,5),6:10))
p.counts <- table(df[,"Pressure"])
good.pressures <- as.numeric(names(p.counts))[p.counts>=5]
df.sub <- df[df[,"Pressure"]%in%good.pressures,]
Note that I'm using df as an example data set, so you can delete that first line of code and replace all instances of df with the name of your data.frame.
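For the consecutive-run case described in the addendum, a base R sketch with rle() could look like this. It reuses the addendum's example data; the names runs and keep are mine.

```r
# Example data from the addendum: Pressure 46 appears again at the end
df <- data.frame(File = c(1:19, 25:28),
                 Pressure = rep(c(46:41, 46), c(6, 3, 2, 1, 1, 7, 3)))

# Lengths of runs of identical consecutive Pressure values
runs <- rle(df$Pressure)

# Expand "run is long enough" back to one flag per row
keep <- rep(runs$lengths >= 5, times = runs$lengths)

df.sub <- df[keep, ]
df.sub
```

Unlike the table() approach, this drops the trailing three rows with Pressure 46, because they form a separate run that is shorter than 5.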
