not dealing properly with dates in R

I am trying to use selectByDate from the openair package, but I got stuck on my second attempt.
I have a data frame A:
> A
date x
23 1982-08-23 0.0
24 1982-08-24 0.0
25 1982-08-25 0.0
26 1982-08-26 9.3
27 1982-08-27 0.0
28 1982-08-28 0.2
29 1982-08-29 0.0
30 1982-08-30 0.0
31 1982-08-31 0.0
32 1982-09-01 0.0
33 1982-09-02 0.2
34 1982-09-03 0.9
35 1982-09-04 4.2
36 1982-09-05 0.0
37 1982-09-06 0.0
38 1982-09-07 1.2
39 1982-09-08 0.0
40 1982-09-09 0.0
and then
> selectByDate(A, month = 9)
date x
10 1982-09-01 0.0
11 1982-09-02 0.2
12 1982-09-03 0.9
13 1982-09-04 4.2
14 1982-09-05 0.0
15 1982-09-06 0.0
16 1982-09-07 1.2
17 1982-09-08 0.0
18 1982-09-09 0.0
but with B
16 1971-04-20 100511
17 1971-04-21 100795
18 1971-04-22 101008
19 1971-04-23 101292
20 1971-04-24 101577
21 1971-04-25 101862
22 1971-04-26 102220
23 1971-04-27 103372
24 1971-04-28 103662
25 1971-04-29 103807
26 1971-04-30 104025
27 1971-05-01 104316
28 1971-05-02 104462
29 1971-05-03 104681
30 1971-05-04 104900
31 1971-05-05 105047
I got
> selectByDate(B, month = 4)
Error in as.POSIXlt.default(x, tz = tz(x)) :
do not know how to convert 'x' to class “POSIXlt”
I am a beginner in R and I can't see why this happens. Any clue?

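Convert the date column to the POSIXct class and then try again:
# pass the format by name; the second positional argument of as.POSIXct is tz
B$date <- as.POSIXct(B$date, format = '%Y-%m-%d')
openair::selectByDate(B, month = 4)
You can also do this in base R:
subset(B, as.integer(format(date, '%m')) == 4)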
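A minimal sketch of the conversion on a three-row stand-in for B. The values are copied from the question, but storing the date column as text is an assumption, since the question does not show its class. The time zone "UTC" is also my choice. Only base R is used, so openair is not needed to try it:

```r
# Hypothetical stand-in for B: the date column is assumed to be text
B <- data.frame(
  date = c("1971-04-29", "1971-04-30", "1971-05-01"),
  x = c(103807, 104025, 104316),
  stringsAsFactors = FALSE
)

class(B$date)  # check what the column really is before converting

# Convert to POSIXct, naming the format argument explicitly
B$date <- as.POSIXct(B$date, format = "%Y-%m-%d", tz = "UTC")

# Base R selection of the April rows
april <- subset(B, as.integer(format(date, "%m")) == 4)
april
```

Checking class(B$date) first is a cheap way to see whether a date column was read in as text, a factor, or a number.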

Related

Changing specific values in a data frame

I have the following data (in a data frame), they are grouped by every 4 rows.
x y
1 1.495 0.0
2 1.500 30.0
3 2.500 30.0
4 2.505 0.0
5 8.495 0.0
6 8.500 30.0
7 9.500 30.0
8 9.505 0.0
9 10.495 0.0
10 10.500 30.0
11 11.500 30.0
12 11.505 0.0
13 16.495 0.0 ##From here
14 16.500 30.0
15 17.500 30.0
16 17.505 0.0
17 17.495 0.0
18 17.500 30.0
19 18.500 30.0
20 18.505 0.0 ## End here
21 19.495 0.0
22 19.500 30.0
23 20.500 30.0
24 20.505 0.0
25 23.495 0.0
26 23.500 30.0
27 24.500 30.0
28 24.505 0.0
.
.
.
I am trying to change the y-value of the rows that overlap (according to their x-values). For example, rows 13 to 16 overlap with rows 17 to 20.
x-values of row 13-16: 16.495 16.500 -------- 17.500 17.505
x-values of row 17-20: ------------------ 17.495 17.500 ----------18.500 18.505
The overlap runs from 17.495 to 17.505.
I would like to make the "in between" rows into something like:
13 16.495 0.0 ##From here
14 16.500 30.0
15 17.500 30.0
16 17.505 30.0
17 17.495 30.0
18 17.500 30.0
19 18.500 30.0
20 18.505 0.0 ## End here
Any idea how to do this?
Judging from the sample data, you want to identify rows where the previous value in x is larger than the current value; here that is row 17. Similarly, you want rows where the following value in x is smaller than the current value; here that is row 16. The code below collects both sets of row numbers, using lag() and lead() from dplyr. Note that your data is called mydf here.
library(dplyr)
ind <- c(which(lag(mydf$x) > mydf$x), which(lead(mydf$x) < mydf$x))
# Overwrite the two matching elements in y
mydf$y[ind] <- 30
Here is the result for the part you specified. I hope this will help you.
#13 16.495 0
#14 16.500 30
#15 17.500 30
#16 17.505 30
#17 17.495 30
#18 17.500 30
#19 18.500 30
#20 18.505 0
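The same rows can probably be found without any package by looking at where x decreases. This is a sketch on rows 13 to 20 of the question only; using diff() instead of lag() and lead() is my substitution, not the method shown above:

```r
# Rows 13 to 20 from the question, renumbered 1 to 8 here
mydf <- data.frame(
  x = c(16.495, 16.500, 17.500, 17.505, 17.495, 17.500, 18.500, 18.505),
  y = c(0, 30, 30, 0, 0, 30, 30, 0)
)

# diff() is negative wherever the next x is smaller than the current one
before_drop <- which(diff(mydf$x) < 0)

# The row before the drop and the row after it are the overlapping pair
ind <- c(before_drop, before_drop + 1)
mydf$y[ind] <- 30
mydf
```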
Using a for loop, you can do the following (assuming your dataframe is called df):
# start and end row of each group of 4
start <- seq(1, nrow(df), by = 4)
end <- seq(4, nrow(df), by = 4)
# compare each group with the next one and update y where they overlap
for (i in seq_len(length(start) - 1)) {
  if (max(df[start[i]:end[i], "x"]) > min(df[start[i + 1]:end[i + 1], "x"])) {
    df[end[i], "y"] <- 30.0
    df[start[i + 1], "y"] <- 30.0
  }
}
And you get the following dataframe:
> df
x y
1 1.495 0
2 1.500 30
3 2.500 30
4 2.505 0
5 8.495 0
6 8.500 30
7 9.500 30
8 9.505 0
9 10.495 0
10 10.500 30
11 11.500 30
12 11.505 0
13 16.495 0
14 16.500 30
15 17.500 30
16 17.505 30
17 17.495 30
18 17.500 30
19 18.500 30
20 18.505 0
21 19.495 0
22 19.500 30
23 20.500 30
24 20.505 0
25 23.495 0
26 23.500 30
27 24.500 30
28 24.505 0
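The group-by-group comparison can also be written without a loop. The sketch below assumes, as the loop does, that every group has exactly 4 rows; it uses rows 9 to 20 of the question, and the variable names are my own:

```r
# Rows 9 to 20 from the question: three groups of 4
df <- data.frame(
  x = c(10.495, 10.500, 11.500, 11.505,
        16.495, 16.500, 17.500, 17.505,
        17.495, 17.500, 18.500, 18.505),
  y = rep(c(0, 30, 30, 0), times = 3)
)

grp <- (seq_len(nrow(df)) - 1) %/% 4 + 1      # group number of every row
gmax <- as.vector(tapply(df$x, grp, max))     # largest x per group
gmin <- as.vector(tapply(df$x, grp, min))     # smallest x per group

# TRUE at position i when group i reaches past the start of group i + 1
overlap <- head(gmax, -1) > tail(gmin, -1)
i <- which(overlap)

# Last row of group i and first row of group i + 1
df$y[c(4 * i, 4 * i + 1)] <- 30
df
```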

make grouping bar plot ggplot2

I'm new to ggplot2 but trying to use it. I have two variables: SA (with 4 levels: 0, 1000, 2000 and 3000) and GA (with 4 levels: 0, 0.5, 1 and 2). I would like to group the bars by SA (like this figure).
> G<- read.table("k.csv", sep=";",header = TRUE)
> G
SA GA PH
1 0 0.0 41
2 0 0.0 27
3 0 0.0 28
4 0 0.0 25
5 0 0.5 35
6 0 0.5 45
7 0 0.5 35
8 0 0.5 55
9 0 1.0 45
10 0 1.0 35
11 0 1.0 38
12 0 1.0 46
13 0 2.0 52
14 0 2.0 40
15 0 2.0 40
16 0 2.0 35
17 1000 0.0 30
18 1000 0.0 30
19 1000 0.0 30
20 1000 0.0 30
21 1000 0.5 28
22 1000 0.5 33
23 1000 0.5 31
24 1000 0.5 42
25 1000 1.0 38
26 1000 1.0 30
27 1000 1.0 27
28 1000 1.0 25
29 1000 2.0 30
30 1000 2.0 22
31 1000 2.0 31
32 1000 2.0 44
33 2000 0.0 18
34 2000 0.0 25
35 2000 0.0 24
36 2000 0.0 31
37 2000 0.5 24
38 2000 0.5 22
39 2000 0.5 36
40 2000 0.5 40
41 2000 1.0 27
42 2000 1.0 29
43 2000 1.0 42
44 2000 1.0 33
45 2000 2.0 20
46 2000 2.0 40
47 2000 2.0 30
48 2000 2.0 25
49 3000 0.0 0
50 3000 0.0 0
51 3000 0.0 0
52 3000 0.0 0
53 3000 0.5 24
54 3000 0.5 20
55 3000 0.5 25
56 3000 0.5 NA
57 3000 1.0 37
58 3000 1.0 NA
59 3000 1.0 38
60 3000 1.0 25
61 3000 2.0 24
62 3000 2.0 15
63 3000 2.0 20
64 3000 2.0 32
> ggplot(G, aes(x=SA, y=PH, fill=factor(GA))) +
stat_summary(geom="bar",position=position_dodge(1))
but it does not give me what I need. It gives me something different (here it is)
Also, I would like to add error bar to the bars.
Any ideas?
Here is a solution that uses data.table to calculate the mean and standard error.
library(data.table)
library(ggplot2)
setDT(G)
pd <- G[, .(SE = sd(PH, na.rm = TRUE) / sqrt(.N),
MN = mean(PH, na.rm = TRUE)),
.(SA, GA)]
ggplot(pd, aes(factor(SA), fill = factor(GA))) +
geom_bar(aes(y = MN),
stat = "identity", position = "dodge") +
geom_errorbar(aes(ymin = MN - SE, ymax = MN + SE),
position = "dodge") +
labs(x = "SA",
y = "PH",
fill = "GA") +
theme_classic()
An alternative with dplyr:
library(dplyr)
library(ggplot2)
# standard error of the mean, ignoring missing values
std.error <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
# error is computed first, while PH still holds the raw values
df2 <- G %>%
  group_by(SA, GA) %>%
  summarise(error = std.error(PH), PH = mean(PH, na.rm = TRUE))
ggplot(df2, aes(as.factor(SA), PH, fill = as.factor(GA))) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymin = PH - error, ymax = PH + error),
                width = .2, position = position_dodge(.9))
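To check the summary numbers independently of either plotting approach, the mean and standard error can be computed in base R. This is only a sketch on two groups copied from the question (SA 0 and SA 3000, both with GA 0.5); the helper se() and the choice to divide by the number of non-missing values are my assumptions:

```r
# Two groups copied from the question; the second one contains an NA
G <- data.frame(
  SA = rep(c(0, 3000), each = 4),
  GA = 0.5,
  PH = c(35, 45, 35, 55, 24, 20, 25, NA)
)

# Standard error based on the non-missing observations only
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# The formula interface of aggregate() drops NA rows by default
pd <- aggregate(PH ~ SA + GA, data = G, FUN = mean)
names(pd)[names(pd) == "PH"] <- "MN"
pd$SE <- aggregate(PH ~ SA + GA, data = G, FUN = se)$PH
pd
```

Note that dividing by sqrt(.N), as in the data.table version, counts rows with a missing PH as well, so for groups containing NA it gives a slightly smaller standard error than this sketch.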

Wrong Fit using nls function

When I try to fit an exponential decay and my x axis has decimal numbers, the fit is never correct. Here's my data below:
exp.decay = data.frame(time,counts)
time counts
1 0.4 4458
2 0.6 2446
3 0.8 1327
4 1.0 814
5 1.2 549
6 1.4 401
7 1.6 266
8 1.8 182
9 2.0 140
10 2.2 109
11 2.4 83
12 2.6 78
13 2.8 57
14 3.0 50
15 3.2 31
16 3.4 22
17 3.6 23
18 3.8 20
19 4.0 19
20 4.2 9
21 4.4 7
22 4.6 4
23 4.8 6
24 5.0 4
25 5.2 6
26 5.4 2
27 5.6 7
28 5.8 2
29 6.0 0
30 6.2 3
31 6.4 1
32 6.6 1
33 6.8 2
34 7.0 1
35 7.2 2
36 7.4 1
37 7.6 1
38 7.8 0
39 8.0 0
40 8.2 0
41 8.4 0
42 8.6 1
43 8.8 0
44 9.0 0
45 9.2 0
46 9.4 1
47 9.6 0
48 9.8 0
49 10.0 1
fit.one.exp <- nls(counts ~ A*exp(-k*time),data=exp.decay, start=c(A=max(counts),k=0.1))
plot(exp.decay, col='darkblue',xlab = 'Track Duration (seconds)',ylab = 'Number of Particles', main = 'Exponential Fit')
lines(predict(fit.one.exp), col = 'red', lty=2, lwd=2)
I always get this weird fit. Seems to me that the fit is not recognizing the right x axis, because when I use a different set of data, with only integers in the x axis (time) the fit works! I don't understand why it's different with different units.
You need one small modification:
lines(predict(fit.one.exp), col = 'red', lty=2, lwd=2)
should be
lines(exp.decay$time, predict(fit.one.exp), col = 'red', lty=2, lwd=2)
This way you make sure to plot against the desired values on your abscissa.
I tested it like this:
data <- read.csv('exp_fit_r.csv')
A0 <- max(data$count)
k0 <- 0.1
# refer to columns by name in the formula; nls looks them up in data
fit <- nls(count ~ A*exp(-k*time), start=list(A=A0, k=k0), data=data)
plot(data)
lines(data$time, predict(fit), col='red')
which gives me the following output:
As you can see, the fit describes the actual data very well, it was just a matter of plotting against the correct abscissa values.
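Why the one-argument call misplaces the curve: when lines() receives a single numeric vector, it treats the values as y and uses 1, 2, 3, ... as x. The sketch below shows this with xy.coords(), which base graphics uses to interpret such arguments. Here fitted_vals is a made-up stand-in for predict(fit.one.exp), and its constants are not fitted values:

```r
time <- seq(2, 16) / 5                   # 0.4, 0.6, ..., 3.2 as in the question
fitted_vals <- 14800 * exp(-3 * time)    # hypothetical stand-in for predict(fit)

# One argument: x silently becomes the index 1, 2, ..., 15
idx_x <- xy.coords(fitted_vals)$x
idx_x

# Two arguments: x is the real time axis
real_x <- xy.coords(time, fitted_vals)$x
real_x
```

This would also explain why a data set with integer times appeared to work, presumably because those times coincided with the indices 1, 2, 3, and so on.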

Does ggplot2 exclude some data?

I want to create some basic grouped barplots with ggplot2 but it seems to exclude some data. If I review my input data everything is there, but some bars are missing and it is also messing with the error bars. I tried to convert into multiple variable types, regrouped, loaded again, saved everything in .csv and loaded all new... I just don't know what is wrong.
Here is my code:
library(ggplot2)
limits <- aes(ymax = DataCm$mean + DataCm$sd,
ymin = DataCm$mean - DataCm$sd)
p <- ggplot(data = DataCm, aes(x = factor(DataCm$Zeit), y = factor(DataCm$mean)
) )
p + geom_bar(stat = "identity",
position = position_dodge(0.9),fill =DataCm$group) +
geom_errorbar(limits, position = position_dodge(0.9),
width = 0.25) +
labs(x = "Time [min]", y = "Individuals per foodsource")
This is DataCm:
Zeit mean sd group
1 30 0.1 0.3162278 1
2 60 0.0 0.0000000 2
3 90 0.1 0.3162278 3
4 120 0.0 0.0000000 4
5 150 0.1 0.3162278 5
6 180 0.1 0.3162278 6
7 240 0.3 0.6749486 1
8 300 0.3 0.6749486 2
9 360 0.3 0.6749486 3
10 30 0.1 0.3162278 4
11 60 0.1 0.3162278 5
12 90 0.2 0.4216370 6
13 120 0.3 0.4830459 1
14 150 0.3 0.4830459 2
15 180 0.4 0.5163978 3
16 240 0.3 0.4830459 4
17 300 0.4 0.5163978 5
18 360 0.4 0.5163978 6
19 30 1.2 1.1352924 1
20 60 1.8 1.6865481 2
21 90 2.2 2.0976177 3
22 120 2.2 2.0976177 4
23 150 2.0 1.8856181 5
24 180 2.3 1.9465068 6
25 240 2.4 2.0655911 1
26 300 2.1 1.8529256 2
27 360 2.0 2.1602469 3
28 30 0.2 0.4216370 4
29 60 0.1 0.3162278 5
30 90 0.1 0.3162278 6
31 120 0.1 0.3162278 1
32 150 0.0 0.0000000 2
33 180 0.1 0.3162278 3
34 240 0.1 0.3162278 4
35 300 0.1 0.3162278 5
36 360 0.1 0.3162278 6
37 30 1.3 1.5670212 1
38 60 1.5 1.5811388 2
39 90 1.5 1.7159384 3
40 120 1.5 1.9002924 4
41 150 1.9 2.1317703 5
42 180 1.9 2.1317703 6
43 240 2.2 2.3475756 1
44 300 2.4 2.3190036 2
45 360 2.2 2.1499354 3
46 30 2.1 2.1317703 4
47 60 3.0 2.2110832 5
48 90 3.3 2.1628171 6
49 120 3.2 2.1499354 1
50 150 3.4 2.6331224 2
51 180 3.5 2.4152295 3
52 240 3.7 2.6267851 4
53 300 3.7 2.4060110 5
54 360 3.8 2.6583203 6
The output is:
Maybe you can help me. Thanks in advance!
Best wishes,
Benjamin
Solved it:
I reshaped everything in Excel and exported it another way. The group variable was also not the way I wanted it. Now it is fixed, but I can't really tell you why.
Your data looks malformed. I guess you wanted to have 6 different group values for each time point, but now the group variable just loops over, and you have:
1 30 0.1 0.3162278 1
...
10 30 0.1 0.3162278 4
...
19 30 1.2 1.1352924 1
...
28 30 0.2 0.4216370 4
geom_bar then probably omits rows that have identical mean and time. Although I am not sure why it chooses to do so, you should solve the group problem first anyway.
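A quick way to see the looping group variable is to cross-tabulate Zeit against group. The sketch rebuilds only those two columns as they appear in the question. The group_fixed column is a guess at the intended layout (one group per block of 9 time points), not something the question confirms:

```r
# Zeit and group exactly as they cycle in the question's 54 rows
Zeit <- rep(c(30, 60, 90, 120, 150, 180, 240, 300, 360), times = 6)
DataCm <- data.frame(Zeit = Zeit, group = rep(1:6, length.out = 54))

# Each Zeit/group pair should occur once; counts above 1 mean stacked bars
counts <- table(DataCm$Zeit, DataCm$group)
counts

# Assumed intended layout: rows 1-9 are group 1, rows 10-18 are group 2, ...
DataCm$group_fixed <- rep(1:6, each = 9)
counts_fixed <- table(DataCm$Zeit, DataCm$group_fixed)
counts_fixed
```

With the original group column every time point has only two distinct groups, each occurring three times, so several rows compete for the same bar position.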

Adding information on a graph using R

I would like to add some information on my graph which was plotted from this data set:
EDITED:
#data set:
day <- c(0:28)
ndied <- c(342,335,240,122,74,64,49,60,51,44,35,48,41,34,38,27,29,23,20,15,20,16,17,17,14,10,4,1,2)
pdied <- c(19.1,18.7,13.4,6.8,4.1,3.6,2.7,3.3,2.8,2.5,2.0,2.7,2.3,1.9,2.1,1.5,1.6,1.3,1.1,0.8,1.1,0.9,0.9,0.9,0.8,0.6,0.2,0.1,0.1)
pmort <- data.frame(day,ndied,pdied)
> pmort
day ndied pdied
1 0 342 19.1
2 1 335 18.7
3 2 240 13.4
4 3 122 6.8
5 4 74 4.1
6 5 64 3.6
7 6 49 2.7
8 7 60 3.3
9 8 51 2.8
10 9 44 2.5
11 10 35 2.0
12 11 48 2.7
13 12 41 2.3
14 13 34 1.9
15 14 38 2.1
16 15 27 1.5
17 16 29 1.6
18 17 23 1.3
19 18 20 1.1
20 19 15 0.8
21 20 20 1.1
22 21 16 0.9
23 22 17 0.9
24 23 17 0.9
25 24 14 0.8
26 25 10 0.6
27 26 4 0.2
28 27 1 0.1
29 28 2 0.1
I have put together this script and am still trying to improve it so that the rest of the information can be added:
> barplot(pmort$pdied,xlab="Age(days)",ylab="Percent",xlim=c(0,28),ylim=c(0,20),legend="Mortality")
I am trying to put the numbers 0 to 28 (age in days) on the x-axis but have not managed to, although I suspect it only needs a simple script. Secondly, I would like to add the number died, ndied (342 to 2), below each day (0 to 28) along the x-axis.
Example:
0 1 2 3 4 5 and so on...
(N=342) (N=335) (N=240) (N=122) (N=74) (N=64)
Graph:
Any help would be appreciated.
Baz
I gave you two ways to plot the info: one above the bars and one below. You can tweak it to meet your needs.
# Option 1: counts just above each bar
barX <- barplot(pmort$pdied,xlab="Age(days)",
ylab="Percent", names=pmort$day,
xlim=c(0,28),ylim=c(0,20),legend="Mortality")
text(cex=.5, x=barX, y=pmort$pdied+par("cxy")[2]/2, pmort$ndied, xpd=TRUE)
# Option 2: counts just below the baseline
barX <- barplot(pmort$pdied,xlab="Age(days)",
ylab="Percent", names=pmort$day,
xlim=c(0,28),ylim=c(0,20),legend="Mortality")
text(cex=.5, x=barX, y=-.5, pmort$ndied, xpd=TRUE)
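To get labels in the "0 (N=342)" layout from the question, one option is to build two-line labels with paste0() and draw them with axis() at the bar midpoints that barplot() returns. This is a sketch on the first six days only; the text sizes and margins are guesses that will likely need tuning for all 29 bars:

```r
# First six days from the question
day <- 0:5
ndied <- c(342, 335, 240, 122, 74, 64)
pdied <- c(19.1, 18.7, 13.4, 6.8, 4.1, 3.6)

# Day on the first line, count on the second
labs <- paste0(day, "\n(N=", ndied, ")")

# barplot() returns the x position of each bar's midpoint
barX <- barplot(pdied, ylim = c(0, 20), xlab = "", ylab = "Percent")

# Draw the labels under the bars, without tick marks
axis(1, at = barX, labels = labs, tick = FALSE, cex.axis = 0.7, padj = 0.5)
mtext("Age (days)", side = 1, line = 4)
```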
