New R user here, working with meteorological data (data frame is called "Stations"). Trying to plot 3 time series with temperature on y-axis with a regression line on each one, but I encounter a few problems and there is no error messages.
Loop doesn't seem to be working and I can't figure out why.
Didn't manage to change x-axis graduation values for years ("Année" in the data frame) instead of a number.
Title is the same for the 3 plots, how do I change it so each plot has its own title?
Regression line is not shown on the graph.
Thanks in advance!
Here is my code :
for (i in c(6,8,10))
plot(ts(Stations[,i]), col="dodgerblue4", xlab="Temps", ylab="Température", main="Genève")
for (i in c(6,8,10))
abline(h=Stations[,i])```
Nb.enr time Année Mois Jour T2m_GE pcp_GE T2m_PU pcp_PU T2m_NY
1 19810101 1981 1 1 1.3 0.3 2.8 0.0 2.3
2 19810102 1981 1 2 1.2 0.1 2.3 1.2 1.6
3 19810103 1981 1 3 4.1 21.8 4.9 5.2 3.8
4 19810104 1981 1 4 5.1 10.3 5.1 17.4 4.9
5 19810105 1981 1 5 0.9 0.0 1.0 0.1 0.8
6 19810106 1981 1 6 0.5 5.7 0.7 6.0 0.5
7 19810107 1981 1 7 -2.7 0.0 -2.1 0.1 -1.9
8 19810108 1981 1 8 -3.2 0.0 -4.1 0.0 -3.8
9 19810109 1981 1 9 -5.2 0.0 -3.5 0.0 -5.1
10 19810110 1981 1 10 -3.1 10.6 -0.9 6.0 -2.6
This question already has answers here:
calculate the mean for each column of a matrix in R
(10 answers)
Closed 3 years ago.
I have a data frame:
colA colB
1 15.3 1.76
2 10.8 1.34
3 8.1 1.27
4 19.5 1.47
5 7.2 1.27
6 5.3 1.49
7 9.3 1.31
8 11.1 1.09
9 7.5 1.18
10 12.2 1.22
11 6.7 1.25
12 5.2 1.19
13 19.0 1.95
14 15.1 1.28
15 6.7 1.52
16 8.6 NA
17 4.2 1.12
18 10.3 1.37
19 12.5 1.19
20 16.1 1.05
21 13.3 1.32
22 4.9 1.03
23 8.8 1.12
24 9.5 1.70
How would I be able to remove/change the value of all NAs such that when I use sapply (i.e. sapply(x, mean)), I am taking the mean of 24 rows in the case of colA and 23 columns for colB?
I understand that data frames have to have the same number of rows so using something like na.omit() would not work because it'd remove, in this case, row 16; I'd lose a row of data when I'm calculating the mean for colA.
Thanks!
You should be able to pass na.rm = TRUE and get the mean.
Example:
df <- data.frame(A = 1:3, B = c(NA, 1, 2))
apply(df, 2, mean, na.rm = TRUE)
# A B
# 2.0 1.5
I have a problem with the way aggregate or N/A deals with sums.
I would like the sums per area.code from following table
test <- read.table(text = "
area.code A B C D
1 0 NA 0.00 NA NA
2 1 0.0 3.10 9.6 0.0
3 1 0.0 3.20 6.0 0.0
4 2 0.0 6.10 5.0 0.0
5 2 0.0 6.50 8.0 0.0
6 2 0.0 6.90 4.0 3.1
7 3 0.0 6.70 3.0 3.2
8 3 0.0 6.80 3.1 6.1
9 3 0.0 0.35 3.2 6.5
10 3 0.0 0.67 6.1 6.9
11 4 0.0 0.25 6.5 6.7
12 5 0.0 0.68 6.9 6.8
13 6 0.0 0.95 6.7 0.0
14 7 1.2 NA 6.8 0.0
")
So, seems pretty easy:
aggregate(.~area.code, test, sum)
area.code A B C D
1 1 0 6.30 15.6 0.0
2 2 0 19.50 17.0 3.1
3 3 0 14.52 15.4 22.7
4 4 0 0.25 6.5 6.7
5 5 0 0.68 6.9 6.8
6 6 0 0.95 6.7 0.0
Apparently not so simple, because area code 7 is completely omitted from the aggregate() command.
I would however like the N/As to be completely ignored or computed as zero values, which na= command gives that option?
replacing all N/As with 0 is an option if I just want the sum... but the mean is really problematic then (since it can't differentiate between 0 and N/A anymore)
If you are willing to consider an external package (data.table):
setDT(test)
test[, lapply(.SD, sum), area.code]
area.code A B C D
1: 0 NA 0.00 NA NA
2: 1 0.0 6.30 15.6 0.0
3: 2 0.0 19.50 17.0 3.1
4: 3 0.0 14.52 15.4 22.7
5: 4 0.0 0.25 6.5 6.7
6: 5 0.0 0.68 6.9 6.8
7: 6 0.0 0.95 6.7 0.0
8: 7 1.2 NA 6.8 0.0
One option is to create a function that gives NA when all the values are NA or otherwise use sum. Along with that, use na.action argument in aggregate as aggregate can remove the row if there is at least one NA
f1 <- function(x) if(all(is.na(x))) NA else sum(x, na.rm = TRUE)
aggregate(.~area.code, test, f1, na.action = na.pass)
# area.code A B C D
#1 0 NA 0.00 NA NA
#2 1 0.0 6.30 15.6 0.0
#3 2 0.0 19.50 17.0 3.1
#4 3 0.0 14.52 15.4 22.7
# 4 0.0 0.25 6.5 6.7
#6 5 0.0 0.68 6.9 6.8
#7 6 0.0 0.95 6.7 0.0
#8 7 1.2 NA 6.8 0.0
When there are only NA elements and we use sum with na.rm = TRUE, it returns 0
sum(c(NA, NA), na.rm = TRUE)
#[1] 0
Another solution is to use dplyr:
test %>%
group_by(area.code) %>%
summarise_all(sum, na.rm = TRUE)
This is my sample data:
date label type exdate x y z w
1 10 A 2 15 0.25 0.35 13.49
1 10 A 2 12.5 1.30 1.45 13.49
1 10 B 2 10 1.7 1.8 13.49
1 10 B 2 12.5 0.3 0.4 13.49
1 10 B 2 17.5 1.8 0.3 13.49
1 11 A 3 15 0.75 0.8 13.49
1 11 A 3 12.5 1.8 1.9 13.49
1 11 A 3 17.5 0.2 0.35 13.49
1 11 B 3 10 0.1 0.25 13.49
1 11 B 3 15 2.15 2.3 13.49
1 11 B 3 12.5 0.8 0.85 13.49
1 11 B 3 17.5 4.1 4.3 13.49
2 11 A 4 10 3.7 4 13.49
2 11 A 4 15 1 1.1 13.49
2 11 A 4 12.5 2.05 2.2 13.49
2 11 A 4 17.5 0.4 0.55 13.49
2 11 B 4 10 0.3 0.4 13.49
2 11 B 4 15 2.45 2.6 13.49
2 11 B 4 12.5 1.05 1.15 13.49
2 11 B 4 17.5 4.3 4.6 13.49
Firstly, I will group my data set by c(date,label,exdate), and for each group it will be A and B inside variable 'type'. BUT I want to let the number of rows for type A and type B is the same.
Filter conditions:
To make data to be the same number of rows, the distance between x and w should be same or almost the same for any pairs of type A and type B.
For example:
type x w
A 2 3.5
A 3 3.5
A 4 3.5
B 1 3.5
B 2 3.5
# The output after filter
type x w
A 2 3.5 (pair with type_B ; x = 1)
A 3 3.5 (pair with type_B ; x = 2)
B 1 3.5
B 2 3.5
So, for the sample data above, the result I hope:
date label type exdate x y z w
1 10 A 2 15 0.25 0.35 13.49
1 10 A 2 12.5 1.30 1.45 13.49
1 10 B 2 12.5 0.3 0.4 13.49
1 10 B 2 17.5 1.8 0.3 13.49
1 11 A 3 15 0.75 0.8 13.49
1 11 A 3 12.5 1.8 1.9 13.49
1 11 A 3 17.5 0.2 0.35 13.49
1 11 B 3 15 2.15 2.3 13.49
1 11 B 3 12.5 0.8 0.85 13.49
1 11 B 3 17.5 4.1 4.3 13.49
2 11 A 4 10 3.7 4 13.49
2 11 A 4 15 1 1.1 13.49
2 11 A 4 12.5 2.05 2.2 13.49
2 11 A 4 17.5 0.4 0.55 13.49
2 11 B 4 10 0.3 0.4 13.49
2 11 B 4 15 2.45 2.6 13.49
2 11 B 4 12.5 1.05 1.15 13.49
2 11 B 4 17.5 4.3 4.6 13.49
To make this result, how can I code? Is it insert else if condition inside filter()?
There are two sensors. The collected data should be changing with time. How can identify the data stuck and replace it with another sensor?
a<- c(1:24)
b<- seq(0.1,2.4,0.1)
c<- c(0.05,0.2,0.3,rep(0.4,18),2.2,2.3,2.4)
d<- data.frame(a,b,c)
so the data has
d
a b c
1 0.1 0.05
2 0.2 0.20
3 0.3 0.30
4 0.4 0.40
5 0.5 0.40
6 0.6 0.40
7 0.7 0.40
8 0.8 0.40
9 0.9 0.40
10 1.0 0.40
11 1.1 0.40
12 1.2 0.40
13 1.3 0.40
14 1.4 0.40
15 1.5 0.40
16 1.6 0.40
17 1.7 0.40
18 1.8 0.40
19 1.9 0.40
20 2.0 0.40
21 2.1 0.40
22 2.2 2.20
23 2.3 2.30
24 2.4 2.40
Sensor c stuck at 0.4 from time a4 to a20, is there a quick way to identify it and replace the stuck part using data from sensor b?
The new column c_updated is what you want. I've created some helpful columns (c_previous and c_is_stuck) which you can remove if you want.
library(dplyr)
a<- c(1:24)
b<- seq(0.1,2.4,0.1)
c<- c(0.05,0.2,0.3,rep(0.4,18),2.2,2.3,2.4)
d<- data.frame(a,b,c)
d %>%
mutate(c_previous = lag(c, default = 0), # get previous measurement for sensor c
c_is_stuck = ifelse(c == c_previous, 1 ,0), # flag stuck for sensor c when current measurement is same as previous one
c_updated = ifelse(c_is_stuck == 1, b, c)) # if sensor c is stuck use measurement from sensor b
# a b c c_previous c_is_stuck c_updated
# 1 1 0.1 0.05 0.00 0 0.05
# 2 2 0.2 0.20 0.05 0 0.20
# 3 3 0.3 0.30 0.20 0 0.30
# 4 4 0.4 0.40 0.30 0 0.40
# 5 5 0.5 0.40 0.40 1 0.50
# 6 6 0.6 0.40 0.40 1 0.60
# 7 7 0.7 0.40 0.40 1 0.70
# 8 8 0.8 0.40 0.40 1 0.80
# 9 9 0.9 0.40 0.40 1 0.90
# 10 10 1.0 0.40 0.40 1 1.00
# 11 11 1.1 0.40 0.40 1 1.10
# 12 12 1.2 0.40 0.40 1 1.20
# 13 13 1.3 0.40 0.40 1 1.30
# 14 14 1.4 0.40 0.40 1 1.40
# 15 15 1.5 0.40 0.40 1 1.50
# 16 16 1.6 0.40 0.40 1 1.60
# 17 17 1.7 0.40 0.40 1 1.70
# 18 18 1.8 0.40 0.40 1 1.80
# 19 19 1.9 0.40 0.40 1 1.90
# 20 20 2.0 0.40 0.40 1 2.00
# 21 21 2.1 0.40 0.40 1 2.10
# 22 22 2.2 2.20 0.40 0 2.20
# 23 23 2.3 2.30 2.20 0 2.30
# 24 24 2.4 2.40 2.30 0 2.40
This is a pretty simple way. Duplicate the c column with an offset of 1 and check if the two values are identical. If so, take the value from b.
a<- c(1:24)
b<- seq(0.1,2.4,0.1)
c<- c(0.05,0.2,0.3,rep(0.4,18),2.2,2.3,2.4)
d<- data.frame(a,b,c)
d$d <- c(NA, d$c[1:23])
d$replaced <- ifelse(d$c == d$d, d$b, d$c)
a b c d replaced
1 1 0.1 0.05 NA NA
2 2 0.2 0.20 0.05 0.2
3 3 0.3 0.30 0.20 0.3
4 4 0.4 0.40 0.30 0.4
5 5 0.5 0.40 0.40 0.5
6 6 0.6 0.40 0.40 0.6
7 7 0.7 0.40 0.40 0.7
8 8 0.8 0.40 0.40 0.8
9 9 0.9 0.40 0.40 0.9
10 10 1.0 0.40 0.40 1.0
11 11 1.1 0.40 0.40 1.1
12 12 1.2 0.40 0.40 1.2
13 13 1.3 0.40 0.40 1.3
14 14 1.4 0.40 0.40 1.4
15 15 1.5 0.40 0.40 1.5
16 16 1.6 0.40 0.40 1.6
17 17 1.7 0.40 0.40 1.7
18 18 1.8 0.40 0.40 1.8
19 19 1.9 0.40 0.40 1.9
20 20 2.0 0.40 0.40 2.0
21 21 2.1 0.40 0.40 2.1
22 22 2.2 2.20 0.40 2.2
23 23 2.3 2.30 2.20 2.3
24 24 2.4 2.40 2.30 2.4
The bellow solution is as basic as it gets I think. No additional packages required. Cheers!
a<- c(1:24)
b<- seq(0.1,2.4,0.1)
c<- c(0.05,0.2,0.3,rep(0.4,18),2.2,2.3,2.4)
d<- data.frame(a,b,c)
d$diff.b <- c(NA, diff(d$b))
d$diff.c <- c(NA, diff(d$c))
stuck.index <- which(d$diff.c==0)
d[stuck.index, "c"] <- d[stuck.index, "b"]
# changing to original data frame format
d$diff.b <- NULL
d$diff.c <- NULL