I have a 24-hour time series (a very large data set) and I only need to work with certain hours of each day, namely hours 7, 15 and 23.
I have to remove all the rows that correspond to the other hours (1, 2, 3, 4, ...). I have to filter the rows in a gap of 40 hours. I always need to keep the rows that correspond to hours 7, 15 and 23 of each day.
I've been struggling to create another data.frame containing only these rows, or even just to set the unwanted rows to NULL.
I added a screenshot of my data. The columns give the month, day and hour in 1955, and the records continue until the last day and hour of 1955.
Data Example
One way to select the 7th, 15th, and 23rd hours based on the data in the screen capture, using the base R extract operator, is as follows.
# Tempo is the elapsed time in seconds, i.e. Hora * 3600
Tempo <- c(0,3600,7200,10800,14400,18000,21600,25200,28800,
           32400,36000,39600,43200,46800,50400,54000,57600,
           61200,64800,68400,72000,75600,79200,82800)
Ano <- rep(1955,24)
Mes <- rep(1,24)
Dia <- rep(1,24)
Hora <- 0:23
NiveldoMar <- c(1.07,0.91,0.81,0.78,0.91,1.05,1.32,1.57,
                1.60,1.48,1.30,1.07,1.10,1.22,1.42,1.45,1.40,1.32,
                1.27,1.40,1.62,NA,NA,NA)
ContoleInterno <- rep(0,24)
data <- data.frame(Tempo,Ano,Mes,Dia,Hora,NiveldoMar)
# select data from hours 7, 15, and 23
data[data$Hora %in% c(7,15,23),]
...and the output:
> data[data$Hora %in% c(7,15,23),]
   Tempo  Ano Mes Dia Hora NiveldoMar
8  25200 1955   1   1    7       1.57
16 54000 1955   1   1   15       1.45
24 82800 1955   1   1   23         NA
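The same filter carries over unchanged to a series that spans several days. The sketch below is only an illustration: the three-day data frame `hourly` and its sea-level values are made up, and only the column names are taken from the example above. It keeps hours 7, 15 and 23 in a new data frame and, separately, collects the rows that would be dropped.

```r
# Hypothetical three-day hourly series reusing the column names from above
hourly <- expand.grid(Hora = 0:23, Dia = 1:3)
hourly$Mes <- 1
hourly$Ano <- 1955
# Made-up values, for illustration only
hourly$NiveldoMar <- round(1 + sin(seq_len(nrow(hourly)) / 4) / 2, 2)

keep <- c(7, 15, 23)

# New data frame with only the wanted hours (three rows per day)
selected <- subset(hourly, Hora %in% keep)

# The complement, in case the discarded rows need to be inspected
dropped <- hourly[!hourly$Hora %in% keep, ]
```

Because `%in%` is evaluated row by row, the number of days in the series does not matter; each day contributes its three matching rows.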
Related
I am trying to average values in different months over vectors of dates. Basically, I have a dataframe with monthly values of a variable, and I'm trying to get a representative average of the experienced values for samples that sometimes span month boundaries.
I've ended up with a dataframe of monthly values, and vectors of the representative number of "month-year" combinations of every sampling duration (e.g. if a sample was out from Jan 28, 2000 to Feb 1, 2000, the vector would show 4 values of Jan 2000, 1 value of Feb 2000). Later I'm going to average the values with these weights, so it's important that the returned variable values appear in representative numbers.
I am having trouble figuring out how to index the dataframe pulling the representative value repeatedly. See below.
library(tibble)  # tribble()
library(dplyr)   # %>%
# data frame of monthly values
reprex_df <-
tribble(
~my, ~value,
"2000-01", 10,
"2000-02", 11,
"2000-03", 15,
"2000-04", 9,
"2000-05", 13
) %>%
as.data.frame()
# vector of month-year dates from Jan 28 to Feb 1:
reprex_vec <- c("2000-01","2000-01","2000-01","2000-01","2000-02")
# I want to index the df using the vector to get a return vector of
# January value*4, Feb value*1, or 10, 10, 10, 10, 11
# I tried this:
reprex_df[reprex_df$my %in% reprex_vec,"value"]
# but %in% only returns each value once ("10 11", not "10 10 10 10 11").
# is there a different way I should be indexing to account for repeated values?
# eventually I will take an average, e.g.:
mean(reprex_df[reprex_df$my %in% reprex_vec,"value"])
# but I want this average to equal 10.2 for mean(c(10,10,10,10,11)), not 10.5 for mean(c(10,11))
Simple tidy solution with inner_join:
dplyr::inner_join(reprex_df, data.frame(my = reprex_vec), by = "my")$value
in base R:
merge(reprex_df, list(my = reprex_vec))
my value
1 2000-01 10
2 2000-01 10
3 2000-01 10
4 2000-01 10
5 2000-02 11
Perhaps use match from base R to get the index
reprex_df[match(reprex_vec, reprex_df$my),]
my value
1 2000-01 10
1.1 2000-01 10
1.2 2000-01 10
1.3 2000-01 10
2 2000-02 11
Another base R option using setNames
with(
reprex_df,
data.frame(
my = reprex_vec,
value = setNames(value, my)[reprex_vec]
)
)
gives
my value
1 2000-01 10
2 2000-01 10
3 2000-01 10
4 2000-01 10
5 2000-02 11
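To get from any of these lookups to the average the question asks for, a minimal sketch in base R is shown below. It rebuilds the question's `reprex_df` and `reprex_vec` with `data.frame()` so it runs without extra packages; the names `idx`, `avg`, `w` and `avg_weighted` are introduced here for illustration only.

```r
reprex_df <- data.frame(
  my    = c("2000-01", "2000-02", "2000-03", "2000-04", "2000-05"),
  value = c(10, 11, 15, 9, 13)
)
reprex_vec <- c("2000-01", "2000-01", "2000-01", "2000-01", "2000-02")

# match() returns one row index per element of reprex_vec, repeats included
idx <- match(reprex_vec, reprex_df$my)
stopifnot(!anyNA(idx))  # every month in the vector must exist in the table

# Mean over the repeated values: mean(c(10, 10, 10, 10, 11))
avg <- mean(reprex_df$value[idx])

# Equivalent formulation: count each month once and use the counts as weights
w <- table(reprex_vec)
avg_weighted <- weighted.mean(
  reprex_df$value[match(names(w), reprex_df$my)],
  as.numeric(w)
)
```

Both `avg` and `avg_weighted` evaluate to 10.2 here, the value the question expects.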
I have a problem: I calculated the monthly interest for a mortgage, but I need to sum the results to get yearly values (always 12 months at a time).
H <- 2000000 # mortgage
i.m <- 0.03/12 # rate per month
year <- 15 # years
a <- (H*i.m*(1+i.m)^(12*year))/
((1+i.m)^(12*year)-1)
a # monthly payment
interest <- a*(1-(1/(1+i.m)^(0:(year*12))))
interest
cumsum(a*(1-(1/(1+i.m)^(0:(year*12))))) # first 12 values together and then next 12 values + first values and ... (I want to have for every year a value)
You may do this with tapply in base R.
monthly <- cumsum(a*(1-(1/(1+i.m)^(0:(year*12)))))
yearly <- tapply(monthly, ceiling(seq_along(monthly)/12), sum)
I think you can use the following solution:
monthly <- cumsum(a*(1-(1/(1+i.m)^(0:(year*12)))))
sapply(split(monthly, ceiling(seq_along(monthly) / 12)), function(x) x[length(x)])
         1          2          3          4          5          6          7          8
  2254.446   9334.668  21098.218  37406.855  58126.414  83126.695 112281.337 145467.712
         9         10         11         12         13         14         15         16
182566.812 223463.138 268044.605 316202.434 367831.057 422828.023 481093.905 486093.905
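If the goal is one total per year rather than a running total, another option is to group the non-cumulative monthly vector into blocks of 12 and sum each block. The sketch below reuses the question's formula unchanged. Its one assumption is that the exponent runs over 1:(12*year) instead of 0:(year*12), so that there are exactly 180 monthly values and 15 complete years; the original range produces 181 values, which is why a 16th group appears in the output above. The names `year_id`, `yearly` and `running` are introduced here for illustration.

```r
H    <- 2000000   # mortgage
i.m  <- 0.03 / 12 # rate per month
year <- 15        # years

# Monthly payment, as in the question
a <- (H * i.m * (1 + i.m)^(12 * year)) / ((1 + i.m)^(12 * year) - 1)

# Question's formula, with the exponent assumed to run over 1..180
interest <- a * (1 - 1 / (1 + i.m)^(1:(12 * year)))

# Year index: 1 repeated 12 times, then 2 repeated 12 times, and so on
year_id <- rep(seq_len(year), each = 12)

# One sum per year
yearly <- tapply(interest, year_id, sum)

# Running total at the end of each year, if that is what is needed
running <- cumsum(yearly)
```

Using `rep(seq_len(year), each = 12)` as the grouping variable makes the 12-month blocks explicit and fails loudly (with a length mismatch) if the monthly vector does not contain exactly 12 values per year.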
Dears
I am trying to break down quarterly data into monthly data using R. I am not concerned about dates, as I can generate a vector of months corresponding to the values without problems. The problem is with the vector of values and imputing the missing data. See the example below:
Quarter Value
2010-Q1 10
2010-Q2 15
2010-Q3 18
2010-Q4 12
The new data set should look like the following
Month Value
2010-3 10
2010-4 11.67
2010-5 13.33
2010-6 15
2010-7 16
2010-8 17
2010-9 18
2010-10 16
2010-11 14
2010-12 12
Now, the months within each quarter are filled using the following formula
The first month of the quarter[i] = The previous quarter value [i-1] + ((The difference between the quarter [i] and [i-1])/3)
The second month of the quarter[i] = The previous quarter value [i-1] + 2*((The difference between the quarter [i] and [i-1])/3)
For example:
2010-Q1 = 10
2010-Q2 = 15
diff = Difference/3 = 5/3
2010-April = 10 + diff
2010-May = 10 + 2*diff
2010-June = 15 (end of the quarter stays the same) or can be calculated as 10 + 3*diff
I am wondering how to generate a new variable that can break down the values as mentioned above.
Thanks
1) Convert the input to a zoo series z with yearqtr index (which directly represents year and quarter without month or day) and then pad out with NAs and apply na.approx to fill them in linearly giving Value. Assuming that the series is regularly spaced we can just convert the first index value to yearmon (which directly represents a year and month without day) using a frequency of 12 months per year. Finally, either leave it as Value or else use the last line to convert it back to data frame DF2. Another possibility would be to use as.ts(Value) to convert it to a ts series.
Note that yearmon class displays as shown below but represents year and month internally as year plus a fraction equal to 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec so as.integer(time(Value)) will give the year and cycle(time(Value)) will give the month number (Jan = 1, ..., Dec = 12).
library(zoo)
z <- read.zoo(DF, FUN = function(x) as.yearqtr(x, "%Y-Q%q"))
Value <- zooreg(na.approx(c(t(cbind(z, NA, NA)))),
start = as.yearmon(start(z)), freq = 12)
DF2 <- fortify.zoo(Value) # optional
giving:
> DF2
Index Value
1 Jan 2010 10.00000
2 Feb 2010 11.66667
3 Mar 2010 13.33333
4 Apr 2010 15.00000
5 May 2010 16.00000
6 Jun 2010 17.00000
7 Jul 2010 18.00000
8 Aug 2010 16.00000
9 Sep 2010 14.00000
10 Oct 2010 12.00000
Graphically it looks like this:
plot(Value, type = "o")
(continued after plot)
2) A second method starting with z from (1) is to first create the output yearmon time sequence tt, convert the time index of z to yearmon giving z.ym and then merge them generating NA's and finally apply na.approx to fill them in.
tt <- seq(as.yearmon(start(z)), as.yearmon(end(z)), 1/12)
z.ym <- aggregate(z, as.yearmon, c)
Value <- na.approx(merge(z.ym, zoo(, tt)))
Note
The input in reproducible form:
Lines <- "Quarter Value
2010-Q1 10
2010-Q2 15
2010-Q3 18
2010-Q4 12"
DF <- read.table(text = Lines, header = TRUE)
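For comparison, the same linear fill can be sketched in base R with approx(), without zoo. This version follows the question's convention that each quarterly value sits on the last month of its quarter (months 3, 6, 9 and 12), whereas the zoo output above starts in January. It assumes all quarters belong to a single year (2010 is hard-coded in the label); `qtr_end`, `months` and `monthly` are names introduced here for illustration.

```r
DF <- data.frame(
  Quarter = c("2010-Q1", "2010-Q2", "2010-Q3", "2010-Q4"),
  Value   = c(10, 15, 18, 12)
)

# Month number on which each quarterly value is placed: 3, 6, 9, 12
qtr_end <- seq(3, by = 3, length.out = nrow(DF))

# Every month from the first to the last quarter end
months <- seq(min(qtr_end), max(qtr_end))

# Linear interpolation between consecutive quarterly values
monthly <- data.frame(
  Month = paste0("2010-", months),  # assumption: a single year
  Value = approx(x = qtr_end, y = DF$Value, xout = months)$y
)
```

Each interpolated month equals the previous quarter's value plus one third or two thirds of the difference to the next quarter, which is the rule stated in the question.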
I am working on climate data analysis. After loading the file in R, I want to subset the data based on the hour of the day.
For time analysis, we can use $hour on the variable that stores the time vector (a POSIXlt object) if we want to work with hours.
I want to subset my data for each hour of the day over 365 days and then take the average of the data at a particular hour throughout the year. Say I want to take the values of irradiation, wind speed, etc. at 12:00 PM for a year and then take the mean of these values to get the desired result.
I know how to subset a data frame based on conditions. For example, if my data are in a matrix called data that contains 2 columns, say time and wind speed, and I want to keep the rows of data in which irradiation isn't zero, I can do this using the following code
my_data <- subset(data, data[,1]>0)
But how can I subset on the hour values in the time column, which is a variable stored in data?
My data look like this:
I hope I made sense in this question.
Thanks in advance!
Here is a possible solution. You can create an hourly grouping with format(df$time,'%H'), which extracts only the hour of each timestamp; you can then group by this new column and calculate the mean for each group.
library(dplyr)
df <- data.frame(time = seq(Sys.time(), Sys.time() + 2*60*60*24, by = 'hour'),
                 val = sample(seq(5), 49, replace = TRUE))
df %>%
  mutate(hour = format(time, '%H')) %>%
  group_by(hour) %>%
  summarize(mean_val = mean(val))
To subset the non-zero values first, you can do either:
df <- subset(df, val != 0)
or start the dplyr chain with:
df %>% filter(val != 0)
Hope this helps!
df looks as follows:
time val
1 2018-01-31 12:43:33 4
2 2018-01-31 13:43:33 2
3 2018-01-31 14:43:33 2
4 2018-01-31 15:43:33 3
5 2018-01-31 16:43:33 3
6 2018-01-31 17:43:33 1
7 2018-01-31 18:43:33 2
8 2018-01-31 19:43:33 4
... ... ... ...
And the output:
# A tibble: 24 x 2
hour mean_val
<chr> <dbl>
1 00 3.50
2 01 3.50
3 02 4.00
4 03 2.50
5 04 3.00
6 05 2.00
.... ....
This assumes your time column is already of class POSIXct, otherwise you'd first have to convert it using for example as.POSIXct(x,format='%Y-%m-%d %H:%M:%S')
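As a small illustration of that conversion step, the sketch below starts from character timestamps, converts them with as.POSIXct(), and then computes both the mean at one particular hour (12:00, as in the question) and the mean for every hour in base R. The data frame `obs` and its values are made up, and the time zone is fixed to UTC as an assumption so that the extracted hours do not depend on the local settings.

```r
# Made-up observations with timestamps stored as character strings
ts_chr <- c("2018-01-31 12:00:00", "2018-01-31 13:00:00",
            "2018-02-01 12:00:00", "2018-02-01 13:00:00")
obs <- data.frame(
  time = as.POSIXct(ts_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  val  = c(4, 2, 6, 3)
)

# Hour of day as an integer (0-23)
obs$hour <- as.integer(format(obs$time, "%H"))

# Mean of all values recorded at 12:00
noon_mean <- mean(obs$val[obs$hour == 12])

# Mean for every hour present in the data
hourly_mean <- tapply(obs$val, obs$hour, mean)
```

With real data, `mean(..., na.rm = TRUE)` may be needed if some hours have missing records.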
I have a dataset, a daily time series, that I want to arrange into a single column. This is my data:
Date Day 1 Day 2 Day 3 Day 4 Day 5 Day 6 .... Day 31
01/01/1964 0 0 0 0 0 0 3
01/02/1964 NA NA NA NA NA NA ...
01/03/1964 195 445 329 121 61,6 44 ...
01/04/1964 17,2 14,9 17,1 102 54,3 9,33 ...
I want this:
Day1 0
Day2 0
.
.
.
Day31 3
I am having problems because leap years have 366 days. I am trying the code below, but without success. Thanks in advance.
EDIT:
I finally got it, but if anyone knows an easier way, using some package or function, I would be grateful. Otherwise I'll write my own function.
EDIT 2:
Now I have a problem when the data do not start in the first month of a year.
rm(list = ls())
cat("\014")
setwd("C:/")
require(XLConnect)
# Load Streamflow Gauging Station
wb <- loadWorkbook("rainfall.xls")
Data<- readWorksheet(wb, sheet = "rainfall",header = FALSE,region = "B02:AF517")
R<- Data; ##1964 - 2006
sum(R[is.na(R)==FALSE])
# Number of days in each month
Ny<- c(31,28,31,30,31,30,31,31,30,31,30,31); # Normal Year
Ly<- c(31,29,31,30,31,30,31,31,30,31,30,31); # Leap/bissextile Year
S1<- c(1,0,0,0) # Leap year, normal year...
S2<- c(0,1,0,0) # Normal year, leap year...
S3<- c(0,0,1,0) #...
S4<- c(0,0,0,1) #...
Iab<- rep(S1,times=ceiling((nrow(R)/12)/4)); # Index of years
Iab<- Iab[1:(nrow(R)/12)];
Rnew<- matrix(numeric(0), 0,0);
# Organize the data into a single column
for(i in 1:(nrow(R)/12)){
  for(j in 1:12){
    if(Iab[i]==0){
      Rnew<-c(Rnew, t(R[12*(i-1)+j,1:Ny[j]]))   # normal year
    }else{
      Rnew<-c(Rnew, t(R[12*(i-1)+j,1:Ly[j]]))   # leap year
    }
  }
}
sum(R[is.na(R)==FALSE])==sum(Rnew[is.na(Rnew)==FALSE]) # Test whether the reorganization succeeded
sum(R[is.na(R)==FALSE])
sum(Rnew[is.na(Rnew)==FALSE])
I have a similar problem. In a way it is even worse, since I have discharge data (Brazilian ANA station) with several interruptions lasting months or years. Vazao01 stands for the discharge on the first day of the month, Vazao02 for the second, and the data frame goes up to Vazao31 (which is obviously NA for months with fewer days, but can also be NA for existing days without a record). The data look like this and are stored in the data.frame "ANAday"
Date Vazao01 Vazao02 Vazao03...
20 01.05.1989 3463.00 3476.500 3463.000
21 01.06.1989 1867.70 1835.900 1809.400
22 01.07.1989 809.90 798.200 774.800
23 01.08.1989 344.60 308.700 297.900
24 01.11.1989 376.50 388.100 391.000
25 01.12.1989 279.00 289.800 319.500
26 01.01.1990 1715.00 1649.000 1573.200
27 01.02.1990 1035.20 1005.800 972.200
28 01.03.1990 2905.60 2962.100 NA
29 01.06.1990 NA NA NA
30 01.07.1990 297.90 284.400 271.200
31 01.08.1990 228.00 223.200 218.400
32 01.08.1999 NA NA 144.000
33 01.09.1999 20.74 18.620 16.500
34 01.10.1999 119.85 111.450 95.385
35 01.11.1999 11.20 23.705 48.370
36 01.12.1999 160.10 179.000 187.400
37 01.01.2000 843.00 865.300 914.500
38 01.02.2000 1331.30 1368.900 1387.800
39 01.04.2000 1823.60 1808.000 1789.800
40 01.05.2000 1579.00 1524.100 1445.700
I made a list of the months with data
ANAm=as.Date(ANAday[,1], format="%d.%m.%Y")
format(ANAm, format="%Y-%m")
Then I used the "monthDays" function of the Hmisc package to list the number of days in each month
require(Hmisc)
nodm=monthDays(ANAm)
Nodm=cbind.data.frame(ANAm,nodm)
I prepared a data.frame for the data I want to have with 3 columns for "YEAR MONTH", "DAY" and "DISCHARGE"
ANATS=array(NA,c(1,3))
colnames(ANATS)=c("mY","d","Q")
And used a simple "for" loop to extract the data into one column according to the number of days in each month
ANAd=ANAday[,-1] ##assumption: ANAd holds only the Vazao01..Vazao31 columns
for(i in 1:nrow(Nodm)){
  selectANA=as.vector(ANAd[i,1:(Nodm[i,2]) ])
  selectANA=as.vector(t(selectANA))##to generate a simple vector
  dayANA=c(1:(Nodm[i,2]))
  monthANA=rep(format(as.Date(Nodm[i,1]),format="%Y-%m"),times=as.numeric(Nodm[i,2]))
  ANAts=cbind(monthANA,dayANA,selectANA)
  ANATS=rbind(ANATS,ANAts)
}
ANATS can then be converted into a time series (as.xts comes from the xts package):
library(xts)
combine.date=as.character(paste(ANATS[,1],ANATS[,2],sep="-"))
DATE=as.Date(combine.date, format="%Y-%m-%d")
rownames(ANATS)=as.character(DATE)
ANATS=ANATS[-1,]
ANAXTS=as.xts(ANATS)
Maybe I'm having trouble understanding exactly what you're looking for, but are you trying to transpose the data?
t(data)
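A plain transpose keeps the padding cells for days that do not exist (for example day 30 and 31 of February), so the month lengths still have to be handled. One way to avoid hard-coded leap-year tables is to derive each month's length from its own date, which also works when the series starts mid-year or has gaps. The sketch below is an illustration on made-up data: the data frame `wide`, its column names Day1..Day31 and its values are hypothetical, and the Date column is assumed to hold the first day of each month.

```r
# Hypothetical input: one row per month, one column per day of the month
wide <- data.frame(
  Date = as.Date(c("1964-02-01", "1965-02-01")),  # leap and non-leap February
  matrix(1:62, nrow = 2, byrow = TRUE,
         dimnames = list(NULL, paste0("Day", 1:31)))
)
day_cols <- paste0("Day", 1:31)

long <- do.call(rbind, lapply(seq_len(nrow(wide)), function(i) {
  first <- wide$Date[i]
  # Days in this month: first day of the next month minus one day
  last   <- seq(first, by = "month", length.out = 2)[2] - 1
  n_days <- as.integer(format(last, "%d"))
  data.frame(
    date  = first + 0:(n_days - 1),
    value = unlist(wide[i, day_cols[seq_len(n_days)]], use.names = FALSE)
  )
}))
```

The result `long` has one row per calendar day, so February 1964 contributes 29 rows and February 1965 contributes 28, and the trailing padding columns of each row are ignored.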