I am new to R and have a question I have been trying to answer. I have a file organized as follows (it has thousands of rows but I just show a sample for simplicity):
YEAR Month day S1 T1 T2 R1
1965 3 2 11.7 20.6 11.1 18.8
1965 3 3 14.0 16.7 3.3 0.0
1965 3 4 -99.9 -99.9 -99.9 -99.9
1965 3 5 9.2 5.6 0.0 -99.9
1965 3 6 10.1 6.7 0.0 -99.9
1965 3 7 9.7 7.2 1.1 0.0
I would like to know, for each column (T1, T2, and R1), in which Year, Month and Day the -99.9 values are located; e.g. from 1980/1/3 to 1980/1/27 there are X -99.9 values for T1, from 1990/2/10 to 1990/3/30 there are Y -99.9 values for T1, and so on. Same for T2 and R1.
How can I do this in R?
This is only one file, but I have almost 2000 files with the same problem (I know I will need a loop); if I know how to do it for one file, I can just loop over the rest.
I really appreciate any help. Thank you very much in advance for reading and helping!!!
I took the liberty of renaming your last dataframe column to "R1":
lapply(c('T1', 'T2', 'R1'),
       function(x) dfrm[dfrm[[x]] == -99.9,  # rows with the missing-value code
                        1:3])                # columns to return: YEAR, Month, day
#-------------
[[1]]
YEAR Month day
3 1965 3 4
[[2]]
YEAR Month day
3 1965 3 4
[[3]]
YEAR Month day
3 1965 3 4
4 1965 3 5
5 1965 3 6
It wasn't clear whether you wanted the values or the counts (and I don't think you can have both in the same report). If you wanted to name the entries:
> misdates <- .Last.value
> names(misdates) <- c('T1', 'T2', 'R1')
If you wanted a count:
lapply(misdates, NROW)
$T1
[1] 1
$T2
[1] 1
$R1
[1] 3
(You might want to learn how to use NA values. Using numbers as missing values is not recommended data management.)
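To sketch that suggestion: the -99.9 sentinels can be converted to NA at read time with the na.strings argument of read.table, after which is.na() replaces the == -99.9 comparisons (the inline data here is just the sample from the question):

```r
# Read the data treating the -99.9 sentinel as NA from the start
dfrm <- read.table(text = "YEAR Month day S1 T1 T2 R1
1965 3 2 11.7 20.6 11.1 18.8
1965 3 4 -99.9 -99.9 -99.9 -99.9
1965 3 5 9.2 5.6 0.0 -99.9",
                   header = TRUE, na.strings = "-99.9")

# Missing rows are then located with is.na() instead of comparing to -99.9
dfrm[is.na(dfrm$T1), 1:3]
```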
If I understand correctly, you want to count how many -99.9 values you get per month AND per column.
Here's my code for S1 using plyr. You'll note that I expanded your example to add one more month of data.
library(plyr)
my.table <-read.table(text="YEAR Month day S1 T1 T2 R1
1965 3 2 11.7 20.6 11.1 18.8
1965 3 3 14.0 16.7 3.3 0.0
1965 3 4 -99.9 -99.9 -99.9 -99.9
1965 3 5 9.2 5.6 0.0 -99.9
1965 3 6 10.1 6.7 0.0 -99.9
1965 3 7 9.7 7.2 1.1 0.0
1966 1 7 -99.9 7.2 1.1 0.0
1966 1 8 -99.9 7.2 1.1 0.0
", header=TRUE, as.is=TRUE,sep = " ")
#Create a year/month column to summarise per month
my.table$yearmonth <-paste(my.table$YEAR,"/",my.table$Month,sep="")
S1 <-count(my.table[my.table$S1==-99.9,],"yearmonth")
S1
yearmonth freq
1 1965/3 1
2 1966/1 2
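The same per-month counts can also be produced for all four columns in one step with base R's aggregate; here is a sketch on the same expanded sample, without the plyr dependency:

```r
my.table <- read.table(text = "YEAR Month day S1 T1 T2 R1
1965 3 2 11.7 20.6 11.1 18.8
1965 3 4 -99.9 -99.9 -99.9 -99.9
1965 3 5 9.2 5.6 0.0 -99.9
1966 1 7 -99.9 7.2 1.1 0.0
1966 1 8 -99.9 7.2 1.1 0.0", header = TRUE)

# One row per YEAR/Month group, one missing-value count per column
counts <- aggregate(cbind(S1, T1, T2, R1) ~ YEAR + Month, data = my.table,
                    FUN = function(v) sum(v == -99.9))
counts
```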
I currently have 2 data frames.
df_A contains a multiplication factor for every day of the year, e.g.:

day    multiplication factor
Jan 1  0.004
Jan 2  0.007
Jan 3  0.005
Jan 4  0.007
df_B contains yearly volumes across 5 different categories:

Category  2022  2023  2024  2025
cat 1     1000  xx    xx    xx
cat 2     xx    xx    xx    xx
cat 3     xx    xx    xx    xx
cat 4     xx    xx    xx    xx
cat 5     xx    xx    xx    xx
I require a new data frame that multiplies (1) the volume from each year, (2) for each category, by (3) the multiplication factor for each day.
e.g. for Jan 1, cat 1, 2022:
1000 * 0.004
The final data frame will look something like:

Day    Category  2022          2023  2024
Jan 1  cat 1     1000 * 0.004
Jan 1  cat 2     xx * 0.004
Jan 1  cat 3
Jan 1  cat 4
Jan 1  cat 5
Jan 2  cat 1
Jan 2  cat 2
Jan 2  cat 3
I have been trying various for loops but can't seem to crack it.
You can cross join the two dataframes by using merge to create a cartesian join. After this you can use dplyr to multiply your year columns by the multiplication factor.
Here is a working example, you can rearrange the resulting columns if you want:
df_A <- readr::read_table(
"day multiplication_factor
Jan_1 0.004
Jan_2 0.007
Jan_3 0.005
Jan_4 0.007"
)
df_B <- readr::read_table(
"Category 2022 2023 2024 2025
cat_1 1000 1200 1400 1600
cat_2 600 800 1000 1200"
)
library(dplyr)
merge(df_A, df_B) %>%
mutate(across(starts_with("20"), ~ .x * multiplication_factor))
#> day multiplication_factor Category 2022 2023 2024 2025
#> 1 Jan_1 0.004 cat_1 4.0 4.8 5.6 6.4
#> 2 Jan_2 0.007 cat_1 7.0 8.4 9.8 11.2
#> 3 Jan_3 0.005 cat_1 5.0 6.0 7.0 8.0
#> 4 Jan_4 0.007 cat_1 7.0 8.4 9.8 11.2
#> 5 Jan_1 0.004 cat_2 2.4 3.2 4.0 4.8
#> 6 Jan_2 0.007 cat_2 4.2 5.6 7.0 8.4
#> 7 Jan_3 0.005 cat_2 3.0 4.0 5.0 6.0
#> 8 Jan_4 0.007 cat_2 4.2 5.6 7.0 8.4
I want to convert a long daily time series of temperature and rainfall to monthly values, then interpolate and compute the temporal trend for a country using 90 meteorological stations.
The data format looks like this:
Year Month Day Rain MaxT MinT
1 1970 1 1 0.0 23.0 -99.9
2 1970 1 2 0.0 23.0 -99.9
3 1970 1 3 0.0 23.0 -99.9
4 1970 1 4 0.0 24.0 -99.9
5 1970 1 5 0.0 23.0 -99.9
6 1970 1 6 0.0 23.0 -99.9
7 1970 1 7 0.0 23.0 -99.9
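As a starting sketch for the monthly conversion (the inline data is a toy stand-in for the station files, with -99.9 read as NA so it does not distort the means):

```r
# Daily records to monthly means, treating the -99.9 sentinel as missing
dat <- read.table(text = "Year Month Day Rain MaxT MinT
1970 1 1 0.0 23.0 -99.9
1970 1 2 0.0 23.0 -99.9
1970 2 1 1.2 24.0 15.0", header = TRUE, na.strings = "-99.9")

# na.action = na.pass keeps rows containing NA; na.rm is passed on to mean()
monthly <- aggregate(cbind(Rain, MaxT, MinT) ~ Year + Month, data = dat,
                     FUN = mean, na.rm = TRUE, na.action = na.pass)
monthly
```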
I am trying to read in a time series and do a plot.ts(), however I am getting weird results. Perhaps I did something wrong. I tried including the start and end dates but the output is still wrong.
Any help appreciated. Thank you.
This is the code and output:
sales1 <- read.csv("TimeS.csv",header=TRUE)
sales1
salesT <- ts(sales1)
salesT
plot.ts(salesT)
output:
> sales1 <- read.csv("TimeS.csv",header=TRUE)
> sales1
year q1 q2 q3 q4
1 1991 4.8 4.1 6.0 6.5
2 1992 5.8 5.2 6.8 7.4
3 1993 6.0 5.6 7.5 7.8
4 1994 6.3 5.9 8.0 8.4
> salesT <- ts(sales1)
> salesT
Time Series:
Start = 1
End = 4
Frequency = 1
year q1 q2 q3 q4
1 1991 4.8 4.1 6.0 6.5
2 1992 5.8 5.2 6.8 7.4
3 1993 6.0 5.6 7.5 7.8
4 1994 6.3 5.9 8.0 8.4
> plot.ts(salesT)
It looks like I can't paste the plot. Instead of one graph, it shows 5 separate plots stacked on top of each other.
Try this
salesT<-ts(unlist(t(sales1[,-1])),start=c(1991,1),freq=4)
The format of the original data is difficult to use directly for a time series. You could try this instead:
sales1 <- t(sales1[,-1])
sales1 <- as.vector(sales1)
my_ts <- ts(sales1, frequency = 4, start=c(1991,1))
plot.ts(my_ts)
Here I think you need to format it correctly; try this (note that frequency and start belong in the ts() call, not in ts.plot()):
salesT <- ts(unlist(t(sales1[,-1])), frequency = 4, start = c(1991, 1))
ts.plot(salesT)
This line makes the times into one of the series, which is likely not what you want:
> salesT <- ts(sales1)
We need to transpose the data frame so that it reads across the rows rather than down, and we use c to turn the resulting matrix into a vector forming the data portion of the series. (continued after chart)
# create sales1
Lines <- "year q1 q2 q3 q4
1 1991 4.8 4.1 6.0 6.5
2 1992 5.8 5.2 6.8 7.4
3 1993 6.0 5.6 7.5 7.8
4 1994 6.3 5.9 8.0 8.4"
sales1 <- read.table(text = Lines, header = TRUE)
# convert to ts and plot
salesT <- ts(c(t(sales1[-1])), start = sales1[1, 1], freq = 4)
plot(salesT)
Regarding the comment, if the data looks like this then it is more straightforward, and the lines below will produce the above plot. We have assumed that the data is sorted and starts at the beginning of a year, so we do not need to use the second column:
Lines2 <- "year qtr sales
1 1991 q1 4.8
2 1991 q2 4.1
3 1991 q3 6.0
4 1991 q4 6.5
5 1992 q1 5.8
6 1992 q2 5.2
7 1992 q3 6.8
8 1992 q4 7.4
9 1993 q1 6.0
10 1993 q2 5.6
11 1993 q3 7.5
12 1993 q4 7.8
13 1994 q1 6.3
14 1994 q2 5.9
15 1994 q3 8.0
16 1994 q4 8.4"
sales2 <- read.table(text = Lines2, header = TRUE)
salesT2 <- ts(sales2$sales, start = sales2$year[1], freq = 4)
plot(salesT2)
Update fixed. Added response to comments.
I'd like to calculate monthly temperature anomalies on a time-series with several stations.
I call here "anomaly" the difference of a single value from a mean calculated on a period.
My data frame looks like this (let's call it "data"):
Station Year Month Temp
A 1950 1 15.6
A 1980 1 12.3
A 1990 2 11.4
A 1950 1 15.6
B 1970 1 12.3
B 1977 2 11.4
B 1977 4 18.6
B 1980 1 12.3
B 1990 11 7.4
First, I made a subset with the years comprised between 1980 and 1990:
data2 <- subset(data, Year >= 1980 & Year <= 1990)
Second, I used plyr to calculate monthly mean (let's call this "MeanBase") between 1980 and 1990 for each station:
data3 <- ddply(data2, .(Station, Month), summarise,
MeanBase = mean(Temp, na.rm=TRUE))
Now, I'd like to calculate, for each line of data, the difference between the corresponding MeanBase and the value of Temp... but I'm not sure to be in the right way (I don't see how to use data3).
You can use ave in base R to get this.
transform(data,
Demeaned=Temp - ave(replace(Temp, Year < 1980 | Year > 1990, NA),
Station, Month, FUN=function(t) mean(t, na.rm=TRUE)))
# Station Year Month Temp Demeaned
# 1 A 1950 1 15.6 3.3
# 2 A 1980 1 12.3 0.0
# 3 A 1990 2 11.4 0.0
# 4 A 1950 1 15.6 3.3
# 5 B 1970 1 12.3 0.0
# 6 B 1977 2 11.4 NaN
# 7 B 1977 4 18.6 NaN
# 8 B 1980 1 12.3 0.0
# 9 B 1990 11 7.4 0.0
The result column will have NaNs for Month-Station combinations that have no years in your specified range.
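Since the asker got as far as the data3 table of monthly means, another way to finish is to merge data3 back onto the full data and subtract. This sketch uses base R's aggregate in place of ddply so it runs standalone, on the sample rows from the question:

```r
data <- read.table(text = "Station Year Month Temp
A 1950 1 15.6
A 1980 1 12.3
A 1990 2 11.4
B 1980 1 12.3", header = TRUE)

# Baseline means per Station/Month over 1980-1990, as in the question
data2 <- subset(data, Year >= 1980 & Year <= 1990)
data3 <- aggregate(Temp ~ Station + Month, data = data2, FUN = mean)
names(data3)[3] <- "MeanBase"

# Left-join the baseline onto every original row, then subtract
data4 <- merge(data, data3, by = c("Station", "Month"), all.x = TRUE)
data4$Anomaly <- data4$Temp - data4$MeanBase
data4
```

Rows whose Station/Month has no 1980-1990 observations get NA in MeanBase and Anomaly, matching the NaN behavior of the ave approach.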
Hi,
I have a 10-year, 5-minute-resolution data set of dust concentration, and separately a 15-year, daily-resolution data set of the synoptic classification. How can I combine these two datasets? They are not the same length or resolution.
here is a sample of the data
> head(synoptic)
date synoptic
1 01/01/1995 8
2 02/01/1995 7
3 03/01/1995 7
4 04/01/1995 20
5 05/01/1995 1
6 06/01/1995 1
>
head(beit.shemesh)
X........................ StWd SHT PRE GSR RH Temp WD WS PM10 CO O3
1 NA 64 19.8 0 -2.9 37 15.2 61 2.2 241 0.9 40.6
2 NA 37 20.1 0 1.1 38 15.2 344 2.1 241 0.9 40.3
3 NA 36 20.2 0 0.7 39 15.1 32 1.9 241 0.9 39.4
4 NA 52 20.1 0 0.9 40 14.9 20 2.1 241 0.9 38.7
5 NA 42 19.0 0 0.9 40 14.6 11 2.0 241 0.9 38.7
6 NA 75 19.9 0 0.2 40 14.5 341 1.3 241 0.9 39.1
No2 Nox No SO2 date
1 1.4 2.9 1.5 1.6 31/12/2000 24:00
2 1.7 3.1 1.4 0.9 01/01/2001 00:05
3 2.1 3.5 1.4 1.2 01/01/2001 00:10
4 2.7 4.2 1.5 1.3 01/01/2001 00:15
5 2.3 3.8 1.5 1.4 01/01/2001 00:20
6 2.8 4.3 1.5 1.3 01/01/2001 00:25
Any ideas?
Make an extra column for calculating the dates, and then merge. To do this, you have to generate a variable in each dataframe bearing the same name, hence you first need some renaming. Also make sure that the merge column you use has the same type in both dataframes :
beit.shemesh$datetime <- beit.shemesh$date
beit.shemesh$date <- as.Date(beit.shemesh$datetime, format="%d/%m/%Y")
synoptic$date <- as.Date(synoptic$date,format="%d/%m/%Y")
merge(synoptic, beit.shemesh,by="date",all.y=TRUE)
Using all.y=TRUE keeps the beit.shemesh dataset intact. If you also want empty rows for all non-matching rows in synoptic, you could use all=TRUE instead.
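A minimal self-contained sketch of that recipe, with toy stand-ins for the real files (two 5-minute rows falling on one synoptic day):

```r
synoptic <- data.frame(date = c("01/01/2001", "02/01/2001"),
                       synoptic = c(8, 7), stringsAsFactors = FALSE)
beit.shemesh <- data.frame(date = c("01/01/2001 00:05", "01/01/2001 00:10"),
                           PM10 = c(241, 238), stringsAsFactors = FALSE)

# Keep the full timestamp, then reduce it to a Date to use as the join key;
# as.Date() with a %d/%m/%Y format ignores the trailing time-of-day part
beit.shemesh$datetime <- beit.shemesh$date
beit.shemesh$date <- as.Date(beit.shemesh$datetime, format = "%d/%m/%Y")
synoptic$date <- as.Date(synoptic$date, format = "%d/%m/%Y")

merged <- merge(synoptic, beit.shemesh, by = "date", all.y = TRUE)
merged
```

Each 5-minute row inherits the synoptic classification of its day, so the daily value is simply repeated across that day's high-resolution rows.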