I have two CSV data sets. The first one contains monthly data for a certain variable (PM_2.5), given by spatial constraints (latitude and longitude), which can be seen as the place variables. The second data frame contains different start and end dates for the observation. These are also given under the same spatial constraints and for each individual v1. You can see the data structure in the pictures.
I want to sum all observations of (PM_2.5) for one individual (ID) over the observation period (start date to end date) given the constraint that the geospatial identification (latitude, longitude) is the same.
Thanks a lot for your help.
Best,
Luise
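One possible approach, sketched in base R: join the two tables on location, keep only the readings that fall inside each individual's observation period, and sum per ID. The column names (`latitude`, `longitude`, `date`, `pm25`, `ID`, `start`, `end`) and all values below are assumptions, since the real structure is only visible in the pictures.

```r
# Hypothetical monthly PM2.5 readings per location (column names are assumptions)
pm <- data.frame(
  latitude  = c(50.1, 50.1, 50.1, 52.5, 52.5, 52.5),
  longitude = c(8.7, 8.7, 8.7, 13.4, 13.4, 13.4),
  date      = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01",
                        "2020-01-01", "2020-02-01", "2020-03-01")),
  pm25      = c(10, 12, 14, 20, 22, 24)
)

# Hypothetical observation periods, one row per individual
obs <- data.frame(
  ID        = c("A", "B"),
  latitude  = c(50.1, 52.5),
  longitude = c(8.7, 13.4),
  start     = as.Date(c("2020-01-01", "2020-02-01")),
  end       = as.Date(c("2020-02-01", "2020-03-01"))
)

# Join on location, so each individual is paired with every reading at that place
joined <- merge(obs, pm, by = c("latitude", "longitude"))

# Keep only the readings that fall inside the individual's observation period
in_period <- joined[joined$date >= joined$start & joined$date <= joined$end, ]

# Sum PM2.5 per individual
pm_sum <- aggregate(pm25 ~ ID, data = in_period, FUN = sum)
```

This relies on the coordinates matching exactly in both files. If they are stored with different precision, rounding both to a common number of digits before the merge may be needed. Individuals with no reading inside their period drop out of `pm_sum`.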
I have time series data with a repeating pattern that I wish to label/group as separate factor levels each instance it occurs. The example below uses a sequence of 1:100, but my real data is based on a time series: day-of-the-year. I've used a conditional test in my data to identify each instance that the pattern is present - this is simulated with a simple TRUE/FALSE column below (In my real data I'm looking for each instance where a trend line goes outside a particular threshold (standard-deviation)).
From the data below, is there a way to group each set of 'TRUE' values? Perhaps a way to identify the start/stop of each TRUE sequence (rows 26:50, and 76:100) and somehow create two separate group factor levels?
I've attached an image below to illustrate my end goal of trying to identify areas of a curve that exceed a particular boundary.
data = data.frame(order = seq(1,100),
test = rep(c('False','True'), each = 25))
plot(data, col = data.table::rleid(data[,2]))
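A minimal sketch of one way to label each run of 'True' values, using base R's `rle()` (the same idea as `data.table::rleid`). The column names `run` and `group` are introduced here for illustration only.

```r
data <- data.frame(order = seq(1, 100),
                   test  = rep(c("False", "True"), each = 25))

# Run id: increases by one every time the value of `test` changes
runs <- rle(as.character(data$test))
data$run <- rep(seq_along(runs$lengths), times = runs$lengths)

# Number only the 'True' runs; rows outside a 'True' run stay NA
is_true <- data$test == "True"
data$group <- NA_integer_
data$group[is_true] <- match(data$run[is_true], unique(data$run[is_true]))
data$group <- factor(data$group)

# First and last row of each 'True' group (rows with NA group are dropped)
bounds <- aggregate(order ~ group, data = data, FUN = range)
```

Here `data$group` has two levels, one per 'True' run, and `bounds` holds the start and stop of each run. With real data, the TRUE/FALSE column from the threshold test would take the place of `test`.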
I have a set of true/false data I need to prepare for a chi-squared analysis in R. Currently it's organized by time of day in several lists. What would be the best way to add a variable to each of these lists for time of day, fill in each list's points with the time they were collected, then combine them into one table?
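One way this could look, assuming the data is a named list with one logical vector per time of day. The names `obs_by_time`, `time_of_day`, and `result`, and the values, are hypothetical, since the actual structure is not shown.

```r
# Hypothetical input: one logical vector per time of day
obs_by_time <- list(
  morning   = c(TRUE, FALSE, TRUE),
  afternoon = c(FALSE, FALSE, TRUE, TRUE),
  evening   = c(TRUE, TRUE)
)

# Build one small data frame per list element, tagging every point with the
# name of the list it came from, then stack them into a single table
long <- do.call(rbind, Map(function(values, label) {
  data.frame(time_of_day = label, result = values)
}, obs_by_time, names(obs_by_time)))

# Contingency table of time of day against TRUE/FALSE
tab <- table(long$time_of_day, long$result)
```

The resulting `tab` can then be passed to `chisq.test(tab)`.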
My data contains 29 different IDs, each containing sub IDs with their measurement value, upper bound value, lower bound value, and time stamp.
I am trying to create a ggplot with points, where I can filter for one ID and plot its sub IDs with the measurement value, lower bound, and upper bound.
I am using the code below, which I got with help from Stack Overflow. Could anyone show me how to incorporate the filter function here?
My data looks like this, as a sample:
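data <- data.frame(c("ST_10","ST_11","ST_10","ST_10","ST_11","ST_10"), c("M1","M3","M2","M5","M7","M9"), c(0.3,0.5,1.98,198), c(0.2,0.4,1.98,199), c(0.1,0.3,1.0,190))  # abbreviated sample: the numeric columns list only four of the six values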
This is just a sample, and I am using the code below.
ggplot(data)+
geom_jitter(aes(x=Day.ID., y=Measurement.Value, col=as.factor(Day.ID.)))+
geom_line(aes(x=Day.ID., y=LSL.ID), colour='orange')+
geom_line(aes(x=Day.ID., y=USL.ID), colour='orange')+
facet_wrap(~DESC)
In the code, data is the name of my data frame, Day is the timestamp, LSL is the lower bound, USL is the upper bound, and DESC is the sub ID.
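A minimal sketch of filtering before plotting: subset the data frame to one ID and pass the result to `ggplot()` in place of the full data. The column name `ID` and all values below are assumptions made for illustration. The other column names are taken from the plotting code above.

```r
# Hypothetical data; the column `ID` and all values are assumptions
data <- data.frame(
  ID                = c("ST_10", "ST_10", "ST_10", "ST_10", "ST_11", "ST_11"),
  DESC              = c("M1", "M1", "M2", "M2", "M3", "M3"),
  Day.ID.           = c(1, 2, 1, 2, 1, 2),
  Measurement.Value = c(0.30, 0.25, 1.98, 1.90, 0.50, 0.45),
  LSL.ID            = c(0.10, 0.10, 1.00, 1.00, 0.30, 0.30),
  USL.ID            = c(0.40, 0.40, 2.50, 2.50, 0.60, 0.60)
)

# Keep only the rows for one ID before handing the data to ggplot()
one_id <- subset(data, ID == "ST_10")

# Same plot as before, built from the filtered rows (skipped if ggplot2 is absent)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(one_id) +
    geom_jitter(aes(x = Day.ID., y = Measurement.Value, col = as.factor(Day.ID.))) +
    geom_line(aes(x = Day.ID., y = LSL.ID), colour = "orange") +
    geom_line(aes(x = Day.ID., y = USL.ID), colour = "orange") +
    facet_wrap(~DESC)
}
```

With dplyr loaded, `filter(data, ID == "ST_10")` would give the same rows as the `subset()` call.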
I have a large data frame with around 190000 rows. The data frame has a label column storing 12 nominal categories. I want to change the weight column value of each row based on the label value of that row. For example, if the label of a row is "Res", I want to change its weight value to 0.5, and if it is "Condo", I want to change its weight value to 2.
I know it is easy to implement this with an if/else statement, but given the number of rows, the processing takes far too long. I wanted to use cut(), but it seems that cut() categorizes based on intervals, not nominal categories. I would appreciate any suggestion that can decrease the processing time.
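One vectorised option is a named lookup vector indexed by the label column, which avoids looping over rows. This is a sketch under assumptions: the column names `label` and `weight` and the example rows are hypothetical, and only the "Res" and "Condo" weights come from the description above.

```r
# Hypothetical data frame; only "Res" -> 0.5 and "Condo" -> 2 are from the question
df <- data.frame(
  label  = c("Res", "Condo", "Res", "Office", "Condo"),
  weight = c(1, 1, 1, 1, 1)
)

# Named vector used as a lookup table: names are labels, values are new weights
new_weight <- c(Res = 0.5, Condo = 2)

# Index the table by label for all rows at once
looked_up <- unname(new_weight[as.character(df$label)])

# Labels missing from the table come back as NA; keep their existing weight
df$weight <- ifelse(is.na(looked_up), df$weight, looked_up)
```

Extending `new_weight` to all 12 categories covers every row in a single indexing step.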