I have a very large data frame with Timestamp, StationId and Value as column names.
I would like to create a new matrix where the rows are Timestamps, columns are StationIds and the matrix elements are Values.
I have tried doing so using a loop but it is taking very long:
for (row in 1:nrow(res)) {
  rmatrix[toString(res[row, "Timestamp"]), toString(res[row, "StationId"])] <-
    res[row, "Value"]
}
The 'res' data frame has one row per observation. The timestamps cover a year at 5-minute intervals, and there are 62 unique station ids. The elements in the Value column are rainfall values.
The rmatrix I'm trying to rearrange the data into has one row per unique 5-minute timestamp and one column per station id. Each element of the matrix is supposed to be the rainfall value for that station at that time.
Is there a faster way to do this?
library(tidyverse)
df <- res %>% spread(StationId, Value)
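As a side note, spread() has since been superseded in tidyr by pivot_wider(), which does the same reshape; a minimal sketch, assuming the res columns from the question:

```r
library(tidyr)

# One row per Timestamp, one column per StationId, cells filled with Value;
# Timestamp/StationId combinations absent from res become NA.
wide <- pivot_wider(res, names_from = StationId, values_from = Value)
```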
I have a data frame, in which data for different countries are listed vertically with this pattern:
country | time | value
I want to transform it into a data frame in which each row is a specific time period and each column holds the values for one country. Data are monthly.
time | countryA-value | countryB-value | countryC-value
Moreover, not all periods are present: when data is missing, the row is simply absent rather than filled with NA or similar. I thought of two possible solutions, but they seem too complicated and inefficient, so I do not write the code here.
If the value in a cell of the "time" column is more than one month after the cell above, while the cells to the left are the same (i.e. the data pertain to the same country), then we have a gap. I would fill the gap and do this recursively until all missing dates are included.
At that point every country has the same number of observations, and I can simply copy a number of cells equal to the number of observations.
Drawback: it does not seem very efficient.
I could create a list of time periods using the command
allDates <- seq.Date(from = as.Date('2020-02-01'), to = as.Date('2021-01-01'), by = 'month') - 1
Then, for each country's subset of the table, I look up each period of allDates. If the value exists, I copy it; if not, I fill with NA.
Drawback: I have no idea which function I could use for this purpose.
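The lookup described in the second idea can be done with a join rather than a hand-written search; a minimal sketch, assuming a single country's subset dfCountry with date and value columns (hypothetical names):

```r
# The full calendar of month-end dates we expect to see
allDates <- seq.Date(from = as.Date('2020-02-01'),
                     to = as.Date('2021-01-01'), by = 'month') - 1

# Left join against the calendar: dates missing from dfCountry
# come back as rows with value = NA.
filled <- merge(data.frame(date = allDates), dfCountry,
                by = "date", all.x = TRUE)
```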
Below is the code to create a small table with two missing rows, namely data2:
data <- data.frame(matrix(NA, 24, 3))
colnames(data) <- c("date", "country", "value")
data["date"] <- rep((seq.Date(from = as.Date('2020-02-01'), to = as.Date('2021-01-01'), by = 'month')-1), 2)
data["country"] <- rep(c("US", "CA"), each = 12)
data["value"] <- round(runif(24, 0, 1), 2)
data2 <- data[c(-4,-5),]
I solved the problem following the suggestion of r2evans: I checked the function dcast and obtained exactly what I wanted.
I used the code
reshape2::dcast(dataFrame, yearMonth ~ country, fill = NA)
where dataFrame is the name of the data frame, yearMonth is the name of the column in which the date is written, and country is the name of the column in which the country is written.
The option fill = NA fills all gaps in the data with NA.
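Applied to the data2 example from the question (rebuilt here with date/country/value column names), a minimal sketch:

```r
library(reshape2)

# Rebuild the question's example: 24 monthly rows, then drop two US rows
data <- data.frame(
  date = rep(seq.Date(from = as.Date('2020-02-01'),
                      to = as.Date('2021-01-01'), by = 'month') - 1, 2),
  country = rep(c("US", "CA"), each = 12),
  value = round(runif(24, 0, 1), 2)
)
data2 <- data[c(-4, -5), ]

# Wide format: one row per date, one column per country;
# the two dropped dates appear with NA in the US column.
wide <- dcast(data2, date ~ country, value.var = "value", fill = NA)
```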
I have 2 data frames of unequal lengths, each with a column of timestamps. I would like to add the corresponding ID from df2 to df1 as a new column whenever the time difference is less than 60 minutes, so that I know which ID in df2 (with a specific appointment time) is responsible for which entries in df1. Each ID should have 8 entries in df1.
To calculate the difference between each element in df1 and df2, I've tried
outer(df1$DataEntryTime, df2$ApptTime, '-')
and got a matrix of results.
What do I need to do next to build a conditional statement so it can return the ID# to df1 based on the results?
Many thanks!
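One way to turn that matrix into the new column (a sketch, not from the original thread: it assumes both time columns are POSIXct, that df2 has an ID column, and that at most one appointment lies within 60 minutes of each entry):

```r
# Gap in minutes between every data-entry time and every appointment time
diffs <- abs(outer(as.numeric(df1$DataEntryTime),
                   as.numeric(df2$ApptTime), "-")) / 60

# For each row of df1, the appointment with the smallest gap
nearest <- max.col(-diffs)

# Attach the ID only where that smallest gap is under 60 minutes
gap <- diffs[cbind(seq_len(nrow(df1)), nearest)]
df1$ID <- ifelse(gap < 60, df2$ID[nearest], NA)
```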
I have a data frame (size: 36 rows, 2000 columns) whose last row contains a date for each column in the format "YYYY-MM-DD".
How can I sort the columns using the dates in that last row?
My attempts so far:
df[order(as.Date(df["Dates",], format="%Y-%m-%d")),]
df[order(lubridate::ymd(df["Dates",])),]
Thanks
We can extract the last row and order the columns by it:
df1[order(as.Date(unlist(tail(df1, 1))))]
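A minimal sketch of why this works, with a hypothetical 3-column df1 whose last row holds the dates:

```r
df1 <- data.frame(a = c("x", "2021-03-01"),
                  b = c("y", "2020-12-31"),
                  c = c("z", "2021-01-15"))

# tail(df1, 1) is the last row; unlist() turns it into a character vector,
# as.Date() parses it, and order() gives the column permutation.
df1[order(as.Date(unlist(tail(df1, 1))))]
# columns come back in date order: b, c, a
```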
I need to create a bunch of subset data frames out of a single big df, based on a date column (e.g. "Aug 2015" in month-year format). It should work much like the subset function, except that the number of subset dfs formed should change dynamically depending on the values present in the date column.
All the subset data frames need to have a similar structure, such that the date column value is one and the same within each subset df.
Suppose my big df currently holds the last 10 months of data: I need 10 subset data frames now, and 11 dfs if I run the same command next month (with 11 months of base data).
I have tried something like below, but after each iteration the subset subdf_i gets overwritten. Thus I end up with only one subset df, holding the last value of the month column.
I expected 45 subset dfs, subdf_1, subdf_2, ..., subdf_45, one for each of the 45 unique values of the month column.
uniqmnth <- unique(df$mnth)
for (i in 1:length(uniqmnth)) {
  subdf_i <- subset(df, mnth == uniqmnth[i])
  i == i + 1
}
I hope there should be some option in the subset function or any looping might do. I am a beginner in R, not sure how to arrive at this.
I think the perfect solution for this is to use assign(), so that the iterating variable i gets appended to the name of each of the 45 subsets. Thanks to my friend for the note. Here is the solution that avoids the subset data frame being overwritten on each run of the loop.
uniqmnth <- unique(df$mnth)
for (i in seq_along(uniqmnth)) {
  assign(paste0("subdf_", i), subset(df, mnth == uniqmnth[i]))
}
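A side note, not from the original answer: keeping the subsets in a named list with split() is usually easier to work with than 45 separate variables; a minimal sketch assuming the same df and mnth column:

```r
# One list element per unique month; element names are the month values
subdfs <- split(df, df$mnth)

# Access one month's subset by position or by its month-value name
first_month <- subdfs[[1]]
```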
I have a time series data frame (1 entry per min) which has a date column, and I have made a list of all the unique dates in that column. The purpose is to perform functions on the data one day at a time. So I want a loop where I take all the rows whose date equals the first date in my unique-dates list, work with them, output, then do the same for the second unique date, third, fourth, etc.
When I use the following code I get zero rows returned; days and df$Date are both factors.
df <- read.csv("file1.csv")
days <- as.list(unique(df$Date))
for (i in 1:length(days)) {
  df <- df[(df$Date == days[i]), ]  # Only from this date
  # Further work with data here
}
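The zero rows likely come from df being overwritten inside the loop: after the first iteration, df only contains the first day's rows, so later comparisons match nothing. A sketch of a fix, assuming the same file and column names: subset into a temporary variable and leave df intact (also note days[[i]], not days[i], to pull the element out of the list):

```r
df <- read.csv("file1.csv")
days <- unique(df$Date)  # a plain vector is simpler than a list here

for (i in seq_along(days)) {
  daily <- df[df$Date == days[[i]], ]  # subset, leaving df untouched
  # Further work with 'daily' here
}
```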