How to add a date to each row for a column in a data frame? - r

df <- data.frame(DAY = character(), ID = character())
I'm looping over DAYS (for each DAYS[i]), getting the IDs for that day and appending them to the data frame:
df <- rbind(df, data.frame(ID = IDs))
I want to add the DAY[i] in a second column across each row in a loop.
How do I do that?

As @Pascal says, this isn't the best way to create a data frame in R. R is a vectorised language, so generally you don't need for loops.
I'm assuming each ID is unique, so you can create a vector of IDs from 1 to 10:
ID <- 1:10
Then, you need a vector for your DAYs which can be the same length as your IDs, or can be recycled (i.e. if you only have a certain number of days that are repeated in the same order you can have a smaller vector that's reused). Use c() to create a vector with more than one value:
DAY <- c(1, 2, 9, 4, 4)
df <- data.frame(ID, DAY)
df
# ID DAY
# 1 1 1
# 2 2 2
# 3 3 9
# 4 4 4
# 5 5 4
# 6 6 1
# 7 7 2
# 8 8 9
# 9 9 4
# 10 10 4
Or with a DAY vector the same length as ID, here drawn at random (with replace = TRUE, so values may repeat):
DAY <- sample(1:100, 10, replace = TRUE)
df <- data.frame(ID, DAY)
df
# ID DAY
# 1 1 61
# 2 2 30
# 3 3 32
# 4 4 97
# 5 5 32
# 6 6 74
# 7 7 97
# 8 8 73
# 9 9 16
# 10 10 98
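If the IDs really do have to be fetched one day at a time, the day can be attached while each piece is built, and the pieces bound once at the end rather than growing df inside the loop. The following is only a sketch: the DAYS values and the get_ids() helper are hypothetical stand-ins for whatever produces the IDs in the question.

```r
# Hypothetical inputs: the days to process and a stand-in ID lookup.
DAYS <- c("2020-01-01", "2020-01-02", "2020-01-03")
get_ids <- function(day) {
  # Placeholder returning made-up IDs; replace with the real lookup.
  switch(day,
         "2020-01-01" = c("a", "b"),
         "2020-01-02" = c("c"),
         "2020-01-03" = c("d", "e", "f"))
}

# Build one small data frame per day, repeating the day for each ID.
pieces <- lapply(DAYS, function(day) {
  ids <- get_ids(day)
  data.frame(DAY = rep(day, length(ids)), ID = ids, stringsAsFactors = FALSE)
})

# Bind all pieces in a single step.
df <- do.call(rbind, pieces)
df
```

Using rep(day, length(ids)) keeps the piece valid even when a day has no IDs at all.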

Related

Expand data frame and add rowsums from another dataframe

I am trying to find a faster way of accomplishing the following code since my actual dataset is very large. I would like to get rid of the for loop altogether. I am trying to duplicate each row in xdf into a new data frame based on the number of columns in values. Then, next to each entry in the new dataset, show the row sums from column 1 in values up to the column j.
library(tibble)
# tibble() replaces the deprecated data_frame()
xdf <- tibble(
  x = c('a', 'b', 'c'),
  y = c(4, 5, 6)
)
values <- tibble(
  col_1 = c(5, 9, 1),
  col_2 = c(4, 7, 6),
  col_3 = c(1, 5, 2),
  col_4 = c(7, 8, 5)
)
for (j in seq(ncol(values))) {
  if (j == 1) {
    Temp <- cbind(xdf, z = rowSums(values[1:j]))
  } else {
    Temp <- rbind(Temp, cbind(xdf, z = rowSums(values[1:j])))
  }
}
print(Temp)
The output should be:
x y z
1 a 4 5
2 b 5 9
3 c 6 1
4 a 4 9
5 b 5 16
6 c 6 7
7 a 4 10
8 b 5 21
9 c 6 9
10 a 4 17
11 b 5 29
12 c 6 14
Is there a shorter way to accomplish this?
This is the closest answer that I could get on SO.
How to expand data frame based on values?
I am new to R, so sorry for the longwinded code.
Here's one base R option :
Repeat the rows of xdf once for each column in values. Then, for each j from 1 to ncol(values), compute the row sums of the first j columns and store the concatenated results as a new column z in the final data frame.
newdf <- xdf[rep(seq(nrow(xdf)), ncol(values)), ]
newdf$z <- c(sapply(seq(ncol(values)), function(x) rowSums(values[1:x])))
newdf
# A tibble: 12 x 3
# x y z
# <chr> <dbl> <dbl>
# 1 a 4 5
# 2 b 5 9
# 3 c 6 1
# 4 a 4 9
# 5 b 5 16
# 6 c 6 7
# 7 a 4 10
# 8 b 5 21
# 9 c 6 9
#10 a 4 17
#11 b 5 29
#12 c 6 14
A concise one-liner, as suggested by @sindri_baldur, doesn't require repeating the rows explicitly, because cbind() recycles the rows of xdf to match the length of z:
cbind(xdf, z = c(sapply(seq(ncol(values)), function(x) rowSums(values[1:x]))))
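The sapply() call recomputes each row sum from scratch for every j. As a hedged alternative, the running sums can be accumulated column by column with Reduce(); this sketch uses plain data frames instead of tibbles so that it needs no packages, and the result name newdf is just illustrative.

```r
xdf <- data.frame(x = c("a", "b", "c"), y = c(4, 5, 6))
values <- data.frame(
  col_1 = c(5, 9, 1),
  col_2 = c(4, 7, 6),
  col_3 = c(1, 5, 2),
  col_4 = c(7, 8, 5)
)

# Running column-wise sums: element j is col_1 + ... + col_j.
running <- Reduce(`+`, values, accumulate = TRUE)

# Stack xdf once per column of values and attach the running sums.
newdf <- xdf[rep(seq_len(nrow(xdf)), times = ncol(values)), ]
rownames(newdf) <- NULL
newdf$z <- unlist(running)
newdf
```

Each column of values is added exactly once, which may matter when values has many columns.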

Assign a sequence of numbers to data frame rows

I am trying to assign a sequence of numbers to rows in a data frame based off of the row position. I have 330 rows and I want each set of nine rows to be named one through nine in a new ID column. For example, I want rows 1-9 labeled 1-9, rows 10-18 labeled 1-9, rows 19-27 labeled 1-9 and so on.
I have tried to use this code:
test <- temp.df %>% mutate(id = seq(from = 1, to = 330, along.width= 9))
test
but it just ends up creating a new column that labels the rows 1-330, as shown below.
Time Temperature ID
09:36:52 25.4 1
09:36:59 25.4 2
09:37:07 25.4 3
09:37:14 25.4 4
09:37:21 25.4 5
09:37:29 25.4 6
09:37:36 25.4 7
09:37:43 25.5 8
09:37:51 25.5 9
09:37:58 25.5 10
What is the best way to accomplish my goal?
It would be easier to help if you could provide a snippet of the data frame temp.df. Based on the information you gave, the following may help. It is not a very flexible solution, but it builds the id column you describe. Because 330 is not a multiple of 9, length.out is used to cut the repeated sequence off at the number of rows.
n_repeated <- 9 # size of each ID block
N_rows <- 330 # number of observations
df <- data.frame(id = rep(seq_len(n_repeated), length.out = N_rows))
head(df, n = 15)
# id
# 1 1
# 2 2
# 3 3
# 4 4
# 5 5
# 6 6
# 7 7
# 8 8
# 9 9
# 10 1
# 11 2
# 12 3
# 13 4
# 14 5
# 15 6
[Edited]
Using mutate() from dplyr, this line should do it:
test <- temp.df %>% mutate(id = rep(seq_len(9), length.out = n()))
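Since 330 is not a multiple of 9, the repeating labels have to stop at the number of rows rather than after a whole number of blocks. Here is a minimal base R sketch of two equivalent ways to do that; temp.df below is a made-up stand-in with a single placeholder column, and the block column is an optional extra that is not part of the question.

```r
# Made-up stand-in for the real temp.df (330 rows).
temp.df <- data.frame(Temperature = rep(25.4, 330))
n <- nrow(temp.df)

# Option 1: repeat 1..9 and truncate to n rows.
temp.df$id <- rep_len(1:9, n)

# Option 2: the same labels from modular arithmetic on the row position.
id_mod <- (seq_len(n) - 1) %% 9 + 1

# Optional: which block of nine each row belongs to.
temp.df$block <- (seq_len(n) - 1) %/% 9 + 1

head(temp.df, 11)
```

The last block is incomplete: rows 325 to 330 get the labels 1 to 6.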

Moving down columns in data frames in R

Suppose I have the following data frame:
df<-data.frame(step1=c(1,2,3,4),step2=c(5,6,7,8),step3=c(9,10,11,12),step4=c(13,14,15,16))
step1 step2 step3 step4
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16
and what I have to do is something like the following:
df2<-data.frame(col1=c(1,2,3,4,5,6,7,8,9,10,11,12),col2=c(5,6,7,8,9,10,11,12,13,14,15,16))
col1 col2
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
7 7 11
8 8 12
9 9 13
10 10 14
11 11 15
12 12 16
How can I do that? Consider that more steps can be included (for example, 20 steps).
Thanks!!
We can design a function to achieve this task. df_final is the final output. The bin argument lets the user specify how many consecutive columns are stacked into each new column.
# A function to conduct data transformation
trans_fun <- function(df, bin = 3){
  # Calculate the number of new columns
  new_ncol <- (ncol(df) - bin) + 1
  # Create a list to store all data frames
  df_list <- lapply(1:new_ncol, function(num){
    return(df[, num:(num + bin - 1)])
  })
  # Convert each data frame to a vector
  dt_list2 <- lapply(df_list, unlist)
  # Convert dt_list2 to data frame
  df_final <- as.data.frame(dt_list2)
  # Set the column and row names of df_final
  colnames(df_final) <- paste0("col", 1:new_ncol)
  rownames(df_final) <- 1:nrow(df_final)
  return(df_final)
}
# Apply the trans_fun
df_final <- trans_fun(df)
df_final
col1 col2
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
7 7 11
8 8 12
9 9 13
10 10 14
11 11 15
12 12 16
Here is a method using dplyr and reshape2 - this assumes all of the columns are the same length.
library(dplyr)
library(reshape2)
# Drop the last column from the data frame
# (the parentheses matter: 1:ncol(df)-1 evaluates to 0:3 and only works by accident)
df[, 1:(ncol(df) - 1)] %>%
  melt() %>%
  dplyr::select(col1 = value) -> col1
# Drop the first column from the data frame
df %>%
  dplyr::select(-step1) %>%
  melt() %>%
  dplyr::select(col2 = value) -> col2
# Combine the data frames
bind_cols(col1, col2)
This should do the work:
df2 <- data.frame(col1 = 1:(length(df$step1) + length(df$step2) + length(df$step3)))
df2$col1 <- c(df$step1, df$step2, df$step3)
df2$col2 <- c(df$step2, df$step3, df$step4)
Things to note:
The important thing in the first line of the code is that the data frame is created with the right number of rows (here 12, i.e. three stacked columns of 4 values each); otherwise the assignments that follow fail.
Assigning to a column that does not exist creates it under that name.
To delete a column in R, assign NULL to it: df2$col <- NULL
Are you not just looking to do:
# Drop the last column with -ncol(df); -nrow(df) only worked because df is 4 x 4
df2 <- data.frame(col1 = unlist(df[, -ncol(df)]),
                  col2 = unlist(df[, -1]))
rownames(df2) <- NULL
df2
col1 col2
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
7 7 11
8 8 12
9 9 13
10 10 14
11 11 15
12 12 16
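The same idea extends to any number of steps (the question mentions 20), because it only relies on dropping the last or the first column. A small sketch, wrapped in a helper whose name shift_pairs is made up for illustration:

```r
# Stack all but the last column into col1 and all but the first into col2.
shift_pairs <- function(df) {
  stopifnot(ncol(df) >= 2)
  data.frame(
    col1 = unlist(df[-ncol(df)], use.names = FALSE),
    col2 = unlist(df[-1], use.names = FALSE)
  )
}

df <- data.frame(step1 = 1:4, step2 = 5:8, step3 = 9:12, step4 = 13:16)
df2 <- shift_pairs(df)

# A wider input works the same way: 5 steps of 4 rows give 16 rows.
df_wide <- cbind(df, step5 = 17:20)
df2_wide <- shift_pairs(df_wide)
```

Indexing with df[-ncol(df)] (list-style, no comma) always returns a data frame, so unlist() behaves the same even when only one column remains.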

R - compare rows consecutively in two data frames and return a value

I have the following two data frames:
df1 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"),
temp=c("10","15","16","25","13","17","20","5","16","25","30","37"))
df2 <- data.frame(period=c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"),
max_temp=c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"),
group=c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8"))
I would like to:
1. Consecutively for each row, check if the value in the month column in df1 matches that in the period column of df2, i.e. df1$month == df2$period.
2. If step 1 is not TRUE, i.e. df1$month != df2$period, then repeat step 1 and compare the value in df1 with the value in the next row of df2, and so forth until df1$month == df2$period.
3. If df1$month == df2$period, check if the value in the temp column of df1 is less than or equal to that in the max_temp column of df2, i.e. df1$temp <= df2$max_temp.
4. If df1$temp <= df2$max_temp, return the value in that row for the group column in df2 and add this value to df1, in a new column called "new_group".
5. If step 3 is not TRUE, i.e. df1$temp > df2$max_temp, then go back to step 1 and compare the same row in df1 with the next row in df2.
An example of the output data frame I'd like is:
df3 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"),
temp=c("10","15","16","25","13","17","20","5","16","25","30","37"),
new_group=c("1","1","1","2","3","4","4","5","6","7","7","8"))
I've been playing around with the ifelse function and need some help or re-direction. Thanks!
I found the procedure for computing new_group hard to follow as stated. As I understand it, you're trying to create a variable called new_group in df1. For row i of df1, the new_group value is the group value of the first row in df2 that:
Is indexed i or higher
Has a period value matching df1$month[i]
Has a max_temp value no less than df1$temp[i]
I approached this by using sapply called on the row indices of df1:
fxn = function(idx) {
  # Potentially matching indices in df2
  pm = idx:nrow(df2)
  # Matching indices in df2
  m = pm[df2$period[pm] == df1$month[idx] &
         as.numeric(as.character(df1$temp[idx])) <=
         as.numeric(as.character(df2$max_temp[pm]))]
  # Return the group associated with the first matching index
  return(df2$group[m[1]])
}
df1$new_group = sapply(seq(nrow(df1)), fxn)
df1
# month temp new_group
# 1 1 10 1
# 2 1 15 1
# 3 1 16 1
# 4 1 25 2
# 5 2 13 3
# 6 2 17 4
# 7 2 20 4
# 8 3 5 5
# 9 3 16 6
# 10 3 25 7
# 11 3 30 7
# 12 3 37 8
library(data.table)
dt1 <- data.table(df1, key="month")
dt2 <- data.table(df2, key="period")
## add a row index
dt1[, rn1 := seq(nrow(dt1))]
dt3 <-
unique(dt1[dt2, allow.cartesian=TRUE][, new_group := group[min(which(temp <= max_temp))], by="rn1"], by="rn1")
## Keep only the columns you want
dt3[, c("month", "temp", "max_temp", "new_group"), with=FALSE]
month temp max_temp new_group
1: 1 1 19 1
2: 1 3 19 1
3: 1 4 19 1
4: 1 7 19 1
5: 2 2 1 3
6: 2 5 1 3
7: 2 6 1 4
8: 3 10 18 5
9: 3 4 18 5
10: 3 7 18 5
11: 3 8 18 5
12: 3 9 18 5

Keep columns of a data frame based on a data frame

I have a data frame, called df, which contains 4000 values. I have a list of 1000 column numbers, in a data frame called list, which is 1000 rows by 1 column. How can I keep the rows with the numbers in list in the data frame df and throw the rest out? I already tried using:
listv <- as.vector(list)
and then using
dfnew <- df[,listv]
but I get the error
Error in .subset(x, j) : invalid subscript type 'list'
You're mixing up rows and columns subsetting. Here is a minimal example:
df <- data.frame(matrix(1:21, ncol = 3))
df
# X1 X2 X3
# 1 1 8 15
# 2 2 9 16
# 3 3 10 17
# 4 4 11 18
# 5 5 12 19
# 6 6 13 20
# 7 7 14 21
list <- data.frame(V1 = c(1, 4, 6))
list
# V1
# 1 1
# 2 4
# 3 6
df[list[, 1], ]
# X1 X2 X3
# 1 1 8 15
# 4 4 11 18
# 6 6 13 20
df[unlist(list), ]
# X1 X2 X3
# 1 1 8 15
# 4 4 11 18
# 6 6 13 20
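Note also that as.vector(list) doesn't create a vector, as you thought it would: called on a data frame, it still returns a list, which is why the error mentions subscript type 'list'. You need unlist() here (as I used in the last example).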
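To see the difference concretely, and to cover both readings of the question (keeping rows or keeping columns), here is a small sketch with made-up data. The names wide, idx, and idx_vec are illustrative and do not come from the question.

```r
# Made-up data: 7 rows and 6 columns (X1..X6).
wide <- data.frame(matrix(1:42, ncol = 6))
idx <- data.frame(V1 = c(1, 4, 6))

# as.vector() on a data frame still gives a list, not an atomic vector.
still_list <- is.list(as.vector(idx))

# unlist() flattens the one-column data frame into a plain numeric vector.
idx_vec <- unlist(idx, use.names = FALSE)

kept_rows <- wide[idx_vec, ]   # index before the comma: keep rows 1, 4, 6
kept_cols <- wide[, idx_vec]   # index after the comma: keep columns X1, X4, X6
```

Which of the two subsets is wanted depends on whether the numbers in list refer to row positions or column positions.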
