Assign a sequence of numbers to data frame rows - r

I am trying to assign a sequence of numbers to rows in a data frame based off of the row position. I have 330 rows and I want each set of nine rows to be named one through nine in a new ID column. For example, I want rows 1-9 labeled 1-9, rows 10-18 labeled 1-9, rows 19-27 labeled 1-9 and so on.
I have tried to use this code:
test <- temp.df %>% mutate(id = seq(from = 1, to = 330, along.width= 9))
test
but it just ends up just creating a new column that labels the rows 1-330 as shown below.
Time Temperature ID
09:36:52 25.4 1
09:36:59 25.4 2
09:37:07 25.4 3
09:37:14 25.4 4
09:37:21 25.4 5
09:37:29 25.4 6
09:37:36 25.4 7
09:37:43 25.5 8
09:37:51 25.5 9
09:37:58 25.5 10
What is the best way to accomplish my goal?

I think it would be easier to help if you could provide a snippet of the data.frame temp.df. In the meantime, the following may help. It is not a very flexible solution, but it is based on the information you provided. Note that length.out is needed because 330 is not a multiple of 9, so the last block is incomplete.
n_repeated <- 9 # block of IDs
N_rows <- 330 # number of observations
df <- data.frame(id = rep(seq_len(n_repeated), length.out = N_rows))
head(df,n = 15)
#> head(df,n = 15)
# id
# 1 1
# 2 2
# 3 3
# 4 4
# 5 5
# 6 6
# 7 7
# 8 8
# 9 9
# 10 1
# 11 2
# 12 3
# 13 4
# 14 5
# 15 6
[Edited]
Using mutate from dplyr, this line should do it (length.out = n() keeps the new column the same length as the data frame):
test <- temp.df %>% mutate(id = rep(seq_len(9), length.out = n()))
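A base R alternative that avoids rep() altogether, as a minimal sketch: the position within each block of nine is the remainder of the zero-based row number divided by nine. The data frame below is only a stand-in for temp.df, and the block column is an extra illustration that the question did not ask for.

```r
# Stand-in for temp.df: 330 rows with a placeholder Temperature column (assumed data)
temp.df <- data.frame(Temperature = rep(25.4, 330))

block_size <- 9
row_index <- seq_len(nrow(temp.df))

# Position within each block of nine: 1..9, 1..9, ... (the last block is partial)
temp.df$id <- (row_index - 1) %% block_size + 1

# Optional: which block each row belongs to (1 for rows 1-9, 2 for rows 10-18, ...)
temp.df$block <- (row_index - 1) %/% block_size + 1

head(temp.df$id, 12)
```

Because this depends only on the row position, it works for any number of rows, whether or not it is a multiple of the block size.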

Related

Create new columns using mutate and across beside existing columns in R

I have the following sample data, where I am trying to get the new columns to be directly beside the existing columns and not at the end of the data frame. I do not want to use sort as I need to keep the order.
library (dplyr)
df <- data.frame(data_in= 2:10, #Data frame
data_ft= 3:11,
data_mile= 4:12)
df
data_in data_ft data_mile
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
7 8 9 10
8 9 10 11
9 10 11 12
df_new<- df%>%
mutate(across(contains("in"), #Why this does not work?
~cbind(.x * 25.4),
.names = "{sub('in', 'mm', col)}")) # ETC
How can I let the new columns be directly beside the existing columns they came from and NOT at the end of the data frame? Also I do NOT want to use sort as I have many columns and need to maintain the order of the data frame.
I'm new to R so please bear with me.
I have tried using add_column as well.
I'm expecting the data frame to look like this:
data_in data_mm data_ft data_cm data_mile data_km
1 25.4 2 60.96 4 6.4
This gets the desired order:
df <- data.frame(data_in = 2:10, # Data frame
data_ft = 3:11,
data_mile = 4:12)
df <- df %>% mutate(across(everything(), ~ cbind(., . * 2))) # Works like this
df
But when I used contains() and names() in the first code above, the outcome is different.
You can specify the position by setting the .after argument in mutate. See documentation here: https://dplyr.tidyverse.org/reference/mutate.html
df_new <- df %>%
mutate(across(contains("in"),
~ .x * 25.4,
.names = "{sub('in', 'mm', col)}"), .after = 1)
This will produce the desired output:
> df_new
data_in data_mm data_ft data_mile
1 2 50.8 3 4
2 3 76.2 4 5
3 4 101.6 5 6
4 5 127.0 6 7
5 6 152.4 7 8
6 7 177.8 8 9
7 8 203.2 9 10
8 9 228.6 10 11
9 10 254.0 11 12
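To place a converted column beside each of several source columns, the same .after argument can be applied once per column. The following is only a sketch: the conversions list, its multipliers, and the new column names are assumptions chosen to match the layout the question asks for, not part of the original answer.

```r
library(dplyr)

df <- data.frame(data_in = 2:10, data_ft = 3:11, data_mile = 4:12)

# Assumed conversions: new column name and multiplier for each source column
conversions <- list(
  data_in   = list(name = "data_mm", multiplier = 25.4),      # inches -> millimetres
  data_ft   = list(name = "data_cm", multiplier = 30.48),     # feet -> centimetres
  data_mile = list(name = "data_km", multiplier = 1.609344)   # miles -> kilometres
)

df_new <- df
for (src in names(conversions)) {
  new_name <- conversions[[src]]$name
  mult <- conversions[[src]]$multiplier
  # Create the converted column and place it directly after its source column
  df_new <- df_new %>%
    mutate("{new_name}" := .data[[src]] * mult, .after = all_of(src))
}

names(df_new)
```

The original column order is kept because each new column is inserted relative to its own source column rather than by sorting.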

multiple condition then creating new column

I have a dataset with two columns, and I need to create a third column based on conditions on the first and second ones.
set.seed(1)
x1=(sample(1:10, 100,replace=T))
y1=sample(seq(1,10,0.1),100,replace=T)
z=cbind(x1,y1)
unique(as.data.frame(z)$x1)
z%>%as.data.frame()%>%dplyr::filter(x1==3)
table(x1)
1 2 3 4 5 6 7 8 9 10
7 6 11 14 14 5 11 15 11 6
> z%>%as.data.frame()%>%dplyr::filter(x1==3)
x1 y1
1 3 6.9
2 3 9.5
3 3 10.0
4 3 5.6
5 3 4.1
6 3 2.5
7 3 5.3
8 3 9.5
9 3 5.5
10 3 8.9
11 3 1.2
For example, when I filter x1 == 3, the y1 values can be seen. I need to write 1 in the 11th row (the minimum, 1.2) and 0 in the rest, so I need to find the minimum of y1 within each x1 value. My original dataset has 43545 rows but only 638 unique values in the column that corresponds to x1. table(x1) shows that 1 is repeated 7 times, but in my dataset some values have a frequency of 1 and some have a frequency of 100. I think I should use case_when, but how can I check every y1 to find the smallest and flag it with 1?
If I understand correctly, you are looking for the row with minimal y1 value for each value of x1
library(tidyverse)
z %>% as.data.frame() %>%
group_by(x1) %>%
arrange(y1) %>% # sort all rows by increasing y1 (arrange ignores groups by default), so each group's first row is its minimum
mutate(flag = ifelse(row_number()==1,1,0)) %>% # create flag for first row in group
ungroup()
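If the original row order should be kept, a variant without arrange() is possible. This is a minimal sketch using the same simulated data as the question (built directly as a data frame here); which.min() flags only the first minimum in each group, so ties get a single 1.

```r
library(dplyr)

set.seed(1)
z <- data.frame(
  x1 = sample(1:10, 100, replace = TRUE),
  y1 = sample(seq(1, 10, 0.1), 100, replace = TRUE)
)

flagged <- z %>%
  group_by(x1) %>%
  # which.min() returns the position of the first minimum within the group
  mutate(flag = as.integer(row_number() == which.min(y1))) %>%
  ungroup()

# Each x1 value should have exactly one flagged row
tapply(flagged$flag, flagged$x1, sum)
```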

How to create a function which loops through column index numbers in R?

Consider the following data frame (df):
"id" "a1" "b1" "c1" "not_relevant" "p_a1" "p_b1" "p_c1"
a 2 6 0 x 2 19 12
a 4 2 7 x 3.5 7 11
b 1 9 4 x 7 1.5 4
b 7 5 11 x 8 12 5
I would like to create a new column which shows the sum of the product between two corresponding columns. To write less code I address the columns by their index number. Unfortunately I have no experience in writing functions, so I ended up doing this manually, which is extremely tedious and not very elegant.
Here is a reproducible example of the data frame and what I have tried so far:
id <- c("a","a","b","b")
df <- data.frame(id)
df$a1 <- as.numeric((c(2,4,1,7)))
df$b1 <- as.numeric((c(6,2,9,5)))
df$c1 <- as.numeric((c(0,7,4,11)))
df$not_relevant <- c("x","x","x","x")
df$p_a1 <- as.numeric((c(2,3.5,7,8)))
df$p_b1 <- as.numeric((c(19,7,1.5,12)))
df$p_c1 <- as.numeric((c(12,11,4,5)))
require(dplyr)
df %>% mutate(total = .[[2]]*.[[6]] + .[[3]] *.[[7]]+ .[[4]] *.[[8]])
This leads to the desired result, but as I mentioned is not very efficient:
"id" "a1" "b1" "c1" "not_relevant" "p_a1" "p_b1" "p_c1" "total"
a 2 6 0 x 2 19 12 118.0
a 4 2 7 x 3.5 7 11 105.0
b 1 9 4 x 7 1.5 4 36.5
b 7 5 11 x 8 12 5 171.0
The real data I am working with has many more columns, so I would be glad if someone could show me a way to pack this operation into a function which loops through the column index numbers and matches the correct columns to each other.
Column indices are not a good way to do this. (Not a good way in general...)
Here's a simple dplyr method that assumes the columns are in the correct corresponding order (that is, it will give the wrong result if "a1", "b1", "c1" are in a different order than "p_a1", "p_b1", "p_c1"). You may also need to refine the selection criteria for your real data:
df$total = rowSums(select(df, a1:c1) * select(df, starts_with("p_")))
df
# id a1 b1 c1 not_relevant p_a1 p_b1 p_c1 total
# 1 a 2 6 0 x 2.0 19.0 12 118.0
# 2 a 4 2 7 x 3.5 7.0 11 105.0
# 3 b 1 9 4 x 7.0 1.5 4 36.5
# 4 b 7 5 11 x 8.0 12.0 5 171.0
The other good option would be to convert your data to a long format, where you have a single value column and a single p column, with an "index" column indicating which pair (a1, b1, c1) each row belongs to. Then the operation could be done by group, finally moving back to a wide format.
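The long-format idea could look roughly like the sketch below. It assumes tidyr is available and that every value column has a partner named with a "p_" prefix; the helper columns row, kind, and index are names introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = c("a", "a", "b", "b"),
  a1 = c(2, 4, 1, 7), b1 = c(6, 2, 9, 5), c1 = c(0, 7, 4, 11),
  not_relevant = "x",
  p_a1 = c(2, 3.5, 7, 8), p_b1 = c(19, 7, 1.5, 12), p_c1 = c(12, 11, 4, 5)
)

totals <- df %>%
  mutate(row = row_number()) %>%                      # remember the original row
  pivot_longer(cols = -c(id, not_relevant, row),
               names_to = "name", values_to = "value") %>%
  mutate(kind  = if_else(startsWith(name, "p_"), "p", "v"),  # p column or value column
         index = sub("^p_", "", name)) %>%                   # shared key, e.g. "a1"
  select(row, index, kind, value) %>%
  pivot_wider(names_from = kind, values_from = value) %>%    # one row per (row, index) pair
  group_by(row) %>%
  summarise(total = sum(v * p))

# summarise() returns one row per original row, in increasing order of row
df$total <- totals$total
df
```

Because the pairs are matched by name rather than by position, this version does not depend on the column order.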

calculating column sum for certain row

I am trying to calculate, for each row, the sum of a column over that row and the rows just before it (a rolling sum), in R, using the following code:
df <- data.frame(count=1:10)
for (loop in (1:nrow(df)))
{df[loop,"acc_sum"] <- sum(df[max(1,loop-5):loop,"count"])}
But I don't like the explicit loop here, how can I modify it? Thanks.
According to your question, your desired result is:
df
# count acc_sum
# 1 1 1
# 2 2 3
# 3 3 6
# 4 4 10
# 5 5 15
# 6 6 21
# 7 7 27
# 8 8 33
# 9 9 39
# 10 10 45
This can be done like this:
df <- data.frame(count=1:10)
library(zoo)
df$acc_sum <- rev(rollapply(rev(df$count), 6, sum, partial = TRUE, align = "left"))
To obtain this result, we are reversing the order of df$count, we sum the elements (using partial = TRUE and align = "left" is important here), and we reverse the result to have the vector needed.
rev(rollapply(rev(df$count), 6, sum, partial = TRUE, align = "left"))
# [1] 1 3 6 10 15 21 27 33 39 45
Note that this sums 6 elements, not 5. According to the code in your question, this gives the same output. If you just want to sum 5 rows, just replace the 6 with a 5.
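The same rolling sum can also be obtained in base R from a cumulative sum, without zoo. This is a sketch of a different technique from the rollapply call above: each window sum is the cumulative sum at the current row minus the cumulative sum from `window` rows earlier (taken as 0 where no such row exists).

```r
count <- 1:10
window <- 6   # rows loop-5 .. loop, as in the question's code

cs <- cumsum(count)

# Cumulative sum shifted down by `window` rows, padded with zeros at the top
lagged <- c(rep(0, window), head(cs, -window))[seq_along(cs)]

acc_sum <- cs - lagged
acc_sum
```

Setting window to 5 gives the five-row version.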

How to add a date to each row for a column in a data frame?

df <- data.frame(DAY = character(), ID = character())
I'm running a loop (for i in DAYS[i]), getting the IDs for each day and storing them in a data frame:
df <- rbind(df, data.frame(ID = IDs))
I want to add the DAY[i] in a second column across each row in a loop.
How do I do that?
As @Pascal says, this isn't the best way to create a data frame in R. R is a vectorised language, so generally you don't need for loops.
I'm assuming each ID is unique, so you can create a vector of IDs from 1 to 10:
ID <- 1:10
Then, you need a vector for your DAYs which can be the same length as your IDs, or can be recycled (i.e. if you only have a certain number of days that are repeated in the same order you can have a smaller vector that's reused). Use c() to create a vector with more than one value:
DAY <- c(1, 2, 9, 4, 4)
df <- data.frame(ID, DAY)
df
# ID DAY
# 1 1 1
# 2 2 2
# 3 3 9
# 4 4 4
# 5 5 4
# 6 6 1
# 7 7 2
# 8 8 9
# 9 9 4
# 10 10 4
Or with a vector for DAY that is the same length as ID, so nothing is recycled (values can still repeat, because replace = TRUE):
DAY <- sample(1:100, 10, replace = TRUE)
df <- data.frame(ID, DAY)
df
# ID DAY
# 1 1 61
# 2 2 30
# 3 3 32
# 4 4 97
# 5 5 32
# 6 6 74
# 7 7 97
# 8 8 73
# 9 9 16
# 10 10 98
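If the IDs really do come from a per-day lookup, one way to attach the day to every ID without growing a data frame inside a for loop is to build one small data frame per day and bind them once at the end. This is a minimal sketch: get_ids() and the DAYS values are hypothetical stand-ins for whatever produces the IDs in the question.

```r
# Hypothetical stand-in for the step that returns the IDs for one day
get_ids <- function(day) paste0(day, "_id", seq_len(2))

# Assumed example days
DAYS <- c("2020-01-01", "2020-01-02")

# One data frame per day; the single DAY value is recycled across that day's IDs
pieces <- lapply(DAYS, function(day) {
  data.frame(DAY = day, ID = get_ids(day), stringsAsFactors = FALSE)
})

# Bind all the pieces in one step
df <- do.call(rbind, pieces)
df
```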
