Replace n number of rows with condition in R? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a df :
number=c(3,3,3,3,3,1,1,1,1,4,4,4,4,4,4)
data.frame(number)
but with thousands of rows.
How can I replace n of those rows, for example turning 3 into 1?
If you can explain the logic too, that would be great.
No special requirements: just replace a certain number of the 3s with 1, not all of them.
Either randomly or the first n.

Here are two versions for you. The first randomly converts n rows from 3 to 1. The second converts the first n rows that contain 3 to 1.
To randomly select n of the rows where the value is currently 3, and then convert to 1:
> number=c(3,3,3,3,3,1,1,1,1,4,4,4,4,4,4)
>
>
> # to randomly change n rows (assume here that n = 4)
> set.seed(1)
> df <- data.frame(v1 = number)
> df$v1[sample(which(df$v1 == 3), 4)] <- 1
> df
v1
1 1
2 1
3 1
4 1
5 3
6 1
7 1
8 1
9 1
10 4
11 4
12 4
13 4
14 4
15 4
To change the first n rows (assume again that n = 4):
> df <- data.frame(v1 = number)
> df$v1[which(df$v1 == 3)[1:4]] <- 1
> df
v1
1 1
2 1
3 1
4 1
5 3
6 1
7 1
8 1
9 1
10 4
11 4
12 4
13 4
14 4
15 4
Since you wanted the logic for how this works:
Both answers rely on the which() function. which() gives you the positions where a logical vector is TRUE, so which(df$v1 == 3) returns the positions of all the rows where df$v1 is 3:
> df$v1 == 3
[1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> which(df$v1 == 3)
[1] 1 2 3 4 5
We then simply specify that we want to reassign df$v1 at those positions to 1. However, since you wanted to specify how many rows to do this for, we subset the result of our which() vector by using [1:n] to select the first n results, or sample(x, n) to randomly select n results.
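Both variants can be wrapped in one small helper. This is only a minimal sketch and not part of the answer above: the function name replace_n and its arguments are hypothetical, and it assumes you would rather stop with an error than silently replace fewer than n values.

```r
# Hypothetical helper: replace n occurrences of `from` in vector x with `to`.
replace_n <- function(x, from, to, n, random = FALSE) {
  hits <- which(x == from)        # positions holding the value to replace
  stopifnot(n <= length(hits))    # guard: cannot replace more than exist
  if (random) {
    # Index into hits instead of calling sample(hits, n) directly,
    # because sample() treats a single number k as 1:k.
    chosen <- hits[sample.int(length(hits), n)]
  } else {
    chosen <- head(hits, n)       # the first n matching positions
  }
  x[chosen] <- to
  x
}

number <- c(3, 3, 3, 3, 3, 1, 1, 1, 1, 4, 4, 4, 4, 4, 4)
df <- data.frame(v1 = number)
df$v1 <- replace_n(df$v1, from = 3, to = 1, n = 4)
sum(df$v1 == 3)  # one 3 is left
```

Using head() rather than [1:n] avoids NA indices when fewer than n values match, which is why the guard and head() are paired here.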

I am assuming you want to select n occurrences of some value in a data.frame column.
For that you can sample, with or without replacement, from the positions that match your condition.
Below I show how to do that for 3 of the 3s:
number <- c(3,3,3,3,3,1,1,1,1,4,4,4,4,4,4)
foo <- data.frame(number)
indexes <- sample(which(foo$number == 3), size = 3, replace = FALSE)
foo$number[indexes] <- 1 # or any other replacement value; a character value would turn the whole column into character

Related

Add a column in a data frame depending of another column [duplicate]

I have the data.frame below. I want to add a column 'g' that classifies my data according to consecutive sequences in column h_no. That is, the first sequence of h_no 1, 2, 3, 4 is group 1, the second series of h_no (1 to 7) is group 2, and so on, as indicated in the last column 'g'.
h_no h_freq h_freqsq g
1 0.09091 0.008264628 1
2 0.00000 0.000000000 1
3 0.04545 0.002065702 1
4 0.00000 0.000000000 1
1 0.13636 0.018594050 2
2 0.00000 0.000000000 2
3 0.00000 0.000000000 2
4 0.04545 0.002065702 2
5 0.31818 0.101238512 2
6 0.00000 0.000000000 2
7 0.50000 0.250000000 2
1 0.13636 0.018594050 3
2 0.09091 0.008264628 3
3 0.40909 0.167354628 3
4 0.04545 0.002065702 3
You can add a column to your data using various techniques. The quotes below come from the "Details" section of the relevant help text, [[.data.frame.
Data frames can be indexed in several modes. When [ and [[ are used with a single vector index (x[i] or x[[i]]), they index the data frame as if it were a list.
my.dataframe["new.col"] <- a.vector
my.dataframe[["new.col"]] <- a.vector
The data.frame method for $, treats x as a list
my.dataframe$new.col <- a.vector
When [ and [[ are used with two indices (x[i, j] and x[[i, j]]) they act like indexing a matrix
my.dataframe[ , "new.col"] <- a.vector
If you don't specify whether you are working with columns or rows, the data.frame method assumes you mean columns.
For your example, this should work:
# make some fake data
your.df <- data.frame(no = c(1:4, 1:7, 1:5), h_freq = runif(16), h_freqsq = runif(16))
# find where each sequence starts (where no == 1)
from <- which(your.df$no == 1)
to <- c((from-1)[-1], nrow(your.df)) # up to which point the sequence runs
# generate a sequence (len) and based on its length, repeat a consecutive number len times
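# SIMPLIFY = FALSE keeps a list even when all groups happen to have the same length
get.seq <- mapply(from, to, 1:length(from), SIMPLIFY = FALSE, FUN = function(x, y, z) {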
len <- length(seq(from = x[1], to = y[1]))
return(rep(z, times = len))
})
# when we unlist, we get a vector
your.df$group <- unlist(get.seq)
# and append it to your original data.frame. since this is
# designating a group, it makes sense to make it a factor
your.df$group <- as.factor(your.df$group)
no h_freq h_freqsq group
1 1 0.40998238 0.06463876 1
2 2 0.98086928 0.33093795 1
3 3 0.28908651 0.74077119 1
4 4 0.10476768 0.56784786 1
5 1 0.75478995 0.60479945 2
6 2 0.26974011 0.95231761 2
7 3 0.53676266 0.74370154 2
8 4 0.99784066 0.37499294 2
9 5 0.89771767 0.83467805 2
10 6 0.05363139 0.32066178 2
11 7 0.71741529 0.84572717 2
12 1 0.10654430 0.32917711 3
13 2 0.41971959 0.87155514 3
14 3 0.32432646 0.65789294 3
15 4 0.77896780 0.27599187 3
16 5 0.06100008 0.55399326 3
Easily. Suppose your data frame is A:
b <- A[,1]
b <- b==1
b <- cumsum(b)
Then you get the column b.
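To make those three lines concrete, here is a minimal sketch on assumed sample data (only the h_no column from the question; the names starts and g are hypothetical). It relies on every sequence starting at 1, as in the question.

```r
# Assumed sample data: only the h_no column from the question
A <- data.frame(h_no = c(1:4, 1:7, 1:4))

starts <- A[, 1] == 1   # TRUE at the first row of each sequence
A$g <- cumsum(starts)   # TRUE counts as 1, so the total steps up at each start
A$g
```

If a sequence could start at a value other than 1, this marker would miss it, and a test based on diff() would be the safer choice.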
If I understand the question correctly, you want to detect when the h_no doesn't increase and then increment the class. (I'm going to walk through how I solved this problem, there is a self-contained function at the end.)
Working
We only care about the h_no column for the moment, so we can extract that from the data frame:
> h_no <- data$h_no
We want to detect when h_no doesn't go up, which we can do by working out when the difference between successive elements is either negative or zero. R provides the diff function which gives us the vector of differences:
> d.h_no <- diff(h_no)
> d.h_no
[1] 1 1 1 -3 1 1 1 1 1 1 -6 1 1 1
Once we have that, it is a simple matter to find the ones that are non-positive:
> nonpos <- d.h_no <= 0
> nonpos
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
[13] FALSE FALSE
In R, TRUE and FALSE are coerced to 1 and 0 in arithmetic, so if we take the cumulative sum of nonpos, it will increase by 1 in (almost) the appropriate spots. The cumsum function (which is basically the opposite of diff) can do this.
> cumsum(nonpos)
[1] 0 0 0 1 1 1 1 1 1 1 2 2 2 2
But, there are two problems: the numbers are one too small; and, we are missing the first element (there should be four in the first class).
The first problem is simply solved: 1+cumsum(nonpos). And the second just requires adding a 1 to the front of the vector, since the first element is always in class 1:
> classes <- c(1, 1 + cumsum(nonpos))
> classes
[1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3
Now, we can attach it back onto our data frame with cbind (by using the class= syntax, we can give the column the class heading):
> data_w_classes <- cbind(data, class=classes)
And data_w_classes now contains the result.
Final result
We can compress the lines together and wrap it all up into a function to make it easier to use:
classify <- function(data) {
cbind(data, class=c(1, 1 + cumsum(diff(data$h_no) <= 0)))
}
Or, since it makes sense for the class to be a factor:
classify <- function(data) {
cbind(data, class=factor(c(1, 1 + cumsum(diff(data$h_no) <= 0))))
}
You use either function like:
> classified <- classify(data) # doesn't overwrite data
> data <- classify(data) # data now has the "class" column
(This method of solving the problem is good because it avoids explicit iteration, which is generally recommended in R, and avoids generating lots of intermediate vectors, lists, etc. And it's also kind of neat that it can be written on one line :) )
In addition to Roman's answer, something like this might be even simpler. Note that I haven't tested it because I do not have access to R right now.
# Note that I use a global variable here
# normally not advisable, but I liked the
# use here to make the code shorter
index <- 0
new_column = sapply(df$h_no, function(x) {
if(x == 1) index <<- index + 1 # <<- updates the outer index; = would only create a local copy
return(index)
})
The function iterates over the values in h_no and always returns the category that the current value belongs to. If a value of 1 is detected, we increase the global variable index and continue.
An approach based on identifying the group numbers (x in mapply) and each group's length (y in mapply):
mytb<-read.table(text="h_no h_freq h_freqsq group
1 0.09091 0.008264628 1
2 0.00000 0.000000000 1
3 0.04545 0.002065702 1
4 0.00000 0.000000000 1
1 0.13636 0.018594050 2
2 0.00000 0.000000000 2
3 0.00000 0.000000000 2
4 0.04545 0.002065702 2
5 0.31818 0.101238512 2
6 0.00000 0.000000000 2
7 0.50000 0.250000000 2
1 0.13636 0.018594050 3
2 0.09091 0.008264628 3
3 0.40909 0.167354628 3
4 0.04545 0.002065702 3", header=T, stringsAsFactors=F)
mytb$group<-NULL
positionsof1s<-which(mytb$h_no == 1) # exact match; grep(1, ...) would also match 10, 11, ...
mytb$newgroup<-unlist(mapply(function(x,y)
rep(x,y), # repeat x number y times
x= 1:length(positionsof1s), # x is the group number, 1 to number of groups = g1:g3
y= c( diff(positionsof1s), # y is the number of repeats for groups g1 to penultimate (g2) = 4, 7
nrow(mytb)- # this line and the following give the number of repeats for the last group (g3)
(positionsof1s[length(positionsof1s )]-1 ) # number of rows - (start position of last group - 1)
) ) )
mytb
I believe that using "cbind" is the simplest way to add a column to a data frame in R. Below is an example:
myDf = data.frame(index=seq(1,10,1), Val=seq(1,10,1))
newCol= seq(2,20,2)
myDf = cbind(myDf,newCol)
The data.table function rleid is handy for things like this. We subtract the sequence 1:nrow(data) to transform consecutive sequences to constants, and then use rleid to create the group IDs:
data$g = data.table::rleid(data$h_no - 1:nrow(data))
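The same idea can be sketched in base R without data.table. This is an illustration under assumptions, not the answer's own code: the vector h_no reproduces the question's sample column, and the names offset and g are hypothetical.

```r
# Assumed sample data: the h_no column from the question
h_no <- c(1:4, 1:7, 1:4)

# Subtracting the row position turns each consecutive run into a constant
offset <- h_no - seq_along(h_no)

# A new group starts at row 1 and wherever that constant changes
g <- cumsum(c(TRUE, diff(offset) != 0))
g
```

Here cumsum() over the change markers plays the role that rleid() plays in the one-liner above.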
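# Note: as.integer() does not use the breaks argument, so this copies h_no as integers rather than grouping it
Data.frame[,'h_new_column'] <- as.integer(Data.frame[,'h_no'], breaks=c(1, 4, 7))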

Subsetting values not adding up in R [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I have a dataframe (df) in R. All columns are character class.
> dim(df)
[1] 1000 6
I'm trying to remove rows where df$entry == c("7795").
entries_to_remove <- subset(df, entry == c("7795"))
> dim(entries_to_remove)
[1] 35 6
So as you can see above, I have 35 entries to remove from the data frame. However, when I go to remove these using subset, it doesn't remove the correct amount:
entries_to_remove <- subset(df, entry != c("7795"))
> dim(entries_to_remove)
[1] 648 6
The above command was supposed to remove 35 entries, but instead it removed 352. Does anyone know why this might be happening?
Here's another solution, which takes up just one line:
df[!grepl("7995", apply(df, 1, paste0, collapse = " ")), ] # logical negation also works when nothing matches, unlike -which()
RESULT:
v1 entry1 entry2 entry3
2 2 5 5 2
3 3 2 4 2
4 4 2 3 1
6 6 1 2 1
7 7 2 4 4
8 8 4 5 5
9 9 5 1 5
DATA:
set.seed(121)
df <- data.frame(
v1 = 1:10,
entry1 = c(sample(1:5, 9, replace = T), 7995),
entry2 = c(sample(1:5, 4), 7995, sample(1:5, 5)),
entry3 = c(7995, sample(1:5, 9, replace = T))
)
df[2:4] <- lapply(df[2:4], as.character) # convert to character, as in your data
df
v1 entry1 entry2 entry3
1 1 1 2 7995
2 2 5 5 2
3 3 2 4 2
4 4 2 3 1
5 5 3 7995 2
6 6 1 2 1
7 7 2 4 4
8 8 4 5 5
9 9 5 1 5
10 10 7995 3 5
The above solutions didn't work, and I do not think the issue is with NA. However, I solved the problem myself. It is a workaround, but it worked:
# list the row numbers for the entries to remove
row_remove <- rownames(entries_to_remove )
# make a list of all the row numbers
all_rows <- 1:dim(df)[1]
# create a vector with only the rows to keep
subset_row <- all_rows[!(all_rows%in%row_remove)]
# subset the dataframe with these rows
df<- df[subset_row,]
The issue has to do with NAs. Some of the other solutions will work, but the easiest and, I think, most intuitive fix is to use %in% rather than ==:
rows_to_keep <- subset(df, !(entry %in% c("7795")))
entries_to_remove <- subset(df, entry %in% c("7795"))
This should explain what's happening. Notice how == returns NA rather than FALSE:
> c( 5, 6, 7) == 5
[1] TRUE FALSE FALSE
> c( 5, 6, 7 , NA) == 5
[1] TRUE FALSE FALSE NA
> c( 5, 6, 7 , NA) %in% 5
[1] TRUE FALSE FALSE FALSE
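subset() treats an NA in the condition as FALSE, so rows where entry is NA are dropped by both the == and the != version. That is why the two counts do not add up to the total number of rows.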
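A small sketch makes the row counts concrete. The toy data frame below is an assumption for illustration only; the column name entry follows the question.

```r
# Assumed toy data: 'entry' is character and contains some NA values
df <- data.frame(entry = c("7795", "12", NA, "7795", NA, "30"),
                 stringsAsFactors = FALSE)

nrow(subset(df, entry == "7795"))       # 2 matching rows
nrow(subset(df, entry != "7795"))       # 2, not 4: the NA rows are dropped too
nrow(subset(df, !(entry %in% "7795")))  # 4: %in% returns FALSE for NA
sum(is.na(df$entry))                    # 2 rows that both == and != skip
```

Checking sum(is.na(df$entry)) on the real data is a quick way to confirm whether missing values account for the gap.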

Assigning vector elements a value associated with preceding matching value [duplicate]

This question already has answers here:
Calculating cumulative sum for each row
(6 answers)
Sum of previous rows in a column R
(1 answer)
Closed 3 years ago.
I have a vector of alternating TRUE and FALSE values:
dat <- c(T,F,F,T,F,F,F,T,F,T,F,F,F,F)
I'd like to number each instance of TRUE with a unique sequential number and to assign each FALSE value the number associated with the TRUE value preceding it.
therefore, my desired output using the example dat above (which has 4 TRUE values):
1 1 1 2 2 2 2 3 3 4 4 4 4 4
What I tried:
I've tried the following (which works), but I know there must be a simpler solution!!
whichT <- which(dat==T)
whichF <- which(dat==F)
l1 <- lapply(1:length(whichT),
FUN = function(x)
which(whichF > whichT[x] & whichF < whichT[(x+1)])
)
l1[[length(l1)]] <- which(whichF > whichT[length(whichT)])
replaceFs <- unlist(
lapply(1:length(whichT),
function(x) l1[[x]] <- rep(x,length(l1[[x]]))
)
)
replaceTs <- 1:length(whichT)
dat2 <- dat
dat2[whichT] <- replaceTs
dat2[whichF] <- replaceFs
dat2
[1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
I need a simpler and quicker solution b/c my real data set is 181k rows long!
Base R solutions preferred, but any solution works
cumsum(dat) will do what you want. When used in arithmetic functions, TRUE is converted to 1 and FALSE to 0, so the cumulative sum adds 1 every time there is a TRUE and adds nothing when there is a FALSE, which is exactly what you want.
dat <- c(T,F,F,T,F,F,F,T,F,T,F,F,F,F)
cumsum(dat)
# [1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
Instead of doing the indexing, this can easily be done with cumsum from base R. Here, TRUE/FALSE gets coerced to 1/0, so the cumulative sum increments by 1 wherever there is a TRUE:
cumsum(dat)
#[1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
cumsum() is the most straightforward way, however, you can also do:
Reduce("+", dat, accumulate = TRUE)
[1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4

Customer latency in r [duplicate]

This question already has answers here:
Remove duplicated rows
(10 answers)
Closed 5 years ago.
I'm new to R and working on customer latency.
In the dataset I have around 300,000 rows with 15 columns. Some relevant columns are "Account", "Account Open Date", "Shipment pick up date", etc.
Account numbers are repeated, and I just want the rows where each account number is recorded for the first time, not the subsequent rows.
For example, account # 610829952 is in the first row as well as in the 5th row, 6th row, etc. I need to keep only the first row, and I need to do this for all the account numbers.
I am not sure how to do this. Could someone please help me with this?
There is a function in R called duplicated(). It allows you to check whether a certain value, like your account, has already been recorded.
First, use duplicated() on the relevant column, account, to check which account numbers have already appeared. You will get a TRUE/FALSE vector (TRUE indicating that the corresponding value has already appeared). With that information, you index your data.frame to retrieve only the rows you are interested in. I will assume your data looks like df below:
df <- data.frame(account = sample(1:5, 20, replace = TRUE),
segment = sample(LETTERS, 20, replace = TRUE))
# account segment
# 1 3 N
# 2 2 V
# 3 4 T
# 4 4 Y
# 5 4 M
# 6 4 E
# 7 5 H
# 8 3 A
# 9 3 J
# 10 3 Y
# 11 4 R
# 12 5 O
# 13 4 O
# 14 1 R
# 15 5 U
# 16 2 Q
# 17 5 F
# 18 2 J
# 19 4 E
# 20 2 H
inds <- duplicated(df$account)
# [1] FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
# [11] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
df <- df[!inds, ]
# account segment
# 1 3 N
# 2 2 V
# 3 4 T
# 7 5 H
# 14 1 R

