New category column based on column value - r

I am trying to use if and ifelse statements to create a new column in my data frame based on the values of an existing column. For example, I have a column with numbers from 1 to 10000, and I want to categorize them into 8 buckets of size 1250 in the new column. So 1 should give b1 in the new column, 9999 should give b8, and so on. My if/else code has failed so far.

You can use cut :
paste0('b', cut(1:10000, 8, labels = FALSE))
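Replace 1:10000 with your column values (df$column_name) and assign the result to a new column. Note that cut(x, 8) splits the observed range of x into 8 equal-width intervals, so the buckets are exactly 1250 wide only when the values span 1 to 10000.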
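A minimal sketch with explicit breaks, which keeps every bucket exactly 1250 wide regardless of the actual minimum and maximum in the column. The data frame df and its value column are hypothetical stand-ins for your data:

```r
# Hypothetical data frame standing in for the real column
df <- data.frame(value = c(1, 1250, 1251, 5000, 9999, 10000))

# Explicit breaks give fixed buckets: (0,1250], (1250,2500], ..., (8750,10000]
# labels = FALSE returns the bucket number (1 to 8) instead of an interval label
df$bucket <- paste0("b", cut(df$value, breaks = seq(0, 10000, by = 1250), labels = FALSE))

df$bucket  # "b1" "b1" "b2" "b4" "b8" "b8"
```

For values between 1 and 10000, paste0("b", ceiling(df$value / 1250)) produces the same labels without cut.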


How do I subtract the values of one column from another in a data frame?

I have a data frame whose rows are named after different genes, with 2 columns: Control_mean and Patient_mean.
I want to create a third column that stores "Patient_mean - Control_mean" for each row, but I can't figure out how!
I tried to do so using this:
for (i in 1:nrow(newdf8)) {
  newdf8$log2FC[i] <- (newdf8[, 2] - newdf8[, 1])
}
but it didn't work: every value in the new column became the same number instead of the actual difference for that row.
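The loop fails because the right-hand side is the whole column difference, so each newdf8$log2FC[i] receives only its first element (R warns about this). Column arithmetic in R is vectorized, so no loop is needed. A minimal sketch on a small hypothetical data frame standing in for newdf8, with a corrected loop for comparison:

```r
# Hypothetical gene table standing in for newdf8
newdf8 <- data.frame(
  Control_mean = c(2.0, 5.5, 1.0),
  Patient_mean = c(3.0, 4.5, 4.0),
  row.names = c("geneA", "geneB", "geneC")
)

# Vectorized: each row's Control_mean is subtracted from the same row's Patient_mean
newdf8$log2FC <- newdf8$Patient_mean - newdf8$Control_mean

# Loop form, only for comparison: index BOTH sides with [i] so one value goes into each row
newdf8$log2FC_loop <- NA_real_
for (i in seq_len(nrow(newdf8))) {
  newdf8$log2FC_loop[i] <- newdf8$Patient_mean[i] - newdf8$Control_mean[i]
}

newdf8$log2FC  # 1 -1 3
```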

Can I replace values in cells conditional on a string?

I am new to R and trying to create my own dataset by modifying Eurostat data. I now have regions with names such as AT111, AT112 and ITC11. I want to give each country a number, so that all regions from AT have a country code equal to 1.
For that I have added a new empty column to my dataset. Is there a way for me to do this:
NUTS3.3[NUTS3.3$geo == "AT111", "country"] <- 1
for all observations whose geo string contains "AT" at once?
I have >26 000 observations, so doing it for every single regional code would be tedious.
We can take the first two characters of the column with substr and compare them with ==:
NUTS3.3$country[substr(NUTS3.3$geo, 1, 2)=="AT"] <- 1
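If every country needs a number, the same two-character prefix can be converted in a single step instead of one assignment per country. A sketch on a small hypothetical sample; both numbering schemes (alphabetical via factor, or the hand-written lookup vector codes) are assumptions, not part of the original answer:

```r
# Hypothetical sample standing in for the NUTS3.3 data
NUTS3.3 <- data.frame(geo = c("AT111", "AT112", "ITC11", "DE111", "ITC12"))

# Country prefix = first two characters of each NUTS code
prefix <- substr(NUTS3.3$geo, 1, 2)

# One integer code per country; factor levels are sorted alphabetically,
# so here AT = 1, DE = 2, IT = 3
NUTS3.3$country <- as.integer(factor(prefix))

# Alternative: a hypothetical hand-made lookup when specific codes are required;
# prefixes missing from the lookup become NA
codes <- c(AT = 1, IT = 2, DE = 3)
NUTS3.3$country2 <- unname(codes[prefix])
```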

How to add a new vector that starts with 0 followed by a cumsum to a data frame?

I want to add a new vector to a data frame based on the cumsum of a previous column, but starting from 0.
I've tried creating a vector of 0 followed by the cumsum, but this gives me an additional row, and I haven't been able to remove it.
Error in `$<-.data.frame`(`*tmp*`, time, value = c(0, 5, 9, 15.4, :
  replacement has 1138 rows, data has 1137
Drop the last element before taking the cumulative sum; otherwise the new vector is one element longer than the original column (1138 vs. 1137 here):
mydata$time <- cumsum(c(0, mydata$duration[-nrow(mydata)]))
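A small worked example with hypothetical durations (chosen so the first values match the error message), showing that the result has exactly one element per row. cumsum(x) - x is an equivalent way to get a running total that starts at 0:

```r
# Hypothetical data standing in for mydata
mydata <- data.frame(duration = c(5, 4, 6.4, 2))

# Prepend 0 and drop the last duration so the length stays nrow(mydata)
mydata$time <- cumsum(c(0, mydata$duration[-nrow(mydata)]))

# Equivalent: running total minus the current row's own duration
mydata$time2 <- cumsum(mydata$duration) - mydata$duration

mydata$time  # 0.0 5.0 9.0 15.4
```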

Insert all missing rows into data table for a range of values for 2 columns

I am interested in inserting all missing rows into a data table for a new range of values for 2 columns.
For example, dt1[, a] has some values from 1 to 5, as does dt1[, b]. I'd like not only all pairwise combinations of the existing values to be present in columns a and b, but all combinations within a newly defined range, e.g. 1 to 7.
# Example data.table
dt1 <- data.table(a=c(1,1,1,1,2,2,2,2,3,3,3,4,4,4,4,4,5,5,5),
# CJ in data.table will create all rows to ensure all
# pairwise combinations are present (using the nominated columns).
The above works well, but it only uses the values between the min and max of the nominated columns. I'd like the inserted rows to cover all combinations within a new, nominated range, e.g. 1 to 7, which would give 49 rows.
# the following is a temporary workaround
template <- data.table(a1=rep(1:7,each=7),b1=rep(1:7,7))
full <- dt1[template]
Instead of the existing values in column 'a', we can pass a range of values to CJ for 'a':
dt1[CJ(a = 1:7, b, unique = TRUE)]
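In the call above only a is widened to 1:7; b still contributes just its existing unique values. To get all 7 x 7 = 49 combinations, pass the new range for b as well. A self-contained sketch with hypothetical data (the original b values are not shown), keyed on a and b so that dt1[...] joins on those columns:

```r
library(data.table)

# Hypothetical data standing in for dt1
dt1 <- data.table(
  a   = c(1L, 1L, 2L, 3L, 5L),
  b   = c(1L, 2L, 2L, 4L, 5L),
  val = c(10, 20, 30, 40, 50)
)

# Key on a and b so that dt1[i] joins on these two columns
setkey(dt1, a, b)

# New range for BOTH columns; combinations missing from dt1 come back with NA in val
full <- dt1[CJ(a = 1:7, b = 1:7)]

nrow(full)  # 49
```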

R: returning the 5 rows with the highest values

Sample data
mysample <- data.frame(ID = 1:100, kWh = rnorm(100))
I'm trying to automate the process of returning the rows in a data frame that contain the 5 highest values in a certain column. In the sample data, the 5 highest values in the "kWh" column can be found using the code:
(tail(sort(mysample$kWh), 5))
which in my case returns:
[1] 1.477391 1.765312 1.778396 2.686136 2.710494
I would like to create a table containing the rows whose kWh values (column 2) are these numbers.
I am attempting to use this code:
mysample[mysample$kWh == (tail(sort(mysample$kWh), 5)),]
This returns:
ID kWh
87 87 1.765312
I would like it to return all the rows that contain the figures above in the "kWh" column. I'm sure I've missed something basic, but I can't figure it out.
We can use rank
mysample$Rank <- rank(-mysample$kWh)
If we don't need to create a column, we can use order directly (as @Jaap mentioned, in three alternative ways):
#order descending and get the first 5 rows
#order ascending and get the last 5 rows
#or just use sequence as index to get the rows.
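The original attempt fails because == compares element by element and recycles the 5 target values along all 100 rows, so a row only matches when it happens to line up with the same value. A sketch of the rank and order approaches, plus %in% as a direct fix for the original attempt; set.seed(1) is added only to make the sample reproducible:

```r
set.seed(1)  # reproducible sample; the question used an unseeded rnorm()
mysample <- data.frame(ID = 1:100, kWh = rnorm(100))

# rank(): rank 1 = largest kWh, keep ranks 1 to 5
mysample$Rank <- rank(-mysample$kWh)
top_rank <- mysample[mysample$Rank <= 5, ]

# order() descending, take the first 5 rows
top_desc <- mysample[order(mysample$kWh, decreasing = TRUE)[1:5], ]

# order() ascending, take the last 5 rows
top_asc <- tail(mysample[order(mysample$kWh), ], 5)

# %in% instead of ==: keep every row whose kWh is one of the 5 largest values
top_in <- mysample[mysample$kWh %in% tail(sort(mysample$kWh), 5), ]
```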
