Making a vector on condition in R [duplicate] - r

This question already has answers here:
Logical comparison of two vectors with binary (0/1) result
(2 answers)
Closed 3 years ago.
I'm completely new to R and have only a little background in Python.
Say I have 2 columns in my dataframe df that are
col1 = c(1,3,4,5,2,6,7)
col2 = c(2,5,1,5,6,5,3)
and I want to add a new column to df consisting only of 0s and 1s: it takes 1 if the element in col1 is less than the corresponding element in col2, and 0 otherwise. So it should look like
col3 = c(1,1,0,0,1,0,0)
I think there's a way to do it in one line,
df$col3 <- c(...)
but I don't know how to fill in the (...) part. Any help would be greatly appreciated.

You may simply compare the vectors themselves:
df <- data.frame(c1 = col1, c2 = col2)
df$c3 <- as.integer(df$c1 < df$c2)
df
  c1 c2 c3
1  1  2  1
2  3  5  1
3  4  1  0
4  5  5  0
5  2  6  1
6  6  5  0
7  7  3  0
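As a minimal sketch of why this works: the comparison `<` is vectorised and returns a logical vector, which `as.integer` turns into 0/1. The example below keeps the asker's column names (`col1`, `col2`, `col3`); the name `alt` is only an illustrative variable showing that `ifelse` gives the same result.

```r
# Data from the question
col1 <- c(1, 3, 4, 5, 2, 6, 7)
col2 <- c(2, 5, 1, 5, 6, 5, 3)
df <- data.frame(col1, col2)

# Element-wise comparison gives TRUE/FALSE; as.integer maps them to 1/0
df$col3 <- as.integer(df$col1 < df$col2)

# Equivalent, more explicit spelling (illustrative alternative)
alt <- ifelse(df$col1 < df$col2, 1L, 0L)
```

The `as.integer` form is usually the shorter of the two; `ifelse` may read more naturally when the two output values are something other than 1 and 0.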


How to modify the variable names by combining current variable names and row 1 values? [duplicate]

This question already has answers here:
Concatenate column name and 1st row in R
(2 answers)
Pasting the first row to the column name within a list
(1 answer)
Closed 1 year ago.
How can I modify raw_dataframe to wished_dataframe?
raw_dataframe <- data.frame(
category=c('a','1','2','3','4'),
subcategory=c('b','3','2','1','0'),
item=c('wd','4','5','7','0'))
wished_dataframe <- data.frame(
category_a=c('1','2','3','4'),
subcategory_b=c('3','2','1','0'),
item_wd=c('4','5','7','0'))
I actually have many CSV files with the same structure as raw_dataframe, and I want to combine row 1 and row 2 of each file to form the variable names. Can anyone help?
# Paste colnames with values of row 1
colnames(raw_dataframe) <- paste0(colnames(raw_dataframe), "_", raw_dataframe[1, ])
# Remove row 1 and save in `wished_dataframe`
wished_dataframe <- raw_dataframe[-1, ]
A dplyr way: We could use rename_with:
library(dplyr)
raw_dataframe %>%
rename_with(~paste0(.,"_", raw_dataframe[1,])) %>%
slice(-1)
  category_a subcategory_b item_wd
1          1             3       4
2          2             2       5
3          3             1       7
4          4             0       0
An option with janitor
library(janitor)
library(stringr)
library(dplyr)
row_to_names(raw_dataframe, 1) %>%
rename_with(~ str_c(names(raw_dataframe), '_', .))
  category_a subcategory_b item_wd
2          1             3       4
3          2             2       5
4          3             1       7
5          4             0       0
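Since the question mentions many files with the same layout, the base R approach can be wrapped in a small helper. This is only a sketch: the function name `merge_first_row_into_names` is hypothetical, and it assumes every data frame carries the name suffixes in its first row, as raw_dataframe does.

```r
# Hypothetical helper: append the first row's values to the column names,
# then drop that row and renumber the remaining rows.
merge_first_row_into_names <- function(d) {
  # One string per column, taken from row 1 (works for character or factor columns)
  first <- vapply(d[1, , drop = FALSE], as.character, character(1))
  names(d) <- paste0(names(d), "_", first)
  d <- d[-1, , drop = FALSE]
  rownames(d) <- NULL
  d
}

raw_dataframe <- data.frame(
  category = c("a", "1", "2", "3", "4"),
  subcategory = c("b", "3", "2", "1", "0"),
  item = c("wd", "4", "5", "7", "0")
)

wished_dataframe <- merge_first_row_into_names(raw_dataframe)
```

For several files, the same helper could be applied to each data frame after reading it, for example with `lapply` over a vector of file paths and `read.csv`.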

How add and subtract a vector from a data frame row wise in R [duplicate]

This question already has answers here:
subtract a constant vector from each row in a matrix in r
(5 answers)
How to divide each row of a matrix by elements of a vector in R
(3 answers)
Closed 1 year ago.
I want to add (or subtract) a vector to each row of a df, so that each element of the vector is applied to the corresponding column.
> test
A B
1 0 0
2 0 0
3 0 0
Expected output:
> test + c(3,4)
A B
1 3 4
2 3 4
3 3 4
Actual output:
A B
1 3 4
2 4 3
3 3 4
What is the correct way to do this?
This will give what you want, although it's a little tricky. Note that the result is a matrix rather than a data frame.
t(t(test) + c(3,4))
This also works. Here each = 3 is the number of rows of test:
test + rep(c(3,4), each = 3)
You can imagine that the data frame test is first flattened, column by column, into a vector c(A,B) when it is added to the vector c(3,4). Because c(3,4) is shorter than c(A,B), it is recycled until it has the same length.
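A short sketch of the recycling behaviour and of two ways around it. The data frame mirrors the `test` shown in the question; `v`, `res_rep`, `res_sweep`, and `res_sub` are illustrative names. `sweep` is a base R function and is used here as an alternative that was not part of the original answers.

```r
test <- data.frame(A = c(0, 0, 0), B = c(0, 0, 0))
v <- c(3, 4)

# Plain addition recycles v down the columns: A gets 3,4,3 and B gets 4,3,4
recycled <- test + v

# Repeat each element nrow(test) times so it lines up with the columns
res_rep <- test + rep(v, each = nrow(test))

# sweep applies v across columns (MARGIN = 2) with the given function
res_sweep <- sweep(test, 2, v, "+")
res_sub <- sweep(test, 2, v, "-")
```

Using `nrow(test)` instead of a hard-coded 3 keeps the `rep` version correct if the number of rows changes.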

Merging dataframes without changing values [duplicate]

This question already has an answer here:
Column binding in R
(1 answer)
Closed 3 years ago.
I have two dataframes
df1 <- data.frame(c(1:10))
df2 <- data.frame(c(1,0,1,1,0,1,0,0,1,0))
I tried to merge them using this code:
merge(df1,df2,all = TRUE, sort = FALSE)
But my dataframe comes out really weird: there are 100 rows.
I want the dataframe to look like this:
col1 col2
1 1
2 0
3 1
4 1
5 0
6 1
7 0
8 0
9 1
10 0
How can I do this?
You can just define a new data frame and use [,1] to extract the column from each of your existing data frames. This also lets you name the columns.
data.frame(col1=df1[,1], col2 = df2[,1])
# col1 col2
#1 1 1
#2 2 0
#3 3 1
#4 4 1
#5 5 0
#6 6 1
#7 7 0
#8 8 0
#9 9 1
#10 10 0
This will get you the formatting you want, with named columns:
library(dplyr)
df1 <- data.frame(col1 = c(1:10))
df2 <- data.frame(col2 = c(1,0,1,1,0,1,0,0,1,0))
df <- bind_cols(df1, df2)
You can do that with cbind(), which stands for column bind:
cbind(df1, df2)
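To see where the 100 rows come from: when the two data frames share no column names, `merge` has nothing to join on and returns the Cartesian product (10 x 10 rows). The sketch below uses named columns (`col1`, `col2`, as in the desired output) and contrasts that with `cbind`, which pairs rows by position; `crossed` and `bound` are illustrative names.

```r
df1 <- data.frame(col1 = 1:10)
df2 <- data.frame(col2 = c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0))

# No common column names, so merge() builds every combination of rows
crossed <- merge(df1, df2)
nrow(crossed)  # 100

# cbind() simply places the columns side by side, row i next to row i
bound <- cbind(df1, df2)
nrow(bound)    # 10
```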

Filling cell data with mean for each unique name [duplicate]

This question already has answers here:
replace NA with groups mean in a non specified number of columns [duplicate]
(2 answers)
Closed 3 years ago.
I have been using R for the past couple of days and I have a question that I am a little stumped on. I have a dataframe with bidder names and bids, where some of the bids are empty. I am having trouble implementing a dynamic way to take the average bid for each unique bidder and apply it to the empty cells. The line of code below takes the mean bid for each unique bidder. All I need to do is place the mean value from unique_bid in the empty cells that share the same bidder.
unique_bid <- aggregate(bid ~ bidder, auction[complete.cases(auction),], mean)
Here is a picture of what the dataframe looks like.
You could use ave.
Example:
df = data.frame(a = c(1,1,1,2,2,2), b=c(1,2,NA,4,5,NA),c= c(1,2,3,4,5,6))
> df
  a  b c
1 1  1 1
2 1  2 2
3 1 NA 3
4 2  4 4
5 2  5 5
6 2 NA 6
Do:
sel = is.na(df$b)
df$b[sel] = ave(df$b, df$a, FUN = function(x){mean(x, na.rm = TRUE)})[sel]
ave applies the function FUN to df$b within each group defined by df$a. sel then selects the NA elements of df$b, which are replaced by the corresponding group mean.
Result:
> df
  a   b c
1 1 1.0 1
2 1 2.0 2
3 1 1.5 3
4 2 4.0 4
5 2 5.0 5
6 2 4.5 6
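The same idea can be sketched with the column names from the question (`bidder`, `bid`). The actual auction data was only shown as a picture, so the values below are made up for illustration; `missing` and `group_mean` are illustrative names.

```r
# Made-up stand-in for the auction data (the real values are not in the post)
auction <- data.frame(
  bidder = c("ann", "ann", "ann", "bob", "bob", "bob"),
  bid = c(10, 20, NA, 5, NA, 7)
)

# Per-bidder mean, repeated for every row of that bidder
group_mean <- ave(auction$bid, auction$bidder,
                  FUN = function(x) mean(x, na.rm = TRUE))

# Fill only the empty bids with their bidder's mean
missing <- is.na(auction$bid)
auction$bid[missing] <- group_mean[missing]
```

Because `ave` returns a vector as long as its input, no separate lookup against an aggregated table such as unique_bid is needed.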

Set a value if there is an increase of more than 1 in a column between rows [duplicate]

This question already has answers here:
Group rows in data frame based on time difference between consecutive rows
(2 answers)
Closed 6 years ago.
I have a dataframe as below:
RecordID <- c("a","b","c","d","e","f","g")
row.number <- c(1,2,10,11,12,45,46)
df <- data.frame(RecordID, row.number)
df$frame.change =1
I want the value in frame.change to increase by 1 from the previous row if there is an increase in row.number of more than 1 from the previous row. I am trying the following code but it doesn't work:
for( i in 2:nrow(df)){
df$frame.change[i] <- if( (df$frame.change[i] - df$frame.change[i-1]) <1 ){df$frame.change[i-1] }else{ df$frame.change[i-1] +1 }
cat("-")
}
It doesn't need to be done within a for loop and I presume lapply will be the solution but I can't seem to get this to work.
Help appreciated
A combination of cumsum and diff is all you need:
df$frame.change <- cumsum(c(1, diff(df$row.number) > 1))
which gives:
> df
  RecordID row.number frame.change
1        a          1            1
2        b          2            1
3        c         10            2
4        d         11            2
5        e         12            2
6        f         45            3
7        g         46            3
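Breaking the one-liner into steps may make it easier to follow. This sketch uses the `row.number` values from the question; `gaps` and `jumps` are illustrative names.

```r
row.number <- c(1, 2, 10, 11, 12, 45, 46)

# Difference between each value and the previous one (one element shorter)
gaps <- diff(row.number)             # 1 8 1 1 33 1

# TRUE wherever the increase is more than 1
jumps <- gaps > 1                    # FALSE TRUE FALSE FALSE TRUE FALSE

# Start at 1, then add 1 at every jump; TRUE counts as 1, FALSE as 0
frame.change <- cumsum(c(1, jumps))  # 1 1 2 2 2 3 3
```

The leading 1 both restores the original length and sets the starting value of the counter.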
