Add new column to data frame, taking existing values within range - r

I was wondering if anyone knows a simple way to create a new column in a data frame, taking data from an existing column, within a certain range.
For example, I have this data frame
range col1
1 5
2 4
3 9
4 5
5 2
6 8
7 9
I would like to create col2 from the data in col1, so that col2 holds the col1 value wherever range is above 3 and 0 otherwise:
range col1 col2
1 5 0
2 4 0
3 9 0
4 5 5
5 2 2
6 8 8
7 9 9
I have tried
data$col2 <- data$col1[which(data$range > 3)]
data$col2 <- subset(data$col1, data$range > 3)
However, both of these produce the error:
replacement has 4 rows, data has 7
Any help greatly appreciated

You can do it even without ifelse here:
data$new <- with(data, (range > 3) * col1)
data
# range col1 new
#1 1 5 0
#2 2 4 0
#3 3 9 0
#4 4 5 5
#5 5 2 2
#6 6 8 8
#7 7 9 9

Try ifelse
transform(data, col2=ifelse(range >3, col1, 0))
# range col1 col2
#1 1 5 0
#2 2 4 0
#3 3 9 0
#4 4 5 5
#5 5 2 2
#6 6 8 8
#7 7 9 9
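For context on why the two attempts in the question fail: both return only the 4 values that satisfy the condition, and a 4-element vector cannot be assigned to a 7-row column. A minimal sketch, rebuilding the question's data by hand (the names keep and alt are just illustrative), that fills the column with 0 first and then assigns into the matching rows only:

```r
# Data rebuilt from the question
data <- data.frame(range = 1:7, col1 = c(5, 4, 9, 5, 2, 8, 9))

keep <- data$range > 3             # logical vector, one entry per row
length(data$col1[keep])            # only 4 values, hence the length error

data$col2 <- 0                     # start with 0 everywhere
data$col2[keep] <- data$col1[keep] # overwrite only the rows where range > 3

# Same result via logical-to-numeric coercion (TRUE is 1, FALSE is 0)
alt <- with(data, (range > 3) * col1)
```

The assignment form is a bit more general, since the fill value does not have to be 0.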

Related

get the value of a cell of a dataframe based on the value in one of the columns in R

I have an example of a data frame in which columns "a" and "b" have certain values, and in column "c" the values are 1 or 2. I would like to create column "d" that holds, for each row, the value from the column whose index is given in column "c".
x = data.frame(a = c(1:10), b = c(3:12), c = seq(1:2))
x
a b c
1 1 3 1
2 2 4 2
3 3 5 1
4 4 6 2
5 5 7 1
6 6 8 2
7 7 9 1
8 8 10 2
9 9 11 1
10 10 12 2
Thus column "d" for the first row will contain the value 1, since the index in column "c" is 1; for the second row d = 4, since the index in column "c" is 2, and so on. Standard indexing in R did not help me, as it just returns the values of column c. How can I solve this?
You can create a matrix of row and column numbers to subset values from the data frame.
x$d <- x[cbind(1:nrow(x), x$c)]
x
# a b c d
#1 1 3 1 1
#2 2 4 2 4
#3 3 5 1 3
#4 4 6 2 6
#5 5 7 1 5
#6 6 8 2 8
#7 7 9 1 7
#8 8 10 2 10
#9 9 11 1 9
#10 10 12 2 12
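To make the matrix indexing above more concrete, here is a small sketch that builds the index matrix separately (idx is just an illustrative name). Each row of idx is a (row, column) pair, and indexing with a two-column matrix picks out exactly one cell per pair. Converting the value columns with as.matrix first is my own assumption about what keeps this robust, for example when the input is not a plain data frame.

```r
x <- data.frame(a = 1:10, b = 3:12, c = rep(1:2, 5))

# One (row, column) pair per row of x
idx <- cbind(seq_len(nrow(x)), x$c)
head(idx, 3)

# Index a plain matrix of the value columns with the pair matrix
x$d <- as.matrix(x[c("a", "b")])[idx]
x$d
```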
If the input is a tibble, you need to convert it to a data frame (for example with as.data.frame) to use the above answer.
If you don't want to convert it, here is another option using rowwise.
library(dplyr)
x <- as_tibble(x)
x %>% rowwise() %>% mutate(d = c_across()[c])
By using dplyr::mutate and ifelse,
x %>% mutate(d = ifelse(c == 1, a, b))
a b c d
1 1 3 1 1
2 2 4 2 4
3 3 5 1 3
4 4 6 2 6
5 5 7 1 5
6 6 8 2 8
7 7 9 1 7
8 8 10 2 10
9 9 11 1 9
10 10 12 2 12

Manipulate Vector and Data Frame in R

list1=c(1,6,3,4,4,5)
data=data.frame("colA" = c(1:6),
"colB"=c(4,3,1,8,9,8))
I have 'list1' and 'data'.
I wish to look up the 'colB' value for each element of 'list1', using 'colA' as the key.
Perhaps, we need match
data.frame(list1, colB = data$colB[match(list1, data$colA)])
# list1 colB
#1 1 4
#2 6 8
#3 3 1
#4 4 8
#5 4 8
#6 5 9
You can also use merge, which was one of your tags.
merge(data.frame(list1=list1), data, by.x=c("list1"), by.y="colA")
list1 colB
1 1 4
2 3 1
3 4 8
4 4 8
5 5 9
6 6 8
Or if you don't care about the column name:
merge(data.frame(colA=list1), data)
colA colB
1 1 4
2 3 1
3 4 8
4 4 8
5 5 9
6 6 8
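One difference between the two approaches that is visible in the outputs above: match keeps the order of list1, while merge returns rows sorted by the key. A short side-by-side sketch (by_match and merged are illustrative names):

```r
list1 <- c(1, 6, 3, 4, 4, 5)
data <- data.frame(colA = 1:6, colB = c(4, 3, 1, 8, 9, 8))

# match() gives, for each element of list1, its position in colA
by_match <- data$colB[match(list1, data$colA)]

# merge() joins on the key and sorts the result by it
merged <- merge(data.frame(list1 = list1), data,
                by.x = "list1", by.y = "colA")

by_match       # follows the order of list1
merged$list1   # sorted by key
```

If the original order matters, match is probably the simpler choice.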

filter unique rows ignoring NA [duplicate]

These are my example data:
df <- data.frame("x" = c(1,1,2,3,4,NA,NA,6), "y"=c(1,1,6,7,8,9,9,10))
x y
1 1 1
2 1 1
3 2 6
4 3 7
5 4 8
6 NA 9
7 NA 9
8 6 10
I'd like to get rid of duplicate entries (=rows), but I want to keep entries with at least one NA. As a result, only the second row above should be deleted, while the 7th should be kept. unique(df, incomparables=NA) yields an error message.
You can use complete.cases and duplicated to subset df.
df[!complete.cases(df) | !duplicated(df),]
# x y
#1 1 1
#3 2 6
#4 3 7
#5 4 8
#6 NA 9
#7 NA 9
#8 6 10
You can use duplicated with rowSums; testing for > 0 keeps every row with at least one NA:
df[!duplicated(df) | rowSums(is.na(df)) > 0, ]
# x y
#1 1 1
#3 2 6
#4 3 7
#5 4 8
#6 NA 9
#7 NA 9
#8 6 10
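It may help to look at the two logical vectors separately. duplicated treats the two NA rows as duplicates of each other, so row 7 would be dropped on its own; the NA test is what rescues it. A small sketch with the question's data (dup, has_na and result are illustrative names):

```r
df <- data.frame(x = c(1, 1, 2, 3, 4, NA, NA, 6),
                 y = c(1, 1, 6, 7, 8, 9, 9, 10))

dup    <- duplicated(df)        # TRUE for rows 2 and 7
has_na <- !complete.cases(df)   # TRUE for rows 6 and 7

# Keep a row if it is not a duplicate OR it contains an NA
result <- df[!dup | has_na, ]
```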

replace values in row if it matches with last row in R

I have below data frame in R
df <- read.table(text = "
A B C D E
14 6 8 16 14
5 6 10 6 4
2 4 6 3 4
26 6 18 39 36
1 2 3 1 2
3 1 1 1 1
3 5 1 4 11
", header = TRUE)
Now, if the values in the last two rows of a column are the same, I need to replace them with 0. Can anyone help me with this, if it is doable in R?
For example:
The values in the last two rows of column 1 are both 3, so I need to replace 3 by 0.
The same goes for column 3: the last two rows are both 1, so I need to replace 1 by 0.
You can compare the last 2 rows and replace the values in the columns where they are the same:
nr <- nrow(df)
df[(nr-1):nr, df[nr-1, ]==df[nr, ]] <- 0
df
# A B C D E
#1 14 6 8 16 14
#2 5 6 10 6 4
#3 2 4 6 3 4
#4 26 6 18 39 36
#5 1 2 3 1 2
#6 0 1 0 1 1
#7 0 5 0 4 11
One option is to loop through the columns, check whether the last two elements (tail(x, 2)) are duplicated, then replace them with 0 or else return the column unchanged, and assign the output back to the dataset. The [] makes sure that the data frame structure stays intact.
df[] <- lapply(df, function(x) if(anyDuplicated(tail(x, 2))>0)
replace(x, c(length(x)-1, length(x)), 0) else x)
df
# A B C D E
#1 14 6 8 16 14
#2 5 6 10 6 4
#3 2 4 6 3 4
#4 26 6 18 39 36
#5 1 2 3 1 2
#6 0 1 0 1 1
#7 0 5 0 4 11
You could also do this:
r <- tail(df, 2)
r[,r[1,]==r[2,]] <- 0
df <- rbind(head(df, -2), r)
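A variant of the same idea that first computes an explicit per-column mask, which can make the logic easier to check. This is only a sketch; same is an illustrative name and the data is typed in from the question:

```r
df <- data.frame(A = c(14, 5, 2, 26, 1, 3, 3),
                 B = c(6, 6, 4, 6, 2, 1, 5),
                 C = c(8, 10, 6, 18, 3, 1, 1),
                 D = c(16, 6, 3, 39, 1, 1, 4),
                 E = c(14, 4, 4, 36, 2, 1, 11))

nr <- nrow(df)

# TRUE for each column whose last two values are equal
same <- vapply(df, function(col) col[nr - 1] == col[nr], logical(1))

# Zero out the last two rows in those columns only
df[(nr - 1):nr, same] <- 0
```

Here same is TRUE for columns A and C only, so only those two columns change.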

Arithmetic operation on selective rows in R

I am very new to R, so this question may seem stupid, but please bear with me. Here's what my data looks like:
col1 col2
1 2 9
2 2 2
3 1 8
4 1 1
5 2 4
6 2 5
7 2 3
8 1 10
9 1 6
10 2 7
reproducible from
data <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
col2 = as.factor(sample(10)))
I want to have all rows in col2 multiplied by 2, if the corresponding value in col1 is "1". So the end result should be like:
col1 col2
1 2 9
2 2 2
3 1 16
4 1 2
5 2 4
6 2 5
7 2 3
8 1 20
9 1 12
10 2 7
Any ideas? Thanks in advance for your help.
If the data were numeric, you could assign to a slice with a simple computation:
> d[d$col1==1,2] <- 2*d[d$col1==1,2]
> d
col1 col2
1 2 9
2 2 2
3 1 16
4 1 2
5 2 4
6 2 5
7 2 3
8 1 20
9 1 12
10 2 7
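With a factor, this becomes problematic as you cannot do the substitution in place (the existing factor doesn't have the appropriate levels). Instead, you must create a new factor with the desired levels, converting through as.character first because as.numeric on a factor returns the internal level codes rather than the displayed values:
vals <- as.numeric(as.character(d$col2))
d$col2 <- as.factor(ifelse(d$col1==1, 2*vals, vals))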
Assuming that the columns are numeric
transform(df1, col2= (2+(col1==1)-1)*col2)
Here's another possibility:
data$col2 <- as.numeric(as.character(data$col2)) * (1 + (data$col1==1))
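A note on the factor column: calling as.numeric directly on a factor returns the internal level codes, which only coincide with the displayed numbers in special cases. The sketch below uses a small made-up data frame (not the question's random sample) where the two differ, and converts via as.character before doing the arithmetic:

```r
# Hypothetical three-row example; col2 is a factor as in the question
d <- data.frame(col1 = c(2, 1, 1),
                col2 = factor(c("9", "8", "10")))

codes  <- as.numeric(d$col2)                # level codes, not 9, 8, 10
values <- as.numeric(as.character(d$col2))  # the numbers actually shown

# Multiply by 2 where col1 is 1, by 1 otherwise
d$col2 <- values * (1 + (d$col1 == 1))
d
```

This leaves col2 numeric; wrap the result in as.factor if a factor is really needed afterwards.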

Resources