Could anyone explain how to change the negative values in the below dataframe?
we have been asked to create a data structure to get the below output.
# > df
# x y z
# 1 a -2 3
# 2 b 0 4
# 3 c 2 -5
# 4 d 4 6
Then we have to use control flow operators and/or vectorisation to multiply only the negative values by 10.
I tried so many different ways but cannot get this to work. I get an error when i try to use a loop and because of the letters.
Create indices of the negative values and multiply by 10, i.e.
i1 <- which(df < 0, arr.ind = TRUE)
df[i1] <- as.numeric(df[i1]) * 10
# x y z
#1 a -20 3
#2 b 0 4
#3 c 2 -50
#4 d 4 6
First find out the numeric columns of the dataframe and multiply the negative values by 10.
cols <- sapply(df, is.numeric)
#Multiply negative values by 10 and positive with 1
df[cols] <- df[cols] * ifelse(sign(df[cols]) == -1, 10, 1)
df
# x y z
#1 a -20 3
#2 b 0 4
#3 c 2 -50
#4 d 4 6
Using dplyr -
library(dplyr)
df <- df %>% mutate(across(where(is.numeric), ~. * ifelse(sign(.) == -1, 10, 1)))
Related
The task is to multiply all negative numbers by 10 in 'df'.
So I am only able to multiply everything by 10 but when I add an if-statement then everything stops working.
df =
x <- c('a', 'b', 'c', 'd', 'e')
y <- c(-4,-2,0,2,4)
z <- c(3, 4, -5, 6, -8)
# Join the variables to create a data frame
df <- data.frame(x,y,z)
df
##
x y z
1 a -4 3
2 b -2 4
3 c 0 -5
4 d 2 6
5 e 4 -8
my code so far
df2 <- df
df2
for(i in 2:ncol(df2)) {
df2[ , i] <- df2[ , i] *10
}
df2
cbind(df[1], 10^(df[-1] < 0) * df[-1])
x y z
1 a -40 3
2 b -20 4
3 c 0 -50
4 d 2 6
5 e 4 -80
You can achieve this using the dplyr functions mutate and across.
library(dplyr) # install if required
df %>%
mutate(across(-x, ~ifelse(. < 0, 10 * ., .)))
This says "for all columns except x, multiple by 10 where the value is < 0, otherwise leave value as is".
Result:
x y z
1 a -40 3
2 b -20 4
3 c 0 -50
4 d 2 6
5 e 4 -80
I'm an absolute beginner in coding and R and this is my third week doing it for a project. (for biologists, I'm trying to find the sum of risk alleles for PRS) but I need help with this part
df
x y z
1 t c a
2 a t a
3 g g t
so when code applied:
x y z
1 t 0 0
2 a 0 1
3 g 1 0
```
I'm trying to make it that if the rows in y or z match x the value changes to 1 and if not, zero
I started with:
```
for(i in 1:ncol(df)){
df[, i]<-df[df$x == df[,i], df[ ,i]<- 1]
}
```
But got all NA values
In reality, I have 100 columns I have to compare with x in the data frame. Any help is appreciated
An alternative way to do this is by using ifelse() in base R.
df$y <- ifelse(df$y == df$x, 1, 0)
df$z <- ifelse(df$z == df$x, 1, 0)
df
# x y z
#1 t 0 0
#2 a 0 1
#3 g 1 0
Edit to extend this step to all columns efficiently
For example:
df1
# x y z w
#1 t c a t
#2 a t a a
#3 g g t m
To apply column editing efficiently, a better approach is to use a function applied to all targeted columns in the data frame. Here is a simple function to do the work:
edit_col <- function(any_col) any_col <- ifelse(any_col == df1$x, 1, 0)
This function takes a column, and then compare the elements in the column with the elements of df1$x, and then edit the column accordingly. This function takes a single column. To apply this to all targeted columns, you can use apply(). Because in your case x is not a targeted column, you need to exclude it by indexing [,-1] because it is the first column in df.
# Here number 2 indicates columns. Use number 1 for rows.
df1[, -1] <- apply(df1[,-1], 2, edit_col)
df1
# x y z w
#1 t 0 0 1
#2 a 0 1 1
#3 g 1 0 0
Of course you can also define a function that edit the data frame so you don't need to do apply() manually.
Here is an example of such function
edit_df <- function(any_df){
edit_col <- function(any_col) any_col <- ifelse(any_col == any_df$x, 1, 0)
# Create a vector containing all names of the targeted columns.
target_col_names <- setdiff(colnames(any_df), "x")
any_df[,target_col_names] <-apply( any_df[,target_col_names], 2, edit_col)
return(any_df)
}
Then use the function:
edit_df(df1)
# x y z w
#1 t 0 0 1
#2 a 0 1 1
#3 g 1 0 0
A tidyverse approach
library(dplyr)
df <-
tibble(
x = c("t","a","g"),
y = c("c","t","g"),
z = c("a","a","t")
)
df %>%
mutate(
across(
.cols = c(y,z),
.fns = ~if_else(. == x,1,0)
)
)
# A tibble: 3 x 3
x y z
<chr> <dbl> <dbl>
1 t 0 0
2 a 0 1
3 g 1 0
I want to subtract 1 from the values of column A if column B is <= 20.
A = c(1,2,3,4,5)
B = c(10,20,30,40,50)
df = data.frame(A,B)
output
A B
1 0 10
2 1 20
3 3 30
4 4 40
5 5 50
My data is very huge so I prefer not to use a loop. Is there any computationally efficient method in R?
You can do
df$A[df$B <= 20] <- df$A[df$B <= 20] - 1
# A B
#1 0 10
#2 1 20
#3 3 30
#4 4 40
#5 5 50
We can break this down step-by-step to understand how this works.
First we check which numbers in B is less than equal to 20 which gives us a logical vector
df$B <= 20
#[1] TRUE TRUE FALSE FALSE FALSE
Using that logical vector we can select the numbers in A
df$A[df$B <= 20]
#[1] 1 2
Subtract 1 from those numbers
df$A[df$B <= 20] - 1
#[1] 0 1
and replace these values for the same indices in A.
With dplyr we can also use case_when
library(dplyr)
df %>%
mutate(A = case_when(B <= 20 ~ A - 1,
TRUE ~ A))
Another possibility:
df$A <- ifelse(df$B < 21, df$A - 1, df$A)
And here is a data.table solution:
library(data.table)
setDT(df)
df[B <= 20, A := A - 1]
I am in a novice of R. I have a dataframe with columns 1:n. Excluding column 1 and n, I want to change the maximum value of each row if the row has a specific value in a different column AND set the remaining values (excluding column 1 and n) to zero. I have about 300,000 cases and 40 columns in my real data, however, the example below illustrates what I am trying to achieve:
A <- c(1,1,5,5,10)
B <- rnorm(1:5)
C <- rnorm(1:5)
D <- rnorm(1:5)
E <- c(10,15,100,100,100)
df <- data.frame(A,B,C,D,E)
df
A B C D E
1 1 0.74286670 0.3222136 0.9381296 10
2 1 -0.03352498 0.5262685 0.1225731 15
3 5 -0.17689629 -0.8949740 -1.4376567 100
4 5 0.48329153 1.1574834 -1.1116581 100
5 10 0.13117277 -0.2068736 0.4841806 100
Here, if column A of each row has 1, I want to change the maximum value of each row into the value of column E, and set columns B, C and D to 0.
So, the result should be like this:
A B C D E
1 1 0 0 10 10
2 1 0 15 0 15
3 5 -0.17689629 -0.8949740 -1.4376567 100
4 5 0.48329153 1.1574834 -1.1116581 100
5 10 0.13117277 -0.2068736 0.4841806 100
I tried to do this for two days. Thanks.
Try this out and see what happens :)
df <- read.table(text = "A B C D E
1 1 0.74286670 0.3222136 0.9381296 10
2 1 -0.03352498 0.5262685 0.1225731 15
3 5 -0.17689629 -0.8949740 -1.4376567 100
4 5 0.48329153 1.1574834 -1.1116581 100
5 10 0.13117277 -0.2068736 0.4841806 100", stringsAsFactor = FALSE)
# find the max in columns B,C,D
z <- apply(df[df$A == 1, 2:4], 1, max)
# substitute the maximum value of each row for columns B,C,D where A == 1
# with the value of column E. Assign 0 to the others
y <- ifelse(df[df$A == 1, 2:4] == z, df$E[df$A == 1], 0)
# Change the values in your dataframe
df[df$A == 1, 2:4] <- y
I have a variable A containing continuous numeric values and a binary variable B. I would like to create a new variable A1 which contains the same values as A if B=1 and missing values (NA) if B=2.
Many thanks!
You can use ifelse() for that:
a1 <- ifelse(B == 1, A, NA)
Here's a simple and efficient approach without ifelse:
A <- 1:10
# [1] 1 2 3 4 5 6 7 8 9 10
B <- rep(1:2, 5)
# [1] 1 2 1 2 1 2 1 2 1 2
A1 <- A * NA ^ (B - 1)
# [1] 1 NA 3 NA 5 NA 7 NA 9 NA
You can use ifelse for this:
A = runif(100)
B = sample(c(0,1), 100, replace = TRUE)
B1 = ifelse(B == 1, A, NA)
You can even leave out the == 1 as R interprets 0 as FALSE and any other number as TRUE:
B1 = ifelse(B, A, NA)
Although the == 1 is both more flexible and makes it more clear what happens. So I'd go for the first approach.