I am working with a data set where I have to recode variables so that Never and Rarely = 0, Sometimes and Always = 1, and Not Applicable = NA. For reference, the coding scheme is 1 = Never, 2 = Rarely, 3 = Sometimes, 4 = Always, and 5 = Not Applicable. Should I recode the numeric variables before renaming them, or convert the character variables into numeric ones? I'm at an impasse and could use help on what code to use.
The problem
You have a vector (or a data frame column) x with values 1 through 5, e.g.:
x <- c(1,2,3,4,5,4,3,2,1)
You want to recode 1 and 2 to 0, 3 and 4 to 1, and 5 to NA.
Solution in base R
values <- list(`1` = 0, `2` = 0, `3` = 1, `4` = 1, `5` = NA)
x <- unname(unlist(values[x]))
[1] 0 0 1 1 NA 1 1 0 0
Solution with dplyr::recode()
values <- list(`1` = 0, `2` = 0, `3` = 1, `4` = 1, `5` = NA_real_)
x <- dplyr::recode(x, !!!values)
[1] 0 0 1 1 NA 1 1 0 0
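Because the codes are the consecutive integers 1 to 5, the same mapping can also be sketched with a plain lookup vector indexed by position. This is only an illustration: the data frame `survey` and its columns `q1` and `q2` are hypothetical, made up to show how the recode could be applied to several columns at once.

```r
# Lookup vector: position i holds the new value for old code i.
# 1 = Never, 2 = Rarely -> 0; 3 = Sometimes, 4 = Always -> 1; 5 = Not Applicable -> NA
lookup <- c(0, 0, 1, 1, NA)

x <- c(1, 2, 3, 4, 5, 4, 3, 2, 1)
recoded <- lookup[x]

# Hypothetical data frame with two survey items coded 1-5 (assumed for illustration)
survey <- data.frame(q1 = c(1, 3, 5), q2 = c(4, 2, 2))

# Recode every column in place; survey[] keeps the data frame structure
survey[] <- lapply(survey, function(col) lookup[col])
```

If the columns are stored as character ("1" to "5"), converting them with as.integer() before indexing should give the same result.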
Related
I have a very simple case here in which I would like to subtract from each column the one before it. In other words, I am looking for a sliding subtraction: the first column stays as is, then the first column is subtracted from the second, the second from the third, and so on up to the last column.
here is my sample data set:
structure(list(x = c(1, 0, 0, 0), y = c(1, 0, 1, 1), z = c(0,
1, 1, 1)), class = "data.frame", row.names = c(NA, -4L))
and my desired output:
structure(list(x = c(1, 0, 0, 0), y = c(0, 0, 1, 1), z = c(-1,
1, 0, 0)), class = "data.frame", row.names = c(NA, -4L))
I am personally looking for a solution with purrr family of functions. I also thought about slider but I'm not quite familiar with the latter one. So I would appreciate any help and idea with these two packages in advance. Thank you very much.
A simple dplyr-only solution:
cur_data() inside mutate/summarise just returns a copy of the whole data frame, so
just subtract cur_data()[-ncol(.)] from cur_data()[-1] (in dplyr 1.1.0 and later, cur_data() is deprecated in favour of pick()).
With pmap_dfr you can do something similar.
df <- structure(list(x = c(1, 0, 0, 0), y = c(1, 0, 1, 1), z = c(0,
1, 1, 1)), class = "data.frame", row.names = c(NA, -4L))
library(dplyr)
df %>%
mutate(cur_data()[-1] - cur_data()[-ncol(.)])
#> x y z
#> 1 1 0 -1
#> 2 0 0 1
#> 3 0 1 0
#> 4 0 1 0
Similarly, with purrr:
library(purrr)
pmap_dfr(df, ~c(c(...)[1], c(...)[-1] - c(...)[-ncol(df)]))
I think you are looking for pmap_df with lag to subtract the previous value.
library(purrr)
library(dplyr)
pmap_df(df, ~{x <- c(...);x - lag(x, default = 0)})
# A tibble: 4 x 3
# x y z
# <dbl> <dbl> <dbl>
#1 1 0 -1
#2 0 0 1
#3 0 1 0
#4 0 1 0
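The row-wise idea behind the lag approach (keep the first value, then subtract each value's predecessor) can also be sketched in base R with diff(). The helper name `row_diff` is made up for this illustration and is not part of any package.

```r
df <- data.frame(x = c(1, 0, 0, 0), y = c(1, 0, 1, 1), z = c(0, 1, 1, 1))

# For one row: keep the first value, then take successive differences
row_diff <- function(r) c(r[1], diff(r))

# apply() over rows returns one column per input row, so transpose back
result <- as.data.frame(t(apply(df, 1, row_diff)))
result
```

This works here because every column is numeric; with mixed column types, apply() would first coerce the data frame to a character matrix.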
Verbose, but simple:
df %>%
select(x) %>%
bind_cols(df %>%
select(-1) %>%
map2_dfc(df %>%
select(-ncol(df)), ~.x -.y))
# x y z
#1 1 0 -1
#2 0 0 1
#3 0 1 0
#4 0 1 0
We can just do this (no packages needed):
cbind(df[1], df[-1] - df[-ncol(df)])
-output
x y z
1 1 0 -1
2 0 0 1
3 0 1 0
4 0 1 0
Or using dplyr
library(dplyr)
df %>%
mutate(.[-1] - .[-ncol(.)])
I'm running a linear regression, but many of my observations can't be used because some of the values in the row are NA. I know that if at least one of a set of variables has been entered, then an NA is actually 0. However, if all the values are NA, then the columns should not change. I will include an example because I know this might be confusing.
What I have is something that looks like this:
df <- data.frame(outcome = c(1, 0, 1, 1, 0),
Var1 = c(1, 0, 1, NA, NA),
Var2 = c(NA, 1, 0, 0, NA),
Var3 = c(0, 1, NA, 1, NA))
For Vars 1-3, each of the first 4 rows has an NA but also has entries in the other vars. In the last row, however, all values are NA. I want the last row to stay NA, but I want the NAs in those first 4 rows to be filled with 0. The desired outcome would look like this:
desired <- data.frame(outcome = c(1, 0, 1, 1, 0),
Var1 = c(1, 0, 1, 0, NA),
Var2 = c(0, 1, 0, 0, NA),
Var3 = c(0, 1, 0, 1, NA))
I know there are messy ways I could go about this, but I was wondering what would be the most streamlined process for this?
I hope this makes sense, I know the question is confusing. I can clarify anything if needed.
We can create a logical vector with rowSums, use that to subset the rows before changing the NA to 0
i1 <- rowSums(!is.na(df[-1])) > 0
df[i1, -1][is.na(df[i1, -1])] <- 0
-checking with desired
identical(df, desired)
#[1] TRUE
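The same two steps (flag the rows that have at least one non-NA value, then fill NAs only in those rows) can be wrapped in a small function. This is a sketch under the assumption that the columns to fill are numeric; the name `fill_partial_na` and its arguments are hypothetical.

```r
df <- data.frame(outcome = c(1, 0, 1, 1, 0),
                 Var1 = c(1, 0, 1, NA, NA),
                 Var2 = c(NA, 1, 0, 0, NA),
                 Var3 = c(0, 1, NA, 1, NA))

# Hypothetical helper: fill NAs with 0 in `cols`, but only in rows
# where at least one of those columns is not NA
fill_partial_na <- function(data, cols) {
  block <- data[cols]
  has_value <- rowSums(!is.na(block)) > 0
  # has_value is recycled down each column of the logical matrix
  na_mask <- is.na(block) & has_value
  block[na_mask] <- 0
  data[cols] <- block
  data
}

filled <- fill_partial_na(df, c("Var1", "Var2", "Var3"))
filled
```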
You can use apply to conditionally replace NA in certain rows:
data.frame(t(apply(df, 1, function(x) if (all(is.na(x[-1]))) x else replace(x, is.na(x), 0))))
Output
outcome Var1 Var2 Var3
1 1 1 0 0
2 0 0 1 1
3 1 1 0 0
4 1 0 0 1
5 0 NA NA NA
I have a dataframe which looks like this:
> df
1 2 3 4 5 7 8 9 10 11 12 13 14 15 16
1 6 0 0 0 0 0 3 0 0 0 0 0 0 1
I am trying to replicate the number 1540 as many times as each entry in df indicates and store the results in length(df) new variables. So this loop should output 16 variables, for example
a1b <- c(1540)
a2b <- c(1540,1540,1540,1540,1540,1540)
...
I tried to solve this, for example, with the code below, but this does not work.
df <- P1_2008[1:16]
for(i in 1:15){
paste0("a",i,"b") <- rep(c(1540), times = df[i])
}
Does anyone have an idea how to fix this?
Best regards,
Daniel
The output of the df is
dput(df)
c(`1` = 1, `2` = 6, `3` = 0, `4` = 0, `5` = 0, `7` = 0, `8` = 0,
`9` = 3, `10` = 0, `11` = 0, `12` = 0, `13` = 0, `14` = 0, `15` = 0,
`16` = 1)
Does this help?
for(i in 1:15){
assign(paste0("a",i,"b"), rep(c(1540), times = df[i]))
}
If you want to create a variable name from a string, assign() is your friend. The first argument is the variable name (given as a string), and the second argument is the object (in this case a vector) that is assigned to that name.
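As an alternative to creating many separate variables with assign(), the results could be kept in a single named list. The sketch below rebuilds the vector from the dput() output under the name `counts` (a name chosen here for illustration) and uses the same a1b, a2b, ... naming pattern.

```r
counts <- c(`1` = 1, `2` = 6, `3` = 0, `4` = 0, `5` = 0, `7` = 0, `8` = 0,
            `9` = 3, `10` = 0, `11` = 0, `12` = 0, `13` = 0, `14` = 0, `15` = 0,
            `16` = 1)

# One list element per entry: 1540 repeated counts[i] times
reps <- lapply(counts, function(n) rep(1540, n))

# Name the elements by position: a1b, a2b, ...
names(reps) <- paste0("a", seq_along(counts), "b")

reps[["a2b"]]
```

Elements are then accessed as reps[["a2b"]] instead of a2b, which keeps the workspace tidy and makes it easy to loop over the results.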
Using the tidyverse:
library(tidyverse)
df <- read.table(text = "1 6 0 0 0 0 0 3 0 0 0 0 0 0 1", header = F)
out <- map(df, ~rep(1540, .x)) %>% purrr::set_names(., paste0("a", seq_along(df), "b"))
list2env(out, envir = .GlobalEnv)
#> <environment: R_GlobalEnv>
Created on 2020-09-30 by the reprex package (v0.3.0)
Assign is the right choice, but the answer above has it slightly backwards. You should provide the text you want for your variable as the first argument and the desired value as the second, so this should work.
for (i in seq_along(df)){assign(paste0('a',i,'b'),rep(1540,df[[i]]))}
I have a named numeric vector like this:
c(`1` = 2, `5` = 3, `6` = 1, `7` = 2, `8` = 1, `9` = 1)
#1 5 6 7 8 9 (names)
#2 3 1 2 1 1 (values)
I want to expand the vector so that the names form a sequence of integers and fill the values with 0.
Here is my expected output:
c(`1` = 2, `2` = 0, `3` = 0, `4` = 0, `5` = 3, `6` = 1, `7` = 2, `8` = 1, `9` = 1)
#1 2 3 4 5 6 7 8 9
#2 0 0 0 3 1 2 1 1
Any help?
Thanks
Here is my solution using name based indexing:
vec = c("1" = 2, "5" = 3, "6" = 1, "7" = 2, "8" = 1, "9" = 1)
newvec = double(9);
names(newvec) = 1:9
newvec[names(vec)] = vec;
newvec
# 1 2 3 4 5 6 7 8 9
# 2 0 0 0 3 1 2 1 1
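The length 9 is hard-coded above. A sketch that derives the full range from the names instead, assuming the names are always integers stored as strings (the variable names `idx`, `full` and `expanded` are chosen here for illustration):

```r
vec <- c(`1` = 2, `5` = 3, `6` = 1, `7` = 2, `8` = 1, `9` = 1)

# Full integer sequence spanned by the existing names
idx  <- as.integer(names(vec))
full <- seq(min(idx), max(idx))

# Start with zeros named 1..9, then overwrite the known positions by name
expanded <- setNames(numeric(length(full)), full)
expanded[names(vec)] <- vec
expanded
```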
I have searched around but could not find a particular answer to my question.
Suppose I have a data frame df:
df = data.frame(id = c(10, 11, 12, 13, 14),
V1 = c('blue', 'blue', 'blue', NA, NA),
V2 = c('blue', 'yellow', NA, 'yellow', 'green'),
V3 = c('yellow', NA, NA, NA, 'blue'))
I want to use the values of V1-V3 as unique column headers and I want the occurrence frequency of each of those per row to populate the rows.
Desired output:
desired = data.frame(id = c(10, 11, 12, 13, 14),
blue = c(2, 1, 1, 0, 1),
yellow = c(1, 1, 0, 1, 0),
green = c(0, 0, 0, 0, 1))
There is probably a really cool way to do this with tidyr::spread and dplyr::summarise. However, I don't know how to spread the V* columns when the keys I want to spread by are all over the place in different columns and include NAs.
Thanks for any help!
Using melt and dcast from package reshape2:
library(reshape2)
dcast(melt(df, id="id", na.rm = TRUE), id~value)
id blue green yellow
1 10 2 0 1
2 11 1 0 1
3 12 1 0 0
4 13 0 0 1
5 14 1 1 0
As suggested by David Arenburg, it is just simpler to use recast, a wrapper for melt and dcast:
recast(df, id ~ value, id.var = "id")[,1:4] # na.rm is not possible then
id blue green yellow
1 10 2 0 1
2 11 1 0 1
3 12 1 0 0
4 13 0 0 1
5 14 1 1 0
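For comparison, the same counts can be sketched in base R with table(), without reshape2. This stacks the V* columns into one long column alongside a repeated id; the intermediate names `value_cols`, `long` and `counts` are chosen here for illustration.

```r
df <- data.frame(id = c(10, 11, 12, 13, 14),
                 V1 = c('blue', 'blue', 'blue', NA, NA),
                 V2 = c('blue', 'yellow', NA, 'yellow', 'green'),
                 V3 = c('yellow', NA, NA, NA, 'blue'))

value_cols <- df[-1]

# Stack the value columns; id is repeated once per value column
long <- data.frame(
  id    = rep(df$id, times = ncol(value_cols)),
  value = unlist(lapply(value_cols, as.character), use.names = FALSE)
)

# table() drops NA values by default and counts each id/value pair
counts <- as.data.frame.matrix(table(long$id, long$value))
counts <- cbind(id = as.numeric(rownames(counts)), counts)
rownames(counts) <- NULL
counts
```

As with dcast, the colour columns come out in alphabetical order (blue, green, yellow) rather than in the order shown in the desired output.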