Subset data based on character vector

Subset data based on character vector - r

I have character vector containing numeric values. I want to subset data based on the vector.
x = '1,2,3,4,5'
n = noquote(gsub(","," ",x))
mtcars[n,]
It's not working.
But the following code works.
d = data.frame(n = 1:5)
mtcars[d$n,]

x is a string and not a numeric vector to use it as a row index. We can split the string on ",", convert the numbers into numeric and then subset the dataframe.
mtcars[as.numeric(strsplit(x, ",")[[1]]), ]
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
For comparison purposes, look at below output to understand why your second attempt works.
class(d$n)
#[1] "integer"
length(d$n)
#[1] 5
class(x)
#[1] "character"
length(x)
#[1] 1

Related

how to add a Column using IF_ELSE

I'm trying to add a column to a dataframe using add_column and if_else but I can get it I don't know how to do a correct logical test using logical conditional (or "|").
I have this kind data:
dataframe1
variable 1 variable2 variable3
(char) (char) (char)
value value value
value value value
value value value
I try this:
dataframe2 <- dataframe1%>%
add_column(newcolumn_name = if_else(variable3== "value1"|"value2”, TRUE, FALSE)
And I get this error:
Unknown or uninitialised column: value1.Error in variable3 ==
“value1“| "value2" : operations are possible only for numeric,
logical or complex types

Consider to extract the column with .$. The == can be replaced with %in% and | is used mostly with regex pattern (OR) while == does a fixed match. In addition, the output of == or %in% returns a logical vector. So, we don't need the if_else/ifelse
library(dplyr)
library(tibble)
dataframe1 %>%
add_column(newcolumn_name = .$variable3 %in% c("value1", "value2"))
Using a reproducible example
head(mtcars) %>%
add_column(new_column_name = .$carb %in% c(1, 4))
mpg cyl disp hp drat wt qsec vs am gear carb new_column_name
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 TRUE
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 TRUE
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 TRUE
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 TRUE
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 TRUE
Also, this can be done within dplyr itself i.e. using mutate and thus we don't need to extract the column
head(mtcars) %>%
mutate(new_column_name = carb %in% c(1, 4))
mpg cyl disp hp drat wt qsec vs am gear carb new_column_name
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 TRUE
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 TRUE
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 TRUE
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 TRUE
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 TRUE

I was able to do that with this code:
dataf2 <- dataf %>%
add_column(newcol = ifelse(dataf$var3=="value1" | dataf$var3=="value2", TRUE, FALSE) )

How to create function to use regular expressions to replace column names in a data frame?

I am feeling lost with how to create a helper function in R that takes the following 3 arguments:
a data frame,
a string pattern, and
a string "replacement pattern".
The function is supposed to replace occurrences of the string pattern in the names of the variables in the data frame with the replacement pattern.
Any guidance, tips or help would be greatly appreciated.

func <- function(x, nm1, nm2, ...) {
names(x) <- gsub(nm1, nm2, names(x), ...)
x
}
head(func(mtcars, "c", "C"))
# mpg Cyl disp hp drat wt qseC vs am gear Carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

Create subset of data.frame based on complete observations in 7 variables? [duplicate]

This question already has answers here:
Remove rows with all or some NAs (missing values) in data.frame
(18 answers)
Closed 2 years ago.
I have a data.frame with 571 observations of 156 variables. I am interested in keeping all 156 variables; however, I only need complete observations for 7 of these variables.
By using:
> nrow(na.omit(subset(finaldata, select = c("h_egfr_cystc96", "child_age96", "smoke_inside2T", "SES_3cat2T", "X_ZBFA96", "log2Tblood", "sexo_h00"))))
I learn that there are 453 observations that have complete information for these 7 variables.
How can I create a new data.frame that will have 453 observations of 156 variables, with complete information for my 7 variables of interest?
I suspect that complete.cases will be useful, but I am not sure how to apply it here.
Any ideas? Thank you in advance for the help!

Use complete.cases on just the columns of interest, but use its return value (a vector of logical) on the original frame.
mt <- mtcars[1:5,]
mt
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
mt$cyl[3] <- mt$disp[2] <- NA
mt[complete.cases(mt[,c("mpg","cyl")]),]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 NA 110 3.90 2.875 17.02 0 1 4 4
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Because I looked for complete cases in just "mpg" and "cyl", then the NA in "disp" didn't remove that row.

Copy a data frame in a function

I want to copy data frame a to a new data frame b inside a function.
a <- mtcars
saveData <- function(x, y){
y <- x
return(y)
}
saveData(a, b)
In this example, the function should create the object/data frame b. b should be a copy of a (i.e., mtcars)
The crux is to flexibly "name" objects.
I excessively played around with assign(), deparse(), and substitute(), but I could not make it work.

It is not a good pracrtice to save the data in global environment from a function. However if you want to do it here is a way :
saveData <- function(x, y){
assign(deparse(substitute(y)), x, envir = parent.frame())
}
a <- mtcars
b
Error: object 'b' not found
saveData(a, b)
b
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#...

Another idea is to use list2env, but you have to convert to a named list, so your second argument will need to be a character, i.e.
saveData <- function(x, y) {
v1 <- setNames(list(x), y)
list2env(v1, envir = .GlobalEnv)
}
saveData(a, 'b')
b
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#.....
NOTE: I wouldn't recommend adding staff to your global environment. It is better to keep them in lists

Opposite function to add_rownames in dplyr

As an intermediate step I generate a data frame with one column as character strings and the rest are numbers. I'd like to convert it to a matrix, but first I have to convert that character column into row names and remove it from the data frame.
Is there a simpe way to do this in dplyr? A function like to_rownames() that is opposite to add_rownames()?
I saw a solution using a custom function, but it's really out of dplyr philosophy.

You can now use the tibble-package:
tibble::column_to_rownames()

This provides NSE & standard eval functions:
library(dplyr)
df <- data_frame(a=sample(letters, 4), b=c(1:4), c=c(5:8))
reset_rownames <- function(df, col="rowname") {
stopifnot(is.data.frame(df))
col <- as.character(substitute(col))
reset_rownames_(df, col)
}
reset_rownames_ <- function(df, col="rowname") {
stopifnot(is.data.frame(df))
nm <- data.frame(df)[, col]
df <- df[, !(colnames(df) %in% col)]
rownames(df) <- nm
df
}
m <- "rowname"
head(as.matrix(reset_rownames(add_rownames(mtcars), "rowname")))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(as.matrix(reset_rownames_(add_rownames(mtcars), m)))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Perhaps to_rownames() or set_rownames() makes more sense. ¯\_(ツ)_/¯ YMMV.

If you really need a matrix you can just save the character column to a separate variable, drop it, and then create the matrix
library(dplyr)
df <- data_frame(a = sample(letters, 4), b = c(1:4), c = c(5:8))
letters <- df %>% select(a)
a.matrix <- df %>% select(-a) %>% as.matrix
Not sure what you are going to do after that, but this gets you as far as you asked for...