Fix subscript out of bounds error when adding column to df - r

I have a df with 20 columns of numerical data. I am trying to add an additional column with the total of each row, however I am getting a subscript out of bounds error. This is the code I'm using:
df[,"Total"]<-rowSums(df)
This is the error:
Error in `[<-`(`*tmp*`, , "Total", value = c(Acidovorax = 13, Acinetobacter = 48143, :
subscript out of bounds

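That shouldn't happen for a data.frame, but it can for a matrix: assigning to a column name that doesn't exist yet adds a column to a data.frame, whereas a matrix has fixed dimensions, so the same assignment is out of bounds.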
mt_mtx <- as.matrix(mtcars)
mtcars[,"Total"] <- rowSums(mtcars)
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb Total
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 328.980
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 329.795
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 259.580
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 426.135
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 590.310
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 385.540
mt_mtx[,"Total"] <- rowSums(mt_mtx)
# Error in `[<-`(`*tmp*`, , "Total", value = c(`Mazda RX4` = 328.98, `Mazda RX4 Wag` = 329.795, :
# subscript out of bounds
The quick remedy is to convert your df back to a data.frame. If you weren't expecting this, thinking that your df was already a frame, then I suggest you go back through your code to find what accidentally coerced it to a matrix.
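A minimal sketch of the two remedies, using head(mtcars) as a stand-in for your data. The names mtx, fixed_df and mtx_total are only illustrative. Either convert the matrix back to a data.frame before assigning, or stay with a matrix and bind the new column on with cbind:

```r
# Stand-in for an object that was accidentally coerced to a matrix
mtx <- as.matrix(head(mtcars))
is.matrix(mtx)        # check what you actually have
is.data.frame(mtx)

# Remedy 1: convert back to a data.frame, then assign the new column
fixed_df <- as.data.frame(mtx)
fixed_df[, "Total"] <- rowSums(fixed_df)

# Remedy 2: keep the matrix and append the column with cbind()
mtx_total <- cbind(mtx, Total = rowSums(mtx))
```

Checking class(df) or is.matrix(df) right before the failing line is usually the fastest way to confirm this is what happened.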

Related

RStudio: colnames() function not showing name of very first column

When I run colnames(), it never shows the name of this first column.
For example, after wasting a lot of time researching online, I discovered the name of the first column in mtcars is das_Auto.
Why doesn't this name show when I run this code?
colnames(mtcars)[1]
What's the easiest way to determine the name of the first column in a data set?
This is because the first 'column' of mtcars is not actually a column: it holds the row names (an index). If you want to convert it to a column, you can run the following:
df <- cbind(das_Auto = rownames(mtcars), mtcars)
rownames(df) <- 1:nrow(mtcars)
head(df)
das_Auto mpg cyl disp hp drat wt qsec vs am gear carb
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
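To check this yourself, here is a small base R sketch (the object names first_col, car_names and df2 are made up for illustration). It shows that colnames() starts at mpg, that the car names live in rownames(), and a one-step way to build the same frame as above:

```r
# The first real column of mtcars is mpg
first_col <- colnames(mtcars)[1]

# The car names are row names, not a column
car_names <- rownames(mtcars)
head(car_names, 3)

# Build a frame with the names as a proper column;
# row.names = NULL resets the row names to 1, 2, 3, ...
df2 <- data.frame(das_Auto = car_names, mtcars, row.names = NULL)
colnames(df2)[1]
```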

how to add a Column using IF_ELSE

I'm trying to add a column to a dataframe using add_column and if_else, but I can't get it to work. I don't know how to write a correct logical test using the logical OR operator ("|").
I have this kind data:
dataframe1
variable1 variable2 variable3
(char) (char) (char)
value value value
value value value
value value value
I try this:
dataframe2 <- dataframe1 %>%
add_column(newcolumn_name = if_else(variable3 == "value1" | "value2", TRUE, FALSE))
And I get this error:
Unknown or uninitialised column: value1.
Error in variable3 == "value1" | "value2" : operations are possible only for numeric, logical or complex types
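Consider extracting the column with .$. The error comes from |, which is a logical OR and needs a logical (or numeric) vector on each side: variable3 == "value1" | "value2" tries to OR the comparison result with the bare string "value2". Either repeat the comparison on both sides of |, or replace == with %in%, which tests against several values at once. In addition, == and %in% already return a logical vector, so we don't need if_else/ifelse: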
library(dplyr)
library(tibble)
dataframe1 %>%
add_column(newcolumn_name = .$variable3 %in% c("value1", "value2"))
Using a reproducible example
head(mtcars) %>%
add_column(new_column_name = .$carb %in% c(1, 4))
mpg cyl disp hp drat wt qsec vs am gear carb new_column_name
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 TRUE
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 TRUE
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 TRUE
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 TRUE
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 TRUE
This can also be done within dplyr itself using mutate, in which case we don't need to extract the column:
head(mtcars) %>%
mutate(new_column_name = carb %in% c(1, 4))
mpg cyl disp hp drat wt qsec vs am gear carb new_column_name
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 TRUE
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 TRUE
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 TRUE
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 TRUE
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 TRUE
I was able to do that with this code:
dataf2 <- dataf %>%
add_column(newcol = ifelse(dataf$var3=="value1" | dataf$var3=="value2", TRUE, FALSE) )
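For comparison, here is a small base R sketch of the two working forms on a made-up vector (variable3, by_or and by_in are illustrative names, not the OP's data). Both give the same logical vector, which can be used directly as the new column:

```r
# Made-up stand-in for the OP's column
variable3 <- c("value1", "value2", "other")

# Form 1: repeat the comparison on both sides of |
by_or <- variable3 == "value1" | variable3 == "value2"

# Form 2: test membership in a set of values
by_in <- variable3 %in% c("value1", "value2")

identical(by_or, by_in)
```

One difference to keep in mind: if the column contains NA, == returns NA for those rows while %in% returns FALSE.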

How to create function to use regular expressions to replace column names in a data frame?

I am feeling lost with how to create a helper function in R that takes the following 3 arguments:
a data frame,
a string pattern, and
a string "replacement pattern".
The function is supposed to replace occurrences of the string pattern in the names of the variables in the data frame with the replacement pattern.
Any guidance, tips or help would be greatly appreciated.
func <- function(x, nm1, nm2, ...) {
names(x) <- gsub(nm1, nm2, names(x), ...)
x
}
head(func(mtcars, "c", "C"))
# mpg Cyl disp hp drat wt qseC vs am gear Carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
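Because the pattern is passed straight to gsub, it is treated as a regular expression, and extra gsub arguments can be forwarded through the dots. A short sketch of what that allows, repeating the func definition so the block runs on its own (the data frame toy is a made-up example):

```r
func <- function(x, nm1, nm2, ...) {
  names(x) <- gsub(nm1, nm2, names(x), ...)
  x
}

# Anchored regex: only a leading "c" is replaced, so qsec is left alone
anchored <- names(func(mtcars, "^c", "C"))

# "." is a regex wildcard; pass fixed = TRUE to match a literal dot
toy <- data.frame(a.b = 1, c.d = 2)
literal_dot <- names(func(toy, ".", "_", fixed = TRUE))
wildcard    <- names(func(toy, ".", "_"))
```

Without fixed = TRUE, every character in the names is replaced, which is rarely what is wanted when the pattern contains regex metacharacters.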

Looping through rows and columns in R does not work

I am just trying to fill gaps, but in a loop. It is monthly data, and fill_gaps produces NAs for every day. I am not sure why.
for (x in 2:length(differencing)){
for(micky in 1:length(differencing$`d_ BA`)){
if(is.na(differencing[micky,x])== T){
differencing[micky,x] = differencing[micky-1,x]
}
}
}
here is the error that I am getting:
Error: Assigned data `differencing[(micky - 1), x]` must be compatible with row subscript `micky`.
x 1 row must be assigned.
x Assigned data has 0 rows.
i Row updates require a list value. Do you need `list()` or `as.list()`?
Run `rlang::last_error()` to see where the error occurred.
This can be easily done using fill
library(tidyr)
library(dplyr)
differencing %>%
fill(everything())
Or we can use na.locf from zoo
library(zoo)
na.locf(differencing)
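In the OP's loop, the error message points at the inner loop: it starts at micky = 1, so micky - 1 is 0 and differencing[0, x] selects zero rows, leaving nothing to assign. Starting the inner loop at the second row avoids this:
for (micky in 2:nrow(differencing)) {
...
Note also that the length of a data.frame is its number of columns (as mentioned in the comments), which is different from the length of a column, i.e. a vector, so 2:length(differencing) in the outer loop iterates over columns.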
As the OP mentioned that none of these work (without providing an example), here is a small reproducible example ('tmp'):
tmp %>%
fill(everything())
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 6 258 110 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 258 110 2.76 3.460 20.22 1 0 3 1
or using na.locf
na.locf(tmp)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 6 258 110 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 258 110 2.76 3.460 20.22 1 0 3 1
data
tmp <- head(mtcars)
tmp[c(2, 5, 6), c(3, 4, 2)] <- NA
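If a loop is still preferred, here is a sketch of a working version on the same 'tmp' data, in base R only. The names filled, col and row are illustrative. The inner loop starts at the second row, so row - 1 is never 0:

```r
tmp <- head(mtcars)
tmp[c(2, 5, 6), c(3, 4, 2)] <- NA

filled <- tmp
for (col in seq_along(filled)) {              # seq_along() of a data.frame runs over columns
  for (row in seq_len(nrow(filled))[-1]) {    # rows 2..n, so row - 1 always exists
    if (is.na(filled[row, col])) {
      # carry the previous row's value forward
      filled[row, col] <- filled[row - 1, col]
    }
  }
}
filled
```

Like fill and na.locf, this leaves an NA in the first row untouched, since there is no earlier value to carry forward.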

write r function to modify value in data frame

I have a set of variables, say Var1, Var2, ..., Varn. They all take three possible values: 0, 1, and 2. I want to replace every 2 with 1,
like so
df$Var1[df$Var1 >= 1] <- 1
This does the job. But when I try to write a function to do this
MakeBinary <- function(varName, dfName){dfName$varName[dfName$varName >= 1] <- 1}
and use this function like:
MakeBinary(Var2, df)
I got an error message:
Error in `$<-.data.frame`(`*tmp*`, "varName", value = numeric(0)) :
replacement has 0 rows, data has 512
I just want to know why I got this message. Thanks. My sample size is 512.
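$ does not evaluate its argument, so dfName$varName looks for a column literally named varName, which does not exist. If we are passing the column name as a string, use [[ instead of $, and return the modified dataset so the caller can assign it: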
MakeBinary <- function(varName, dfName){
dfName[[varName]][dfName[[varName]] >= 1] <- 1
dfName
}
MakeBinary("Var2", df)
example with mtcars
MakeBinary("carb", head(mtcars))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 1
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
An unquoted variable name can be passed as well, but it needs to be converted to a string with deparse(substitute()):
MakeBinary <- function(varName, dfName){
varName <- deparse(substitute(varName))
dfName[[varName]][dfName[[varName]] >= 1] <- 1
dfName
}
MakeBinary(Var2, df)
Using a reproducible example with mtcars
MakeBinary(carb, head(mtcars))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 1
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
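Since the question is about Var1 to Varn, here is a sketch of a variant that takes several column names at once. make_binary, its arguments and the toy data are made up for illustration. The function works on a copy, so the result has to be assigned back to df for the change to stick:

```r
# Hypothetical helper: recode every value >= 1 to 1 in the named columns
make_binary <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(v) {
    v[v >= 1] <- 1
    v
  })
  df
}

# Toy data standing in for the OP's 512-row data frame
df <- data.frame(Var1 = c(0, 1, 2), Var2 = c(2, 2, 0), Var3 = c(1, 0, 2))

# Assign the result back; calling make_binary() alone leaves df unchanged
df <- make_binary(df, c("Var1", "Var2"))
df
```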
