lapply and data.frame in R

I am trying to use R to accept as many user-selected input files as required and to make one histogram per file from the values stored in the 14th column. I have gotten this far:
library("tcltk")
library("grid")
File.names<-(tk_choose.files(default="", caption="Choose your files", multi=TRUE, filters=NULL, index=1))
Num.Files<-NROW(File.names)
test<-sapply(1:Num.Files,function(x){readLines(File.names[x])})
data<-read.table(header=TRUE,text=test[1])
names(data)[14]<-'column14'
dat <- list(file1 = data.frame("column14"),
file2 = data.frame("column14"),
file3 = data.frame("column14"),
file4 = data.frame("column14"))
#Where the error comes up
tmp <- lapply(dat, `[[`, 2)
lapply(tmp, function(x) {hist(x, probability=TRUE, main=paste("Histogram of Coverage")); invisible()})
layout(1)
My code fails, though, on the line tmp <- lapply(dat, `[[`, 2)
The error that comes up is one of two things. If the line reads as above then the error is this:
Error in .subset2(x, i, exact = exact) : subscript out of bounds
Calls: lapply -> FUN -> [[.data.frame -> <Anonymous>
I did some research and found that it could be caused by the double bracket `[[`, so I changed the line to tmp <- lapply(dat, `[`, 2) to see if it would help (as many tutorials said it might), but that just resulted in this error:
Error in `[.data.frame`(X[[1L]], ...) : undefined columns selected
Calls: lapply -> FUN -> [.data.frame
The input files all will follow this pattern:
Targ cov av_cov 87A_cvg 87Ag 87Agr 87Agr 87A_gra 87A%_1 87A%_3 87A%_5 87A%_10 87A%_20 87A%_30 87A%_40 87A%_50 87A%_75 87A%_100
1:028 400 0.42 400 0.42 1 1 2 41.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1:296 400 0.42 400 0.42 1 1 2 41.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Is this a common problem? Can anyone explain it to me? I am not too familiar with R but I hope to continue learning.
Thanks
EDIT:
For reproducibility, if I run:
head(test)
head(data)
x <- list(mtcars, mtcars, mtcars);lapply(x, head)
head(dat)
This is the result:
> head(test)
[,1]
[1,] "Targ cov av_cov 87A_cvg 87Ag 87Agr 87Agr 87A_gra 87A%_1 87A%_3 87A%_5 87A%_10 87A%_20 87A%_30 87A%_40\t87A%_50\t87A%_75\t87A%_100"
[2,] "1:028 400\t0.42\t400\t0.42\t1\t1\t2\t41.8\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0"
[3,] "1:296 400\t0.42\t400\t0.42\t1\t1\t2\t41.8\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0"
[4,] "1:453 1646\t8.11\t1646\t8.11\t7\t8\t13\t100.0\t100.0\t87.2\t32.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0"
[5,] "1:427 1646\t8.11\t1646\t8.11\t7\t8\t13\t100.0\t100.0\t87.2\t32.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0"
[6,] "1:736 5105\t29.68\t5105\t29.68\t14\t29\t48\t100.0\t100.0\t100.0\t86.0\t65.7\t49.4\t35.5\t16.9\t0.0\t0.0"
> head(data)
[1] Targ cov av_cov X87A_cvg X87Ag X87Agr X87Agr.1
[8] X87A_gra X87A._1 X87A._3 X87A._5 X87A._10 X87A._20 X87A._30
[15] X87A._40 X87A._50 X87A._75 X87A._100
<0 rows> (or 0-length row.names)
> x <- list(mtcars, mtcars, mtcars);lapply(x, head)
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
[[2]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
[[3]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
> head(dat)
$file1
X.column14.
1 column14
$file2
X.column14.
1 column14
$file3
X.column14.
1 column14
$file4
X.column14.
1 column14
> tmp <- lapply(dat, `[`, 2)
Error in `[.data.frame`(X[[1L]], ...) : undefined columns selected
Calls: lapply -> FUN -> [.data.frame
Execution halted

What are you trying to do here?
tmp <- lapply(dat, `[[`, 2)
That lapply call is equivalent to
list(file1=dat[[1]][[2]],
file2=dat[[2]][[2]],
file3=dat[[3]][[2]],
file4=dat[[4]][[2]])
This doesn't work. You're trying to extract column 2 from a data frame that has only 1 column, which is why the subscript is out of bounds.
Redefine dat so that each data frame has a second column, and the subscript becomes valid:
dat <- list(file1 = data.frame("column14","iforgotcolumn2"),
file2 = data.frame("column14","iforgotcolumn2"),
file3 = data.frame("column14","iforgotcolumn2"),
file4 = data.frame("column14","iforgotcolumn2"))
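Note that data.frame("column14") builds a data frame holding the literal string "column14", not the contents of a file, so hist() would still have nothing numeric to plot. Likewise, read.table(text = test[1]) only sees the header line, which is why head(data) shows 0 rows. A minimal sketch of reading each file into its own data frame and plotting column 14 might look like the following. The temporary files, the make_file helper and the random values are assumptions made up for illustration; in the real script file_names would come from tk_choose.files().

```r
# Hypothetical stand-in for the user's files: tab-separated, 18 numeric columns.
make_file <- function(seed) {
  set.seed(seed)
  df <- as.data.frame(matrix(runif(18 * 10), nrow = 10))
  path <- tempfile(fileext = ".txt")
  write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)
  path
}
file_names <- c(make_file(1), make_file(2))

# Read every file into its own data frame; dat is a list of data frames.
dat <- lapply(file_names, read.table, header = TRUE)
names(dat) <- basename(file_names)

# Extract column 14 from each data frame (valid because each has 18 columns).
col14 <- lapply(dat, `[[`, 14)

# Draw one histogram per file.
for (nm in names(col14)) {
  hist(col14[[nm]], probability = TRUE,
       main = paste("Histogram of Coverage:", nm))
}
```

Here `[[` with index 14 succeeds because every element of dat really is a data frame with at least 14 columns.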

Related

How to create function to use regular expressions to replace column names in a data frame?

I am feeling lost with how to create a helper function in R that takes the following 3 arguments:
a data frame,
a string pattern, and
a string "replacement pattern".
The function is supposed to replace occurrences of the string pattern in the names of the variables in the data frame with the replacement pattern.
Any guidance, tips or help would be greatly appreciated.

func <- function(x, nm1, nm2, ...) {
names(x) <- gsub(nm1, nm2, names(x), ...)
x
}
head(func(mtcars, "c", "C"))
# mpg Cyl disp hp drat wt qseC vs am gear Carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
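Because the pattern is passed straight to gsub(), it is treated as a regular expression unless told otherwise. The sketch below uses the same idea under a hypothetical name, rename_cols, to show an anchored pattern and the fixed = TRUE pass-through for literal matching. The small data frame is made up for illustration.

```r
# Same approach as func() above, with descriptive argument names.
rename_cols <- function(df, pattern, replacement, ...) {
  names(df) <- gsub(pattern, replacement, names(df), ...)
  df
}

# "^c" only matches a "c" at the start of a name, so qsec is left alone.
anchored <- names(rename_cols(mtcars, "^c", "C"))

# "." is a regex wildcard; fixed = TRUE makes gsub() treat it literally.
dotted <- data.frame(a.b = 1, a.c = 2)
literal <- names(rename_cols(dotted, ".", "_", fixed = TRUE))
wildcard <- names(rename_cols(dotted, ".", "_"))
```

Without fixed = TRUE every character of each name is replaced, which is rarely what is wanted when the pattern contains regex metacharacters.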

write r function to modify value in data frame

I have a set of variables, say Var1, Var2, ..., Varn. Each takes one of three possible values: 0, 1, or 2. I want to replace every 2 with 1,
like so
df$Var1[df$Var1 >= 1] <- 1
This does the job. But when I try to write a function to do this
MakeBinary <- function(varName, dfName){dfName$varName[dfName$varName >= 1] <- 1}
and use this function like:
MakeBinary(Var2, df)
I got an error message: Error in `$<-.data.frame`(`*tmp*`, "varName", value = numeric(0)) :
replacement has 0 rows, data has 512.
I just want to know why I got this message. Thanks. My sample size is 512.

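`$` does not evaluate its argument: dfName$varName looks for a column literally named "varName", finds none, and returns NULL, hence the 0-row replacement. If we pass the column name as a string, use [[ instead of $, and return the modified dataset from the function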
MakeBinary <- function(varName, dfName){
dfName[[varName]][dfName[[varName]] >= 1] <- 1
dfName
}
MakeBinary("Var2", df)
example with mtcars
MakeBinary("carb", head(mtcars))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 1
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
An unquoted variable name can be passed as well, but it needs to be converted to a string inside the function
MakeBinary <- function(varName, dfName){
varName <- deparse(substitute(varName))
dfName[[varName]][dfName[[varName]] >= 1] <- 1
dfName
}
MakeBinary(Var2, df)
Using a reproducible example with mtcars
MakeBinary(carb, head(mtcars))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 1
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
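Two details are easy to miss: `$` does not substitute the value of varName, and the function returns a modified copy, so the result has to be assigned back. The sketch below illustrates both on a small made-up data frame and then applies the same recoding to several columns at once. The names make_binary and vars are hypothetical, chosen only for this example.

```r
df <- data.frame(Var1 = c(0, 1, 2), Var2 = c(2, 0, 1))
varName <- "Var2"

# `$` looks for a column literally called "varName", which does not exist.
dollar_result <- df$varName      # NULL
bracket_result <- df[[varName]]  # the Var2 column

make_binary <- function(var_name, data) {
  data[[var_name]][data[[var_name]] >= 1] <- 1
  data
}

# The function works on a copy, so assign the result back to keep the change.
df <- make_binary("Var2", df)

# Recode several columns in one step.
vars <- c("Var1", "Var2")
df[vars] <- lapply(df[vars], function(x) ifelse(x >= 1, 1, x))
```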

write function to replace variable with itself plus 1% of its median

I'm new to writing functions in R, but want to write a function to add 1% of the median of a variable to itself, using dplyr, and replace the variable with this transformation.
x is a numeric variable.
add_median <- function(df, x) {
x <- enquo(x)
x <- quo_name(x)
mutate(x=x+.01*median(x, na.rm=T))
}
When I run newDF <- DF %>% add_median(variable_of_interest), I get the following error:
Error in 0.01 * median(x, na.rm = T) : non-numeric argument to binary operator
What am I doing wrong here?

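The original fails because quo_name() turns x into a character string, so x + .01 * median(x) becomes arithmetic on a string, and mutate() is never given df. We could instead forward the column with {{ }} and use the assignment operator := instead of = in mutate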
library(dplyr)
add_median <- function(df, x) {
df %>%
mutate({{x}} := {{x}} + .01 * median({{x}}, na.rm = TRUE))
}
If we need to change multiple columns, use mutate_at
add_median_multiple <- function(df, vec){
df %>%
mutate_at(vars(vec), ~ . + .01 * median(., na.rm = TRUE))
}
Testing:
data(mtcars)
head(mtcars) %>%
add_median(mpg)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.21 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.21 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 23.01 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.61 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.91 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.31 6 225 105 2.76 3.460 20.22 1 0 3 1
comparison with original 'mpg' column
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
add_median_multiple(head(mtcars), c('mpg', 'wt'))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.21 6 160 110 3.90 2.65045 16.46 0 1 4 4
#Mazda RX4 Wag 21.21 6 160 110 3.90 2.90545 17.02 0 1 4 4
#Datsun 710 23.01 4 108 93 3.85 2.35045 18.61 1 1 4 1
#Hornet 4 Drive 21.61 6 258 110 3.08 3.24545 19.44 1 0 3 1
#Hornet Sportabout 18.91 8 360 175 3.15 3.47045 17.02 0 0 3 2
#Valiant 18.31 6 225 105 2.76 3.49045 20.22 1 0 3 1
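In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A sketch of the multi-column version written that way could look like this. The name add_median_across is hypothetical, and all_of() is used on the assumption that the columns arrive as a character vector.

```r
library(dplyr)

# Add 1% of each selected column's median to that column.
add_median_across <- function(df, cols) {
  df %>%
    mutate(across(all_of(cols), ~ .x + 0.01 * median(.x, na.rm = TRUE)))
}

res <- add_median_across(head(mtcars), c("mpg", "wt"))
res
```

Columns not listed in cols are left untouched, as with the mutate_at() version.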

Convert List To Dataframe Using For Loop And Saving Under Different Names In R

I am trying to convert my list consisting of 52 components to a dataframe for each of the components.
Without a for loop it would look something like this, which is tedious:
df1 = as.data.frame(list[1])
df2 = as.data.frame(list[2])
df3 = as.data.frame(list[3])
.
.
.
df50 = as.data.frame(list[50])
How do I achieve this using the for loop? My attempt:
for (i in seq_along(list)) {
noquote(paste0("df", i)) = as.data.frame(list[i])
}
Error: target of assignment expands to non-language object
I think I'll have to involve assign.

If you have a list of data frames, you can name its elements and then use list2env to create them as separate data frames in the environment.
names(list) <- paste0('df', seq_along(list))
list2env(list, .GlobalEnv)
Using a reproducible example,
temp <- list(mtcars, mtcars)
names(temp) <- paste0('df', seq_along(temp))
list2env(temp, .GlobalEnv)
head(df1)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(df2)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
However, note that
list is a built-in function in R, so it is better to name your variables something else.
As @MrFlick suggested, try to keep your data in a list: lists are easier to manage than numerous separate objects in your global environment.
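To illustrate the last point, here is a rough sketch of working with the data frames while they stay in a named list. The list contents (mtcars and iris) and the names my_list, dfs and row_counts are assumptions for the example.

```r
my_list <- list(mtcars, iris)
names(my_list) <- paste0("df", seq_along(my_list))

# Convert every component in one step; the names carry over.
dfs <- lapply(my_list, as.data.frame)

# Individual data frames are still easy to reach by name.
first_df <- dfs$df1
second_df <- dfs[["df2"]]

# The same operation can be applied to all of them without a loop.
row_counts <- vapply(dfs, nrow, integer(1))
```

Nothing is written to the global environment besides the list itself, so there is no need for assign() or list2env().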

We can use assign instead of noquote from the OP's function
for (i in seq_along(list)) {
assign(paste0("df", i), value = list[[i]])
}

Opposite function to add_rownames in dplyr

As an intermediate step I generate a data frame with one column as character strings and the rest are numbers. I'd like to convert it to a matrix, but first I have to convert that character column into row names and remove it from the data frame.
Is there a simple way to do this in dplyr? A function like to_rownames() that is the opposite of add_rownames()?
I saw a solution using a custom function, but that really goes against the dplyr philosophy.

You can now use the tibble-package:
tibble::column_to_rownames()
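A short usage sketch, assuming the tibble package is installed. The data frame and its id column are made up for illustration.

```r
library(tibble)

df <- data.frame(id = c("a", "b", "c"), x = 1:3, y = c(4.5, 5.5, 6.5))

# Move the id column into the row names, then convert to a numeric matrix.
m <- as.matrix(column_to_rownames(df, var = "id"))
m
```

column_to_rownames() drops the id column after using it, so the resulting matrix holds only the numeric columns.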

This provides NSE & standard eval functions:
library(dplyr)
df <- data_frame(a=sample(letters, 4), b=c(1:4), c=c(5:8))
reset_rownames <- function(df, col="rowname") {
stopifnot(is.data.frame(df))
col <- as.character(substitute(col))
reset_rownames_(df, col)
}
reset_rownames_ <- function(df, col="rowname") {
stopifnot(is.data.frame(df))
nm <- data.frame(df)[, col]
df <- df[, !(colnames(df) %in% col)]
rownames(df) <- nm
df
}
m <- "rowname"
head(as.matrix(reset_rownames(add_rownames(mtcars), "rowname")))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(as.matrix(reset_rownames_(add_rownames(mtcars), m)))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Perhaps to_rownames() or set_rownames() makes more sense. ¯\_(ツ)_/¯ YMMV.

If you really need a matrix you can just save the character column to a separate variable, drop it, and then create the matrix
library(dplyr)
df <- data_frame(a = sample(letters, 4), b = c(1:4), c = c(5:8))
letters <- df %>% select(a)
a.matrix <- df %>% select(-a) %>% as.matrix
Not sure what you are going to do after that, but this gets you as far as you asked for...
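If the row labels should end up on the matrix itself, the same idea can be sketched in base R without dplyr. The data frame and the name row_labels are assumptions for the example; the name avoids overwriting the built-in letters vector.

```r
df <- data.frame(a = c("w", "x", "y", "z"), b = 1:4, c = 5:8)

# Keep the character column aside, drop it, then build the matrix.
row_labels <- df$a
m <- as.matrix(df[, setdiff(names(df), "a")])

# Attach the saved labels as row names.
rownames(m) <- row_labels
```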
