I have an R command like this:
subset(
(aggregate(cbind(var1,var2)~Ei+Mi+hours,a, FUN=mean)),
(aggregate(cbind(var1,var2)~Ei+Mi+hours,a, FUN=mean))$Ei %in% 1:EXP
)
I want to do
1) Ask the user to input the var1 and var2
2) Get those variables into the subset command line as shown above and
continue with other things.
Note: to read the user input, I have the variables
c(ax,bx,cx,dx,ex,fx,gx,hx,ix,jx,kx,lx,mx,nx,ox), each mapped to a
number from 1 to 15. I think the best method is to display this list,
ask the user to select a number between 1 and 15, look up the
variable that corresponds to the entered number, and pass it into
the command above.
So how can I implement this?
Regarding the answer:
One more scenario: what if the user wants to enter multiple numbers in one go (e.g. 1,2,3)? How can I read them with readline, as in the answer below, using
v1 <- quote(var1 <- as.numeric(readline('Enter Variable 1: ')))
eval(v1)
xx <- paste0(letters[1:15], 'x')
xx[var1]
How to read multiple variables in this case?
Here's a rough example of the readline interactive prompt. When v1 is evaluated, the user will be prompted to enter a value. That value is then stored as var1.
> v1 <- quote(var1 <- as.numeric(readline('Enter Variable 1: ')))
> eval(v1)
Enter Variable 1: 1000 ## user enters 1000, for example
> 100 + var1 + 50 ## example to show captured output as object
## [1] 1150
So in your case it might go something like
> v1 <- quote(var1 <- as.numeric(readline('Enter a number from 1 to 15: ')))
> eval(v1)
Enter a number from 1 to 15: 7
> var1
## [1] 7
> xx <- paste0(letters[1:15], 'x')
> xx
## [1] "ax" "bx" "cx" "dx" "ex" "fx" "gx" "hx" "ix" "jx" "kx" "lx" "mx" "nx" "ox"
> xx[var1]
## [1] "gx"
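The lookup above trusts whatever the user types. As a minimal sketch, assuming you want out-of-range or non-numeric input rejected, the mapping can be wrapped in a small helper. The name choose_var is hypothetical and not part of the original answer; it takes the typed text as a string so it can also be tried without a prompt.

```r
# Hypothetical helper (not from the original answer): map the text a user
# typed to one of the variable names, rejecting anything outside the range.
choose_var <- function(input, choices) {
  idx <- suppressWarnings(as.integer(input))  # NA for non-numeric text
  if (is.na(idx) || idx < 1 || idx > length(choices)) {
    stop("Please enter a number from 1 to ", length(choices))
  }
  choices[idx]
}

xx <- paste0(letters[1:15], "x")

# In an interactive session the text would come from readline()
if (interactive()) {
  var1 <- choose_var(readline("Enter a number from 1 to 15: "), xx)
}

# Non-interactive check with a fixed string
choose_var("7", xx)
```

Keeping the prompt outside the helper means the validation can be exercised in a script, while readline() is only called when a user is actually present.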
I borrowed this idea for a function from this older SO post. You can return the output invisibly and it will still take in the user values.
input.fun <- function(){
v1 <- readline("var1: ")
v2 <- readline("var2: ")
v3 <- readline("var3: ")
v4 <- readline("var4: ")
v5 <- readline("var5: ")
out <- sapply(c(v1, v2, v3, v4, v5), as.numeric, USE.NAMES = FALSE)
invisible(out)
}
> x <- input.fun()
var1: 7
var2: 4
var3: 8
var4: 5
var5: 2
> x
[1] 7 4 8 5 2
In response to your edit: I'm not sure if this is the standard method for reading multiple numbers in one line, but it works.
> xx <- readline('Enter numbers separated by a space: ')
Enter numbers separated by a space: 4 12 67 9 2
> as.numeric(strsplit(xx, ' ')[[1]])
## [1] 4 12 67 9 2
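Since the question's example used commas (1,2,3), here is a small sketch that accepts commas, spaces, or a mix of both. The function name parse_numbers is an assumption made for illustration; in an interactive session its argument would be the string returned by readline().

```r
# Hypothetical helper: split one line of input on commas and/or whitespace
# and convert the pieces to numbers.
parse_numbers <- function(x) {
  pieces <- strsplit(trimws(x), "[,[:space:]]+")[[1]]
  as.numeric(pieces)
}

parse_numbers("1,2,3")
parse_numbers("4 12 67 9 2")
parse_numbers("1, 2,3")
```

The resulting numeric vector can then be used as an index, for example xx[parse_numbers("1,2,3")].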
Here's a possibility using scan()
#sample data
df<-data.frame(
ax=runif(50),
bx=runif(50),
cx=runif(50),
dx=runif(50),
Ei=sample(letters[1:5], 50, replace=T)
)
#get vars
vars<-c(NA,NA)
while(any(is.na(vars))) {
cat(paste("enter var number", sum(!is.na(vars))+1),"\n")
cat(paste(seq_along(names(df)), ":", names(df)), sep="\n")
try(n<-scan(what=integer(), nmax=1), silent=T)
vars[min(which(is.na(vars)))]<-n
}
#--pause
#use vars
subset(aggregate(df[,vars], df[,c("Ei"), drop=F], FUN=mean), Ei=="a")
It's not super robust, but if you copy the first half (before the pause) it will ask you for two variable numbers, and then if you run the second half, it will use those two values. I've adjusted the aggregate and subset to be more appropriate for variable usage which means not using the formula syntax.
I did not do any error checking. That's left as an exercise for the asker.
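If you would rather keep the formula syntax from the question, one possible approach is to build the formula as text from the selected column names. This is only a sketch: the data are made up, and vars is hard-coded here as a stand-in for the numbers collected from the user.

```r
# Made-up sample data; Ei is deterministic so that group "a" always exists
set.seed(1)
df <- data.frame(
  ax = runif(50),
  bx = runif(50),
  cx = runif(50),
  Ei = rep(letters[1:5], each = 10)
)

vars <- c(1, 3)                # stand-in for the user's two selections
var_names <- names(df)[vars]   # "ax" "cx"

# Build cbind(ax, cx) ~ Ei from the chosen names
f <- as.formula(sprintf("cbind(%s) ~ Ei", paste(var_names, collapse = ", ")))

res <- subset(aggregate(f, data = df, FUN = mean), Ei == "a")
res
```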
I have a list of data.frames. I want to send each data.frame to a function using lapply. Inside the function I want to check whether the name of a data.frame includes a particular string. If the string in question is present I want to perform one series of operations. Otherwise I want to perform a different series of operations. I cannot figure out how to check whether the string in question is present from within the function.
I wish to use base R. This seems to be a possible solution but I cannot get it to work:
In R, how to get an object's name after it is sent to a function?
Here is an example list followed by an example function further below.
matrix.apple1 <- read.table(text = '
X3 X4 X5
1 1 1
1 1 1
', header = TRUE)
matrix.apple2 <- read.table(text = '
X3 X4 X5
1 1 1
2 2 2
', header = TRUE)
matrix.orange1 <- read.table(text = '
X3 X4 X5
10 10 10
20 20 20
', header = TRUE)
my.list <- list(matrix.apple1 = matrix.apple1,
matrix.orange1 = matrix.orange1,
matrix.apple2 = matrix.apple2)
This operation can check whether each object name contains the string apple,
but I am not sure how to use this information inside the function further below.
grepl('apple', names(my.list), fixed = TRUE)
#[1] TRUE FALSE TRUE
Here is an example function. Based on hours of searching and trial-and-error I perhaps am supposed to use deparse(substitute(x)) but so far it only returns x or something similar.
table.function <- function(x) {
# The three object names are:
# 'matrix.apple1', 'matrix.orange1' and 'matrix.apple2'
myObjectName <- deparse(substitute(x))
print(myObjectName)
# perform a trivial example operation on a data.frame
my.table <- table(as.matrix(x))
# Test whether an object name contains the string 'apple'
contains.apple <- grep('apple', myObjectName, fixed = TRUE)
# Use the result of the above test to perform a trivial example operation.
# With my code 'my.binomial' is always given the value of 0 even though
# 'apple' appears in the name of two of the data.frames.
my.binomial <- ifelse(contains.apple == 1, 1, 0)
return(list(my.table = my.table, my.binomial = my.binomial))
}
table.function.output <- lapply(my.list, function(x) table.function(x))
These are the results of print(myObjectName):
#[1] "x"
#[1] "x"
#[1] "x"
table.function.output
Here are the rest of the results of table.function, showing that my.binomial is never 1 (it comes back as logical(0)).
The first and third value of my.binomial should be 1 because the names of the first and third data.frames contain the string apple.
# $matrix.apple1
# $matrix.apple1$my.table
# 1
# 6
# $matrix.apple1$my.binomial
# logical(0)
#
# $matrix.orange1
# $matrix.orange1$my.table
# 10 20
# 3 3
# $matrix.orange1$my.binomial
# logical(0)
#
# $matrix.apple2
# $matrix.apple2$my.table
# 1 2
# 3 3
# $matrix.apple2$my.binomial
# logical(0)
You could redesign your function to use the list names instead:
table_function <- function(myObjectName) {
# The three object names are:
# 'matrix.apple1', 'matrix.orange1' and 'matrix.apple2'
myObject <- get(myObjectName)
print(myObjectName)
# perform a trivial example operation on a data.frame
my.table <- table(as.matrix(myObject))
# Test whether an object name contains the string 'apple'
contains.apple <- grep('apple', myObjectName, fixed = TRUE)
# Use the result of the above test to perform a trivial example operation.
# grep() returns integer(0) when there is no match, so 'my.binomial' is
# integer(0) rather than 0 for 'matrix.orange1'.
my.binomial <- +(contains.apple == 1)
return(list(my.table = my.table, my.binomial = my.binomial))
}
lapply(names(my.list), table_function)
This returns
[[1]]
[[1]]$my.table
1
6
[[1]]$my.binomial
[1] 1
[[2]]
[[2]]$my.table
10 20
3 3
[[2]]$my.binomial
integer(0)
[[3]]
[[3]]$my.table
1 2
3 3
[[3]]$my.binomial
[1] 1
If you want to keep the list names, you could use
sapply(names(my.list), table_function, simplify = FALSE, USE.NAMES = TRUE)
instead of lapply.
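Two details of the version above may be worth adjusting: get() looks the object up by name in the calling environments rather than in the list, and grep() yields integer(0) instead of 0 when there is no match. A possible variant, sketched here with a hypothetical function name table_by_name and the example data rebuilt inline, indexes the list directly and uses grepl():

```r
# Example data rebuilt inline so the sketch is self-contained
my.list <- list(
  matrix.apple1  = data.frame(X3 = c(1, 1),   X4 = c(1, 1),   X5 = c(1, 1)),
  matrix.orange1 = data.frame(X3 = c(10, 20), X4 = c(10, 20), X5 = c(10, 20)),
  matrix.apple2  = data.frame(X3 = c(1, 2),   X4 = c(1, 2),   X5 = c(1, 2))
)

# Hypothetical variant: take the element from the list by name and
# use grepl(), which returns TRUE/FALSE even when there is no match
table_by_name <- function(name, lst) {
  x <- lst[[name]]
  list(
    my.table    = table(as.matrix(x)),
    my.binomial = as.integer(grepl("apple", name, fixed = TRUE))
  )
}

out <- sapply(names(my.list), table_by_name, lst = my.list,
              simplify = FALSE, USE.NAMES = TRUE)

# One 0/1 flag per data frame, named after the list elements
vapply(out, function(r) r$my.binomial, integer(1))
```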
Use Map and pass both the list data and its name to the function. Change your function to accept two arguments.
table.function <- function(data, name) {
# The three object names are:
# 'matrix.apple1', 'matrix.orange1' and 'matrix.apple2'
print(name)
# perform a trivial example operation on a data.frame
my.table <- table(as.matrix(data))
# Test whether an object name contains the string 'apple'
contains.apple <- grep('apple', name, fixed = TRUE)
# Use the result of the above test to perform a trivial example operation.
# grep() returns integer(0) when there is no match, so 'my.binomial' is
# integer(0) rather than 0 for 'matrix.orange1'.
my.binomial <- as.integer(contains.apple == 1)
return(list(my.table = my.table, my.binomial = my.binomial))
}
Map(table.function, my.list, names(my.list))
#[1] "matrix.apple1"
#[1] "matrix.orange1"
#[1] "matrix.apple2"
#$matrix.apple1
#$matrix.apple1$my.table
#1
#6
#$matrix.apple1$my.binomial
#[1] 1
#$matrix.orange1
#$matrix.orange1$my.table
#10 20
# 3 3
#$matrix.orange1$my.binomial
#integer(0)
#...
#...
The same functionality is provided by imap in purrr where you don't need to explicitly pass the names.
purrr::imap(my.list, table.function)
How can the CJ-command be run with string as input? The following MNWE illustrates what is needed:
library(data.table)
# This is the desired output (when needed.cols==2)
dt.wanted <- CJ(X.1=c(1L, 2L), X.2=c(1L, 2L))
# Here is an example with needed.cols as variable
needed.cols <- 2L
use.text <- paste0("X.", 1L:needed.cols, "=c(1L, 2L)", collapse=", ")
# Here are some failing attempts
dt.fail <- CJ(use.text)
dt.fail <- CJ(eval(use.text))
dt.fail <- CJ(get(use.text))
So it is the use.text I want to make scriptable (because it varies, not only with needed.cols).
IIUC, you are looking for a function to pass a list of arguments into ... of a function. You can do it using do.call as follows:
do.call(CJ, eval(parse(text=paste0("list(",use.text,")"))))
Hope that is what you are looking for...
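If the column names and values can be generated programmatically, the argument list can also be built directly, which avoids parsing text. The sketch below assumes every column takes the same values c(1L, 2L), as in the question; the CJ call is guarded so the snippet still runs where data.table is not installed.

```r
needed.cols <- 2L

# Named list of arguments: X.1 = c(1L, 2L), X.2 = c(1L, 2L), ...
args <- setNames(
  rep(list(c(1L, 2L)), needed.cols),
  paste0("X.", seq_len(needed.cols))
)

# Pass the list as the arguments of CJ
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- do.call(data.table::CJ, args)
  print(dt)
}
```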
The get-function is the standard way of promoting a character value to a true R name value.
Is this what you want:
col.wanted <- 2
dt.wanted[ , get(paste0("X.", col.wanted) )]
#[1] 1 2 1 2
Getting multiple columns based on evaluation of a more complex expression might require somewhat more baroque efforts:
> use.text <- paste0("list(", paste0("X.", 1L:needed.cols, collapse=", "),")")
> use.text
[1] "list(X.1, X.2)"
> dt.wanted[ , eval(use.text)]
[1] "list(X.1, X.2)"
> dt.wanted[ , parse(text=use.text)]
expression(list(X.1, X.2))
> dt.wanted[ , eval(parse(text=use.text))]
X.1 X.2
1: 1 1
2: 1 2
3: 2 1
4: 2 2
I've figured out that if I use as.character(df[x,y]), or as.<whatever>(df[x,y]), I can coerce what I need from my data frames every time.
What I can't seem to figure out is why. Details below.
When I access df[1,1] (or anything in column 1) I get
df[1,1]
[1] a
Levels: a b c
but when I access 1,3 it works fine
> df[1,3]
[1] 10
but then when I use as.character() it works.
> as.character(df[1,1])
[1] "a"
The data frame was built using this line
df = data.frame(names = c("a","b","c"), size = c(1,2,3),num = c(10,20,30) )
> df
names size num
1 a 1 10
2 b 2 20
3 c 3 30
But in this data frame
imp2met = read.csv('tomet.csv', header = TRUE, sep=",",dec='.')
> imp2met
unit mult ret
1 (yd) 0.9100 (m)
2 (in) 2.5200 (cm)
3 .....
I get these results for 1,3
> imp2met[1,3]
[1] (m)
Levels: (c) (cm) (cm^2) ....
>
> as.character(imp2met[1,3])
[1] "(m)"
So why the "random" results? Why do I need as.<whatever>() but only some of the time?
In R versions before 4.0.0, data.frame() and read.csv() convert character vectors to factors by default. You can change this with the argument stringsAsFactors=FALSE.
Also, when you subset a data frame using [, you can add the drop=FALSE argument to keep the result as a data frame instead of having it simplified to a vector.
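A short sketch of the difference, setting stringsAsFactors explicitly so that the result does not depend on the R version in use. The data frame mirrors the one in the question with the size column left out.

```r
# Same data built twice: once with factors, once with plain character strings
df_factor <- data.frame(names = c("a", "b", "c"), num = c(10, 20, 30),
                        stringsAsFactors = TRUE)
df_char   <- data.frame(names = c("a", "b", "c"), num = c(10, 20, 30),
                        stringsAsFactors = FALSE)

class(df_factor[1, 1])              # factor: prints with "Levels:"
class(df_char[1, 1])                # character: prints as "a"
class(df_char[1, 1, drop = FALSE])  # data.frame: a 1 x 1 data frame
```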
In R I'd like to take a collection of file names in the format below and return the number to the right of the second underscore (this will always be a number) and the text string to the right of the third underscore (this will be combinations of letters and numbers).
I have file names in this format:
HELP_PLEASE_4_ME
I want to extract the number 4 and the text ME
I'd then like to create a new field within my data frame where these two types of data can be stored. Any suggestions?
Here is an option using regexec and regmatches to pull out the patterns:
matches <- regmatches(df$a, regexec("^.*?_.*?_([0-9]+)_([[:alnum:]]+)$", df$a))
df[c("match.1", "match.2")] <- t(sapply(matches, `[`, -1)) # first result for each match is full regular expression so need to drop that.
Produces:
a match.1 match.2
1 HELP_PLEASE_4_ME 4 ME
2 SOS_WOW_3_Y34OU 3 Y34OU
This will break if any rows don't have the expected structure, but I think that is what you want to happen (i.e. be alerted that your data is not what you think it is). strsplit based approaches will require additional checking to ensure that your data is what you think it is.
And the data:
df <- data.frame(a=c("HELP_PLEASE_4_ME", "SOS_WOW_3_Y34OU"), stringsAsFactors=F)
The obligatory stringr version of @BrodieG's quite spiffy answer:
library(stringr)
df[c("match.1", "match.2")] <-
t(sapply(str_match_all(df$a, "^.*?_.*?_([0-9]+)_([[:alnum:]]+)$"), "[", 2:3))
Put here for context only. You should accept BrodieG's answer.
Since you already know that you want the text that comes after the second and third underscore, you could use strsplit and take the third and fourth result.
> x <- "HELP_PLEASE_4_ME"
> spl <- unlist(strsplit(x, "_"))[3:4]
> data.frame(string = x, under2 = spl[1], under3 = spl[2])
## string under2 under3
## 1 HELP_PLEASE_4_ME 4 ME
Then for longer vectors, you could do something like the last two lines here.
## set up some data
> word1 <- c("HELLO", "GOODBYE", "HI", "BYE")
> word2 <- c("ONE", "TWO", "THREE", "FOUR")
> nums <- 20:23
> word3 <- c("ME", "YOU", "THEM", "US")
> XX <-paste0(word1, "_", word2, "_", nums, "_", word3)
> XX
## [1] "HELLO_ONE_20_ME" "GOODBYE_TWO_21_YOU"
## [3] "HI_THREE_22_THEM" "BYE_FOUR_23_US"
## ------------------------------------------------
## process it
> spl <- do.call(rbind, strsplit(XX, "_"))[, 3:4]
> data.frame(cbind(XX, spl))
## XX V2 V3
## 1 HELLO_ONE_20_ME 20 ME
## 2 GOODBYE_TWO_21_YOU 21 YOU
## 3 HI_THREE_22_THEM 22 THEM
## 4 BYE_FOUR_23_US 23 US
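As mentioned in another answer, strsplit needs extra checking to make sure every name has the expected structure. A minimal sketch of such a check, using assumed column names (file, number, label) and converting the extracted number to an integer:

```r
files <- c("HELP_PLEASE_4_ME", "SOS_WOW_3_Y34OU")

parts <- strsplit(files, "_", fixed = TRUE)

# Fail early if any name does not have exactly four underscore-separated parts
stopifnot(all(lengths(parts) == 4))

mat <- do.call(rbind, parts)
result <- data.frame(
  file   = files,
  number = as.integer(mat[, 3]),  # piece after the second underscore
  label  = mat[, 4],              # piece after the third underscore
  stringsAsFactors = FALSE
)
result
```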
I am trying to figure out why the rbind function is not working as intended when joining data.frames without names.
Here is my testing:
test <- data.frame(
id=rep(c("a","b"),each=3),
time=rep(1:3,2),
black=1:6,
white=1:6,
stringsAsFactors=FALSE
)
# take some subsets with different names
pt1 <- test[,c(1,2,3)]
pt2 <- test[,c(1,2,4)]
# method 1 - rename to same names - works
names(pt2) <- names(pt1)
rbind(pt1,pt2)
# method 2 - works - even with duplicate names
names(pt1) <- letters[c(1,1,1)]
names(pt2) <- letters[c(1,1,1)]
rbind(pt1,pt2)
# method 3 - works - with a vector of NA's as names
names(pt1) <- rep(NA,ncol(pt1))
names(pt2) <- rep(NA,ncol(pt2))
rbind(pt1,pt2)
# method 4 - but... does not work without names at all?
pt1 <- unname(pt1)
pt2 <- unname(pt2)
rbind(pt1,pt2)
This seems a bit odd to me. Am I missing a good reason why this shouldn't work out of the box?
edit for additional info
Using @JoshO'Brien's suggestion to debug, I can identify the error as occurring during this if statement part of the rbind.data.frame function
if (is.null(pi) || is.na(jj <- pi[[j]]))
(online version of code here: http://svn.r-project.org/R/trunk/src/library/base/R/dataframe.R starting at: "### Here are the methods for rbind and cbind.")
From stepping through the program, the value of pi does not appear to have been set at this point, hence the program tries to index the built-in constant pi like pi[[3]] and errors out.
From what I can figure, the internal pi object doesn't appear to be set due to this earlier line where clabs has been initialized as NULL:
if (is.null(clabs)) clabs <- names(xi) else { #pi gets set here
I am in a tangle trying to figure this out, but will update as it comes together.
Because unname() and explicitly assigning NA as column headers are not identical actions. unname() removes the names entirely, so names() returns NULL, whereas assigning NA leaves a vector of NA names in place. rbind() relies on the names/colnames of the data frames, so it can bind two data frames whose column names are all NA, but it fails when there are no names at all.
Here is some code to help see what I mean:
> c1 <- c(1,2,3)
> c2 <- c('A','B','C')
> df1 <- data.frame(c1,c2)
> df1
c1 c2
1 1 A
2 2 B
3 3 C
> df2 <- data.frame(c1,c2) # df1 & df2 are identical
>
> #Let's perform unname on one data frame &
> #replacement with NA on the other
>
> unname(df1)
NA NA
1 1 A
2 2 B
3 3 C
> tem1 <- names(unname(df1))
> tem1
NULL
>
> #Please note above that the column headers though showing as NA are null
>
> names(df2) <- rep(NA,ncol(df2))
> df2
NA NA
1 1 A
2 2 B
3 3 C
> tem2 <- names(df2)
> tem2
[1] NA NA
>
> #Though unname(df1) & df2 look identical, they aren't
> #Also note difference in tem1 & tem2
>
> identical(unname(df1),df2)
[1] FALSE
>
I hope this helps. The names show up as NA each, but the two operations are different.
Hence, two data frames with their column headers replaced to NA can be "rbound" but two data frames without any column headers (achieved using unname()) cannot.
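If the goal is simply to stack two unnamed data frames, one possible workaround is to assign temporary names, bind, and then strip the names again. This is a sketch under the assumption that both data frames have the same number of columns in the same order; the names V1, V2, ... are placeholders.

```r
pt1 <- unname(data.frame(id = c("a", "b"), value = 1:2, stringsAsFactors = FALSE))
pt2 <- unname(data.frame(id = c("c", "d"), value = 3:4, stringsAsFactors = FALSE))

# Temporary placeholder names so that rbind() has something to match on
tmp_names <- paste0("V", seq_len(ncol(pt1)))

bound <- rbind(setNames(pt1, tmp_names), setNames(pt2, tmp_names))

# Drop the placeholder names again if an unnamed result is wanted
result <- unname(bound)
names(result)
```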