Calculate length of each object in R - r

I would like to calculate the length of many objects in R and return those objects with the name-prefix 'length_'. However, when I type this code:
A <- c('A', 'B', '3')
B <- c('A', '2')
files <- ls()
for (i in 1:length(files)) assign(paste("length_",files[i], sep = ""), length(unlist(files[i])))
This returns the vectors length_A and length_B, but each with the value 1 and not 3 and 2.
Thank you for any help,
Paul
p.s. I actually would like to apply this to a different function instead of length (GC.content from package ape to calculate GC content of DNA-sequences), but with that function I have the same problem as with the abovementioned example.

In R 3.2.0, the lengths function was introduced, which calculates the length of each item of a list. Using this function, as @docendo-discimus notes in the comments above, a very compact (and R-like) solution is
lengths(mget(ls()))
which returns a named vector
A B
3 2
mget returns a named list of the objects whose names it is given (here, everything returned by ls()); think of it as a "multiple get."
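Since the question mentions applying a function other than length, the same mget idea can be combined with sapply. The following is only a sketch: n_letters is a made-up stand-in for whatever function is actually needed, and the object names are passed explicitly instead of using ls() so that unrelated objects in the workspace are not picked up.

```r
A <- c('A', 'B', '3')
B <- c('A', '2')

# Hypothetical stand-in for the real function of interest:
# counts how many elements are upper-case letters.
n_letters <- function(x) sum(x %in% LETTERS)

# Fetch the objects by name, apply the function to each one,
# then add the "length_" style prefix to the result names.
res <- sapply(mget(c("A", "B")), n_letters)
names(res) <- paste0("length_", names(res))
res
```

This keeps the results together in one named vector rather than creating a separate variable per object.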

A <- c('A', 'B', '3')
B <- c('A', '2')
files <- ls()
for (i in 1:length(files)) assign(paste("length_",files[i], sep = ""), length(get(files[i])))
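This creates length_A with value 3 and length_B with value 2. The original code failed because files[i] is just the object's name (a character string of length 1); get() looks up the object that the name refers to.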

A <- c('A', 'B', '3')
B <- c('A', '2')
files <- list(A,B)
sapply(files,length)
This will give you the answer, but I don't know if it's what you want.

Related

Generalizable function to select and filter dataframe r - using shiny input

I am building a shiny app. The user will need to be able to reduce the data by selecting variables and filtering on specific values for those variables. I am stuck trying to get a generalizable function that can work based on all possible selections.
Here is an example - I skip the shiny code because I think the problem is with the function:
#sample dataframe
df <- data.frame('date' = c(1, 2, 3, 2, 2, 3, 1),
'time' = c('a', 'b', 'c', 'e', 'b', 'a', 'e'),
'place' = c('A', 'A', 'A', 'H', 'A', 'H', 'H'),
'result' = c('W', 'W', 'L', 'W', 'W', 'L', 'L'))
If the user selected date and result for the date values 1, 2; and the result values W, I would do the following:
out <- df %>%
select(date, result) %>%
filter(date %in% c(1,2)) %>%
filter(result %in% c('W'))
The challenge I am having is that the user can select any unique combination of variables and values. Using the input$ values from my shiny app, I can get the selected variables into a vector and the selected values into a list of values, positionally matching the selected variables. For example:
selected_variables <- c('date', 'result')
selected_values <- list(c(1,2), c('W'))
What I think I then need is a generalizable function that will match up the filter calls with the correct variables. Something like:
#function that takes data frame, vector of selected variables, list of vectors of chosen values for each variable
#Returns a reduced table of selected variables, filtered values
table_reducer <- function(df, select_var, filter_values) {
#select the variables
out <- df %>%
select(select_var)
#now filter each variable by the values contained in the list
out <- [for loop that iterates over select_var, filter_values, filtering accordingly]
out #return out
}
My thinking would be to use an equivalent of Python's zip, but all my searching on that just points me to mapply, and I can't see how to use that within the for loop (which I also know is not always approved of in R, but I am talking about a relatively small number of iterations). If there is a better solution to this I would welcome it.
Here's a 1-liner table_reducer function in base R -
table_reducer <- function(df, select_var, filter_values) {
subset(df, Reduce(`&`, Map(`%in%`, df[select_var], filter_values)))
}
selected_variables <- c('date', 'result')
selected_values <- list(c(1,2), c('W'))
table_reducer(df, selected_variables, selected_values)
# date time place result
#1 1 a A W
#2 2 b A W
#4 2 e H W
#5 2 b A W
Map is a wrapper over mapply so you were right in thinking that you should use mapply for this task. This answer is also free of dreaded for loops.
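Note that subset() as used above keeps every column of df. If the result should contain only the selected columns, as in the dplyr pipeline from the question, a loop-based variant could look like the sketch below. The name table_reducer_loop is made up for this illustration, and it assumes select_var and filter_values are matched by position.

```r
# Sample data from the question
df <- data.frame('date' = c(1, 2, 3, 2, 2, 3, 1),
                 'time' = c('a', 'b', 'c', 'e', 'b', 'a', 'e'),
                 'place' = c('A', 'A', 'A', 'H', 'A', 'H', 'H'),
                 'result' = c('W', 'W', 'L', 'W', 'W', 'L', 'L'))

# Hypothetical loop-based variant: select first, then filter
# one variable at a time using its matching vector of values.
table_reducer_loop <- function(df, select_var, filter_values) {
  out <- df[, select_var, drop = FALSE]
  for (i in seq_along(select_var)) {
    keep <- out[[select_var[i]]] %in% filter_values[[i]]
    out <- out[keep, , drop = FALSE]
  }
  out
}

reduced <- table_reducer_loop(df, c('date', 'result'), list(c(1, 2), c('W')))
reduced
```

seq_along() plays the role of Python's zip here: the same index i picks the variable name and its allowed values.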

Using similar variable names in R, split/subset a large dataframe into multiple smaller ones

I have a dataset with more than 300 variables in the following manner:
create example data:
id <- c('a','b','c', 'd', 'e', 'f')
type <- c(1,2,3,1,2,3)
x_97 <- c(1,2,3,4,5,6)
y_97 <- c('q','w','r','t', 'y', 'i')
z_97 <- c(80,90,70,50,60,40)
x_98 <- c(7,8,9,4,5,6)
y_98 <- c('y', 'i', 'r','t','q','w')
x_99 <- c(4,5,5,6,1,2)
z_99 <- c(20,10,40,50,20,50)
w_99 <- c(8,9,7,4,5,NA)
my.data <- data.frame(id, type, x_97, y_97, z_97, x_98, y_98, x_99, z_99)
Please note: _97, _98, _99 are years 1997, 1998 and 1999.
expected outcome:
I want to split this big data frame into 3 smaller data frames by year on the basis of id and type.
initial thoughts:
I am creating a list:
my.list <- c("_97", "_98", "_99")
And now I want to write something like this:
newdata97 <- subset(my.data, all variables with the 1st object of my.list)
newdata98 <- subset(my.data, all variables with the 2nd object of my.list)
and so on.
question
I am not sure how to achieve the newdata frames as above. Can anyone please help?
Moreover, I think there must be a more elegant solution to this with something from apply family. Any idea?
Thank you very much for your help.
We can loop through 'my.list', use grep to find the column names that match each substring in 'my.list', and cbind them with the first two columns to create a list of data.frames
lst1 <- lapply(my.list, function(x) cbind(my.data[1:2],
my.data[grep(x, names(my.data))]))
If any of the columns 'x', 'y', 'z' is missing for a given year, we can add it and fill it with NA
lst1 <- lapply(lst1, function(x) {nm1 <- setdiff(paste0(c('x', 'y',
'z'), substring(names(x)[3], 2)), names(x)[-(1:2)]); x[nm1] <- NA; x})
Or, instead of creating the columns later, create the NA columns in 'my.data' first
my.data[setdiff(paste0(rep(c("x_", "y_", "z_"), each = 3),
97:99), names(my.data)[-(1:2)])] <- NA
and then use grep as above to create a list of data.frames
Another option is to split based on the substring of the column names
lst1 <- lapply(split.default(my.data[-(1:2)],
sub(".*_", "", names(my.data)[-(1:2)])), function(x) cbind(my.data[1:2], x))
It is better to keep it as a list, but if we need individual data.frames in the global env, then name the list elements and use list2env (not recommended though)
names(lst1) <- paste0("newdata", substring(my.list, 2))
list2env(lst1, envir = .GlobalEnv)
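To illustrate why keeping the result as a list is usually enough, here is a small self-contained sketch of the grep approach on a cut-down version of the data. The names small.data, suffixes and by_year are made up for this example.

```r
# Cut-down, hypothetical version of the example data
small.data <- data.frame(id = c('a', 'b', 'c'),
                         type = c(1, 2, 3),
                         x_97 = c(1, 2, 3),
                         y_97 = c('q', 'w', 'r'),
                         x_98 = c(7, 8, 9))
suffixes <- c("_97", "_98")

# For each year suffix, keep id and type plus the matching columns
by_year <- lapply(suffixes, function(s) {
  small.data[c("id", "type", grep(s, names(small.data), fixed = TRUE, value = TRUE))]
})
names(by_year) <- paste0("newdata", sub("_", "", suffixes))

# Each year's data frame is then available by name
names(by_year$newdata97)
names(by_year$newdata98)
```

With a named list, further steps can be applied to every year at once with lapply, without creating separate objects in the global environment.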

index vector by value in R

Say I have two character vectors
vec <- c('A', 'B', 'C', 'D', 'E')
pat <- c('D', 'B', 'A')
how do I get the indexes of the occurrences in vec of the values in pat in the order they appear in pat?
I can try
which(vec %in% pat)
but this gives me them in the incorrect order: 1 2 4. I want them as 4 2 1.
I have tried different ways to solve this problem before and always found that the easiest is the solution mentioned in @DavidArenburg's comment:
match(pat, vec)
# [1] 4 2 1
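One detail worth knowing: match returns only the first position of each value, and NA for values of pat that do not occur in vec. A short sketch, with pat changed to include a value ('Z') that is assumed not to be in vec:

```r
vec <- c('A', 'B', 'C', 'D', 'E')
pat <- c('D', 'Z', 'A')

idx <- match(pat, vec)
idx                       # 4 NA 1

# Drop the unmatched entries if only valid indexes are wanted
found <- idx[!is.na(idx)]
found                     # 4 1
```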

R: How can I split my data according to type

I use R and I have a data.frame which contains both numeric and categorical variables.
TR is the name of my data.frame.
I managed to see the output:
for(i in 1:ncol(TR)){ print( is.factor(TR[,i])==TRUE )}
However, I do not succeed in putting these into a usable output such as a vector:
type <- for(i in 1:ncol(TR)){ print( is.factor(TR[,i])==TRUE )}
returns NULL.
If you need to split your data according to type
lst <- lapply(split(colnames(df), sapply(df, class)), function(x) df[x])
list2env can be used to create multiple objects in the global environment. But, I would prefer to work within the list
list2env(setNames(lst, paste0('dat', names(lst))), envir=.GlobalEnv)
head(datfactor,3)
# V2
#1 A
#2 B
#3 C
head(datcharacter,3)
# V1
#1 B
#2 A
#3 C
data
set.seed(24)
df <- data.frame(V1=sample(LETTERS[1:4], 10, replace=TRUE),
V2= factor(rep(LETTERS[1:3], length.out=10)), V3= rnorm(10),
V4=runif(10), stringsAsFactors=FALSE)
A for loop always evaluates to NULL, so nothing is ever assigned to type.
Try sapply instead:
type <- sapply(1:ncol(TR), function(col.idx) is.factor(TR[,col.idx]))
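Because a data.frame is a list of columns, the check can also be applied to the columns directly, which gives a named logical vector that can then be used to split the data. A minimal sketch, assuming a small made-up TR:

```r
# Hypothetical example data with one numeric and one factor column
TR <- data.frame(num = c(1.5, 2.5, 3.5),
                 cat = factor(c("a", "b", "a")))

# One TRUE/FALSE per column, named after the column
type <- vapply(TR, is.factor, logical(1))
type

# Use the logical vector to pick the two groups of columns
TR_factor  <- TR[type]
TR_numeric <- TR[!type]
```

vapply is used here instead of sapply only to state explicitly that one logical value per column is expected.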

mapply within ddply

note: this is a direct follow up to this previous question
I have very long dataframe consisting of two columns that I am using as arguments for a function that will find the value of a third column using mapply as so:
df$3rd <- mapply(myfunction, A=df$1st, B=df$2nd)
where myfunction has arguments A and B. While this works great for small datasets, it stalls for large datasets so I was thinking a good way to approach the problem would be to apply this function using ddply. I don't know if ddply is the best approach for this problem but I am also having some trouble with syntax. So suggestions for either would be appreciated.
This is what I am trying:
df$3rd <- ddply(df, .(1st), function(x) x$3rd <-
mapply(myfunction, A=x$1st, B=df$second))
and this is the error I am getting:
Error in `$<-.data.frame`(`*tmp*`, "n", value = c(1L, 1L, 1L, 1L, 1L, :
replacement has 112 rows, data has 16
EDIT:
In light of the answer and comments, I am posting a small reproducible example below; it is one of the answers from the previous question. However, as the commenters below note, ddply is probably not the way to go. I am trying Ramnath's solution right now.
library(reshape2)
foo <- data.frame(x = c('a', 'a', 'a', 'b', 'b', 'b'),
y = c('ab', 'ac', 'ad', 'ae', 'fx', 'fy'))
bar <- data.frame(x = c('c', 'c', 'c', 'd', 'd', 'd'),
y = c('ab', 'xy', 'xz', 'xy', 'fx', 'xz'))
nShared <- function(A, B) {
length(intersect(with(foo, y[x==A]), with(bar, y[x==B])))
}
# Enumerate all combinations of groups in foo and bar
(combos <- expand.grid(foo.x=unique(foo$x), bar.x=unique(bar$x)))
# Find number of elements in common among all pairs of groups
combos$n <- mapply(nShared, A=combos$foo.x, B=combos$bar.x)
# Reshape results into matrix form
dcast(combos, foo.x ~ bar.x)
# foo.x c d
# 1 a 1 0
# 2 b 0 1
ddply isn't what you're after here. ddply(df,.(1st), FUNCTION) is more like this pseudocode:
for each val in unique(df$1st)
outdf[nrow(outdf)+1,] = FUNCTION( df[df$1st==val] )
That is, it makes outdf consisting of FUNCTION applied to subsets of df determined by column 1st.
In any case, I think your error might be because you have df instead of x in function(x) x$3rd<-mapply(myfunction,A=x$1st, B=df$second) (the B argument)? Although it is hard to tell without a working example.
What exactly does myfunction do? I think your best bet is to vectorise myfunction so that you can just do df$third <- myfunction( A=df$first, B=df$second ).
For example, if myfunction <- function(A,B) { A+B }, instead of doing mapply(myfunction,df$first,df$second) you could equivalently do myfunction(df$first,df$second) and not even need mapply at all.
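A quick check of that equivalence, using the toy A+B function from the paragraph above and a small made-up data frame:

```r
myfunction <- function(A, B) { A + B }

# Hypothetical example data
df <- data.frame(first = c(1, 2, 3), second = c(10, 20, 30))

# Element-by-element calls via mapply
via_mapply <- mapply(myfunction, A = df$first, B = df$second)

# One vectorised call on whole columns
via_vector <- myfunction(A = df$first, B = df$second)

identical(via_mapply, via_vector)
```

This only works because + is already vectorised; a function built on non-vectorised steps would need to be rewritten before it can be called on whole columns like this.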
