e <<- data.env ## here i am storing my rdata
data_frames <- Filter(function(x) is.data.frame(get(x)), ls(envir = e)) ## getting only dataframe
for(i in data_frames) e[[i]] <<- mytest_function(e[[i]]) ### here i am iterating the dataframe
Now, how do I convert the for loop into an apply function? The loop takes so long to iterate.
Here is a basic demonstration. I think using lapply is a good call here, especially because it avoids the environment issues you run into with loops.
# lets create some data.frames
df1 <- data.frame(x = LETTERS[1:3], y = rep(1:3))
df2 <- data.frame(x = LETTERS[4:6], y = rep(4:6))
# what df's are we going to "loop" over
data_frames <- c("df1", "df2")
# just some simple function to paste x and y from your df's to a new column z
mytest_function <- function(x) {
df <- get(x)
df$z <- paste(df$x, df$y)
df
}
# apply over your df's and call your function for every df
e <- lapply(data_frames, mytest_function)
# note that e will be a list with data.frames
e
[[1]]
x y z
1 A 1 A 1
2 B 2 B 2
3 C 3 C 3
[[2]]
x y z
1 D 4 D 4
2 E 5 E 5
3 F 6 F 6
# most of the time you want them combined
e <- do.call(rbind, e)
e
x y z
1 A 1 A 1
2 B 2 B 2
3 C 3 C 3
4 D 4 D 4
5 E 5 E 5
6 F 6 F 6
It's unclear what you want the result to be. However, if you just want to apply a function to each column of a data frame, you can use sapply.
sapply(df, function(x) mytest_function(x))
Or you can use the purrr package (loaded here so that the %>% pipe is available).
library(purrr)
purrr::map(df, function(x) mytest_function(x)) %>%
as.data.frame
If you have a list of data frames and are applying a function to each data frame, then you can also use purrr.
library(purrr)
purrr::map(data_frames, mytest_function)
When I want to convert a loop into an apply function I usually go for lapply, but it depends on the situation:
my_f <- function(x) {
mytest_function(e[[x]]) # x is an object name; an environment cannot be indexed by position
}
my_var <- lapply(data_frames, my_f)
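If the results need to end up back in the environment, as in the original loop, one option is to pull the data frames out as a named list, transform them with lapply, and write them back with list2env. This is only a sketch: the environment e, its contents, and the body of mytest_function below are made-up stand-ins for the asker's objects.

```r
# Hypothetical stand-ins for the asker's environment and function
e <- new.env()
e$df1 <- data.frame(x = 1:3)
e$df2 <- data.frame(x = 4:6)
e$note <- "not a data frame"
mytest_function <- function(df) {
  df$z <- df$x * 2
  df
}

# Keep only the data frames, as a named list
dfs <- Filter(is.data.frame, as.list(e))

# Apply the function to each data frame
processed <- lapply(dfs, mytest_function)

# Write the results back into e under the same names
invisible(list2env(processed, envir = e))
```

Whether this is faster than the loop depends mostly on what mytest_function does; the main gain is that the name bookkeeping disappears.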
I'm trying to set the default value for a function parameter to a named numeric. Is there a way to create one in a single statement? I checked ?numeric and ?vector but it doesn't seem so. Perhaps I can convert/coerce a matrix or data.frame and achieve the same result in one statement? To be clear, I'm trying to do the following in one shot:
test = c( 1 , 2 )
names( test ) = c( "A" , "B" )
The setNames() function is made for this purpose. As described in Advanced R and ?setNames:
test <- setNames(c(1, 2), c("A", "B"))
How about:
c(A = 1, B = 2)
A B
1 2
...as a side note, the structure function allows you to set ALL attributes, not just names:
structure(1:10, names=letters[1:10], foo="bar", class="myclass")
Which would produce
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
attr(,"foo")
[1] "bar"
attr(,"class")
[1] "myclass"
The convention for naming vector elements is the same as with lists:
newfunc <- function(A=1, B=2) { body} # two separate parameters; the formals are a pairlist with two items
If instead you wanted this to be a parameter that was a named vector (the sort of function that would handle arguments supplied by apply):
newfunc <- function(params =c(A=1, B=2) ) { body} # a vector with two elements
If instead you wanted this to be a parameter that was a named list:
newfunc <- function(params =list(A=1, B=2) ) { body}
# a single parameter (with two elements in a list structure)
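To make the named-vector default concrete, here is a minimal sketch. The function scale_values and its weights parameter are invented for illustration; only c() and setNames() come from the answers above.

```r
# Hypothetical function whose default argument is a named numeric vector
scale_values <- function(x, weights = c(A = 1, B = 2)) {
  # Elements of the default are looked up by name
  x * weights[["B"]] + weights[["A"]]
}

default_result <- scale_values(10)                    # uses A = 1, B = 2
custom_result  <- scale_values(10, c(A = 0, B = 3))   # caller overrides the default

# Both one-line constructions give the same named vector
same <- identical(setNames(c(1, 2), c("A", "B")), c(A = 1, B = 2))
```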
magrittr offers a nice and clean solution.
library(magrittr)
result = c(1,2) %>% set_names(c("A", "B"))
print(result)
A B
1 2
You can also use it to transform data.frames into vectors.
df = data.frame(value=1:10, label=letters[1:10])
vec = extract2(df, 'value') %>% set_names(df$label)
vec
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
df
value label
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
7 7 g
8 8 h
9 9 i
10 10 j
To expand upon @joran's answer (I couldn't get this to format correctly as a comment): if the named vector is assigned to a variable, the values of A and B are accessed by subsetting with the [ function. Use the names to subset the vector the same way you might use the index number:
my_vector = c(A = 1, B = 2)
my_vector["A"] # subset by name
# A
# 1
my_vector[1] # subset by index
# A
# 1
Coming from Sum the values according to labels in R.
I've been told that working with 2-dimensional tables is significantly different from working with 1-dimensional ones, for example:
a a,b a,b,c c
d 5 2 1 2
d,e 2 1 1 1
And we want to achieve:
a b c
d 12 5 5
e 4 2 2
So how can this be achieved using R?
A little bit convoluted, but it should work :
m <- as.matrix(data.frame('a'=c(5,2),'a,b'=c(2,1),
'a,b,c'=c(1:1),'c'=c(2,1),
check.names = FALSE,row.names=c('d','d,e')))
colNamesSplits <- strsplit(colnames(m),',')
rowNamesSplits <- strsplit(rownames(m),',')
colNms <- unique(unlist(colNamesSplits))
rowNms <- unique(unlist(rowNamesSplits))
colIdxs <- unlist(sapply(1:length(colNamesSplits),
function(i) rep.int(i,length(colNamesSplits[[i]]))))
rowIdxs <- unlist(sapply(1:length(rowNamesSplits),
function(i) rep.int(i,length(rowNamesSplits[[i]]))))
colIdxsMapped <- unlist(sapply(colNamesSplits, function(n) match(n,colNms)))
rowIdxsMapped <- unlist(sapply(rowNamesSplits, function(n) match(n,rowNms)))
# let's create the fully expanded matrix
expanded <- as.matrix(m[rowIdxs,colIdxs])
rownames(expanded) <- rowNms[rowIdxsMapped]
colnames(expanded) <- colNms[colIdxsMapped]
# aggregate expanded by cols :
expanded <- do.call(cbind,lapply(split(1:ncol(expanded),colnames(expanded)),
function(ii) rowSums(expanded[,ii,drop=FALSE])))
# aggregate expanded by rows :
expanded <- do.call(rbind,lapply(split(1:nrow(expanded),rownames(expanded)),
function(ii) colSums(expanded[ii,,drop=FALSE])))
> expanded
a b c
d 12 5 5
e 4 2 2
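For comparison, the same aggregation can be sketched with a plain double loop: each cell's value is added to every (row label, column label) pair its names expand to. This is an alternative sketch, not the approach above, and it assumes a label does not repeat within a single row or column name (no "a,a").

```r
# Same input as above, built directly as a matrix (filled column by column)
m <- matrix(c(5, 2, 2, 1, 1, 1, 2, 1), nrow = 2,
            dimnames = list(c("d", "d,e"), c("a", "a,b", "a,b,c", "c")))

row_labels <- strsplit(rownames(m), ",")
col_labels <- strsplit(colnames(m), ",")
row_names <- unique(unlist(row_labels))
col_names <- unique(unlist(col_labels))

# Result matrix, one row/column per individual label, starting at zero
out <- matrix(0, nrow = length(row_names), ncol = length(col_names),
              dimnames = list(row_names, col_names))

for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    r <- row_labels[[i]]
    k <- col_labels[[j]]
    # Add this cell's value to every label combination it belongs to
    out[r, k] <- out[r, k] + m[i, j]
  }
}
```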
I have n data.frames and I would like to add a column to all of them.
a <- data.frame(1:4,5:8)
b <- data.frame(1:4, 5:8)
test=ls()
for (j in test){
j = cbind(get(j),IssueType=j)
}
Problem that i'm running into is
j = cbind(get(j),IssueType=j)
because it assigns all the data to j instead of a, b.
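The reason is that j is just the loop variable holding a name as a string, so assigning to j rebinds j and never touches a or b. A minimal sketch of a loop that does write back uses assign(); the object names are listed explicitly here (an assumption, instead of ls()) so that unrelated objects are not picked up.

```r
a <- data.frame(1:4, 5:8)
b <- data.frame(1:4, 5:8)

for (nm in c("a", "b")) {
  # get() fetches the object called nm; assign() stores the result under that same name
  assign(nm, cbind(get(nm), IssueType = nm))
}
```

This works, but keeping the data frames in a list is usually easier to manage than get()/assign().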
As commented, it's mostly better to keep related data in a list structure. If you already have the data.frames in your global environment and you want to get them into a list, you can use:
dflist <- Filter(is.data.frame, as.list(.GlobalEnv))
This makes sure that you only get data.frame objects from your global environment.
You will notice that you now already have a named list:
> dflist
# $a
# X1.4 X5.8
# 1 1 5
# 2 2 6
# 3 3 7
# 4 4 8
#
# $b
# X1.4 X5.8
# 1 1 5
# 2 2 6
# 3 3 7
# 4 4 8
So you can easily select the data you want by typing for example
dflist[["a"]]
If you still want to create extra columns, you could do it like this:
dflist <- Map(function(df, x) {df$IssueType <- x; df}, dflist, names(dflist))
Now, each data.frame in dflist has a new column called IssueType:
> dflist
# $a
# X1.4 X5.8 IssueType
# 1 1 5 a
# 2 2 6 a
# 3 3 7 a
# 4 4 8 a
#
# $b
# X1.4 X5.8 IssueType
# 1 1 5 b
# 2 2 6 b
# 3 3 7 b
# 4 4 8 b
In the future, you can create the data inside a list from the beginning, i.e.
dflist <- list(
a = data.frame(1:4,5:8),
b = data.frame(1:4, 5:8)
)
To create a named list of your data.frames, do this:
a <- data.frame(1:4,5:8); b <- data.frame(1:4, 5:8); test <- list(a = a, b = b)
This allows you to use the lapply function to do whatever you like with each of the data frames, e.g.:
out <- lapply(names(test), function(nm) cbind(test[[nm]], IssueType = nm))
For most data.frame operations I recommend using the packages dplyr and tidyr.
Here is the answer to the issue, with help from @docendo discimus.
Create the data frames
a <- data.frame(1:4,5:8)
b <- data.frame(1:4, 5:8)
Group the data.frames into a list
dflist <- Filter(is.data.frame, as.list(.GlobalEnv))
Add the extra column
dflist <- Map(function(df, x) {df$IssueType <- x; df}, dflist, names(dflist))
Unlist the data frames back into the global environment
list2env(dflist ,.GlobalEnv)
Let
x=c(1,2,2,3,4,1)
y=c("A","B","C","D","E","F")
df=data.frame(x,y)
df
x y
1 1 A
2 2 B
3 2 C
4 3 D
5 4 E
6 1 F
How can I put duplicate rows in this data frame in different data frames
like this :
df1
x y
1 A
1 F
df2
x y
2 B
2 C
Thank you for help
You could use split
split(df, f = df$x)
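f = df$x specifies the grouping column; check ?split for more details.
To drop the groups that have no duplicated rows, you could use
dup_vals = sort(unique(df$x[duplicated(df$x)])) # values of x that occur more than once
mylist = split(df, f = df$x)[as.character(dup_vals)] # index by group name, not by position
names(mylist) = paste0('df', seq_along(mylist))
list2env(mylist,envir=.GlobalEnv) # to separate the data frames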
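Another way to keep only the groups that contain duplicates is to filter the split result by row count. A minimal sketch, using the data from the question (the names groups and dups are just for illustration):

```r
df <- data.frame(x = c(1, 2, 2, 3, 4, 1),
                 y = c("A", "B", "C", "D", "E", "F"))

# One data frame per distinct value of x
groups <- split(df, df$x)

# Keep only the groups with more than one row, i.e. the duplicated values
dups <- Filter(function(g) nrow(g) > 1, groups)

# Rename to df1, df2, ... in order of x
names(dups) <- paste0("df", seq_along(dups))
```

Calling list2env(dups, envir = .GlobalEnv) afterwards would create df1 and df2 as separate objects, if that is really needed.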
I would like to process all rows in data frame df by applying function f to every row. As function f returns numeric vector with two elements I would like to assign individual elements to new columns in df.
Sample df, trivial function f returning two elements and my trial with using apply
df <- data.frame(a = 1:3, b = 3:5)
f <- function (a, b) {
c(a + b, a * b)
}
df[, c('apb', 'amb')] <- apply(df, 1, function(x) f(a = x[1], b = x[2]))
This does not work; the results are filled in column by column:
> df
a b apb amb
1 1 3 4 8
2 2 4 3 8
3 3 5 6 15
You could also use Reduce instead of apply; with two columns it calls f once on the whole columns rather than once per row. You just need to slightly modify your function to use cbind instead of c
f <- function (a, b) {
cbind(a + b, a * b) # modified to use `cbind` instead of `c`
}
df[c('apb', 'amb')] <- Reduce(f, df)
df
# a b apb amb
# 1 1 3 4 3
# 2 2 4 6 8
# 3 3 5 8 15
Note: This will only work nicely if you have exactly two columns (as in your example), so if you have more columns in your data set, run this only on a subset.
You need to transpose the result of apply to get what you want:
df[, c('apb', 'amb')] <- t(apply(df, 1, function(x) f(a = x[1], b = x[2])))
> df
a b apb amb
1 1 3 4 3
2 2 4 6 8
3 3 5 8 15
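The transpose is needed because apply puts each row's result in a column. The same shape comes out of mapply, which passes the two columns to f element by element and avoids converting the data frame to a matrix first. A minimal sketch, with f unchanged from the question:

```r
df <- data.frame(a = 1:3, b = 3:5)
f <- function(a, b) {
  c(a + b, a * b)
}

# mapply returns a 2 x 3 matrix (one column per row of df), so transpose it
res <- t(mapply(f, df$a, df$b))
df[c("apb", "amb")] <- res
```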