This is my code that attempts to apply a function to each row in a tibble, mytib:
> mytib
# A tibble: 3 x 1
value
<chr>
1 1
2 2
3 3
Here is my code where I'm attempting to apply a function to each line in the tibble:
mytib = as_tibble(c("1" , "2" ,"3"))
procLine <- function(f) {
print('here')
print(f)
}
lapply(mytib , procLine)
Using lapply :
> lapply(mytib , procLine)
[1] "here"
[1] "1" "2" "3"
$value
[1] "1" "2" "3"
This output suggests the function is not invoked once per line. I expected the output to be:
here
1
here
2
here
3
How do I apply a function to each row in a tibble?
Update: I appreciate the supplied answers that produce my expected result, but what have I done incorrectly in my implementation? Shouldn't lapply apply a function to each element?
You have to loop through the elements of the column named 'value' rather than over the tibble itself, whose elements are whole columns. invisible is used to suppress the list that lapply returns.
invisible( lapply(mytib$value , procLine) )
# [1] "here"
# [1] "1"
# [1] "here"
# [1] "2"
# [1] "here"
# [1] "3"
lapply loops through columns of a data frame by default. See the example below. The values of two columns are printed as a whole in each iteration.
mydf <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE )
invisible(lapply( mydf, print))
# [1] "a" "b" "c"
# [1] 1 2 3
To iterate through each element of a column in a data frame, you have to loop twice like below.
invisible(lapply( mydf, function(x) lapply(x, print)))
# [1] "a"
# [1] "b"
# [1] "c"
# [1] 1
# [1] 2
# [1] 3
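When the function needs values from several columns of the same row, one option is to loop over row indices instead of columns. The following is only a sketch under that assumption; the helper name row_labels is made up for illustration, and a plain data.frame is used so that no extra package is required.

```r
# Same example data as above
mydf <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE)

# A data frame is a list of columns, so its length is the number of columns
length(mydf)

# Loop over row indices and combine the values of each row
row_labels <- vapply(
  seq_len(nrow(mydf)),
  function(i) paste(mydf$a[i], mydf$b[i]),
  character(1)
)
row_labels
# [1] "a 1" "b 2" "c 3"
```

vapply is used here instead of sapply so that the result is guaranteed to be a character vector with one entry per row.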
I'm looking for a simple way to check whether values in an R data frame contain a comma (or any other character, for that matter).
Let's suppose I have the following data frame:
df <- data.frame(A = c("apple","orange", "banana","strawberries"),
B = c(23,12,10,15),
C = c("2,53", "1.35","0,25","1,44"))
If I know the column with commas in it I use this:
which(grepl(",",df$C))
length(which(grepl(",",df$C)))
However, I want output like the above without having to specify the column of my data frame.
Any suggestions?
You need to simply go through all three columns; sapply works here:
sapply(df, grep, pattern = ",")
##output:
# $A
# integer(0)
#
# $B
# integer(0)
#
# $C
# [1] 1 3 4
To get the length you can do this:
sapply(sapply(df, grep, pattern = ","), length)
# A B C
# 0 0 3
A solution that may be simpler to grasp: first, convert your data frame to a vector.
df2vector <- as.vector(t(df))
df2vector
# [1] "apple" "23" "2,53" "orange" "12"
# [6] "1.35" "banana" "10" "0,25" "strawberries"
# [11] "15" "1,44"
Then use your approach.
length(which(grepl(",",df2vector)))
# [1] 3
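The per-column count can also be computed in one pass. This is a minimal sketch using the same example data; the name comma_counts is chosen here for illustration, and fixed = TRUE is an added assumption so that the pattern is matched literally rather than as a regular expression.

```r
df <- data.frame(A = c("apple", "orange", "banana", "strawberries"),
                 B = c(23, 12, 10, 15),
                 C = c("2,53", "1.35", "0,25", "1,44"),
                 stringsAsFactors = FALSE)

# Count the values containing a comma in every column
comma_counts <- vapply(
  df,
  function(col) sum(grepl(",", col, fixed = TRUE)),
  integer(1)
)
comma_counts
# A B C
# 0 0 3

# Names of the columns that contain at least one comma
names(comma_counts)[comma_counts > 0]
# [1] "C"
```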
I'm trying to find the largest number of people who did not survive in a dataframe that I am working on. I used a for loop to iterate through the rows but I'm having an issue. It doesn't seem like my if condition is working. It is saying that the largest number is 89 but it is actually 670.
most_lost <- 0
for (i in 1:dim(Titanic)[1]) {
if (Titanic$Survived[i] == "No") {
if (Titanic$Freq[i] > most_lost) {
most_lost <- Titanic$Freq[i]
}
print(most_lost)
}
}
This is the output of the printed most_lost
[1] 0
[1] 0
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "387"
[1] "670"
[1] "670"
[1] "670"
[1] "89"
[1] "89"
Here is the table I'm working with
Could you please check the data formats in your table? For example, is Freq really numeric? With the example data below, your code works for me. As a side note, it would be better not to post your data as a figure; use, e.g., dput(data) and post its output instead, which makes it easier for others to import your data and check its structure. You might edit your question accordingly.
In any case, I would like to highlight that for the task you describe you should not use a loop but simply subset your table, since looping will be unacceptably slow for such tasks with larger data sets. I have provided an example at the end of the code below.
Titanic <- as.data.frame(cbind(Survived = rep("No", 8), Freq = c(1,2,5,0,2,3,1,1)), stringsAsFactors = FALSE)
# Survived Freq
# 1 No 1
# 2 No 2
# 3 No 5
# 4 No 0
# 5 No 2
# 6 No 3
# 7 No 1
# 8 No 1
most_lost <- 0
for (i in 1:dim(Titanic)[1]) {
if (Titanic$Survived[i] == "No") {
if (Titanic$Freq[i] > most_lost) {
most_lost <- Titanic$Freq[i]
}
print(most_lost)
}
}
# [1] "1"
# [1] "2"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
max(Titanic[Titanic$Survived == "No", "Freq"])
# [1] "5"
If I'm understanding correctly, you don't need a for loop.
max(Titanic$Freq[Titanic$Survived == "No"])
This line is subsetting the Freq column by rows where the Survived column is "No" and then finding the max value of the subsetted Freq column.
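The quoted values in the question's printed output (for example "35") suggest that Freq is stored as character there, in which case > and max compare strings rather than numbers. The following sketch illustrates that effect with a small made-up table; the name titanic_chr and its values are illustrative and are not the asker's data.

```r
# Hypothetical table in which Freq was read in as character
titanic_chr <- data.frame(
  Survived = c("No", "No", "No", "Yes"),
  Freq     = c("35", "670", "89", "200"),
  stringsAsFactors = FALSE
)

lost_chr <- titanic_chr$Freq[titanic_chr$Survived == "No"]

# Strings are compared character by character, so "89" sorts after "670"
"89" > "670"
# [1] TRUE
max(lost_chr)
# [1] "89"

# Converting to numeric first gives the intended answer
max_lost <- max(as.numeric(lost_chr))
max_lost
# [1] 670
```

If the column really is character, converting it once with as.numeric before any comparison avoids the problem in both the loop and the subsetting approach.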
Consider the following list:
temp <- list(id = 1, grade = "a", alive = TRUE)
We can use sapply to replicate the list:
> ts <- sapply(1:5, function(x) temp)
> ts
[,1] [,2] [,3] [,4] [,5]
id 1 1 1 1 1
grade "a" "a" "a" "a" "a"
alive TRUE TRUE TRUE TRUE TRUE
If I inspect the result using typeof, I obtain list. However, if I inspect it with sapply, I get this:
> sapply(ts, function(x) print(x))
[1] 1
[1] "a"
[1] TRUE
[1] 1
[1] "a"
[1] TRUE
[1] 1
[1] "a"
[1] TRUE
[1] 1
[1] "a"
[1] TRUE
[1] 1
[1] "a"
[1] TRUE
That is, when I inspect the same result with sapply, this vector of lists is treated as a matrix. Is there any workaround, or does R disallow a vector of lists in general? If the latter is the case, why do I get "list" from typeof?
PS: For my specific question, I understand the obvious solution of using lapply to switch to a list of lists. I am just curious and confused by R’s behavior.
ts is still a list, which is why typeof reports "list", but sapply has simplified it into a 3 x 5 matrix whose cells are list elements. Looping over it with sapply(ts, function(x) print(x)) therefore visits 15 elements: the 3 members of temp, repeated for each of the 5 iterations. If you want lapply-like output, try:
> ts <- sapply(1:5, function(x) temp, simplify = FALSE)
> ts
#[[1]]
#[[1]][[1]]
#[1] 1
#
#[[1]][[2]]
#[1] "a"
#
#[[1]][[3]]
#[1] TRUE
#.......
#.......
Or you can even try:
> ts <- sapply(1:5, function(x) as.data.frame(temp))
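The difference between the two results can be checked directly. This is a small sketch; the names ts_mat and ts_list are chosen here for illustration, and temp is given the names shown in the question's printed output.

```r
temp <- list(id = 1, grade = "a", alive = TRUE)

# Default simplification: a list that carries a dim attribute (a "list matrix")
ts_mat <- sapply(1:5, function(x) temp)
typeof(ts_mat)   # "list"
dim(ts_mat)      # 3 5
length(ts_mat)   # 15, which is why looping over it visits 15 elements

# Without simplification: an ordinary list of 5 lists, as lapply would return
ts_list <- sapply(1:5, function(x) temp, simplify = FALSE)
length(ts_list)  # 5
dim(ts_list)     # NULL
```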
Consider this example:
df <- data.frame(id=1:10,var1=LETTERS[1:10],var2=LETTERS[6:15])
fun.split <- function(x) tolower(as.character(x))
df$new.letters <- apply(df[ ,2:3],2,fun.split)
df$new.letters.var1
#NULL
colnames(df)
# [1] "id" "var1" "var2" "new.letters"
df$new.letters
# var1 var2
# [1,] "a" "f"
# [2,] "b" "g"
# [3,] "c" "h"
# [4,] "d" "i"
# [5,] "e" "j"
# [6,] "f" "k"
# [7,] "g" "l"
# [8,] "h" "m"
# [9,] "i" "n"
# [10,] "j" "o"
Would someone be so kind as to explain what is going on here? Is this a new data frame within a data frame?
I expected this:
colnames(df)
# id var1 var2 new.letters.var1 new.letters.var2
The reason is that you assigned the two-column matrix returned by apply to a single new column. The result is therefore a matrix stored in one column. You can convert it back to a normal data.frame with
do.call(data.frame, df)
A more straightforward method is to assign two columns directly. I use lapply instead of apply because the columns may be of different classes: apply returns a matrix, so with mixed classes every column becomes 'character', whereas lapply returns a list and preserves each column's class.
df[paste0('new.letters.', names(df)[2:3])] <- lapply(df[2:3], fun.split)
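Both approaches can be compared side by side. The sketch below rebuilds the question's data; the names df_nested, df_flat, and df_cols are introduced here only to keep the two variants apart.

```r
df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15])
fun.split <- function(x) tolower(as.character(x))

# Approach 1: the matrix from apply() lands in a single column ...
df_nested <- df
df_nested$new.letters <- apply(df_nested[, 2:3], 2, fun.split)
ncol(df_nested)
# [1] 4

# ... and do.call(data.frame, ...) spreads it into separate columns
df_flat <- do.call(data.frame, df_nested)
names(df_flat)
# [1] "id" "var1" "var2" "new.letters.var1" "new.letters.var2"

# Approach 2: assign two columns directly from the list returned by lapply()
df_cols <- df
new_names <- paste0("new.letters.", names(df_cols)[2:3])
df_cols[new_names] <- lapply(df_cols[2:3], fun.split)
names(df_cols)
# [1] "id" "var1" "var2" "new.letters.var1" "new.letters.var2"
```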
@akrun solved 90% of my problem. But I had data.frames buried within data.frames, buried within data.frames, and so on, without knowing how deep the nesting went.
In this case, I thought sharing my recursive solution might be helpful to others searching this thread as I was:
unnest_dataframes <- function(x) {
  y <- do.call(data.frame, x)
  # Recurse while any column is still a data.frame, and keep the result
  if (any(sapply(y, is.data.frame))) y <- unnest_dataframes(y)
  y
}
new_data <- unnest_dataframes(df)
However, this sometimes has problems of its own, and it can help to separate all columns of class "data.frame" from the original data set, unnest them, and then cbind() everything back together, like so:
# Find all columns that are data.frame
# Assuming your data frame is stored in variable 'y'
data.frame.cols <- unname(sapply(y, function(x) class(x) == "data.frame"))
z <- y[, !data.frame.cols]
# All columns of class "data.frame"
dfs <- y[, data.frame.cols]
# Recursively unnest each of these columns
unnest_dataframes <- function(x) {
  y <- do.call(data.frame, x)
  if (any(sapply(y, is.data.frame))) {
    # Keep the result of the recursive call; otherwise the deeper unnesting is lost
    y <- unnest_dataframes(y)
  } else {
    cat('Nested data.frames successfully unpacked\n')
  }
  y
}
df2 <- unnest_dataframes(dfs)
# Combine with original data
all_columns <- cbind(z, df2)
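To see the recursion at work, here is a small sketch on made-up data with two levels of nesting. The objects inner, outer, and flat are hypothetical, and the helper is written so that the result of the recursive call is assigned back to y, which is an assumption about the intended behaviour.

```r
# Recursive helper: keep unnesting until no column is a data.frame
unnest_dataframes <- function(x) {
  y <- do.call(data.frame, x)
  if (any(vapply(y, is.data.frame, logical(1)))) y <- unnest_dataframes(y)
  y
}

# Hypothetical data: a data.frame inside a data.frame inside a data.frame
inner <- data.frame(c1 = 1:2)
inner$deep <- data.frame(d1 = c("x", "y"))
outer <- data.frame(id = 1:2)
outer$nested <- inner

# One pass of do.call(data.frame, ...) still leaves a nested column
one_pass <- do.call(data.frame, outer)
any(vapply(one_pass, is.data.frame, logical(1)))

# The recursive version removes every level
flat <- unnest_dataframes(outer)
any(vapply(flat, is.data.frame, logical(1)))
names(flat)
```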
In this case R doesn't behave as one would expect, but digging deeper helps to explain it. What is a data frame? As Norman Matloff says in his book (chapter 5):
a data frame is a list, with the components of that list being
equal-length vectors
The following code might be useful to understand.
class(df$new.letters)
[1] "matrix"
str(df)
'data.frame': 10 obs. of 4 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10
$ var1 : Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
$ var2 : Factor w/ 10 levels "F","G","H","I",..: 1 2 3 4 5 6 7 8 9 10
$ new.letters: chr [1:10, 1:2] "a" "b" "c" "d" ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "var1" "var2"
Maybe the reason why it looks strange is in the print methods. Consider this:
colnames(df$new.letters)
[1] "var1" "var2"
Maybe there is something in the print methods that combines the sub-names of objects and displays them all.
For example here the vectors that constitute the df are:
names(df)
[1] "id" "var1" "var2" "new.letters"
but in this case the vector new.letters also has a dim attribute (in fact it is a matrix), where the columns are named var1 and var2 too. See this code:
attributes(df$new.letters)
$dim
[1] 10 2
$dimnames
$dimnames[[1]]
NULL
$dimnames[[2]]
[1] "var1" "var2"
but when we print, we see all of them as if they were separate vectors (and so separate columns of the data.frame!).
Edit: Print methods
Just out of curiosity, and to dig further into this question, I looked inside the methods of the print function:
methods(print)
The previous code produces a very long list of methods for the generic function print. Data frames have their own method, print.data.frame, which formats the data frame into a character matrix before printing it. A simpler method that shows how the names of a list's components are combined with their contents is listof:
getS3method("print", "listof")
function (x, ...)
{
nn <- names(x)
ll <- length(x)
if (length(nn) != ll)
nn <- paste("Component", seq.int(ll))
for (i in seq_len(ll)) {
cat(nn[i], ":\n")
print(x[[i]], ...)
cat("\n")
}
invisible(x)
}
<bytecode: 0x101afe1c8>
<environment: namespace:base>
Maybe I am wrong, but it seems to me that this code contains useful information about why that happens, specifically in the if (length(nn) != ll) check.
I have a rather simple task but haven't found a good solution.
> mylist
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
[[3]]
[1] 25 26 27 28 29 30 31 32
y <- c(3,5,9)
I would like to extract from mylist the sub-elements 3, 5, and 9 of each component in the list.
I have tried sapply[mylist,"[[",y] but had no luck, and also others like vapply, lapply, etc.
You could use sapply(mylist, "[", y):
mylist <- list(1:5, 6:10, 11:15)
sapply(mylist, "[", c(2,3))
Try using [ instead of [[ (and depending on what you're after you might actually want lapply).
From ?'[[':
The most important distinction between [, [[ and $ is that the [ can
select more than one element whereas the other two select a single
element.
Using lapply:
# create mylist
list1<-1:10
list2<-letters[1:26]
list3<-25:32
mylist<-list(list1,list2,list3)
# select 3,5,9th element from each list
list.2 <- lapply(mylist, function(x) {x[c(3,5,9)]})
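The same selection can be written without an anonymous function by passing `[` itself to lapply. This is a brief sketch on the question's data; the name picked is introduced here for illustration. Note that the third component has only 8 elements, so asking for element 9 yields NA.

```r
mylist <- list(1:10, letters[1:26], 25:32)
y <- c(3, 5, 9)

# `[` accepts a vector of indices, so it can return several elements at once
picked <- lapply(mylist, `[`, y)
picked[[1]]
# [1] 3 5 9
picked[[2]]
# [1] "c" "e" "i"
picked[[3]]
# [1] 27 29 NA
```

lapply keeps each component's type, whereas sapply would try to simplify the mixed integer and character results into a single character matrix.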
purrr provides another way to handle these kinds of list manipulations within the tidyverse:
library(purrr)
library(dplyr)
desired_values <- c(1,3)
mylist <- list(1:5, letters[1:6], 11:15) %>%
purrr::map(`[`,desired_values)
mylist
An easy way to subset repeated named elements of a list, similar to other answers here.
(so I can find it next time I look this question up)
E.g., subset the "b" elements from a repeating list where each element includes an "a" and "b" sub-element:
mylist <- list(
list(
"a" = runif(3),
"b" = runif(1)
),
list(
"a" = runif(3),
"b" = runif(1)
)
)
mylist
#> [[1]]
#> [[1]]$a
#> [1] 0.7547490 0.6528348 0.2339767
#>
#> [[1]]$b
#> [1] 0.8815888
#>
#>
#> [[2]]
#> [[2]]$a
#> [1] 0.51352909 0.09637425 0.99291650
#>
#> [[2]]$b
#> [1] 0.8407162
blist <- lapply(
X = mylist,
FUN = function(x){x[["b"]]}
)
blist
#> [[1]]
#> [1] 0.8815888
#>
#> [[2]]
#> [1] 0.8407162
Created on 2019-11-06 by the reprex package (v0.3.0)
I don't think sgibb's answer gives what you would want. I suggest making a new function:
subsetList <- function(myList, elementNames) {
lapply(elementNames, FUN=function(x) myList[[x]])
}
Then you can use it like this:
x <- list(a=3, b="hello", c=4.5, d="world")
subsetList(x, c("d", "a"))
subsetList(x, c(4, 1))
These both give
[[1]]
[1] "world"
[[2]]
[1] 3
which is what you would want, I think.
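For comparison, single-bracket indexing on the list itself also selects several elements in the requested order, with the difference that it keeps their names. A minimal sketch, reusing the function and data from this answer (the name by_name is introduced here for illustration):

```r
subsetList <- function(myList, elementNames) {
  lapply(elementNames, FUN = function(x) myList[[x]])
}

x <- list(a = 3, b = "hello", c = 4.5, d = "world")

# Single brackets return a sub-list and keep the element names
by_name <- x[c("d", "a")]
names(by_name)
# [1] "d" "a"

# Dropping the names gives the same result as subsetList()
identical(unname(by_name), subsetList(x, c("d", "a")))
# [1] TRUE
```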
There are better ways of doing this, but here's a quick solution.
# your values
list1<-1:10
list2<-letters[1:26]
list3<-25:32
# put 'em together in a list
mylist<-list(list1,list2,list3)
# function
foo<-function(x){x[c(3,5,9)]}
# apply function to each element of the list
foo(mylist[[1]])
foo(mylist[[2]])
foo(mylist[[3]])
# check the output
> foo(mylist[[1]])
[1] 3 5 9
> foo(mylist[[2]])
[1] "c" "e" "i"
> foo(mylist[[3]])
[1] 27 29 NA