difference between 1:10 and c(1:10) - r

I understand that c is used to combine elements. But what is the difference between 1:10 and c(1:10)? I see that the outputs are the same. Shouldn't c(1:10) give an error, because 1:10 already combines all the elements?
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> c(1:10)
[1] 1 2 3 4 5 6 7 8 9 10
> class(1:10)
[1] "integer"
> class(c(1:10))
[1] "integer"

If you call c (the combine function) with a single vector argument, it simply returns that vector, so c(1:10) is the same as 1:10 and there is no reason for it to raise an error. However, you can combine as many arguments as you want, even of different types (character, numeric, ...), and c will coerce them to a common type for you.
all.equal(1:10,c(1:5,6:10))
[1] TRUE
all.equal("meow",c("meow"))
[1] TRUE
c(1:5,6:10,"meow")
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "meow"
class(c(1:5,6:10,"meow"))
[1] "character"
Another difference is that you can call c with the parameter recursive. As the doc states:
?c
Usage
c(..., recursive = FALSE)
Arguments
...
objects to be concatenated.
recursive
logical. If recursive = TRUE, the function recursively descends through lists (and pairlists) combining all their elements into a vector.
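A small sketch of what the recursive argument changes, using a made-up nested list (the names nested, kept and flat are only for illustration):

```r
# A list that contains another list
nested <- list(1, list(2, 3))

# Default (recursive = FALSE): the result is still a list, and the
# inner list is kept as a single element
kept <- c(nested)
is.list(kept)    # TRUE
length(kept)     # 2

# recursive = TRUE: c() descends into the inner list and returns
# one atomic vector
flat <- c(nested, recursive = TRUE)
flat             # 1 2 3
```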

Related

Height conversion in R

I have recently started learning R and I am facing an issue.
I have a column in my data that holds the heights of players in (feet'inches) format.
I want to create a new column with the height in centimeters. For this I used the "strsplit" function as below (df is the height column):
l <- strsplit(df,"'",fixed = T)
print(l)
[[1]]
[1] "5" "7"
[[2]]
[1] "6" "2"
[[3]]
[1] "5" "9"
[[4]]
[1] "6" "4"
[[5]]
[1] "5" "11"
[[6]]
[1] "5" "8"
I am getting stuck here as I don't know how to obtain the required value after splitting the field.
I am trying to use the code below, but it's giving the following error:
p_pos <- grep("'",df)
l[[p_pos]][1]
Error in l[[p_pos]] : recursive indexing failed at level 2
I am expecting the above code to print the values from the first column in the list
5 6 5 6 5 5
>dput(head(df, 10))
c("5'7", "6'2", "5'9", "6'4", "5'11", "5'8")
One way to do this is to create a data frame with a column of feet and a column of inches. The separate function in the tidyr package handles this well - see this answer by its creator.
> library(dplyr)
> library(tidyr)
> df = data.frame(height = c("5'7", "6'2", "5'9", "6'4", "5'11", "5'8"))
> df %>% separate(height, c('feet', 'inches'), "'", convert = TRUE) %>%
+ mutate(cm = (12*feet + inches)*2.54)
  feet inches     cm
1    5      7 170.18
2    6      2 187.96
3    5      9 175.26
4    6      4 193.04
5    5     11 180.34
6    5      8 172.72
The separate creates a data frame with columns of feet and inches; the mutate does the conversion to centimeters.
This will give you a vector with the heights in centimeters.
We apply to your whole list a function that converts each pair of strings to numeric, multiplies feet and inches by their respective conversion factors to cm, and sums the two.
l = list()
l[[1]] = c("5","7")
l[[2]] = c("6","2")
l[[3]] = c("5","9")
l[[4]] = c("6","4")
l[[5]] = c("5","11")
l[[6]] = c("5","8")
sapply(l,function(x) sum(as.numeric(x)*c(30.48,2.54)))
[1] 170.18 187.96 175.26 193.04 180.34 172.72
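As for the error in the question: when [[ is given an index vector of length greater than one on a list, it indexes recursively, so l[[c(1, 2)]] means l[[1]][[2]]. With p_pos holding all six positions, the descent fails once it reaches a plain character vector. A minimal sketch of extracting one piece from every element instead, using the question's data (heights, parts, feet, inches and cm are illustrative names):

```r
heights <- c("5'7", "6'2", "5'9", "6'4", "5'11", "5'8")
parts <- strsplit(heights, "'", fixed = TRUE)

# Recursive indexing: element 2 of element 1
parts[[c(1, 2)]]   # "7"

# Apply `[` to every list element to pull out one position at a time
feet   <- as.numeric(sapply(parts, `[`, 1))
inches <- as.numeric(sapply(parts, `[`, 2))
feet               # 5 6 5 6 5 5

# 12 inches per foot, 2.54 cm per inch
cm <- (12 * feet + inches) * 2.54
```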

Issue with If Condition in For Loop

I'm trying to find the largest number of people who did not survive in a dataframe that I am working on. I used a for loop to iterate through the rows but I'm having an issue. It doesn't seem like my if condition is working. It is saying that the largest number is 89 but it is actually 670.
most_lost <- 0
for (i in 1:dim(Titanic)[1]) {
if (Titanic$Survived[i] == "No") {
if (Titanic$Freq[i] > most_lost) {
most_lost <- Titanic$Freq[i]
}
print(most_lost)
}
}
This is the output of the printed most_lost
[1] 0
[1] 0
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "387"
[1] "670"
[1] "670"
[1] "670"
[1] "89"
[1] "89"
Here is the table I'm working with
Could you please check the data types in your table, e.g., is Freq really numeric? With the example data below, your code works for me. As a side note, please don't post your data as an image; post the output of dput(data) instead, which makes it easier for others to import your data and check its structure. You might edit your question accordingly.
In any case, for the task you describe you do not need a loop: simply subset your table. Looping row by row gets slow on larger data sets. I have provided an example at the end of the code below.
# Note: cbind() builds a character matrix, so Freq is stored as character here
Titanic = as.data.frame(cbind(Survived = rep("No", 8), Freq = c(1,2,5,0,2,3,1,1)), stringsAsFactors = FALSE)
#   Survived Freq
# 1       No    1
# 2       No    2
# 3       No    5
# 4       No    0
# 5       No    2
# 6       No    3
# 7       No    1
# 8       No    1
most_lost <- 0
for (i in 1:dim(Titanic)[1]) {
if (Titanic$Survived[i] == "No") {
if (Titanic$Freq[i] > most_lost) {
most_lost <- Titanic$Freq[i]
}
print(most_lost)
}
}
# [1] "1"
# [1] "2"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
max(Titanic[Titanic$Survived == "No", "Freq"])
# [1] "5"
If I'm understanding correctly, you don't need a for loop.
max(Titanic$Freq[Titanic$Survived == "No"])
This line is subsetting the Freq column by rows where the Survived column is "No" and then finding the max value of the subsetted Freq column.
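The printed values in the question are quoted ("35", "670", "89"), which suggests Freq is stored as character there. In that case > compares strings character by character rather than by numeric value, which would explain how "89" ends up replacing "670". A small sketch, assuming that is the cause and using a few of the printed values (freq_chr and freq_num are illustrative names):

```r
freq_chr <- c("35", "387", "670", "89")

# String comparison goes character by character: "8" sorts after "6"
"89" > "670"                # TRUE

# Convert to numeric before comparing or taking the maximum
freq_num <- as.numeric(freq_chr)
89 > 670                    # FALSE
max(freq_num)               # 670
```

With a numeric Freq column, both the loop and the one-line max() above should agree.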

Using lapply to apply function to each row in a tibble

This is my code that attempts to apply a function to each row in a tibble, mytib:
> mytib
# A tibble: 3 x 1
value
<chr>
1 1
2 2
3 3
Here is my code where I'm attempting to apply a function to each line in the tibble :
mytib = as_tibble(c("1" , "2" ,"3"))
procLine <- function(f) {
print('here')
print(f)
}
lapply(mytib , procLine)
Using lapply :
> lapply(mytib , procLine)
[1] "here"
[1] "1" "2" "3"
$value
[1] "1" "2" "3"
This output suggests the function is not invoked once per line as I expect the output to be :
here
1
here
2
here
3
How do I apply a function to each row in a tibble?
Update: I appreciate the supplied answers that give my expected result, but what have I done incorrectly in my implementation? Shouldn't lapply apply a function to each element?
invisible is used to avoid displaying the output. Also you have to loop through elements of the column named 'value', instead of the column as a whole.
invisible( lapply(mytib$value , procLine) )
# [1] "here"
# [1] "1"
# [1] "here"
# [1] "2"
# [1] "here"
# [1] "3"
lapply loops through columns of a data frame by default. See the example below. The values of two columns are printed as a whole in each iteration.
mydf <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE )
invisible(lapply( mydf, print))
# [1] "a" "b" "c"
# [1] 1 2 3
To iterate through each element of a column in a data frame, you have to loop twice like below.
invisible(lapply( mydf, function(x) lapply(x, print)))
# [1] "a"
# [1] "b"
# [1] "c"
# [1] 1
# [1] 2
# [1] 3
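If the goal is one call per row, rather than per column or per element, one option is to iterate over the row indices. A minimal sketch in base R, reusing a data frame like mydf above (rows and the anonymous function are illustrative, not part of the original answer):

```r
mydf <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE)

# One call per row: i runs over row numbers, mydf[i, ] is a one-row data frame
rows <- lapply(seq_len(nrow(mydf)), function(i) {
  row <- mydf[i, ]
  paste(row$a, row$b)
})
unlist(rows)   # "a 1" "b 2" "c 3"
```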

Dataframe within dataframe?

Consider this example:
df <- data.frame(id=1:10,var1=LETTERS[1:10],var2=LETTERS[6:15])
fun.split <- function(x) tolower(as.character(x))
df$new.letters <- apply(df[ ,2:3],2,fun.split)
df$new.letters.var1
#NULL
colnames(df)
# [1] "id" "var1" "var2" "new.letters"
df$new.letters
# var1 var2
# [1,] "a" "f"
# [2,] "b" "g"
# [3,] "c" "h"
# [4,] "d" "i"
# [5,] "e" "j"
# [6,] "f" "k"
# [7,] "g" "l"
# [8,] "h" "m"
# [9,] "i" "n"
# [10,] "j" "o"
Would someone be so kind as to explain what is going on here? A new dataframe within a dataframe?
I expected this:
colnames(df)
# id var1 var2 new.letters.var1 new.letters.var2
The reason is that you assigned a 2-column matrix (the output of apply) to a single new column. So the result is a matrix stored in a single column. You can convert it back to a normal data.frame with
do.call(data.frame, df)
A more straightforward method is to assign two columns directly. I use lapply instead of apply because the columns may be of different classes: apply returns a matrix, so with mixed classes every column ends up as 'character'. lapply returns a list and preserves each column's class.
df[paste0('new.letters.', names(df)[2:3])] <- lapply(df[2:3], fun.split)
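To see the flattening step in isolation, here is a sketch built on the question's data (flat is an illustrative name; stringsAsFactors = FALSE is set only to keep the columns as plain character):

```r
df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15],
                 stringsAsFactors = FALSE)
fun.split <- function(x) tolower(as.character(x))

# apply() returns a 10 x 2 character matrix, stored as ONE column
df$new.letters <- apply(df[, 2:3], 2, fun.split)
ncol(df)        # 4

# do.call(data.frame, df) passes every column as a separate argument,
# so the matrix is spread into one column per matrix column
flat <- do.call(data.frame, df)
ncol(flat)      # 5
names(flat)
```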
@akrun solved 90% of my problem. But I had data.frames buried within data.frames, buried within data.frames and so on, without knowing the depth to which this was happening.
In this case, I thought sharing my recursive solution might be helpful to others searching this thread as I was:
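unnest_dataframes <- function(x) {
  y <- do.call(data.frame, x)
  # Keep flattening until no column is itself a data.frame;
  # the result of the recursive call must be assigned back to y
  if (any(sapply(y, is.data.frame))) y <- unnest_dataframes(y)
  y
}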
new_data <- unnest_dataframes(df)
This itself sometimes has problems, though, and it can help to separate all columns of class "data.frame" from the original data set, unnest them, and then cbind() everything back together, like so:
# Find all columns that are data.frame
# Assuming your data frame is stored in variable 'y'
data.frame.cols <- unname(sapply(y, is.data.frame))
# drop = FALSE keeps a data.frame even if only one column is selected
z <- y[, !data.frame.cols, drop = FALSE]
# All columns of class "data.frame"
dfs <- y[, data.frame.cols, drop = FALSE]
# Recursively unnest each of these columns
unnest_dataframes <- function(x) {
  y <- do.call(data.frame, x)
  if (any(sapply(y, is.data.frame))) {
    # Assign the result, otherwise the deeper unnesting is discarded
    y <- unnest_dataframes(y)
  } else {
    cat('Nested data.frames successfully unpacked\n')
  }
  y
}
df2 <- unnest_dataframes(dfs)
# Combine with original data
all_columns <- cbind(z, df2)
In this case R doesn't behave as one would expect, but maybe if we dig deeper we can explain it. What is a data frame? As Norman Matloff says in his book (chapter 5):
a data frame is a list, with the components of that list being
equal-length vectors
The following code might be useful to understand.
class(df$new.letters)
[1] "matrix"
str(df)
'data.frame': 10 obs. of 4 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10
$ var1 : Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
$ var2 : Factor w/ 10 levels "F","G","H","I",..: 1 2 3 4 5 6 7 8 9 10
$ new.letters: chr [1:10, 1:2] "a" "b" "c" "d" ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "var1" "var2"
Maybe the reason why it looks strange is in the print methods. Consider this:
colnames(df$new.letters)
[1] "var1" "var2"
Maybe there is something in the print methods that combines the sub-names of objects and displays them all.
For example here the vectors that constitute the df are:
names(df)
[1] "id" "var1" "var2" "new.letters"
but in this case the component new.letters also has a dim attribute (in fact it is a matrix), whose columns are named var1 and var2 too. See this code:
attributes(df$new.letters)
$dim
[1] 10 2
$dimnames
$dimnames[[1]]
NULL
$dimnames[[2]]
[1] "var1" "var2"
but when we print the data frame we see all of them as if they were separate vectors (and so separate columns of the data.frame!).
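A short check of this, with df rebuilt as in the question so the block runs on its own (the anonymous function stands in for fun.split): the data frame has four components, and the two sub-columns are only reachable through the matrix.

```r
df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15])
df$new.letters <- apply(df[, 2:3], 2, function(x) tolower(as.character(x)))

length(df)                    # 4 components in the underlying list
dim(df$new.letters)           # 10 2

# The sub-columns live inside the matrix, not in the data frame
df$new.letters[, "var1"]      # "a" ... "j"
df$new.letters.var1           # NULL: there is no such column
```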
Edit: Print methods
Just out of curiosity, and in order to improve this answer, I looked inside the methods of the print function:
methods(print)
The previous code produces a very long list of methods for the generic function print. Data frames are printed by print.data.frame, which is in that list; since a data frame is a list, I also looked at listof (I am sure there is a more rigorous way to find the relevant method):
getS3method("print", "listof")
function (x, ...)
{
nn <- names(x)
ll <- length(x)
if (length(nn) != ll)
nn <- paste("Component", seq.int(ll))
for (i in seq_len(ll)) {
cat(nn[i], ":\n")
print(x[[i]], ...)
cat("\n")
}
invisible(x)
}
<bytecode: 0x101afe1c8>
<environment: namespace:base>
Maybe I am wrong, but it seems to me that this code might contain useful information about why that happens, specifically around the check if (length(nn) != ll).

R strange apply returns

I use apply to a matrix in order to apply a function row by row.
My syntax is as follows :
res = apply(X,1,MyFunc)
The above function MyFunc returns a list of two values.
But the result of this apply application is a strange structure, where R seems to add some of its own (housekeeping?) data :
res = $`81`
$`81`$a
[1] 80.8078
$`81`$b
[1] 6247
Whereas the result I am expecting is simply :
res = $a
[1] 80.8078
$b
[1] 6247
I do not know why this strange 81 is inserted by R and how can I get rid of it.
Thanks for help
This is perfectly normal behaviour. You are applying a function over a matrix with named rows. Your function returns a list for each row, and each element in this new list of lists is named with the corresponding rowname.
Here is an example that reproduces what you describe:
x <- matrix(1:4, nrow=2)
rownames(x) <- 80:81
myFunc <- function(x)list(a=1, b=2)
xx <- apply(x, 1, myFunc)
xx
This returns:
$`80`
$`80`$a
[1] 1
$`80`$b
[1] 2
$`81`
$`81`$a
[1] 1
$`81`$b
[1] 2
Take a look at the structure of this list:
str(xx)
List of 2
$ 80:List of 2
..$ a: num 1
..$ b: num 2
$ 81:List of 2
..$ a: num 1
..$ b: num 2
To index the first element, simply use xx[[1]]:
xx[[1]]
$a
[1] 1
$b
[1] 2
Here is a guess as to what you may have intended... Rather than returning a list, if you return a vector, the result of the apply will be a matrix:
myFunc <- function(x)c(a=1, b=2)
xx <- apply(x, 1, myFunc)
xx
  80 81
a  1  1
b  2  2
And to get a specific row, without names, do:
unname(xx[2, ])
[1] 2 2
It would help to know what your matrix (X) looks like. Let's try something like this:
mf <- function(x) list(a=sum(x),b=prod(x))
mat <- matrix(1:6,nrow=2)
Then:
> apply(mat,1,mf)
[[1]]
[[1]]$a
[1] 9
[[1]]$b
[1] 15
[[2]]
[[2]]$a
[1] 12
[[2]]$b
[1] 48
You need that first subscript to differentiate between the lists that each row will generate. I suspect that your rownames are numbered, which results in the $`81` that you are seeing.
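If the row names are the only thing in the way, they can be used as keys or dropped. A minimal sketch with a made-up matrix (mat, mf, res and res_plain are illustrative; the real X and MyFunc are not shown in the question):

```r
mf <- function(x) list(a = sum(x), b = prod(x))
mat <- matrix(1:4, nrow = 2)
rownames(mat) <- 80:81

res <- apply(mat, 1, mf)
names(res)            # "80" "81"

# Pick one row's result by its row name ...
res[["81"]]$a         # 6

# ... or drop the names to get plain positional indexing
res_plain <- unname(res)
res_plain[[2]]$b      # 8
```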
