I have recently started learning R and I am facing an issue.
I have a column in my data that holds the heights of players in feet'inches format.
I want to create a new column with the height in centimeters. For this I used the strsplit function as below (df is the height column):
l <- strsplit(df,"'",fixed = T)
print(l)
[[1]]
[1] "5" "7"
[[2]]
[1] "6" "2"
[[3]]
[1] "5" "9"
[[4]]
[1] "6" "4"
[[5]]
[1] "5" "11"
[[6]]
[1] "5" "8"
I am stuck here because I don't know how to obtain the required values after splitting the field.
I am trying to use the code below, but it gives the following error:
p_pos <- grep("'",df)
l[[p_pos]][1]
Error in l[[p_pos]] : recursive indexing failed at level 2
I expect the above code to print the first value of each element in the list:
5 6 5 6 5 5
> dput(head(df, 10))
c("5'7", "6'2", "5'9", "6'4", "5'11", "5'8")
One way to do this is to create a data frame with a column of feet and a column of inches. The separate function in the tidyr package handles this well - see this answer by its creator.
> library(dplyr)
> library(tidyr)
> df = data.frame(height = c("5'7", "6'2", "5'9", "6'4", "5'11", "5'8"))
> df %>% separate(height, c('feet', 'inches'), "'", convert = TRUE) %>%
+ mutate(cm = (12*feet + inches)*2.54)
feet inches cm
1 5 7 170.18
2 6 2 187.96
3 5 9 175.26
4 6 4 193.04
5 5 11 180.34
6 5 8 172.72
separate creates a data frame with feet and inches columns (convert = TRUE turns them into numbers), and mutate converts the total inches, 12*feet + inches, to centimeters.
This will give you a vector with the heights in centimeters.
It applies to every element of your list a function that turns the two number strings into numbers, multiplies them by the conversion factors to centimeters (30.48 per foot, 2.54 per inch), and adds them up.
l = list()
l[[1]] = c("5","7")
l[[2]] = c("6","2")
l[[3]] = c("5","9")
l[[4]] = c("6","4")
l[[5]] = c("5","11")
l[[6]] = c("5","8")
sapply(l,function(x) sum(as.numeric(x)*c(30.48,2.54)))
[1] 170.18 187.96 175.26 193.04 180.34 172.72
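As for the error in the question: when `[[` is given a vector of indices, it indexes recursively, so `l[[p_pos]]` is read as `l[[1]][[2]][[3]]...` rather than as one lookup per list element, which is why it fails. A minimal base R sketch that instead applies `[` to every element with sapply to pull out the first (or second) piece, then converts to centimeters, assuming the six heights shown by dput() in the question:

```r
# Heights as given by dput() in the question
height <- c("5'7", "6'2", "5'9", "6'4", "5'11", "5'8")
l <- strsplit(height, "'", fixed = TRUE)

# `[` is called on each list element with index 1 (feet) or 2 (inches)
feet   <- as.numeric(sapply(l, `[`, 1))
inches <- as.numeric(sapply(l, `[`, 2))
feet
# [1] 5 6 5 6 5 5

# 1 foot = 12 inches, 1 inch = 2.54 cm
cm <- (12 * feet + inches) * 2.54
cm
# [1] 170.18 187.96 175.26 193.04 180.34 172.72
```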
I'm looking for a simple way to check whether values in an R data frame contain a comma (or any other character, for that matter).
Let's suppose I have the following data frame:
df <- data.frame(A = c("apple","orange", "banana","strawberries"),
B = c(23,12,10,15),
C = c("2,53", "1.35","0,25","1,44"))
If I know which column contains the commas, I use this:
which(grepl(",",df$C))
length(which(grepl(",",df$C)))
However, I want output like the one above without specifying the column of my data frame.
Any suggestions?
You simply need to go through all three columns; sapply works here:
sapply(df, grep, pattern = ",")
##output:
# $A
# integer(0)
#
# $B
# integer(0)
#
# $C
# [1] 1 3 4
To get the counts, you can do this:
sapply(sapply(df, grep, pattern = ","), length)
# A B C
# 0 0 3
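A vectorised variant, sketched on the same df: grepl returns one logical per cell, so sapply builds a logical matrix whose column sums are the counts. Because pattern is a regular expression, passing fixed = TRUE is the safe choice when you search for a literal character such as "." (unescaped, "." matches any character).

```r
df <- data.frame(A = c("apple", "orange", "banana", "strawberries"),
                 B = c(23, 12, 10, 15),
                 C = c("2,53", "1.35", "0,25", "1,44"))

# One logical per cell: TRUE where the cell contains a literal comma
has_comma <- sapply(df, grepl, pattern = ",", fixed = TRUE)
colSums(has_comma)
# A B C
# 0 0 3

# fixed = TRUE makes "." a literal dot instead of "any character"
colSums(sapply(df, grepl, pattern = ".", fixed = TRUE))
# A B C
# 0 0 1
```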
A somewhat simpler solution to grasp: first, convert your data frame to a vector.
df2vector <- as.vector(t(df))
df2vector
# [1] "apple" "23" "2,53" "orange" "12"
# [6] "1.35" "banana" "10" "0,25" "strawberries"
# [11] "15" "1,44"
Then use your approach.
length(which(grepl(",",df2vector)))
# [1] 3
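Flattening the data frame with t() loses track of where the commas are. If you also need their positions, one option is to keep the matrix shape and let which(arr.ind = TRUE) report row and column indices. A sketch on the same df as in the question:

```r
df <- data.frame(A = c("apple", "orange", "banana", "strawberries"),
                 B = c(23, 12, 10, 15),
                 C = c("2,53", "1.35", "0,25", "1,44"))

m <- as.matrix(df)                    # character matrix, same shape as df
hits <- grepl(",", m, fixed = TRUE)   # grepl returns a plain vector...
dim(hits) <- dim(m)                   # ...so restore the matrix shape
which(hits, arr.ind = TRUE)
#      row col
# [1,]   1   3
# [2,]   3   3
# [3,]   4   3
```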
I'm trying to find the largest number of people who did not survive in a data frame I am working on. I used a for loop to iterate through the rows, but my if condition doesn't seem to work: the loop says the largest number is 89, but it is actually 670.
most_lost <- 0
for (i in 1:dim(Titanic)[1]) {
if (Titanic$Survived[i] == "No") {
if (Titanic$Freq[i] > most_lost) {
most_lost <- Titanic$Freq[i]
}
print(most_lost)
}
}
This is the printed output of most_lost:
[1] 0
[1] 0
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "387"
[1] "670"
[1] "670"
[1] "670"
[1] "89"
[1] "89"
Here is the table I'm working with
Could you please check the data types in your table, e.g., is Freq really numeric? With the example data below, your code works for me. As a side note, it is better not to post your data as an image; use, e.g., dput(data) instead and post its output. This makes it easier for others to import your data and check its structure. You might edit your question accordingly.
In any case, I would like to point out that for the task you describe you should not use a loop but simply subset your table, since looping becomes unacceptably slow for such tasks on larger data sets. I have provided an example at the end of the code below.
Titanic = as.data.frame(cbind(Survived = rep("No", 8), Freq = c(1,2,5,0,2,3,1,1)), stringsAsFactors = F)
# Survived Freq
# 1 No 1
# 2 No 2
# 3 No 5
# 4 No 0
# 5 No 2
# 6 No 3
# 7 No 1
# 8 No 1
most_lost <- 0
for (i in 1:dim(Titanic)[1]) {
if (Titanic$Survived[i] == "No") {
if (Titanic$Freq[i] > most_lost) {
most_lost <- Titanic$Freq[i]
}
print(most_lost)
}
}
# [1] "1"
# [1] "2"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
max(Titanic[Titanic$Survived == "No", "Freq"])
# [1] "5"
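The quotes in the question's printed output ("35", "670") are a strong hint that Freq is stored as character. In that case > and max() compare strings rather than numbers, so "89" beats "670" because "8" sorts after "6". A small sketch with made-up values (the vector freq is hypothetical) showing the effect and the fix of converting with as.numeric():

```r
# Hypothetical Freq values stored as character, as the quoted output suggests
freq <- c("35", "387", "670", "89")

"89" > "670"             # string comparison, character by character
# [1] TRUE
max(freq)
# [1] "89"

# Converting to numbers gives the intended result
max(as.numeric(freq))
# [1] 670
```

If Freq turns out to be a factor rather than character, use as.numeric(as.character(Freq)) instead, since as.numeric() on a factor returns its internal level codes.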
If I'm understanding correctly, you don't need a for loop.
max(Titanic$Freq[Titanic$Survived == "No"])
This line subsets the Freq column to the rows where the Survived column is "No" and then takes the maximum of that subset.
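For reference, R ships a Titanic dataset as a 4-way table (Class x Sex x Age x Survived), and as.data.frame() turns it into a data frame whose Freq column is numeric. A sketch, assuming the question's table is this built-in data; the name ttn is only used here to avoid masking the built-in Titanic:

```r
# Built-in contingency table converted to a data frame (Freq is numeric)
ttn <- as.data.frame(Titanic)

max(ttn$Freq[ttn$Survived == "No"])
# [1] 670

# which.max() also shows which group it is
lost <- ttn[ttn$Survived == "No", ]
lost[which.max(lost$Freq), ]
```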
I want to apply a function to each row of a tibble, mytib:
> mytib
# A tibble: 3 x 1
value
<chr>
1 1
2 2
3 3
Here is my code, in which I'm attempting to apply a function to each row of the tibble:
library(tibble)
mytib = as_tibble(c("1", "2", "3"))
procLine <- function(f) {
print('here')
print(f)
}
lapply(mytib , procLine)
Using lapply:
> lapply(mytib , procLine)
[1] "here"
[1] "1" "2" "3"
$value
[1] "1" "2" "3"
This output suggests the function is not invoked once per row. I expect the output to be:
here
1
here
2
here
3
How can I apply a function to each row of a tibble?
Update: I appreciate the answers that produce my expected result, but what have I done incorrectly in my implementation? Shouldn't lapply apply a function to each element?
invisible() is used to suppress printing of the list that lapply returns. Also, you have to loop through the elements of the column named value, instead of over the tibble as a whole.
invisible( lapply(mytib$value , procLine) )
# [1] "here"
# [1] "1"
# [1] "here"
# [1] "2"
# [1] "here"
# [1] "3"
lapply iterates over the columns of a data frame, because a data frame is a list of columns. See the example below: each iteration prints a whole column.
mydf <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE )
invisible(lapply( mydf, print))
# [1] "a" "b" "c"
# [1] 1 2 3
To iterate over every element of every column, you need two nested loops, as below.
invisible(lapply( mydf, function(x) lapply(x, print)))
# [1] "a"
# [1] "b"
# [1] "c"
# [1] 1
# [1] 2
# [1] 3
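If the goal is really one call per row when there are several columns, one option is to loop over row indices rather than over the data frame itself. A sketch in which procRow is a hypothetical function that receives each row as a one-row data frame. mydf[i, ] keeps every column's own type (the same indexing works on a tibble), whereas apply(mydf, 1, ...) would first convert the whole data frame to a matrix, here a character one.

```r
mydf <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE)

# Hypothetical per-row function: gets one row as a one-row data frame
procRow <- function(row) {
  paste0(row$a, "-", row$b * 10)
}

# Iterate over row indices; mydf[i, ] keeps each column's own type
results <- vapply(seq_len(nrow(mydf)), function(i) procRow(mydf[i, ]), character(1))
results
# [1] "a-10" "b-20" "c-30"
```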
I have a list of character vectors as follows:
> ll
[[1]]
[1] "2" "1"
[[2]]
character(0)
[[3]]
[1] "1"
[[4]]
[1] "1" "8"
The longest element has length 2, and I want to build a data frame with 2 columns from this list. Bonus points for also converting each item to a number, or to NA for character(0). I have tried using mapply() and data.frame() to convert the list to a data frame and fill it with NAs as follows:
# Find length of each list element
len = sapply(ll, length)
# Number of NAs to fill for column shorter than longest
len = 2 - len
df = data.frame(mapply( function(x,y) c( x , rep( NA , y ) ) , ll , len))
However, the code above does not give me a data frame with 2 columns (with NAs as fillers).
Thanks for the help.
We can use stri_list2matrix from stringi. Since the list elements are all character vectors, this function is a good fit:
library(stringi)
t(stri_list2matrix(ll))
# [,1] [,2]
#[1,] "2" "1"
#[2,] NA NA
#[3,] "1" NA
#[4,] "1" "8"
If you need a data.frame, wrap the result in as.data.frame.
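A base R sketch that also earns the bonus points, padding with NA and converting to numbers in one step. Indexing a vector past its end returns NA, so x[seq_len(n)] pads each element to length n. (As for the attempt in the question: mapply() simplifies its results into a 2 x 4 matrix with one column per list element, so data.frame() gets four columns instead of two.)

```r
ll <- list(c("2", "1"), character(0), "1", c("1", "8"))

n <- max(lengths(ll))  # length of the longest element (2 here)

# x[seq_len(n)] pads with NA; as.numeric() turns "2" into 2 and NA into NA
m <- t(vapply(ll, function(x) as.numeric(x[seq_len(n)]), numeric(n)))
df <- as.data.frame(m)
df
#   V1 V2
# 1  2  1
# 2 NA NA
# 3  1 NA
# 4  1  8
```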
I understand that c is used to combine elements. But what is the difference between 1:10 and c(1:10)? I see that the outputs are the same. Shouldn't c(1:10) give an error, because 1:10 already combines all the elements?
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> c(1:10)
[1] 1 2 3 4 5 6 7 8 9 10
> class(1:10)
[1] "integer"
> class(c(1:10))
[1] "integer"
If you call c() with only one argument, it essentially acts as the identity function; therefore c(1:10) is the same as 1:10. However, you can combine as many arguments as you want, of different types (character, numeric, ...), and c() will coerce them to a common type.
all.equal(1:10,c(1:5,6:10))
[1] TRUE
all.equal("meow",c("meow"))
[1] TRUE
c(1:5,6:10,"meow")
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "meow"
class(c(1:5,6:10,"meow"))
[1] "character"
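One caveat: c() removes all attributes except names, so with a single argument it returns the same values but not necessarily the same object. For a plain vector such as 1:10 nothing is lost; for a matrix, the dimensions disappear. A quick sketch:

```r
identical(c(1:10), 1:10)
# [1] TRUE

m <- matrix(1:6, nrow = 2)
dim(m)
# [1] 2 3
dim(c(m))   # the dim attribute is dropped
# NULL
```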
Another difference is that c accepts a recursive argument. As the documentation states:
?c
Usage
c(..., recursive = FALSE)
Arguments
...
objects to be concatenated.
recursive
logical. If recursive = TRUE, the function recursively descends through lists (and pairlists) combining all their elements into a vector.
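To see what recursive changes, compare c() on a nested list with and without it (x is a made-up example). With recursive = TRUE the result here matches unlist(x).

```r
x <- list(1:2, list(3, "a"))

length(c(x))             # still a list of 2 elements
# [1] 2

c(x, recursive = TRUE)   # flattened to an atomic vector, coerced to character
# [1] "1" "2" "3" "a"
```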