Issue with If Condition in For Loop - r

I'm trying to find the largest number of people who did not survive in a dataframe that I am working on. I used a for loop to iterate through the rows but I'm having an issue. It doesn't seem like my if condition is working. It is saying that the largest number is 89 but it is actually 670.
most_lost <- 0
for (i in 1:dim(Titanic)[1]) {
if (Titanic$Survived[i] == "No") {
if (Titanic$Freq[i] > most_lost) {
most_lost <- Titanic$Freq[i]
}
print(most_lost)
}
}
This is the output of the printed most_lost
[1] 0
[1] 0
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "35"
[1] "387"
[1] "670"
[1] "670"
[1] "670"
[1] "89"
[1] "89"
Here is the table I'm working with

Could you please check the data types in your table: is Freq really numeric? Your printed output shows quoted values such as "35", which suggests that Freq is stored as character, and character values are compared alphabetically, so "89" > "670" is TRUE. With the example data below, your code works for me. As a side note, please do not post your data as an image; post the output of dput(data) instead, which makes it easier for others to import your data and check its structure. You might edit your question accordingly.
In any case, for the task you describe you should not use a loop but simply subset your table, since looping becomes needlessly slow for such tasks on larger data sets. I have provided an example at the end of the code below.
# note: cbind() builds a character matrix, so Freq is stored as character here
Titanic = as.data.frame(cbind(Survived = rep("No", 8), Freq = c(1,2,5,0,2,3,1,1)), stringsAsFactors = FALSE)
# Survived Freq
# 1 No 1
# 2 No 2
# 3 No 5
# 4 No 0
# 5 No 2
# 6 No 3
# 7 No 1
# 8 No 1
most_lost <- 0
for (i in 1:dim(Titanic)[1]) {
if (Titanic$Survived[i] == "No") {
if (Titanic$Freq[i] > most_lost) {
most_lost <- Titanic$Freq[i]
}
print(most_lost)
}
}
# [1] "1"
# [1] "2"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
# [1] "5"
max(Titanic[Titanic$Survived == "No", "Freq"])
# [1] "5"
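The quoted values in the question's output ("35", "670", "89") suggest that Freq is a character column there. The following is a minimal sketch of that situation, assuming a small made-up data frame (Titanic_chr and its values are hypothetical, not the asker's data). It shows why "89" beats "670" as text and how converting to numeric first gives the expected maximum. If Freq were a factor instead, as.numeric(as.character(...)) would be needed.

```r
# Hypothetical data: Freq stored as character, as the quoted output suggests
Titanic_chr <- data.frame(
  Survived = c("No", "No", "No", "Yes"),
  Freq     = c("387", "670", "89", "20"),
  stringsAsFactors = FALSE
)

# Character comparison goes character by character, so "8" vs "6" decides
chr_cmp <- "89" > "670"   # TRUE
num_cmp <- 89 > 670       # FALSE

# Convert once, then subset and take the maximum without a loop
Titanic_chr$Freq <- as.numeric(Titanic_chr$Freq)
most_lost <- max(Titanic_chr$Freq[Titanic_chr$Survived == "No"])
print(most_lost)
```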

If I'm understanding correctly, you don't need a for loop.
max(Titanic$Freq[Titanic$Survived == "No"])
This line is subsetting the Freq column by rows where the Survived column is "No" and then finding the max value of the subsetted Freq column.

Related

Height conversion in R

I have recently started learning R and I am facing an issue.
I have a column in my data which have height of players in (feet'inches) format.
I want to create a new column for height in centimeters. For this I used the "strsplit" function as below (df is the height column):
l <- strsplit(df,"'",fixed = T)
print(l)
[[1]]
[1] "5" "7"
[[2]]
[1] "6" "2"
[[3]]
[1] "5" "9"
[[4]]
[1] "6" "4"
[[5]]
[1] "5" "11"
[[6]]
[1] "5" "8"
I am getting stuck here as I don't know how to obtain the required value after splitting the field.
I am trying to use the code below, but it's giving the following error:
p_pos <- grep("'",df)
l[[p_pos]][1]
Error in l[[p_pos]] : recursive indexing failed at level 2
I am expecting the above code to print the values from the first column in the list
5 6 5 6 5 5
>dput(head(df, 10))
c("5'7", "6'2", "5'9", "6'4", "5'11", "5'8")
One way to do this is to create a data frame with a column of feet and a column of inches. The separate function in the tidyr package handles this well - see this answer by its creator.
> library(dplyr)
> library(tidyr)
> df = data.frame(height = c("5'7", "6'2", "5'9", "6'4", "5'11", "5'8"))
> df %>% separate(height, c('feet', 'inches'), "'", convert = TRUE) %>%
+ mutate(cm = (12*feet + inches)*2.54)
feet inches cm
1 5 7 170.18
2 6 2 187.96
3 5 9 175.26
4 6 4 193.04
5 5 11 180.34
6 5 8 172.72
The separate creates a data frame with columns of feet and inches; the mutate does the conversion to centimeters.
This will give you a vector with the heights in centimeters.
We apply to every element of your list a function that converts the two number strings to numeric, multiplies the feet by 30.48 and the inches by 2.54, and sums the result.
l = list()
l[[1]] = c("5","7")
l[[2]] = c("6","2")
l[[3]] = c("5","9")
l[[4]] = c("6","4")
l[[5]] = c("5","11")
l[[6]] = c("5","8")
sapply(l,function(x) sum(as.numeric(x)*c(30.48,2.54)))
[1] 170.18 187.96 175.26 193.04 180.34 172.72
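As for the error in the question: l[[p_pos]] with a p_pos of length greater than one is treated as recursive indexing (l[[1]][[2]]...), which is why it fails. A minimal sketch of extracting the first and second piece of every list element instead, using the data from the question (the names feet, inches and cm are chosen here for illustration):

```r
df <- c("5'7", "6'2", "5'9", "6'4", "5'11", "5'8")
l <- strsplit(df, "'", fixed = TRUE)

# `[` is the subsetting function; sapply calls it on each element with index 1 or 2
feet   <- as.numeric(sapply(l, `[`, 1))
inches <- as.numeric(sapply(l, `[`, 2))

# 1 foot = 30.48 cm, 1 inch = 2.54 cm
cm <- feet * 30.48 + inches * 2.54
print(feet)
print(cm)
```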

How can I compare two lists and output "hits" into a dataframe

I've tried to find answers here and on Google but had no luck. I've been struggling with this issue for some days, so I would really appreciate help. I'm analyzing a network to see whether cycles tend to be within discrete communities, between them, or show no pattern. My data are a list of cycles (three nodes forming a loop) and a list of communities (variable number of nodes). I have two questions: 1) how to compare two lists, and 2) how to output the comparison results in a readable way:
Question 1
I have two lists (both igraph objects), one containing 678 items (each of 3 elements, all characters) and another containing 11 items each with a differing number of elements. Example:
x1 <- as.character(c(1,3,5))
x2 <- as.character(c(2,4,6))
x3 <- as.character(c(7,8,9))
x4 <- as.character(c(10,11,12))
x <- list(x1, x2, x3, x4)
y1 <- as.character(c(1,2,3,4,5))
y2 <- as.character(c(2,3,4,5))
y3 <- as.character(c(1,2,3,4,5,7,8,9))
y <- list(y1, y2, y3)
Giving:
> x
[[1]]
[1] "1" "3" "5"
[[2]]
[1] "2" "4" "6"
[[3]]
[1] "7" "8" "9"
[[4]]
[1] "10" "11" "12"
> y
[[1]]
[1] "1" "2" "3" "4" "5"
[[2]]
[1] "2" "3" "4" "5"
[[3]]
[1] "1" "2" "3" "4" "5" "7" "8" "9"
I want to compare every component in x against every component in y and add every hit (i.e. when all the elements from x[[i]] are also found in y[[j]]) to a new dataframe. I tried a loop using all() and %in% but this didn't work:
for (i in 1:length(x)) {
for (j in 1:length(y)) {
hits <- all(y[[j]] %in% x[[i]]) == TRUE
print(hits)
}
}
This returns 12 FALSE hits. Checking individual components, it should have worked, because:
all(x[[1]] %in% y[[1]])
Returns TRUE as it should, and:
all(x[[1]] %in% y[[2]])
Returns FALSE as it should. Where am I going wrong here?
Question 2
I have seen some solutions for outputting loop results into a df, but that's not exactly what I need. What I want as an output is a dataframe telling me which community every cycle is in. Since there's only 11 communities, it could just refer me to the list component's index, but I haven't found a way to do this. I could also just use paste() to concatenate the node names of a community into a title. Either way, here is the output I need:
cycle community
1 1_3_5 1_2_3_4_5
2 1_3_5 1_2_3_4_5_7_8_9
3 7_8_9 1_2_3_4_5_7_8_9
I'm guessing some kind of an if statement. I feel this should be fairly simple to execute and that I should have been able to work it out myself. Nevertheless, thank you for your time and sorry about the essay.
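You swapped the operands of %in%: the test should be whether all elements of x[[i]] are in y[[j]], not the other way around.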
for (i in 1:length(x)) {
for (j in 1:length(y)) {
# hits <- all(y[[j]] %in% x[[i]]) == TRUE
hits <- all(x[[i]] %in% y[[j]]) == TRUE
print(hits)
}
}
For the second part you can store the indexes that have a hit and use them for later.
a <- list()
for (i in 1:length(x)) {
for (j in 1:length(y)) {
# hits <- all(y[[j]] %in% x[[i]]) == TRUE
hits <- all(x[[i]] %in% y[[j]]) == TRUE
if(hits == TRUE){
a[[length(a)+1]] <- c(i,j)
}
}
}
The final part of the question, creation of cycle and community tags, can be accomplished with stringi::stri_join() (or paste() as pointed out in the comments). The final step to wrangle the list created in Jt Miclat's answer is as follows, using the indexes in the list a to extract the appropriate strings for cycle and community, generate data frames, and rbind() the result to a single data frame.
# combine with cycle & community tags
cycles <- sapply(x,paste,collapse="_")
communities <- sapply(y,paste,collapse="_")
b <- lapply(a,function(x){
cycle <- cycles[x[1]]
community <- communities[x[2]]
data.frame(x=x[1],y=x[2],cycle=cycle,community=community,
stringsAsFactors=FALSE)
})
df <- do.call(rbind,b)
df
...and the output:
> df <- do.call(rbind,b)
> df
x y cycle community
1 1 1 1_3_5 1_2_3_4_5
2 1 3 1_3_5 1_2_3_4_5_7_8_9
3 3 3 7_8_9 1_2_3_4_5_7_8_9
>
Well you can make use of outer:
outer(x,y,function(w,z)Map(function(i,j)all(i%in%j),w,z))->results
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE TRUE
[4,] FALSE FALSE FALSE
x indexes the rows and y the columns, so to check all(x[[1]] %in% y[[2]]), look at row 1, column 2, i.e. element [1,2], and so on.
Then you can use apply with your own function:
fun<-function(i)c(paste(x[[i[1]]],collapse ="_"), paste(y[[i[2]]],collapse ="_"))
t(apply(which(results==TRUE,arr.ind=TRUE),1,fun))
[,1] [,2]
[1,] "1_3_5" "1_2_3_4_5"
[2,] "1_3_5" "1_2_3_4_5_7_8_9"
[3,] "7_8_9" "1_2_3_4_5_7_8_9"
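For comparison, here is a compact sketch that goes straight from the two lists to the data frame asked for in Question 2. It is one possible approach, not the method of the answers above: expand.grid() enumerates all (i, j) index pairs and mapply() runs the subset test on each pair. The names pairs, is_hit, hits and result are made up for this example.

```r
x <- list(as.character(c(1, 3, 5)), as.character(c(2, 4, 6)),
          as.character(c(7, 8, 9)), as.character(c(10, 11, 12)))
y <- list(as.character(1:5), as.character(2:5), as.character(c(1:5, 7:9)))

# All combinations of a cycle index i and a community index j
pairs <- expand.grid(i = seq_along(x), j = seq_along(y))

# TRUE where every node of cycle i belongs to community j
is_hit <- mapply(function(i, j) all(x[[i]] %in% y[[j]]), pairs$i, pairs$j)
hits <- pairs[is_hit, ]

# Build readable labels by pasting the node names together
result <- data.frame(
  cycle     = sapply(x[hits$i], paste, collapse = "_"),
  community = sapply(y[hits$j], paste, collapse = "_"),
  stringsAsFactors = FALSE
)
print(result)
```

Keeping hits$i and hits$j around also gives the list indexes of each cycle and community, which may be enough when there are only a few communities.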

Using lapply to apply function to each row in a tibble

This is my code that attempts to apply a function to each row in a tibble, mytib:
> mytib
# A tibble: 3 x 1
value
<chr>
1 1
2 2
3 3
Here is my code where I'm attempting to apply a function to each line in the tibble :
mytib = as_tibble(c("1" , "2" ,"3"))
procLine <- function(f) {
print('here')
print(f)
}
lapply(mytib , procLine)
Using lapply :
> lapply(mytib , procLine)
[1] "here"
[1] "1" "2" "3"
$value
[1] "1" "2" "3"
This output suggests the function is not invoked once per line as I expect the output to be :
here
1
here
2
here
3
How do I apply a function to each row in a tibble?
Update: I appreciate the supplied answers that give my expected result, but what have I done incorrectly in my implementation? Shouldn't lapply apply a function to each element?
invisible is used to avoid displaying the output. Also you have to loop through elements of the column named 'value', instead of the column as a whole.
invisible( lapply(mytib$value , procLine) )
# [1] "here"
# [1] "1"
# [1] "here"
# [1] "2"
# [1] "here"
# [1] "3"
lapply loops through columns of a data frame by default. See the example below. The values of two columns are printed as a whole in each iteration.
mydf <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE )
invisible(lapply( mydf, print))
# [1] "a" "b" "c"
# [1] 1 2 3
To iterate through each element of a column in a data frame, you have to loop twice like below.
invisible(lapply( mydf, function(x) lapply(x, print)))
# [1] "a"
# [1] "b"
# [1] "c"
# [1] 1
# [1] 2
# [1] 3

Change values from categorical to nominal in R

I want to change all the values in categorical columns by rank. Rank can be decided using the index of the sorted unique elements in the column.
For instance,
> data[1:5,1]
[1] "B2" "C4" "C5" "C1" "B5"
then I want these entries in the column replacing categorical values
> data[1:5,1]
[1] "1" "4" "5" "3" "2"
Another column:
> data[1:5,3]
[1] "Verified" "Source Verified" "Not Verified" "Source Verified" "Source Verified"
Then the updated column:
> data[1:5,3]
[1] "3" "2" "1" "2" "2"
I used this code for this task but it is taking a lot of time.
for(i in 1:ncol(data)){
if(is.character(data[,i])){
temp <- sort(unique(data[,i]))
for(j in 1:nrow(data)){
for(k in 1:length(temp)){
if(data[j,i] == temp[k]){
data[j,i] <- k}
}
}
}
}
Please suggest me the efficient way to do this, if possible.
Thanks.
Here is a solution in base R. I create a helper function that converts each column to a factor, using its sorted unique values as levels. This is similar to what you did, except that I use as.integer to get the ranking values.
rank_fac <- function(col1)
as.integer(factor(col1,levels = sort(unique(col1))))
Some data example:
dx <- data.frame(
col1= c("B2" ,"C4" ,"C5", "C1", "B5"),
col2=c("Verified" , "Source Verified", "Not Verified" , "Source Verified", "Source Verified")
)
Apply it without a for loop. It is better to use lapply here to avoid side effects.
data.frame(lapply(dx,rank_fac))
Results:
# col1 col2
# [1,] 1 3
# [2,] 4 2
# [3,] 5 1
# [4,] 3 2
# [5,] 2 2
using data.table syntax-sugar
library(data.table)
setDT(dx)[,lapply(.SD,rank_fac)]
# col1 col2
# 1: 1 3
# 2: 4 2
# 3: 5 1
# 4: 3 2
# 5: 2 2
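A simpler solution, using only as.integer. This works only when the columns are already factors (the data.frame() default for strings before R 4.0); on character columns as.integer returns NA: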
setDT(dx)[,lapply(.SD,as.integer)]
Using match:
# df is your data.frame
df[] <- lapply(df, function(x) match(x, sort(unique(x))))
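A small sketch of the match approach on the example values from the question, restricted to character columns as the original loop was. The data frame dx and the extra numeric column amount are made up for illustration:

```r
dx <- data.frame(
  col1   = c("B2", "C4", "C5", "C1", "B5"),
  col2   = c("Verified", "Source Verified", "Not Verified",
             "Source Verified", "Source Verified"),
  amount = c(10, 20, 30, 40, 50),   # hypothetical numeric column, left untouched
  stringsAsFactors = FALSE
)

# Rank = position of each value among the sorted unique values of its column
rank_by_sorted <- function(x) match(x, sort(unique(x)))

ranked <- dx
chr_cols <- vapply(dx, is.character, logical(1))
ranked[chr_cols] <- lapply(dx[chr_cols], rank_by_sorted)
print(ranked)
```

match() looks up all values of a column in one vectorized call, which replaces the two inner loops of the original code.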

difference between 1:10 and c(1:10)

I understand that c is used to combine elements. But what is the difference between 1:10 and c(1:10)? I see that the outputs are the same. Shouldn't c(1:10) give an error, because 1:10 already combines all the elements?
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> c(1:10)
[1] 1 2 3 4 5 6 7 8 9 10
> class(1:10)
[1] "integer"
> class(c(1:10))
[1] "integer"
If you call c() with a single plain vector, it returns that vector unchanged, so c(1:10) is the same as 1:10. (c() does drop attributes other than names, so this does not hold for, e.g., a matrix.) However, you can combine as many arguments as you want, of different types (character, numeric, ...), and c() will coerce them to a common type for you.
all.equal(1:10,c(1:5,6:10))
[1] TRUE
all.equal("meow",c("meow"))
[1] TRUE
c(1:5,6:10,"meow")
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "meow"
class(c(1:5,6:10,"meow"))
[1] "character"
Another difference is that you can call c with the parameter recursive. As the doc states:
?c
Usage
c(..., recursive = FALSE)
Arguments
...
objects to be concatenated.
recursive
logical. If recursive = TRUE, the function recursively descends through lists (and pairlists) combining all their elements into a vector.
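A short sketch of both points, using made-up objects (nested and m are names chosen for this example): recursive = TRUE flattens nested lists into a vector, and c() on a single argument drops attributes such as dim.

```r
nested <- list(1, list(2, 3), list(list(4)))

kept_list <- c(nested)                    # still a list of length 3
flattened <- c(nested, recursive = TRUE)  # numeric vector 1 2 3 4

# c() keeps names but drops other attributes, e.g. the dim of a matrix
m <- matrix(1:4, nrow = 2)
plain <- c(m)

print(flattened)
print(dim(plain))
```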
