I am learning the use of the ifelse function from Zuur et al (2009) A Beginners guide to R. In one exercise, there is a data frame called Owls which contains data about about 27 nests and two night of observations.
structure(list(Nest = structure(c(1L, 1L, 1L, 1L), .Label = "AutavauxTV", class = "factor"),
FoodTreatment = structure(c(1L, 2L, 1L, 1L), .Label = c("Deprived",
"Satiated"), class = "factor"), SexParent = structure(c(1L,
1L, 1L, 1L), .Label = "Male", class = "factor"), ArrivalTime = c(22.25,
22.38, 22.53, 22.56), SiblingNegotiation = c(4L, 0L, 2L,
2L), BroodSize = c(5L, 5L, 5L, 5L), NegPerChick = c(0.8,
0, 0.4, 0.4)), .Names = c("Nest", "FoodTreatment", "SexParent",
"ArrivalTime", "SiblingNegotiation", "BroodSize", "NegPerChick"
), row.names = c(NA, 4L), class = "data.frame")
The two nights differed as to the feeding regime (satiated or deprived) and are indicated in the Foodregime variable. The task is to use ifelse and past functions that make a new categorical variable that defines observations from a single night at a particular nest.
In the solutions the following code is suggested:
Owls <- read.table(file = "Owls.txt", header = TRUE, dec = ".")
ifelse(Owls$FoodTreatment == "Satiated", Owls$NestNight <- paste(Owls$Nest, "1",sep = "_"), Owls$NestNight <- paste(Owls$Nest, "2",sep = "_"))
and apparently it creates a new variable with values the endings of which vary ("-1" or "-2")
however when I call the original dataframe, all "-1" endings in the NestNight variable disappears and are turned to "-2."
Why does this happen? Did the authors miss something from the code or it's me who is not getting it?
Many thanks
EDIT: Sorry, I wanted to give a reproducible example by copying my data using dput but it did not work. If you can let me know how I can correct it so that it appears properly, I'd be grateful too!
Solution
If you do the assignment outside the ifelse structure, it works:
Owls$NestNight <- ifelse(Owls$FoodTreatment == "Satiated",
paste(Owls$Nest, "1",sep = ""),
paste(Owls$Nest, "2",sep = ""))
Explanation
What happens in your case is simply if you would execute the following two lines:
Owls$NestNight <- paste(Owls$Nest, "1",sep = "")
Owls$NestNight <- paste(Owls$Nest, "2",sep = "")
You first assign paste(Owls$Nest, "1",sep = "") to Owls$NestNight and then you reassign paste(Owls$Nest, "2",sep = "") to it. The ifelse is not affected by this, but you don't assign it's result to any variable.
Maybe it is more clear if you test this simple code:
c(a <- 1:5, a <- 6:10) #c is your ifelse, a is your Owls$NestNight
a #[1] 6 7 8 9 10
Related
This thread follows on from this answered qestion: Matching strings loop over multiple columns
I opened a new thread as I would like to make an update to flag for exact matches only..
I have a table of key words in separate colums as follows:
#codes table
codes <- structure(
list(
Support = structure(
c(2L, 3L, NA),
.Label = c("",
"help", "questions"),
class = "factor"
),
Online = structure(
c(1L,
3L, 2L),
.Label = c("activities", "discussion board", "quiz", "sy"),
class = "factor"
),
Resources = structure(
c(3L, 2L, NA),
.Label = c("", "pdf",
"textbook"),
class = "factor"
)
),
row.names = c(NA,-3L),
class = "data.frame"
)
I also have a comments table structured as follows:
#comments table
comments <- structure(
list(
SurveyID = structure(
1:5,
.Label = c("ID_1", "ID_2",
"ID_3", "ID_4", "ID_5"),
class = "factor"
),
Open_comments = structure(
c(2L,
4L, 3L, 5L, 1L),
.Label = c(
"I could never get the pdf to download",
"I could never get the system to work",
"I didn’t get the help I needed on time",
"my questions went unanswered",
"staying motivated to get through the textbook",
"there wasn’t enough engagement in the discussion board"
),
class = "factor"
)
),
class = "data.frame",
row.names = c(NA,-5L)
)
What I am trying to do:
Search for an exact match keyword. The following working code has been provided by #Len Greski and #Ronak Shah from the previous thread (with huge thanks to both):
resultsList <- lapply(1:ncol(codes),function(x){
y <- stri_detect_regex(comments$Open_comments,paste(codes[[x]],collapse = "|"))
ifelse(y == TRUE,1,0)
})
results <- as.data.frame(do.call(cbind,resultsList))
colnames(results) <- colnames(codes)
mergedData <- cbind(comments,results)
mergedData
and
comments[names(codes)] <- lapply(codes, function(x)
+(grepl(paste0(na.omit(x), collapse = "|"), comments$Open_comments)))
Both work great but I have come across a snag and now need to match the keywords exactly. As per the example tables above, if I have a keyword "sy", the code will flag any comment with the word "system". I would modify either of the above pieces of code to flag the comment where only "sy" exact match is present.
Many thanks
I have already searched the Forum for Hours (really) and start to get the faint Feeling that I am slowly going crazy, especially as it appears to me to be a really easily solvable Problem.
What do I want to do?
Basically, I want to simulate clinical data. Specifically, for each Patient (column 1:ID) an arbitrary score (column 3: score), dependant on the assigned Treatment Group (column 2: group).
set.seed(123)
# Number of subjects in study
n_patients = 1000
# Score: Mean and SDs
mean_verum = 70
sd_verum = 20
mean_placebo = 40
sd_placebo = 20
# Allocating to Treatment groups:
data = data.frame(id = as.character(1:n_patients))
data$group[1:(n_patients/2)] <- "placebo"
data$group[(n_patients/2+1):n_patients] <- "verum"
# Attach Score for each treatment group
data$score <- ifelse(data$group == "verum", rnorm(n=100, mean=mean_verum, sd=sd_verum), rnorm(n=100, mean=mean_placebo, sd=sd_placebo))
So far so easy. Now, I wish to 1) calculate a probability of an Event happening (logit function) depending on the score. Then, 2) I want to actually assign an Event, depending on the probability (rbinom).
I want to do this for n different probablities/Events. This is the Code I've used so far:
Calculate probabilities:
a = -1
b = 0.01
p1 = 1-exp(a+b*data$score)/(1+exp(a+b*data$score))
data$p_AE1 <- p1
a = -0.5
b = 0.01
p1 = 1-exp(a+b*data$score)/(1+exp(a+b*data$score))
data$p_AE2 <- p1
…
Assign Events:
data$Abbruch_AE1 <- rbinom(n_patients, 1, data$p_E1)
data$Abbruch_AE2 <- rbinom(n_patients, 1, data$p_E2)
…
Obviously, this is really inefficient, as it would like to easily scale this up or down, depending on how many probabilities/Events I want to simulate.
The Problem is, I simply do not get it, how I can simultaneously a) generate new, single column in the dataframe, where I want to put in the values for each, b) perform the function to assign the probabilities/Events, and c) do this for a number n of different formulas, which have their specific a and b.
I am sure the solution to this Problem is a simple one - what I didn't manage was to do all These Things at once, which is were I would like this to be eventually. I ahve played around with for loops, all to no avail.
Any help would be greatly appreciated!
This how my dataframe Looks like:
structure(list(id = structure(1:3, .Label = c("1", "2", "3"), class = "factor"),
group = c("placebo", "placebo", "placebo"), score = c(25.791868726014,
45.1376741831306, 35.0661624307525), p_AE1 = c(0.677450814266315,
0.633816117436442, 0.656861351663365), p_AE2 = c(0.560226492151216,
0.512153420188678, 0.537265362130761), p_AE3 = c(0.435875409622676,
0.389033483248856, 0.413221988111604), p_AE4 = c(0.319098312196655,
0.278608032377073, 0.299294085148527), p_AE5 = c(0.221332386680766,
0.189789774534235, 0.205762225373345), p_AE6 = c(0.147051201194953,
0.124403316086538, 0.135795233451071), p_AE7 = c(0.0946686004658072,
0.0793379289917946, 0.0870131973838217), p_AE8 = c(0.0596409872667201,
0.0496714832182721, 0.0546471270895262), AbbruchAE1 = c(1L,
1L, 1L), AbbruchAE2 = c(1L, 1L, 0L), AbbruchAE3 = c(0L, 0L,
0L), AbbruchAE4 = c(0L, 1L, 0L), AbbruchAE5 = c(1L, 0L, 0L
), AbbruchAE6 = c(1L, 0L, 0L), AbbruchAE7 = c(0L, 0L, 0L),
AbbruchAE8 = c(0L, 0L, 0L)), .Names = c("id", "group", "score", "p_AE1", "p_AE2", "p_AE3", "p_AE4", "p_AE5", "p_AE6", "p_AE7", "p_AE8", "AbbruchAE1", "AbbruchAE2", "AbbruchAE3", "AbbruchAE4", "AbbruchAE5", "AbbruchAE6", "AbbruchAE7", "AbbruchAE8"), row.names = c(NA, 3L), class = "data.frame")
As you can see I would like to read a csv-table into my data-pool. The table has multiple columns but when i simply try following code:
reviews <- read.table("Sz-Iraki2.csv", fileEncoding = "UTF-8")
i get the error: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 22 elements
When i Add header=True i get the error: more columns than column names. Seems like a basic problem but i can´t find the answer :(strong text
but should look like this
Data looks like this
You have to define a separator otherwise R fail to read data properly. Suppose your data structure is the following:
structure(list(month = 2:5, titles_tmp = structure(c(1L, 1L,
1L, 1L), .Label = "some text", class = "factor"), info_tmp = structure(c(1L,
1L, 1L, 1L), .Label = "More text", class = "factor"), unlist.text = structure(c(1L,
1L, 1L, 1L), .Label = "http://somelink.com", class = "factor")), .Names = c("month",
"titles_tmp", "info_tmp", "unlist.text"), class = "data.frame", row.names = c(NA,
-4L))
That means you separate each columns with single tab. Meaning you need to use sep = " " as a data separator. Provided your data file name is "df.csv" the following should import your data nicely:
df = read.csv("Sz-Iraki2.csv", sep= " ", fileEncoding = "UTF-8")
I like to use:
require(readr)
read_csv("myData.csv")
Seems more appropriate, if your file type is csv.
Also comes with some useful options like defining 'coltype' on import.
This may be a very simple question, but I don't see how to answer it.
I have the following reproducible code, where I have two small dataframes that I use to calculate a percentage value based on each column total:
#dataframe x
x <- structure(list(PROV = structure(c(1L, 1L), .Label = "AG", class = "factor"),
APT = structure(1:2, .Label = c("AAA", "BBB"), class = "factor"),
PAX.2013 = c(5L, 4L), PAX.2014 = c(4L, 2L), PAX.2015 = c(4L,0L)),
.Names = c("PROV", "APT", "PAX.2013", "PAX.2014", "PAX.2015"),
row.names = 1:2, class = "data.frame")
#dataframe y
y <- structure(list(PROV = structure(c(1L, 1L), .Label = "AQ", class = "factor"),
APT = structure(1:2, .Label = c("CCC", "AAA"), class = "factor"),
PAX.2013 = c(3L, 7L), PAX.2014 = c(2L, 1L), PAX.2015 = c(0L,3L)),
.Names = c("PROV", "APT", "PAX.2013", "PAX.2014", "PAX.2015"),
row.names = 1:2, class = "data.frame")
#list z (with x and y)
z <- list(x,y)
#percentage value of x and y based on columns total
round(prop.table(as.matrix(z[[1]][3:5]), margin = 2)*100,1)
round(prop.table(as.matrix(z[[2]][3:5]), margin = 2)*100,1)
as you can see, it works just fine.
Now I want to automate for all the list, but I can't figure out how to get the results. This is my simple code:
#for-loop that is not working
for (i in length(z))
{round(prop.table(as.matrix(z[[i]][3:5]), margin = 2)*100,1)}
You have two problems.
First, you have not put a range into your for loop so you are just trying to iterate over a single number and second, you are not assigning your result anywhere on each iteration.
Use 1:length(z) to define a range. Then assign the results to a variable.
This would work:
my_list <- list()
for (i in 1:length(z)){
my_list[[i]] <- round(prop.table(as.matrix(z[[i]][3:5]),
margin = 2)*100,1)
}
my_list
But it would be more efficient and idiomatic to use lapply:
lapply(1:length(z),
function(x) round(prop.table(as.matrix(z[[x]][3:5]), margin = 2)*100,1))
Barring discussions whether for-loops is the best approach, you had two issues. One, your for loop only iterates over 2 (which is length(z)) instead of 1:2. Two, you need to do something with the round(....) statement. In this solution, I added a print statement.
for (i in 1:length(z)){
print(round(prop.table(as.matrix(z[[i]][3:5]), margin = 2)*100,1))
}
This would seem to be a straightforward problem but I can't find an answer for it...
How do I write a function where one of the calls refers to a specific variable name?
For example, if I have a data frame:
data=structure(list(x = 1:10, treatment = c(1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L)), .Names = c("x", "treatment"), row.names = c(NA,
-10L), class = "data.frame")
I can write a trivial function that adds treatment to the other variable in the data frame but this only works if there is a variable called "treatment" in d.
ff=function(data,treatment){data+treatment)}
ff(data,data$treatment)
This works but I want to set it up so the user doesn't call data$Var in the function.
Is this what you want?
ff <- function(data, colname) {
data + data[[colname]]
}
ff( data, "treatment" )
or
ff <- function(data, column) {
colname <- deparse(substitute(column))
data + data[[colname]]
}
ff( data, treatment )
(the later can lead to hard to find bugs if someone tries something like ff(data, 1:10))