Subsetting List Document conditionally

I'm currently working on a homework assignment where I'm asked to subset a list of reviews into a new list containing only the reviews with five or fewer words.
Using short_revs <- walk(mydoc, ~length(mydoc[[i]]) <= 5) returns the same initial list.
Can anyone help?

I think walk is not the right tool for this: it is called only for its side effects and always returns its input unchanged. Two simple alternatives (choose one):
short_revs <- mydoc[lengths(mydoc) <= 5]
short_revs <- Filter(function(z) length(z) <= 5, mydoc)
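To make the difference concrete, here is a minimal sketch on made-up data. It assumes each review is stored as a character vector of words; the `raw` strings and the whitespace split are illustrative assumptions, not taken from the question.

```r
# Hypothetical reviews; in the question these would already be in mydoc
raw <- c("great product",
         "this was far too long to read",
         "ok")

# Assumption: one list element per review, one vector entry per word
mydoc <- strsplit(raw, "\\s+")

# lengths() returns the number of words in each element
short_revs <- mydoc[lengths(mydoc) <= 5]

# Filter() keeps the elements for which the predicate is TRUE
short_revs2 <- Filter(function(z) length(z) <= 5, mydoc)

length(short_revs)
```

Both approaches return a new list and leave mydoc untouched, which is what the assignment asks for.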

Related

Double filtering with for loops

I’m trying to essentially double filter this “championships” dataset for each element of the “questions” column and then for the elements of the “correct” column (either 1 or 0). I have tried to do this with the code below:
unique_questions <- unique(championships$question)
question_ <- numeric(length(unique_questions))
for(i in unique_questions) {{
question_[i] <- championships %>% filter(question == i)
}
correct_ <- length(championships$correct)
for(j in unique_correct) {
correct_[j] <- championships %>% filter(correct == j)
}
print(question_[i], correct_[j])
}
This doesn’t seem to be working, and I have a feeling the problem has something to do with the placement of brackets or the choice of functions (in particular, the numeric(length()) call) within the for loops. If this code worked properly, I would hope to have elements of a form similar to qc_ij, with i drawn from the “unique_questions” category and j drawn from the “unique_correct” category. There are twelve questions and two choices of “correct” (1 and 0), so I would hope there would be 24 objects of type qc_ij. If someone sees where this code goes wrong, can you help me fix it?

You don't really need to store the unique questions and answers for each step and you can filter a data.frame simply by selecting rows that satisfy a certain condition:
for (q in unique(championships$question)) {
  d <- championships[championships$question == q, ]  ## Filter for q
  for (ci in unique(d$correct)) {  ## Only loop over the answers that occur for THIS question
    d2 <- d[d$correct == ci, ]  ## Filter for ci
    ## Just print it to see whether it worked
    print(paste(q, ci))
    print(d2)
  }
}
So, you subsequently get all the subsets of the championships table for each value of questions and each value of (specific) correct answers.
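If the subsets should be kept rather than printed, one possible alternative is base R's split() on both columns at once. This is only a sketch on a small made-up data frame; the `player` column and the values are hypothetical.

```r
# Toy stand-in for the real data (hypothetical values)
championships <- data.frame(
  question = rep(c("q1", "q2", "q3"), each = 4),
  correct  = rep(c(1, 0), times = 6),
  player   = paste0("p", 1:12)
)

# Splitting on two columns gives one data frame per question/correct pair
qc <- split(championships,
            list(championships$question, championships$correct))

names(qc)     # element names combine both values, e.g. "q1.1"
qc[["q1.1"]]  # rows where question is "q1" and correct is 1
```

With twelve questions and two values of correct, the same call would give a list of 24 data frames, addressed by name instead of by separate qc_ij objects.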

Problem deleting elements with 2 values in R list

I am trying to format a list so that it has one word per element (I imported it from a very poor quality csv and can't do much about improving the csv). I want every element to hold only one value, but the code I am currently using does not achieve this, even though I am not getting any error messages.
Here is the code I am currently using:
Terms <- [] #9020 elements with lengths 1, 2, and 3
for (x in 1:length(Terms)) {
  if (Terms[[x]] %>% is.list()) {
    term <- Terms[[x]]
    length(term) <- 1
    Terms[[x]] <- term
  }
} # should return list of same size, but only with elements of length 1
Any help figuring out what I could use to make it so that I can delete any second variables would be appreciated.

One option is to create a logical condition with lengths() and use it to subset the list, which keeps only the elements that have exactly one value:
lst2 <- lst1[lengths(lst1) == 1]
If the intention is instead to keep only the first value of every element:
lst2 <- lapply(lst1, `[`, 1)
NOTE: This assumes the list elements are vectors.
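A small sketch of how the two options differ, using a made-up three-element list in place of the 9020-element one from the question:

```r
# Hypothetical list with elements of length 1, 2 and 3
lst1 <- list("alpha", c("beta", "gamma"), c("delta", "eps", "zeta"))

# Option 1: drop every element that has more than one value
keep_single <- lst1[lengths(lst1) == 1]

# Option 2: keep every element, but only its first value
first_only <- lapply(lst1, `[`, 1)

length(keep_single)
unlist(first_only)
```

Option 1 shortens the list, while option 2 returns a list of the same size, which matches the "same size, but only with elements of length 1" goal stated in the question.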

R row selection providing partial results

I'm having an issue, which I have found a solution for, but would like to understand what was going on in the original coding.
So I started with a table pulled from an SQL database and wanted information for 1 client, who is covered by 2 client numbers.
Originally I was running this to select those account numbers.
match <- c("C524",'5568')
gtc <- gtc[gtc$AccountNumber == match,]
However this was only returning about half of the desired results, and the results returned vary at different times (this was running as a weekly report), and depending on the PC running it.
Now, I've set up a loop which works fine and extracts all the results, but would really like to know what was going on with the original query.
match <- c("C524",'5568')
for (each in match) {
gtcLoop<- gtc[gtc$AccountNumber == each,]
result<-rbind(result,gtcLoop)
}
Also, long time lurker, first time poster so let me know if I've done anything wrong in this question.

You need to replace == with %in%:
match <- c("C524", "5568")
gtc <- data.frame(AccountNumber = sample(c(match, "something"), 10, replace = TRUE))
gtc[gtc$AccountNumber %in% match, ]

Just to tag onto Qaswed's answer (+1), you need to understand what is happening when you compute vector comparisons like ==. See:
?`==`
and
?`%in%`
then try something like 1 == c(1,2) and 1 %in% c(1,2).
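The reason you are getting only about half the results is that == compares element by element and recycles the shorter vector: row 1 is compared with the first value, row 2 with the second, row 3 with the first again, and so on. A row is therefore kept only when it happens to line up with its own account number, which is also why the result changes with the row order of the data, as in: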
df <- data.frame(id=c(1:5), acct_cd = letters[1:5])
df[df$acct_cd == c("a","c"),] # this is wrong, for demo only
df[df$acct_cd %in% c("a","c"),] # this is correct
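The recycling behaviour can be seen directly on a short vector. This is a sketch with made-up account numbers in a made-up order:

```r
# Hypothetical column of account numbers
acct   <- c("C524", "5568", "C524", "5568", "5568", "C524")
wanted <- c("C524", "5568")

# == recycles wanted as C524, 5568, C524, 5568, C524, 5568
# and compares position by position
by_position <- acct == wanted

# %in% asks, for each account, whether it occurs anywhere in wanted
by_membership <- acct %in% wanted

sum(by_position)
sum(by_membership)
```

Here every entry is one of the two wanted accounts, yet == misses the last two because they sit in the "wrong" position relative to the recycled vector.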

Assigning output of a function to two variables in R [duplicate]

Possible Duplicate:
function with multiple outputs
This seems like an easy question, but I can't figure it out and I haven't had luck in the R manuals I've looked at. I want to find dim(x), but I want to assign dim(x)[1] to a and dim(x)[2] to b in a single line.
I've tried [a b] <- dim(x) and c(a, b) <- dim(x), but neither has worked. Is there a one-line way to do this? It seems like a very basic thing that should be easy to handle.

This may not be as simple a solution as you wanted, but it gets the job done. It's also a handy tool for the future, should you need to assign multiple variables at once without knowing in advance how many values you have.
Output <- SomeFunction(x)
VariablesList <- letters[seq_along(Output)]
for (i in seq_along(Output)) {
  assign(VariablesList[i], Output[i])
}
Loops aren't the most efficient things in R, but I've used this multiple times. I personally find it especially useful when gathering information from a folder with an unknown number of entries.
EDIT: And in this case, Output could be any length (as long as VariablesList is longer).
EDIT #2: Changed up the VariablesList vector to allow for more values, as Liz suggested.

You can also write your own function that will always make a global a and b. But this isn't advisable:
mydim <- function(x) {
  out <- dim(x)
  a <<- out[1]
  b <<- out[2]
}
The "R" way to do this is to output the results as a list or vector just like the built in function does and access them as needed:
out <- dim(x)
out[1]
out[2]
R has excellent list and vector comprehension that many other languages lack and thus doesn't have this multiple assignment feature. Instead it has a rich set of functions to reach into complex data structures without looping constructs.

Doesn't look like there is a way to do this. Really the only way to deal with it is to add a couple of extra lines:
temp <- dim(x)
a <- temp[1]
b <- temp[2]
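For completeness, two base R ways to get close to a one-liner. This is only a sketch; the matrix `x` and the names `rows` and `cols` are made up for illustration.

```r
x <- matrix(0, nrow = 3, ncol = 4)

# Option 1: two assignments on one line, separated by a semicolon
a <- nrow(x); b <- ncol(x)

# Option 2: name the values, then copy them into the current environment
invisible(list2env(setNames(as.list(dim(x)), c("rows", "cols")),
                   envir = environment()))

c(a, b, rows, cols)
```

Option 1 is usually the more readable choice; option 2 creates variables as a side effect, which has the same drawbacks as the assign() approach.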

It depends what is in a and b. If they are just numbers try to return a vector like this:
# Note: naming the function dim masks base::dim; pick another name in real code
dim <- function(x, y)
  return(c(x, y))
dim(1, 2)[1]
# [1] 1
dim(1, 2)[2]
# [1] 2
If a and b are something else, you might want to return a list:
dim <- function(x, y)
  return(list(item1 = x:y, item2 = (2*x):(2*y)))
dim(1, 2)[[1]]
# [1] 1 2
dim(1, 2)[[2]]
# [1] 2 3 4
EDIT:
try this: x <- c(1,2); names(x) <- c("a","b")

Using mapply() in R over rows, vs. columns

I deal with a great deal of survey data and the like in my work, and I often have to make various scoring programs that process data on a row-by-row level. For instance, I am dealing with a table right now that contains 12 columns with subscale scores from a psychometric instrument. These will be converted to normalized scores using tables provided by the instrument's creator. Seems straightforward so far.
However, there are four tables: the instrument is scored differently depending on gender and age range. So, for instance, a 14-year-old female and a 10-year-old male get different normalization tables. All of the normalization data is stored in an R data frame.
What I would like to do is write a function which can be applied over rows, which returns a vector looked up from the normalization data. So, something vaguely like this:
converter <- function(rawscores, gender, age) {
  if (gender == "Male") {
    if (8 <= age & age <= 11) {convertvec <- c(1:12)}
    if (12 <= age & age <= 14) {convertvec <- c(13:24)}
  }
  else if (gender == "Female") {
    if (8 <= age & age <= 11) {convertvec <- c(25:36)}
    if (12 <= age & age <= 14) {convertvec <- c(37:48)}
  }
  converted_scores <- rep(0, 12)
  for (z in 1:12) {
    converted_scores[z] <- conversion_table[(unlist(rawscores) + 1)[z],
                                            convertvec[z]]
  }
  rm(z)
  return(converted_scores)
}
EDITED: I updated this with the code I actually got to work yesterday. This version returns a simple vector with the scores. Here's how I then implemented it.
mydata[,21:32] <- 0
for(x in 1:dim(mydata)[1]) {
tscc_scores[x,21:32] <- converter(mydata[x,7:18],
mydata[x,"gender"],
mydata[x,"age"])
}
This works, but like I said, I'm given to understand that it is bad practice?
Side note: the reason rawscores+1 is there is that the data frame has a score of zero in the first index.
Fundamentally, the function doesn't seem very complicated, and I know I could just implement it using a loop where I would do for(x in 1:number_of_records), but my understanding is that doing so is poor practice. I had hoped to simply use apply() to do this, like as follows:
apply(X=mydata[,1:12],MARGIN=1,
FUN=converter,gender=mydata[,"gender"],age=mydata[,"age"])
Unfortunately, R doesn't seem to approve of this approach, as it does not iterate through the vectors passed to subsequent arguments, but rather tries to take them as the argument as a whole. The solution would appear to be mapply(), but I can't figure out if there's a way to use mapply() over rows, instead of columns.
So, I guess my questions are threefold. One, is there a way to use mapply() over rows? Two, is there a way to make apply() iterate over arguments? And three, is there a better option out there? I've seen and heard a lot about the plyr package, but I didn't want to jump to that before I fully investigated the options present in Base R.

You could rewrite 'converter' so that it takes vectors of gender and age plus a row index, which you then use to do lookups and assignments to converted_scores using a conversion array and a data array that is just the numeric score columns. There is an additional problem with using apply: it coerces its X argument to a matrix, so if the character gender column is included, every value becomes "character". It wasn't clear whether your code normdf[ rawscores+1, convertvec] was supposed to be an array extraction or a function call.
Untested in absence of working example (with normdf, mydata):
converted_scores <- matrix(NA, nrow=NROW(rawscores), ncol=12)
converter <- function(idx, gender, age) {
  gidx <- match(gender, c("Male", "Female"))
  aidx <- findInterval(age, c(8, 12, 15))
  ag.idx <- gidx + 2 * (aidx - 1)
  # the factor multiplying aidx needs to equal the number of gender categories
  cvt <- cvt.arr[ag.idx, ]
  converted_scores[idx] <- normdf[rawscores+1, convertvec]
  return(converted_scores)
}
cvt.arr <- matrix(1:48, nrow=4, byrow=TRUE)[c(1, 3, 2, 4), ] # reorder rows so the genders alternate
cvt.scores <- mapply(converter, 1:NROW(mydata), mydata$gender, mydata$age)
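Since the code above is untested, here is a small self-contained sketch of the same idea: mapply() iterates over a row index together with gender and age, and the function looks the row up itself. Everything in it is an assumption made for illustration: three subscales instead of twelve, raw scores from 0 to 2, a made-up conversion_table, and the column order Male 8-11, Male 12-14, Female 8-11, Female 12-14 from the question.

```r
# Hypothetical lookup table: rows = raw score + 1, columns = 4 groups x 3 subscales
conversion_table <- matrix(1:36, nrow = 3)

# Hypothetical survey data
mydata <- data.frame(s1 = c(0, 2), s2 = c(1, 1), s3 = c(2, 0),
                     gender = c("Male", "Female"), age = c(9, 13),
                     stringsAsFactors = FALSE)
score_cols <- c("s1", "s2", "s3")

convert_row <- function(i, gender, age) {
  gidx <- match(gender, c("Male", "Female"))  # 1 or 2
  aidx <- findInterval(age, c(8, 12, 15))     # 1 for ages 8-11, 2 for 12-14
  # Number of table columns to skip before this group's block starts
  first_col <- (2 * (gidx - 1) + (aidx - 1)) * length(score_cols)
  raw <- unlist(mydata[i, score_cols])
  # A two-column index matrix picks one (row, column) cell per subscale
  conversion_table[cbind(raw + 1, first_col + seq_along(score_cols))]
}

# mapply() walks the three vectors in parallel, one call per data row;
# the results arrive as columns, so transpose to get one row per record
converted <- t(mapply(convert_row, seq_len(nrow(mydata)),
                      mydata$gender, mydata$age))
converted
```

This answers the first question in spirit: mapply() does not iterate over rows of a data frame, but passing the row numbers as one of its arguments has the same effect.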

I'd advise against applying this stuff by row, but would rather apply this by column. The reason is that there are only 12 columns, but there might be many rows.
The following piece of code works for me. There might be better ways, but it might be interesting for you nevertheless.
offset <- with(mydata, 24*(gender == "Female") + 12*(age >= 12))
idxs <- expand.grid(row = 1:nrow(mydata), col = 1:12)
idxs$off <- idxs$col + offset
idxs$val <- as.numeric(mydata[as.matrix(idxs[c("row", "col")])]) + 1
idxs$norm <- normdf[as.matrix(idxs[c("val", "off")])]
converted <- mydata
converted[,1:12] <- matrix(idxs$norm, ncol=12)
The tricky part here is the idxs data frame, which combines all the rest. It has the following columns:
row and col: position in the original data
off: column in normdf, based on gender and age
val: row in normdf, based on original value + 1
norm: corresponding normalized value
I'll post this first thought here and see whether I can come up with a better answer, either based on joran's comment or using a three- or four-dimensional array for normdf. Not sure yet.
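The core trick, indexing a matrix with a two-column matrix of (row, column) pairs, can be shown on a toy case. The sketch below assumes three subscales, raw scores from 0 to 2, and a made-up normdf; none of these values come from the question.

```r
# Hypothetical lookup table: rows = raw score + 1, columns = 4 groups x 3 subscales
normdf <- matrix(1:36, nrow = 3)

# Hypothetical raw scores, one row per person
scores <- rbind(c(0, 1, 2),
                c(2, 1, 0))
gender <- c("Male", "Female")
age    <- c(9, 13)

n_sub  <- ncol(scores)
# Column offset of each person's group block in normdf
offset <- 2 * n_sub * (gender == "Female") + n_sub * (age >= 12)

# One (row, column) pair per score; col(scores) holds each cell's column number,
# and offset is recycled down the rows, so cell [i, j] maps to column j + offset[i]
idx <- cbind(as.vector(scores) + 1,
             as.vector(col(scores) + offset))

# Look up every cell at once and restore the original shape
converted <- matrix(normdf[idx], nrow = nrow(scores))
converted
```

No loop or apply call is needed: the whole table is converted with a single subsetting operation, which is why this approach scales well with the number of rows.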
