The background to this is that I'm mostly a Python programmer who has some passing familiarity with R. I've been tasked to look at an R script that was written by a Perl programmer who used for and while loops a lot, to see if I can make it more R-like and get it to run faster.
For example purposes, I have the following list:
> lnums <- list(1:5, 6:7, 8:12)
For the elements that have a length less than 5 (lnums[[2]]), I want to change the length to be 5. The original code uses a for loop to tack NA values to the end of any shorter vectors, and I know that there's got to be a better way than that. I was playing around with ways to get to it and came up with
> sapply(lnums, FUN=function(x) length(x) < 5)
which gets the right element, but I'm unable to figure out how to incorporate this into the subscript of a length(lnums[]) <- 5 statement. I know this is probably a really novice question, but I'd appreciate any help I can get.
Additionally, the reason that I want to increase the length of the shorter list elements is so that I can put the list into a data frame. It would be great if there was a way to do that without messing around with lengths, although I still wouldn't mind an answer to my first question to satisfy my curiosity if nothing else.
Thanks all. I've been digging through some topics in here and you've already helped me out quite a bit!
Here's one way:
lapply(lnums, 'length<-', 5)
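To connect this with the data frame goal from the question, here is a minimal sketch. It assumes every element should be padded to the length of the longest one; the names n, padded and df are only illustrative.

```r
lnums <- list(1:5, 6:7, 8:12)

# Pad to the length of the longest element instead of hard-coding 5
n <- max(lengths(lnums))

# `length<-` is the replacement function behind length(x) <- n;
# growing a vector this way fills the new slots with NA
padded <- lapply(lnums, `length<-`, n)

# Bind the equal-length vectors as columns, then convert to a data frame
df <- as.data.frame(do.call(cbind, padded))
```

Here padded[[2]] becomes 6 7 NA NA NA, and df has five rows and three columns (V1, V2, V3).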
I started with R a few weeks ago at uni. We were given a problem to solve, but I find that there are two answers that fit the question:
Verify that you created lo_heval correctly (incl. missing values). Store your verification in the object proof2.
So I find this is correct:
proof2 <- soep[1:100, c("heval", "lo_heval")]
But I think that this answer is also correct:
proof2 <- table(soep$heval, soep$lo_heval, useNA = "always")
Instead of having to decide on one answer, how do I combine them both into the object? I tried to use &, but I get an error. I may be using it wrong.
Prof. if you're seeing this, please don't fail me. I just can't decide between them.
Thanks in advance!
R lists can hold any arbitrary objects in them, so you could use
proof2 <- list(
soep[1:100, c("heval", "lo_heval")],
table(soep$heval, soep$lo_heval, useNA = "always")
)
However, to my mind, 100 rows of two columns isn't proof; it's an exercise to look through those rows and verify that things are right. (And what about the rows past 100? It's a decent spot check, but if there are more rows in the data, it is strong evidence rather than proof.) The table approach, on the other hand, seems succinct and effective.
Sorry for the noob question but I can't seem to get this to work!
X=cbind(rep(1,m), h2(x), h3(x)) #obs
So I have a 17*3 matrix X, and I have to create a matrix(list(),17,3) version of it. I did it manually below so you can see the desired result, but there must be an easier way to do this?
Z=matrix(list(X[1,1],X[2,1],X[3,1],X[4,1],X[5,1],X[6,1],X[7,1],X[8,1],X[9,1],X[10,1],X[11,1],X[12,1],X[13,1],X[14,1],X[15,1],X[16,1],X[17,1],X[1,2],X[2,2],X[3,2],X[4,2],X[5,2],X[6,2],X[7,2],X[8,2],X[9,2],X[10,2],X[11,2],X[12,2],X[13,2],X[14,2],X[15,2],X[16,2],X[17,2],X[1,3],X[2,3],X[3,3],X[4,3],X[5,3],X[6,3],X[7,3],X[8,3],X[9,3],X[10,3],X[11,3],X[12,3],X[13,3],X[14,3],X[15,3],X[16,3],X[17,3]),17,3)
I tried this (amongst others)
Z2=list(X[1:17,1],X[1:17,2],X[1:17,3])
Z3=matrix(Z2[1:3],17,3)
But it doesn't give the correct results! It just repeats the three column vectors over and over.
Can someone please explain how to do this correctly?
Apparently you want Z <- matrix(as.list(X), ncol = 3). However, I don't see how this structure could be useful.
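A small sketch of why this works while the Z2/Z3 attempt does not. Since h2, h3, x and m are not shown in the question, X here is a made-up 17 x 3 stand-in.

```r
# Stand-in for the question's X (the real one is built from h2(x) and h3(x))
X <- matrix(1:51, nrow = 17, ncol = 3)

# as.list() on a matrix yields one list element per cell, in column-major
# order; matrix() then reshapes that 51-element list into 17 x 3
Z <- matrix(as.list(X), ncol = 3)

# By contrast, a list of three column vectors has only 3 elements, so
# matrix() recycles those three vectors to fill all 51 cells
Z2 <- list(X[, 1], X[, 2], X[, 3])
Z3 <- matrix(Z2, 17, 3)
```

Each cell of Z holds a single value (for example, Z[[2, 3]] equals X[2, 3]), whereas each cell of Z3 holds a whole 17-element column.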
I'm optimizing a more complex code, but got stuck with this problem.
a<-array(sample(c(1:10),100,replace=TRUE),c(10,10))
m<-array(sample(c(1:10),100,replace=TRUE),c(10,10))
f<-array(sample(c(1:10),100,replace=TRUE),c(10,10))
g<-array(NA,c(10,10))
I need to use the values in a & m to index f and assign the value from f to g
i.e. g[1,1]<-f[a[1,1],m[1,1]] except for all the indexes, and as optimally/fast as possible
I could obviously make a for loop to do this for me but that seems rather dumb and slow. It seems like I should be able to use something in the apply family, but I've had no luck with figuring out how to do that. I do need to keep the data structured as it is here so that I can use matrix operations in different parts of my code. I've been searching for an answer to this but haven't found anything particularly helpful yet.
g[] <- f[cbind(c(a), c(m))]
This takes advantage of two facts: a matrix can be addressed as a vector, and a two-column matrix can be used as an index, where each row gives one (row, column) pair.
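As a sanity check, the sketch below compares the one-line version with the explicit double loop it replaces. The set.seed() call and the name g_loop are additions for illustration only.

```r
set.seed(1)  # only so the example is reproducible
a <- array(sample(1:10, 100, replace = TRUE), c(10, 10))
m <- array(sample(1:10, 100, replace = TRUE), c(10, 10))
f <- array(sample(1:10, 100, replace = TRUE), c(10, 10))

# Vectorised: c(a) and c(m) flatten column by column, cbind() pairs them
# up as (row, column) indices into f, and g[] <- keeps g's 10 x 10 shape
g <- array(NA, c(10, 10))
g[] <- f[cbind(c(a), c(m))]

# Reference: the explicit loop from the question's description
g_loop <- array(NA_integer_, c(10, 10))
for (i in 1:10) {
  for (j in 1:10) {
    g_loop[i, j] <- f[a[i, j], m[i, j]]
  }
}
```

Both approaches should fill g with the same values; the vectorised form avoids the per-element R-level loop.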
New to R, taking a very accelerated class with very minimal instruction. So I apologize in advance if this is a rookie question.
The assignment I have is to take a specific column that has 21 levels from a dataframe, and condense them into 4 levels, using an if, or ifelse statement. I've tried what feels like hundreds of combinations, but this is the code that seemed most promising:
> b2$LANDFORM=ifelse(b2$LANDFORM=="af","af_type",
ifelse(b2$LANDFORM=="aflb","af_type",
ifelse(b2$LANDFORM=="afub","af_type",
ifelse(b2$LANDFORD=="afwb","af_type",
ifelse(b2$LANDFORM=="afws","af_type",
ifelse(b2$LANDFORM=="bfr","bf_type",
ifelse(b2$LANDFORM=="bfrlb","bf_type",
ifelse(b2$LANDFORM=="bfrwb","bf_type",
ifelse(b2$LANDFORM=="bfrwbws","bf_type",
ifelse(b2$LANDFORM=="bfrws","bf_type",
ifelse(b2$LANDFORM=="lb","lb_type",
ifelse(bs$LANDFORM=="lbaf","lb_type",
ifelse(b2$LANDFORM=="lbub","lb_type",
ifelse(b2$LANDFORM=="lbwb","lb_type","ws_type"))))))))))))))
LANDFORM is a factor, but I tried changing it to a character too, and the code still didn't work.
"ws_type" is the catch all for the remaining variables.
The code runs without errors, but when I check it, all I get is:
> unique(b2$LANDFORM)
[1] NA "af_type"
Am I even on the right path? Any suggestions? Should I bite the bullet and make a new column with substr()? Thanks in advance.
If your new levels are just the first two letters of the old ones followed by _type you can easily achieve what you want through:
#prototype of your column
mycol<-factor(sample(c("aflb","afub","afwb","afws","bfrlb","bfrwb","bfrws","lb","lbwb","lbws","wslb","wsub"), replace=TRUE, size=100))
as.factor(paste(sep="",substr(mycol,1,2),"_type"))
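If only some prefixes should keep their own group and everything else should fall into the "ws_type" catch-all described in the question, the same idea can be written with a single ifelse(). This is a sketch on made-up levels; landform, prefix and recoded are illustrative names. As an aside, the code in the question refers to b2$LANDFORD and bs$LANDFORM, which look like typos for b2$LANDFORM and may well be what produced the NA values.

```r
# Made-up sample of the LANDFORM column
landform <- factor(c("af", "aflb", "bfr", "bfrws", "lb", "lbaf", "ws", "ub"))

# First two letters of each value (as.character() avoids factor surprises)
prefix <- substr(as.character(landform), 1, 2)

# Keep three named groups; anything else goes to the catch-all "ws_type"
recoded <- factor(ifelse(prefix %in% c("af", "bf", "lb"),
                         paste0(prefix, "_type"),
                         "ws_type"))
```

With this input, recoded has exactly four levels: af_type, bf_type, lb_type and ws_type.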
After a great deal of experimenting, I consulted a co-worker, and he was able to simplify a huge amount of this. Basically, I should have made a new column composed of the first two letters of the variables in LANDFORM, and then sample from that new column and replace values in LANDFORM, in order to make the ifelse() statement much shorter. The code is:
> b2$index=as.factor(substring(b2$LANDFORM,1,2))
b2$LANDFORM=ifelse(b2$index=="af","af_type",
ifelse(b2$index=="bf","bf_type",
ifelse(b2$index=="lb","lb_type",
ifelse(b2$index=="wb","wb_type",
ifelse(b2$index=="ws","ws_type","ub_type")))))
b2$LANDFORM=as.factor(b2$LANDFORM)
Thanks to everyone who gave me some guidance!
I have been trying to produce a command in R that allows me to produce a new vector where each row is the sum of 25 rows from a previous vector.
I've tried making a function to do this, but it only lets me produce a result for one data point.
I'll post what I have so far. I realise this is probably a fairly basic question, but it is one I have been struggling with, and any help would be greatly appreciated:
example<-c(1;200)
fun.1<-function(x)
{sum(x[1:25])}
checklist<-sapply(check,FUN=fun.1)
This then supplies me with a vector of length 200 where all values are NA.
Can anybody help at all?
Your example is a bit noisy: c(1;200) is not valid R (you probably want 1:200 there or, if you would like to have a list of lists, something like rep), and there is no check variable (it should have been example).
Here's the code that I think you probably need (as far as I was able to understand the question):
x <- rep(list(1:200), 5)
f <- function(y) {y[1:20]}
sapply(x, f)
Next time, please be more specific, and try out the code you post as an example before submitting a question.
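For the stated goal (each new value is the sum of 25 rows of the old vector), here is a hedged sketch of two possible readings, since the question does not say whether the 25-row windows overlap. It assumes example was meant to be 1:200; block_sums and rolling are illustrative names. The all-NA result in the question is consistent with sapply() passing one element at a time to the function, so x[1:25] is a single value followed by 24 NAs.

```r
example <- 1:200  # assumed intent of c(1;200)

# Reading 1: non-overlapping blocks of 25 (rows 1-25, 26-50, ...).
# Each column of the 25 x 8 matrix is one block.
block_sums <- colSums(matrix(example, nrow = 25))

# Reading 2: rolling window, i.e. the sum of rows i..(i + 24) for each i
rolling <- sapply(seq_len(length(example) - 24),
                  function(i) sum(example[i:(i + 24)]))
```

The first reading gives 8 values and the second gives 176; in both cases the first value is sum(1:25), which is 325. The matrix approach assumes the vector length is a multiple of 25.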