This line of code rep(c(0), 2)
creates
x
1 0
2 0
I would like to somehow extend this, in a way appropriate for R, so that I end up with something like the vector below. Basically, I'd like to append integers pairwise as such:
x
1 0
2 0
3 1
4 1
5 2
6 2
I feel like this is an example of R's dislike of loops, which has the unintended side effect of making simple things seem unfamiliar and makes me long for tools in other languages, like perhaps numpy in Python.
What is the best way I can iterate and increment in this manner?
Copying Nishanth's answer, this does exactly what you want.
rep(0:2, each=2)
data.frame(x = unlist(lapply(0:2, rep, 2)))
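To illustrate how this generalizes, here is a minimal sketch. The names n, x_general and x_arith are made up for illustration and are not from the question. The each argument repeats every element before moving to the next one, and integer division gives an equivalent result:

```r
# rep() with each = 2 repeats every element twice before moving on
x <- rep(0:2, each = 2)
x
# [1] 0 0 1 1 2 2

# Illustrative generalization: n pairs, starting at 0
n <- 5
x_general <- rep(seq_len(n) - 1, each = 2)

# Equivalent arithmetic route: integer-divide 0..(2n - 1) by 2
x_arith <- (seq_len(2 * n) - 1) %/% 2

# Wrap in a data frame to match the layout shown in the question
df <- data.frame(x = x)
```

Note that rep(0:2, times = 2) would instead give 0 1 2 0 1 2, so each is the argument that matters here.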
I'm starting to learn R and I'm having a hard time making changes to the names of values in a factor. I've tried using revalue and recode but am still seeing the original names when I look at the dataframe.
Here's what the DF looks like:
head(freecut)
gender oldness student_loaniness homeland
1 0 20 4 Eurasia
2 1 25 4 Oceana
3 1 56 2 Eastasia
4 0 65 6 Eastasia
5 1 50 5 Oceana
6 0 20 5 Eastasia
And here are the coding attempts:
revalue(freecut$homeland, c("Eastasia" = "East_Asia", "Eurasia" = "Asiope",
"Oceana" = "Nemoville"))
recode(freecut$homeland, Eastasia = "East_Asia", Eurasia = "Asiope",
Oceana = "Nemoville")
After running the code the DF looks exactly the same. I know that in Python I would have to throw in "inplace=True" to make changes stick--not sure what I need to do here (or what I'm missing).
R generally doesn't modify objects in place; you have to assign results, either back to the original variable to modify it, or to a new variable. This follows the functional programming style, which R largely adopts.
If you have x = 1, running x + 1 will evaluate and print the result, 2, but x is not changed. If you want to overwrite x with the modified value, you run x = x + 1.
In just the same way, running recode will evaluate and print a result, but if you want to modify the column in your data frame, you need to explicitly assign it with freecut$homeland = recode(...).
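A minimal sketch of the difference, using only base R so it runs without plyr or dplyr. The four-row freecut data frame and the new_names lookup vector are stand-ins made up for illustration:

```r
# Hypothetical miniature of the freecut data frame from the question
freecut <- data.frame(
  homeland = factor(c("Eurasia", "Oceana", "Eastasia", "Eastasia"))
)
new_names <- c(Eastasia = "East_Asia", Eurasia = "Asiope", Oceana = "Nemoville")

# This computes a renamed copy and prints it; freecut itself is untouched
factor(unname(new_names[as.character(freecut$homeland)]))
before <- levels(freecut$homeland)  # still the original names

# Assigning the result back is what makes the change stick
freecut$homeland <- factor(unname(new_names[as.character(freecut$homeland)]))
levels(freecut$homeland)
```

The same pattern applies to revalue and recode: the call alone only prints, the assignment stores.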
There are a few exceptions in add-on packages. For example, the data.table package defines the := operator and some set* functions which do modify objects in place. data.table is fantastic, especially if you need efficiency, but if you are just starting with R I would recommend getting familiar with the basics first.
In addition to Gregor's answer which addresses more fundamental issues, you can in your particular case use levels<-:
levels(freecut$homeland) <- c("first", "second", "third")
# order is important if you don't want surprises
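If the ordering worry is a concern, one option is the named-list form of levels<-, where each name is the new level and each value is the old one. This is a sketch on a made-up homeland factor rather than the real freecut column:

```r
# Hypothetical stand-in for freecut$homeland
homeland <- factor(c("Eurasia", "Oceana", "Eastasia", "Eastasia"))

# Default level order is alphabetical, which is easy to get wrong
# when assigning a plain character vector
levels(homeland)

# Named-list form: new_name = "old_name", so order no longer matters
levels(homeland) <- list(
  East_Asia = "Eastasia",
  Asiope    = "Eurasia",
  Nemoville = "Oceana"
)
as.character(homeland)
```

With this form the mapping is spelled out per level, and the resulting levels follow the order of the list.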
Or if you are ready to join the dark side, consider macros from the gtools package. The first steps are described e.g. in https://www.r-bloggers.com/macros-in-r/. Hardly anybody uses macros in R, but I don't know why. Maybe they're dangerous, or maybe they just seem obscure.
I have a vector (really a column of a data frame) that looks like this:
data$outcome
[1] Good Good Good Good Poor
Levels: Good Poor
Here is the str on it:
str(data$outcome)
Factor w/ 2 levels "Good","Poor": 1 1 1 1 2
I don't want 1's and 2's as in as.numeric(data$outcome)
[1] 1 1 1 1 2
I know you are not supposed to dummy-code the variables "manually" for regression, and I know about {psych} dummy.code(), which returns a matrix. I understand that I could use something like model.matrix() on the data.frame:
data$outcome <- model.matrix(lm(s100b ~ outcome, data))[,2]
Not nice...
Isn't there something like dummify(data$outcomes) somewhere in R? Please refrain from easy jokes...
I slightly prefer
data$isGood <- as.numeric(data$outcome == 'Good')
because it is a bit more explicit / less opaque, and would still work even if someone added a new level 'Awesome' to the factor.
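As a small sketch of both routes on a made-up outcome vector (the names outcome, is_good and is_poor are illustrative), the comparison gives the dummy directly, and model.matrix can be called on a formula without fitting lm first:

```r
# Hypothetical stand-in for data$outcome
outcome <- factor(c("Good", "Good", "Good", "Good", "Poor"))

# Explicit comparison: 1 where the level is "Good", 0 otherwise
is_good <- as.numeric(outcome == "Good")

# model.matrix on a one-sided formula; the column is named after the
# non-reference level, here "outcomePoor"
mm <- model.matrix(~ outcome)
is_poor <- unname(mm[, "outcomePoor"])
```

The comparison version names the level it codes for, whereas the model.matrix column depends on which level happens to be the reference.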
this is probably a simple one, but I somehow got stuck...
I need many loops to get the result of every sample in my support, like the usual nested loops:
for (a in 1:N1){
  for (b in 1:N2){
    for (c in 1:N3){
      ...
    }
  }
}
but the number of the for loops needed in this messy system depends on another random variable, let's say,
for(f in 1:N.for)
so how can I write a for loop to deal with this? Or are there more elegant ways to do this?
note that the difference is that the nested for loops above (the variables a,b,c,...) do matter in my calculations, but the variable f of the for loop that controls for the number of for loops needed does not go into any of my calculations for my real purpose - all it does is count/ensure the number of for loops needed is correct.
Did I make it clear?
So what I am actually trying to do is generate all the possible combinations of a number of people's preferences towards others.
Let's say I have 6 people (the simplest case for my purpose): Abi, Bob, Cath, Dan, Eva, Fay.
Abi and Bob have preference lists of C D E F ( 4!=24 possible permutations for each of them);
Cath and Dan have preference lists of A B and E F, respectively (2! * 2! = 4 possible permutations for each of them);
Eva and Fay have preference lists of A B C D (4!=24 possible permutations for each of them);
So all together there should be 24*24*4*4*24*24 possible permutations of preferences when taking all six of them together.
I am just wondering what is a clear, easy and systematic way to generate them all at once?
I'd want them in the format such as
c.prefs <- as.matrix(data.frame(Abi = c("Eva", "Fay", "Dan", "Cath"), Bob = c("Dan", "Eva", "Fay", "Cath")))
but any clear format is fine...
Thank you so much!!
I'll assume you have a list of each loop variable and its maximum value, ordered from the outermost to innermost variable.
loops <- list(a=2, b=3, c=2)
You could create a data frame with all the loop variable values in the correct order with:
(indices <- rev(do.call(expand.grid, lapply(rev(loops), seq_len))))
# a b c
# 1 1 1 1
# 2 1 1 2
# 3 1 2 1
# 4 1 2 2
# 5 1 3 1
# 6 1 3 2
# 7 2 1 1
# 8 2 1 2
# 9 2 2 1
# 10 2 2 2
# 11 2 3 1
# 12 2 3 2
If the code run at the innermost point of the nested loop doesn't depend on the previous iterations, you could use something like apply to process each iteration independently. Otherwise you could loop through the rows of the data frame with a single loop:
for (i in seq_len(nrow(indices))) {
# You can get "a" with indices$a[i], "b" with indices$b[i], etc.
}
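For the independent case, a minimal sketch with apply might look like the following. The sum() is only a placeholder for whatever the innermost loop body would compute:

```r
loops <- list(a = 2, b = 3, c = 2)
indices <- rev(do.call(expand.grid, lapply(rev(loops), seq_len)))

# One call per row; `row` is a named vector, so row["a"], row["b"], ...
# play the role of the loop variables. sum() is a placeholder calculation.
results <- apply(indices, 1, function(row) sum(row))

# The number of rows equals the product of the loop bounds
n_iterations <- nrow(indices)
```

Because loops is just a list, the number of "nested loops" can change at run time without rewriting any code.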
For the way of doing the calculation, an option is to use the Reduce function or some other higher-order function.
Since your data is not inherently ordered (an individual is part of a set, and its preferences are part of the set), I would keep individuals in a factor and have e.g. preferences in lists named after the individuals. If you have large data you can store it in an environment.
The first part of the code just makes the example reproducible; the naming follows graph terminology, since the problem domain is akin to a graph. You just need to change the first line and the runif call to change the behavior.
#people
verts <- factor(LETTERS[1:10])
#relations, disallow preferring yourself
edges <- lapply(seq_along(verts), function(ind) {
  levels(verts)[-ind]
})
names(edges) <- levels(verts)
#directions
#say you have these stored in a list or something
pool <- levels(verts)
directions <- lapply(pool, function(vert) {
  relations <- pool[unique(round(runif(5, 1, 10)))]
  #drop the person themselves from their own preference list
  relations[relations != vert]
})
names(directions) <- pool
num_prefs <- lapply(directions, length)
names(num_prefs) <- names(directions)
#First take the factorial of each person's preferences,
#then reduce that with multiplication
combinations <-
  Reduce(`*`,
         sapply(num_prefs, factorial)
  )
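For the six-person example in the question, a rough sketch of the counting and of enumerating one person's orderings could look like this. The perms helper is hypothetical (base R has no built-in permutation generator), and the per-person counts are taken as stated in the question:

```r
# Hypothetical helper: all orderings of a vector, returned as a list
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# Abi's possible preference orderings over Cath, Dan, Eva, Fay
abi_prefs <- perms(c("Cath", "Dan", "Eva", "Fay"))
length(abi_prefs)

# Counts per person as given in the question, multiplied together
counts <- c(Abi = 24, Bob = 24, Cath = 4, Dan = 4, Eva = 24, Fay = 24)
total <- prod(counts)
```

With over five million joint combinations, it is probably better to index into the per-person lists on demand than to materialize every combination at once.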
I hope this answers your question!
I am using R to analyze a survey. Several of the columns include numbers 1-10, depending on how survey respondents answered the respective questions. I'd like to change the 1-10 scale to a 1-3 scale. Is there a simple way to do this? I was writing a complicated set of for loops and if statements, but I feel like there must be a better way in R.
I'd like to change numbers 1-3 to 1; numbers 4 and 8 to 2; numbers 5-7 to 3, and numbers 9 and 10 to NA.
So in the snippet below, OriginalColumn would become NewColumn.
OriginalColumn=c(4,9,1,10,8,3,2,7,5,6)
NewColumn=c(2,NA,1,NA,2,1,1,3,3,3)
Is there an easy way to do this without a bunch of crazy for loops? Thanks!
You can do this using positional indexing:
> c(1,1,1,2,3,3,3,2,NA,NA)[OriginalColumn]
[1] 2 NA 1 NA 2 1 1 3 3 3
It is better than repeated/nested ifelse because it is a single vectorized lookup (thus easier to read, write, and understand; and probably faster). In essence, you're creating a new vector that contains the new value for every value you want to replace. So, for values 1:3 you want 1, thus the first three elements of the vector are 1, and so forth. You then use your original vector to extract the new values based on the positions of the original values.
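Since the question mentions several columns, here is a sketch of applying the same lookup to every column of a data frame. The survey data frame and its column names q1 and q2 are made up for illustration:

```r
# Position i holds the new value for original value i (1 through 10)
lookup <- c(1, 1, 1, 2, 3, 3, 3, 2, NA, NA)

OriginalColumn <- c(4, 9, 1, 10, 8, 3, 2, 7, 5, 6)
NewColumn <- lookup[OriginalColumn]

# Hypothetical survey with two 1-10 columns
survey <- data.frame(q1 = c(4, 9, 1, 10, 8), q2 = c(3, 2, 7, 5, 6))

# survey[] keeps the data frame structure while replacing each column
survey[] <- lapply(survey, function(col) lookup[col])
```

This assumes every value is a whole number from 1 to 10; anything outside that range would need handling separately.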
You could also try
library(car)
recode(OriginalColumn, '1:3=1; c(4,8)=2; 5:7=3; else=NA')
#[1] 2 NA 1 NA 2 1 1 3 3 3
I've been doing some serious PHP/JS coding recently, and I've kind of lost my R muscle. While this problem can easily be tackled in PHP/JS, what is the most efficient way of solving it in R? I have to grade a questionnaire, and I have the following scenario:
raw t
5 0
6 2
7-9 3
10-12 4
15-20 5
If x equals, or is within the range given in raw, the value in the corresponding row of t should be returned. Of course, this can be done with a for loop or switch, but just imagine a very lengthy set of value ranges in raw. How would you tackle this one?
We seem to be missing a part of the example because there is no mention of "x".
dat <- read.table(textConnection("raw t
5 0
6 2
7-9 3
10-12 4
15-20 5"), header=TRUE, stringsAsFactors=FALSE)
dat$bot <- as.numeric( sapply( sapply(dat$raw, strsplit, "-"), "[", 1 ))
get.t <- function(x) findInterval(x, dat$bot)
get.t(8)
#[1] 3
> dat$t[get.t(6)]
[1] 2
> dat$t[get.t(5)]
[1] 0
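One caveat with using only the lower bounds is that a score in a gap (13 or 14 here) still maps to the preceding row. A possible variant, sketched below, also parses the upper bound and returns NA for scores outside every range. The names top and grade are illustrative, not part of the answer above:

```r
dat <- data.frame(
  raw = c("5", "6", "7-9", "10-12", "15-20"),
  t   = c(0, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Split "7-9" into c("7", "9"); single values like "5" stay as one piece
parts <- strsplit(dat$raw, "-", fixed = TRUE)
dat$bot <- as.numeric(sapply(parts, `[`, 1))
dat$top <- as.numeric(sapply(parts, function(p) p[length(p)]))

# Vectorized lookup: find the candidate row by lower bound,
# then keep it only if the score does not exceed the upper bound
grade <- function(x) {
  i <- findInterval(x, dat$bot)
  ok <- i > 0
  ok[ok] <- x[ok] <= dat$top[i[ok]]
  out <- rep(NA_real_, length(x))
  out[ok] <- dat$t[i[ok]]
  out
}

scores <- grade(c(5, 8, 11, 13, 20, 4))
```

Here 13 falls between the 10-12 and 15-20 ranges and 4 is below the first range, so both come back as NA.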
I would simply use an indexing scheme kind of like what Corbin alluded to, but since he didn't provide an example, here's a simple one:
m <- cbind(c(5:12,15:20),
rep(c(0,2,3,4,5),times = c(1,1,3,3,6)))
m[m[,1] == 11,2]
[1] 4
Note: very similar to Simone's answer as I started typing this a bit back. Has a note at the end though. The indexing approach I give is essentially Simone's answer.
There will have to be a loop involved somewhere.
The pseudo code of what I would do is something like:
score = blah
for each raw => t
    break raw into rMin -> rMax
    if(rMin <= score and rMax >= score)
        return t
It avoids having to loop over each number between rMin and rMax (which is what I'm assuming you meant), but without some kind of indexing, that is the best you're going to get.
Note: if you have a ton of calls to this, and indexing would actually be worth your while, the easiest type of indexing would just be a hash map of score -> t entries.
Basically you would parse your example data into something like:
index[5] = 0
index[6] = 2
index[7] = 3
index[8] = 3
index[9] = 3
You would need to carefully weigh if building the index would be more time consuming than just looping over the ranges.
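In R, one way to sketch such an index is a named vector keyed by the score. The ranges data frame, the index vector and the lookup function below are illustrative names, with the bounds typed in by hand from the example table:

```r
# Ranges from the example, already split into lower and upper bounds
ranges <- data.frame(
  lo = c(5, 6, 7, 10, 15),
  hi = c(5, 6, 9, 12, 20),
  t  = c(0, 2, 3, 4, 5)
)

# Expand every range into one named entry per score: "7" -> 3, "8" -> 3, ...
index <- unlist(Map(
  function(lo, hi, t) setNames(rep(t, hi - lo + 1), lo:hi),
  ranges$lo, ranges$hi, ranges$t
))

# Look scores up by name; scores not covered by any range give NA
lookup <- function(score) unname(index[as.character(score)])

looked_up <- lookup(c(5, 8, 11, 13))
```

Building the index costs one entry per covered score, so it mainly pays off when the same table is queried many times.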
Note: the indexing approach is actually what Simone said.