Use R to translate 'coded' table [closed] - r

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I have a problem using R to 'translate' a coded table back. I have a table whose elements are XX, XY or YY, and a second table (.csv) that gives the proper meaning of X and Y. So, if X=1 and Y=2, it might look like this:
XY is transformed into 12
XX is transformed into 11 ...
Can anybody suggest a good starting point for writing such a program or piece of code in R?

This is slightly different than a lookup table in that you're actually regexing and replacing parts of each element. The qdap (Quantitative Discourse Analysis Package) has a mgsub (multiple gsub) function that can handle this easily.
library(qdap)
#recreate scenario with quick character vector (no need for quotes)
z <- factor(qcv(XX,XY,YY))
#replace all X and Ys with 1 and 2
mgsub(pattern = c("X", "Y"), replacement = c(1, 2), text.var = z)
#Even better if you have the code book read in, say it looks like this:
code.book <- data.frame(symb = c("X", "Y"), replacement = c(1, 2))
# > code.book
# symb replacement
# 1 X 1
# 2 Y 2
mgsub(code.book$symb, code.book$replacement, z)
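If you would rather not load a package, here is a minimal base R sketch of the same idea. It assumes every code and every replacement is exactly one character long (as in the X=1, Y=2 example), which is what chartr() requires; the object name decoded is just for illustration.

```r
# Code book as in the answer above; replacements are stored as strings here
code.book <- data.frame(symb = c("X", "Y"), replacement = c("1", "2"),
                        stringsAsFactors = FALSE)
z <- c("XX", "XY", "YY")

# chartr() maps each character of `old` to the character at the same
# position in `new`, so both strings must have the same length
decoded <- chartr(old = paste(code.book$symb, collapse = ""),
                  new = paste(code.book$replacement, collapse = ""),
                  x = z)
decoded
# [1] "11" "12" "22"
```

For codes or replacements longer than one character this will not work, and something like mgsub (or a loop over gsub with fixed = TRUE) is the safer route.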

Related

How to save a calculation to a variable using variables that have not been defined yet? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed last month.
Let's say I have a script, where I have a calculation like this:
calculation = c(
a*b +
c*d +
e*f
)
And then in another script I want to call that calculation using the source command.
I get an error saying "Object 'a' not found". What am I doing wrong?
Edit: I don't want to make a function, because this specific calculation is used as input in a complex program in r (apollo) - the input specifies a utility function in a logit regression.
You can capture your expression using expression(), then when you're ready, evaluate using eval():
calculation <- expression(a*b + d*e + f*g)
a <- 1
b <- 2
d <- 3
e <- 4
f <- 5
g <- 6
eval(calculation)
# 44
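A closely related sketch, in case you prefer not to create the variables in the global environment: quote() captures the calculation unevaluated, and eval() can look the values up in a list. The list name inputs and its values are only assumptions for illustration.

```r
# Capture the calculation without evaluating it
calculation <- quote(a*b + d*e + f*g)

# Supply the values later, here as a named list
inputs <- list(a = 1, b = 2, d = 3, e = 4, f = 5, g = 6)

# Evaluate the captured call, looking the names up in `inputs`
result <- eval(calculation, envir = inputs)
result
# [1] 44
```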

Finding shift/phase between two datasets [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
I have a large dataset D of (x,y) coordinates in 2D. I have almost the same dataset D' (with a very few elements missing), but with a constant shift s. That is, the elements of D' are (x+s, y). How do I compute the shift efficiently? Thanks. R code would be terrific.
If the values of y are equal in D and D' you can perform a join on y and a rolling join on x with data.table.
library(data.table)
set.seed(1)
D <- data.frame(x = runif(100,1,100), y = runif(100,1,100))
Dprime <- D[sample(1:100,90),]
Dprime$x <- Dprime$x + runif(length(Dprime$x),2.8,3.2)
setDT(Dprime)
setDT(D)
D[,x.original := x]
Dprime[,x.shift := x]
Dprime[D,on=c("y","x"),roll = "nearest"][,.(Shift = x.shift - x.original)][,median(Shift,na.rm=TRUE)]
#[1] 2.997595
This addresses the issue of potential duplicate values of y. Those values which are missing in D' simply get NA and are eliminated by median(Shift,na.rm=TRUE).
For more options on roll = that may be better suited to your unique problem, see the roll section of help(data.table).
Every x value in D' provides an estimate of s, namely s = x' - x, so your best estimate of s is probably the average of those.
# assumes D and Dprime (D') have matching rows in the same order
s_est <- mean(Dprime$x - D$x, na.rm = TRUE)
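When some rows are missing from D', the two data frames no longer line up row by row, so the points have to be matched first. Here is a minimal base R sketch, assuming the y values are unique and identical in both datasets (the toy data and the name merged are made up for illustration):

```r
# Toy data: D' is D with one row dropped and x shifted by s = 3
D <- data.frame(x = c(1, 4, 9, 16, 25), y = c(2.5, 3.1, 7.7, 8.2, 9.9))
Dprime <- D[-3, ]
Dprime$x <- Dprime$x + 3

# Match points on y; rows missing from Dprime simply drop out of the join
merged <- merge(D, Dprime, by = "y", suffixes = c(".orig", ".shift"))

# Each matched pair gives one estimate of s; the median is robust to outliers
s_est <- median(merged$x.shift - merged$x.orig)
s_est
# [1] 3
```

If y values can repeat, an exact merge creates spurious pairs, and the rolling join shown above is the better fit.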

Bidirectional assignment operator in R [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 4 years ago.
Is there an R function that assigns variables bidirectionally? For example, let <-> represent a bidirectional assignment operator.
a <-> b
a
> b
b
> a
One can define something like:
`%<->%` <- function(x,y){
t <- y
assign(deparse(substitute(y)), x, envir=parent.frame())
assign(deparse(substitute(x)), t, envir=parent.frame())
}
a <- 1
b <- 2
a %<->% b
a
[1] 2
b
[1] 1
Such an operator would not make sense given the way R works:
From Hadley Wickham's book Advanced R, section "Binding basics":
Consider this code:
x <- c(1, 2, 3)
[...] this code is doing two things:
It’s creating an object, a vector of values, c(1, 2, 3).
And it’s binding that object to a name, x.
So, for instance, when you run:
a <- 1
you are creating a numerical vector with one element and you are binding it to the name a.
a <-> b
would be binding names to one another, which makes no sense in R.
Also note that when you do:
a <- 1
b <- a
b
# [1] 1
You get 1 as the output, not a, because you create another binding (b) to the numerical vector with the value 1. And when you run b, the output is the object bound to it (1), not another name that this object is bound to.
Note: Hadley explains all this very clearly with diagrams in his book.
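A small sketch of what "names bind to objects, not to each other" means in practice: after b <- a, changing b leaves a untouched, because b ends up bound to a modified copy rather than to the name a.

```r
a <- c(1, 2, 3)
b <- a        # b is a second name bound to the same values
b[1] <- 10    # modifying through b rebinds b to a changed copy

a
# [1] 1 2 3
b
# [1] 10  2  3
```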

Reduce number of elements returned by lapply [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
As ?lapply states:
lapply returns a list of the same length as X, each element of which
is the result of applying FUN to the corresponding element of X.
Is it still possible to return a list with a smaller length than X?
Code
l <- lapply(1:10,function(u)ifelse(u<5,return(u),return(NULL)))
Can I place something in the return(NULL) part in order to drop/omit the element completely?
Desired Output
Output of the code section should be the same as:
l[!sapply(l,is.null)]
a list of 4 containing only the elements smaller than 5!
Is it still possible to return a list with a smaller length than X?
Per the documentation quoted by the OP, the answer is "no, not unless you wrap lapply in another call that filters out the unwanted elements either before or after it."
There are many possible workarounds, but I might do ...
# example function
f = function(z) c(a = list(z+1), b = list(z-1), c = if (z > 3) list(z^2))
library(data.table)
data.table(x = 1:10)[x < 5, rbindlist(lapply(x, f), fill=TRUE)]
a b c
1: 2 0 NA
2: 3 1 NA
3: 4 2 NA
4: 5 3 16
... assuming the function returns a named list. If it just returns a scalar, try vectorizing or using sapply or vapply instead of lapply.
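For completeness, here is a base R sketch of the two wrapping options mentioned above (filter before, or filter after), using Filter(). The names keep, l_before, l_all and l_after are only for illustration.

```r
# Option 1: filter the input first, then apply the function
keep <- Filter(function(u) u < 5, 1:10)
l_before <- lapply(keep, function(u) u)

# Option 2: apply to everything, return NULL for unwanted elements,
# then drop the NULLs afterwards
l_all <- lapply(1:10, function(u) if (u < 5) u else NULL)
l_after <- Filter(Negate(is.null), l_all)

length(l_after)
# [1] 4
```

Both give the same list of 4; filtering first avoids calling the function on elements you are going to throw away anyway.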

how to do looping in R [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 9 years ago.
Hey guys, I want to do looping in R. Can anybody help me out?
For example, I have a sum of points per employee, and I want the number of empids that fall in 0-10% of the sum of points, and so on. How do I do this in R?
For eg I have data as
empid sumofpoints
1 10
2 30
I want data as
percentageofsumpoints countofempid
0-10 4
11-20 5
21-30 6
and so on....
How do I do this in R? Do I have to install any package for it?
No need to install a package. See http://nunn.rc.fas.harvard.edu/groups/pica/wiki/1f131/
Simple for loop
for (i in 1:10){
print(i)
}
In your example, assuming your data is stored in a data frame called df:
res <- NULL
groups <- c(0,10,20,30,40,...)  # "..." stands for the remaining break points
for (i in 2:length(groups)){
# label each bin as "lower-upper" and count rows with lower < sumofpoints <= upper
res <- rbind(res,c(paste(groups[i-1],groups[i],sep="-"),nrow(df[df$sumofpoints <= groups[i] & df$sumofpoints > groups[i-1],])))
}
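The same binning can also be sketched without an explicit loop using cut() and table() from base R. The toy data and the break points below are assumptions for illustration; like the loop above, this bins the raw sumofpoints values with intervals that are open on the left and closed on the right.

```r
# Toy data standing in for the real df
df <- data.frame(empid = 1:6, sumofpoints = c(5, 10, 15, 25, 30, 12))

# Break points: (0,10], (10,20], (20,30]
breaks <- seq(0, 30, by = 10)

# Assign each row to a bin, then count the rows per bin
bins <- cut(df$sumofpoints, breaks = breaks)
counts <- as.data.frame(table(bins))
names(counts) <- c("range", "countofempid")
counts
```

Note that a value of exactly 0 falls outside the first bin unless you pass include.lowest = TRUE to cut().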
You can also use apply functions if you want to avoid for statements. This example I have taken directly from the help files
x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
dimnames(x)[[1]] <- letters[1:8]
apply(x, 2, mean, trim = .2)
EDIT: further to this, on how to avoid loops.
For large datasets, refer to the package foreach. This allows for a sequential loop set-up using %do% or a parallel set-up (faster for large datasets) using %dopar%.
http://cran.r-project.org/web/packages/foreach/vignettes/foreach.pdf
For parallel computing, be mindful that you will need a backend such as "doParallel" or "doSNOW". There is also "doMC", which only works with operating systems that support the fork system call (which means that Windows isn't supported).
