Good day,
1) Is there an R function similar to Excel's MATCH function?
2) I've written my own, shown below (it is lengthy).
Could anybody suggest what could be improved, or another way to do this?
fmatch2 <- function(ss1, ss2) { # ss1 corresponds to the first argument of Excel's MATCH function, ss2 to the second
  fmatch <- function(ii, ss) { # return the location in ss where ii matches
    rr <- NA # returned when there is no exact match and no larger value in ss
    if (length(which(ss == ii)) > 0) {
      rr <- min(which(ss == ii))
    } else if (length(which(ss > ii)) > 0) {
      rr <- min(which(ss > ii)) - 1
    }
    return(rr)
  }
  rr <- list()
  n <- 1
  for (x in ss1) { # apply fmatch to each member of ss1
    nn <- fmatch(x, ss2[1:n])
    rr <- rbind(rr, nn)
    n <- n + 1
  }
  as.vector(unlist(rr[, 1]))
}
The function fmatch2 is used as below.
It mimics Excel's "=MATCH(H1,$I$1:I1,1)". The data frame columns "ch" and "ci" below correspond to column H and column I. The result is the column named cn.
x<-data.frame(cf=c(0,1,2,3,4,5),ch=c(0,0,3,6,6,6),ci=c(0,0,3,7,11,13))
y<-data.frame(cf=c(0,1,2,3,4,5),ch=c(0,0,3,6,6,6),ci=c(0,0,3,7,11,13),cn=fmatch2(x[[2]],x[[3]]))
Of course, I am not entirely sure what you're trying to do, as I'd expect your fmatch2 function to return NA for ch == 6 (because 6 is not present in ci), but I love doing things with dplyr:
library(dplyr)
result <- x %>% # "%>%" means "and then"
  mutate(chInCi = match(ch, ci)) # adds a column named "chInCi" with the position in ci of the first match of the value in ch
result
cf ch ci chInCi
1 0 0 0 1
2 1 0 0 1
3 2 3 3 3
4 3 6 7 NA
5 4 6 11 NA
6 5 6 13 NA
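For the approximate-match behaviour of Excel's MATCH(..., 1) on an ascending lookup vector, base R's findInterval() is a close analogue, while match() covers the exact-match case. The following is only a minimal sketch on the ch and ci columns from the question. It looks up against the whole of ci rather than the growing range used in fmatch2, and for duplicated values findInterval() gives the last position rather than the first. The names exact_pos and approx_pos are made up for illustration.

```r
ch <- c(0, 0, 3, 6, 6, 6)
ci <- c(0, 0, 3, 7, 11, 13)  # must be sorted in ascending order

# Exact match: position of the first equal element, NA if absent
exact_pos <- match(ch, ci)

# Approximate match: number of elements of ci that are <= each value,
# i.e. the position of the largest element not exceeding the lookup value
approx_pos <- findInterval(ch, ci)

print(exact_pos)   # 1 1 3 NA NA NA
print(approx_pos)  # 2 2 3 3 3 3
```

If the first-position behaviour of fmatch2 for duplicates matters, this sketch would need adjusting.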
I am trying to mutate() a 0 or 1 at a specific position in a column. Normally mutate() changes the whole column, but I want to check a condition and then place a value at a specific position, so I tried to use something like an index. Here is an example: I have values and want to compare them pairwise, 10 to 16, 16 to 9, and so on. The criterion is whether both values are in a or both are not in a (result 0), or one is in a and the other is not (result 1). I wrote down an approach, but it seems that mutate() does not allow TaskS[i+1] on the left-hand side.
Thanks for your help!
Index Values TaskS
1     10
2     16     1
3     9      1
4     8      0
a <- c(1:10)
data_time_filter <- mutate(data_time_filter, TaskS = '')
for (i in 1:40){
  current <- data_time_filter$Trial_Id[i] %in% a
  adjacent <- data_time_filter$Trial_Id[i+1] %in% a
  if (current == adjacent){
    data_time_filter <- mutate(data_time_filter, TaskS[i+1] = 0)
  }
  else if (current != adjacent){
    data_time_filter <- mutate(data_time_filter, TaskS[i+1] = 1)
  }
}
I am not sure I understand your question correctly, but I will try to help anyway.
In my approach I used a user-defined function in combination with sapply. I believe that for mutate to work correctly you need a vector output, which you won't get from a loop.
So, here is what I did:
library(dplyr) # needed for %>% and mutate()
# Recreate df
data_time_filter <- data.frame(
  index = 1:4,
  Values = c(10, 16, 9, 8)
)
# Create filter
ff <- c(1:10)
# Add empty TaskS column
data_time_filter <- data_time_filter %>%
  mutate(TaskS = '')
# Define a function that compares each value with the previous one
abc <- function(data, filter){
  l <- length(data)
  sapply(1:l, function(x){
    if(x == 1){
      "" # the first value has no predecessor
    } else {
      current <- data[x-1] %in% filter
      adjacent <- data[x] %in% filter
      if(current == adjacent){
        0
      } else {
        1
      }
    }
  })
}
This approach will let you use mutate:
> data_time_filter
index Values TaskS
1 1 10
2 2 16
3 3 9
4 4 8
> data_time_filter %>%
mutate(TaskS = abc(Values, ff))
index Values TaskS
1 1 10
2 2 16 1
3 3 9 1
4 4 8 0
You could even skip making the placeholder TaskS column and create a new column directly:
> data_time_filter %>%
mutate(TskS_new = abc(Values, ff))
index Values TaskS TskS_new
1 1 10
2 2 16 1
3 3 9 1
4 4 8 0
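The same comparison can also be written without sapply by comparing the membership vector with a shifted copy of itself. This is a minimal base R sketch under the assumption that the data frame has at least one row; in_ff and n are names made up for illustration, and the first row gets NA instead of an empty string so that the column stays numeric.

```r
data_time_filter <- data.frame(index = 1:4, Values = c(10, 16, 9, 8))
ff <- 1:10

# TRUE where the value is in the filter set
in_ff <- data_time_filter$Values %in% ff
n <- length(in_ff)

# Compare each element (from the second on) with its predecessor:
# 0 if both are in ff or both are not, 1 if exactly one of them is
data_time_filter$TaskS <- c(NA, as.integer(in_ff[-1] != in_ff[-n]))

print(data_time_filter)
```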
I have a data frame that looks like:
ID A B
0 8 25
1 16 123
2 4 120
... ...
What I want to do now is to iterate over column 'A', for example, call a function with the value of each cell, and write the result back to the same location.
For example a function like (x^2)-1.
int calculation(int val){
    return val*val-1;
}
...code...
int i = 0;
while(i<A.length){
    A[i] = calculation(A[i]);
    i++;
}
So the result should look like this.
ID A B
0 63 25
1 255 123
2 15 120
... ...
I am new to R, if you know some good basic guidelines or books for scientific plotting, let me know. :-)
Thanks for your help.
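This is a very straightforward task in R.
a <- c(8, 16, 4)
b <- c(25, 123, 120)
df <- data.frame(a, b)
calculation <- function(a) {
  a^2 - 1
}
# Method 1: apply the arithmetic directly to the column
df$a <- (df$a^2) - 1
# Method 2 (an alternative to Method 1, not a second step): call the function on the whole column
df$a <- calculation(df$a)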
Here is a very simple example using your data.frame (say df).
# Define your function as
calculation <- function(x){
  x*x - 1
}
# Suppose you want to call it on column A. Use the sapply function as
df$A <- sapply(df$A, calculation)
Result:
ID A
1 0 63
2 1 255
3 2 15
The beauty of R is its simplicity.
If you would like to apply some operation to column A:
df[["A"]] = ((df[["A"]]*df[["A"]]) - 1)
Also, I would strongly recommend googling "vectorization in R".
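As a small illustration of what vectorization means here, the sketch below applies the same function once to the whole column and once element by element with sapply, and checks that both give the same numbers. The data frame is rebuilt from the question; vectorised and elementwise are names made up for illustration.

```r
df <- data.frame(ID = 0:2, A = c(8, 16, 4), B = c(25, 123, 120))

calculation <- function(x) x^2 - 1

# Vectorised: arithmetic operators work on the whole column at once
vectorised <- calculation(df$A)

# Element by element: same result, but with an explicit iteration
elementwise <- sapply(df$A, calculation)

print(vectorised)                        # 63 255 15
print(all.equal(vectorised, elementwise)) # TRUE

# Write the result back to the same column
df$A <- vectorised
```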
Data set "dat" looks like this:
V1 V2
1 2
2 2
3 5
9 8
9 9
a 2
I want to create a dummy variable V3: 1 if V1 = V2 and 0 otherwise, for values within the range 1-8.
Where a value above 8, or any symbol or letter, is involved, the variable should read NA. In the above example:
V3 = {0,1,0,NA,NA,NA}
This is one of many ways it can be done; there may be more efficient ones:
# Create the original dataset
data <- data.frame(V1 = c(1,2,3,9,9,"a"), V2 = c(2,2,5,8,9,2))
# Check if V1 == V2 and write the result (1 if equal, 0 otherwise) to V3 for ALL observations
data$V3 <- as.integer(data$V1 == data$V2)
# Where V1 or V2 are not in the range [1,8], overwrite V3 with NA
data$V3[!(grepl("\\b[12345678]\\b", data$V2) &
          grepl("\\b[12345678]\\b", data$V1))] <- NA
The pattern "\\b[12345678]\\b" can be decomposed as follows:
1) the [12345678] part checks whether the string contains a digit from the range 1:8.
2) the \\b ... \\b part marks the word boundaries, so the number 2 will be matched but the number 28 will not.
If you wanted to match a range 0:13, you would adjust the regular expression like this:
data$V3[!(grepl("\\b([0-9]|1[0-3])\\b", data$V2) &
grepl("\\b([0-9]|1[0-3])\\b", data$V1))] <- NA
Where the \\b([0-9]|1[0-3])\\b can be translated as follows:
1) [0-9] matches numbers 0:9
2) 1[0-3] matches numbers 10:13
3) [0-9]|1[0-3] tells you that numbers 0:9 or 10:13 should be matched
4) \b...\b gives you the word boundaries
5) (...) groups the alternatives so that the word boundaries apply to the whole expression within the brackets. Without the brackets, the equivalent pattern would have to be written as \\b[0-9]\\b|\\b1[0-3]\\b
For more detailed introduction into matching numeric ranges with regular expression see this link: http://www.regular-expressions.info/numericranges.html
There are many ways to do this. This one has a loop which checks each row and based on a set of rules, returns whatever you want. This is easily extendable for more complex rules. Warnings can be ignored as they are produced when "a" is being coerced to numeric.
x <- read.table(text = "1 2
2 2
3 5
9 8
9 9
a 2", header = FALSE)
x$V3 <- apply(x, MARGIN = 1, FUN = function(m) {
  xm <- as.numeric(as.character(m))
  if (!any(is.na(xm))) {
    if (any(xm > 8)) {
      return(NA)
    }
    if (xm[1] == xm[2]) {
      return(1)
    } else {
      return(0)
    }
  } else {
    return(NA)
  }
})
V1 V2 V3
1 1 2 0
2 2 2 1
3 3 5 0
4 9 8 NA
5 9 9 NA
6 a 2 NA
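A loop-free variant of the same rules is sketched below. It assumes the valid range is 1 to 8 inclusive, as in the answers above, and it treats any value that cannot be converted to a number as invalid. The names v1, v2 and valid are made up for illustration; note that non-integer values such as 2.5 would count as valid here, which may or may not be wanted.

```r
dat <- data.frame(V1 = c("1", "2", "3", "9", "9", "a"),
                  V2 = c(2, 2, 5, 8, 9, 2))

# Convert to numeric; letters and symbols become NA (warnings suppressed)
v1 <- suppressWarnings(as.numeric(dat$V1))
v2 <- suppressWarnings(as.numeric(dat$V2))

# A row is valid only if both values are numbers within 1..8
valid <- !is.na(v1) & !is.na(v2) & v1 >= 1 & v1 <= 8 & v2 >= 1 & v2 <= 8

# 1 if equal, 0 otherwise, NA for invalid rows
dat$V3 <- ifelse(valid, as.integer(v1 == v2), NA_integer_)

print(dat$V3)  # 0 1 0 NA NA NA
```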
Let's say I have this sample dataframe:
df = data.frame(id=rep(1:2,each=5),v1=c(1:10),v2=c(1:10))
And, say, I would like to create a new column sumv1v2 that contains the sum of v1 and v2 only if id = 1 (otherwise sumv1v2 would be 0).
The following, with a custom function defined beforehand, works:
library(plyr) # provides ddply() and mutate()
condisum = function(pid, pv1, pv2) {
  if (pid[1] == 1) {pv1 + pv2}
  else {0}
}
df = ddply(df, "id", mutate, sumv1v2 = condisum(id, v1, v2))
And the returned dataframe is what I need:
df
id v1 v2 sumv1v2
1 1 1 1 2
2 1 2 2 4
3 1 3 3 6
4 1 4 4 8
5 1 5 5 10
6 2 6 6 0
7 2 7 7 0
8 2 8 8 0
9 2 9 9 0
10 2 10 10 0
But could I define the function inline within ddply(), i.e. as an anonymous function? I tried this:
df = data.frame(id=rep(1:2,each=5),v1=c(1:10),v2=c(1:10))
df = ddply(df,"id",mutate,sumv1v2=function(pid,pv1,pv2){
  if (pid[1]==1) {pv1+pv2}
  else {0}
}(id,v1,v2))
And I got this error message:
Error: attempt to replicate an object of type 'closure'
I know I cannot pass a function to mutate, and should pass an expression, thanks to the excellent comment by Gregor in this post:
Use of ddply + mutate with a custom function?
So I am trying to pass an anonymous function together with its arguments. Would this make it an expression? I still got an error, though.
So, is it possible NOT to define the custom function beforehand, and instead define it with function() inside ddply()?
After more trial and error, I finally realized what the issue was.
The following now works:
df = data.frame(id=rep(1:2,each=5),v1=c(1:10),v2=c(1:10))
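df = ddply(df,"id",mutate,sumv1v2=(function(pid,pv1,pv2){
  if (pid[1]==1) {pv1+pv2}
  else {0}
})(id,v1,v2))
Take note of the new ( and ) around the anonymous function. Without them, R parses {...}(id,v1,v2) as the body of the function, so the whole thing is still just a function definition (a closure). With the parentheses, the function is defined first and then called with (id,v1,v2), so the whole thing finally becomes an expression that returns a value.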
In simpler form, I tried this:
x = function(y){y^2}(3)
x
and x returns:
function(y){y^2}(3)
But, if I add ( and ):
x = (function(y) y^2)(3)
x
x returns:
[1] 9
Or you could just define the function within the scope of the ddply call, and then use it. This might make the whole thing a lot easier to read.
df <- data.frame(id=rep(1:2,each=5),v1=c(1:10),v2=c(1:10))
df <- ddply(
  df,
  "id",
  mutate,
  sumv1v2={
    f <- function(pid,pv1,pv2) {
      if (pid[1]==1) pv1 + pv2
      else 0
    }
    f(id,v1,v2)
  }
)
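For this particular rule (sum only where id is 1, otherwise 0), a custom function may not be needed at all. The sketch below uses base R's ifelse() on the sample data frame from the question; it is an alternative to the ddply approach, not a replacement for the general anonymous-function technique discussed above.

```r
df <- data.frame(id = rep(1:2, each = 5), v1 = 1:10, v2 = 1:10)

# Element-wise condition: sum of v1 and v2 where id is 1, otherwise 0
df$sumv1v2 <- ifelse(df$id == 1, df$v1 + df$v2, 0)

print(df$sumv1v2)  # 2 4 6 8 10 0 0 0 0 0
```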
I got this text file:
a b c d
0 2 8 9
2 0 3 4
8 3 0 2
9 4 2 0
I put this command in R:
k<-read.table("d:/r/file.txt", header=TRUE)
Now I want to access the value in row 3, column 4 (which is 2). How can I access it?
Basically, my question is how to access the table data one element at a time. I want to use each value separately in nested for loops.
Like:
for(row=0;row<4;row++)
for(col=0;col<4;col++)
print data[row][col];
You may want to apply a certain operation to each element of a matrix.
This is how you could do it, for example:
A <- matrix(1:16,4,4)
apply(A,c(1,2),function(x) {x %% 5})
And an operation on a whole row:
apply(A,1,function(x) sum(x^2))
Is this what you want? :
test <- read.table("test.txt", header = TRUE, fill = TRUE)
for (i in 1:nrow(test)) {
  for (j in 1:ncol(test)) {
    print(test[i, j])
  }
}
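To read a single cell directly, a data frame can be indexed as [row, column]; R indices start at 1, not 0. The sketch below reads the table from an inline string instead of the file in the question so that it is self-contained. The variables value, same_value and total are made up for illustration.

```r
k <- read.table(text = "a b c d
0 2 8 9
2 0 3 4
8 3 0 2
9 4 2 0", header = TRUE)

# Row 3, column 4 by position
value <- k[3, 4]

# The same cell by column name
same_value <- k$d[3]

print(value)       # 2
print(same_value)  # 2

# Visit every cell in nested loops, here just summing the values
total <- 0
for (i in seq_len(nrow(k))) {
  for (j in seq_len(ncol(k))) {
    total <- total + k[i, j]
  }
}
print(total)  # 56
```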