I do have the output of a function which looks like this
function(i,var1,list1)->h
and then the output
value
2.8763
There is a line break in the output and I only need the number bit of the result but not the string. Hence, I tried to use h[1] but this is
value
2.87..
and length(h) is also equal 1. Is there any way to access only the number in this case?
Thanks,
you are accessing only the value and the name you see is the name of the element in a vector you return. you can get rid of those names / attributes like this:
> v <- c("a" = 1, "b" = 2)
> v
a b
1 2
> attributes(v) <- NULL
> v
[1] 1 2
Related
I have a dataframe where i need to compare two columns and find the number of matching characters between two elements.
For eg: x and y are two elements to be compared which look like below:
x<- "1/2"
y<-"2/3"
I did unlisted and splitted them by '/' as below:
unlist(strsplit(x,"/"))->a
unlist(strsplit(y,"/"))->b
Then i used pmatch:
pmatch(a,b,nomatch =0)
[1] 0 1
Used sum() to know how many characters are matching:
sum(pmatch(a,b,nomatch =0))
[1] 1
However, when the comparison is done the other way:
pmatch(b,a,nomatch = 0)
[1] 2 0
Since there is only one match between the two string, why is it showing 2. It could be index. But i would need to get how many characters are same between the strings irrespective of the comparison a vs b or b vs a.
Could someone help how to get this.
Per ?pmatch, pmatch seeks matches for the elements of its first argument among those of its second.
For example, "2" in the first list matches the second element in the second list.
> pmatch(c("2", "1"),c("3","2"),nomatch =0)
# [1] 2 0
One way to know the number of elements got matched is to sum non-zero elements:
sum(pmatch(c("2", "1"),c("3","2"),nomatch =0) != 0)
# [1] 1
Both
sum(pmatch(b, a, nomatch = 0) != 0) # 1
sum(pmatch(a, b, nomatch = 0) != 0) # 1
return the same value.
Another option could be
sum(b %in% a)
[1] 1
sum(a %in% b)
[1] 1
I have a vector named "vec". I label the elements from "a" to "m"
vec <- c(1,1,1,2,2,2,2,2,2,4,4,4,4)
names(vec) <- c("a","b","c","d","e","f","g","h","i","j","k","l","m")
Then I split the vec according to the sequences.
split_vec <- split(vec, vec)
Now when I type
Spec_vec$"1" I get the first list.
Instead of typing the specific name as "1". I want to get the values
such as
spec_vec$vec[1]
But the above function doesn't work. Is there a way to get that?
You can do
split_vec[[as.character(vec[1])]]
# a b c
# 1 1 1
Notice that you need as.character, since just the number value from vec[i] would give incorrect results for calls like split_vec[[vec[10]]] where you would expect the third element.
split_vec[[vec[10]]]
# Error in split_vec[[vec[10]]] : subscript out of bounds
split_vec[[as.character(vec[10])]]
# j k l m
# 4 4 4 4
But in general, it's best to avoid such names that begin with numerics because, obviously, it's quite awkward and can cause trouble.
I have an example filter table as below and a big source data table. I need to do the merge using these two tables. If no column in the filter table contains ALL, use three columns to do the the merge (using Tran=1001, Acct=1 & Co=a to do the inner join with the data table).If one of them, ie Tran has ALL, use the remaining two columns to do the merge (using Acct=3 & Co=c to do the join). If two of them, ie Tran and Acct, have All, use the remaining one column to do the merge (using Co=b to do the join).
The real question is the number of columns is uncertain.
Can anyone help me with this?
Tran Acct Co
1001 1 a
1002 ALL ALL
ALL ALL b
ALL 4 ALL
1003 2 ALL
ALL 3 c
1004 ALL d
You're going to have to write a series of conditional statements using if, elseif and else. I'll use the %in% operator to check for this. The %in% operator returns a series of boolean values. The easiest way is to show through example:
> x <- c(1, 2, 3, 4, 5)
> y <- c(2, 3, 4, 5, 6)
> x %in% y
[1] FALSE TRUE TRUE TRUE TRUE
Notice that it returns FALSE for the first value as the value of 1 in x is not in y. You can do the same for the "ALL" value in your data set. I assume you are going row by row as you seemed to imply in your question. Let me know if you need to check the whole column first (you can use the any function for that case). Here is an example of your first condition:
# Assume that df is your data.frame of data.
for (i in 1:length(df$Tran)) {
if (!("All" %in% df$Tran[i]) & !("ALL" %in% df$Acct[i]) & !("All" %in% df$Co[i])) {
# Do your merge here
}
if ( [Put your next condition here] ) {
# Do the appropriate merge for that condition
}
...
Note that I used the "!" operator to get the inverse of whatever %in% returns because you want it to be the case where ALL is NOT in the row. I realize now that you could have just done All != df$Tran[1] since you are going row by row, but %in% might be more useful if you end up going for the whole column.
Hope this helps!
Editing in a new method now that it's more clear what the need is. So we have to find the number of "ALL" values in each row and then merge a certain way depending on the number of them. There are a lot of methods, but here's one I like:
> test <- data.frame(a = "ALL", b = 2, c = "ALL", d = 3, e = "ALL")
> test
a b c d e
1 ALL 2 ALL 3 ALL
> table(test[1, ] == "ALL")["TRUE"]
TRUE
3
Basically, I'm looking at the first row, and getting the number that return TRUE when asked if it contains the string "ALL". From here you can set conditionals on this number. To automate over the entire data frame, throw it in a for loop and set "1" equal to "i" or whatever you sequence variable is.
To get which rows have "ALL" in it (which in converse would tell which rows do not have "ALL" in it as well), you can use grep on each row. Here's a short example:
> # Initializing a sample data frame.
> df <- data.frame(a = "1", b = "ALL", c = "ALL", d = "5", e = "ALL")
> print(df)
a b c d e
1 1 ALL ALL 5 ALL
>
> # Finding the column numbers that have "ALL" in it using grep.
> places <- grep("ALL", df[1, ])
> print(places)
[1] 2 3 5
>
> # Each number corresponds to the order of the columns in the data frame and can be returned as such.
> nameCols <- names(df)[places]
> print(nameCols)
[1] "b" "c" "e"
>
> # Likewise, you can find what columns did not have "ALL" in it by doing the opposite.
> nameColsNOT <- names(df)[-places]
> print(nameColsNOT)
[1] "a" "d"
Iterate this method through a loop for each row in your data frame and use the conditional method I outlined above. Please note that this requires your columns to all be of "character" class, which I assume is the case already.
I am trying to obtain the sum of values of a column (B) based on the interval between two values on another column (A) in a "reference" dataframe (df):
A <- seq(1:10)
B <- c(4,3,5,7,5,7,4,7,3,7)
df <- data.frame(A,B)
I have found two ways of doing this:
y <- sum(subset(df, A < 3 & A >= 1, select = "B"))
> y
[1] 7
and
z <- with(df,sum(df[A<3 & A>=1,"B"]))
> z
[1] 7
However, I would like to do this based on a two vectors of values stored on another dataframe
C <- c(3,7,7)
D <- c(1,1,5)
df2 <- data.frame(C,D)
to obtain a column of y values for each pair of C and D values.
I have created a function:
myfn <- function(c,d) {
y <-sum(subset(df, A < c & A >= d, select = "B"))
return(y)
}
Which works fine with numbers
myfn(3,1)
[1] 7
but not with vectors.
myfn(c=C,d=D)
[1] 19
Warning messages:
1: In A < a :
longer object length is not a multiple of shorter object length
2: In A >= b :
longer object length is not a multiple of shorter object length
> myfn(df2$C,df2$D)
[1] 19
Warning messages:
1: In A < a :
longer object length is not a multiple of shorter object length
2: In A >= b :
longer object length is not a multiple of shorter object length
>
Does anyone have any suggestion about how I could calculate such interval for sequence of values?
Try:
mapply(myfn, C, D)
# [1] 7 31 12
The problem is that your function is not naturally vectorized. You can see that because your return value is a sum of the inputs, and sum is not a vectorized operation.
Beyond that, if you look at myfn, the expression A < c & A >= d doesn't make sense when c and d have more than one value. There, you are comparing each value in df to the corresponding value in your C and D vectors (so first value to first, second to second, etc.), instead of comparing all the values in df to each value in C and D in turn.
By using mapply, I'm basically looping through your function with as arguments a single value from C and D at a time.
Fortunately in your case it turns out that C,D have different number of elements than df, so you actually got a warning. If they were the same length you would not have gotten a warning and you would have gotten a single value answer, instead of the three you are presumably looking for.
There are better ways to do this, but the mapply approach is pretty trivial here and works with your code pretty much as is.
Another way...
is.between <- function(x,vec){
return(x>=min(vec) & x<max(vec))
}
apply(df2,1,function(x){sum(df[is.between(df$A,x),]$B)})
# [1] 7 31 12
I want to change the name of the output of my R function to reflect different strings that are inputted. Here is what I have tried:
kd = c("a","b","d","e","b")
test = function(kd){
return(list(assign(paste(kd,"burst",sep="_"),1:6)))
}
This is just a simple test function. I get the warning (which is just as bad an error for me):
Warning message:
In assign(paste(kd, "burst", sep = "_"), 1:6) :
only the first element is used as variable name
Ideally I would get ouput like a_burst = 1, b_burst = 2 and so on but am not getting close.
I would like split up a dataframe by contents of a vector and be able to name everything according to the name from that vector, similar to
How to split a data frame by rows, and then process the blocks?
but not quite. The naming is imperative.
Something like this, maybe?
kd = c("a","b","d","e","b")
test <- function(x){
l <- as.list(1:5)
names(l) <- paste(x,"burst",sep = "_")
l
}
test(kd)
You could use a vector instead of a list by way of setNames:
t1_6 <- setNames( 1:6, kd)
t1_6
a b d e b <NA>
1 2 3 4 5 6
> t1_6["a"]
a
1
Looking at the question again I wondered if you wnated to assign sequential names to a character vector:
> a1_5 <- setNames(kd, paste0("alpha", 1:5))
> a1_5
alpha1 alpha2 alpha3 alpha4 alpha5
"a" "b" "d" "e" "b"