I want to concatenate two columns, one numeric and the other a character column holding a sign (+/-). Below is an example:
test <- data.frame(cbind(c(4,-5,6),c("-","-","-")),stringsAsFactors = F)
test$X1 <- paste0(test$X2,test$X1)
test$X1 <- as.numeric(test$X1)
As you can see, the result contains NAs introduced by coercion.
Can anyone give me a hint on how to solve this, for example by putting a condition on the concatenation? Thanks.
The real problem in your code is that for row 2 you get the string "--5" (note the two minus signs), which as.numeric() cannot parse. There are plenty of options, as the comments showed you; yet another one would be:
as.numeric(paste0(test$X2, 1)) * as.numeric(test$X1)
# [1] -4 5 -6
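Another option, sketched under the assumption that X2 only ever holds "+" or "-": map the sign to -1 or 1 with ifelse() and multiply, so no string is re-parsed and "--5" never appears. The names sign_factor and result are just illustrative.

```r
# Rebuild the example data; cbind() turns both columns into character
test <- data.frame(cbind(c(4, -5, 6), c("-", "-", "-")), stringsAsFactors = FALSE)

# Assumption: X2 contains only "+" or "-"; "-" flips the sign, anything else keeps it
sign_factor <- ifelse(test$X2 == "-", -1, 1)
result <- sign_factor * as.numeric(test$X1)
result
# [1] -4  5 -6
```

Unlike the paste0() trick, this also handles a "+" sign without any change.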
I am very new to R, so I know the fix is simple, but I would appreciate it if someone could explain my mistake to me and how to fix it.
dat4<-c(10, 11)
subDat<-dat4[,c(10,11)]
The error that I am getting is "Error in subDat4<-dat4[,c(10,11)] incorrect number of dimensions"
Thank you in advance
Welcome to StackOverflow.
You defined dat4 as a vector (a one-dimensional object), but you are trying to subset it as a data.frame/tibble (a two-dimensional object).
To write dat4[a,b], with a indicating rows and b indicating columns, the object needs to have both rows and columns (a data frame, a matrix, ...).
Your data is a vector, not a matrix, so you cannot subset it as a matrix: the [row, column] form only works on objects that have dimensions.
Try
dat4<-c(10, 11)
dat5<-c(12, 13)
mat1<-matrix(c(dat4,dat5),nrow=2)
mat1[1,2]
# [1] 12
You can see my subset asks for row one, column two, which returns 12: the element in row one, column two.
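A short sketch of the same matrix with an empty index, which selects a whole row or column, plus a check showing why the original call failed: a plain vector has no dim attribute.

```r
dat4 <- c(10, 11)
dat5 <- c(12, 13)
mat1 <- matrix(c(dat4, dat5), nrow = 2)  # dat4 fills column 1, dat5 fills column 2

mat1[1, 2]  # row 1, column 2
# [1] 12
mat1[1, ]   # empty column index: all of row 1
# [1] 10 12
mat1[, 2]   # empty row index: all of column 2
# [1] 12 13
dim(dat4)   # a plain vector has no dimensions, hence the error with dat4[, ...]
# NULL
```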
If you want to subset the vector you provided, you can do it this way:
dat4[[1]]
#[1] 10
That shows the first element of the vector 'dat4', and
dat4[[2]]
#[1] 11
which shows the second element of 'dat4'.
I hope this answer is of help to you.
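For a vector, single brackets with one index are also common, and they accept several positions, negative positions, or a logical condition. A small sketch using the same dat4:

```r
dat4 <- c(10, 11)

dat4[1]          # single brackets with one index work on vectors
# [1] 10
dat4[c(1, 2)]    # several positions at once
# [1] 10 11
dat4[-1]         # a negative index drops that position
# [1] 11
dat4[dat4 > 10]  # a logical condition keeps the matching elements
# [1] 11
```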
I have a string with patterns like this:
my_string = "`d#k`0.55`0.55`0.55`0.55`0.55`0.55`0.55`0.55`0.55`n$l`0.4`0.1`0.25`0.28`0.18`0.3`0.17`0.2`0.03`!lk`0.04`0.04`0.04`0.04`0.04`0.04`0.04`0.04`0.04`vnabgjd`0.02`0.02`0.02`0.02`0.02`0.02`0.02`0.02`0.02`pogk(`1.01`0.71`0.86`0.89`0.79`0.91`0.78`0.81`0.64`r!#^##niw`0.0014`0.0020`9.9999`9.9999`0.0020`0.0022`0.0032`9.9999`0.0000`
As you can see, the pattern is a [`nonnumber] followed by repeated [`number.num~] elements.
I want to count how many [`number.num~] elements there are between consecutive [`nonnumber] elements.
I tried to use a regex:
index <- gregexpr("`(\\w{2,20})`\\d\\.\\d(.*?)`\\D",my_string)
regmatches(my_string,index)
but with this code the closing [`\D] of one match is also the start of the next group, so the matches overlap and I can't count how many there are.
If you know a method for this, please reply.
Using strsplit(): we split at the backtick, find the positions of the values that yield NA when coerced to numeric (the labels), and take the differences between those positions. Note that we need to exclude the first (empty) element after strsplit and append an NA to the numerics so that the last group is counted too. The result is a vector named after the non-numeric elements using setNames (not very good names, actually, but they show what's going on).
s <- el(strsplit(my_string, "\\`"))[-1]
s.num <- suppressWarnings(as.numeric(s))
setNames(diff(which(is.na(c(s.num, NA)))) - 1,
s[is.na(s.num)])
# d#k n$l !lk vnabgjd pogk( r!#^##niw
# 9 9 9 9 9 9
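A variant of the same idea, sketched under the assumption that every label is a token that does not parse as a number: cumsum() over the label flags gives each label and its following numbers a shared group id, and tabulate() counts the group sizes. The function name count_after_labels and the short test string are hypothetical.

```r
count_after_labels <- function(x) {
  # Split on the backtick (fixed = TRUE avoids regex escaping) and drop the leading ""
  s <- strsplit(x, "`", fixed = TRUE)[[1]][-1]
  # Tokens that cannot be parsed as numbers are treated as labels
  is_label <- is.na(suppressWarnings(as.numeric(s)))
  # Each label starts a new group; the numbers after it share its group id
  group <- cumsum(is_label)
  # Group size minus 1 (the label itself) = number of numeric tokens after the label
  counts <- tabulate(group[group > 0]) - 1L
  names(counts) <- s[is_label]
  counts
}

count_after_labels("`a`1`2`b`3`c`")
# a b c
# 2 1 0
```

Calling count_after_labels(my_string) should give the same counts as the setNames() version above; a label with no numbers after it simply gets 0.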
I have a boolean vector in which I want to count the number of occurrences of some patterns.
For example, for the pattern "(1,1)" and the vector "(1,1,1,0,1,1,1)", the answer should be 4.
The only built-in function I found to help is grepRaw, which finds the occurrences of a particular string in a longer string. However, it seems to fail when the sub-strings matching the pattern overlap:
length(grepRaw("11","1110111",all=TRUE))
# [1] 2
Do you have any ideas to obtain the right answer in this case?
Edit 1
I'm afraid that Rich's answer works for the particular example I posted, but fails in a more general setting:
> sum(duplicated(rbind(c(FALSE,FALSE),embed(c(TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE),2))))
[1] 3
In this other example, the expected answer would be 0.
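Using the function rollapply() from zoo, you can apply a moving window of width 2 that sums the values. Then you can count the windows whose sum equals 2, i.e. sum(c(1,1)). This relies on the pattern being all ones; a mixed pattern such as (1,0) cannot be recognized from the window sum alone.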
library(zoo)
z <- c(1,1,1,0,1,1,1)
sum(rollapply(z, 2, sum) == 2)
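A base-R sketch that compares every overlapping window element-wise with the pattern, so it works for any pattern, including the FALSE,FALSE case from Edit 1. The helper name count_pattern is hypothetical.

```r
count_pattern <- function(x, pattern) {
  k <- length(pattern)
  n <- length(x)
  if (k == 0 || k > n) return(0L)
  # Start positions of every overlapping window of width k
  starts <- seq_len(n - k + 1)
  # TRUE where the window starting at i equals the pattern element-wise
  hits <- vapply(starts, function(i) all(x[i:(i + k - 1)] == pattern), logical(1))
  sum(hits)
}

count_pattern(c(1, 1, 1, 0, 1, 1, 1), c(1, 1))
# [1] 4
count_pattern(c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE), c(FALSE, FALSE))
# [1] 0
```

With zoo loaded, the same comparison can be passed to rollapply instead of sum: sum(rollapply(z, 2, function(w) all(w == c(1, 1)))).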
I seem to be stuck on a really simple problem. I just cannot figure it out, nor can I find an answer here; I searched StackOverflow for almost an hour.
I want to find rows based on one column (direction is "backward") and then multiply the values of those rows in another column (amount) by -1, or by any number for that matter.
amount direction
1 forward
2 forward
3 forward
4 forward
1 backward
2 backward
3 backward
So that I would get
amount direction
1 forward
2 forward
3 forward
4 forward
-1 backward
-2 backward
-3 backward
I know how to find the rows: df[grep("backward",df$direction),]
or how to multiply in general: df[,1] = df[,1] * (-1)
but I cannot put the two together. I could pull out the rows I need, multiply them, and then rbind or cbind the pieces, but with a really big df with many columns and rows I don't want to paste it all back together. I just want to change the values in one column based on another column.
I managed something like this, but it does not multiply:
df$amount[df$direction %in% c("backward")] <- *((-1))
df$amount[grep("backward",df$direction)]<-*((-1))
I always get the same error:
Error: unexpected '*' in "df$amount[grep("backward",df$direction)]<-*"
And I'm really sorry if this question exists already somewhere. I did find lots of similar questions but they did not help me out.
Thank you!
So, as alexis said, the answer is:
df$amount[grep("backward", df$direction)] <- df$amount[grep("backward", df$direction)] * (-1)
OR
df$amount[df$direction %in% c("backward")] <- df$amount[df$direction %in% c("backward")] * (-1)
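A sketch of the same update with the logical index computed once, using the example data from the question. It uses == for an exact match; grep("backward", ...) would also match values that merely contain "backward". The name is_backward is just illustrative.

```r
df <- data.frame(
  amount = c(1, 2, 3, 4, 1, 2, 3),
  direction = c(rep("forward", 4), rep("backward", 3)),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for the rows whose direction is exactly "backward"
is_backward <- df$direction == "backward"

# Only the selected elements of amount are replaced; other columns are untouched
df$amount[is_backward] <- df$amount[is_backward] * -1
df$amount
# [1]  1  2  3  4 -1 -2 -3
```

The one-line form df$amount <- ifelse(df$direction == "backward", -df$amount, df$amount) gives the same result.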
If you have a dataframe like this
mydf <- data.frame(firstcol = c(1,2,1), secondcol = c(3,4,5))
Why would
mydf[mydf$firstcol,]
work but
mydf[firstcol,]
wouldn't?
You can do this:
mydf[,"firstcol"]
Remember that the column goes second, not first.
In your example, to see what mydf[mydf$firstcol,] gives you, let's break it down:
> mydf$firstcol
[1] 1 2 1
So really mydf[mydf$firstcol,] is the same as
> mydf[c(1,2,1),]
    firstcol secondcol
1          1         3
2          2         4
1.1        1         3
So you are asking for rows 1, 2, and 1. That is, you are asking for your row 1 to be row 1 of mydf, your row 2 to be row 2 of mydf, and your row 3 to be row 1 of mydf again (R gives that duplicate the row name 1.1); and you are asking for both columns.
Another question is why the following doesn't work:
> mydf[,firstcol]
Error in `[.data.frame`(mydf, , firstcol) : object 'firstcol' not found
That is, why do you have to put quotes around the column name when you ask for it like that, but not when you write mydf$firstcol? The answer is simply that the operators you are using take different types of arguments. If you look at ?'$', you will see the form x$name, so the second argument can be a name, which is not quoted. You can then look up ?'[', which actually leads you to the same help page. There you will find the following passage, which explains it. Note that a "character" vector needs quoted entries (that is how you enter a character vector in R, and in many other languages):
i, j, ...: indices specifying elements to extract or replace. Indices
are ‘numeric’ or ‘character’ vectors or empty (missing) or
‘NULL’. Numeric values are coerced to integer as by
‘as.integer’ (and hence truncated towards zero). Character
vectors will be matched to the ‘names’ of the object (or for
matrices/arrays, the ‘dimnames’): see ‘Character indices’
below for further details.
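A small sketch of that difference: [ and [[ evaluate their index, so a variable holding the column name works, while $ takes the name literally. The variable col is hypothetical.

```r
mydf <- data.frame(firstcol = c(1, 2, 1), secondcol = c(3, 4, 5))

mydf[, "firstcol"]  # character index, matched against the column names
# [1] 1 2 1
mydf$firstcol       # $ takes an unquoted name
# [1] 1 2 1

col <- "firstcol"
mydf[[col]]         # [[ evaluates col, so this looks up "firstcol"
# [1] 1 2 1
mydf$col            # $ looks for a column literally named "col" and finds none
# NULL
```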
Nothing to add to Xu Wang's very clear explanation. You might want to note, in addition, that the data.table package lets you use notation such as mydf[firstcol==1,] or mydf[,firstcol] (once mydf is a data.table), which many find more natural.
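For comparison, base R offers a similar unquoted-column style through subset() and with(), which evaluate expressions inside the data frame. A sketch that does not need data.table:

```r
mydf <- data.frame(firstcol = c(1, 2, 1), secondcol = c(3, 4, 5))

# Rows where firstcol equals 1; firstcol is looked up inside mydf
subset(mydf, firstcol == 1)
#   firstcol secondcol
# 1        1         3
# 3        1         5

# Evaluate an expression with the columns available by name
with(mydf, firstcol + secondcol)
# [1] 4 6 6
```

The help page for subset() describes it as a convenience function for interactive use and recommends the standard [ subsetting inside functions and scripts.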