I have a vector of zeros, say of length 10:
v = rep(0,10)
I want to populate some positions of the vector using two other vectors: v1 holds the indexes and v2 holds the corresponding values, in the same order. Say v1 has the indexes
v1 = c(1,2,3,7,8,9)
and v2 has the values
v2 = c(0.1,0.3,0.4,0.5,0.1,0.9)
In the end I want
v = c(0.1,0.3,0.4,0,0,0,0.5,0.1,0.9,0)
So the positions listed in v1 are filled from v2 and the remaining ones stay 0. I could obviously write a for loop, but that's taking too long in R, owing to the length of the actual matrices. Is there a simple way to do this?
You can assign it this way:
v[v1] = v2
For example:
> v = rep(0,10)
> v1 = c(1,2,3,7,8,9)
> v2 = c(0.1,0.3,0.4,0.5,0.1,0.9)
> v[v1] = v2
> v
[1] 0.1 0.3 0.4 0.0 0.0 0.0 0.5 0.1 0.9 0.0
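If the same fill is needed in several places, it can be wrapped in a small helper. This is only a sketch: the name `fill_at` and its arguments are hypothetical, and the length check is an added safeguard, because `v[v1] = v2` recycles `v2` without complaint when `length(v1)` is a multiple of `length(v2)`.

```r
# Hypothetical helper: build a zero vector of length n and
# place vals at positions idx (same idea as v[v1] = v2).
fill_at <- function(n, idx, vals) {
  # Guard against accidental recycling and out-of-range indexes
  stopifnot(length(idx) == length(vals), all(idx >= 1), all(idx <= n))
  v <- numeric(n)   # numeric(n) is the same as rep(0, n)
  v[idx] <- vals    # one vectorised assignment, no loop
  v
}

v <- fill_at(10, c(1, 2, 3, 7, 8, 9), c(0.1, 0.3, 0.4, 0.5, 0.1, 0.9))
v
```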
You can also do it with replace():
v = rep(0,10)
v1 = c(1,2,3,7,8,9)
v2 = c(0.1,0.3,0.4,0.5,0.1,0.9)
replace(v, v1, v2)
[1] 0.1 0.3 0.4 0.0 0.0 0.0 0.5 0.1 0.9 0.0
See ?replace for details.
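One difference worth knowing: replace() returns a modified copy and leaves v itself untouched, so the result has to be assigned back (for example v <- replace(v, v1, v2)). A quick check:

```r
v  <- rep(0, 10)
v1 <- c(1, 2, 3, 7, 8, 9)
v2 <- c(0.1, 0.3, 0.4, 0.5, 0.1, 0.9)

w <- replace(v, v1, v2)  # returns a new vector
all(v == 0)              # TRUE: v has not been modified
v <- replace(v, v1, v2)  # assign back to keep the change
```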
I'm working with a data frame, named Clutch, of information about cards in a trading card game. One of the variables, CMD+, can take the following values:
"R+1"
"L+1"
"R+2"
"L+2"
0
What I want to do is to create a new variable, Clutch$C+, that takes these string values for each data point and replaces them with numbers. R+1 and L+1 are replaced with 0.5, and R+2 and L+2 are replaced with 1. 0 is unchanged.
How do I do this? Sorry if this is a basic question; my R skills aren't great at the minute, but I'm working on getting better.
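Probably not the most beautiful solution, but this should work (column names containing + have to be accessed with [[ ]] or backticks, not plain $):
# start with all zeros, so the "0" entries stay 0
Clutch[["C+"]] <- 0
Clutch[["C+"]][which(Clutch[["CMD+"]] == "R+1")] <- 0.5
Clutch[["C+"]][which(Clutch[["CMD+"]] == "L+1")] <- 0.5
Clutch[["C+"]][which(Clutch[["CMD+"]] == "R+2")] <- 1
Clutch[["C+"]][which(Clutch[["CMD+"]] == "L+2")] <- 1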
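You can try the following, with x <- c("R+1", "L+1", "R+2", "L+2"). Note that it returns strings that paste half the digit onto the original "+n" suffix, not plain numbers: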
paste0(as.numeric(gsub("\\D", "\\1", x))/2, sub("\\D", "\\1", x))
[1] "0.5+1" "0.5+1" "1+2" "1+2"
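Here is one way, using the fact that the result is half the digit in your string (check.names = FALSE keeps the column name CMD+ instead of letting data.frame() rename it to CMD.; sample() makes the data random, so your output will differ):
Clutch <- data.frame(`CMD+` = sample(c("R+1", "L+1", "R+2", "L+2", 0), 10, replace = TRUE), check.names = FALSE)
Clutch[["C+"]] <- as.numeric(gsub("[^0-9]", "", Clutch[["CMD+"]]))/2
> Clutch
CMD+ C+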
1 R+1 0.5
2 R+2 1.0
3 R+1 0.5
4 L+1 0.5
5 L+1 0.5
6 R+1 0.5
7 R+1 0.5
8 L+1 0.5
9 0 0.0
10 L+1 0.5
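You can simply use gsub to drop everything up to and including the "+" and halve the remaining digit:
> a <- c("R+1", "L+1", "R+2", "L+2", 0)
> as.numeric(gsub(".*[+]","",a))/2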
[1] 0.5 0.5 1.0 1.0 0.0
If it is a data frame, you can use this:
> library(data.table)
> dt <- data.frame(CMD = c("R+1", "L+1", "R+2", "L+2", 0))
> setDT(dt)[,CMD:=as.numeric(gsub(".*[+]","",CMD))/2]
> dt
CMD
1: 0.5
2: 0.5
3: 1.0
4: 1.0
5: 0.0
Another idea is to use a nested ifelse that looks for 1 in the string and replaces it with 0.5, and for 2 and replaces it with 1, i.e.
#where x is your column,
as.numeric(ifelse(grepl('1', x), 0.5, ifelse(grepl('2', x), 1, x)))
#[1] 0.5 0.5 1.0 1.0 0.0
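A lookup-table approach (an alternative to the answers above, not taken from them) avoids regular expressions entirely: a named vector maps each code to its score, and indexing it by the column returns the scores in order. Here cmd is a stand-in for Clutch[["CMD+"]]; any code missing from the table comes back as NA, which makes unexpected values easy to spot.

```r
# Stand-in for Clutch[["CMD+"]]; as.character() also covers a factor column
cmd <- as.character(c("R+1", "L+1", "R+2", "L+2", "0"))

# Named lookup vector: names are the codes, values are the scores
lookup <- c("R+1" = 0.5, "L+1" = 0.5, "R+2" = 1, "L+2" = 1, "0" = 0)

# Index by name; unname() drops the names carried over from lookup
score <- unname(lookup[cmd])
score
# [1] 0.5 0.5 1.0 1.0 0.0
```

In the data frame this would be Clutch[["C+"]] <- unname(lookup[as.character(Clutch[["CMD+"]])]).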
I'm not sure if the title is worded well, but here is the situation:
I have a metadata dataset, which can have any number of rows, e.g.:
Control_DF <- cbind.data.frame(
Scenario = c("A","B","C")
,Variable = c("V1","V2","V3")
,Weight = c("w1","w2","w3")
)
Using the data contained in Control_DF, I want to create a new version of each Variable on my main dataset, where I multiply the variable by the weight. So if my main dataset looks like this:
Main_Data <- cbind.data.frame(
V1 = c(1,2,3,4)
,V2 = c(2,3,4,5)
,V3 = c(3,4,5,6)
,w1 = c(0.1,0.5,1,0.8)
,w2 = c(0.2,1,0.3,0.6)
,w3 = c(0.3,0.7,0.1,0.2)
)
Then, written out by hand, what I want looks like this:
library(dplyr)
New_Data <- Main_Data %>%
mutate(
weighted_V1 = V1 * w1
,weighted_V2 = V2 * w2
,weighted_V3 = V3 * w3
)
However, I need a way to do this without hard-coding it, so that the number of variables referenced can be arbitrary.
Can anyone help me?
In base R with lapply, Map and cbind you could do it as follows:
# with Control_DF create a list with pairs of <varName,wgt>
controlVarList = lapply(Control_DF$Scenario,function(x)
as.vector(as.matrix(Control_DF[Control_DF$Scenario==x,c("Variable","Weight")] ))
)
controlVarList
#[[1]]
#[1] "V1" "w1"
#
#[[2]]
#[1] "V2" "w2"
#
#[[3]]
#[1] "V3" "w3"
# A custom function for multiplication of both columns
fn_weightedVars = function(x) {
# x = c("V1","w1"); hence x[1] = "V1", x[2] = "w1"
# reference these columns in Main_Data and do scaling
wgtedCol = matrix(Main_Data[,x[1]] * Main_Data[,x[2]],ncol=1)
#rename as required
colnames(wgtedCol)= paste0("weighted_",x[1])
#return var
wgtedCol
}
# call the function on each list element
scaledList = Map(fn_weightedVars, controlVarList)
# bind the weighted columns side by side
scaledDF = do.call(cbind, scaledList)
# combine datasets
New_Data = data.frame(Main_Data, scaledDF)
New_Data
# V1 V2 V3 w1 w2 w3 weighted_V1 weighted_V2 weighted_V3
#1 1 2 3 0.1 0.2 0.3 0.1 0.4 0.9
#2 2 3 4 0.5 1.0 0.7 1.0 3.0 2.8
#3 3 4 5 1.0 0.3 0.1 3.0 1.2 0.5
#4 4 5 6 0.8 0.6 0.2 3.2 3.0 1.2
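A shorter base R variant of the same idea, sketched under the assumption that Control_DF stores the names as character (hence stringsAsFactors = FALSE), passes the Variable and Weight columns to Map() in parallel, so each pair is multiplied without first building a list of name pairs:

```r
Control_DF <- data.frame(Scenario = c("A", "B", "C"),
                         Variable = c("V1", "V2", "V3"),
                         Weight   = c("w1", "w2", "w3"),
                         stringsAsFactors = FALSE)
Main_Data <- data.frame(V1 = c(1, 2, 3, 4), V2 = c(2, 3, 4, 5), V3 = c(3, 4, 5, 6),
                        w1 = c(0.1, 0.5, 1, 0.8), w2 = c(0.2, 1, 0.3, 0.6),
                        w3 = c(0.3, 0.7, 0.1, 0.2))

# Map() walks Variable and Weight in parallel: V1*w1, V2*w2, V3*w3, ...
weighted <- Map(function(v, w) Main_Data[[v]] * Main_Data[[w]],
                Control_DF$Variable, Control_DF$Weight)
names(weighted) <- paste0("weighted_", Control_DF$Variable)

# Append the new columns to the original data
New_Data <- cbind(Main_Data, as.data.frame(weighted))
New_Data
```

Because Map() is driven by the rows of Control_DF, adding a row there adds a weighted column without touching the code.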
I am looking to extract the longest ordered portion of a vector. So for example with this vector:
x <- c(1,2,1,0.5,1,4,2,1:10)
x
[1] 1.0 2.0 1.0 0.5 1.0 4.0 2.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
I'd apply some function, get the following returned:
x_ord <- some_func(x)
x_ord
[1] 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
I've been trying to leverage is.unsorted() to determine at what point the vector is no longer sorted. Here is my messy attempt and what I have so far:
for(i in 1:length(x)){
if( is.unsorted(x[i:length(x)])==TRUE ){
cat(i,"\n")}
else{x_ord=print(x[i])}
}
However, this clearly isn't right, as x_ord ends up as just 10. I am also hoping to make this more general, so that it also handles non-increasing numbers after the ordered sequence, with a vector something like this:
x2 <- c(1,2,1,0.5,1,4,2,1:10,2,3)
Right now though I am stuck on identifying the increasing sequence in the first vector mentioned.
Any ideas?
This seems to work:
s = 1L + c(0L, which( x[-1L] < x[-length(x)] ), length(x))
w = which.max(diff(s))
x[s[w]:(s[w+1]-1L)]
# 1 2 3 4 5 6 7 8 9 10
s holds the positions where the runs start, plus length(x)+1 for convenience:
the first run starts at 1
subsequent runs start wherever there is a drop
we tack on length(x)+1, where the next run would start if the vector continued
diff(s) gives the lengths of the runs, and which.max takes the first maximizer, so ties go to the earliest run.
s[w] is the start of the chosen run; s[w+1L] is the start of the next run; so to get the numbers belonging to the chosen run: s[w]:(s[w+1]-1L).
Alternatively, split the vector into runs and then select the longest one:
sp = split(x, cumsum(x < c(-Inf, x[-length(x)])))
sp[[which.max(lengths(sp))]]
# 1 2 3 4 5 6 7 8 9 10
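Wrapping the first answer in a function makes it easy to apply to the asker's second vector x2 as well. This is a sketch: the name longest_run and the strict argument are illustrative additions. With the default, equal neighbours stay in the same run (as in the answer above); strict = TRUE treats a repeated value as a break, giving strictly increasing runs.

```r
longest_run <- function(x, strict = FALSE) {
  n <- length(x)
  if (n < 2L) return(x)  # zero or one element: nothing to compare
  # A new run starts after every drop (or every non-increase when strict)
  breaks <- if (strict) x[-1L] <= x[-n] else x[-1L] < x[-n]
  s <- 1L + c(0L, which(breaks), n)  # run starts, plus n + 1 as a sentinel
  w <- which.max(diff(s))            # first longest run wins ties
  x[s[w]:(s[w + 1L] - 1L)]
}

x2 <- c(1, 2, 1, 0.5, 1, 4, 2, 1:10, 2, 3)
longest_run(x2)
# [1]  1  2  3  4  5  6  7  8  9 10
longest_run(c(1, 2, 2, 3, 0), strict = TRUE)
# [1] 1 2
```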
I want to combine two matrices with partly overlapping rownames in R. When the rownames match, values from the two matrices should end up as adjacent columns. When the rownames only occur in one matrix, empty space should be inserted for the other matrix.
Data set:
testm1 <- cbind("est"=c(1.5,1.2,0.7,4.0), "lci"=c(1.1,0.9,0.5,0.9), "hci"=c(2.0,1.7,0.8,9.0))
rownames(testm1) <- c("BadFood","NoActivity","NoSunlight","NoWater")
testm1 #Factors associated with becoming sick
testm2 <- cbind("est"=c(3.0,2.0,0.9,7.0), "lci"=c(1.3,1.2,0.2,2.0), "hci"=c(5.0,3.1,1.7,9.0))
rownames(testm2) <- c("BadFood","NoActivity","Genetics","Age")
testm2 #Factors associated with dying
Desired output:
Sick Dying
est lci hci est lci hci
BadFood 1.5 1.1 2.0 3.0 1.3 5.0
NoActivity 1.2 0.9 1.7 2.0 1.2 3.1
NoSunlight 0.7 0.5 0.8 - - -
NoWater 4.0 0.9 9.0 - - -
Genetics - - - 0.9 0.2 1.7
Age - - - 7.0 2.0 9.0
Is there a simple way to do this that would work for all matrices?
Here is a base R method that keeps everything in matrix form:
# get rownames of new matrix
newNames <- union(rownames(testm1), rownames(testm2))
# construct new matrix
newMat <- matrix(NA, length(newNames), ncol(testm1) + ncol(testm2),
dimnames=list(newNames, c(colnames(testm1), colnames(testm2))))
# fill in new matrix
newMat[match(rownames(testm1), newNames), 1:ncol(testm1)] <- testm1
newMat[match(rownames(testm2), newNames), (ncol(testm1)+1):ncol(newMat)] <- testm2
In the final two lines, match is used to find the proper row indices by row name.
This returns
newMat
est lci hci est lci hci
BadFood 1.5 1.1 2.0 3.0 1.3 5.0
NoActivity 1.2 0.9 1.7 2.0 1.2 3.1
NoSunlight 0.7 0.5 0.8 NA NA NA
NoWater 4.0 0.9 9.0 NA NA NA
Genetics NA NA NA 0.9 0.2 1.7
Age NA NA NA 7.0 2.0 9.0
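To make the matrix method work for any number of matrices, including ones with different numbers of columns, the same fill-by-row-name idea can be wrapped in a function. This is a sketch: combine_by_rownames is a hypothetical name, and the list names (Sick, Dying) are used as column prefixes in place of the two-level header in the desired output, since an R matrix has only one row of column names.

```r
testm1 <- cbind("est"=c(1.5,1.2,0.7,4.0), "lci"=c(1.1,0.9,0.5,0.9), "hci"=c(2.0,1.7,0.8,9.0))
rownames(testm1) <- c("BadFood","NoActivity","NoSunlight","NoWater")
testm2 <- cbind("est"=c(3.0,2.0,0.9,7.0), "lci"=c(1.3,1.2,0.2,2.0), "hci"=c(5.0,3.1,1.7,9.0))
rownames(testm2) <- c("BadFood","NoActivity","Genetics","Age")

combine_by_rownames <- function(mats) {
  # Union of all row names, in order of first appearance
  all_rows <- unique(unlist(lapply(mats, rownames)))
  blocks <- lapply(names(mats), function(nm) {
    m <- mats[[nm]]
    # Empty block for this matrix; rows it lacks stay NA
    out <- matrix(NA_real_, nrow = length(all_rows), ncol = ncol(m),
                  dimnames = list(all_rows, paste(nm, colnames(m), sep = ".")))
    out[rownames(m), ] <- m  # character indexes match by row name
    out
  })
  do.call(cbind, blocks)
}

res <- combine_by_rownames(list(Sick = testm1, Dying = testm2))
res
```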
I think this does what you are after, though it's not that pretty and requires the data to be a data.frame rather than a matrix. Hope it helps at least!
(Code was adapted from this question and answer: https://stackoverflow.com/a/34530141/4651564)
library(dplyr)
dat1 <- as.data.frame(testm1)
dat2 <- as.data.frame(testm2)
full_join( dat1 %>% mutate(Symbol = rownames(dat1) ),
dat2 %>% mutate(Symbol = rownames(dat2) ),
by = 'Symbol')
You can do it using merge() function.
First convert your test matrices into data frames, then use merge on the data frames, and finally convert the result into a matrix (but do you necessarily need a matrix?).
Here's an example code:
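testm1 <- as.data.frame(testm1)
testm2 <- as.data.frame(testm2)
# all = TRUE keeps rows that occur in only one of the inputs (a full outer join)
result <- merge(testm1, testm2, by = "row.names", all = TRUE)
# merge() stores the row names in a "Row.names" column; move them back,
# otherwise as.matrix() would turn the whole result into a character matrix
rownames(result) <- result$Row.names
result$Row.names <- NULL
result <- as.matrix(result)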
If you want to obtain a data frame, simply omit the last line of code. Hope this helps.
I have a data frame like this:
GN SN
a 0.1
b 0.2
c 0.3
d 0.4
e 0.4
f 0.5
I would like the following output:
GN
a
0.1
b
0.2
c
0.3
Can anyone help me? How can I "interleave" the elements of the second column with the elements of the first column to get the desired output?
First let's create some data:
dd = data.frame(x = 1:10, y = LETTERS[1:10])
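Next, we need to make sure both columns are character. t() converts the data frame with as.matrix(), which pads numbers to a common width (" 1" instead of "1") when the columns are of mixed type, and a factor column also needs to become plain text:
dd[] = lapply(dd, as.character)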
Then we transpose the data frame and convert to a vector:
as.vector(t(dd))
However, a more pertinent question is why you would want to do this: the result is a single character vector, so the numbers are no longer numeric.
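For the question's own data, the interleaving can also be done without t(): rbind() stacks the two columns as the rows of a 2-row character matrix, and c() reads that matrix column by column, which alternates the two. This sketch rebuilds the example data frame, assuming GN holds letters and SN numbers; SN is converted with as.character() first so the numbers are not padded and a factor would not be turned into its integer codes.

```r
df <- data.frame(GN = c("a", "b", "c", "d", "e", "f"),
                 SN = c(0.1, 0.2, 0.3, 0.4, 0.4, 0.5),
                 stringsAsFactors = FALSE)

# Row 1 = GN, row 2 = SN as text; c() reads column-wise: a, 0.1, b, 0.2, ...
interleaved <- c(rbind(df$GN, as.character(df$SN)))

out <- data.frame(GN = interleaved, stringsAsFactors = FALSE)
head(out$GN, 6)
# [1] "a"   "0.1" "b"   "0.2" "c"   "0.3"
```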