Conditional subsetting with square brackets or inside square brackets - r

I have two vectors, p1 and p2, that report the same information, except that p2 is more precise. I want to compare the two and pick the value from p2, unless the difference between the two vectors is > k. In that case I want the value from p1 to be picked in the final product "pd".
k <- 5
p1 <- c(21,43,62,88,119,156,264)
p2 <- c(19,42,62,84,104,156,262)
pd should look like:
pd <- c(19,42,62,84,119,156,262)
I have seen code that specified the selection condition inside the square brackets, but I can't figure out how to duplicate it. Something similar to pd <- p2[p1, p1-p2 >5], but not exactly, because this obviously doesn't evaluate. p2[p1-p2<5] works to select the positive cases, but the 5th case, where the condition evaluates to FALSE, is skipped.

Maybe:
ifelse(abs(p2-p1) <=k, p2, p1)
#[1] 19 42 62 84 119 156 262
Or without using ifelse
indx <- abs(p1-p2) >k
pd <- p2
pd[indx] <- p1[indx]
pd
#[1] 19 42 62 84 119 156 262
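To see why the attempt in the question skips the 5th case, it may help to compare logical subsetting (which filters elements out) with ifelse (which chooses between two vectors element by element). A minimal sketch using the vectors from the question; the names keep and filtered are introduced here for illustration only:

```r
k  <- 5
p1 <- c(21, 43, 62, 88, 119, 156, 264)
p2 <- c(19, 42, 62, 84, 104, 156, 262)

# TRUE where p2 is close enough to p1 to be trusted
keep <- abs(p1 - p2) <= k

# Logical subsetting drops the elements where keep is FALSE,
# so the result is shorter than the inputs
filtered <- p2[keep]
length(filtered)

# ifelse keeps the full length and falls back to p1 where keep is FALSE
pd <- ifelse(keep, p2, p1)
pd
```

Subsetting with a condition answers "which elements pass?", while ifelse answers "which source should each position use?", and the latter is what the question asks for.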

Related

drawing a value from a vector r

After removing the values from the vector from 1 to 100 I have the following vector:
w
[1] 2 5 13 23 24 39 41 47 48 51 52 58 61 62 70 71 72 90
I am now trying to draw values from this vector with the sample function
for(x in roznica)
{
if(licznik_2 != licznik_1 )
{
roznica_proces_2 <- sample(1:w, roznica)
} else {
roznica_proces_2 <- NA
}
}
I tried various combinations with the sample
If w is the name of the vector, then you would NOT use sample(1:w, ...). For one thing, 1:w doesn't really make sense, since the : operator expects its second argument to be a single number, while w holds 18 values here. Depending on what roznica is (and hopefully it is a single integer), you might use:
sample(w, roznica) # returns roznica values drawn at random from `w`, without replacement
The other problem is that you are currently overwriting any values from prior iterations of the for loop. So you might want to use:
roznica_proces_2[[x]] <- sample(w, roznica)
You would of course need to have initialized roznica_proces_2, perhaps with:
roznica_proces_2 <- list()
Regarding your query in the comment :
I am only concerned with the sample function itself. Here is an example: w is [1] 31, and I want to draw 1 number from it (which should be 31) with proces_nr_2 <- sample(w, 1). What do I get? proces_nr_2 is [1] 26.
That happens because when a vector has length 1, sample draws from 1 to that number. This is explained on the help page ?sample:
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x
So if you have only 1 number to sample just return that number directly instead of passing it in sample.
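One way to avoid special-casing the length-1 situation is to sample positions rather than values. The following is only a sketch; safe_sample is a hypothetical helper name, not part of base R, and it assumes size is no larger than length(x):

```r
# Hypothetical helper: draw `size` elements from x by sampling indices,
# so a length-1 numeric x is never expanded to 1:x
safe_sample <- function(x, size) {
  x[sample.int(length(x), size)]
}

w <- c(2, 5, 13, 23, 24, 39, 41, 47, 48, 51, 52, 58, 61, 62, 70, 71, 72, 90)

draw_many <- safe_sample(w, 3)   # three distinct values taken from w
draw_one  <- safe_sample(31, 1)  # the only possible result is 31
```

Because sample.int always works on 1:length(x), the value stored in x never changes what is being sampled from.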

Shortcut way to select a range of columns and add them up into a vector

I am doing a simple addition:
a <- data$a + data$b + data$c + data$d
However, I have a data set where there are 50 columns in the imported data, and I am wondering if there is a shortcut to select these, like:
data$a:data$z
And just add them up?
I know I can select a range simply with:
dataframe[11:60]
But then how to add them?
Edit:
A more concrete example
affect <- well_being_df$Affect1 + ... well_being_df$Affect50
affect
<labelled doubled>
[1] 21 23 43 8 10 ...
[38] 42 42 54 ...
[75] 23 14 42 23 ... etc
labels:
value label
0 Not at all
10 Completely
You can access your columns without "$" and still use their names:
rowSums(data[,c("a","b","c")])
If there are too many columns to type "a", "b", "c", ..., "z" by hand, you can build the names from their ASCII codes in a loop:
vec <- rep(0,10)
for (i in 1:10)
{
vec[i]<- intToUtf8(64+i)
}
This gives you "A", "B", ..., "J"; now you can use rowSums(data[,vec]).
As for the last question in your comment: when you use "," in data[], the row index goes before it and the column index after it. You can also use logical values inside data[], which is why the code above runs correctly.
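As a small illustration of the same idea, the columns can be selected by position, by name, or as a range between two named columns, and then passed to rowSums. The data frame below is made up for the example, and the column names id, a, b, c are assumptions rather than the asker's real data:

```r
# Toy data: one id column followed by three numeric columns to add up
data <- data.frame(id = 1:3,
                   a = c(1, 2, 3),
                   b = c(10, 20, 30),
                   c = c(100, 200, 300))

# By position: columns 2 to 4
total_by_position <- rowSums(data[, 2:4])

# By name: an explicit character vector of column names
total_by_name <- rowSums(data[, c("a", "b", "c")])

# As a range between two named columns, similar in spirit to data$a:data$c
cols <- which(names(data) == "a"):which(names(data) == "c")
total_by_range <- rowSums(data[, cols])
```

All three results hold one sum per row. If the columns contain missing values, rowSums has an na.rm argument that can be set to TRUE.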

Extract data distributed equally from a dataframe - R

I have a data.frame (df) with a varying number of rows (numElement), and I wish to extract from it X elements (numExtract) spread evenly across the df and store them in a new data frame (extractData). When I use the script below, I sometimes get a different number of elements in extractData (one more than numExtract). How can I fix it?
Script:
numElement<-400
df<-data.frame(seq(1:numElement))
numExtract<-5
extractData <- df[seq(1, nrow(df), by = round(nrow(df)/numExtract)),]
numElement<-400
df<-data.frame(seq(1:numElement))
numExtract<-7
extractData <- df[seq(1, nrow(df), by = round(nrow(df)/numExtract)),]
I cannot comment yet, but round without extra arguments rounds the number to the nearest integer.
In your first case you take every 80th element, and in the second case every 57th element (400/7 is about 57.14), which means you get the elements with indexes 1 58 115 172 229 286 343 400 (8 indexes in total).
Custom function
Use cut to obtain more intuitive intervals, and extract the breaks. It uses gsubfn::strapply for substring extraction.
library(gsubfn)
myfun <- function(maxval, numbreaks) {
require(gsubfn)
x <- unique(cut(1:maxval, numbreaks-1))
A <- sapply(x, function(Z) round(as.numeric(strapply(as.character(Z), "^[(](\\S+)[,]", perl=TRUE))))
A <- c(A, maxval)
return(A)
}
Output
myfun(400, 5)
# 1 101 200 300 400
myfun(400, 7)
# 1 68 134 200 267 334 400
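A dependency-free alternative, offered only as a sketch, is to ask seq for a fixed number of points with length.out instead of a fixed step, and then round the positions. The column name value and the variable idx are introduced here for illustration, and the sketch assumes numExtract is at least 2 and no larger than the number of rows:

```r
numElement <- 400
df <- data.frame(value = seq_len(numElement))
numExtract <- 7

# Ask for exactly numExtract positions between the first and last row,
# then round them to whole row numbers
idx <- round(seq(1, nrow(df), length.out = numExtract))

# drop = FALSE keeps the result a data frame even with a single column
extractData <- df[idx, , drop = FALSE]
nrow(extractData)
```

Since the number of positions is fixed up front, the result always has numExtract rows and always includes the first and last row.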

How to invert characters in a Vector in R language?

I'm a beginner exploring the R language. I was working on the data.frame below, and I would like to invert the Sex vector values, i.e. 'F' -> 'M' and 'M' -> 'F'.
sampleData
Age Height Sex
Alex 25 177 F
Lilly 31 163 F
Mark 23 190 M
Oliver 52 179 M
Martha 76 163 F
Lucas 49 183 M
Caroline 26 164 F
I tried three ways but couldn't hit the right approach.
Replaced F with M and vice-versa, but wouldn't affect the actual values in the Vector.
levels(Sex)[1] <- "F"
levels(Sex)[2] <- "M"
Tried below using 'mapvalues' function but still no changes.
library(plyr)
mapvalues(sampleData$Sex, from = c("F", "M"), to = c("M", "F"))
Converted Sex to a matrix and applied 'solve', but learnt it can be applied only on numeric matrix.
Sex <- as.matrix(Sex)
solve(sampleData$Sex)
Could someone please assist me on resolving the character inversion ?!
You can simply use ifelse like this.
For simplicity I have created a similar data frame:
sampleData <- data.frame(Age = c(25,31,23,15), Height = c(177,163,190,163), Sex = c("M","F","F","M"))
And then you can use ifelse
sampleData$Sex <- ifelse(sampleData$Sex=="F","M","F")
You could also write a function to convert it:
convertChar <- function(vec){
if(length(unique(vec))!=2)
stop("Vector has more or less than 2 unique values")
newChar <- ifelse(vec == unique(vec)[1],unique(vec)[2],unique(vec)[1])
return(newChar)
}
convertChar(c("M","M","F","F"))
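If Sex is stored as a factor, another option is to relabel its levels and assign the result back. The sketch below assumes the factor has exactly the two levels "F" and "M"; the names sex and swapped are illustrative. It also hints at why the mapvalues attempt in the question seemed to do nothing: the returned vector was never assigned back to sampleData$Sex.

```r
# Sex column from the question, stored as a factor (levels sort to "F", "M")
sex <- factor(c("F", "F", "M", "M", "F", "M", "F"))

# Reversing the level labels relabels every "F" as "M" and vice versa
swapped <- sex
levels(swapped) <- rev(levels(sex))

as.character(swapped)
```

On the real data this would be written back with an assignment such as sampleData$Sex <- swapped.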

subset rows + context

I haven't been able to figure out an easy way to include some context ( n adjacent rows ) around the rows I want to select.
I am more or less trying to mirror the -C option of grep to select some rows of a data.frame.
Ex:
a= data.frame(seq(1:100))
b = c(50, 60, 61)
Let's say I want a context of 2 lines around the rows indexed in b; the desired output should be the data frame subset of a with the rows 48,49,50,51,52,58,59,60,61,62,63
You can do something like this, but there may be a more elegant way to compute the indices :
a= data.frame(seq(1:100))
b = c(50, 60, 61)
context <- 2
indices <- as.vector(sapply(b, function(v) {return ((v-context):(v+context))}))
a[indices,]
Which gives :
[1] 48 49 50 51 52 58 59 60 61 62 59 60 61 62 63
EDIT: As @flodel points out, if the index ranges may overlap you must add the following line to drop the duplicates:
indices <- sort(unique(indices))
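The steps above can be wrapped into a small function. This is only a sketch: context_rows is a hypothetical name, and the clamping to the range 1..n is an addition not present in the original answer, included so that hits near the first or last row do not produce invalid indices:

```r
# Hypothetical helper: row indices of `hits` plus `context` rows on each side,
# de-duplicated, sorted, and limited to the valid range 1..n
context_rows <- function(hits, context, n) {
  idx <- unlist(lapply(hits, function(v) (v - context):(v + context)))
  idx <- sort(unique(idx))
  idx[idx >= 1 & idx <= n]
}

a <- data.frame(seq(1:100))
b <- c(50, 60, 61)

indices <- context_rows(b, 2, nrow(a))
a[indices, ]
```

With the values from the question, indices covers rows 48 to 52 and 58 to 63, which matches the desired output.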
