How to invert characters in a Vector in R language? - r

I'm a beginner exploring R. I am working with the data.frame below and would like to invert the values of the Sex vector, i.e. 'F' -> 'M' and 'M' -> 'F'.
sampleData
          Age Height Sex
Alex       25    177   F
Lilly      31    163   F
Mark       23    190   M
Oliver     52    179   M
Martha     76    163   F
Lucas      49    183   M
Caroline   26    164   F
I tried three approaches, but none of them worked.
Replaced F with M and vice versa, but this did not change the actual values in the vector.
levels(Sex)[1] <- "F"
levels(Sex)[2] <- "M"
Tried the 'mapvalues' function as below, but still no change.
library(plyr)
mapvalues(sampleData$Sex, from = c("F", "M"), to = c("M", "F"))
Converted Sex to a matrix and applied 'solve', but learned that it can only be applied to a numeric matrix.
Sex <- as.matrix(Sex)
solve(sampleData$Sex)
Could someone please help me invert the characters?

You can simply use ifelse for this.
For simplicity, I have created a similar data frame:
sampleData <- data.frame(Age = c(25,31,23,15), Height = c(177,163,190,163), Sex = c("M","F","F","M"))
And then you can use ifelse
sampleData$Sex <- ifelse(sampleData$Sex=="F","M","F")

You could also write a function to convert it:
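convertChar <- function(vec){
  # Only defined for vectors with exactly two distinct values
  if(length(unique(vec)) != 2)
    stop("Vector has more or less than 2 unique values")
  # Swap each element for the other unique value
  newChar <- ifelse(vec == unique(vec)[1], unique(vec)[2], unique(vec)[1])
  return(newChar)
}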
convertChar(c("M","M","F","F"))
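The question's first attempt touched the factor levels, so here is a minimal sketch of how relabelling the levels can do the swap. It assumes Sex is stored as a factor with exactly the two levels "F" and "M" (the shortened data frame and the name `swapped` are made up for illustration). Note that the result has to be assigned back to the column of the data frame, which the original attempt did not do.

```r
# Small stand-in for the question's data; Sex is made a factor explicitly,
# because data.frame() no longer converts strings to factors by default (R >= 4.0).
sampleData <- data.frame(
  Age    = c(25, 31, 23, 52),
  Height = c(177, 163, 190, 179),
  Sex    = factor(c("F", "F", "M", "M"))
)

swapped <- sampleData
# levels() are c("F", "M"); reversing the labels renames level 1 to "M"
# and level 2 to "F", which swaps every value in one step.
levels(swapped$Sex) <- rev(levels(swapped$Sex))

as.character(swapped$Sex)
```

If Sex is a plain character vector instead, the ifelse approach above is the simpler choice.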

Related

Shortcut way to select a range of columns and to addition to a vector

I am doing a simple addition:
a <- data$a + data$b + data$c + data$d
However, I have a data set with 50 columns in the imported data, and I am wondering if there is a shortcut to select these, like:
data$a:data$z
And just add them up?
I know I can select a range simply with:
dataframe[11:60]
But then how to add them?
Edit:
A more concrete example
affect <- well_being_df$Affect1 + ... well_being_df$Affect50
affect
<labelled doubled>
[1] 21 23 43 8 10 ...
[38] 42 42 54 ...
[75] 23 14 42 23 ... etc
labels:
value label
0 Not at all
10 Completely
You can access your columns without "$" and still use their names:
rowSums(data[,c("a","b","c")])
If you have too many columns to type "a b c d ... z" by hand, you can generate the names from their ASCII codes in a loop:
vec <- rep(0,10)
for (i in 1:10)
{
vec[i]<- intToUtf8(64+i)
}
This gives you "A", "B", ..., "J"; now you can use rowSums(data[,vec]).
About the last question in your comment: in data[], the part before the "," is the row index and the part after it is the column index. data[] also accepts logical values as an index, which is why the code above runs correctly.
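As a minimal sketch of the same idea with a positional range, which is what the question's dataframe[11:60] selects: the data frame below and its column names are hypothetical, and all selected columns are assumed to be numeric.

```r
# Hypothetical data: one id column followed by three numeric columns
data <- data.frame(
  id = 1:3,
  a  = c(1, 2, 3),
  b  = c(10, 20, 30),
  c  = c(100, 200, 300)
)

# Select the columns by position (rows left empty before the comma) and add them up per row
total_by_position <- rowSums(data[, 2:4])

# The same selection by name gives the same result
total_by_name <- rowSums(data[, c("a", "b", "c")])

total_by_position
```

rowSums also takes na.rm = TRUE if some of the columns contain missing values.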

see if there is a match in the characters in one column and perform a function on its rows in R

I am trying to find out whether the strings in one column of a data frame match a pattern and, where they match, apply a function to the numeric values in those rows.
How can I achieve this in R?
E.g. data frame:
A                      B     C      D
Hyper thread           760   85.49  889
Antihypertensive_drug  624   70.19  889
Strom practise         139   15.64  889
Antihypertensive_drug  44.8  67     29
If a value matches the string "antihypertensive", then I want to run a function on columns C and D, e.g. sum(df$C, df$D).
I tried using the approximate match from plotmath,
a = which(df$A %~~% "antihypertensive"), and then using that as an index to run the sum function, but no luck. Any suggestions, please?
Try
indx <- grepl('antihypertensive', df$A, ignore.case=TRUE)
with(df, ifelse(indx, C+D,NA))
Your desired output is unclear. If you only want the sum per row, you can use rowSums or Reduce:
indx <- grep('antihypertensive', df$A, ignore.case = TRUE) # borrowed this from @akrun
rowSums(df[indx, c("C", "D")])
# 2 4
# 959.19 96.00
Or
Reduce(`+`, df[indx, c("C", "D")])
# [1] 959.19 96.00
Or you can sum the whole thing
sum(df[indx, c("C", "D")])
# [1] 1055.19
You can use grep to index rows.
whichRows <- grep("antihypertensive", dataframe[,1], ignore.case=TRUE)
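Putting the pieces together, here is a self-contained sketch using the question's sample values. The column name `CD_sum` and the variable `matches` are made up for illustration; grepl is used so the result lines up with every row of the data frame.

```r
# The question's sample data, typed in by hand
df <- data.frame(
  A = c("Hyper thread", "Antihypertensive_drug",
        "Strom practise", "Antihypertensive_drug"),
  B = c(760, 624, 139, 44.8),
  C = c(85.49, 70.19, 15.64, 67),
  D = c(889, 889, 889, 29)
)

# TRUE for rows whose A contains the pattern, ignoring case
matches <- grepl("antihypertensive", df$A, ignore.case = TRUE)

# Row-wise C + D where A matches, NA elsewhere
df$CD_sum <- ifelse(matches, df$C + df$D, NA)

df
```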

conditional subsetting with square brackets or inside square brackets

I have two vectors, p1 and p2, that report the same information, except that p2 is more precise. I want to compare the two and pick the value from p2, unless the difference between the two vectors is > k. In that case I want the value from p1 to be picked for the final product "pd".
k <- 5
p1 <- c(21,43,62,88,119,156,264)
p2 <- c(19,42,62,84,104,156,262)
pd should look like:
pd <- c(19,42,62,84,119,156,262)
I have seen code that specifies the selection condition inside the square brackets, but I can't figure out how to reproduce it. Something similar to pd <- p2[p1, p1-p2 >5], but not exactly, because this obviously doesn't evaluate. p2[p1-p2<5] works to select the positive cases, but the 5th case, where the condition evaluates to FALSE, is skipped.
Maybe
ifelse(abs(p2-p1) <=k, p2, p1)
#[1] 19 42 62 84 119 156 262
Or without using ifelse
indx <- abs(p1-p2) >k
pd <- p2
pd[indx] <- p1[indx]
pd
#[1] 19 42 62 84 119 156 262
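To illustrate why the question's p2[p1-p2<5] skips the 5th case, here is a small sketch: a logical index drops every position where it is FALSE, so the result is shorter than the input. Using replace() keeps the full length instead. The name `use_p1` is made up for illustration.

```r
k  <- 5
p1 <- c(21, 43, 62, 88, 119, 156, 264)
p2 <- c(19, 42, 62, 84, 104, 156, 262)

# TRUE where the two vectors disagree by more than k
use_p1 <- abs(p1 - p2) > k

# Subsetting with a logical vector drops the FALSE positions,
# so only 6 of the 7 values come back
kept <- p2[!use_p1]

# replace() returns a full-length copy of p2 with the flagged positions
# overwritten by the corresponding values from p1
pd <- replace(p2, use_p1, p1[use_p1])
pd
```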

Changing multiple data frame column titles in r

The program that I am running creates three data frames using the following code:
datuniqueNDC <- data.frame(lapply(datlist, function(x) length(unique(x$NDC))))
datuniquePID <- data.frame(lapply(datlist, function(x) length(unique(x$PAYERID))))
datlengthNDC <- data.frame(lapply(datlist, function(x) length(x$NDC)))
They have outputs that look like this:
X182L X178L X76L
1 182 178 76
X34L X31L X7L
1 34 31 7
X10674L X10021L X653L
1 10674 10021 653
What I am trying to do is combine the rows together into one data frame with the desired outcome being:
X Y Z
1 182 178 76
2 34 31 7
3 10674 10021 653
but the rbind command doesn't work due to the names of all the columns being different. I can get it to work by using the colnames command after creating each variable above, but it seems like there should be a more efficient way to accomplish this by using one of the apply commands or something similar. Thanks for the help.
One way, since everything seems to be numeric, would be this:
mylist <- list(dat1,dat2,dat3)
# assuming your three data.frames are dat1:dat3 respectively
do.call("rbind",lapply(mylist, as.matrix))
# X182L X178L X76L
#[1,] 182 178 76
#[2,] 34 31 7
#[3,] 10674 10021 653
Basically, this works because the data are converted to matrices rather than data frames, and rbind on matrices does not require matching column names. You then only need to change the names once at the end.
Since the functions you use in your lapply calls return scalars, it would be easier to use sapply. sapply returns vectors, which you can rbind:
datuniqueNDC <- sapply(datlist, function(x) length(unique(x$NDC)))
datuniquePID <- sapply(datlist, function(x) length(unique(x$PAYERID)))
datlengthNDC <- sapply(datlist, function(x) length(x$NDC))
dat <- as.data.frame(rbind(datuniqueNDC,datuniquePID,datlengthNDC))
names(dat) <- c("x", "y", "z")
Another solution is to calculate all three of your statistics in one function:
dat <- as.data.frame(sapply(datlist, function(x) {
  # one column per element of datlist, one row per statistic
  c(length(unique(x$NDC)), length(unique(x$PAYERID)), length(x$NDC))
}))
names(dat) <- c("x", "y", "z")
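Since datlist is not shown in the question, here is a runnable sketch of the one-function approach with a made-up datlist of three small data frames. It uses vapply instead of sapply (a swapped-in variant, not what the answer above uses) so that the shape of the result is checked; the row names `uniqueNDC`, `uniquePID` and `lengthNDC` are chosen for illustration.

```r
# Hypothetical stand-in for datlist: three data frames with NDC and PAYERID columns
datlist <- list(
  X = data.frame(NDC = c(1, 1, 2),    PAYERID = c("a", "b", "b")),
  Y = data.frame(NDC = c(3, 4),       PAYERID = c("a", "a")),
  Z = data.frame(NDC = c(5, 5, 5, 6), PAYERID = c("c", "d", "e", "e"))
)

# One call computes all three statistics per data frame.
# length() returns an integer, so each result must be an integer vector of length 3.
stats <- vapply(datlist, function(x) {
  c(uniqueNDC = length(unique(x$NDC)),
    uniquePID = length(unique(x$PAYERID)),
    lengthNDC = length(x$NDC))
}, integer(3))

# Rows are the statistics, columns are the elements of datlist
dat <- as.data.frame(stats)
dat
```

Because the list elements are named, the columns come out as X, Y and Z without a separate names() call.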

subset rows + context

I haven't been able to figure out an easy way to include some context ( n adjacent rows ) around the rows I want to select.
I am more or less trying to mirror the -C option of grep to select some rows of a data.frame.
Ex:
a= data.frame(seq(1:100))
b = c(50, 60, 61)
Let's say I want a context of 2 lines around the rows indexed in b; the desired output should be the data frame subset of a with the rows 48,49,50,51,52,58,59,60,61,62,63
You can do something like this, but there may be a more elegant way to compute the indices :
a= data.frame(seq(1:100))
b = c(50, 60, 61)
context <- 2
indices <- as.vector(sapply(b, function(v) {return ((v-context):(v+context))}))
a[indices,]
Which gives :
[1] 48 49 50 51 52 58 59 60 61 62 59 60 61 62 63
EDIT: As @flodel points out, if the indices may overlap, you must add the following line:
indices <- sort(unique(indices))
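The steps above can be wrapped into a small helper. This is only a sketch: the function name `context_rows` and its arguments are made up for illustration, and the clipping to valid row numbers is an addition for hits near the first or last row, which the question does not cover.

```r
# Return the row numbers of `hits` plus `context` rows on either side,
# de-duplicated, sorted, and clipped to 1..n_rows
context_rows <- function(hits, context, n_rows) {
  idx <- unlist(lapply(hits, function(v) (v - context):(v + context)))
  idx <- sort(unique(idx))
  idx[idx >= 1 & idx <= n_rows]
}

a <- data.frame(value = 1:100)
b <- c(50, 60, 61)

rows <- context_rows(b, context = 2, n_rows = nrow(a))

# drop = FALSE keeps the result a data frame even with a single column
a[rows, , drop = FALSE]
```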
