R index variable shrunk to number of unique groups

I have a data frame, dat, with 214 rows of data. Each row contains two variables: Species, and Mode, which is either red or green. I have sorted the data by Species. I would like to create a numeric index variable where index = 0 if Mode is red and index = 1 otherwise.
Further, the index should only be as long as the number of unique species (N=72). For example, if there are 5 rows of speciesA, a red species, and 7 rows of speciesB, a green species, then row 1 = 0, row 2 = 1, and so on. Here is the code I have tried so far:
index <- for (q in 1:unique(species)) {
ifelse(mode[q]=='red',0,1)
}

index <- as.numeric(factor(my_dataframe$mode))
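A factor, under the hood, is stored as an integer, so the conversion from factor to numeric index is one-to-one. Note that the codes start at 1 and follow the level order, which is alphabetical by default (green = 1, red = 2), and that the result has one entry per row of the data frame.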
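The one-liner above returns one code per row rather than one per species. A minimal sketch of the per-species 0/1 index the question asks for is given below; it assumes each species has a single mode and that the columns are named Species and Mode. The toy data are made up for illustration.

```r
# Toy data (made up for illustration): 3 species, several rows each
dat <- data.frame(
  Species = c("speciesA", "speciesA", "speciesB", "speciesB", "speciesB", "speciesC"),
  Mode    = c("red", "red", "green", "green", "green", "red"),
  stringsAsFactors = FALSE
)

# Keep the first row of each species, so there is one entry per species
per_species <- dat[!duplicated(dat$Species), ]

# 0 for red, 1 for anything else (here: green)
index <- ifelse(per_species$Mode == "red", 0, 1)
names(index) <- per_species$Species
index
```

With the real data this should give a vector of length 72, provided Species really has 72 unique values.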

Related

R querying second dataframe by the range to which a numerical value in column belongs and returning the corresponding value

I have a dataframe column with numbers ranging from 0 - 50 (column A). I have another dataframe with two columns, one column shows a numerical range (column B) and the other shows a corresponding value (column C). In the first dataframe, I would like to add a column D that is the result of finding the range (in column B) to which the value in column A belongs and returning the corresponding value (column C).
First dataframe:
A
1
50

Second dataframe:
B      C
0-10   Low
...    ...
41-50  High

Desired result:
A   D
1   Low
50  High
If your categories are adjacent (no gaps between ranges), findInterval might do the trick (replace the second dataframe with a named lookup vector as appropriate). Example:
values = c(12, 2, 42)
## define categories by lower bound:
categories = c(low=0, middle=10, high=40)
names(categories)[
findInterval(values, categories)
]
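To add the result as column D of the first dataframe, the same call can index into a lookup table. The sketch below assumes the lookup is keyed by the lower bound of each range; the middle band ("Medium", starting at 11) and the extra values in A are made up, since the question elides the middle rows.

```r
# First data frame: values in column A (made-up values between 0 and 50)
df1 <- data.frame(A = c(1, 50, 10, 25))

# Lookup table keyed by the lower bound of each range (middle band is hypothetical)
lookup <- data.frame(
  lower = c(0, 11, 41),
  C     = c("Low", "Medium", "High"),
  stringsAsFactors = FALSE
)

# findInterval() returns, for each value, the index of the last lower bound <= value
df1$D <- lookup$C[findInterval(df1$A, lookup$lower)]
df1
```

Values below the smallest lower bound get index 0 from findInterval, so they would need separate handling before the lookup.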

How to replace values in the columns of a dataframe based on the values in the other column in R?

I have a dataframe containing the safety data for 100 patients. There are different safety factors for each patient with the size of that specific factor.
v1_d0_urt_redness v1_d0_urt_redness_size v1_d1_urt_redness v1_d1_urt_redness_size ...
P1 1 20
P2 1 NA
P3 0 NA
.
.
.
Here redness=1 means there was redness and redness=0 means there was no redness, and therefore the redness_size was not reported.
In order to find what proportion of the data is missing I need to code the data as follows:
if redness = 1 and redness_size = NA, then redness_size stays NA; else if redness = 0, then redness_size <- 0. I need this coded for d0, d1, ... and the process repeated for the other variables such as hardness, swelling, etc. Any ideas how one could implement this in R?
If I understand correctly what you are trying to do, and assuming your dataframe is called df, you only need to set redness_size to 0 where redness is 0; rows with redness = 1 and a missing size already hold NA and can be left alone:
red_cols <- grep("_redness$", colnames(df), value = TRUE)       # indicator columns
size_cols <- paste0(red_cols, "_size")                          # matching size columns
df[size_cols][!is.na(df[red_cols]) & df[red_cols] == 0] <- 0    # no redness: size is 0
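To repeat the recoding for other safety factors and then measure missingness, the step can be wrapped in a small helper. This is only a sketch: the helper name zero_fill_sizes, the swelling columns and the toy values are assumptions, and it relies on every indicator column having a partner named with the suffix _size.

```r
# Toy data (made up): two safety factors for one visit/day, four patients
df <- data.frame(
  v1_d0_urt_redness       = c(1, 1, 0, NA),
  v1_d0_urt_redness_size  = c(20, NA, NA, NA),
  v1_d0_urt_swelling      = c(0, 1, 0, 1),
  v1_d0_urt_swelling_size = c(NA, 5, NA, NA)
)

# Set size to 0 wherever the matching indicator column is 0;
# rows with indicator 1 (or NA) keep whatever size they had, including NA
zero_fill_sizes <- function(data, factor_name) {
  ind_cols <- grep(paste0("_", factor_name, "$"), colnames(data), value = TRUE)
  for (ind in ind_cols) {
    size_col <- paste0(ind, "_size")
    absent <- !is.na(data[[ind]]) & data[[ind]] == 0
    data[[size_col]][absent] <- 0
  }
  data
}

for (f in c("redness", "swelling")) df <- zero_fill_sizes(df, f)

# Proportion of sizes still missing, per size column
missing_share <- colMeans(is.na(df[grep("_size$", colnames(df))]))
missing_share
```

After the recoding, any NA left in a size column is a genuinely unreported size rather than an absent symptom.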

In R, how to group by value sign?

I have a data.frame with columns order, x and sign. I want to create groups of consecutive rows that share the same sign, keeping the row order. The sign column is 0 when the x value is positive and 1 when the x value is negative or zero. The output I want looks like this:
Group1: order = 0
Group2: order = 1 and 2
Group3: order = 3,4,5,6,7
Group4: order = 8,9,10,11
Group5: order = 12,13,14
Group6: order = 15
After that, I would like to calculate the mean of x values by my Group1,Group2....
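One possible way to build such groups is to start a new group whenever sign changes from one row to the next, then average x within each group. The sketch below uses made-up x values chosen so that the runs match the grouping listed above; the original table is not available.

```r
# Made-up data: 'sign' is 0 for positive x and 1 for negative or zero x
dat <- data.frame(
  order = 0:15,
  x     = c(2, -1, -3, 4, 1, 2, 5, 3, -2, -1, 0, -4, 6, 2, 1, -5)
)
dat$sign <- as.integer(dat$x <= 0)

# A new group starts whenever sign differs from the previous row
dat$group <- cumsum(c(TRUE, diff(dat$sign) != 0))

# Mean of x within each run of equal sign
group_means <- tapply(dat$x, dat$group, mean)
group_means
```

This assumes the rows are already sorted by order; if not, sort first with dat[order(dat$order), ].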

create lists that contain the rownumbers for which column i contains the maximum value of that row

In a dataframe of 4 columns, I'm looking for an elegant way to get 3 lists that contain the names from column 1 if the maximum of that row in which that name is, is respectively in column 2, 3 or 4.
the first column contains parameter names,
column 2 a shapiro test outcome on the raw data of parameter x
column 3, shapiro test outcome of log10 transformed data for parameter x
column 4, shapiro test outcome of a custom transformation given by the user for parameter x
if this is the data:
Parameter xval xlog10val xcustomval
1 FWS.Range 0.62233371 0.9741614 0.9619065
2 FL.Red.Range 0.48195980 0.9855781 0.9643206
3 FL.Orange.Range 0.43338087 0.9727243 0.8239867
4 FL.Yellow.Range 0.53554943 0.9022795 0.9223407
5 FL.Red.Gradient 0.35194524 0.9905047 0.5718224
6 SWS.Range 0.46932823 0.9487955 0.9825318
7 SWS.Length 0.02927791 0.4565962 0.7309313
8 FWS.Fill.factor 0.93764311 0.8039806 0.0000000
9 FL.Red.Total 0.22437754 0.9655873 0.9923307
QUESTION: how do I get a list of all parameter names where xlog10val is the highest of the three columns (xval, xlog10val, xcustomval)?
detailed explanation, ignore perhaps. ....
list 1, the rows where xval is the highest value, should be looking like this: 'FWS.Fill.factor' since that is the only row where xval has the highest score
list 2 is the list of all rows where xlog10val is the maximum value, and thus should contain the names of parameters where xlog10val is the maximum of that row:
'FWS.Range', 'FL.Red.Range', 'FL.Orange.Range',
'FL.Red.Gradient', 'FWS.Fill.factor'
and list 3 the rest of the names
I tried something like
df$Parameter[which(df$xval == max(df[ ,2:4]))]
but this gives integer(0) results.
EDIT
to clarify:
Lets start with looking at column 2 (xval).
PER row I need to test whether xval is the maximum of the 3 columns; xval, xlog10val, xcustomval
if this is the case, add the parameter in THAT row to the list of xval_is_the_max_of_3_columns list
Then we do the same PER row for xlog10val. IF xlog10val in row i is max of columns 2:4, add the name of that ROW to xlog10val_is_the_max_of_3_columns list.
To make the DF:
df <- data.frame(Parameter = c('FWS.Range', 'FL.Red.Range', 'FL.Orange.Range', 'FL.Yellow.Range', 'FL.Red.Gradient','SWS.Range','SWS.Length','FWS.Fill.factor','FL.Red.Total'),
xval = c(0.622333705577588,0.481959800402278,0.433380866119736,0.535549430820635,0.351945244290616,0.469328232931424,0.0292779051823701,0.93764311477813,0.224377540663707),
xlog10val = c( 0.974161367853916,0.985578135386898,0.97272429360688,0.902279501804112,0.990504657326703,0.94879549470406,0.45659620937997,0.803980592920426,0.965587334461157),
xcustomval = c(0.961906534164457,0.964320569400919,0.823986745004031,0.922340716468745,0.571822393107348,0.982531798077881,0.73093132928955,0,0.992330722386105))
We can use max.col to get the column index of the maximum value in each row, and then split 'Parameter' by that index:
i1 <- max.col(df[-1], 'first')
split(df$Parameter, i1)
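The split above returns a list named "1", "2", "3". A small variation, sketched here with the values rounded as printed in the question, names each list after its column; the names score_cols and winners are only illustrative.

```r
# Data from the question, rounded to the printed precision
df <- data.frame(
  Parameter  = c("FWS.Range", "FL.Red.Range", "FL.Orange.Range", "FL.Yellow.Range",
                 "FL.Red.Gradient", "SWS.Range", "SWS.Length", "FWS.Fill.factor",
                 "FL.Red.Total"),
  xval       = c(0.62233371, 0.48195980, 0.43338087, 0.53554943, 0.35194524,
                 0.46932823, 0.02927791, 0.93764311, 0.22437754),
  xlog10val  = c(0.9741614, 0.9855781, 0.9727243, 0.9022795, 0.9905047,
                 0.9487955, 0.4565962, 0.8039806, 0.9655873),
  xcustomval = c(0.9619065, 0.9643206, 0.8239867, 0.9223407, 0.5718224,
                 0.9825318, 0.7309313, 0.0000000, 0.9923307),
  stringsAsFactors = FALSE
)

score_cols <- names(df)[-1]

# Column position of the row-wise maximum (first column wins ties)
i1 <- max.col(as.matrix(df[score_cols]), ties.method = "first")

# One list element per score column, in column order
winners <- split(df$Parameter, factor(score_cols[i1], levels = score_cols))
winners
```

Using a factor with explicit levels keeps an empty element for any column that never holds the row maximum.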
EDIT: Based on the discussion with @Mark
I'm not sure exactly how you're selecting the parameters for lists two and three; however, you can try something like this as well:
df$Parameter <- as.character(df$Parameter)
par.xval.max <- df[which.max(df$xval), "Parameter"]
par.col3.gt.max <- df[df$xlog10val > max(df$xval), "Parameter"]
par.rem <- df$Parameter[! df$Parameter %in% c(par.xval.max, par.col3.gt.max)]
In this case, the second list holds the parameters whose column-three value (xlog10val) is greater than max(df$xval), and the remaining parameters are taken by negative selection using %in%.

Representing experimental conditions/groups in R with one variable, when raw data uses 3

How can I transform values of three separate variables in R to create new values in a single, combined variable? I have experimental data with three conditions, 'negative', 'control', and 'pro'. The raw data records who was in which condition (each participant/row could only be in one condition) by putting a '1' in a variable named for that condition; the value is missing if a participant was not in that condition. I would like to create a single variable called "Manip", with values of -1 (for those with a value of 1 in the negative condition), 0 (for those with a value of 1 in the control condition), and 1 (for those in the pro condition). Thank you!
Supposing that your data frame is named df and its condition columns are named negative, control and pro:
df$Manip[df$negative==1] <- -1
df$Manip[df$control==1] <- 0
df$Manip[df$pro==1] <- 1
Alternatively, you could also turn this into a labelled factor, like so:
# factor() accepts levels and labels; as.factor() does not
df$Manip <- factor(df$Manip,
levels=c(-1,0,1),
labels=c('negative','control','pro'))
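For a version that can be run end to end, here is a sketch of the recoding on made-up data. It assumes the three columns are named negative, control and pro as in the question, with a 1 for the assigned condition and NA elsewhere; ManipLabel is a hypothetical name for the labelled copy.

```r
# Toy data (made up): a 1 marks the assigned condition, NA everywhere else
df <- data.frame(
  negative = c(1, NA, NA, 1),
  control  = c(NA, 1, NA, NA),
  pro      = c(NA, NA, 1, NA)
)

# which() drops NAs, so each assignment touches only the rows flagged with 1
df$Manip <- NA_real_
df$Manip[which(df$negative == 1)] <- -1
df$Manip[which(df$control == 1)]  <- 0
df$Manip[which(df$pro == 1)]      <- 1

# Optional labelled version of the same variable
df$ManipLabel <- factor(df$Manip, levels = c(-1, 0, 1),
                        labels = c("negative", "control", "pro"))
df
```

Starting from NA_real_ means any participant with no condition flagged stays NA in Manip, which makes data-entry gaps easy to spot.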
