Referencing Variable Row Number in R

I am new to R and would appreciate your help with the following question:
I have code that runs through all of the values (x) in a column of a dataset called m, comparing them one by one to a fixed value via a for loop. I'd like for x to be compared to my fixed value (0.17) ONLY IF the cell in m[(SAME ROW AS x), "reference_column_name"] contains a certain string.
The goal is to end up with a new column in m holding the values 0, 1, 2, or 3, based on the comparison of x and on the cell of the reference column in the same row as x. Something like this:
new_column
0
2
2
3
1
1
2
0
3
How do I refer to the row of x (as my variable is changing as the for loop continues)?
With what can I replace "(SAME ROW AS x)"?
This is my code:
m$new_column <- 0 # I start by assigning everything the value 0.
for (x in m$current_column) {
  if ((grepl("string",((m[(SAME ROW AS x),"reference_column_name"])),fixed=TRUE))==TRUE){
    if (is.na(x)){
      m$new_column<-0
    }
    else if (x <= 0.17) {
      m$new_column<-1}
    else if (x > 0.17) {
      m$new_column<-2}
  }
  else {m$new_column<-3}
}
I have changed all of the variable and column names to make reading this question easier - I am aware that names should be shorter.
Thanks for your help!

As per my understanding of your question, here is my solution:
m$new_column <- ifelse(grepl("string", m$ref_column), ifelse(is.na(m$x), 0, ifelse(m$x <= 0.17, 1, 2)), 3)
This code first checks for the string in the reference column of the same row. If the string is not found, the result is 3. If it is found, the second ifelse runs: it first checks for NA and assigns 0; otherwise the third ifelse checks whether your "x" column is 0.17 or less, assigning 1 if so and 2 if not.
Hope this helps
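To make the nested ifelse easier to try out, here is a minimal sketch using the column names from the question. The data frame contents are made up for illustration, and has_string is just a helper name I introduced.

```r
# Hypothetical data; only the column names come from the question
m <- data.frame(
  current_column = c(0.10, 0.25, NA, 0.17, 0.30),
  reference_column_name = c("a string", "a string", "a string", "other", "a string"),
  stringsAsFactors = FALSE
)

# TRUE where the reference column contains the literal text "string"
has_string <- grepl("string", m$reference_column_name, fixed = TRUE)

# ifelse works on whole columns at once, so no row index is needed
m$new_column <- ifelse(
  has_string,
  ifelse(is.na(m$current_column), 0,
         ifelse(m$current_column <= 0.17, 1, 2)),
  3
)

print(m)
```

Because every test is evaluated for all rows together, each element is automatically matched with the reference cell in the same row.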

Could use a series of properly indexed assignments:
dat <- data.frame( x=runif(20), ref_col=sample( c("string", "not string"), 20, replace=TRUE) )
dat$new_col[dat$x > 0.17 & dat$ref_col=="string"] <- 2
dat$new_col[dat$x <= 0.17 & dat$ref_col=="string"] <- 1
dat$new_col[ is.na(dat$x)] <- 0
dat$new_col[ dat$ref_col != "string"] <- 3
dat
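I didn't have any NAs in my x's, but I expect they would be assigned properly: with a length-one replacement value, an NA in a logical index selects nothing, so those rows are only touched by the is.na() line.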
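As for the literal question of what to put in place of "(SAME ROW AS x)": one option is to loop over row indices rather than over the values, so the index is available for both columns. This is only a sketch with made-up data; the column names follow the question.

```r
# Hypothetical data; only the column names come from the question
m <- data.frame(
  current_column = c(0.10, 0.25, NA, 0.17),
  reference_column_name = c("has string", "has string", "has string", "other"),
  stringsAsFactors = FALSE
)

m$new_column <- 0
for (i in seq_len(nrow(m))) {
  x <- m$current_column[i]                      # value in row i
  if (grepl("string", m[i, "reference_column_name"], fixed = TRUE)) {
    if (is.na(x)) {
      m$new_column[i] <- 0                      # assign to row i only
    } else if (x <= 0.17) {
      m$new_column[i] <- 1
    } else {
      m$new_column[i] <- 2
    }
  } else {
    m$new_column[i] <- 3
  }
}

print(m)
```

Note the [i] on the left-hand side: writing m$new_column <- 1 without it overwrites the whole column on every pass.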

Related

Fill the gaps, depending on the length of missing values and the last & previous known value in R

Consider the following dataset:
df<-data.frame(ID=c(1,2), Value_1=c(1,7), Value_2= c(NA,10), Value_3=c(NA,13), Value_4=c(7,NA))
What I would like to achieve is this:
df_target<-data.frame(ID=c(1,2), Value_1=c(1,7), Value_2= c(3,10), Value_3=c(5,13), Value_4=c(7,16))
As you can see here, we have two different issues:
In the first row I would like to do the following operation:
"(last_known + previous_known)/number_of_elements" and add this number to the previous known value, proceeding until you reach the last known value:
i.e.
(1+7)/4=2 --> 1; 1+2; 1+2+2; 7
The second one is to use lm() to predict the last value.
But how do I combine these? The first case is the most challenging part.
I guess it should be done with median(last_known, previous_known), then somehow counting the missing values and mapping them to an na_count_id, and then adding the product of the median and the corresponding na_count_id to the previous known value:
previous_known_value + na_count_id*median
Thanks in advance for your help!
Here is a solution that works. This should work even if there is an NA in the first column, based on testing I did. Basically, I iterate over every row by column. The increaser variable is the amount by which the column must be increased over the previous column to get the pattern you are looking to achieve.
library(tidyverse)
df <- column_to_rownames(df, var = "ID") # need to convert ID column to rownames
for(i in 1:nrow(df)){
  # increaser is calculated by taking the range of the row and dividing by the
  # difference between the indices of the max and min of the row
  increaser <- as.numeric((range(df[i,], na.rm = TRUE)[2] - range(df[i,], na.rm = TRUE)[1])/(which.max(df[i,]) - which.min(df[i,])))
  for(j in 1:ncol(df)){ # this iterates through every column
    if(is.na(df[i,j])){
      if(j == 1){ # special calculation needed for first column since there's no previous column to increase by
        # this finds the next non-NA column for that row and subtracts from it
        # the difference in the index positions multiplied by the increaser
        df[i, j] <- df[i, min(which(!is.na(df[i,])))] - increaser*(min(which(!is.na(df[i,])))-j)
      } else {
        df[i, j] <- df[i, j-1] + increaser # this is for an NA position which is not in the first column
      }
    } else {
      df[i, j] <- df[i, j] # if a position is not NA, no calculations needed
    }
  }
}
# this loop returns the following. You can convert the row ID back to a column if desired.
#   Value_1 Value_2 Value_3 Value_4
# 1       1       3       5       7
# 2       7      10      13      16
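Since the question also mentions lm(), here is an alternative sketch that handles both cases the same way: fit a straight line through the known values of each row and fill every NA from that line. This is my own assumption about what is wanted, not the method above. It reproduces df_target because the known values in each row lie exactly on a line; for other data it would give a least-squares fit rather than exact interpolation. fill_linear and value_cols are names I made up.

```r
df <- data.frame(ID = c(1, 2), Value_1 = c(1, 7), Value_2 = c(NA, 10),
                 Value_3 = c(NA, 13), Value_4 = c(7, NA))

# Positions of the Value_ columns (assumes they are named Value_1, Value_2, ...)
value_cols <- grep("^Value_", names(df))

# Fill the NAs of one row from a straight line fitted to its known values
fill_linear <- function(y) {
  if (!anyNA(y)) return(y)                 # nothing to fill
  d <- data.frame(pos = seq_along(y), y = y)
  fit <- lm(y ~ pos, data = d)             # lm() drops the NA rows by default
  missing <- is.na(d$y)
  y[missing] <- predict(fit, newdata = d[missing, , drop = FALSE])
  y
}

# apply() returns one column per input row, hence the t()
filled <- t(apply(as.matrix(df[value_cols]), 1, fill_linear))
df[value_cols] <- filled

print(df)
```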

How to drop a buffer of rows in a data frame around rows of a certain condition

I am trying to remove rows in a data frame that are within x rows after rows meeting a certain condition.
I have a data frame with a response variable, a measurement type that represents the condition, and time. Here's a mock data set:
data <- data.frame(rlnorm(45,0,1),
c(rep(1,15),rep(2,15),rep(1,15)),
seq(
from=as.POSIXct("2012-1-1 0:00", tz="EST"),
to=as.POSIXct("2012-1-1 0:44", tz="EST"),
by="min"))
names(data) <- c('Variable','Type','Time')
In this mock case, I want to delete the first 5 rows in condition 1 after condition 2 occurs.
The way I thought about solving this problem was to generate a separate vector that determines the distance that each observation that is a 1 is from the last 2. Here's the code I wrote:
dist = vector()
for(i in 1:nrow(data)) {
  if(data$Type[i] != 1) dist[i] <- 0
  else {
    position = i
    tempcount = 0
    while(position > 0 && data$Type[position] == 1){
      position = position - 1
      tempcount = tempcount + 1
    }
    dist[i] = tempcount
  }
}
This code will do the trick, but it's extremely inefficient. I was wondering if anyone had some cleverer, faster solutions.
If I understand you correctly, this should do the trick:
criteria1 = which(data$Type[2:nrow(data)] == 2 & data$Type[2:nrow(data)] != data$Type[1:(nrow(data)-1)]) + 1
criteria2 = as.vector(sapply(criteria1,function(x) seq(x,x+5)))
data[-criteria2,]
How it works:
criteria1 contains the indices where Type == 2 but the previous row has a different type. The strange-looking subsets like 2:nrow(data) are there because we want to compare each row to the previous one, and the first row has no previous row. Therefore we add +1 at the end.
criteria2 contains sequences running from each index in criteria1 to that index + 5.
The third line performs the subset.
This might need small modification, I wasn't exactly clear what criteria 1 and criteria 2 were from your code. Let me know if this works or you need any more advice!
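If the goal is exactly what the question states (drop the first 5 rows of Type 1 that follow a stretch of Type 2), a variation along these lines might be closer. It is only a sketch: buffer, starts, drop and kept are names I introduced, and it assumes each Type 1 stretch after a Type 2 stretch is at least as long as the buffer.

```r
set.seed(1)  # only to make the mock data reproducible
data <- data.frame(
  Variable = rlnorm(45, 0, 1),
  Type = c(rep(1, 15), rep(2, 15), rep(1, 15)),
  Time = seq(from = as.POSIXct("2012-01-01 00:00", tz = "EST"),
             by = "min", length.out = 45)
)

buffer <- 5  # number of Type 1 rows to drop after each Type 2 stretch

# Rows where Type is 1 and the previous row is 2 (the first row has no previous row)
starts <- which(data$Type == 1 & c(FALSE, head(data$Type, -1) == 2))

# The buffer rows following each start, capped at the last row of the data
drop <- unlist(lapply(starts, function(s) s:min(s + buffer - 1, nrow(data))))
drop <- drop[data$Type[drop] == 1]   # never drop a Type 2 row

# Negative indexing with an empty vector would return no rows, so guard against it
kept <- if (length(drop) > 0) data[-drop, ] else data

print(drop)
print(nrow(kept))
```

Everything here is vectorised apart from the short lapply over the transition points, so it avoids the row-by-row while loop.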

How do i shift the values in a column of a data frame either up or down?

I'm trying to write some code that effectively shifts the values in the first column of a dataframe either up or down. The conditions for it moving up or down are as follows:
1) If the difference between the value directly below the selected element in the data frame 'playerlist' and the value of the selected element is less than OR equal to the difference between the value directly above the selected element in the data frame and the value of the selected element, then the data in the first column shifts up (i.e. playerlist[1, 1] becomes playerlist[2, 1], playerlist[2, 1] becomes playerlist[3, 1], etc.).
2) If the converse is true (i.e. the difference between the value directly below the selected element in the data frame 'playerlist' and the value of the selected element is strictly more than the difference between the value directly above the selected element in the data frame and the value of the selected element), then the data shifts down (i.e. playerlist[3, 1] becomes playerlist[2, 1], playerlist[2, 1] becomes playerlist[1, 1], etc.).
3) If neither the value above nor the value below the selected element is less than the selected value, then nothing happens.
NB:
*number_of_players is an external input; in the example below it is running with value 7 (i.e. playerlist contains 7 rows).
**Take x to be the row of the selected data (i.e. the selected data is always playerlist[x, 1]).
dicear <- function(x){ #x is the player playing the card
  y <- x-1
  z <- x+1
  if(x <- 1){
    y <- number_of_players
  }
  if(x <- number_of_players){
    z <- 1
  }
  if(playerlist[x, 1]>playerlist[z, 1] & (playerlist[x, 1]-playerlist[z, 1]) >= (playerlist[x, 1] - playerlist[y, 1])){
    for(i in 1:nrow(playerlist)){
      dummy <- i+1
      if(i <- nrow(playerlist)){
        dummy <- 1
      }
      else{
        dummy <- i+1
      }
      playerlist[i, 1] <<- playerlist[dummy, 1]
    }
  }
  else {
    if(playerlist[x, 1]>playerlist[y, 1] & (playerlist[x, 1]-playerlist[y, 1]>(playerlist[x, 1]-playerlist[z, 1]))){
      for(i in 1:nrow(playerlist)){
        dummy <- i-1
        if(i <- 1){
          dummy <- nrow(playerlist)
        }
        else{
          dummy <- i-1
        }
        playerlist[i, 1] <<- playerlist[dummy, 1]
      }
    }
  }
}
To help clarify your question, I am providing some guidelines that make this problem easier for me to approach. Shifting data around in a vector is simpler to reason about than moving data in the columns of a data frame, and a data frame column can always be extracted as a vector. Your question asks to evaluate the differences between the previous value (i-1) and the following value (i+1), where i is the position being evaluated. As given, this excludes the first and final values: the first value has no previous value and the final value has no next value for the difference calculation. I will focus on a single position in a given vector.
Given the vector z <- c(1, 2, 8, 4, 5), let's go through the procedure given by your guidelines. For simplicity, calculate the absolute value of the differences when evaluating z[2].
> abs(z[1] - z[2])
[1] 1
> abs(z[2] - z[3])
[1] 6
Since abs(z[2] - z[3]) > abs(z[1] - z[2]), the value z[2] shifts down, modifying the vector to z <- c(1,8,2,4,5).
Repeating the procedure on z[2], which is now 8, gives the following result: z <- c(1,2,8,4,5) which is the original vector. So instead let's test z[3], which is 2: z <- c(1,2,8,4,5). Again back to the original vector. My thinking may be flawed. Please provide examples in the comments if I have made a mistake.
That considered, the following may be useful.
z <- c(1,2,8,4,5)
for(i in 1:5) print(c(z[i], z[-i]))
If all you want to do is shift a particular value around, the simple for loop given above will print() those sequences, where i is the iteration variable running from 1 through 5 (1:5). As i advances, the resulting vector moves the value at z[i] to the first position and keeps the remaining values in their original order. None of the values are lost.
> for(i in 1:5) print(c(z[i], z[-i]))
[1] 1 2 8 4 5
[1] 2 1 8 4 5
[1] 8 1 2 4 5
[1] 4 1 2 8 5
[1] 5 1 2 8 4
You can also calculate the differences in the vector using diff().
> diff(z)
[1] 1 6 -4 1
> abs(diff(z))
[1] 1 6 4 1
The values calculated by diff() are essentially the same values you wish to evaluate where the first value given by diff() is the difference between z[1] and z[2]; the second is the difference between z[2] and z[3], and so on. You could perform comparisons using difference values from diff().
> diffs <- diff(z)
> diffs[1] > diffs[2] | diffs[2] <= diffs[3]
[1] FALSE
> diffs[2] > diffs[3] | diffs[3] <= diffs[4]
[1] TRUE
Keep in mind that z has 5 elements whereas diffs has only 4 elements.
You may provide some details about your questions such as an example of playerlist and output for what changes you propose to see in the data frame.
Another option that may or may not be helpful is the sort() function. When I first read your question I thought maybe you were trying to sort your data one step at a time, for example to change a player's ranking according to a turn of a game. You can arrange your data from smallest to largest using sort().
> sort(z)
[1] 1 2 4 5 8
There is an interesting principle here to explore.
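One more observation on the code in the question: if(x <- 1) assigns 1 to x and then tests the value 1, so it is always TRUE; a comparison needs ==. The same applies to if(i <- nrow(playerlist)) and if(i <- 1) inside the loops. Also, copying element by element in a loop overwrites values that are still needed later in the same pass. A circular shift can instead be written in one step on the column as a vector. The sketch below uses helper names (shift_up, shift_down) and a playerlist that I made up for illustration.

```r
# Element i takes the value from position i + 1; the last takes the first
shift_up <- function(v) c(v[-1], v[1])

# Element i takes the value from position i - 1; the first takes the last
shift_down <- function(v) c(v[length(v)], v[-length(v)])

# Hypothetical player list with a single column
playerlist <- data.frame(score = c(10, 20, 30, 40))

up <- shift_up(playerlist[, 1])
down <- shift_down(playerlist[, 1])
print(up)
print(down)

# Writing the shifted column back into the data frame
playerlist[, 1] <- up
```

If the function should change playerlist, returning the modified data frame and assigning the result is usually easier to follow than <<-.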

Operations on elements of column vectors

I have a column vector containing 1's. I also have another numeric column containing numbers.
Example:
day_eq day
1 1
1 5
1 3
1 2
I now want to say:
If an element from day is smaller than its corresponding element in day_eq,
make invalid (a column vector element) = 5.
This is my code:
for (i in 1:nrow(setin)){
  if (setin[[i,"day"]]<setin[[i,"day_eq"]]){
    setin[[i,"valid"]] = 0
    setin[[i,"invalid_code"]] = 5
  }
}
It isn't working. It keeps saying:
Error in if (setin[[i, "day"]] < setin[[i, "day_eq"]]) { :
missing value where TRUE/FALSE needed
or
In if (test.ID1$day_eq > test.ID1$day) { :
the condition has length > 1 and only the first element will be used
Where test.ID1 is the set name.
You don't need a loop for that. I'm not sure exactly what you are doing... but ifelse should be able to help you...
setin$valid <- ifelse(setin$day < setin$day_eq, 0, NA)
setin$invalid_code <- ifelse(setin$day < setin$day_eq, 5, NA)
Your data is
day_eq <- c(1,1,1,1)
day <- c(1,5,3,2)
setin <- data.frame(day_eq,day)
The solution using dplyr is
library(dplyr)
setin %>% mutate(invalid = ifelse(day < day_eq, 5, 0))
I used setin as the data frame name; you also mention test.ID1, so substitute that name if needed.
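On the two messages in the question: if() needs a single TRUE or FALSE. "missing value where TRUE/FALSE needed" appears when the comparison returns NA, which happens when day or day_eq is NA in that row. "the condition has length > 1" appears when whole columns are compared inside if(). A vectorised version with an explicit NA check avoids both. The sketch below uses made-up values (including an NA) to show this, and assumes rows start out as valid = 1, which the question does not state.

```r
# Hypothetical data with one NA and one day smaller than day_eq
setin <- data.frame(day_eq = c(1, 1, 1, 1), day = c(0, 5, NA, 2))

setin$valid <- 1                 # assumption: every row starts as valid
setin$invalid_code <- NA_real_

# TRUE only where day is known and smaller than day_eq
bad <- !is.na(setin$day) & setin$day < setin$day_eq

setin$valid[bad] <- 0
setin$invalid_code[bad] <- 5

print(setin)
```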

Selectively replacing columns in R with their delta values

I've got data being read into a data frame in R, by column. Some of the columns will increase in value; for those columns only, I want to replace each value (n) with its difference from the previous value in that column. For example, looking at an individual column, I want
c(1,2,5,7,8)
to be replaced by
c(1,3,2,1)
which are the differences between successive elements
However, it's getting really late in the day, and I think my brain has just stopped working. Here's my code at present
col1 <- c(1,2,3,4,NA,2,3,1) # This column rises and falls, so we want to ignore it
col2 <- c(1,2,3,5,NA,5,6,7) # Note: this column always rises in value, so we want to replace it with deltas
col3 <- c(5,4,6,7,NA,9,3,5) # This column rises and falls, so we want to ignore it
d <- cbind(col1, col2, col3)
d
fix_data <- function(data) {
  # Iterate through each column...
  for (column in data[,1:dim(data)[2]]) {
    lastvalue <- 0
    # Now walk through each value in the column,
    # checking to see if the column consistently rises in value
    for (value in column) {
      if (is.na(value) == FALSE) { # Need to ignore NAs
        if (value >= lastvalue) {
          alwaysIncrementing <- TRUE
        } else {
          alwaysIncrementing <- FALSE
          break
        }
      }
    }
    if (alwaysIncrementing) {
      print(paste("Column", column, "always increments"))
    }
    # If a column is always incrementing, alwaysIncrementing will now be TRUE
    # In this case, I want to replace each element in the column with the delta between successive
    # elements. The size of the column shrinks by 1 in doing this, so just prepend a copy of
    # the 1st element to the start of the list to ensure the column length remains the same
    if (alwaysIncrementing) {
      print(paste("This is an incrementing column:", colnames(column)))
      column <- c(column[1], diff(column, lag=1))
    }
  }
  data
}
fix_data(d)
d
If you copy/paste this code into RGui, you'll see that it doesn't do anything to the supplied data frame.
Besides losing my mind, what am I doing wrong??
Thanks in advance
Without addressing the code in any detail, you're assigning values to column, which is a local variable within the loop (i.e. there is no relationship between column and data in that context). You need to assign those values to the appropriate value in data.
Also, data is local to your function, so calling fix_data(d) on its own leaves d unchanged; you need to assign the function's return value back, e.g. d <- fix_data(d).
Incidentally, you can use diff to see if any value is incrementing rather than looping over every value:
idx <- apply(d, 2, function(x) !any(diff(x[!is.na(x)]) < 0))
d[,idx] <- blah
diff calculates the difference between consecutive values in a vector. You can apply it to each column in a dataframe using, e.g.
dfr <- data.frame(x = c(1,2,5,7,8), y = (1:5)^2)
as.data.frame(lapply(dfr, diff))
  x y
1 1 3
2 3 5
3 2 7
4 1 9
EDIT: I just noticed a few more things. You are using a matrix, not a data frame (as you stated in the question). For your matrix 'd', you can use
d_diff <- apply(d, 2, diff)
#Find columns that are (strictly) increasing
incr <- apply(d_diff, 2, function(x) all(x > 0, na.rm=TRUE))
#Replace values in the appropriate columns
d[2:nrow(d),incr] <- d_diff[,incr]
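Putting these points together, a version of fix_data that loops over column indices and writes back into data might look like the sketch below. It keeps the >= test from the question (so repeated values still count as rising) and prepends the first element as the question describes; treat it as one possible reading rather than the definitive fix.

```r
d <- cbind(
  col1 = c(1, 2, 3, 4, NA, 2, 3, 1),  # rises and falls: left alone
  col2 = c(1, 2, 3, 5, NA, 5, 6, 7),  # never falls: replaced with deltas
  col3 = c(5, 4, 6, 7, NA, 9, 3, 5)   # rises and falls: left alone
)

fix_data <- function(data) {
  for (j in seq_len(ncol(data))) {
    column <- data[, j]
    # Differences between successive non-NA values
    steps <- diff(column[!is.na(column)])
    if (all(steps >= 0)) {
      # Assign into data itself, not into the loop variable
      data[, j] <- c(column[1], diff(column))
    }
  }
  data
}

d <- fix_data(d)  # the result has to be assigned back
print(d)
```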
