How to subtract a value from specific values in a column in R

I am working on a data frame with a column that should give hours of sleep per night. However, using the difftime() function has produced negative values (the number of hours asleep) for some rows and positive values (the number of hours awake) for others. I want to subtract 24 from only the values that are above 0, so I have done:
data$Sleep.time <- with(data = data,
difftime(Bed.time, Waking.up.time, units = "hours"))
data$Sleep.time <- as.numeric(data$Sleep.time)
data$subtract <- (24)
data$Sleep.time <- if (data$Sleep.time>0) {data$Sleep.time - data$subtract}
This just takes 24 away from all of the values, so the values that were already negative are now completely wrong. I'm not quite sure how to use the if function so that this works properly. Any help would be great!

if is not vectorized, i.e. it expects a logical expression of length 1, and the 'Sleep.time' column has more than one element. We can either use ifelse, or create a logical index and use it to subtract and assign:
i1 <- data$Sleep.time> 0
data$Sleep.time[i1] <- data$Sleep.time[i1] - data$subtract[i1]
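To see the index approach in isolation, here is a minimal sketch on a made-up vector (the values in sleep are hypothetical, not the asker's data). As background: in R 4.2.0 and later a condition of length greater than one in if() is an error, while earlier versions used only the first element and issued a warning, which would be consistent with 24 being subtracted from every row.

```r
# Hypothetical sleep differences in hours; positive entries need 24 subtracted
sleep <- c(-8, 16.5, -7.25, 17)

# Logical index marking only the positive entries
i1 <- sleep > 0

# Subtract 24 from those entries and leave the rest untouched
sleep[i1] <- sleep[i1] - 24

sleep  # every value is now negative: -8, -7.5, -7.25, -7
```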

You could try using ifelse, something like this:
data$Sleep.time <- ifelse(data$Sleep.time > 0, data$Sleep.time - 24, data$Sleep.time)
Syntax: ifelse(condition, value if true, value if false). When the condition is a vector, it returns a vector of the same length.
Hope it helps. This is vectorized, so it is much faster than a loop.
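A small sketch of how ifelse behaves element by element, again on hypothetical values. It also shows that a missing value in the input stays missing in the result.

```r
# Hypothetical values, including one missing observation
sleep <- c(-8, 16.5, NA, 17)

# Each element is tested separately; NA in the condition yields NA in the result
fixed <- ifelse(sleep > 0, sleep - 24, sleep)

fixed  # -8, -7.5, NA, -7
```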

Related

How to remove some values from a 4-dimensional matrix?

I'm working with a 4-dimensional matrix (Year, Simulation, Flow, Time instant: 10x5x20x10) in R. I need to remove some values from the matrix. For example, for year 1 I need to remove simulations number 1 and 2; for year 2 I need to remove simulation number 5.
Can anyone suggest how I can make such changes?
Arrays (which is how R documentation usually refers to higher-dimensional 'matrices') can be indexed with negative values in the same way as matrices or vectors: a negative value removes the corresponding row/column/slice. So if you wanted to remove year 1 completely (for example), you could use a[-1,,,]; to remove simulation 5 completely, a[,-5,,].
However, arrays can't be "ragged": there has to be something in every row/column/slice combination. You could replace the values you want to remove with NAs (and then make sure to account for the NAs appropriately when computing, e.g. using na.rm = TRUE in sum()/min()/max()/median()/etc.): a[1,1:2,,] <- NA or a[2,5,,] <- NA in your examples.
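A minimal sketch of both ideas, assuming an array with the dimensions from the question and filled with a placeholder value of 1 (the real data would go in its place):

```r
# Placeholder array: Year x Simulation x Flow x Time instant
a <- array(1, dim = c(10, 5, 20, 10))

# Negative indices drop whole slices and return a smaller array
dim(a[-1, , , ])  # 9 5 20 10  (year 1 removed)
dim(a[, -5, , ])  # 10 4 20 10 (simulation 5 removed for every year)

# Removing simulations for one year only: blank them out with NA instead
a[1, 1:2, , ] <- NA  # year 1, simulations 1 and 2
a[2, 5, , ] <- NA    # year 2, simulation 5

n_missing <- sum(is.na(a))    # 600 cells set to NA
total <- sum(a, na.rm = TRUE) # 9400, NAs ignored
```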
If you knew that all values of Flow and Time would always be present, you could store your data as a list of lists of matrices: e.g.
results <- list(Year1 = list(Simulation1 = matrix(...),
Simulation2 = matrix(...),
...),
Year2 = list(Simulation1 = matrix(...),
Simulation2 = matrix(...),
...))
Then you could easily remove years or simulations within years (by setting them to NULL), but it would make indexing a little bit harder (e.g. "retrieve Simulation1 values for all years" would require an lapply or a loop across years).
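As a rough sketch of the list-of-lists layout, assuming two years with five simulations each and zero-filled 20 x 10 matrices as stand-ins for the Flow x Time data (make_year is a hypothetical helper written only for this illustration):

```r
# Hypothetical helper: one year holding n_sim Flow x Time matrices
make_year <- function(n_sim) {
  sims <- lapply(seq_len(n_sim), function(i) matrix(0, nrow = 20, ncol = 10))
  names(sims) <- paste0("Simulation", seq_len(n_sim))
  sims
}

results <- list(Year1 = make_year(5), Year2 = make_year(5))

# Assigning NULL removes list elements
results$Year1[c("Simulation1", "Simulation2")] <- NULL
results$Year2$Simulation5 <- NULL

# Retrieving one simulation across all years needs lapply;
# years where it was removed come back as NULL
sim1_by_year <- lapply(results, function(year) year[["Simulation1"]])
```

The trade-off is visible in the last step: removal is trivial, but anything that cuts across years has to handle the missing entries explicitly.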

R programming: error when adding a column to a dataset

cv.uk.df$new.d[2:nrow(cv.uk.df)] <- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1) # this line of code works
I wanted to know why we use -1 in tail() and -1 in head() to create this new column.
I tried to understand by removing the -1, and R (the code is run in RStudio) throws an error.
Could anyone shed some light on this? I can't explain how much I would appreciate it.
Look at what is being done. On the left-hand side of the assignment operator, we have:
cv.uk.df$new.d[2:nrow(cv.uk.df)] <-
Let's pick this apart.
cv.uk.df # This is the data.frame
$new.d # a new column to assign or a column to reassign
[2:nrow(cv.uk.df)] # the rows which we are going to assign
Specifically, this line of code will assign a new value to all rows of this column except the first. Why would we want to do that? We don't have your data, but from your example, it looks like you want to calculate the change from one row to the next. That calculation is invalid for the first row (there is no previous row).
Now let's look at the right-hand side.
<- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1)
The cv.uk.df$deaths column has the same number of rows as the data.frame. R gets grouchy when the numbers of elements don't follow some rules. For data.frames, the right-hand side needs to have the same number of elements, or a number that can be recycled a whole number of times. For example, if you have 10 rows, you need to have a replacement of 10 values. Or you can have 5 values that R will recycle.
If your data.frame has 100 rows, only 99 are being replaced in this operation. You cannot feed 100 values into an operation that expects 99. We need to trim the data. Let's look at what is happening. The tail() function has the usage tail(x, n) and returns the last n values of x. If n is negative, tail() instead returns all values except the first |n|. The head() function works the same way from the other end: with a negative n it returns all values except the last |n|.
tail(cv.uk.df$deaths, -1) # This returns all values but the first
head(cv.uk.df$deaths, -1) # This returns all values but the last
This makes sense for your calculation. You cannot subtract the number of deaths in the row before the first row from the number in the first row, nor can you subtract the number of deaths in the last row from the number in the row after the last row. There are more intuitive ways to do this thing using functions from other packages, but this gets the job done.
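A minimal sketch on a hypothetical four-row data frame (the deaths values are invented) shows what the two trimmed vectors look like and that their difference matches base R's diff():

```r
# Hypothetical cumulative counts, standing in for cv.uk.df$deaths
cv <- data.frame(deaths = c(1, 3, 6, 10))

tail(cv$deaths, -1)  # 3 6 10 (all but the first)
head(cv$deaths, -1)  # 1 3 6  (all but the last)

# Row-to-row change: one element shorter than the column
change <- tail(cv$deaths, -1) - head(cv$deaths, -1)
change  # 2 3 4

# diff() computes the same lagged differences
same_as_diff <- isTRUE(all.equal(change, diff(cv$deaths)))

# Padding with NA for the first row gives a full-length column
cv$new.d <- c(NA, diff(cv$deaths))
```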

Is it possible to create a countif like function in R using ranges?

I've already read this question with an approach to counting entries in R:
how to realize countifs function (excel) in R
I'm looking for a similar approach, except that I want to count data that is within a given range.
For example, let's say I have this dataset:
data <- data.frame( values = c(1,1.2,1.5,1.7,1.7,2))
Following the approach on the linked question, we would develop something like this:
count <- data$values == 1.5
sum(count)
The problem is that I want to include in the count anything within 0.2 of 1.5, that is, all possible numbers from 1.3 to 1.7.
Is there a way to do so?
sum(data$values>=1.3 & data$values<=1.7)
As the explanation in the question you linked to points out, when you just write out a boolean condition, it generates a vector of TRUEs and FALSEs the same length as your original vector. TRUE counts as 1 and FALSE as 0, so summing across it gives you a count. So it simply becomes a matter of writing your condition as a boolean expression. In the case of more than one condition, you connect them with & (and) or | (or), much the same way as in Excel (only in Excel you have to use AND() or OR()).
(For a more general solution, you can use dplyr::between. It's also supposed to be faster since it's implemented in C++. In this case, it would be sum(between(data$values, 1.3, 1.7)).)
Like @doviod writes, you can use a compound logical condition.
My approach is different: I wrote a function that takes the vector and, as the range, a center value and a distance delta.
After a suggestion by @doviod, I set the default to delta = 0, so that if only value is passed, the function returns
a count of the cases where the values equal the value the user provides.
(doviod, in the comment)
countif <- function(x, value, delta = 0)
sum(value - delta <= x & x <= value + delta)
data <- data.frame( values = c(1,1.2,1.5,1.7,1.7,2))
countif(data$values, 1.5, 0.2)
#[1] 3
which identifies the locations of all values in your vector that satisfy your criterion, and length then counts the 'hits'.
length( which(data$values>=1.3 & data$values<=1.7) )
[1] 3
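One practical difference between the sum() and length(which()) forms shows up when the data contain missing values. The sketch below uses a hypothetical vector with one NA:

```r
# Hypothetical values with a missing observation
x <- c(1, 1.5, NA, 1.7)

in_range <- x >= 1.3 & x <= 1.7  # FALSE TRUE NA TRUE

n_sum <- sum(in_range)                # NA, because the NA propagates
n_sum_rm <- sum(in_range, na.rm = TRUE)  # 2
n_which <- length(which(in_range))    # 2, which() silently skips NA
```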

Count duration of value in vector in R

I am trying to count the length of occurrences of a value in a vector such as
q <- c(1,1,1,1,1,1,4,4,4,4,4,4,4,4,4,4,4,4,6,6,6,6,6,6,6,6,6,6,1,1,4,4,4)
Actual vectors are longer than this, and are time based. What I would like would be an output for 4 that tells me it occurred for 12 time steps (before the vector changes to 6) and then 3 time steps. (Not that it occurred 15 times total).
Currently my ideas to do this are pretty inefficient (a loop that looks element by element that I can have stop when it doesn't equal the value I specified). Can anyone recommend a more efficient method?
x <- with(rle(q), data.frame(values, lengths)) will pull the information that you want (courtesy of d.b. in the comments).
From the R Documentation: rle is used to "Compute the lengths and values of runs of equal values in a vector – or the reverse operation."
y <- x[x$values == 4, ] will subset the data frame to include only the value of interest (4). You can then see clearly that 4 ran for 12 times and then later for 3.
Modifying the code will let you check whatever value you want.
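Put together, the approach looks roughly like this. The vector is rebuilt with rep() to match the run structure described in the question (12 fours, then later 3 fours); the starts column is an addition for illustration, not part of the original answer.

```r
# Same run structure as the question's q: 6 ones, 12 fours, 10 sixes, 2 ones, 3 fours
q <- c(rep(1, 6), rep(4, 12), rep(6, 10), rep(1, 2), rep(4, 3))

runs <- rle(q)
x <- data.frame(values = runs$values, lengths = runs$lengths)

# Optional extra: the position at which each run begins
x$starts <- cumsum(runs$lengths) - runs$lengths + 1L

# Keep only the runs of the value of interest
y <- x[x$values == 4, ]
y$lengths  # 12 3
y$starts   # 7 31
```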

An elegant way to count number of negative elements in a vector?

I have a data vector with 1024 values and need to count the number of negative entries. Is there an elegant way to do this without looping and checking if an element is <0 and incrementing a counter?
You want to read 'An Introduction to R'. Your answer here is simply
sum( x < 0 )
which works thanks to vectorisation. The x < 0 expression returns a vector of booleans over which sum() can operate (by converting the booleans to standard 0/1 values).
There is a good answer to a related question from Steve Lianoglou: How to identify the rows in my dataframe with a negative value in any column?
Let me just replicate his code with one small addition (4th point).
Imagine you had a data.frame like this:
df <- data.frame(a = 1:10, b = c(1:3,-4, 5:10), c = c(-1, 2:10))
This will return you a boolean vector of which rows have negative values:
has.neg <- apply(df, 1, function(row) any(row < 0))
Here are the indexes for negative numbers:
which(has.neg)
Here is a count of rows with negative numbers:
length(which(has.neg))
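For an all-numeric data frame, the same row flags can also be computed without apply(). This is an alternative sketch using rowSums(), not the quoted answer's method:

```r
df <- data.frame(a = 1:10, b = c(1:3, -4, 5:10), c = c(-1, 2:10))

# df < 0 is a logical matrix; rowSums counts the negatives in each row
has.neg <- rowSums(df < 0) > 0

neg_rows <- unname(which(has.neg))  # rows 1 and 4
n_neg_rows <- sum(has.neg)          # 2
```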
The solutions above need to be tweaked in order to apply them to a data frame.
The command below gets the count of negative values, or of values satisfying any other logical condition.
Suppose you have a dataframe:
df <- data.frame(x=c(2,5,-10,NA,7), y=c(81,-1001,-1,NA,-991))
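In order to get the count of negative records in x (which() drops the all-NA row that a bare df$x < 0 index would otherwise add to the result):
nrow(df[which(df$x < 0), ])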
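Because this example data contains NA, it is worth checking how each counting form treats missing values. The sketch below compares a bare logical row index with NA-aware alternatives on the same df:

```r
df <- data.frame(x = c(2, 5, -10, NA, 7), y = c(81, -1001, -1, NA, -991))

# A logical index containing NA selects an extra all-NA row
n_bare <- nrow(df[df$x < 0, ])       # 2, one more than the true count

# NA-aware counts for a single column
n_sum <- sum(df$x < 0, na.rm = TRUE)  # 1
n_which <- length(which(df$x < 0))    # 1

# Negative counts for every column at once
neg_per_col <- colSums(df < 0, na.rm = TRUE)  # x = 1, y = 3
```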
