Changing an (empty) numeric value to 0 [duplicate] - r

I have a vector 'y' and I count the different values using table:
y <- c(0, 0, 1, 3, 4, 4)
table(y)
# y
# 0 1 3 4
# 2 1 1 2
However, I also want the result to include the fact that there are zero 2's and zero 5's. Can I use table() for this?
Desired result:
# y
# 0 1 2 3 4 5
# 2 1 0 1 2 0

Convert your variable to a factor, and set the categories you wish to include in the result using levels. Values with a count of zero will then also appear in the result:
y <- c(0, 0, 1, 3, 4, 4)
table(factor(y, levels = 0:5))
# 0 1 2 3 4 5
# 2 1 0 1 2 0
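If a plain named vector is easier to work with than a table object, the same idea can be sketched as follows. This is only an illustration: the names `upper` and `counts_vec` are made up here, and the upper bound of 5 is assumed from the desired result above (using max(y) would stop at 4).

```r
y <- c(0, 0, 1, 3, 4, 4)
upper <- 5  # assumed upper bound; not derivable from y itself

# Declaring the levels up front makes table() report zero counts too
counts <- table(factor(y, levels = 0:upper))

# Optional: drop the table class and keep a named integer vector
counts_vec <- setNames(as.integer(counts), names(counts))
counts_vec
```

Values of y that are not among the declared levels become NA in the factor and are not counted by default.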

Related

Classify column values into new column with specific output

I want to classify the rows of a data frame based on a threshold applied to a given numeric reference column. If the reference column has a value below the threshold, then the result is 0, which I want to add to a new column. If the reference column value is over the threshold, then the new column will have value 1 in all consecutive rows with value over the threshold until a new 0 result comes up. If a new reference value is over the threshold then the value to add is 2, and so on.
If we set up the threshold > 2 then an example of what I would like to obtain is:
row  reference  result
  1          2       0
  2          1       0
  3          4       1
  4          3       1
  5          1       0
  6          6       2
  7          8       2
  8          4       2
  9          1       0
 10          3       3
 11          6       3
row <- c(1:11)
reference <- c(2,1,4,3,1,6,8,4,1,3,6)
result <- c(0,0,1,1,0,2,2,2,0,3,3)
table <- cbind(row, reference, result)
Thank you!
We can use run-length encoding (rle) for this.
The below assumes a data.frame:
r <- rle(quux$reference <= 2)
# TRUE runs (at or below the threshold) become 0; each FALSE run gets the next counter value
r$values <- ifelse(r$values, 0, cumsum(!r$values))
quux$result2 <- inverse.rle(r)
quux
# row reference result result2
# 1 1 2 0 0
# 2 2 1 0 0
# 3 3 4 1 1
# 4 4 3 1 1
# 5 5 1 0 0
# 6 6 6 2 2
# 7 7 8 2 2
# 8 8 4 2 2
# 9 9 1 0 0
# 10 10 3 3 3
# 11 11 6 3 3
Data
quux <- structure(list(row = 1:11, reference = c(2, 1, 4, 3, 1, 6, 8, 4, 1, 3, 6), result = c(0, 0, 1, 1, 0, 2, 2, 2, 0, 3, 3)), row.names = c(NA, -11L), class = "data.frame")
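To see what the run-length encoding holds at each step, here is a small sketch on the same reference values. It numbers the above-threshold runs with cumsum(!r$values), which gives the same result on this data and should also cope with data that starts above the threshold. The variable name `result2` simply mirrors the column name used above.

```r
reference <- c(2, 1, 4, 3, 1, 6, 8, 4, 1, 3, 6)

r <- rle(reference <= 2)  # TRUE marks runs at or below the threshold
r$lengths                 # 2 2 1 3 1 2
r$values                  # TRUE FALSE TRUE FALSE TRUE FALSE

# Below-threshold runs become 0; above-threshold runs are numbered 1, 2, 3, ...
r$values <- ifelse(r$values, 0, cumsum(!r$values))

# Expand the per-run values back to one value per row
result2 <- inverse.rle(r)
result2
```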
As noted in the comments by @Sotos, consider an alternative name for your object, since table is also the name of a base R function.
Since it wasn't clear if data.frame or matrix, assume we have a data.frame df based on your data:
df <- as.data.frame(table)
And have a threshold of 2:
threshold = 2
You can adapt this solution by @flodel:
# TRUE where reference is above the threshold
x <- df$reference > threshold
df$new_result <- ifelse(x, cumsum(c(x[1], diff(x) == 1)), 0)
df
Here, diff(x) returns a vector in which a value of 1 marks where result should be incremented by cumsum (in the sample data, this occurs in rows 3, 6, and 10). These are transitions from FALSE to TRUE (0 to 1), where reference goes from at or below the threshold to above it. Note that x[1] is prepended because diff returns a vector one element shorter than x.
ifelse then keeps these incremented values only in rows where reference exceeds the threshold; all other rows are set to 0.
Output
row reference result new_result
1 1 2 0 0
2 2 1 0 0
3 3 4 1 1
4 4 3 1 1
5 5 1 0 0
6 6 6 2 2
7 7 8 2 2
8 8 4 2 2
9 9 1 0 0
10 10 3 3 3
11 11 6 3 3
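The intermediate vectors behind this approach can be inspected one by one. The sketch below uses the same sample data; the names `steps`, `starts`, and `run_id` are illustrative only and do not appear in the original answer.

```r
reference <- c(2, 1, 4, 3, 1, 6, 8, 4, 1, 3, 6)
threshold <- 2

x <- reference > threshold     # F F T T F T T T F T T
steps <- diff(x)               # 0 1 0 -1 1 0 0 -1 1 0 (one element shorter than x)
starts <- c(x[1], steps == 1)  # TRUE where an above-threshold run begins
run_id <- cumsum(starts)       # 0 0 1 1 1 2 2 2 2 3 3

# Keep the run number only where reference is above the threshold
new_result <- ifelse(x, run_id, 0)
new_result
```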

Replace the same values in the consecutive rows and stop replacing once the value has changed in R

I want to replace the run of identical consecutive values at the beginning of each trial with 0, but once the value changes, the replacement should stop and the remaining values should be kept. This should happen for every trial of every subject.
For example, the first subject has multiple trials (1, 2, etc.). At the beginning of each trial, there may be some consecutive rows with the same value (e.g., 1, 1, 1). I would like to replace these values with 0. However, once the value has changed from 1 to 0, I want to keep the values in the rest of the trial (e.g., 0, 0, 1).
subject <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
trial <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
value <- c(1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1)
df <- data.frame(subject, trial, value)
Thus, from the original data frame, I would like to have a new variable (value_new) like below.
subject trial value value_new
1 1 1 1 0
2 1 1 1 0
3 1 1 1 0
4 1 1 0 0
5 1 1 0 0
6 1 1 1 1
7 1 2 1 0
8 1 2 1 0
9 1 2 0 0
10 1 2 1 1
11 1 2 1 1
12 1 2 1 1
I was thinking of using the tidyverse with group_by(subject, trial) and mutate() to create a new variable with a conditional statement, but I have no idea how to do that. I guess I need to use rle(), but again, I have no clue how to replace the leading consecutive values with 0, stop replacing once the value has changed, and keep the rest of the values.
Any suggestions or advice would be really appreciated!
You can use rleid from data.table:
library(data.table)
setDT(df)[, new_value := value * +(rleid(value) > 1), .(subject, trial)]
df
# subject trial value new_value
# 1: 1 1 1 0
# 2: 1 1 1 0
# 3: 1 1 1 0
# 4: 1 1 0 0
# 5: 1 1 0 0
# 6: 1 1 1 1
# 7: 1 2 1 0
# 8: 1 2 1 0
# 9: 1 2 0 0
#10: 1 2 1 1
#11: 1 2 1 1
#12: 1 2 1 1
You can also do this with dplyr (rleid still comes from data.table, so it is called with its namespace here):
library(dplyr)
df %>%
  group_by(subject, trial) %>%
  mutate(new_value = value * +(data.table::rleid(value) > 1))
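If neither package is available, roughly the same logic can be sketched in base R. This is an assumption-laden illustration rather than part of the original answers: `run_index` is a hypothetical helper that mimics what rleid does for a numeric vector without missing values, and ave() applies it within each subject/trial group.

```r
df <- data.frame(
  subject = rep(1, 12),
  trial   = rep(1:2, each = 6),
  value   = c(1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1)
)

# Hypothetical helper: 1 for the first run of equal values, 2 for the next, ...
run_index <- function(v) cumsum(c(TRUE, diff(v) != 0))

# Within each subject/trial group, zero out the first run and keep the rest
df$new_value <- ave(df$value, df$subject, df$trial,
                    FUN = function(v) v * (run_index(v) > 1))
df
```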

A condition to all variable in r

I want to make a table consisting of 0 and 1.
If a variable is larger than 0, it will be 1 otherwise 0.
As the dataset has over 1,000 columns, I assume I should use a function such as sapply for this.
How do I write the code?
We can specify the condition and replace the value for a data frame. No "apply" family function is needed.
# Create an example data frame
dt <- data.frame(A = c(0, 1, 2, 3, 4),
B = c(4, 6, 8, 0, 7),
C = c(0, 0, 5, 5, 2))
# View dt
dt
# A B C
# 1 0 4 0
# 2 1 6 0
# 3 2 8 5
# 4 3 0 5
# 5 4 7 2
# Replace values larger than 0 to be 1
dt[dt > 0] <- 1
# View dt again
dt
# A B C
# 1 0 1 0
# 2 1 1 0
# 3 1 1 1
# 4 1 0 1
# 5 1 1 1
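Note that the assignment above only touches values greater than 0, so negative values would stay as they are. If the data can contain negatives or NA, a column-wise sketch like the following may be closer to "1 if larger than 0, otherwise 0". The example data frame is made up for illustration, and NA is deliberately left as NA.

```r
# Made-up example with a negative value and a missing value
dt <- data.frame(A = c(0, 1, -2, 3),
                 B = c(4, NA, 8, 0))

# Compare each column with 0 and store the result as 0/1 integers;
# dt[] keeps the data frame structure and column names
dt[] <- lapply(dt, function(col) as.integer(col > 0))
dt
```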

Counting all values in number range for each column with DPLYR

I have the following sample data frame:
df <- data.frame("Alpha" = c(NA, NA, 6, 5, 4, 6, 5, 3), "Beta" = c(3, 3, 4, 2, 6, NA, NA, NA), "Gamma" =c(6, 2, 3, 1, NA, NA, 5, 4))
From this data, I would like to get a count of all values between 0 and 6 for each column. The data frame does not contain all values between 0 and 6, so the final output would look something like this:
result <- data.frame("value"=c(0, 1, 2, 3, 4, 5, 6),
"Alpha"=c(0, 0, 0, 1, 1, 2, 2),
"Beta"=c(0, 0, 1, 2, 1, 0, 1),
"Gamma"=c(0, 1, 1, 1, 1, 1, 1))
value Alpha Beta Gamma
0 0 0 0
1 0 0 1
2 0 1 1
3 1 2 1
4 1 1 1
5 2 0 1
6 2 1 1
My first inclination was to reiterate the distinct() function in dplyr. I was thinking of using something like this:
df.alpha <- df %>% distinct(Alpha)
df.beta <- df %>% distinct(Beta)
df.gamma <- df %>% distinct(Gamma)
Afterward, I would bind them together. However, I encounter three issues:
There's a lot of copy and pasting here (there are more columns in my real data frame)
The results do not have the same length, which makes binding difficult; and
"0" is not a value in the original table, so it does not get counted in the results.
I found a similar question in this stackoverflow post on counting a specific value in multiple columns at once. However, unlike that post, the issue I have here is that there is no variable to "group by".
Do folks have any suggestions on how I can produce a count of values between a range of integers for all columns? Thanks so much!
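Maybe something like this (rows 1 to 7 of the output correspond to the values 0 to 6):
df[] <- lapply(df, function(x) factor(x, levels = 0:6))
# nbins = 7 keeps all seven bins even if the highest level never occurs in a column
data.frame(lapply(df, tabulate, nbins = 7))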
Alpha Beta Gamma
1 0 0 0
2 0 0 1
3 0 1 1
4 1 2 1
5 1 1 1
6 2 0 1
7 2 1 1
A one-liner similar to joran's answer is
cbind.data.frame(values=0:6, sapply(df, function(x) table(factor(x, levels=0:6))))
this returns
values Alpha Beta Gamma
0 0 0 0 0
1 1 0 0 1
2 2 0 1 1
3 3 1 2 1
4 4 1 1 1
5 5 2 0 1
6 6 2 1 1
Replacing table with the tabulate function should speed up the result and also simplify the output.
Another idea with tidyverse:
library(dplyr)
library(purrr)
df %>%
mutate_all(factor, levels = 0:6) %>%
map_dfc(~ c(table(.))) %>%
cbind(values = 0:6, .)
Result:
values Alpha Beta Gamma
1 0 0 0 0
2 1 0 0 1
3 2 0 1 1
4 3 1 2 1
5 4 1 1 1
6 5 2 0 1
7 6 2 1 1
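For completeness, the tabulate variant mentioned above can be sketched in base R as follows. This is one possible arrangement, not the answerers' exact code: the names `vals`, `counts`, and `result` are illustrative, and nbins is passed explicitly so that every column yields the same number of bins.

```r
df <- data.frame(Alpha = c(NA, NA, 6, 5, 4, 6, 5, 3),
                 Beta  = c(3, 3, 4, 2, 6, NA, NA, NA),
                 Gamma = c(6, 2, 3, 1, NA, NA, 5, 4))
vals <- 0:6

# One integer vector of length(vals) counts per column; NA entries are ignored
counts <- vapply(
  df,
  function(x) tabulate(factor(x, levels = vals), nbins = length(vals)),
  integer(length(vals))
)

result <- data.frame(value = vals, counts)
result
```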

counting number of same values and printing them in R

I have a vector with repeated numbers. I want to count the number of repeated numbers and print the output.
This is my input:
deg <- c(2, 1, 4, 3, 2, 4, 2, 5, 2, 2, 1, 2)
df <- data.frame(table(deg))
This is my output:
deg Freq
1 1 2
2 2 6
3 3 1
4 4 2
5 5 1
Here in my output I want to print the data frame from 0 to 5, where 0 is the starting element and 5 is the max element in the vector. The output I want to get is:
deg Freq
1 0 0
2 1 2
3 2 6
4 3 1
5 4 2
6 5 1
Someone please help with this!!!
If we're starting from df we can just unpack the data, add zero as a factor level, then re-tabulate:
f <- with(df, factor(rep(deg, Freq), levels = union(0, levels(deg))))
as.data.frame(table(deg = f))
# deg Freq
# 1 0 0
# 2 1 2
# 3 2 6
# 4 3 1
# 5 4 2
# 6 5 1
If we're starting with the vector deg, it's easier. We can just add zero as a factor level then tabulate:
f <- factor(deg, levels = union(0, sort(unique(deg))))
as.data.frame(table(deg = f))
# deg Freq
# 1 0 0
# 2 1 2
# 3 2 6
# 4 3 1
# 5 4 2
# 6 5 1
Try this:
df <- data.frame(deg=seq(0,max(deg)),
Freq=sapply(seq(0,max(deg)),function(x) length(which(deg==x))))
Output:
deg Freq
1 0 0
2 1 2
3 2 6
4 3 1
5 4 2
6 5 1
You can add a row to df:
#convert deg from factor back to numeric
df$deg = as.numeric(as.character(df$deg))
# add 0 deg with 0 freq if it doesn't exist already in df
if (!any(df$deg == 0)) {
df = rbind(df, c(0,0))
# sort df by deg
df = df[order(df$deg),]
}
Try this
rbind(data.frame(deg=0, Freq=0)[!(c(0) %in% deg)], as.data.frame(table(deg)))
# deg Freq
# 1 0 0
# 2 1 2
# 3 2 6
# 4 3 1
# 5 4 2
# 6 5 1
The expand_df function below can help you get the desired output
deg = c(2, 1, 4, 3, 2, 4, 2, 5, 2, 2, 1, 2)
df = as.data.frame(table(deg))
expand_df = function(df){
upd_list = 0: max(as.numeric(as.character(df[,1])))
upd_df = as.data.frame(upd_list)
merged_df = merge(upd_df, df,all.x=TRUE,by.x=colnames(upd_df)[1], by.y=colnames(df)[1])
merged_df[,2] = ifelse(is.na(merged_df[,2]),0,merged_df[,2])
merged_df
}
expand_df(df)
