I pulled a data.frame from the internet and need to shift 5 specific rows (out of 168) completely to the left by one column. I thought it best to add a column to the front of the data.frame and move the rows over, but I have been unsuccessful. For example, I need something like this:
a b c d e   >>>   a b c d e
0 1 2 3 4         0 1 2 3 4
0 0 1 2 3         0 1 2 3 NA
0 1 2 3 4         0 1 2 3 4
If you know which rows you want to shift, you can replace the first value(s) of these rows with NA, and then use hacksaw::shift_row_values.
library(hacksaw)
library(magrittr)  # provides the %>% pipe
data[2, "a"] <- NA
data %>%
shift_row_values(at = 2)
a b c d e
1 0 1 2 3 4
2 0 1 2 3 NA
3 0 1 2 3 4
Data:
data <- read.table(header = TRUE, text = "
a b c d e
0 1 2 3 4
0 0 1 2 3
0 1 2 3 4 ")
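Another possible solution, based on base R (with the same data stored as df; rows 2 and 3 are both shifted here to show that several rows can be handled at once):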
rows <- 2:3
df[rows,] <- cbind(df[rows, -1], NA)
df
#> a b c d e
#> 1 0 1 2 3 4
#> 2 0 1 2 3 NA
#> 3 1 2 3 4 NA
You can replace a row with an offset part plus NA like this:
dat[2,] <- c(dat[2, 2:5], NA)
Data:
dat <- read.table(text="
a b c d e
0 1 2 3 4
0 0 1 2 3
0 1 2 3 4",
header=TRUE)
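If several rows need shifting (5 of 168 in the question), the row-wise replacement above can be wrapped in a loop. shift_left() is a hypothetical helper name introduced here; it assumes, like the example, that the first column is the one to drop and that NA is an acceptable filler for the last column.

```r
dat <- read.table(text = "
a b c d e
0 1 2 3 4
0 0 1 2 3
0 1 2 3 4", header = TRUE)

# Hypothetical helper: move each listed row one column to the left,
# dropping its first value and padding the last column with NA
shift_left <- function(df, rows) {
  for (r in rows) {
    df[r, ] <- c(df[r, -1], NA)
  }
  df
}

shifted <- shift_left(dat, 2)
shifted
```

Passing a vector such as c(2, 17, 40) shifts each of those rows in turn; rows not listed are left untouched.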
Related
I have a dataset from which I made a reproducible example:
set.seed(1)
Data <- data.frame(
A = sample(0:5),
B = sample(0:5),
C = sample(0:5),
D = sample(0:5),
corr_A.B = sample(0:5),
corr_A.C = sample(0:5),
corr_A.D = sample(0:5))
> Data
A B C D corr_A.B corr_A.C corr_A.D
1 1 5 4 2 1 2 4
2 5 3 1 3 5 5 0
3 2 2 3 4 0 1 2
4 3 0 5 0 4 0 1
5 0 4 2 1 2 3 3
6 4 1 0 5 3 4 5
For each of the columns B, C and D, I would like to check whether a cell is equal to 0 and, if so, replace the corresponding corr_A column on the same row with NA. For instance, since Data$B[4] is equal to 0, I would like Data$corr_A.B[4] to be replaced by NA.
I am looking to obtain the following result:
> Data
A B C D corr_A.B corr_A.C corr_A.D
1 1 5 4 2 1 2 4
2 5 3 1 3 5 5 0
3 2 2 3 4 0 1 2
4 3 0 5 0 NA 0 NA
5 0 4 2 1 2 3 3
6 4 1 0 5 3 NA 5
I have tried different approaches using for loops, but I am struggling a lot. Also, the dataset I am working on has many other columns that do not need to be checked for this condition, so I would like to be able to specify exactly which columns are checked for 0 values.
Could someone be kind enough to give it a try? Many thanks.
A one-liner using the replacement function is.na<-:
is.na(Data[5:7]) <- Data[2:4] == 0
Data
# A B C D corr_A.B corr_A.C corr_A.D
#1 1 5 4 2 1 2 4
#2 5 3 1 3 5 5 0
#3 2 2 3 4 0 1 2
#4 3 0 5 0 NA 0 NA
#5 0 4 2 1 2 3 3
#6 4 1 0 5 3 NA 5
For a base R solution, we can just use ifelse here:
Data$corr_A.B <- ifelse(Data$B == 0, NA, Data$corr_A.B)
Data$corr_A.C <- ifelse(Data$C == 0, NA, Data$corr_A.C)
Data$corr_A.D <- ifelse(Data$D == 0, NA, Data$corr_A.D)
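Since the question mentions struggling with for loops, here is a minimal loop-based sketch of the same idea that avoids repeating one line per column. It assumes, as in the example, that every checked column X has a partner named corr_A.X; check_cols is a name introduced here for illustration, and only the columns listed in it are tested for 0.

```r
# Example data (same values as printed in the question)
Data <- data.frame(A = c(1, 5, 2, 3, 0, 4),
                   B = c(5, 3, 2, 0, 4, 1),
                   C = c(4, 1, 3, 5, 2, 0),
                   D = c(2, 3, 4, 0, 1, 5),
                   corr_A.B = c(1, 5, 0, 4, 2, 3),
                   corr_A.C = c(2, 5, 1, 0, 3, 4),
                   corr_A.D = c(4, 0, 2, 1, 3, 5))

# Only these columns are checked for zeros (assumed naming: corr_A.<col>)
check_cols <- c("B", "C", "D")

for (col in check_cols) {
  target <- paste0("corr_A.", col)
  # Set the partner column to NA on every row where `col` is 0
  Data[[target]][Data[[col]] == 0] <- NA
}
Data
```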
Using dplyr (the example data is re-created here as df):
library(dplyr)
df <- data.frame(A=c(1,5,2,3,0,4),
                 B=c(5,3,2,0,4,1),
                 C=c(4,1,3,5,2,0),
                 D=c(2,3,4,0,1,5),
                 corr_A.B=c(1,5,0,4,2,3),
                 corr_A.C=c(2,5,1,0,3,4),
                 corr_A.D=c(4,0,2,1,3,5))
df %>% mutate(corr_A.B = case_when(B == 0 ~ NA_real_, TRUE ~ corr_A.B),
              corr_A.C = case_when(C == 0 ~ NA_real_, TRUE ~ corr_A.C),
              corr_A.D = case_when(D == 0 ~ NA_real_, TRUE ~ corr_A.D))
A B C D corr_A.B corr_A.C corr_A.D
1 1 5 4 2 1 2 4
2 5 3 1 3 5 5 0
3 2 2 3 4 0 1 2
4 3 0 5 0 NA 0 NA
5 0 4 2 1 2 3 3
6 4 1 0 5 3 NA 5
A base, one-liner, vectorized, but convoluted solution:
Data[t(t(which(Data[,2:4]==0,arr.ind=TRUE))+c(0,4))]<-NA
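The one-liner above can be unpacked into named steps, which makes the index arithmetic easier to follow. which(..., arr.ind = TRUE) returns the (row, col) positions of the zeros inside columns 2:4, where the columns are numbered 1 to 3; adding 4 maps them onto corr_A.B, corr_A.C and corr_A.D (data columns 5 to 7). This sketch assumes the column layout of the example.

```r
Data <- data.frame(A = c(1, 5, 2, 3, 0, 4),
                   B = c(5, 3, 2, 0, 4, 1),
                   C = c(4, 1, 3, 5, 2, 0),
                   D = c(2, 3, 4, 0, 1, 5),
                   corr_A.B = c(1, 5, 0, 4, 2, 3),
                   corr_A.C = c(2, 5, 1, 0, 3, 4),
                   corr_A.D = c(4, 0, 2, 1, 3, 5))

# Positions of zeros within the checked block (columns B, C, D)
idx <- which(Data[, 2:4] == 0, arr.ind = TRUE)
# Map block columns 1..3 onto corr_A.B..corr_A.D (data columns 5..7)
idx[, "col"] <- idx[, "col"] + 4
# Two-column matrix indexing: each matrix row selects one (row, col) cell
Data[idx] <- NA
Data
```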
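Using mapply(), which walks over the checked columns and their corr_A partners in parallel (apply() over B, C and D alone would only recode those columns themselves, not the corr_A columns):
Data[c("corr_A.B", "corr_A.C", "corr_A.D")] <- mapply(function(x, y) ifelse(x == 0, NA, y),
                                                      Data[c("B", "C", "D")],
                                                      Data[c("corr_A.B", "corr_A.C", "corr_A.D")])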
I need to create many (several thousand) resampled datasets from a large database. I have three categorical variables: Site (S), Transect (T) and Quadrat (Q). The response variable is Value (V), which is the result for a particular S, T and Q combination; quadrats lie along each transect at each site. I pasted an abbreviated dataset below.
S T Q V
A 1 1 8
A 1 2 5
A 1 3 0
A 2 1 0
A 2 2 15
A 2 3 0
A 3 1 0
A 3 2 25
A 3 3 0
B 1 1 0
B 1 2 1
B 1 3 0
B 2 1 33
B 2 2 1
B 2 3 2
B 3 1 0
B 3 2 207
B 3 3 0
C 1 1 0
C 1 2 1
C 1 3 0
C 2 1 45
C 2 2 33
C 2 3 0
C 3 1 0
C 3 2 1
C 3 3 0
The idea is that, for a given site, the resampled dataset would contain ## quadrats from each of transects 1 to n, where ## is the number of quadrats (Q) per transect (T) per site (S). I am not trying to resample the dataset on S, T and Q; I would like to resample a user-defined number of rows based on the conditions I define. For example, if I chose to resample 2 quadrats (Q) per transect (T) per site (S), I envision the resampled dataset looking like the example below.
S T Q V
A 1 1 8
A 1 3 0
A 2 1 0
A 2 2 15
A 3 2 25
A 3 3 0
B 1 2 1
B 1 3 0
B 2 2 1
B 2 3 2
B 3 1 0
B 3 2 207
C 1 1 0
C 1 3 0
C 2 1 45
C 2 3 0
C 3 2 1
C 3 3 0
Please let me know if that doesn't make sense and I'll revise until it does. Thanks for any assistance!
Consider by() to split the data frame by the Site and Transect factors and then sample random rows from each subset:
set.seed(444)
quads <- 2
# BUILD LIST OF SUBSETTED RANDOM SAMPLED DATAFRAMES
df_list <- by(df, df[c("S", "T")], FUN=function(df) df[sample(nrow(df), quads),])
# STACK ALL DATAFRAMES INTO ONE FINAL DF
sample_df <- do.call(rbind, df_list)
# SORT DATAFRAME BY S AND T
sample_df <- with(sample_df, sample_df[order(S, T),])
# RESET ROW NAMES
row.names(sample_df) <- NULL
sample_df
# S T Q V
# 1 A 1 1 8
# 2 A 1 3 0
# 3 A 2 2 15
# 4 A 2 1 0
# 5 A 3 1 0
# 6 A 3 3 0
# 7 B 1 2 1
# 8 B 1 1 0
# 9 B 2 3 2
# 10 B 2 1 33
# 11 B 3 1 0
# 12 B 3 2 207
# 13 C 1 1 0
# 14 C 1 2 1
# 15 C 2 1 45
# 16 C 2 3 0
# 17 C 3 3 0
# 18 C 3 2 1
Data
txt = '
S T Q V
A 1 1 8
A 1 2 5
A 1 3 0
A 2 1 0
A 2 2 15
A 2 3 0
A 3 1 0
A 3 2 25
A 3 3 0
B 1 1 0
B 1 2 1
B 1 3 0
B 2 1 33
B 2 2 1
B 2 3 2
B 3 1 0
B 3 2 207
B 3 3 0
C 1 1 0
C 1 2 1
C 1 3 0
C 2 1 45
C 2 2 33
C 2 3 0
C 3 1 0
C 3 2 1
C 3 3 0'
df = read.table(text=txt, header=TRUE)
To build many randomly resampled data frames, draw a number of quads for each one and run them through lapply:
max_quads <- 3
# One random quad count per resampled dataset
quads <- replicate(1000, sample(1:max_quads, 1))
df_list <- lapply(quads, function(q) {
  by_list <- by(df, df[c("S", "T")], FUN=function(df) df[sample(nrow(df), q),])
  sample_df <- do.call(rbind, by_list)
  sample_df <- with(sample_df, sample_df[order(S, T),])
  row.names(sample_df) <- NULL
  return(sample_df)
})
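If every resampled dataset should instead use the same fixed number of quadrats per transect, a small reusable function may be easier to call thousands of times. resample_quads() is a hypothetical helper introduced here; it swaps by() for split(), which returns a plain list of Site x Transect subsets, and it assumes each combination has at least quads rows (sample() without replacement fails otherwise).

```r
# Abbreviated data from the question: 3 sites x 3 transects x 3 quadrats
df <- expand.grid(Q = 1:3, T = 1:3, S = c("A", "B", "C"))[, c("S", "T", "Q")]
df$V <- c(8, 5, 0, 0, 15, 0, 0, 25, 0,
          0, 1, 0, 33, 1, 2, 0, 207, 0,
          0, 1, 0, 45, 33, 0, 0, 1, 0)

# Hypothetical helper: draw `quads` quadrats without replacement
# from every Site x Transect combination
resample_quads <- function(df, quads) {
  groups <- split(df, df[c("S", "T")], drop = TRUE)
  picked <- lapply(groups, function(g) g[sample(nrow(g), quads), ])
  out <- do.call(rbind, picked)
  out <- out[order(out$S, out$T), ]
  row.names(out) <- NULL
  out
}

set.seed(444)
sims <- replicate(1000, resample_quads(df, 2), simplify = FALSE)
head(sims[[1]])
```

Each element of sims is one resampled data frame with 2 quadrats for every transect at every site.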
I want to create a new variable in a data frame that contains information about the other variables.
I have a large data frame. To keep it short, let's say:
a <- c(1,0,2,3)
b <- c(3,0,1,1)
c <- c(2,0,2,2)
d <- c(4,1,1,1)
(df <- data.frame(a,b,c,d) )
a b c d
1 1 3 2 4
2 0 0 0 1
3 2 1 2 1
4 3 1 2 1
Aim: create a new variable that tells me whether a person (row) has zero reports (or missing values / NA) either in variables a and b or in variables c and d.
a b c d x
1 1 3 2 4 1
2 0 0 0 1 NA
3 2 1 2 1 1
4 3 1 2 1 1
As I have a large data frame, I was thinking about using df[1:2] and df[3:4] so that I do not need to type every variable name, but I am not sure what the best way to implement it is. Maybe dplyr has a nice option?
df$x <- ifelse(rowSums(df), 1, NA)
EDIT: Answer to the updated question:
df$x <- ifelse(rowSums(df[1:2])&rowSums(df[3:4]), 1, NA)
gives:
a b c d x
1 1 3 2 4 1
2 0 0 0 1 NA
3 2 1 2 1 1
4 3 1 2 1 1
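For a large data frame with more than two blocks of columns, the same rowSums() idea can be wrapped in a small function that takes a list of column groups. flag_groups() is a hypothetical name; it treats NA as "no report" by using na.rm = TRUE, so a group whose values are all 0 or NA makes the row NA. It also assumes counts are non-negative, so a zero total means no reports. This is base R rather than dplyr.

```r
df <- data.frame(a = c(1, 0, 2, 3),
                 b = c(3, 0, 1, 1),
                 c = c(2, 0, 2, 2),
                 d = c(4, 1, 1, 1))

# Hypothetical helper: 1 if every column group has a non-zero total in the
# row, NA otherwise (NA values are counted as no report)
flag_groups <- function(df, groups) {
  has_report <- lapply(groups, function(cols) rowSums(df[cols], na.rm = TRUE) != 0)
  ok <- Reduce(`&`, has_report)
  unname(ifelse(ok, 1, NA))
}

df$x <- flag_groups(df, list(c("a", "b"), c("c", "d")))
df
```

Column groups can also be given by position, e.g. list(1:2, 3:4), to match the df[1:2] / df[3:4] idea from the question.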
I have a data frame structured like this:
time <- c(1,1,1,1,2,2)
group <- c('a','b','c','d','c','d')
number <- c(2,3,4,1,2,12)
df <- data.frame(time,group,number)
time group number
1 1 a 2
2 1 b 3
3 1 c 4
4 1 d 1
5 2 c 2
6 2 d 12
In order to plot the data, I need it to contain a value for each group (a to d) at each time interval, even if that value is zero, i.e. a data frame like this:
time group number
1 1 a 2
2 1 b 3
3 1 c 4
4 1 d 1
5 2 a 0
6 2 b 0
7 2 c 2
8 2 d 12
Any help?
You can use expand.grid and merge, like this:
> merge(df, expand.grid(lapply(df[c(1, 2)], unique)), all = TRUE)
time group number
1 1 a 2
2 1 b 3
3 1 c 4
4 1 d 1
5 2 a NA
6 2 b NA
7 2 c 2
8 2 d 12
From there, it's just a simple matter of replacing NA with 0.
new <- merge(df, expand.grid(lapply(df[c(1, 2)], unique)), all.y = TRUE)
new[is.na(new$number),"number"] <- 0
new
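An alternative sketch in base R lets xtabs() do the filling: it sums number over every time x group combination, so combinations that never occur come out as 0 directly and no NA replacement step is needed. Two caveats: duplicate time/group rows would be summed rather than kept, and as.data.frame() returns time and group as factors, so time is converted back to numeric below.

```r
time <- c(1, 1, 1, 1, 2, 2)
group <- c('a', 'b', 'c', 'd', 'c', 'd')
number <- c(2, 3, 4, 1, 2, 12)
df <- data.frame(time, group, number)

# Cross-tabulate: every time x group cell, empty cells are 0
tab <- xtabs(number ~ time + group, data = df)
# Back to long format, naming the count column "number"
full <- as.data.frame(tab, responseName = "number")
full <- full[order(full$time, full$group), ]
full$time <- as.numeric(as.character(full$time))
row.names(full) <- NULL
full
```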
The goal is to produce a frequency table of all my selected variables (about reading habits for 4 newspapers), which essentially share the same possible values:
1= Subscribed
2= Every week
3= Sometimes
4= Never
0= NA (No Answers)
The problem arises if one of the variables does not contain one of the possible values, for example if no one is subscribed to a particular newspaper.
a <- c(1,2,3,4,3,1,2,3,4,3)
b <- c(2,2,3,4,3,0,0,3,4,1)
d <- c(2,2,3,4,3,0,0,0,0,0)
e <- c(3,3,3,3,3,3,3,3,3,3)
ta <- table(a)
tb <- table(b)
td <- table(d)
te <- table(e)
abde <- cbind(ta,tb,td,te)
ta tb td te
0 2 2 5 10
1 2 1 2 10
2 4 2 2 10
3 2 3 1 10
4 2 2 5 10
Instead of showing zero frequencies, cbind() recycles the shorter tables, so values are repeated and the counts no longer line up with the row labels.
How can this be achieved in a better way?
I think you are looking for factor:
> L <- list(a, b, d, e)
> A <- sort(unique(unlist(L, use.names = FALSE)))
> sapply(L, function(x) table(factor(x, A)))
[,1] [,2] [,3] [,4]
0 0 2 5 0
1 2 1 0 0
2 2 2 2 0
3 4 3 2 10
4 2 2 1 0
Update
Here's an approach in base R that might even be more direct:
> L <- mget(c("a", "b", "d", "e"))
> table(stack(L))
ind
values a b d e
0 0 2 5 0
1 2 1 0 0
2 2 2 2 0
3 4 3 2 10
4 2 2 1 0
You could use mtabulate from qdapTools
library(qdapTools)
t(mtabulate(list(a,b,d,e)))
# [,1] [,2] [,3] [,4]
#0 0 2 5 0
#1 2 1 0 0
#2 2 2 2 0
#3 4 3 2 10
#4 2 2 1 0
Or
t(mtabulate(data.frame(a,b,d,e)))
# a b d e
#0 0 2 5 0
#1 2 1 0 0
#2 2 2 2 0
#3 4 3 2 10
#4 2 2 1 0
This is similar to Ananda's solution (I am posting it because I was already in the middle of writing it):
df <- data.frame(a, b, d, e)
do.call(cbind, lapply(df, function(x) table(factor(x, levels = 0:4))))
# a b d e
# 0 0 2 5 0
# 1 2 1 0 0
# 2 2 2 2 0
# 3 4 3 2 10
# 4 2 2 1 0
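If the table should show the answer categories rather than the numeric codes, the factor() approach above also accepts labels. The codes and labels below are taken from the coding scheme in the question; fixing levels = 0:4 keeps categories that no respondent chose, such as nobody subscribing to newspaper e.

```r
a <- c(1,2,3,4,3,1,2,3,4,3)
b <- c(2,2,3,4,3,0,0,3,4,1)
d <- c(2,2,3,4,3,0,0,0,0,0)
e <- c(3,3,3,3,3,3,3,3,3,3)
df <- data.frame(a, b, d, e)

# Coding scheme from the question
codes  <- 0:4
labels <- c("No answer", "Subscribed", "Every week", "Sometimes", "Never")

# Each column becomes a factor with all five levels, so empty
# categories are counted as 0 instead of being dropped
freq <- sapply(df, function(x) table(factor(x, levels = codes, labels = labels)))
freq
```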