Output colMeans in columns rather than rows - r

When I use the colMeans function on a dataset, R outputs the means into rows rather than the original column format. Here is an example:
Year J F M A M J J A S O N D
1851 4 6 3 6 9 7 1 2 8 9 5 0
1852 3 8 5 5 5 3 2 8 6 7 4 2
1853 5 7 4 8 6 9 4 4 4 2 1 2
When I use the function
colMeans(df)
The output is returned as:
Year Mean
J 4
F 7
M 4
A 6
etc...
How can I develop the script to ensure the output is organised in columns like the original data rather than rows? It should look like:
J F M A M J J A S O N D
4 7 4 6 etc............

Taking the input as
dft <- read.table(header = TRUE, text = "Year J F M A M J J A S O N D
1851 4 6 3 6 9 7 1 2 8 9 5 0
1852 3 8 5 5 5 3 2 8 6 7 4 2
1853 5 7 4 8 6 9 4 4 4 2 1 2",stringsAsFactors=FALSE)
you could try
t(round(colMeans(dft),0))
which gives
Year J F M A M.1 J.1 J.2 A.1 S O N D
[1,] 1852 4 7 4 6 7 6 2 5 6 6 3 1
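You can then drop the Year column if you don't need it. Note that read.table renames the repeated month headers (M.1, J.1, J.2, A.1) so that the column names stay unique.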
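As a minimal sketch of that last step, assuming the same dft as above, the Year column can be dropped before averaging so that only the months remain. The names means, res and res_df are only illustrative.

```r
dft <- read.table(header = TRUE, text = "Year J F M A M J J A S O N D
1851 4 6 3 6 9 7 1 2 8 9 5 0
1852 3 8 5 5 5 3 2 8 6 7 4 2
1853 5 7 4 8 6 9 4 4 4 2 1 2", stringsAsFactors = FALSE)

# Drop the first column (Year) before averaging
means <- round(colMeans(dft[-1]))

# t() turns the named vector into a 1 x 12 matrix, one column per month
res <- t(means)
res

# as.data.frame() keeps the same layout if a data frame is preferred
res_df <- as.data.frame(res)
res_df
```

Dropping Year first avoids averaging a column that is an identifier rather than a measurement.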


Summing up consecutive values in groups [duplicate]

This question already has answers here:
Calculate cumulative sum (cumsum) by group
(5 answers)
Closed 2 years ago.
I'd like to compute a running sum of the values in one column, within each group. I have a df like this:
set.seed(1)
gr <- c(rep('A',3),rep('B',2),rep('C',5),rep('D',3))
vals <- floor(runif(length(gr), min=0, max=10))
idx <- c(seq(1:3),seq(1:2),seq(1:5),seq(1:3))
df <- data.frame(gr,vals,idx)
gr vals idx
1 A 2 1
2 A 3 2
3 A 5 3
4 B 9 1
5 B 2 2
6 C 8 1
7 C 9 2
8 C 6 3
9 C 6 4
10 C 0 5
11 D 2 1
12 D 1 2
13 D 6 3
And I'm looking for this one:
gr vals idx
1 A 2 1
2 A 5 2
3 A 10 3
4 B 9 1
5 B 11 2
6 C 8 1
7 C 17 2
8 C 23 3
9 C 29 4
10 C 29 5
11 D 2 1
12 D 3 2
13 D 9 3
For example, in group C we have 8+9=17 (the first and second elements of the group), and the second value is replaced by that sum. Then 17+6=23 (the running sum plus the third element) replaces the third element, and so on.
I was looking for some solution here but it isn't what I'm looking for.
OK, I think I got it: group by gr and take the cumulative sum with cumsum().
library(dplyr)
df %>%
group_by(gr) %>%
mutate(nvals = cumsum(vals))
gr vals idx nvals
1 A 2 1 2
2 A 3 2 5
3 A 5 3 10
4 B 9 1 9
5 B 2 2 11
6 C 8 1 8
7 C 9 2 17
8 C 6 3 23
9 C 6 4 29
10 C 0 5 29
11 D 2 1 2
12 D 1 2 3
13 D 6 3 9
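For comparison, here is a minimal base R sketch of the same grouped cumulative sum that does not need dplyr. It assumes the same gr and vals as in the question and uses ave(), which applies a function within each group and returns a vector aligned with the original rows.

```r
set.seed(1)
gr <- c(rep("A", 3), rep("B", 2), rep("C", 5), rep("D", 3))
vals <- floor(runif(length(gr), min = 0, max = 10))
df <- data.frame(gr, vals)

# ave() splits vals by gr, applies cumsum() to each piece,
# and puts the results back in the original row order
df$nvals <- ave(df$vals, df$gr, FUN = cumsum)
df
```

To overwrite the original column instead of adding a new one, assign the result to df$vals.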

How to collect outputs of vector-valued function into a dataframe?

I have a function f1 that takes a number k as input and returns 3 numbers k, k+1, k+2. I would like to ask how to concatenate these results into a dataframe for k from 1 to 10. In this way, the line k corresponds to the output f1(k).
f1 <- function(k){
return (c(k, k+1, k+2))
}
f1(1)
f1(2)
One option is to Vectorize the function f1 and pass it the values 1 to 10. This returns a matrix with one column per input value, which can then be converted to a data.frame with as.data.frame
as.data.frame(Vectorize(f1)(1:10))
If it needs to be vertical, then transpose the output and apply as.data.frame
as.data.frame(t(Vectorize(f1)(1:10)))
Output:
# V1 V2 V3
#1 1 2 3
#2 2 3 4
#3 3 4 5
#4 4 5 6
#5 5 6 7
#6 6 7 8
#7 7 8 9
#8 8 9 10
#9 9 10 11
#10 10 11 12
Or we can use outer
as.data.frame(outer(1:10, 0:2, `+`))
You can also use:
as.data.frame(do.call(rbind,lapply(1:10,f1)))
Output:
V1 V2 V3
1 1 2 3
2 2 3 4
3 3 4 5
4 4 5 6
5 5 6 7
6 6 7 8
7 7 8 9
8 8 9 10
9 9 10 11
10 10 11 12
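Another variant, sketched here under the assumption that f1 always returns exactly three numbers, uses vapply() so that the shape of each result is checked, and then names the columns. The column names k, k_plus_1 and k_plus_2 are made up for illustration.

```r
f1 <- function(k) {
  c(k, k + 1, k + 2)
}

# vapply() stops with an error if any call does not return 3 numbers;
# the result is a 3 x 10 matrix with one column per input value
m <- vapply(1:10, f1, numeric(3))

# Transpose so that row k holds f1(k), then label the columns
res <- as.data.frame(t(m))
names(res) <- c("k", "k_plus_1", "k_plus_2")
res
```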

Create a function to impute values from one data frame into another

The NA values in column A should be filled by the A value from the dat data frame and so on for the other variables.
id <- factor(rep(letters[1:2], each=5))
A <- c(1,2,NA,6,8,9,0,6,7,9)
B <- c(5,6,1,9,8,1,NA,9,7,4)
C <- c(2,3,5,NA,NA,2,7,6,4,6)
D <- c(6,5,8,3,2,9,NA,2,6,8)
df <- data.frame(id, A, B,C,D)
df
id A B C D
1 a 1 5 2 6
2 a 2 6 3 5
3 a NA 1 5 8
4 a 6 9 NA 3
5 a 8 8 NA 2
6 b 9 1 2 9
7 b 0 NA 7 NA
8 b 6 9 6 2
9 b 7 7 4 6
10 b 9 4 6 8
dat <- data.frame(col=c("A","B","C","D"), value=c(23,45,26,89))
dat
col value
1 A 23
2 B 45
3 C 26
4 D 89
It should look like:
id A B C D
1 a 1 5 2 6
2 a 2 6 3 5
3 a 23 1 5 8
4 a 6 9 26 3
5 a 8 8 26 2
6 b 9 1 2 9
7 b 0 45 7 89
8 b 6 9 6 2
9 b 7 7 4 6
10 b 9 4 6 8
I was thinking of something like this, but I don't know how to connect those data frames in a function:
test <- function(i){
df[,i][is.na(df[,i])] <- dat$value
}
test(2)
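If you want to keep the format of your function, look up the replacement value by column name in dat$col and assign with <<- so that the df outside the function is modified: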
test <- function(i){
df[,i][is.na(df[,i])] <<- dat$value[dat$col==i]
}
test("A")
id A B C D
1 a 1 5 2 6
2 a 2 6 3 5
3 a 23 1 5 8
4 a 6 9 NA 3
5 a 8 8 NA 2
6 b 9 1 2 9
7 b 0 NA 7 NA
8 b 6 9 6 2
9 b 7 7 4 6
10 b 9 4 6 8
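A possible variation, sketched here with a hypothetical helper called impute_from, loops over every column listed in dat$col and returns a modified copy instead of changing df through <<-. It assumes each name in dat$col is a column of df and appears only once.

```r
id <- factor(rep(letters[1:2], each = 5))
A <- c(1, 2, NA, 6, 8, 9, 0, 6, 7, 9)
B <- c(5, 6, 1, 9, 8, 1, NA, 9, 7, 4)
C <- c(2, 3, 5, NA, NA, 2, 7, 6, 4, 6)
D <- c(6, 5, 8, 3, 2, 9, NA, 2, 6, 8)
df <- data.frame(id, A, B, C, D)
dat <- data.frame(col = c("A", "B", "C", "D"), value = c(23, 45, 26, 89))

# Hypothetical helper: for each column named in lookup$col,
# replace its NA entries with the matching lookup$value
impute_from <- function(data, lookup) {
  for (nm in as.character(lookup$col)) {
    fill <- lookup$value[lookup$col == nm]
    data[[nm]][is.na(data[[nm]])] <- fill
  }
  data
}

df_filled <- impute_from(df, dat)
df_filled
```

Because the function works on its own copy, the original df is left untouched and the call can be repeated safely.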
One approach is to iterate over the columns and values in parallel and use coalesce(). This relies on the rows of dat being in the same order as the columns of df[-1]:
library(dplyr)
library(purrr)
df[-1] <- map2_df(df[-1], dat$value, coalesce)
df
id A B C D
1 a 1 5 2 6
2 a 2 6 3 5
3 a 23 1 5 8
4 a 6 9 26 3
5 a 8 8 26 2
6 b 9 1 2 9
7 b 0 45 7 89
8 b 6 9 6 2
9 b 7 7 4 6
10 b 9 4 6 8
Or same using replace():
map2_df(df[-1], dat$value, ~ replace(.x, is.na(.x), .y))

How to replace the NA values after merging two data frames? [duplicate]

This question already has answers here:
Replacing NAs with latest non-NA value
(21 answers)
Closed 7 years ago.
I have two data frames as follows:
> a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
> a
x y
1 1 1
2 2 3
3 3 5
4 4 7
5 5 9
6 6 11
7 7 13
8 8 15
> b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
> b
x z
1 1 2
2 5 4
3 7 6
Then I left-join the two data frames with join() from the plyr package:
> c <- join(a, b, by="x", type="left")
> c
x y z
1 1 1 2
2 2 3 NA
3 3 5 NA
4 4 7 NA
5 5 9 4
6 6 11 NA
7 7 13 6
8 8 15 NA
I need to replace each NA in the z column with the last non-NA value before it. I want the result to look like this:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
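In this case (if your data is not too large) a simple loop is an elegant option. It visits the NA positions in increasing order, so runs of consecutive NAs are filled correctly, and it assumes the first value of z is not NA: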
for(i in which(is.na(c$z))){
c$z[i] = c$z[i-1]
}
gives:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
data:
library(plyr)
a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
c <- join(a, b, by="x", type="left")
You might also want to look at na.locf() in the zoo package, which carries the last observation forward.
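If installing a package is not an option, the same forward fill can be sketched in base R without a loop. The helper name fill_forward is hypothetical. Unlike the loop above, it also tolerates a leading NA, which it leaves as NA.

```r
# Hypothetical helper: carry the last non-NA value forward
fill_forward <- function(v) {
  ok <- !is.na(v)
  # For each position, count how many non-NA values have been seen so far
  # (0 means none yet)
  idx <- cumsum(ok)
  # Prepend NA so that a count of 0 maps to NA
  c(NA, v[ok])[idx + 1]
}

z <- c(2, NA, NA, NA, 4, NA, 6, NA)
fill_forward(z)

# A leading NA has nothing before it, so it stays NA
fill_forward(c(NA, 1, NA))
```

With the data frame from the question, this would be applied as c$z <- fill_forward(c$z).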

Eliminate in an increasing order rows in a data frame

x<-c(4,5,6,23,5,6,7,8,0,3)
y<-c(2,4,5,6,23,5,6,7,8,0)
z<-c(1,2,4,5,6,23,5,6,7,8)
df<-data.frame(x,y,z)
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 23 6 5
5 5 23 6
6 6 5 23
7 7 6 5
8 8 7 6
9 0 8 7
10 3 0 8
I would like to eliminate the number 23 from every column of df, not by matching the value 23 but by position: remove row 4 from column x, row 5 from column y, row 6 from column z, moving down one row for each successive column. The result should look like:
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 5 6 5
5 6 5 6
6 7 6 5
7 8 7 6
8 0 8 7
9 3 0 8
Thank you
You can iterate through the columns and remove the element from each, then reassemble as a data frame:
result <- as.data.frame(lapply(1:ncol(df), function(x) df[-(x+3),x]))
names(result) <- names(df)
result
## x y z
## 1 4 2 1
## 2 5 4 2
## 3 6 5 4
## 4 5 6 5
## 5 6 5 6
## 6 7 6 5
## 7 8 7 6
## 8 0 8 7
## 9 3 0 8
Here x is the column index, and df[-(x+3),x] is column x with one element removed by position: row 4 for column 1, row 5 for column 2, and so on. To start at row N in the first column you would use df[-(x+N-1),x].
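The same idea can be sketched with the row offsets made explicit, assuming the removal starts at row n in the first column and moves down one row per column. The names n, drop_rows and res are illustrative.

```r
x <- c(4, 5, 6, 23, 5, 6, 7, 8, 0, 3)
y <- c(2, 4, 5, 6, 23, 5, 6, 7, 8, 0)
z <- c(1, 2, 4, 5, 6, 23, 5, 6, 7, 8)
df <- data.frame(x, y, z)

n <- 4                                # row to drop in the first column
drop_rows <- n + seq_along(df) - 1    # rows 4, 5, 6 for columns x, y, z

# Map() pairs each column with its row to drop and keeps the column names
res <- as.data.frame(Map(function(col, i) col[-i], df, drop_rows))
res
```

This only works while every column loses exactly one element, so that all columns end up with the same length.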
You could also try:
n <- 4
df1 <- df[-n,]
df1[] <- unlist(df,use.names=FALSE)[-seq(n, prod(dim(df)), by=nrow(df)+1)]
df1
# x y z
#1 4 2 1
#2 5 4 2
#3 6 5 4
#5 5 6 5
#6 6 5 6
#7 7 6 5
#8 8 7 6
#9 0 8 7
#10 3 0 8
