I have a list of names in A1:A144 and I want to move A49:A96 to B1:B48 and A97:A144 to C1:C48.
So for each 48th row, I want the next 48 rows moved to a new column.
How to do that?
If you want to consider a VBA alternative then:
Sub MoveData()
nF = 1
nL = 48
nSize = Cells(Rows.Count, "A").End(xlUp).Row
nBlock = (nSize - 1) \ nL ' blocks after the first one (integer division), so the loop does not run past the data
For k = 1 To nBlock
nF = nF + 48
nL = nL + 48
Range("A" & nF & ":A" & nL).Copy Cells(1, k + 1)
Range("A" & nF & ":A" & nL).ClearContents
Next k
End Sub
Not sure how scalable this solution is, but it does work.
First let's pretend your names are x and you want the solution to be in new.df
number.shifts <- ceiling(length(x) / 48) # work out how many columns we need
# create an empty (NA) matrix with 48 rows and one column per block
new.df <- matrix(data = NA, nrow = 48, ncol = number.shifts)
# loop over x, moving to the next column after every 48 rows
for (i in seq_along(x)) {
  j <- (i - 1) %/% 48 + 1 # column for element i
  r <- (i - 1) %% 48 + 1 # row within that column
  new.df[r, j] <- x[i]
}
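The same reshaping can also be sketched without a loop, because R fills a matrix column by column. This is a minimal sketch, assuming the names are in a character vector; the vector x below is made-up sample data, and block and wide are names chosen for illustration.

```r
# Hypothetical input standing in for the names in A1:A144
x <- paste0("name", 1:144)
block <- 48

# Pad with NA so the length is a multiple of the block size
n_cols <- ceiling(length(x) / block)
padded <- c(x, rep(NA, n_cols * block - length(x)))

# matrix() fills column-wise, so each run of 48 values becomes one column
wide <- matrix(padded, nrow = block, ncol = n_cols)
dim(wide)
```

If the length is already a multiple of 48, the padding step adds nothing and matrix(x, nrow = 48) would be enough.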
I think you have to elaborate on your question a little more. Do you have the data in R or in Excel and do you want the output to be in R or in Excel?
That being said, if x is your vector indicating clusters
x <- rep(1:3, each = 48)
and y is the variable containing names or whatever that you want to distribute over columns A:C (each having 48 rows),
y <- sample(letters, 3 * 48, replace = TRUE)
you can do this:
y.wide <- do.call(cbind, split(y, x))
Just as there is stack in R to create a very long representation of a group of columns, there is unstack to take a long column and make it into a wide form.
Here's a basic example:
mydf <- data.frame(A = 1:144)
mydf$groups <- paste0("A", gl(n=3, k=48)) ## One of many ways to create groups
mydf2 <- unstack(mydf)
head(mydf2)
# A1 A2 A3
# 1 1 49 97
# 2 2 50 98
# 3 3 51 99
# 4 4 52 100
# 5 5 53 101
# 6 6 54 102
tail(mydf2)
# A1 A2 A3
# 43 43 91 139
# 44 44 92 140
# 45 45 93 141
# 46 46 94 142
# 47 47 95 143
# 48 48 96 144
I have a dataframe, and I want to do some calculations that depend on the previous rows (like dragging formulas down in Excel). My DF looks like this:
set.seed(1234)
df <- data.frame(DA = sample(1:3, 6, rep = TRUE) ,HB = sample(0:600, 6, rep = TRUE), D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE), GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0 )
df$GL[1] = 646
df$R[1] = 60
df$DA[5] = 2
df
# DA HB D AD GM GL R RM
# 1 2 399 4 13 30 646 60 0
# 2 2 97 4 10 31 NA NA 0
# 3 1 102 5 5 31 NA NA 0
# 4 3 325 4 2 31 NA NA 0
# 5 2 78 3 14 30 NA NA 0
# 6 1 269 4 8 30 NA NA 0
I want to fill in the missing values in my GL, R and RM columns, and the values depend on each other. For example:
attach(df)
#calc GL and R for the 2nd row
df$GL[2] <- GL[1]+HB[2]+RM[1]
df$R[2] <- df$GL[2]*D[2]/GM[2]*AD[2]
#calc GL and R for the 3rd row
df$GL[3] <- df$GL[2]+HB[3]+df$RM[2]
df$R[3] <-df$GL[3]*D[3]/GM[3]*AD[3]
#and so on..
Is there a way to do all the calculations at once, instead of row by row?
In addition, each time the column 'DA' equals 1, 'RM' for that row should be the sum of the 'R' values since the last occurrence of 'DA' = 1, including the current row. So that
attach(df)
df$RM[3] <-R[1]+R[2]+R[3]
#and RM for the 6th row is calculated by
#df$RM[6] <-R[4]+R[5]+R[6]
Thanks a lot in advance!
You can use a for loop to calculate GL values and once you have them you can do the calculation for R columns directly.
for(i in 2:nrow(df)) {
df$GL[i] <- with(df, GL[i-1]+HB[i]+RM[i-1])
}
df$R[-1] <- with(df, (GL * D) / (GM * AD))[-1] # keeps the given R[1]; assumes AD belongs in the denominator
You can use indexing to solve the first two problems:
> # Original code from question~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> set.seed(1234)
> df <- data.frame(DA = sample(1:3, 6, rep = TRUE), HB = sample(0:600, 6, rep = TRUE),
+ D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE),
+ GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0 )
> df$GL[1] = 646
> df$R[1] = 60
> df$DA[5] = 2
> #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> # View df
> df
DA HB D AD GM GL R RM
1 2 399 4 13 30 646 60 0
2 2 97 4 10 31 NA NA 0
3 1 102 5 5 31 NA NA 0
4 3 325 4 2 31 NA NA 0
5 2 78 3 14 30 NA NA 0
6 1 269 4 8 30 NA NA 0
> # Solution below, based on indexing
> # 1. GL column
> df$GL <- cumsum(c(df$GL[1], df$HB[-1] + df$RM[-nrow(df)]))
> # 2. R column
> df$R[-1] <- (df$GL * df$D / df$GM * df$AD)[-1]
> # May be more clear like this (same result)
> df$R[-1] <- df$GL[-1] * df$D[-1] / df$GM[-1] * df$AD[-1]
> # Or did you mean this for last *?
> df$R[-1] <- (df$GL * df$D / (df$GM * df$AD))[-1]
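The cumsum approach works because the recurrence GL[i] = GL[i-1] + HB[i] + RM[i-1] is just a running total. Here is a small self-contained check of that equivalence, using the HB values from the example and assuming RM is all zeros as in the question; GL_loop, GL_vec and GL_start are names made up for this sketch.

```r
# HB values from the example; RM is assumed to be all zeros, as in the question
HB <- c(399, 97, 102, 325, 78, 269)
RM <- rep(0, length(HB))
GL_start <- 646

# Row-by-row recurrence: GL[i] = GL[i-1] + HB[i] + RM[i-1]
GL_loop <- numeric(length(HB))
GL_loop[1] <- GL_start
for (i in 2:length(HB)) {
  GL_loop[i] <- GL_loop[i - 1] + HB[i] + RM[i - 1]
}

# The same values as a cumulative sum
GL_vec <- cumsum(c(GL_start, HB[-1] + RM[-length(RM)]))

stopifnot(isTRUE(all.equal(GL_loop, GL_vec)))
GL_vec
```

Note that this only holds when RM is known before GL is computed. If RM is itself derived from R (and therefore from GL), the columns have to be filled row by row in a loop.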
The third problem can be solved with a loop.
> df$RM[1] <- df$R[1]
> for (i in 2:nrow(df)) {
+ df$RM[i] <- df$R[i] + df$RM[i-1] * (df$DA[i] != 2)
+ }
> df
DA HB D AD GM GL R RM
1 2 399 4 13 30 646 60.000000 60.000000
2 2 97 4 10 31 743 9.587097 9.587097
3 1 102 5 5 31 845 27.258065 36.845161
4 3 325 4 2 31 1170 75.483871 112.329032
5 2 78 3 14 30 1248 8.914286 8.914286
6 1 269 4 8 30 1517 25.283333 34.197619
Do these results look correct?
Update: Assuming RM should = R unless DA = 1, and in that case RM = sum of current row and previous R up to (not including) the above row with DA = 1, try the following loop.
df$RM[1] <- cs <- df$R[1]
for (i in 2:nrow(df)) {
df$RM[i] <- df$R[i] + cs * (df$DA[i] == 1)
cs <- (cs + df$R[i]) * (df$DA[i] != 1) # reset the running sum after a DA = 1 row, so that row's R is not carried forward
}
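Under the same assumption (RM equals R unless DA is 1, in which case RM is the sum of R since the previous DA = 1 row, including the current row), the running sum can also be sketched without a loop by grouping the rows. The R and DA vectors below are made-up values, not the ones from the question, and g and running are names chosen for illustration.

```r
# Made-up values for illustration only
R  <- c(60, 10, 20, 30, 40, 50)
DA <- c(2, 2, 1, 3, 2, 1)

# Start a new group on the row *after* each DA == 1 row
g <- cumsum(c(0, head(DA == 1, -1)))

# Running sum of R within each group
running <- ave(R, g, FUN = cumsum)

# Use the running sum only on DA == 1 rows, plain R elsewhere
RM <- ifelse(DA == 1, running, R)
RM
```

Here RM[3] is R[1] + R[2] + R[3] and RM[6] is R[4] + R[5] + R[6], which is the behaviour described in the question.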
I have a list:
l1<-list(A=1:10, B=100:120, C=300:310, D=400:430)
How do I convert it to dataframe with 2 columns:
C1 C2
R1 1 A
R2 2 A
...
R10 10 A
R11 100 B
R12 101 B
....
R72 429 D
R73 430 D
I tried:
df1 <- data.frame(matrix(unlist(l1), nrow=length(l1), byrow=T))
But I'm getting an error because the vectors in my list have different lengths. Also, my actual list consists of Dates and not just integers.
Just use stack:
stack(l1)
> head(stack(l1))
values ind
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
> tail(stack(l1))
values ind
68 425 D
69 426 D
70 427 D
71 428 D
72 429 D
73 430 D
Update
stack won't work with dates. If you have actual date objects, you can do:
data.frame(ind = rep(names(l1), lengths(l1)),
val = as.Date(unlist(l1), origin = "1970-01-01"))
or
data.frame(ind = rep(names(l1), lengths(l1)), val = do.call(c, l1))
Sample data:
l1<-list(A=Sys.Date()+(1:10),
B=Sys.Date()+(100:120),
C=Sys.Date()+(300:310),
D=Sys.Date()+(400:430))
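To see why the date case needs special handling, here is a small sketch with fixed dates (made up for illustration, so the result does not depend on Sys.Date()). It shows that unlist() drops the Date class, while combining the elements with c() keeps it.

```r
# Hypothetical list of Date vectors with different lengths
l1 <- list(A = as.Date("2020-01-01") + 0:2,
           B = as.Date("2020-02-01") + 0:1)

# unlist() strips the class and leaves the underlying day counts
class(unlist(l1))

# c() has a Date method, so combining the pieces keeps the class
out <- data.frame(ind = rep(names(l1), lengths(l1)),
                  val = do.call(c, unname(l1)))
str(out)
```

The call to unname() is only there so the combined vector does not carry element names; it is a design choice for this sketch, not a requirement.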
Here's one method, similar to @Duck's answer, using Map and do.call:
tmp <- Map(data.frame,N = l1,L = names(l1))
out <- do.call(rbind,tmp)
rownames(out) <- NULL
> tail(out)
N L
68 425 D
69 426 D
70 427 D
71 428 D
72 429 D
73 430 D
Maybe a long solution, but using mapply() and do.call() you can reach the expected result. First, extract the names of the list as well as the number of elements in each vector. Then, using mapply(), create a list for the first column of your desired result. After that, combine mapply(), do.call(), rbind() and cbind() to end up with df. Here is the code:
#Code
#names
v1 <- names(l1)
#length
v2 <- unlist(lapply(l1, length))
#Create values
l2 <- mapply(function(x,y) rep(x,y),v1,v2)
#Bind
df <- as.data.frame(do.call(rbind,mapply(cbind,l2,l1)))
df$V2 <- as.numeric(as.character(df$V2)) # as.character() first, in case V2 was created as a factor
Output (some rows):
head(df,15)
V1 V2
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 A 6
7 A 7
8 A 8
9 A 9
10 A 10
11 B 100
12 B 101
13 B 102
14 B 103
15 B 104
I have a list of dataframes:
df_DJF = data.frame(replicate(2,sample(0:130,30,rep=TRUE)))
df_JJA = data.frame(replicate(2,sample(0:130,20,rep=TRUE)))
df_MAM = data.frame(replicate(2,sample(0:130,25,rep=TRUE)))
df_SON = data.frame(replicate(2,sample(0:130,15,rep=TRUE)))
df_list = list(df_DJF, df_JJA, df_MAM, df_SON)
I want to randomly choose 80% of the rows of each dataframe. I can do that manually like this, using the sampled values as row indices.
sample_size = floor(0.8*nrow(df_DJF))
picked_DJF = sample(seq_len(nrow(df_DJF)), size = sample_size)
My problem is that I have a great many dataframes with different numbers of rows, so I want to automate this process. In the end I want to have 4 sample sizes, each with the correct number in it. The names of the sample sizes should be:
samplenames = paste("sample_size", c("DJF", "JJA", "MAM", "SON"), sep = "_")
Same for the "picked"...it should be picked_DJF and so on...
Keep using lists, not assign. Set your names(df_list) = c("DJF", "JJA", "MAM", "SON"), then use the same names for subsequent lists, like a picked list.
# for a single sample size
picked = lapply(df_list, function(x) x[sample(1:nrow(x), size = floor(0.8 * nrow(x))), ])
Using lapply will keep the names of the original list so you don't have to worry about it.
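If the sample sizes and the picked row indices are also wanted as separate objects (as the question asks), they can be kept as named lists or vectors alongside the data. This is a minimal sketch with two small made-up data frames; sample_size, picked_idx and picked are names chosen for illustration, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Two small made-up data frames standing in for df_DJF and df_JJA
df_list <- list(DJF = data.frame(x = 1:30),
                JJA = data.frame(x = 1:20))

# Named vector of sample sizes, one per data frame
sample_size <- vapply(df_list, function(d) floor(0.8 * nrow(d)), numeric(1))

# Named list of sampled row indices
picked_idx <- Map(function(d, n) sample(seq_len(nrow(d)), size = n),
                  df_list, sample_size)

# Named list of the sampled rows themselves
picked <- Map(function(d, i) d[i, , drop = FALSE], df_list, picked_idx)

sample_size[["DJF"]]
head(picked$JJA)
```

Access by name, for example sample_size[["DJF"]] or picked_idx$DJF, then replaces separate variables such as sample_size_DJF and picked_DJF.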
For multiple sample sizes from each of the data frames, you could create a nested list with a nested lapply:
names(df_list) = c("DJF", "JJA", "MAM", "SON")
sample_prop = list(s1 = 0.2, s2 = 0.4, s3 = 0.6, s4 = 0.8)
picked = lapply(df_list, function(df) lapply(sample_prop, function(sp) {
df[sample(nrow(df), size = floor(sp * nrow(df))), ]
}))
# then access individual data frames with `$` or `[[`
picked$JJA$s3
# X1 X2
# 17 70 128
# 7 94 121
# 1 57 125
# 8 32 75
# 9 15 8
# 19 58 15
# 20 55 17
# 10 42 15
# 4 51 67
# 12 89 13
# 2 74 50
# 14 77 36
To divide a data frame into "picked" and "unpicked", split makes sense. It already returns a list. This will give a triple-nested list result:
result = lapply(df_list, function(df) lapply(sample_prop, function(sp) {
n_pick = floor(sp * nrow(df))
n_unpick = nrow(df) - n_pick
split(df, f = c(rep("picked", n_pick), rep("unpicked", n_unpick))[sample(nrow(df))])
}))
result$JJA$s3$unpicked
# X1 X2
# 2 74 50
# 3 62 78
# 4 51 67
# 6 103 42
# 7 94 121
# 11 59 60
# 14 77 36
# 16 83 72
I have a dataframe that looks something like -
test A B C
28 67 4 23
45 82 43 56
34 8 24 42
I need to compare test to the other three columns: for each row, I just need the number of other columns whose value is less than the corresponding value in the test column.
So the desired output is -
test A B C result
28 67 4 23 2
45 82 43 56 1
34 8 24 42 2
When I tried -
comp_vec = "test"
name_vec = c("A", "B", "C")
rowSums(df[, comp_vec] > df[, name_vec])
I get the error -
Error in Ops.data.frame(df[, comp_vec], df[, name_vec]) :
‘>’ only defined for equally-sized data frames
I am looking for a way without replicating test to match size of dataframe.
You can use sapply to compare the df$test column against each of the other three columns. That returns a TRUE/FALSE matrix on which you can call rowSums, and the result can be set as your result column.
df <- data.frame(test = c(28, 45, 34), A = c(67, 82, 8), B = c(4, 43, 24), C = c(23, 56, 42))
df$result <- rowSums(sapply(df[,2:4], function(x) df$test > x))
> df
test A B C result
1 28 67 4 23 2
2 45 82 43 56 1
3 34 8 24 42 2
I noticed your expected result has 82 in the second row of A, whereas it's 5 in your starting example.
df$result <- apply(df, 1, function(x) sum(x < x[1]))
Use apply, specify 1 to indicate by row. x < x[1] will give a vector of TRUE/FALSE if the value at each position in the row is smaller than the first column's value. Use sum to give the number of TRUE values.
# test A B C result
# 1 28 67 4 23 2
# 2 45 82 43 56 1
# 3 34 8 24 42 2
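A comparison between a data frame and a plain vector is also possible directly, since the vector is recycled down each column. The sketch below reuses the column names from the question and assumes the error arose because the test column was selected as a one-column data frame rather than a vector, so it extracts the column with [[ instead.

```r
df <- data.frame(test = c(28, 45, 34),
                 A = c(67, 82, 8),
                 B = c(4, 43, 24),
                 C = c(23, 56, 42))
comp_vec <- "test"
name_vec <- c("A", "B", "C")

# df[[comp_vec]] is a plain vector, so each column of df[name_vec]
# is compared element by element against it
df$result <- rowSums(df[name_vec] < df[[comp_vec]])
df
```

This avoids replicating the test column to match the size of the data frame.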
I have two values,
3 and 12,
and I make a vector:
num1 <- 3
num2 <- 12
a <- c(num1, num2)
I want to add a number (12) to the vector "a", and
I also want to build a new vector by repeating and appending,
like this:
3,12, 15,24, 27,36, 39,48 ....
The number of repeats "n" is 6.
I don't have any idea how to do this.
Here are two methods in base R.
with outer, you could do
c(outer(c(3, 12), (12 * 0:4), "+"))
[1] 3 12 15 24 27 36 39 48 51 60
or with sapply, you can explicitly loop through and calculate the pairs of sums.
c(sapply(0:4, function(i) c(3, 12) + (12 * i)))
[1] 3 12 15 24 27 36 39 48 51 60
outer returns a matrix where every pair of elements of the two vectors have been added together. c is used to return a vector. sapply loops through 0:4 and then calculates the element-wise sum. It also returns a matrix in this instance, so c is used to return a vector.
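The outer approach can be wrapped in a small helper so that the step and the number of repeats are parameters. This is only a sketch; make_seq and its argument names are made up for illustration, and n here counts the number of pairs produced, including the original one.

```r
# Hypothetical helper: repeat `vec` n times, shifting each copy by `step`
make_seq <- function(vec, step, n) {
  # Offsets 0, step, 2*step, ... one per copy
  offsets <- step * (seq_len(n) - 1)
  # outer() builds a length(vec) x n matrix; as.vector() reads it column by column
  as.vector(outer(vec, offsets, "+"))
}

make_seq(c(3, 12), 12, 4)
# [1]  3 12 15 24 27 36 39 48
```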
Here is a somewhat generic function that takes as input your original vector a, the number to add (12), and n:
f1 <- function(vec, x, n){
len1 <- length(vec)
v1 <- sapply(seq(n/len1), function(i) x*i)
v2 <- rep(v1, each = n/length(v1))
v3 <- rep(vec, n/len1)
return(c(vec, v3 + v2))
}
f1(a, 12, 6)
#[1] 3 12 15 24 27 36 39 48
f1(a, 11, 12)
#[1] 3 12 14 23 25 34 36 45 47 56 58 67 69 78
f1(a, 3, 2)
#[1] 3 12 6 15
EDIT
If by n=6 you mean 6 times the whole vector then,
f1 <- function(vec, x, n){
len1 <- length(vec)
v1 <- sapply(seq(n), function(i) x*i)
v2 <- rep(v1, each = len1)
v3 <- rep(vec, n)
return(c(vec, v3 + v2))
}
f1(a, 12, 6)
#[1] 3 12 15 24 27 36 39 48 51 60 63 72 75 84
Using rep for repeating and cumsum for the addition (subtracting 12 so that the sequence starts with the original pair):
n = 6
rep(a, n) + cumsum(rep(c(12, 0), n)) - 12
# [1] 3 12 15 24 27 36 39 48 51 60 63 72