I have a list x2 containing two data frames, x and x1. Both have 4 columns: n, m, l and k. I want to select the data frame whose last value in column k is the largest.
In the example below, I would like the 2nd data frame to be selected, because the last value in its column k is greater than the last value in column k of the 1st data frame.
x <- data.frame(n = c(2, 13, 5),m = c(2, 23, 6),l = c(2, 33, 7),k = c(2, 43, 8))
x1 <- data.frame(n = c(2, 3, 15),m = c(2, 3, 16),l = c(2, 3, 17),k = c(2, 3, 18))
x2<-list(x,x1)
Using sapply, loop through the list x2 and get the last value of column k of each data frame. Then, using which.max, find the index of the maximum of those values and extract that data frame from x2.
Note: This code does not account for ties in the last value of column k.
x2[which.max(sapply(x2, function(x) tail(x$k, 1)))]
# [[1]]
# n m l k
# 1 2 2 2 2
# 2 3 3 3 3
# 3 15 16 17 18
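If ties matter, one possible variation is to keep every data frame whose last k value equals the maximum, instead of only the first one. This is a minimal sketch; the third data frame x3 and the names last_k and winners are hypothetical and are there only to create a tie.

```r
x  <- data.frame(n = c(2, 13, 5), m = c(2, 23, 6), l = c(2, 33, 7), k = c(2, 43, 8))
x1 <- data.frame(n = c(2, 3, 15), m = c(2, 3, 16), l = c(2, 3, 17), k = c(2, 3, 18))
x3 <- data.frame(n = 1, m = 1, l = 1, k = 18)  # hypothetical: ties with x1 on the last k
x2 <- list(x, x1, x3)

# last value of column k for each data frame, as a plain numeric vector
last_k <- vapply(x2, function(d) tail(d$k, 1), numeric(1))

# keep all data frames that reach the maximum, not just the first
winners <- x2[last_k == max(last_k)]
length(winners)
```

Here winners is a list, so a single result can still be pulled out with winners[[1]].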
if(x$k[length(x$k)] >= x1$k[length(x1$k)]) x else x1
This is an if statement, where
x$k[length(x$k)] gets the last element of column k of data frame x
n m l k
1 2 2 2 2
2 3 3 3 3
3 15 16 17 18
I have a data frame df with four columns; three integer columns and a special column containing a list:
df <- data.frame(w= 1:3, x=3:5, y=6:8, z = I(list(1:2, 1:3, 1:4)))
> df
w x y z
1 1 3 6 1, 2
2 2 4 7 1, 2, 3
3 3 5 8 1, 2, 3, 4
>class(df$z)
[1] "AsIs"
I want to transform each element of the column df["z"] by separately multiplying it with the corresponding element (same row number) of each of the other columns (df["w"], df["x"], df["y"]) of the same data frame df.
I have found the possibility of using Map("*", df$z, df$x), but it can only perform the required multiplication with one other column at a time. My data set is too large to let me perform the multiplication in such small steps.
> Map("*", df$z, df$x)
[[1]]
[1] 3 6
[[2]]
[1] 4 8 12
[[3]]
[1] 5 10 15 20
Can anyone please provide a hint on how to multiply df["z"] with each of the other columns at once while preserving the data frame structure?
I expect the output to be a data frame df1 with column names w,x,y.
>df1
w x y
1 2 3 6 6 12
2 4 6 4 8 12 7 14 21
3 6 9 12 5 10 15 20 8 16 24 32
Thank you.
We can use transmute_at
library(tidyverse)
df %>%
transmute_at(vars(w, x, y), ~ map2(z, ., `*`)) # funs() is deprecated since dplyr 0.8.0, so a formula is used here
# w x y
#1 1, 2 3, 6 6, 12
#2 2, 4, 6 4, 8, 12 7, 14, 21
#3 3, 6, 9, 12 5, 10, 15, 20 8, 16, 24, 32
Or, as @Ryan mentioned, if there are more columns and there is a single multiplier list column, we can use one_of within transmute_at to select all columns except 'z'
df %>%
transmute_at(vars(-one_of('z')), ~ map2(z, ., `*`)) # funs() is deprecated since dplyr 0.8.0, so a formula is used here
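For comparison, the same result can be sketched in base R by wrapping the Map call from the question in lapply, so that every non-list column is handled in one pass. The names cols, res and df1 are only illustrative.

```r
df <- data.frame(w = 1:3, x = 3:5, y = 6:8, z = I(list(1:2, 1:3, 1:4)))

cols <- setdiff(names(df), "z")  # every column except the list column

# for each of those columns, multiply each element of z by the value in the same row
res <- lapply(df[cols], function(col) Map(`*`, df$z, col))

# I() keeps each result as a list column when building the data frame
df1 <- do.call(data.frame, lapply(res, I))
df1
```

This avoids extra packages, at the cost of building the intermediate list res first.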
I have a data frame with two columns, let's call them X and Y. Here's an example of it:
df <- data.frame(X = LETTERS[1:8],
Y = c(14, 12, 12, 11, 9, 6, 4, 1),
stringsAsFactors = FALSE)
which produces this:
X Y
A 14
B 12
C 12
D 11
E 9
F 6
G 4
H 1
Note that the data frame will always be ordered in a descending order based on Y. I want to group together cases where the Y values lie within a certain range, while updating the X column to reflect the grouping too. For example, if the value is 2, I would like the final output to be:
X new_Y
A 14.00000
B C D 11.66667
E 9.00000
F G 5.00000
H 1.00000
Let me explain how I got that. From the starting df data frame, the closest values were B and C. Joining them would result in:
X new_Y
A 14
B C 12
D 11
E 9
F 6
G 4
H 1
The new_Y value for cases B and C is the average of the original values for B and C i.e. 12. From this second data frame, B C are within 2 from D so they are the next to be grouped together:
X new_Y
A 14.00000
B C D 11.66667
E 9.00000
F 6.00000
G 4.00000
H 1.00000
Note that the Y value for B C D is 11.67 because the original values of B, C and D were 12, 12 and 11 respectively and their average is 11.667. I wouldn't want the code to return the average Y from the previous iteration (which in this case would be 11.5).
Finally, F and G can also be grouped together, producing the final output stated above.
I'm not sure of the code needed to achieve this. My only thoughts were to calculate the distance from the previous and following element, look for the minimum and check whether it exceeds the threshold value (of 2 in the example above). Based on where that minimum appears, join the X column while averaging the Y values from the original table. Repeat this until the minimum becomes larger than the threshold.
But I'm not sure how to write the necessary code to achieve this or whether there's a more efficient solution to the algorithm I'm suggesting above. Any help will be much appreciated.
P.S I forgot to mention that if the distance between the previous and the following Y value is the same, then the grouping should be done towards the larger Y value. So
X Y
A 10
B 8
C 6
would be returned as
X new_Y
A B 9
C 6
Thanks in advance for your patience. My apologies if I didn't explain this very well.
This sounds like hierarchical agglomerative clustering.
To get the groups, use dist, hclust and cutree.
Note that centroid clustering with hclust expects the distances as the square of the Euclidean distance.
df <- data.frame(X = LETTERS[1:8],
Y = c(14, 12, 12, 11, 9, 6, 4, 1),
stringsAsFactors = FALSE)
dCutoff <- 2
d2 <- dist(df$Y)^2
hc <- hclust(d2, method = "centroid")
group_id <- cutree(hc, h = dCutoff^2)
group_id
#> [1] 1 2 2 2 3 4 4 5
To munge the original table, we can use dplyr.
library('dplyr')
df %>%
group_by(group_id = group_id) %>%
summarise(
X = paste(X, collapse = ' '),
Y = mean(Y))
#> # A tibble: 5 x 3
#> group_id X Y
#> <int> <chr> <dbl>
#> 1 1 A 14.00000
#> 2 2 B C D 11.66667
#> 3 3 E 9.00000
#> 4 4 F G 5.00000
#> 5 5 H 1.00000
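If dplyr is not available, the same summary can be sketched in base R with tapply. In this sketch group_id is typed in by hand, copied from the cutree output shown above, so that the example runs on its own; out is an illustrative name.

```r
df <- data.frame(X = LETTERS[1:8],
                 Y = c(14, 12, 12, 11, 9, 6, 4, 1),
                 stringsAsFactors = FALSE)
group_id <- c(1, 2, 2, 2, 3, 4, 4, 5)  # copied from the cutree() output above

out <- data.frame(
  X = as.vector(tapply(df$X, group_id, paste, collapse = " ")),  # join labels per group
  Y = as.vector(tapply(df$Y, group_id, mean)),                   # average original Y per group
  stringsAsFactors = FALSE
)
out
```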
Note that this gives the average of the previous iteration, though. In any case, I hope it helps.
library(data.table)
df <- data.table(X = LETTERS[1:8],
Y = c(14, 12, 12, 11, 9, 6, 4, 1),
stringsAsFactors = FALSE)
differences <- c(diff(df$Y),NA) # NA for the last element
df$difference <- abs(differences) # differences between consecutive elements (this works because Y is sorted)
minimum <- min(df$difference[1:(length(df$difference)-1)]) # get the minimum
while (minimum <= 2){ # "within 2" includes a gap of exactly 2, as for F and G in the question
index <- which(df$difference==minimum) # see where the minimum occurs
check = FALSE
# because the last row cannot have a number since there is not an element after that
# we need to see if this element has the minimum difference with its previous
# if it does not have the minimum difference then we exclude it and paste it later
if(df[nrow(df)-1,difference]!=minimum){
last_row <- df[nrow(df)]
df <- df[-nrow(df)]
check = TRUE
}
tmp <- df[(index:(index+1))]
df <- df[-(index:(index+1))]
to_bind <- data.table(X = paste0(tmp$X, collapse = " "))
to_bind$Y <- mean(tmp$Y)
df <- rbind(df[,.(X,Y)],to_bind)
if(check){
df <- rbind(df,last_row[,.(X,Y)])
}
setorder(df,-Y)
differences <- c(diff(df$Y),NA) # NA for the last element
df$difference <- abs(differences) # differences between consecutive elements (this works because Y is sorted)
minimum <- min(df$difference[1:(length(df$difference)-1)]) # get the minimum
}
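To average the original Y values rather than the values from the previous iteration, one option is to track which original rows belong to each group and recompute the means from those rows. The following is a minimal base R sketch of the procedure described in the question, not a tuned implementation; the function name merge_close and its argument threshold are hypothetical.

```r
merge_close <- function(df, threshold) {
  # each group holds the indices of its member rows in the original df
  groups <- as.list(seq_len(nrow(df)))
  repeat {
    if (length(groups) < 2) break
    # group means are always computed from the original Y values
    means <- vapply(groups, function(g) mean(df$Y[g]), numeric(1))
    gaps <- abs(diff(means))
    # which.min returns the first minimum; since df is sorted in descending
    # order of Y, ties are resolved towards the larger Y value
    i <- which.min(gaps)
    if (gaps[i] > threshold) break
    groups[[i]] <- c(groups[[i]], groups[[i + 1]])  # merge the two neighbours
    groups[[i + 1]] <- NULL
  }
  data.frame(
    X = vapply(groups, function(g) paste(df$X[g], collapse = " "), character(1)),
    new_Y = vapply(groups, function(g) mean(df$Y[g]), numeric(1)),
    stringsAsFactors = FALSE
  )
}

df <- data.frame(X = LETTERS[1:8],
                 Y = c(14, 12, 12, 11, 9, 6, 4, 1),
                 stringsAsFactors = FALSE)
res <- merge_close(df, 2)
res
```

The sketch assumes, as the question states, that the input is already sorted in descending order of Y. It recomputes all means on every pass, which is simple but not efficient for large inputs.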
My objective is to sum the rows in consecutive blocks of n rows. Maybe a loop would help.
I used this code:
irr = rollapply( irr , width = 1 , by = n , align = "left" , FUN = sum )
Example:
V1
3
2
4
7
5
So if n = 2, the first 2 rows are summed.
Results:
V1
5
4
7
5
The problem is that I have multiple values of "n" in another data.frame variable,
2 5 3, and I want "n" to change, say to 3, once it has finished summing the first two rows:
next n = 3
Results:
5 16
This is my first time using R, so please pardon any mistakes I made, and let me know if the question is hard to understand. Thanks.
You can split the data frame column according to n and then sum each piece of the resulting list.
As an example,
v1 <- data.frame(X = c(3,2,4,7,5, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4))
n <- data.frame(Y = c(2, 3, 2, 4, 1,4))
unlist(lapply(split(v1$X, rep(1:nrow(n), n$Y)), sum))
# 1 2 3 4 5 6
# 5 16 5 22 8 10
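The key step is the grouping vector built by rep, which repeats each block number as many times as that block has rows. A small sketch with the five values from the question makes this visible; the names v, grp and sums are only illustrative.

```r
v <- c(3, 2, 4, 7, 5)  # the V1 column from the question
n <- c(2, 3)           # block sizes: first 2 rows, then the next 3

# block label for every row: 1 1 2 2 2
grp <- rep(seq_along(n), times = n)

# sum within each block
sums <- as.vector(tapply(v, grp, sum))
sums
# [1]  5 16
```

Note that the block sizes have to add up to the number of rows, otherwise the grouping vector and the data will not line up.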
I have a data frame with 1 column consisting of 10 lists, each with a varying number of elements. I also have a vector with 10 different values in it (10 integers).
I want to take the "sumproduct" of each of the 10 lists with its corresponding vector value, and end up with 10 values.
Value 1 = sumproduct(First list, First vector value)
Value 2 = sumproduct(Second list, Second vector value)
etc...
Final_Answer <- c(Value 1, Value 2, ... , Value 10)
I have a function that generates the data frame containing lists of numbers representing years. The data frame is constructed using a loop that generates each value and then row-binds it to the data frame.
# Count, Start_Date and Time are used as globals and must exist before the call
Time_Function <- function(Maturity) {
  for (i in 0:Count) {
    # years between Start_Date and the date i*365 days before Maturity
    x <- as.numeric((as.Date(as.Date(Maturity) - i * 365) - Start_Date) / 365)
    Time <- rbind(Time, data.frame(x))
  }
  return(Time)
}
The result is this:
http://pastebin.com/J6phR2hv
http://i.imgur.com/Sf4mpA5.png
If my vector looks like [1,2,3,4...,10], I want the output to be:
Final Answer = [(1*1.1342466 + 1*0.6342466 + 1* 0.1342466), (2*1.3835616 + 2*0.8835616 + 2*0.3835616), ... , ( ... +10*0.0630137)]
Assuming you want to multiply each value in the list by the respective scalar and then add it all up, here is one way to do it.
list1 <- mapply(rep, 1:10, 10:1)
vec1 <- 1:10
df <- data.frame( I(list1), vec1)
df
list1 vec1
1 1, 1, 1,.... 1
2 2, 2, 2,.... 2
3 3, 3, 3,.... 3
4 4, 4, 4,.... 4
5 5, 5, 5,.... 5
6 6, 6, 6,.... 6
7 7, 7, 7, 7 7
8 8, 8, 8 8
9 9, 9 9
10 10 10
mapply(df$list1, df$vec1, FUN = function(x, y) {y* sum(x)})
[1] 10 36 72 112 150 180 196 192 162 100
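Applied to the numbers quoted in the question, the same idea can be sketched as follows. Only the first two lists are used, with values copied from the expected output in the question; lists and weights are illustrative names.

```r
# first two lists, copied from the expected output in the question
lists <- list(c(1.1342466, 0.6342466, 0.1342466),
              c(1.3835616, 0.8835616, 0.3835616))
weights <- c(1, 2)  # the corresponding vector values

# multiply each list by its scalar and add up the products
Final_Answer <- mapply(function(v, w) sum(v * w), lists, weights)
Final_Answer
```

Since the weight is a single number per list, sum(v * w) and w * sum(v) give the same value, so this matches the approach above.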
I have a vector x, that I would like to sort based on the order of values in vector y. The two vectors are not of the same length.
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3)
The expected result would be:
[1] 4 4 4 2 2 1 3 3 3
What about this one?
x[order(match(x,y))]
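Broken into steps, the one-liner works roughly like this: match gives, for each element of x, its position in y, and order then sorts x by those positions. The names pos, idx and result are only illustrative.

```r
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3)

pos <- match(x, y)   # position of each x value in y: 2 2 4 1 3 1 1 4 4
idx <- order(pos)    # indices of x sorted by that position: 4 6 7 1 2 5 3 8 9
result <- x[idx]
result
# [1] 4 4 4 2 2 1 3 3 3
```

Values of x that do not occur in y get NA from match, and order places them at the end by default.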
You could convert x into an ordered factor:
x.factor <- factor(x, levels = y, ordered=TRUE)
sort(x)
sort(x.factor)
Obviously, changing your numbers into factors can radically change the way code downstream reacts to x. But since you didn't give us any context about what happens next, I thought I would suggest this as an option.
How about:
rep(y,table(x)[as.character(y)])
(Ian's is probably still better)
In case you need to order by "y" regardless of whether it holds numbers or characters:
x[order(ordered(x, levels = y))]
4 4 4 2 2 1 3 3 3
By steps:
a <- ordered(x, levels = y) # Create an ordered factor from "x" using the order in "y".
[1] 2 2 3 4 1 4 4 3 3
Levels: 4 < 2 < 1 < 3
b <- order(a) # Get the ordering of "x" that matches the order in "y".
[1] 4 6 7 1 2 5 3 8 9
x[b] # Reorder "x" according to order in "y".
[1] 4 4 4 2 2 1 3 3 3
[Edit: Clearly Ian has the right approach, but I will leave this in for posterity.]
You can do this without loops by indexing on your y vector. Add an incrementing numeric value to y and merge them:
y <- data.frame(index=1:length(y), x=y)
x <- data.frame(x=x)
x <- merge(x,y)
x <- x[order(x$index),"x"]
x
[1] 4 4 4 2 2 1 3 3 3
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3)
z <- c() # start with an empty result so that c(z, ...) works on the first pass
for(i in y) { z <- c(z, rep(i, sum(x==i))) }
The result in z: 4 4 4 2 2 1 3 3 3
The important steps:
for(i in y) -- Loops over the elements of interest.
z <- c(z, ...) -- Concatenates each subexpression in turn
rep(i, sum(x==i)) -- Repeats i (the current element of interest) sum(x==i) times (the number of times we found i in x).
You can also use sqldf and do it with a SQL join, like the following:
library(sqldf)
x <- data.frame(x = c(2, 2, 3, 4, 1, 4, 4, 3, 3))
y <- data.frame(y = c(4, 2, 1, 3))
result <- sqldf("SELECT x.x FROM y JOIN x on y.y = x.x")
ordered_x <- result[[1]]