I have a basic R question. Imagine the following code:
a <- c("A","B","C")
b <- c("A","B","C")
c <- c("A","X","C")
x <- c("A","B","C")
y <- c("","B","C")
z <- c("","","C")
frame <- data.frame(a,b,c,x,y,z)
Now I want to get the content of the last 3 columns, but only where they contain a value, i.e. the last 3 non-empty values in each row. So the output should look like this:
new1 <- c("A","X","C")
new2 <- c("A","B","C")
new3 <- c("A","B","C")
frame2 <- data.frame(new1,new2,new3)
I am thankful for any help.
Using apply from base R
as.data.frame(t(apply(frame, 1, FUN = function(x) tail(x[nzchar(x)], 3))))
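To see what the one-liner does, here is a sketch that runs the same two steps (drop empty strings with nzchar, keep the last three with tail) on a single row first. The helper name last_three is made up for illustration, and the sketch assumes every row has at least three non-empty values.

```r
frame <- data.frame(
  a = c("A", "B", "C"), b = c("A", "B", "C"), c = c("A", "X", "C"),
  x = c("A", "B", "C"), y = c("", "B", "C"), z = c("", "", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the last three non-empty entries of one row
last_three <- function(row) {
  row <- as.character(row)
  tail(row[nzchar(row)], 3)
}

# Second row is B, B, X, B, B, "" so the empty string is dropped first
last_three(unlist(frame[2, ]))

# apply() returns one column per input row, hence the t()
frame2 <- as.data.frame(t(apply(frame, 1, last_three)),
                        stringsAsFactors = FALSE)
names(frame2) <- paste0("new", 1:3)
frame2
```

Naming the columns new1 to new3 is only done here to match the desired output in the question.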
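You can drop every column that contains an empty string and keep the last three that remain. Note that this works column-wise, so the second row comes out as B, X, B rather than X, B, B: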
new_frame <- frame[colSums(frame == '') == 0]
new_frame[tail(seq_along(new_frame), 3)]
  b c x
1 A A A
2 B X B
3 C C C
I have a df:
AA <- c("GA","GA", "GA","GA","GA")
A <- c(1,2,3,4,5)
B <- c(5,4,3,2,1)
C <- c(2,3,4,5,1)
D <- c(4,3,2,1,5)
df <- data.frame(AA, A, B, C, D)
The other df is:
E <- c("B", "D")
F <- c("GA","GA")
df2 <- data.frame(E, F)
I would like to only select the columns from df based on the values from df2$E.
And that data frame would look like this:
AA <- c("GA","GA", "GA","GA","GA")
B <- c(5,4,3,2,1)
D <- c(4,3,2,1,5)
df3 <- data.frame(AA, B, D)
My current code below gives me an empty data frame with 0 obs. of 5 variables:
df3 <- df %>% filter(df %in% df2$E)
Any assistance in generating a code that works would be greatly appreciated.
Thank you!
Here we can index via column names.
df[,c("AA",df2$E)]
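One caveat, sketched below: if df2$E is a factor (the default for string columns before R 4.0), c("AA", df2$E) yields the integer codes instead of the labels, so converting with as.character first is safer. The intersect() step is an optional extra, not part of the answer above, that skips names missing from df.

```r
df <- data.frame(AA = rep("GA", 5), A = 1:5, B = 5:1,
                 C = c(2, 3, 4, 5, 1), D = c(4, 3, 2, 1, 5))
df2 <- data.frame(E = c("B", "D"), F = c("GA", "GA"),
                  stringsAsFactors = TRUE)  # simulate the pre-4.0 default

# as.character() recovers the labels; intersect() keeps only existing columns
keep <- intersect(c("AA", as.character(df2$E)), names(df))
df3 <- df[, keep, drop = FALSE]
df3
```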
Let's assume I have this dataframe:
df <- data.frame(A = letters[1:5],
                 B = letters[6:10],
                 stringsAsFactors = FALSE)
A B
1 a f
2 b g
3 c h
4 d i
5 e j
Where I'm looking for this output:
A B
1 e j
2 d i
3 c h
4 b g
5 a f
With this function:
f_Order <- function(df){
  df$Order <- as.integer(row.names(df))
  df <- arrange(df, desc(Order))[,c("A","B")]
}
Though the function above doesn't work, the code inside the function works perfectly:
df$Order <- as.integer(row.names(df))
df <- arrange(df, desc(Order))[,c("A","B")]
> x
A B
1 e j
2 d i
3 c h
4 b g
5 a f
Why? How do I make the function work?
EDIT:
To clarify, the problem is not how to change the order of the df, but how to make the function f_Order work. The code does what I want when run on its own, but not inside that function. I need to know why, and how I can make the function work.
EDIT2:
This is exactly the code I'm running, and none of the solutions work so far.
x <- data.frame(A = letters[1:5],
                B = letters[6:10],
                stringsAsFactors = FALSE)
f_Order <- function(df){
  df$Order <- as.integer(row.names(df))
  df <- arrange(df, desc(Order))
  return(df)
}
f_Order(x)
What if you have a return() at the end of your function? Something like this:
f_Order <- function(df){
  df$Order <- as.integer(row.names(df))
  df <- arrange(df, desc(Order))[,c("A","B")]
  return(df)
}
Basically, a function returns the value of its last expression. In your version the last expression is an assignment, so the result is returned invisibly: the reordering only happens on the function's local copy of df, and nothing is printed. Ending the function with return(df) (or just df) makes the result visible.
Output:
> f_Order(df)
A B
1 e j
2 d i
3 c h
4 b g
5 a f
If you want to update df, then run df <- f_Order(df).
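Why nothing gets printed can be checked directly: when a function's last expression is an assignment, R returns its value invisibly. The sketch below uses base R only (rev on the row indices instead of arrange) and two made-up function names to show the difference with withVisible().

```r
x <- data.frame(A = letters[1:5], B = letters[6:10],
                stringsAsFactors = FALSE)

# Last expression is an assignment: the value is returned, but invisibly
f_silent <- function(df) {
  df <- df[rev(seq_len(nrow(df))), , drop = FALSE]
}

# Last expression is the object itself: the value is returned visibly
f_visible <- function(df) {
  df <- df[rev(seq_len(nrow(df))), , drop = FALSE]
  df
}

withVisible(f_silent(x))$visible   # FALSE, so nothing auto-prints
withVisible(f_visible(x))$visible  # TRUE

# Both return the same data; assigning or print() reveals the silent one
identical(f_silent(x), f_visible(x))
```

So even the silent version works when its result is assigned, e.g. y <- f_silent(x); only the automatic printing at the console is suppressed.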
Continuing with dplyr:
f_Order <- function(df){
  df %>%
    # convert to integer so that row 10 sorts after row 9, not before row 2
    mutate(Order = as.integer(row.names(.))) %>%
    arrange(desc(Order))
}
If we don't want to keep Order:
f_Order <- function(df){
  df %>%
    arrange(desc(as.integer(row.names(.))))
}
Result:
f_Order(df)
A B
1 e j
2 d i
3 c h
4 b g
5 a f
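A caveat when ordering on row names: they are stored as character strings, so sorting them directly compares text, and with ten or more rows "9" ends up ahead of "10". A short base R sketch of the pitfall, plus a reversal that avoids row names altogether (reverse_rows is a made-up name):

```r
rn <- as.character(1:12)

# Sorted as text: "9" is the largest string
sort(rn, decreasing = TRUE)[1:3]

# Sorted as numbers: 12, 11, 10
sort(as.integer(rn), decreasing = TRUE)[1:3]

# Hypothetical helper: reverse the rows by position and reset the row names
reverse_rows <- function(df) {
  out <- df[rev(seq_len(nrow(df))), , drop = FALSE]
  row.names(out) <- NULL
  out
}

big <- data.frame(A = letters[1:12], stringsAsFactors = FALSE)
reverse_rows(big)$A[1:3]
```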
I have a simple function:
new_function <- function(x)
{
  letters <- c("A","B","C")
  new_letters <- c("D","E","F")
  if (x %in% letters) {"Correct"}
  else if (x %in% new_letters) {"Also Correct"}
  else {x}
}
I make a dataframe with letters:
df <- as.data.frame(LETTERS[seq( from = 1, to = 10 )])
names(df)<- c("Letters")
I want to apply the function on the dataframe:
df$result <- new_function(df$Letters)
But it doesn't work (every row gets "Correct"), and I get this warning:
Warning message:
In if (x %in% letters) { :
the condition has length > 1 and only the first element will be used
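You can use lapply, which calls the function once per element. Note that the result is a list column, and because Letters is apparently a factor here, the unmatched rows come back as integer codes (7 to 10 in the output below):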
df$result <- lapply(df$Letters,new_function)
Output:
df
Letters result
1 A Correct
2 B Correct
3 C Correct
4 D Also Correct
5 E Also Correct
6 F Also Correct
7 G 7
8 H 8
9 I 9
10 J 10
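If the scalar function has to stay as it is, one possible variation is to convert the column to character first and use vapply, which returns a plain character vector rather than a list. This is a sketch that assumes Letters is a factor, as the 7 to 10 in the output above suggest; the function body is the one from the question, only with the helper vectors inlined.

```r
new_function <- function(x) {
  if (x %in% c("A", "B", "C")) {
    "Correct"
  } else if (x %in% c("D", "E", "F")) {
    "Also Correct"
  } else {
    x
  }
}

df <- data.frame(Letters = LETTERS[1:10], stringsAsFactors = TRUE)

# Convert the factor to character, then call the scalar function per element;
# vapply() checks that every call returns exactly one string
df$result <- vapply(as.character(df$Letters), new_function,
                    FUN.VALUE = character(1), USE.NAMES = FALSE)
df
```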
I would rewrite your new_function with ifelse, as @akrun suggested. as.character converts x to character in case it is a factor:
new_function <- function(x){
  ifelse(x %in% c("A","B","C"), "Correct",
         ifelse(x %in% c("D","E","F"), "Also Correct", as.character(x)))
}
df$result <- new_function(df$Letters)
or with case_when from dplyr:
library(dplyr)
new_function <- function(x){
  case_when(x %in% c("A","B","C") ~ "Correct",
            x %in% c("D","E","F") ~ "Also Correct",
            TRUE ~ as.character(x))
}
df %>%
  mutate(result = new_function(Letters))
Result:
Letters result
1 A Correct
2 B Correct
3 C Correct
4 D Also Correct
5 E Also Correct
6 F Also Correct
7 G G
8 H H
9 I I
10 J J
Data:
df <- as.data.frame(LETTERS[seq( from = 1, to = 10 )])
names(df)<- c("Letters")
I have a relatively large amount of data stored in a list of data frames with several columns.
For each element of the list I wish to check one column against a reference and if present extract the value held in another column of the same element and place in a new summary matrix.
e.g. with the following example code:
add1 = c("N1","N1","N1")
coords1 = c(1,2,3)
vals1 = c("a","b","c")
extra1 = c("x","y","x")
add2 = c("N2","N2","N2","N2")
coords2 = c(2,3,4,5)
vals2 = c("b","c","d","e")
extra2 = c("z","y","x","x")
add3 = c("N3","N3","N3")
coords3 = c(1,3,5)
vals3 = c("a","c","e")
extra3 = c("z","z","x")
df1 <- data.frame(add1, coords1, vals1, extra1)
df2 <- data.frame(add2, coords2, vals2, extra2)
df3 <- data.frame(add3, coords3, vals3, extra3)
list_all <- list(df1, df2, df3)
# the coordinates are in the 2nd column of each list element
coordinate.extract <- unique(unlist(lapply(list_all, "[", 2)))
my_matrix <- matrix(0, ncol = length(list_all),
                    nrow = length(coordinate.extract))
my_matrix_new <- cbind(as.character(coordinate.extract), my_matrix)
I would like to end up with:
my_matrix_new =
  V1  V2  V3  V4
  1   a       a
  2   b   b
  3   c   c   c
  4       d
  5       e   e
i.e. the 3rd column of each list element is chosen based on the value of the second column.
I hope this is clear.
Thanks,
Matt
I would use a data.frame, as there are mixed classes. You may try merge with Reduce to get the expected output: select the 2nd and 3rd columns in each list element, give the 2nd column the same name across all list elements, merge on that column, and if needed replace the NA elements with ''.
lst1 <- lapply(list_all, function(x) {names(x)[2] <- 'V1';x[2:3] })
res <- Reduce(function(...) merge(..., by='V1', all=TRUE), lst1)
res[-1] <- lapply(res[-1], as.character)
res[is.na(res)] <- ''
res
#  V1 vals1 vals2 vals3
#1  1     a           a
#2  2     b     b
#3  3     c     c     c
#4  4           d
#5  5           e     e
We can change the column names
names(res) <- paste0('V', seq_along(res))
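If the goal is the character matrix from the question rather than a merged data frame, filling it directly with match() is another option. This is only a sketch: it assumes column 2 of each data frame holds the coordinates and column 3 the values, as in the example data, and it leaves out the extra columns for brevity.

```r
df1 <- data.frame(add1 = "N1", coords1 = c(1, 2, 3),
                  vals1 = c("a", "b", "c"), stringsAsFactors = FALSE)
df2 <- data.frame(add2 = "N2", coords2 = c(2, 3, 4, 5),
                  vals2 = c("b", "c", "d", "e"), stringsAsFactors = FALSE)
df3 <- data.frame(add3 = "N3", coords3 = c(1, 3, 5),
                  vals3 = c("a", "c", "e"), stringsAsFactors = FALSE)
list_all <- list(df1, df2, df3)

# All coordinates that occur in any list element, in sorted order
coords <- sort(unique(unlist(lapply(list_all, `[[`, 2))))

# One column per list element: look up each coordinate, blank if absent
vals <- sapply(list_all, function(d) {
  out <- as.character(d[[3]])[match(coords, d[[2]])]
  out[is.na(out)] <- ""
  out
})

my_matrix_new <- cbind(as.character(coords), vals)
colnames(my_matrix_new) <- paste0("V", seq_len(ncol(my_matrix_new)))
my_matrix_new
```

Because a matrix holds a single type, the coordinates end up as character strings here, which is the same trade-off the question's cbind already makes.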
Situation
I have two data frames, df1 and df2, with the same column headings:
x <- c(1,2,3)
y <- c(3,2,1)
z <- c(3,2,1)
names <- c("id","val1","val2")
df1 <- data.frame(x, y, z)
names(df1) <- names
a <- c(1, 2, 3)
b <- c(1, 2, 3)
c <- c(3, 2, 1)
df2 <- data.frame(a, b, c)
names(df2) <- names
and I am performing a merge:
#library(dplyr) # not needed for merge
joined_df <- merge(x = df1, y = df2, by = "id", all = TRUE)
This gives me the columns in the joined_df as id, val1.x, val2.x, val1.y, val2.y
Question
Is there a way to co-locate the columns that had the same heading in the original data frames, to give the column order in the joined data frame as id, val1.x, val1.y, val2.x, val2.y?
Note that in my actual data frame I have 115 columns, so I'd like to steer clear of using joined_df <- joined_df[, c(1, 2, 4, 3, 5)] if possible.
Update/Edit: I would also like to maintain the original order of the column headings, so sorting alphabetically is not an option on my actual data (I realise it would work with the example I have given).
My desired output is
  id val1.x val1.y val2.x val2.y
1  1      3      1      3      3
2  2      2      2      2      2
3  3      1      3      1      1
Update with solution for general case
The accepted answer solves my issue nicely.
I've adapted the code slightly here to use the original column names, without having to hard-code them in the rep function.
#specify columns used in merge
merge_cols <- c("id")
# identify duplicate columns and remove those used in the 'merge'
dup_cols <- names(df1)
dup_cols <- dup_cols[!dup_cols %in% merge_cols]
# replicate each duplicate column name and append an 'x' and 'y'
dup_cols <- rep(dup_cols, each=2)
var <- c("x", "y")
newnames <- paste(dup_cols, ".", var, sep = "")
#create new column names and sort the joined df by those names
newnames <- c(merge_cols, newnames)
joined_df <- joined_df[newnames]
How about something like this
numrep <- rep(1:2, each = 2)
numrep
var <- c("x", "y")
var
newnames <- paste("val", numrep, ".", var, sep = "")
newdf <- cbind(joined_df$id, joined_df[newnames])
names(newdf)[1] <- "id"
Which should give you the dataframe like this
  id val1.x val1.y val2.x val2.y
1  1      3      1      3      3
2  2      2      2      2      2
3  3      1      3      1      1
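For many columns, the interleaved names can also be built without rep by stacking the two sets of suffixed names with rbind. The sketch below assumes that every non-key column exists in both data frames and that merge used its default ".x" and ".y" suffixes; the variable name shared is made up for illustration.

```r
df1 <- data.frame(id = 1:3, val1 = c(3, 2, 1), val2 = c(3, 2, 1))
df2 <- data.frame(id = 1:3, val1 = c(1, 2, 3), val2 = c(3, 2, 1))
joined_df <- merge(df1, df2, by = "id", all = TRUE)

merge_cols <- "id"
shared <- setdiff(names(df1), merge_cols)  # keeps the original column order

# rbind() stacks the .x names over the .y names; as.vector() reads the
# resulting matrix column by column, which interleaves them
newnames <- as.vector(rbind(paste0(shared, ".x"), paste0(shared, ".y")))
joined_df <- joined_df[c(merge_cols, newnames)]
names(joined_df)
```

Since setdiff keeps the order of its first argument, the original column order is preserved and nothing is sorted alphabetically.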