Reorder columns based on values in a particular row in R

I have the following data in a dataframe:
aa bb cc
1 3 4 5
2 5 4 3
3 7 8 6
..
100 33 63 55
I need to reorder the columns based on the values in the last row. The result of this transformation would be:
bb cc aa
1 4 5 3
2 4 3 5
3 8 6 7
...
100 63 55 33

x <- structure(list(aa = c(3L, 5L, 7L, 33L), bb = c(4L, 4L, 8L, 63L),
cc = c(5L, 3L, 6L, 55L)), .Names = c("aa", "bb", "cc"),
class = "data.frame", row.names = c("1", "2", "3", "100"))
x[,order(-x[nrow(x),])]

Building on Joshua Ulrich's answer, in case you want to sort by the row name, rather than number:
x[, order(-x[which(rownames(x) == '100'), ]) ]
where 100 is the row name, as in the example above.
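A minimal sketch of the same idea that first turns the chosen row into a plain vector with unlist. This is an assumption-driven variant, not part of the original answers: depending on the R version, calling order() directly on a one-row data frame may warn or fail, while a named integer vector is always safe. The names key and result are illustrative.

```r
# Same example data as above
x <- data.frame(aa = c(3L, 5L, 7L, 33L),
                bb = c(4L, 4L, 8L, 63L),
                cc = c(5L, 3L, 6L, 55L),
                row.names = c("1", "2", "3", "100"))

# Locate the row by its exact name, then flatten it to a named vector
row_idx <- match("100", rownames(x))
key <- unlist(x[row_idx, ])

# Reorder the columns by decreasing value in that row
result <- x[, order(key, decreasing = TRUE)]
print(result)
```

Using decreasing = TRUE instead of negating the values also works for non-numeric rows.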

Related

How to remove part of a string without disturbing the rest of a data frame?

I have data that looks like this, but much bigger:
df<- structure(list(names = c("bests-1", "trible-1", "crazy-1", "cool-1",
"nonsense-1", "Mean-1", "Lose-1", "Trye-1", "Trified-1"), Col = c(1L,
2L, NA, 4L, 47L, 294L, 2L, 1L, 3L), col2 = c(2L, 4L, 5L, 7L,
9L, 9L, 0L, 2L, 3L)), class = "data.frame", row.names = c(NA,
-9L))
As an example, I am trying to remove -1 from every string in the first column.
I can do this with
as.data.frame(str_remove_all(df$names, "-1"))
The problem is that this drops all the other columns.
I don't want to split the data and merge it again, because I am afraid of introducing a mismatch.
Is there a way to do this in place, removing only the specific substring?
For instance, the output should look like this:
names Col col2
bests 1 2
trible 2 4
crazy NA 5
cool 4 7
nonsense 47 9
Mean 294 9
Lose 2 0
Trye 1 2
Trified 3 3
Use gsub with the pattern '\\-1$': the escaped \\- matches a literal hyphen, and $ anchors the match at the end of the string.
transform(df, names=gsub('\\-1$', '', names))
# names Col col2
# 1 bests 1 2
# 2 trible 2 4
# 3 crazy NA 5
# 4 cool 4 7
# 5 nonsense 47 9
# 6 Mean 294 9
# 7 Lose 2 0
# 8 Trye 1 2
# 9 Trified 3 3
Data:
df <- structure(list(names = c("bests-1", "trible-1", "crazy-1", "cool-1",
"nonsense-1", "Mean-1", "Lose-1", "Trye-1", "Trified-1"), Col = c(1L,
2L, NA, 4L, 47L, 294L, 2L, 1L, 3L), col2 = c(2L, 4L, 5L, 7L,
9L, 9L, 0L, 2L, 3L)), class = "data.frame", row.names = c(NA,
-9L))
Using the stringr package:
library(stringr)
df$names <- str_remove_all(df$names, '-1')
names Col col2
1 bests 1 2
2 trible 2 4
3 crazy NA 5
4 cool 4 7
5 nonsense 47 9
6 Mean 294 9
7 Lose 2 0
8 Trye 1 2
9 Trified 3 3
We could use trimws from base R
df$names <- trimws(df$names, whitespace = "-\\d+")
Output:
> df
names Col col2
1 bests 1 2
2 trible 2 4
3 crazy NA 5
4 cool 4 7
5 nonsense 47 9
6 Mean 294 9
7 Lose 2 0
8 Trye 1 2
9 Trified 3 3
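The answers above differ in one detail that the example data does not expose: an unanchored pattern such as "-1" removes every occurrence, while "-1$" only removes a trailing one. A small base R sketch, using a made-up second value ("x-10-1") that is not in the original data, shows the difference while leaving the other columns untouched:

```r
# Toy data; "x-10-1" is a hypothetical value added to show the difference
df <- data.frame(names = c("bests-1", "x-10-1"),
                 Col = c(1L, 2L),
                 stringsAsFactors = FALSE)

# Unanchored: removes "-1" wherever it occurs in the string
unanchored <- gsub("-1", "", df$names)

# Anchored: removes "-1" only at the end of the string
df$names <- sub("-1$", "", df$names)

print(unanchored)
print(df)
```

Assigning back to df$names modifies only that column, so no split-and-merge step is needed.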

Is there any way to calculate individual equation in a column using R?

I have a data frame with a column named Rooms, which holds the number of rooms in each house. It has over 50,000 rows, and str(df$Rooms) shows that it is a factor with 44 levels. The column looks like this:
> str(df$Rooms)
Factor w/ 44 levels "","1","1+1","1+2",..: 20 32 23 27 28 29 27 23 26 24 ...
> df$Rooms
1+2
3
1+3
1+2
4
3
1+1
2
..
..
Is there any function or library in R that can compute the value of these expressions, so that the column becomes something like this:
> df$Rooms
3
3
4
3
4
3
2
2
..
..
Thank you in advance~
We can use eval(parse()):
df$final_rooms <- sapply(as.character(df$Rooms), function(x) eval(parse(text = x)))
df
# Rooms final_rooms
#1 1+2 3
#2 3 3
#3 1+3 4
#4 1+2 3
#5 4 4
#6 3 3
#7 1+1 2
#8 2 2
data
df <- structure(list(Rooms = structure(c(2L, 5L, 3L, 2L, 6L, 5L, 1L,
4L), .Label = c("1+1", "1+2", "1+3", "2", "3", "4"), class = "factor")),
class = "data.frame", row.names = c(NA, -8L))
In base R, we can avoid eval(parse()) by splitting on + and summing the pieces after converting them to numeric:
df$final_rooms <- sapply(strsplit(as.character(df$Rooms) , "+",
fixed = TRUE), function(x) sum(as.numeric(x)))
Another option is to read the strings into two columns with read.table and take rowSums, which is vectorized:
df$final_rooms <- rowSums(read.table(text = as.character(df$Rooms),
sep="+", header = FALSE, fill = TRUE), na.rm = TRUE)
df$final_rooms
#[1] 3 3 4 3 4 3 2 2
data
df <- structure(list(Rooms = structure(c(2L, 5L, 3L, 2L, 6L, 5L, 1L,
4L), .Label = c("1+1", "1+2", "1+3", "2", "3", "4"), class = "factor")),
class = "data.frame", row.names = c(NA, -8L))
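The str() output in the question lists an empty string "" among the factor levels, which the sample data in the answers does not include. A hedged sketch of the split-and-sum approach that maps empty entries to NA explicitly (the toy vector rooms and the choice of NA are assumptions, not part of the original answers):

```r
# Toy factor including an empty level, as hinted at by the str() output
rooms <- factor(c("1+2", "3", "", "1+1"))
s <- as.character(rooms)

# Split each string on a literal "+" and sum the numeric pieces
total <- vapply(strsplit(s, "+", fixed = TRUE),
                function(p) sum(as.numeric(p)),
                numeric(1))

# Treat empty strings as missing rather than as zero rooms
total[!nzchar(s)] <- NA

print(total)
```

Unlike eval(parse()), this never executes the cell contents as code, which matters if the column could contain unexpected text.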

How to sum rows based on exact conditions on multiple columns and save edited rows in original dataset?

There are 3 parts to this problem:
1) I want to sum the values in columns b, c, d for any two adjacent rows that have the same values in those columns.
2) I would like to keep the values in the other columns unchanged. (Some other columns, e.g. a, may contain character data.)
3) I would like to keep the changes by replacing the original values of b, c, d in the first of the two matching rows with the sums, and deleting the second of the two rows.
Time a b c d id
1 2014/10/11 A 40 20 10 1
2 2014/10/12 A 40 20 10 2
3 2014/10/13 B 9 10 9 3
4 2014/10/14 D 16 5 12 4
5 2014/10/15 D 1 6 5 5
6 2014/10/16 B 20 7 8 6
7 2014/10/17 B 20 7 8 7
8 2014/10/18 A 11 9 5 8
9 2014/10/19 C 31 20 23 9
Expected outcome:
Time a b c d id
1 2014/10/11 A 80 40 20 1 *
3 2014/10/13 B 9 10 9 3
4 2014/10/14 D 16 5 12 4
5 2014/10/15 D 1 6 5 5
6 2014/10/16 B 40 14 16 6 *
8 2014/10/18 A 11 9 5 8
9 2014/10/19 C 31 20 23 9
id 1 and 2 combined to become id 1; id 6 and 7 combined to become id 6.
Thank you. Any contribution is greatly appreciated.
This uses dplyr together with data.table::rleid. To find adjacent rows with identical b, c and d, we paste the three columns into one key and let rleid assign a group id to each run of equal keys. Within each group we sum b, c and d and keep only the first row.
library(dplyr)
df %>%
mutate(temp_col = paste(b, c, d, sep = "-")) %>%
group_by(group = data.table::rleid(temp_col)) %>%
mutate_at(vars(b, c, d), sum) %>%
slice(1L) %>%
ungroup %>%
select(-temp_col, -group)
# Time a b c d id
# <fct> <fct> <int> <int> <int> <int>
#1 2014/10/11 A 80 40 20 1
#2 2014/10/13 B 9 10 9 3
#3 2014/10/14 D 16 5 12 4
#4 2014/10/15 D 1 6 5 5
#5 2014/10/16 B 40 14 16 6
#6 2014/10/18 A 11 9 5 8
#7 2014/10/19 C 31 20 23 9
data
df <- structure(list(Time = structure(1:9, .Label = c("2014/10/11",
"2014/10/12", "2014/10/13", "2014/10/14", "2014/10/15", "2014/10/16",
"2014/10/17", "2014/10/18", "2014/10/19"), class = "factor"),
a = structure(c(1L, 1L, 2L, 4L, 4L, 2L, 2L, 1L, 3L), .Label = c("A",
"B", "C", "D"), class = "factor"), b = c(40L, 40L, 9L, 16L,
1L, 20L, 20L, 11L, 31L), c = c(20L, 20L, 10L, 5L, 6L, 7L,
7L, 9L, 20L), d = c(10L, 10L, 9L, 12L, 5L, 8L, 8L, 5L, 23L
), id = 1:9), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9"))
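For readers without dplyr or data.table, the same run-based grouping can be sketched in base R. This is an illustrative alternative, not the answer's method; the names key, run and result are made up here. As in the dplyr version, a run of three or more identical adjacent rows would be summed as a whole.

```r
df <- data.frame(
  Time = c("2014/10/11", "2014/10/12", "2014/10/13", "2014/10/14", "2014/10/15",
           "2014/10/16", "2014/10/17", "2014/10/18", "2014/10/19"),
  a = c("A", "A", "B", "D", "D", "B", "B", "A", "C"),
  b = c(40L, 40L, 9L, 16L, 1L, 20L, 20L, 11L, 31L),
  c = c(20L, 20L, 10L, 5L, 6L, 7L, 7L, 9L, 20L),
  d = c(10L, 10L, 9L, 12L, 5L, 8L, 8L, 5L, 23L),
  id = 1:9,
  stringsAsFactors = FALSE
)

# One key per row, built from the three columns that must match
key <- paste(df$b, df$c, df$d, sep = "-")

# Run id: increases by one whenever the key differs from the previous row
run <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

# Replace b, c, d by their within-run sums
for (col in c("b", "c", "d")) {
  df[[col]] <- ave(df[[col]], run, FUN = sum)
}

# Keep only the first row of each run
result <- df[!duplicated(run), ]
print(result)
```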

How to calculate field in R based on the formula specified in excel

I want R to read an Excel file that contains formulas and compute their results.
For example, given the following input:
a b c
2 5 =a+b
3 2 =a+b
3 3 =a+b
6 4 =a+b
4 2 =a+b
I should get this output:
a b c
2 5 7
3 2 5
3 3 6
6 4 10
4 2 6
One option is to remove the = using sub and evaluate the first element (the formula is the same in every row):
df1$c <- with(df1, eval(parse(text= sub("=", "", c[1]))))
df1$c
#[1] 7 5 6 10 6
data
df1 <- structure(list(a = c(2L, 3L, 3L, 6L, 4L), b = c(5L, 2L, 3L, 4L,
2L), c = c("=a+b", "=a+b", "=a+b", "=a+b", "=a+b")), .Names = c("a",
"b", "c"), class = "data.frame", row.names = c(NA, -5L))
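The answer above relies on every row holding the same formula. If the formulas could differ from row to row, a per-row evaluation is one possible sketch. The second formula "=a*b" and the column name value are hypothetical, added only for illustration. Note that eval(parse()) runs arbitrary code, so this is only reasonable for trusted input.

```r
# Toy data with two different formulas (the second one is made up)
df1 <- data.frame(a = c(2L, 3L),
                  b = c(5L, 2L),
                  c = c("=a+b", "=a*b"),
                  stringsAsFactors = FALSE)

# Evaluate each row's formula using that row's own column values
df1$value <- vapply(seq_len(nrow(df1)), function(i) {
  expr <- parse(text = sub("^=", "", df1$c[i]))  # strip the leading "="
  eval(expr, envir = df1[i, , drop = FALSE])     # look up a and b in row i
}, numeric(1))

print(df1)
```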

How to intersect values from two data frames with R

I would like to create a new column for a data frame with values from the intersection of a row and a column.
I have a data.frame called "time":
q 1 2 3 4 5
a 1 13 43 5 3
b 2 21 12 3353 34
c 3 21 312 123 343
d 4 123 213 123 35
e 4556 11 123 12 3
And another table, called "event":
q dt
a 1
b 3
c 4
d 2
e 1
I want to add a column called inter to the second table, filled with the value found in the first data.frame at the intersection of row q and the column named by dt. So the result would be this:
q dt inter
a 1 1
b 3 12
c 4 123
d 2 123
e 1 4556
I have tried merge(event, time, by.x = "q", by.y = "dt"), but it fails with an error saying the ids are not the same. I have also tried transposing the time data.frame to cross-reference the values, but without success.
library(reshape2)
merge(event, melt(time, id.vars = "q"),
by.x=c('q','dt'), by.y=c('q','variable'), all.x = TRUE)
Output:
q dt value
1 a 1 1
2 b 3 12
3 c 4 123
4 d 2 123
5 e 1 4556
Notes
We use the function melt from the package reshape2 to convert the data frame time from wide to long format. Then we merge (left outer join) the data frame event with the melted time on two columns (q and dt in event, q and variable in the melted time).
Data:
time <- structure(list(q = structure(1:5, .Label = c("a", "b", "c", "d",
"e"), class = "factor"), `1` = c(1L, 2L, 3L, 4L, 4556L), `2` = c(13L,
21L, 21L, 123L, 11L), `3` = c(43L, 12L, 312L, 213L, 123L), `4` = c(5L,
3353L, 123L, 123L, 12L), `5` = c(3L, 34L, 343L, 35L, 3L)), .Names = c("q",
"1", "2", "3", "4", "5"), class = "data.frame", row.names = c(NA,
-5L))
event <- structure(list(q = structure(1:5, .Label = c("a", "b", "c", "d",
"e"), class = "factor"), dt = c(1L, 3L, 4L, 2L, 1L)), .Names = c("q",
"dt"), class = "data.frame", row.names = c(NA, -5L))
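This may be a little clunky, but it works:
xx <- merge(time, event, by = 'q')  # merge once, outside the loop
dt <- xx$dt
inter <- c()
for (i in 1:nrow(xx)) {
  # column 1 of xx is q, so the time column named dt[i] sits at position dt[i] + 1
  inter <- c(inter, xx[i, dt[i] + 1])
}
final <- cbind(time[,1], dt, inter)  # q is a factor, so cbind stores its integer codes
colnames(final) <- c('q','dt','inter')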
Hope it helps.
Output:
q dt inter
[1,] 1 1 1
[2,] 2 3 12
[3,] 3 4 123
[4,] 4 2 123
[5,] 5 1 4556
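Both answers can also be expressed without a reshape or a loop by indexing a matrix with a two-column (row, column) index. The following base R sketch is an alternative under the assumption that every q in event exists in time and every dt matches a column name; m and idx are illustrative names.

```r
time <- data.frame(q = c("a", "b", "c", "d", "e"),
                   `1` = c(1L, 2L, 3L, 4L, 4556L),
                   `2` = c(13L, 21L, 21L, 123L, 11L),
                   `3` = c(43L, 12L, 312L, 213L, 123L),
                   `4` = c(5L, 3353L, 123L, 123L, 12L),
                   `5` = c(3L, 34L, 343L, 35L, 3L),
                   check.names = FALSE, stringsAsFactors = FALSE)
event <- data.frame(q = c("a", "b", "c", "d", "e"),
                    dt = c(1L, 3L, 4L, 2L, 1L),
                    stringsAsFactors = FALSE)

# Numeric part of time as a matrix (drop the q column)
m <- as.matrix(time[-1])

# One (row, column) pair per event: row from q, column from dt
idx <- cbind(match(event$q, time$q),
             match(as.character(event$dt), colnames(m)))

# Matrix indexing picks one cell per pair
event$inter <- m[idx]
print(event)
```

This keeps q as its original labels rather than the integer codes shown in the loop output above.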

Resources