Recode 2 variables to one in one line [closed] - r

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
Say I have a data frame like:
df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1))
only with a large number of rows. I'd like to create a column c whose value depends on the combination of values in a and b, e.g.
df
a b c
0 0 10
0 1 11
1 0 12
1 1 13
I take it this can be done with inner joins, using sqldf or maybe dplyr; is there a quicker way, with or without libraries?
Thanks in advance, p

You could do:
library(dplyr)
df %>% mutate(newcol = paste0(a, b))
Depending on how you want the new column to be labelled.
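If you have a vector of desired values, let's call it lookup (indexing it by a factor uses the factor's integer codes, so the first level "00" picks the first element, "01" the second, and so on):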
lookup <- 10:100
df %>% mutate(newcol = lookup[as.factor(paste0(a, b))])
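A more explicit variant of the lookup idea above, as a minimal base R sketch: here the lookup vector is named by the pasted (a, b) key, so the mapping does not depend on factor level order. The specific values 10 to 13 are taken from the question; the variable name `key` is just illustrative.

```r
# Sample data from the question
df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1))

# Named lookup vector: names are the pasted (a, b) keys, values are the codes
lookup <- c("00" = 10, "01" = 11, "10" = 12, "11" = 13)

# Build the key for every row and index the lookup vector by name
key <- paste0(df$a, df$b)
df$c <- unname(lookup[key])

print(df)
```

Any (a, b) combination missing from `lookup` would come back as NA, which makes gaps in the mapping easy to spot.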

I think what you mean is that you have some other data frame (say, called dictionary) with a c column, and you want to look up each (a, b) pair in the dictionary and grab the c from there?
df=data.frame(a=c(0,0,1,1),b=c(0,1,0,1))
dictionary <- df
dictionary$c <- 10:13
dictionary <- dictionary[sample(4), ] # shuffle it just to prove it works
In that case you can do
merge(df, dictionary, by = c("a", "b"), all.x = TRUE)
And that will grab the matching c column from dictionary and plonk it into df. The all.x = TRUE will put an NA there if there is no matching (a, b) in dictionary.
If speed becomes an issue, you might try data.table
library(data.table)
setDT(df) # convert to data.table
setDT(dictionary) # convert to data.table
# set key
setkey(df,a,b)
setkey(dictionary,a,b)
# merge
dictionary[df] # will be `df` with the `c` column added, `NA` if no match
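The dictionary lookup described in this answer can also be sketched in base R with match(), which keeps the rows of df in their original order (merge() sorts on the key columns by default). This is only an illustration: the pasted-key approach and the names `key_df` and `key_dict` are assumptions, not part of the original answer.

```r
# Data frame to annotate, and a shuffled dictionary holding the c values
df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1))
dictionary <- data.frame(
  a = c(1, 0, 1, 0),
  b = c(1, 1, 0, 0),
  c = c(13L, 11L, 12L, 10L)
)

# Build one composite key per row in each data frame
key_df   <- paste(df$a, df$b)
key_dict <- paste(dictionary$a, dictionary$b)

# match() gives, for each row of df, the position of its key in the dictionary
# (NA when there is no match, mirroring all.x = TRUE)
df$c <- dictionary$c[match(key_df, key_dict)]

print(df)
```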

Super cheaty and only applicable to this example but:
df$c <- 10 + df$b + df$a * 2
otherwise, look at ?merge

Related

Why do base R and dplyr::arrange() give different row names in output [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
Goal: to get new row names when using base R order() function (as is done with dplyr::arrange()). The rownames/index output for the base R call is 3, 1, 2 as seen below whereas the output for arrange() is 1, 2, 3 (seen below). How can I get 1, 2, 3 using base R order()?
Reprex:
library(dplyr)
df <- data.frame(
company = c("A", "B", "C"),
sales = c(100, 200, 50)
)
# base R:
df[order(df$sales),]
# dplyr:
arrange(df, sales)
# Base R output:
## company sales
## 3 C 50
## 1 A 100
## 2 B 200
# dplyr output:
## company sales
## 1 C 50
## 2 A 100
## 3 B 200
If your goal is for the row numbers after using arrange() to match what you get from order(), then do the following (a few extra dplyr and tibble steps).
library(dplyr)
library(tibble)
df %>%
rownames_to_column() %>%
arrange(sales) %>%
column_to_rownames("rowname")
## company sales
## 3 C 50
## 1 A 100
## 2 B 200
If your goal is to get the same row names as those that result from arrange(), you can assign the row names after using order().
df_new <- df[order(df$sales),]
rownames(df_new) <- 1:nrow(df_new)
It may be better practice to create an ID column instead of relying on row names. Numbered IDs usually correspond to the original row order, but of course you can create them after your ordering operation.
df_new <- df[order(df$sales),]
df_new$id <- 1:nrow(df_new)
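As a small base R sketch of the renumbering step above: assigning NULL to the row names resets them to the automatic sequence 1, 2, 3, ..., so the length of the data frame never has to be spelled out. The data is the reprex from the question.

```r
# Reprex data from the question
df <- data.frame(
  company = c("A", "B", "C"),
  sales = c(100, 200, 50)
)

# Sort with base R; row names are carried along as 3, 1, 2
df_new <- df[order(df$sales), ]

# Reset row names to the default 1..n sequence
rownames(df_new) <- NULL

print(df_new)
```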

Splitting one big dataframe into multiple CSV.files [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I have a big data.frame with 104 rows and 12 columns. I would like to split it into 13 data.frames of 8 rows each, all with the same 12 columns.
I am trying to write code robust enough not to care how many rows there are, but simply to make a new data.frame every 8 rows.
Also, is it possible after this point to write code that loops through the 13 data.frames for some calculations?
Here is a way using data.table's split() method:
library(data.table)
# sample data
set.seed(123)
AA <- data.frame(data = rnorm(104))
# number of rows per chunk
chunksize <- 8
# create a row ID per chunk of `chunksize` rows, then split on it
l <- split(setDT(AA)[, rowID := (.I - 1) %/% chunksize][], by = "rowID")
# the names of the list will become the names of the data.tables
names(l) <- paste0("df", names(l))
# write the elements of the list to the global environment, using their names
list2env(l, envir = globalenv())
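For the follow-up part of the question (looping over the pieces, and writing them to CSV as the title suggests), a base R sketch along the same lines might look like the following. It keeps the chunks in a list rather than in the global environment, which tends to make looping simpler. The output directory and the file name pattern `chunk_<id>.csv` are assumptions made for illustration.

```r
# Sample data, as in the answer above
set.seed(123)
AA <- data.frame(data = rnorm(104))

# Number of rows per chunk
chunksize <- 8

# Integer chunk id for each row: 0 for rows 1-8, 1 for rows 9-16, ...
chunk_id <- (seq_len(nrow(AA)) - 1) %/% chunksize

# Split into a named list of data frames (names are "0", "1", ...)
chunks <- split(AA, chunk_id)

# Loop over the chunks for a calculation, here the mean of each chunk
chunk_means <- vapply(chunks, function(chunk) mean(chunk$data), numeric(1))

# Write each chunk to its own CSV file (a temporary directory is used here)
out_dir <- tempdir()
for (nm in names(chunks)) {
  write.csv(chunks[[nm]],
            file.path(out_dir, paste0("chunk_", nm, ".csv")),
            row.names = FALSE)
}
```

If the number of rows is not a multiple of `chunksize`, the last chunk simply ends up shorter.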

select multiple ranges of columns in data.table using column names [duplicate]

This question already has answers here:
Select multiple ranges of columns using column names in data.table
(2 answers)
Closed 4 years ago.
I can select multiple ranges of columns in a data.table using a numeric vector like c(1:5,27:30). Is there any way to do the same with column names? For example, in some form similar to col1:col5,col27:col30?
You can with dplyr:
df <- data.frame(a=1, b=2, c=3, d=4, e=5, f=6, g=7)
dplyr::select(df, a:c, f:g)
#   a b c f g
# 1 1 2 3 6 7
I am not sure whether my answer is efficient, but it could at least give you a workaround if you need to stay within data.table.
My proposal is to use data.table in conjunction with cbind. Thus you could have:
library(data.table)
df <- data.table(a=1, b=2, c=3, d=4, e=5, f=6, g=7)
multColSelectedByName <- cbind(df[, a:c], df[, f:g])
#    a b c f g
# 1: 1 2 3 6 7
One thing to be careful about: if one of the selections contains only a single column, for example df[, f], then the name of the resulting column will be something like V2 rather than f. In such a case one could use:
multColSelectedByName <- cbind(df[, a:c], f = df[, f])
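Another way to approach this, sketched in base R: translate each name range into the numeric positions the question already knows how to use. The helper `name_range()` is hypothetical (it is not part of base R, dplyr, or data.table) and is shown on a plain data.frame.

```r
df <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7)

# Hypothetical helper: positions of all columns from `from` to `to`, inclusive
name_range <- function(data, from, to) {
  pos <- match(c(from, to), names(data))
  stopifnot(!anyNA(pos))  # both column names must exist
  seq(pos[1], pos[2])
}

# Combine several ranges into one numeric index vector
cols <- c(name_range(df, "a", "c"), name_range(df, "f", "g"))

# Select by position, exactly as with c(1:3, 6:7)
selected <- df[, cols]
print(selected)
```

With a data.table, the same `cols` vector could presumably be passed as `df[, cols, with = FALSE]`.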

Sort data in a data frame and rank them [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a data frame like this
Name Value
A. -5
B. 100
F. 0
G. -5
I want to sort the data in an ascending order and add a rank column. So I want something like this:
Name Value Rank
A. -5 1
G. -5 1
F. 0 2
B. 100 3
A base R solution could be:
v1 <- order(df$Value)
data.frame(df[v1, ], rank = as.numeric(factor(df$Value[v1])))
# Name Value rank
#1 A. -5 1
#4 G. -5 1
#3 F. 0 2
#2 B. 100 3
This sorts the data frame with order(), then converts the sorted Value to a factor and then to numeric, so that rows with the same Value get the same rank.
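The same dense ranking can be sketched without going through a factor, by matching each value against the sorted unique values. This is an alternative illustration rather than the answer's own method; the data is the example from the question.

```r
# Example data from the question
df <- data.frame(
  Name = c("A.", "B.", "F.", "G."),
  Value = c(-5, 100, 0, -5)
)

# Sort by Value in ascending order
sorted <- df[order(df$Value), ]

# Dense rank: position of each value among the sorted distinct values,
# so tied values share a rank and no ranks are skipped
sorted$Rank <- match(sorted$Value, sort(unique(sorted$Value)))

print(sorted)
```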
This can be achieved easily with the dplyr package.
#Recreate the data
df <- read.table(text = "Name Value
A. -5
B. 100
F. 0
G. -5", header = TRUE)
library(dplyr)
df %>% arrange(Value) %>% mutate(Rank = dense_rank(Value))
The dplyr pipeline reads: take the data frame df, then arrange it by Value, then add a new column Rank equal to the dense rank of Value.

Sorting Elements of a List by Characteristics of Those Elements [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Suppose I have a list of tables. One of the columns of each table is named <NA> and contains a count of NA values. For example, object 'e' in the example below shows a frequency of 2 in its last column:
library(dplyr) # provides the %>% pipe used below
a <- seq(1:5)
b <- c(NA,1,2,4,NA)
c <- a %>% data.frame(.,b)
d <- table(c[1], useNA = "always")
e <- table(c[2], useNA = "always")
f <- list(d, e)
Is there a way to order the elements in the list based on the number of NAs in the NA column found in each table in the list? For example, in list f above, the NA column in element [1] (i.e. table d) indicates 0 NAs and the NA column in element [2] (i.e. table e) indicates 2 NAs. Since 2 > 0 and I would like to sort the elements of the list from most NAs to least NAs, the list would be reordered to list(e,d).
Based on the description, we can loop over the elements of the list ('f'), extract from each the count whose name is NA, order those counts decreasingly, and use that index to reorder 'f'.
f1 <- f[order(-sapply(f, function(x) x[is.na(names(x))]))]
identical(f1, list(e,d))
#[1] TRUE
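The one-liner above can be unpacked into named steps, as in the following base R sketch. The helper `na_count()` is a hypothetical name introduced here, and the tables are built directly from the vectors rather than through the data frame in the question.

```r
# Two vectors, one without and one with missing values
a <- 1:5
b <- c(NA, 1, 2, 4, NA)

# Frequency tables that always include an NA category
d <- table(a, useNA = "always")
e <- table(b, useNA = "always")
f <- list(d, e)

# Hypothetical helper: the count stored under the NA name of a 1-d table
na_count <- function(tab) as.integer(tab[is.na(names(tab))])

# One NA count per table in the list
counts <- vapply(f, na_count, integer(1))

# Reorder the list from most NAs to fewest
f_sorted <- f[order(counts, decreasing = TRUE)]
```

Keeping `counts` as a separate vector also makes it easy to inspect the NA totals before reordering.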
