I have an example data frame as such:
df_1 <- as.data.frame(cbind(c(14, 27, 38), c(25, 33, 52), c(85, 12, 23)))
Now I want to split all these columns down the middle so that I get something that looks like this:
df_2 <- as.data.frame(cbind(c(1, 2, 3), c(4, 7, 8), c(2,3,5), c(5, 3, 2), c(8, 1, 2), c(5, 2, 3)))
So my question then is: Is there a command/package that can do this automatically?
In my real data frame I am looking to split columns by name. The names come from an earlier regression, where I built them by inserting:
paste0(names(df)[i], "~", names(df)[j]) into my loop.
My thought, however, is that this will be quite easy once I find the right command for the data frames given above.
Thanks in advance!
You can use strsplit in base R:
as.data.frame(t(apply(df_1, 1, \(x) as.numeric(unlist(strsplit(as.character(x), ""))))))
V1 V2 V3 V4 V5 V6
1 1 4 2 5 8 5
2 2 7 3 3 1 2
3 3 8 5 2 2 3
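To see what the one-liner does to each row, here is a minimal sketch of the same steps on a single vector. The vector x is just the first row of df_1 written out by hand, and the approach assumes non-negative whole numbers (a minus sign or decimal point would become its own "character" and turn into NA).

```r
# First row of df_1, written out by hand for illustration
x <- c(14, 25, 85)

# Splitting on the empty string yields one element per character
chars <- strsplit(as.character(x), "")

# Flatten the list and convert the single characters back to numbers
digits <- as.numeric(unlist(chars))
digits
```

apply(df_1, 1, ...) runs this once per row, and t() turns the resulting columns back into rows.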
Another possible solution:
library(tidyverse)
map(df_1, ~ str_split(.x, "", simplify = T)) %>% as.data.frame %>%
`names<-`(str_c("V", 1:ncol(.))) %>% type.convert(as.is = T)
#> V1 V2 V3 V4 V5 V6
#> 1 1 4 2 5 8 5
#> 2 2 7 3 3 1 2
#> 3 3 8 5 2 2 3
Thanks for the answers, they were a lot of help!
I ended up using the tidyr package with command:
test <- as.data.frame(separate(data = test, col = "V1", into = c("col_1", "col_2"), sep = "\\~"))
This worked great for me since I ran a regression earlier and had a good operator for separation: "~"
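For comparison, the same split on "~" can be sketched in base R with strsplit. The contents of test below are made up for illustration (the real column holds names built with paste0(..., "~", ...)), and the sketch assumes every entry contains exactly one "~".

```r
# Hypothetical stand-in for the real data: one column of "lhs~rhs" names
test <- data.frame(V1 = c("mpg~cyl", "mpg~disp", "hp~wt"),
                   stringsAsFactors = FALSE)

# fixed = TRUE treats "~" literally, so no regex escaping is needed
parts <- do.call(rbind, strsplit(test$V1, "~", fixed = TRUE))

result <- data.frame(col_1 = parts[, 1],
                     col_2 = parts[, 2],
                     stringsAsFactors = FALSE)
result
```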
A base R option would be to use read.fwf:
v1 <- do.call(paste0, df_1)
read.fwf(textConnection(v1), widths = rep(1, max(nchar(v1))))
-output
V1 V2 V3 V4 V5 V6
1 1 4 2 5 8 5
2 2 7 3 3 1 2
3 3 8 5 2 2 3
Another option is to use the splitstackshape package:
library(magrittr)   # provides %>%
library(data.table) # provides setnames()
df_2 <- df_1 %>%
splitstackshape::cSplit(., names(.), sep = "", stripWhite = FALSE, type.convert = FALSE) %>%
setnames(paste0("V", 1:ncol(.)))
Output
df_2
V1 V2 V3 V4 V5 V6
1: 1 4 2 5 8 5
2: 2 7 3 3 1 2
3: 3 8 5 2 2 3
Related
I would like to remove part of a string from a V2 column in a df.
df
V1 V2
3 scale_KD_1
10 scale_KD_5
4 scale_KD_10
7 scale_KD_7
The desired outcome would be:
df
V1 V2
3 1
10 5
4 10
7 7
Using the readr and stringr packages (mutate and %>% come from dplyr):
library(dplyr)
library(readr)
df %>% mutate(V2 = parse_number(V2))
V1 V2
1 3 1
2 10 5
3 4 10
4 7 7
library(stringr)
df %>% mutate(V2 = str_remove(V2, '.*_'))
V1 V2
1 3 1
2 10 5
3 4 10
4 7 7
There are many ways to accomplish this; benchmark them on your own data if speed matters. Besides the ones mentioned by @Karthik S, you can try these:
library(dplyr)
library(stringr)
df %>%
mutate(V2 = str_extract(V2, '\\d+$'))
df %>%
mutate(V2 = str_remove(V2, '\\D+'))
V1 V2
1 3 1
2 10 5
3 4 10
4 7 7
You can use sub to remove everything until _:
df$V2 <- sub(".*_", "", df$V2)
#df$V2 <- sub("\\D*", "", df$V2) #Some Alternatives
#df$V2 <- sub("[^[:digit:]]*", "", df$V2)
df
# V1 V2
#1 3 1
#2 10 5
#3 4 10
#4 7 7
Data:
df <- read.table(header=T, text=" V1 V2
3 scale_KD_1
10 scale_KD_5
4 scale_KD_10
7 scale_KD_7")
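The patterns above all return character strings. As a small sketch on a plain vector (x below simply copies the V2 values from the question), this checks that three base R variants agree and then converts the result to integers, which the desired output suggests but the question does not state explicitly.

```r
# The V2 values from the question, as a plain character vector
x <- c("scale_KD_1", "scale_KD_5", "scale_KD_10", "scale_KD_7")

# Greedy ".*_" consumes everything up to and including the LAST underscore
after_underscore <- sub(".*_", "", x)

# "\\D*" strips the leading run of non-digit characters
strip_non_digits <- sub("\\D*", "", x)

# Extract the trailing run of digits instead of deleting the prefix
trailing_digits <- regmatches(x, regexpr("[0-9]+$", x))

# All three are still character; convert if numbers are wanted
as_int <- as.integer(after_underscore)
as_int
```

The variants only agree because every value here has a single trailing number. On a string such as "scale_2_KD_10", the "\\D*" version would keep "2_KD_10", while the other two would return "10".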
I have two dataframes:
df1 <- data.frame( v1 = c(1,2,3,4),
v2 = c(2, 10, 5, 11),
v3=c(20, 25, 23, 2))
> df1
v1 v2 v3
1 1 2 20
2 2 10 35
3 3 5 23
4 4 11 2
df2 <- data.frame(v1 = 4, v2 = 10, v3 = 30)
> df2
v1 v2 v3
1 4 10 30
I want to add a new column to df1 that says "Fail" when any value in a row of df1 is larger than the corresponding value in df2, and "Pass" when all of them are smaller, so that the intended result would be:
> df3
v1 v2 v3 check
1 1 2 20 Pass
2 2 10 35 Fail
3 3 5 23 Pass
4 4 11 2 Fail
You can repeat the single row of df2 so that both data frames have the same size, and then compare them directly:
ifelse(rowSums(df1 >= df2[rep(1,length.out = nrow(df1)), ]) == 0, 'Pass', 'Fail')
#[1] "Pass" "Fail" "Pass" "Fail"
Or using Map:
ifelse(Reduce(`|`, Map(`>=`, df1, df2)), 'Fail', 'Pass')
#Other similar alternatives :
#c('Pass', 'Fail')[Reduce(`|`, Map(`>=`, df1[-1], df2[-1])) + 1]
#c('Fail', 'Pass')[(rowSums(mapply(`>=`, df1, df2)) == 0) + 1]
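Another way to express the same comparison, shown here only as a sketch, is sweep, which applies the df2 thresholds column by column. It uses df1 and df2 exactly as defined in the question's code and keeps the >= comparison from the answers above. The result is stored in a new check column, as in the question's intended output.

```r
df1 <- data.frame(v1 = c(1, 2, 3, 4),
                  v2 = c(2, 10, 5, 11),
                  v3 = c(20, 25, 23, 2))
df2 <- data.frame(v1 = 4, v2 = 10, v3 = 30)

# Compare every column of df1 against the matching threshold in df2
exceeds <- sweep(as.matrix(df1), 2, unlist(df2), FUN = ">=")

# A row fails as soon as one of its values reaches its threshold
df3 <- df1
df3$check <- ifelse(rowSums(exceeds) > 0, "Fail", "Pass")
df3
```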
In the tidyverse, we can make use of c_across:
library(dplyr) # >= 1.0.0
df1 %>%
rowwise %>%
mutate(check = c('Pass', 'Fail')[1 + any(c_across(everything()) >= df2)])
# A tibble: 4 x 4
# Rowwise:
# v1 v2 v3 check
# <dbl> <dbl> <dbl> <chr>
#1 1 2 20 Pass
#2 2 10 25 Fail
#3 3 5 23 Pass
#4 4 11 2 Fail
I know that my problem is trivial, but I'm currently learning different ways to reshape data, so please be understanding.
I have data like this:
Input = (
'col1 col2
A 2
B 4
A 7
B 3
A 4
B 2
A 4
B 6
A 3
B 3')
df = read.table(textConnection(Input), header = T)
> df
col1 col2
1 A 2
2 B 4
3 A 7
4 B 3
5 A 4
6 B 2
7 A 4
8 B 6
9 A 3
10 B 3
And I'd like to have something like this, where the column names are not important:
col1 v1 v2 v3 v4 v5
1 A 2 7 4 4 3
2 B 4 3 2 6 3
So far, I did something like:
res_1 <- aggregate(col2 ~., df, toString)
col1 col2
1 A 2, 7, 4, 4, 3
2 B 4, 3, 2, 6, 3
And it actually works; however, I end up with one column whose values are comma-separated instead of being in new columns, so I decided to fix it up:
res_2 <- do.call("rbind", strsplit(res_1$col2, ","))
[,1] [,2] [,3] [,4] [,5]
[1,] "2" " 7" " 4" " 4" " 3"
[2,] "4" " 3" " 2" " 6" " 3"
And finally combine it and remove the unnecessary column:
final <- cbind(res_1,res_2)
final$col2 <- NULL
col1 1 2 3 4 5
1 A 2 7 4 4 3
2 B 4 3 2 6 3
So I have my desired output, but I'm not satisfied with the method; I'm sure there's one easy and short command for this. As I said, I'd like to learn new, more elegant options using different packages.
Thanks!
You can simply do,
do.call(rbind, split(df$col2, df$col1))
# [,1] [,2] [,3] [,4] [,5]
#A 2 7 4 4 3
#B 4 3 2 6 3
You can wrap it in data.frame() to convert the matrix to a data frame.
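A minimal sketch of that conversion, which also moves the group labels from the row names into a col1 column. The v1..v5 names are only chosen to match the question's desired output, and the sketch assumes every group has the same number of rows (otherwise rbind would recycle values).

```r
# Same data as in the question, built directly
df <- data.frame(col1 = rep(c("A", "B"), 5),
                 col2 = c(2, 4, 7, 3, 4, 2, 4, 6, 3, 3))

# One matrix row per group, values in their original order
m <- do.call(rbind, split(df$col2, df$col1))

# Row names become a proper column; remaining columns are renamed v1..v5
wide <- data.frame(col1 = rownames(m), m, row.names = NULL)
names(wide)[-1] <- paste0("v", seq_len(ncol(m)))
wide
```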
The question is tagged with reshape2 and reshape, so we show the use of that package and of the base reshape function. The use of dplyr/tidyr is also illustrated. Finally, we show a data.table solution and a second base R solution using xtabs.
1) reshape2 Add a group column and then convert from long to wide form:
library(reshape2)
df2 <- transform(df, group = paste0("v", ave(1:nrow(df), col1, FUN = seq_along)))
dcast(df2, col1 ~ group, value.var = "col2")
giving:
col1 v1 v2 v3 v4 v5
1 A 2 7 4 4 3
2 B 4 3 2 6 3
2) reshape Using df2 from (1) we have the following base R solution using the reshape function:
wide <- reshape(df2, dir = "wide", idvar = "col1", timevar = "group")
names(wide) <- sub(".*\\.", "", names(wide))
wide
giving:
col1 v1 v2 v3 v4 v5
1 A 2 7 4 4 3
2 B 4 3 2 6 3
3) dplyr/tidyr
library(dplyr)
library(tidyr)
df %>%
group_by(col1) %>%
mutate(group = paste0("v", row_number())) %>%
ungroup %>%
pivot_wider(names_from = "group", values_from = "col2")
giving:
# A tibble: 2 x 6
col1 v1 v2 v3 v4 v5
<fct> <int> <int> <int> <int> <int>
1 A 2 7 4 4 3
2 B 4 3 2 6 3
4) data.table
library(data.table)
as.data.table(df)[, as.list(col2), by = col1]
giving:
col1 V1 V2 V3 V4 V5
1: A 2 7 4 4 3
2: B 4 3 2 6 3
5) xtabs Another base R solution uses df2 from (1) and xtabs. This produces an object of class c("xtabs", "table"). Note that it labels the dimensions.
xtabs(col2 ~., df2)
giving:
group
col1 v1 v2 v3 v4 v5
A 2 7 4 4 3
B 4 3 2 6 3
I have a data frame like so:
df <- data.frame(
id = c(1, 1, 2, 2),
V1 = c(1:4),
V2 = c(5:8),
V3 = c(9:12))
Printed to the console it looks like this:
# id V1 V2 V3
# 1 1 1 5 9
# 2 1 2 6 10
# 3 2 3 7 11
# 4 2 4 8 12
Now, I would like to transform it to this shape:
# id V1 V2 V3 V4 V5 V6
# 1 1 1 5 9 2 6 10
# 2 2 3 7 11 4 8 12
How can I do this with base R or the tidyverse?
A possible tidyverse solution:
library(dplyr)
library(tidyr)
wide <- df %>%
group_by(id) %>%
mutate(obs = row_number()) %>%
gather(var, val, V1:V3) %>%
unite(comb, obs, var) %>%
spread(comb, val)
colnames(wide)[-1] <- paste("V", seq(1,ncol(wide) -1), sep = "")
# A tibble: 2 x 7
# Groups: id [2]
# id V1 V2 V3 V4 V5 V6
#1 1 1 5 9 2 6 10
#2 2 3 7 11 4 8 12
You could do it, for example, using by.
df2 <- do.call(rbind,
by(df, df$id, function(x) c(x[1, "id"], as.vector(t(x[names(x) != "id"]))))
)
colnames(df2) <- c("id", paste0("V", seq(ncol(df2)-1)))
id V1 V2 V3 V4 V5 V6
1 1 1 5 9 2 6 10
2 2 3 7 11 4 8 12
Base R:
lists <- Map(function(x) data.frame(c(x[1,], x[2,-1])), split(df, df$id))
df2 <- do.call(rbind, lists)
To change the column names:
colnames(df2) <- c("id", paste0("V", seq_along(df2[-1])))
And the result:
# > df2
# id V1 V2 V3 V4 V5 V6
# 1 1 1 5 9 2 6 10
# 2 2 3 7 11 4 8 12
I was told that passing equations as strings and evaluating them is bad practice. How can I still create a function that takes an equation and evaluates it, without the string version and without using third-party packages?
This is my function:
replaceFormula <- function(df, column, formula){
df[column] <- eval(parse(text=formula), df)
return(df)
}
This is my use case:
set.seed(24)
dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
df <- as.data.frame(dataset)
replaceFormula(df, 'V5', 'V3+V4')
Update:
Is this also possible with conditions?
My example function:
replaceFactor <- function(df, column, condition, what){
df[column] <- sapply(df[column],function(x) ifelse(eval(parse(text=condition), df), what, x))
return(df)
}
My use case:
set.seed(24)
dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
df <- as.data.frame(dataset)
replaceFactor(df, 'V5', 'V1==1', 'GOOD')
It looks like you've crafted yourself a kludgey version of transform
> set.seed(24)
> dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
> df <- as.data.frame(dataset)
> transform(df, V5 = V3 + V4)
V1 V2 V3 V4 V5
1 1 5 3 5 8
2 1 1 2 1 3
3 4 4 4 NA NA
4 3 4 4 3 7
5 3 1 1 NA NA
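If a thin wrapper function is still wanted, one base R possibility is to capture the unevaluated expression with substitute and evaluate it inside the data frame. This is only a sketch: the names replace_formula and replace_where are made up for illustration, the small df below is not the question's random data, and the capture only works when the expression is typed directly in the call (not when it is forwarded from another function).

```r
# Hypothetical helper: assign the result of an unquoted expression to a column
replace_formula <- function(df, column, expr) {
  expr <- substitute(expr)                     # capture e.g. V3 + V4 unevaluated
  df[[column]] <- eval(expr, df, parent.frame())
  df
}

# Hypothetical helper: overwrite a column where an unquoted condition is TRUE
replace_where <- function(df, column, condition, what) {
  hit <- eval(substitute(condition), df, parent.frame())
  df[[column]][which(hit)] <- what             # which() skips NA conditions
  df
}

# Small illustrative data frame (not the sampled data from the question)
df <- data.frame(V1 = c(1, 1, 4), V3 = c(3, 2, 4),
                 V4 = c(5, 1, NA), V5 = c(4, 3, 1))

sum_result  <- replace_formula(df, "V5", V3 + V4)
cond_result <- replace_where(df, "V5", V1 == 1, "GOOD")
```

Note that writing a string such as "GOOD" into a numeric column converts the whole column to character.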
We can pass the formula as a quosure and evaluate it by unquoting (!! or UQ) in the devel version of dplyr (or the soon-to-be-released 0.6.0):
library(dplyr)
replaceFormula <- function(dat, Col, form){
Col <- quo_name(enquo(Col))
dat %>%
mutate(UQ(Col) := UQ(form))
}
replaceFormula(df, V5, quo(V3 + V4))
# V1 V2 V3 V4 V5
#1 1 5 3 5 8
#2 1 1 2 1 3
#3 4 4 4 NA NA
#4 3 4 4 3 7
#5 3 1 1 NA NA
Update
Based on the OP's comments, we can also pass an expression to evaluate and change the values based on it:
replaceFormulaNew <- function(dat, Col, form, what){
Col <- enquo(Col)
ColN <- quo_name(Col)
what <- quo_name(enquo(what))
dat %>%
mutate(UQ(ColN) := ifelse(UQ(form), what, UQ(Col)))
}
replaceFormulaNew(df, V5, quo(V1==1), GOOD)
# V1 V2 V3 V4 V5
#1 1 5 3 5 GOOD
#2 1 1 2 1 GOOD
#3 4 4 4 NA 4
#4 3 4 4 3 <NA>
#5 3 1 1 NA 1
replaceFormulaNew(df, V5, quo(V3 < V4), GOOD)
# V1 V2 V3 V4 V5
#1 1 5 3 5 GOOD
#2 1 1 2 1 3
#3 4 4 4 NA <NA>
#4 3 4 4 3 <NA>
#5 3 1 1 NA <NA>
enquo takes the input argument and converts it to a quosure, while quo_name converts it to a string; mutate then evaluates the expression and assigns the result to the column specified in the input.