I've got a dataset with the following structure:
df <- data.frame(mult=c(1,2,3,4),red=c(1,0.9,0.8,0.7),
result=c('value1','value2','value3','value4'))
that I'd like to display in a 3-D plot (x axis: mult, y axis: red, and the x-y points would be 'result') or multiple 2-D plots. Obviously the real DF has a lot more rows and combinations of mult&red.
The columns mult and red contain no repeated values. What I'd like is to reshape DF to DF1:
-   1       0.9     0.8     0.7
1   value1
2           value2
3                   value3
4   .....
so essentially:
1) [mult] values stay as they are (column 1).
2) [red] values become the column names.
3) Each cross between 'mult' and 'red' is a value in the new DF.
My preference would be to do this with the reshape function, but other packages are fine too.
Thanks in advance, p.
Try
library(reshape2)
df1 <- transform(df, result=as.character(result),
red= factor(red, levels= unique(red)))
dcast(df1, mult~red, value.var='result', fill='')[-1]
#       1    0.9    0.8    0.7
#1 value1
#2        value2
#3               value3
#4                      value4
Here is a way using tidyr
library(tidyr)
out = rev(spread(df[-1], red, result))
out[is.na(out)] = ''
#> out
#       1    0.9    0.8    0.7
#1 value1
#2        value2
#3               value3
#4                      value4
Using reshape as you requested
df <- data.frame(mult=c(1,2,3,4),red=c(1,0.9,0.8,0.7),
result=c('value1','value2','value3','value4'))
df$result = as.character(df$result)
dfWide = reshape(data = df, idvar = "mult", timevar = "red", v.names = "result", direction = "wide")
rownames(dfWide) = dfWide$mult
dfWide$mult = NULL
colnames(dfWide) = sub("^result\\.", "", colnames(dfWide)) # strip the literal "result." prefix
dfWide[is.na(dfWide)] = ''
dfWide
#        1    0.9    0.8    0.7
# 1 value1
# 2        value2
# 3               value3
# 4                      value4
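For comparison, here is a minimal base R sketch of the same reshaping that skips reshape() and fills a character matrix directly. It assumes, as the question states, that mult and red contain no repeated values; the name wide is only illustrative and not part of the original answers.

```r
# Sample data from the question
df <- data.frame(mult = c(1, 2, 3, 4), red = c(1, 0.9, 0.8, 0.7),
                 result = c("value1", "value2", "value3", "value4"),
                 stringsAsFactors = FALSE)

# Row and column labels, in order of first appearance
row_labels <- as.character(unique(df$mult))
col_labels <- as.character(unique(df$red))

# Empty character matrix: one row per mult value, one column per red value
wide <- matrix("", nrow = length(row_labels), ncol = length(col_labels),
               dimnames = list(row_labels, col_labels))

# A two-column character matrix indexes cells by (row name, column name)
wide[cbind(as.character(df$mult), as.character(df$red))] <- df$result

wide
```

Cells with no (mult, red) combination simply keep the empty string, so no NA replacement step is needed.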
I have this table (inputdf):
Number  Value
1       0.2
1       0.3
1       0.4
2       0.2
2       0.7
3       0.1
and I want to obtain this (outputdf):
Number1  Number2  Number3
0.2      0.2      0.1
0.3      0.7      NA
0.4      NA       NA
I tried iterating with a for loop over the numbers in column 1 and subsetting the data frame by each number, but I have trouble appending the result to an output data frame:
inputdf <- read.table("input.txt", sep="\t", header = TRUE)
outputdf <- data.frame()
i=1
total=3 ###user has to modify it
for(i in seq(1:total)) {
cat("Collecting values for number", i, "\n")
values <- subset(input, Number == i, select=c(Value))
cbind(outputdf, NewColumn= values, )
names(outputdf)[names(outputdf) == "NewColumn"] <- paste0("Number", i)
}
Any help or hint would be very welcome. Thanks in advance!
In the tidyverse, you can create an id for each element of the groups and then use tidyr::pivot_wider:
library(tidyverse)
dat %>%
group_by(Number) %>%
mutate(id = row_number()) %>%
pivot_wider(names_from = Number, names_prefix = "Number", values_from = "Value")
# A tibble: 3 × 4
     id Number1 Number2 Number3
  <int>   <dbl>   <dbl>   <dbl>
1     1     0.2     0.2     0.1
2     2     0.3     0.7    NA
3     3     0.4    NA      NA
In base R, the same idea applies: create the id column and then reshape to wide:
transform(dat, id = with(dat, ave(rep(1, nrow(dat)), Number, FUN = seq_along))) |>
reshape(direction = "wide", timevar = "Number")
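The loop idea from the question can also be sketched without an explicit id column: split the values by Number, pad each piece with NA to a common length, and bind the pieces as columns. This is only a sketch on the six sample rows from the question, typed in by hand here rather than read from input.txt.

```r
# Sample rows from the question (typed in instead of reading input.txt)
inputdf <- data.frame(Number = c(1, 1, 1, 2, 2, 3),
                      Value = c(0.2, 0.3, 0.4, 0.2, 0.7, 0.1))

# One vector of values per distinct Number
cols <- split(inputdf$Value, inputdf$Number)

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(cols))
padded <- lapply(cols, function(v) {
  length(v) <- n
  v
})

# Bind the padded vectors as columns and name them Number1, Number2, ...
outputdf <- as.data.frame(padded)
names(outputdf) <- paste0("Number", names(cols))
outputdf
```

Because the number of groups is taken from the data, there is no total for the user to adjust by hand.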
I have a big data set with 32 variables, and I need to work with relative values of these variables using all possible pairwise subtractions among them, e.g. var1-var2...var1-var32; var3-var4...var3-var32, and so on. I'm new to R and would like to avoid doing all of this by hand, but I'm out of ideas. Any help appreciated! Thanks!
Ex:
df_original
id  Var1  Var2  Var3
x   1     3     2
y   2     5     7
df_wanted
id  Var1  Var2  Var3  Var1-Var2  Var1-Var3  Var2-Var3
x   1     3     2     -2         -1         1
y   2     5     7     -3         -5         -2
You can do this with combn, which creates every combination of columns taken 2 at a time. combn also accepts a function that is applied to each combination; here it subtracts the second column of the pair from the first, and the results are added to the data frame as new columns.
cols <- grep('Var', names(df), value = TRUE)
new_df <- cbind(df, do.call(cbind, combn(cols, 2, function(x) {
setNames(data.frame(df[x[1]] - df[x[2]]), paste0(x, collapse = '-'))
}, simplify = FALSE)))
new_df
# id Var1 Var2 Var3 Var1-Var2 Var1-Var3 Var2-Var3
#1 x 1 3 2 -2 -1 1
#2 y 2 5 7 -3 -5 -2
data
df <- structure(list(id = c("x", "y"), Var1 = 1:2, Var2 = c(3L, 5L),
Var3 = c(2L, 7L)), class = "data.frame", row.names = c(NA, -2L))
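As a variation on the same combn idea, the differences can be computed in one matrix subtraction instead of one small data frame per pair. This is a sketch under the assumption that all Var columns are numeric; the names pairs, m, and diffs are illustrative only.

```r
# Sample data from the answer above
df <- data.frame(id = c("x", "y"), Var1 = 1:2, Var2 = c(3L, 5L),
                 Var3 = c(2L, 7L), stringsAsFactors = FALSE)

cols <- grep("^Var", names(df), value = TRUE)

# 2-row matrix: each column holds one pair of column names
pairs <- combn(cols, 2)

# Subtract the second member of each pair from the first, all pairs at once
m <- as.matrix(df[cols])
diffs <- m[, pairs[1, ], drop = FALSE] - m[, pairs[2, ], drop = FALSE]
colnames(diffs) <- paste(pairs[1, ], pairs[2, ], sep = "-")

# cbind() on a data frame keeps the non-syntactic names such as "Var1-Var2"
new_df <- cbind(df, diffs)
new_df
```

With 32 variables this yields choose(32, 2) = 496 difference columns.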
I have a very large dataframe (around 100 rows, 200 columns). A subset of my data looks like this:
example <- data.frame("Station" = c("012", "013", "014"), "Value1" = c(145.23453, 1.022342, 0.4432),
"Value2" = c(2.1221213, 4445.2231412, 0.3333421), "Name" = c("ABC", "SDS", "EFG"))
I would like to round all numeric variables in my table with these conditions.
if x<1, then 1 sig fig
if 1<= x < 99, then 2 sig figs
if x>= 100, then 3 sig figs
I know to do something like this for a specific column:
example$Value1 <- ifelse(example$Value1 < 1, signif(example$Value1, 1), example$Value1)
but I'm not sure what to do for a large dataframe with a mix of numeric and character values.
Just put the ifelse into an lapply. To identify the numeric columns, use is.numeric in an sapply. You could also Vectorize a small replacement function (FUN below) with all your desired conditions and use it in the lapply, which might be convenient. However, note @GKi's comment that your conditions are not complete.
nums <- sapply(example, is.numeric)
FUN <- Vectorize(function(x) {
if (x < 1) signif(x, 1)
else if (x < 99) signif(x, 2)
else if (x >= 100) signif(x, 3)
else x # values in [99, 100) match none of the stated conditions
})
example[nums] <- lapply(example[nums], FUN)
# Station Value1 Value2 Name
# 1 012 145.0 2.1 ABC
# 2 013 1.0 4450.0 SDS
# 3 014 0.4 0.3 EFG
CODE
library(tidyverse)
example %>%
pivot_longer(contains("Value")) %>%
mutate(
signf = case_when(
value < 1 ~ 1,
value >= 1 & value < 99 ~ 2,
TRUE ~ 3
),
value = map2_dbl(value, signf, ~signif(.x, .y))
) %>%
select(-signf) %>%
pivot_wider(names_from = "name", values_from = "value")
OUTPUT
# A tibble: 3 x 4
Station Name Value1 Value2
<fct> <fct> <dbl> <dbl>
1 012 ABC 145 2.1
2 013 SDS 1 4450
3 014 EFG 0.4 0.3
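A shorter tidyverse route, sketched here as an alternative rather than as the answer's own method, is to round the numeric columns in place with across() instead of pivoting to long form and back. It assumes dplyr 1.0 or later; sig_digits is a hypothetical helper name, and values in [99, 100) fall through to 3 significant figures, as in the case_when above.

```r
library(dplyr)

example <- data.frame(Station = c("012", "013", "014"),
                      Value1 = c(145.23453, 1.022342, 0.4432),
                      Value2 = c(2.1221213, 4445.2231412, 0.3333421),
                      Name = c("ABC", "SDS", "EFG"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: significant figures per value, using the question's cutoffs
sig_digits <- function(x) case_when(x < 1 ~ 1, x < 99 ~ 2, TRUE ~ 3)

# signif() is vectorised over both arguments, so each value gets its own digits
rounded <- example %>%
  mutate(across(where(is.numeric), ~ signif(.x, sig_digits(.x))))
rounded
```

Character columns such as Station and Name are left untouched because where(is.numeric) skips them.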
I'll give the answer using data.table instead of data.frame because it's better and I don't remember data.frame syntax that well anymore.
library(data.table)
example = data.table(
Station = c("012", "013", "014"),
Value1 = c(145.23453, 1.022342, 0.4432),
Value2 = c(2.1221213, 4445.2231412, 0.3333421),
Name = c("ABC", "SDS", "EFG"))
numeric_colnames = names(example)[sapply(example,is.numeric)]
for(x in numeric_colnames){
example[,(x):=ifelse(
get(x)<1,
signif(get(x),1),
ifelse(
get(x)<99,
signif(get(x),2),
signif(get(x),3)
))]
}
Result:
Station Value1 Value2 Name
1: 012 145.0 2.1 ABC
2: 013 1.0 4450.0 SDS
3: 014 0.4 0.3 EFG
PS: Don't worry about the 145.0 and 4450.0; that's a display issue, not a data issue:
> example[,as.character(Value1)]
[1] "145" "1" "0.4"
> example[,as.character(Value2)]
[1] "2.1" "4450" "0.3"
PPS: the 99 cutoff produces some strange results, e.g.,
> signif(98.9,2)
[1] 99
> signif(99.1,3)
[1] 99.1
Why not use a cutoff of 100 instead?
> signif(99.4,2)
[1] 99
> signif(99.5,2)
[1] 100
> signif(100.1,3)
[1] 100
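To illustrate the suggestion of a cutoff at 100, here is a small base R sketch on a plain data.frame (not the data.table used above). The function name round_by_magnitude is made up for this example, and the thresholds are the question's with 99 replaced by 100.

```r
# Hypothetical helper: 1 sig fig below 1, 2 below 100, 3 from 100 upwards
round_by_magnitude <- function(x) {
  digits <- ifelse(x < 1, 1, ifelse(x < 100, 2, 3))
  signif(x, digits)
}

example <- data.frame(Station = c("012", "013", "014"),
                      Value1 = c(145.23453, 1.022342, 0.4432),
                      Value2 = c(2.1221213, 4445.2231412, 0.3333421),
                      Name = c("ABC", "SDS", "EFG"),
                      stringsAsFactors = FALSE)

# Apply the helper to the numeric columns only
nums <- vapply(example, is.numeric, logical(1))
example[nums] <- lapply(example[nums], round_by_magnitude)
example
```

With this cutoff every non-negative value falls into exactly one of the three branches, so there is no unhandled range between 99 and 100.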
Use apply and nested ifelse:
If you do not know in advance which columns are numeric and you want to keep the original dataframe:
example[sapply(example, is.numeric)] <- apply(example[sapply(example, is.numeric)], 2,
function(x) ifelse(x < 1, signif(x, 1),
ifelse(x >= 1 & x < 99 , signif(x, 2), signif(x, 3))))
example
Station Value1 Value2 Name
1 012 145.0 2.1 ABC
2 013 1.0 4450.0 SDS
3 014 0.4 0.3 EFG
You can use findInterval to set signif:
i <- sapply(example, is.numeric)
x <- unlist(example[,i])
example[,i] <- signif(x, findInterval(x, c(1, 99))+1)
example
# Station Value1 Value2 Name
#1 012 145.0 2.1 ABC
#2 013 1.0 4450.0 SDS
#3 014 0.4 0.3 EFG
findInterval result for the example given in the comment by @webb (thanks!):
findInterval(c(145.23453, 1.022342, 0.4432, 2.1221213, 4445.2231412
, 0.3333421), c(1, 99))
#[1] 2 1 0 1 2 0
I have this dataframe and I want to cross-compare all the values inside this data frame.
dat <- tibble::tibble(
name = c("a","b","c"),
value = c(1,2,3))
I want to compare all the row pairs inside this dataframe and in this case I want to divide the smaller number by the bigger number.
The final dataframe should look like this:
a,b,0.5
a,c,0.33
b,c,0.66
Is there a method to achieve this?
Using the data.table package, we can join dat with itself on the condition that one value is less than the other, and compute the ratio with the columns of the joined table.
library(data.table)
setDT(dat)
out <-
dat[dat, on = .(value < value),
.(name1 = x.name,
name2 = i.name,
ratio = x.value/i.value)]
out <- out[!is.na(ratio)]
out
# name1 name2 ratio
# 1: a b 0.5000000
# 2: a c 0.3333333
# 3: b c 0.6666667
One option would be
v1 <- setNames(dat$value, dat$name)
do.call(rbind, combn(v1, 2, FUN = function(x)
setNames(data.frame(as.list(names(x)), round(Reduce(`/`, x[order(x)]), 2)),
c("col1", "col2", "val")), simplify = FALSE))
# col1 col2 val
#1 a b 0.50
#2 a c 0.33
#3 b c 0.67
Or an option with fuzzyjoin (inspired by @IceCreamToucan's post)
library(fuzzyjoin)
library(dplyr)
fuzzy_inner_join(dat, dat, by = "name", match_fun = list(`<`)) %>%
transmute(col1 = name.x, col2 = name.y, val = value.x/value.y)
# A tibble: 3 x 3
# col1 col2 val
# <chr> <chr> <dbl>
#1 a b 0.5
#2 a c 0.333
#3 b c 0.667
We can use tidyverse:
library(tidyverse)
dat %>% expand(name, name) %>% cbind(expand(dat, value,value)) %>%
filter(value1>value) %>%
mutate(ratio=value/value1)
#> name name1 value value1 ratio
#> 1 a b 1 2 0.5000000
#> 2 a c 1 3 0.3333333
#> 3 b c 2 3 0.6666667
Or just a doodle in base R:
df <- cbind(expand.grid(dat$name,dat$name), expand.grid(dat$value, dat$value))
df <- df[order(df[,3], -df[,4]),]
df <- df[df[,3] < df[,4],]
df$ratio <- df[,3] / df[,4]
df[,-c(3,4)] -> df
df
#> Var1 Var2 ratio
#> 7 a c 0.3333333
#> 4 a b 0.5000000
#> 8 b c 0.6666667
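Another base R sketch, offered as an alternative to the answers above, builds the pairs from row indices with combn and uses pmin/pmax so that the smaller value is always divided by the larger one, whatever the row order. The column names name1, name2, and ratio are illustrative.

```r
dat <- data.frame(name = c("a", "b", "c"), value = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# Each column of idx is one pair of row numbers: (1,2), (1,3), (2,3)
idx <- combn(nrow(dat), 2)

v1 <- dat$value[idx[1, ]]
v2 <- dat$value[idx[2, ]]

# Smaller value over larger value for every pair of rows
out <- data.frame(name1 = dat$name[idx[1, ]],
                  name2 = dat$name[idx[2, ]],
                  ratio = pmin(v1, v2) / pmax(v1, v2),
                  stringsAsFactors = FALSE)
out
```

Unlike the expand.grid approach, this never materialises the full n x n grid, only the choose(n, 2) unordered pairs.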
I would like to ask the R community how to merge two rows with the same ID (i.e. same participant) with some variables that are identical and others where there are NA's. In my example, I would like all the values 4-5-6 to appear on one row and therefore for the NA's (or empty cells) to be gone.
I have tried using dplyr without much success, and I have to do the merging by hand (which is quite time consuming and increases the risk for errors). Thank you in advance for your help with this problem!
# Create sample data frame.
id <- c(rep('Participant 1', 2), rep('Participant 2', 2))
value1 <- rep('A', 4)
value2 <- rep('B', 4)
value3 <- rep('C', 4)
value4 <- c('x', NA, NA, 'x')
value5 <- c('x', NA, 'x', NA)
value6 <- c(NA, 'x', NA, 'x')
df <- data.frame(id, value1, value2, value3, value4, value5, value6, stringsAsFactors = F)
# Use dplyr to group the data and keep the non-NA value from the other columns.
df %>% group_by(id, value1, value2, value3) %>%
summarise(value4 = max(value4, na.rm = T),
value5 = max(value5, na.rm = T),
value6 = max(value6, na.rm = T))
Another solution with dplyr and tidyr:
library(dplyr)
library(tidyr)
DF %>%
gather(var, val, Value4:Value6) %>%
filter(!is.na(val)) %>%
spread(var, val)
Using the data of @G.Grothendieck, this results in:
ID Value1 Value2 Value3 Value4 Value5 Value6
1 1 A B C x x x
2 2 A B C x x x
Or another variation with summarise_each and the max approach of @G.Grothendieck:
DF %>%
group_by(ID, Value1, Value2, Value3) %>%
summarise_each(funs(max(., na.rm = TRUE)))
The gather and spread options can also be translated into a solution with reshape2:
library(reshape2)
dcast(na.omit(melt(DF, id.vars = c('ID','Value1','Value2','Value3'))),
ID + Value1 + Value2 + Value3 ~ variable,
value.var = 'value')
1) Using DF defined in the Note below try aggregating using the compress function defined below. This function removes NA values and appends an NA just in case all values were removed and then takes the first of what is left. No packages are used.
compress <- function(x) c(na.omit(x), NA)[1]
aggregate(DF[5:7], DF[1:4], compress)
giving:
ID Value1 Value2 Value3 Value4 Value5 Value6
1 1 A B C x x x
2 2 A B C x x x
2) A simpler alternative if no participant has all NA values in any column is that we could eliminate the definition of compress and use max with na.rm = TRUE instead like this:
aggregate(DF[5:7], DF[1:4], max, na.rm = TRUE)
Note: The input in reproducible form:
Lines <- "ID Value1 Value2 Value3 Value4 Value5 Value6
1 A B C x x NA
1 A B C NA NA x
2 A B C NA x NA
2 A B C x NA x"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
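As a quick check, the compress approach can be applied directly to the data frame from the question, which uses lower-case column names and participant labels instead of numeric IDs. This is a sketch with the question's sample values; merged is an illustrative name.

```r
# Sample data from the question
id <- c(rep("Participant 1", 2), rep("Participant 2", 2))
df <- data.frame(id, value1 = "A", value2 = "B", value3 = "C",
                 value4 = c("x", NA, NA, "x"),
                 value5 = c("x", NA, "x", NA),
                 value6 = c(NA, "x", NA, "x"),
                 stringsAsFactors = FALSE)

# First non-NA value, or NA if the whole group is NA
compress <- function(x) c(na.omit(x), NA)[1]

# Columns 1 to 4 identify a participant; columns 5 to 7 are collapsed per group
merged <- aggregate(df[5:7], df[1:4], compress)
merged
```

Each participant ends up on a single row with "x" in value4, value5, and value6.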
If you prefer to use dplyr try:
library(dplyr)
DF %>%
group_by(ID, Value1, Value2, Value3) %>%
summarise_each(funs(toString(na.omit(.))))
Result:
ID Value1 Value2 Value3 Value4 Value5 Value6
<int> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 A B C x x x
2 2 A B C x x x
Note:
DF as defined by G. Grothendieck https://stackoverflow.com/a/40820313/5727278
This builds off of docendo discimus's https://stackoverflow.com/a/27289383/5727278
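In current dplyr releases, summarise_each() and funs() are deprecated. A rough equivalent of the grouped summaries above, assuming dplyr 1.0 or later, uses across(); the helper first_non_na is a hypothetical name that mirrors the compress function shown earlier.

```r
library(dplyr)

Lines <- "ID Value1 Value2 Value3 Value4 Value5 Value6
1 A B C x x NA
1 A B C NA NA x
2 A B C NA x NA
2 A B C x NA x"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

# Hypothetical helper: first non-NA value, or NA if every value is NA
first_non_na <- function(x) c(na.omit(x), NA)[1]

# across(everything()) covers all non-grouping columns, here Value4 to Value6
out <- DF %>%
  group_by(ID, Value1, Value2, Value3) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")
out
```

Swapping first_non_na for a function such as ~ toString(na.omit(.x)) would reproduce the toString variant instead.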