I am new in R.
I have hundreds of data frames like this
ID NAME Ratio_A Ratio_B Ratio_C Ratio_D
AA ABCD 0.09 0.67 0.10 0.14
AB ABCE 0.04 0.85 0.04 0.06
AC ABCG 0.43 0.21 0.54 0.14
AD ABCF 0.16 0.62 0.25 0.97
AF ABCJ 0.59 0.37 0.66 0.07
This is just an example. The number and names of the Ratio_ columns are different between data frames, but all of them start with Ratio_. I want to apply a function (for example, log(x)), to the Ratio_ columns without specify the column number or the whole name.
I know how to do it df by df, for the one in the example:
A <- function(x) log(x)
df_log<-data.frame(df[1:2], lapply(df[3:6], A))
but I have a lot of them, and as I said the number of columns is different in each.
Any suggestion?
Thanks
Place the datasets in a list and then loop over the list elements
lapply(lst, function(x) {i1 <- grep("^Ratio_", names(x));
x[i1] <- lapply(x[i1], A)
x})
NOTE: No external packages are used.
data
lst <- mget(paste0("df", 1:100))
This type of problem is very easily dealt with using the dplyr package. For example,
df <- read.table(text = 'ID NAME Ratio_A Ratio_B Ratio_C Ratio_D
AA ABCD 0.09 0.67 0.10 0.14
AB ABCE 0.04 0.85 0.04 0.06
AC ABCG 0.43 0.21 0.54 0.14
AD ABCF 0.16 0.62 0.25 0.97
AF ABCJ 0.59 0.37 0.66 0.07',
header = TRUE)
library(dplyr)
df_transformed <- mutate_each(df, funs(log(.)), starts_with("Ratio_"))
df_transformed
# > df_transformed
# ID NAME Ratio_A Ratio_B Ratio_C Ratio_D
# 1 AA ABCD -2.4079456 -0.4004776 -2.3025851 -1.96611286
# 2 AB ABCE -3.2188758 -0.1625189 -3.2188758 -2.81341072
# 3 AC ABCG -0.8439701 -1.5606477 -0.6161861 -1.96611286
# 4 AD ABCF -1.8325815 -0.4780358 -1.3862944 -0.03045921
# 5 AF ABCJ -0.5276327 -0.9942523 -0.4155154 -2.65926004
Related
My R df looks like this:
ID CO.RT CO.ER SC.RT SC.ER
1 0.19 0.06 1.24 0.09
2 0.61 0.01 0.63 0.03
3 0.43 0.02 1.31 0.09
I've been trying to find a way to use tidyr::pivot_longer to achieve the following:
ID Type RT ER
1 CO 0.19 0.06
1 SC 1.24 0.09
2 CO 0.61 0.01
2 SC 0.63 0.03
3 CO 0.43 0.02
3 SC 1.31 0.09
My issue: I can only pivot RT and RT into a single "score"-column—I fail to find the right regex(?) pattern to pivot my df in the way shown above.
Here is what I tried:
df %>% pivot_longer(cols = c(CO.RT:SC.ER),
names_pattern = "(.+).(.+)",
names_to=c("Type", ".value"))
There are many pivot_longer/names_to questions out there, however, I couldn't find the right one for my problem. Can anyone help?
My dataset:
df <- tibble(
ID = c(1,2,3),
CO.RT = c(0.19,0.61,0.43),
CO.ER = c(0.06,0.01,0.02),
SC.RT = c(1.24,0.63,1.31),
SC.ER = c(0.09,0.03,0.09)
)
Imagine there are 4 cards on the desk and there are several rows of them (e.g., 5 rows in the demo). The value of each card is already listed in the demo data frame. However, the exact position of the card is indexed by the pos columns, see the demo data I generated below.
To achieve this, I swap the cards with the [] function across the rows to switch the cards' values back to their original position. The following code already fulfills such a purpose. To avoid explicit usage of the loop, I wonder whether I can achieve a similar effect if I use the vectorization function with packages from tidyverse family, e.g. pmap or related function within the package purrr?
# 1. data generation ------------------------------------------------------
rm(list=ls())
vect<-matrix(round(runif(20),2),nrow=5)
colnames(vect)<-paste0('card',1:4)
order<-rbind(c(2,3,4,1),c(3,4,1,2),c(1,2,3,4),c(4,3,2,1),c(3,4,2,1))
colnames(order)=paste0('pos',1:4)
dat<-data.frame(vect,order,stringsAsFactors = F)
# 2. data swap ------------------------------------------------------------
for (i in 1:dim(dat)[1]){
orders=dat[i,paste0('pos',1:4)]
card=dat[i,paste0('card',1:4)]
vec<-card[order(unlist(orders))]
names(vec)=paste0('deck',1:4)
dat[i,paste0('deck',1:4)]<-vec
}
dat
You could use pmap_dfr :
card_cols <- grep('card', names(dat))
pos_cols <- grep('pos', names(dat))
dat[paste0('deck', seq_along(card_cols))] <- purrr::pmap_dfr(dat, ~{
x <- c(...)
as.data.frame(t(unname(x[card_cols][order(x[pos_cols])])))
})
dat
# card1 card2 card3 card4 pos1 pos2 pos3 pos4 deck1 deck2 deck3 deck4
#1 0.05 0.07 0.16 0.86 2 3 4 1 0.86 0.05 0.07 0.16
#2 0.20 0.98 0.79 0.72 3 4 1 2 0.79 0.72 0.20 0.98
#3 0.50 0.79 0.72 0.10 1 2 3 4 0.50 0.79 0.72 0.10
#4 0.03 0.98 0.48 0.06 4 3 2 1 0.06 0.48 0.98 0.03
#5 0.41 0.72 0.91 0.84 3 4 2 1 0.84 0.91 0.41 0.72
One thing to note here is to make sure that the output from pmap function does not have original names of the columns. If they have the original names, it would reshuffle the columns according to the names and output would not be in correct order. I use unname here to remove the names.
I'm new with R and I have tried a lot to solve this problem, if anyone could help me I'd be very grateful! This is my problem:
I have two data frame (df1 and df2) and what I need is to multiply each value of df1 by a row of df2 searched by id. This is an example of what I'm looking for:
df1<-data.frame(ID=c(1,2,3), x1=c(6,3,2), x2=c(2,3,1), x3=c(4,10,7))
df1
df2<-data.frame(ID=c(1,2,3), y1=c(0.01,0.02,0.05), y2=c(0.2,0.03,0.11), y3=c(0.3,0.09,0.07))
df2
#Example of what I need
df1xdf2<- data.frame(ID=c(1,2,3), r1=c(0.06,0.06,0.1), r2=c(1.2,0.09,0.22), r3=c(1.8,0.27,0.14),
r4=c(0.02,0.06,0.05),r5=c(0.4,0.09,0.11),r6=c(0.6,0.27,0.07),r7=c(0.04,0.2,0.35),r8=c(0.8,0.3,0.77),r9=c(1.2,0.9,0.49))
df1xdf2
I've tried with loops by row and column but I only get a 1x1 multiplication.
My dataframes have same number of rows, columns and factor names. My real life dataframes are much larger, both rows and columns.
Does anyone know how to solve it?
You could use lapply to multiply every column of df1 with complete df2. We can cbind the dataframes together and rename the columns
output <- do.call(cbind, lapply(df1[-1], `*`, df2[-1]))
cbind(df1[1], setNames(output, paste0("r", seq_along(output))))
# ID r1 r2 r3 r4 r5 r6 r7 r8 r9
#1 1 0.06 1.20 1.80 0.02 0.40 0.60 0.04 0.80 1.20
#2 2 0.06 0.09 0.27 0.06 0.09 0.27 0.20 0.30 0.90
#3 3 0.10 0.22 0.14 0.05 0.11 0.07 0.35 0.77 0.49
You could use the dplyr package
#Example with dplyr
require(dplyr)
# First we use merge() to join both DF
result <- merge(df1, df2, by = "ID") %>%
mutate(r1 = x1*y1,
r2 = x1*y2,
r3 = etc.)
within mutate() you can specify your new column formulas and names
An option with map
library(tidyverse)
bind_cols(df1[1], map_dfc(df1[-1], `*`, df2[-1]))
Or in base R by replicating the columns and multiplying
out <- cbind(df1[1], df1[-1][rep(seq_along(df1[-1]), each = 3)] *
df2[-1][rep(seq_along(df2[-1]), 3)])
names(out)[-1] <- paste0("r", seq_along(out[-1]))
out
# ID r1 r2 r3 r4 r5 r6 r7 r8 r9
#1 1 0.06 1.20 1.80 0.02 0.40 0.60 0.04 0.80 1.20
#2 2 0.06 0.09 0.27 0.06 0.09 0.27 0.20 0.30 0.90
#3 3 0.10 0.22 0.14 0.05 0.11 0.07 0.35 0.77 0.49
how can I compute the mean R, R1, R2, R3 values from the rows sharing the same lon,lat field? I'm sure this questions exists multiple times but I could not easily find it.
lon lat length depth R R1 R2 R3
1 147.5348 -35.32395 13709 1 0.67 0.80 0.84 0.83
2 147.5348 -35.32395 13709 2 0.47 0.48 0.56 0.54
3 147.5348 -35.32395 13709 3 0.43 0.29 0.36 0.34
4 147.4290 -35.27202 12652 1 0.46 0.61 0.60 0.58
5 147.4290 -35.27202 12652 2 0.73 0.96 0.95 0.95
6 147.4290 -35.27202 12652 3 0.77 0.92 0.92 0.91
I'd recommend using the split-apply-combine strategy, where you're splitting by BOTH lon and lat, applying mean to each group, then recombining into a single data frame.
I'd recommend using dplyr:
library(dplyr)
mydata %>%
group_by(lon, lat) %>%
summarize(
mean_r = mean(R)
, mean_r1 = mean(R1)
, mean_r2 = mean(R2)
, mean_r3 = mean(R3)
)
I would like to reshape my data based in unique string in a "Bull" column (all data frame):
EBV Bulls
0.13 NE001362
0.17 NE001361
0.05 NE001378
-0.12 NE001359
-0.14 NE001379
0.13 NE001380
-0.46 NE001379
-0.46 NE001359
-0.68 NE001394
0.28 NE001391
0.84 NE001394
-0.43 NE001393
-0.18 NE001707
My expected output:
NE001362 NE001361 NE001378 NE001359 NE001379 NE001380 NE001394 NE001391 NE001393 NE001707
0.13 0.17 0.05 -0.12 -0.14 0.13 -0.68 0.28 -0.43 -0.18
-0.46 -0.46 0.84
I tried dat2 <- dcast(all, EBV~variable, value.var = "Bulls") but do not works.
You have two options. Indexing the multiple occurrences for each level of Bulls or using a list to hold the different levels of EBV.
Option 1: Indexing multiple occurrences
You can use data.table to generate an index that numbers multiple occurrences of EBV:
require(data.table)
setDT(all) ## convert to data.table
all[, index:=1:.N, by=Bulls] ## generate index
dcast.data.table(all, formula=index ~ Bulls, value.var='EBV')
Option 2: Using a list to store multiple values
You could use a list as a value with data.table (I'm not sure if plain data.frame supports it).
require(data.table)
setDT(all) ## convert to data.table
all[, list(list(EBV)), by=Bulls] ## multiple values stored as list
Just to make sure that base R gets some acknowledgement:
## Add an ID, like ilir did, but with base R functions
mydf$ID <- with(mydf, ave(rep(1, nrow(mydf)), Bulls, FUN = seq_along))
Here's reshape:
reshape(mydf, direction = "wide", idvar="ID", timevar="Bulls")
# ID EBV.NE001362 EBV.NE001361 EBV.NE001378 EBV.NE001359 EBV.NE001379
# 1 1 0.13 0.17 0.05 -0.12 -0.14
# 7 2 NA NA NA -0.46 -0.46
# EBV.NE001380 EBV.NE001394 EBV.NE001391 EBV.NE001393 EBV.NE001707
# 1 0.13 -0.68 0.28 -0.43 -0.18
# 7 NA 0.84 NA NA NA
And xtabs. Note: This is a table-like matrix, so if you want a data.frame, you'll have to use as.data.frame.matrix on the output.
xtabs(EBV ~ ID + Bulls, mydf)
# Bulls
# ID NE001359 NE001361 NE001362 NE001378 NE001379 NE001380 NE001391
# 1 -0.12 0.17 0.13 0.05 -0.14 0.13 0.28
# 2 -0.46 0.00 0.00 0.00 -0.46 0.00 0.00
# Bulls
# ID NE001393 NE001394 NE001707
# 1 -0.43 -0.68 -0.18
# 2 0.00 0.84 0.00