loop to create a dataframe with new column, then combine them together - r

I want to duplicate my dataset at different flight altitude levels. I can do it manually by creating dataframes with differing levels of altitude and then rbind-ing them together. But I want to make it faster, perhaps with a for loop.
this is the example dataset:
structure(list(heading = c(0L, 71L, 132L, 143L, 78L, 125L, 0L,
171L, 165L, 159L), thermal = c(1.25823300871478, 1.2972715238927,
1.65348398199965, 2.04165937130312, 1.496194948775, 1.70668245624966,
1.32775326817617, 1.37003605552932, 1.85841102388127, 1.20642577473389
), WS = c(17.1590022110329, 7.60663206413036, 16.3515501561529,
15.8336908137001, 7.11013207359218, 8.69420768960291, 5.23228331387401,
10.2762569508197, 3.79321542059933, 4.80008774506314), trackId = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("ke1601", "ke1607",
"mwb1501", "mwb1502", "mwb1503", "mwb1504", "nsm1605", "rcees17110",
"rcees17111", "X27230893", "X27231081", "X27233186", "X27234135",
"X52409530"), class = "factor")), row.names = c(NA, 10L), class = "data.frame")
I was coding manually like this:
msl100 <- df %>% mutate(alt = 100)
msl200 <- df %>% mutate(alt = 200)
msl300 <- df %>% mutate(alt = 300)
msl400 <- df %>% mutate(alt = 400)
msl500 <- df %>% mutate(alt = 500)
df1 <- rbind(msl100, .........)
I need to do this for every 100 meters up to height 5100 meters.

This can be done purely through a cbind as the rows of the original data will repeat:
cbind(dat, alt=rep(seq(100,5100,100), each=nrow(dat)))
This should be much faster than looping over values.

Consider a cross join merge:
expanded_df <- merge(df, data.frame(alt=seq(100, 5100, 100)), by = NULL)

Create a sequence, use lapply to loop over it transform to add new column and rbind
do.call(rbind, lapply(seq(100, 5100, 100), function(x) transform(df, alt = x)))
# heading thermal WS trackId alt
#1 0 1.258233 17.159002 mwb1501 100
#2 71 1.297272 7.606632 mwb1501 100
#3 132 1.653484 16.351550 mwb1501 100
#4 143 2.041659 15.833691 mwb1501 100
#5 78 1.496195 7.110132 mwb1501 100
#6 125 1.706682 8.694208 mwb1501 100
#7 0 1.327753 5.232283 mwb1501 100
#8 171 1.370036 10.276257 mwb1501 100
#9 165 1.858411 3.793215 mwb1501 100
#10 159 1.206426 4.800088 mwb1501 100
#11 0 1.258233 17.159002 mwb1501 200
#12 71 1.297272 7.606632 mwb1501 200
#....
Using tidyverse that would be
library(dplyr)
library(purrr)
map_df(seq(100, 5100, 100), ~df %>% mutate(alt = .x))

We can use crossing from the tidyr package.
library(dplyr)
library(tidyr)
df2 <- crossing(df, tibble(alt = seq(100, 5100, 100)))
If the order is important, create an ID column, arrange by it, and then delete it.
df3 <- df %>%
mutate(ID = 1:n()) %>%
crossing(tibble(alt = seq(100, 5100, 100))) %>%
arrange(alt, ID) %>%
select(-ID)

Another (fast) data.table-based alternative would be to do
library(data.table)
setDT(df)[, .(alt = seq(100, 5100, 100)), by = names(df)]
# heading thermal WS trackId alt
# 1: 0 1.258233 17.159002 mwb1501 100
# 2: 0 1.258233 17.159002 mwb1501 200
# 3: 0 1.258233 17.159002 mwb1501 300
# 4: 0 1.258233 17.159002 mwb1501 400
# 5: 0 1.258233 17.159002 mwb1501 500
#---
#506: 159 1.206426 4.800088 mwb1501 4700
#507: 159 1.206426 4.800088 mwb1501 4800
#508: 159 1.206426 4.800088 mwb1501 4900
#509: 159 1.206426 4.800088 mwb1501 5000
#510: 159 1.206426 4.800088 mwb1501 5100

Related

Subtracting rows in R based on matching value

I am trying to subtract two rows in my dataset from each other:
Name Period Time Distance Load
Tim A 01:06:20 6000 680
Max A 01:06:20 5000 600
Leo A 01:06:20 5500 640
Noa A 01:06:20 6500 700
Tim B 00:04:10 500 80
Max B 00:04:10 500 50
Leo B 00:04:10 400 40
I want to subtract the Time, Distance and Load values of Period B from Period A for matching Names.
eg. Subtract row 5 (Tim, Period B) from row 1 (Tim, Period A)
The new values should be written into a new table looking like this:
Name Period Time Distance Load
Tim C 01:02:10 5500 600
Max C 01:02:10 4500 550
Leo C 01:02:10 5100 600
Noa C 01:06:20 6500 700
The real dataset contains many more rows. I tried to play around with dplyr but could not get the result I am looking for.
Thanks in advance
There are so many answers already that this is just a bit of fun at this stage. I think this way is nice as it uses unnest_wider():
library(dplyr)
library(tidyr)
library(purrr)
# Per-group difference: subtract row 2 (Period B) from row 1 (Period A),
# dropping the first (grouping) column. If the group has no second row,
# indexing past the end yields an all-NA row, in which case Period A is
# returned unchanged (the Noa case above).
# NOTE(review): this definition masks base::diff for the session.
diff <- function(data) {
  period_a <- data[1, -1]
  period_b <- data[2, -1]
  if (all(is.na(period_b))) {
    period_a
  } else {
    period_a - period_b
  }
}
df %>% group_by(Name) %>% nest() %>%
mutate(diff = map(data, diff)) %>% unnest_wider(diff) %>%
mutate(Period = "C") %>% select(Period, Time, Distance, Load)
# A tibble: 4 x 5
Name Period Time Distance Load
<chr> <chr> <time> <dbl> <dbl>
1 Tim C 01:02:10 5500 600
2 Max C 01:02:10 4500 550
3 Leo C 01:02:10 5100 600
4 Noa C 01:06:20 6500 700
Apart from the diff() function (which can probably be made neater and 'exclusively' tidyverse), this way is also shorter.
DATA
library(readr)
# courtesy of #MartinGal
df <- read_table2("Name Period Time Distance Load
Tim A 01:06:20 6000 680
Max A 01:06:20 5000 600
Leo A 01:06:20 5500 640
Noa A 01:06:20 6500 700
Tim B 00:04:10 500 80
Max B 00:04:10 500 50
Leo B 00:04:10 400 40")
You could filter on the two periods and then join them together, thus facilitating the subtraction of columns.
library(dplyr)
inner_join(filter(df, Period=="A"), filter(df, Period=="B"), by="Name") %>%
mutate(Period="C",
Time=Time.x-Time.y,
Distance=Distance.x-Distance.y,
Load=Load.x-Load.y) %>%
select(Name, Period, Time, Distance, Load)
Name Period Time Distance Load
1 Tim C 1.036111 hours 5500 600
2 Max C 1.036111 hours 4500 550
3 Leo C 1.036111 hours 5100 600
It's basically the same idea as #Edward. You could use dplyr and tidyr:
df %>%
pivot_wider(names_from="Period", values_from=c("Time", "Distance", "Load")) %>%
mutate(Period = "C",
Time = coalesce(Time_A - Time_B, Time_A),
Distance = coalesce(Distance_A - Distance_B, Distance_A),
Load = coalesce(Load_A - Load_B, Load_A)
) %>%
select(-matches("_\\w"))
returns
# A tibble: 4 x 5
Name Period Time Distance Load
<chr> <chr> <time> <dbl> <dbl>
1 Tim C 01:02:10 5500 600
2 Max C 01:02:10 4500 550
3 Leo C 01:02:10 5100 600
4 Noa C 01:06:20 6500 700
Data
df <- read_table2("Name Period Time Distance Load
Tim A 01:06:20 6000 680
Max A 01:06:20 5000 600
Leo A 01:06:20 5500 640
Noa A 01:06:20 6500 700
Tim B 00:04:10 500 80
Max B 00:04:10 500 50
Leo B 00:04:10 400 40")
Here is a different approach which groups by Name to get the difference.
library(dplyr)
library(chron)
df <- structure(list(Name = structure(c(4L, 2L, 1L, 3L, 4L, 2L, 1L), .Label = c("Leo", "Max", "Noa", "Tim"), class = "factor"),
Period = structure(c(1L,1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Time = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("0:04:10", "1:06:20"), class = "factor"),
Distance = c(6000L, 5000L, 5500L, 6500L, 500L, 500L, 400L),
Load = c(680L, 600L, 640L, 700L, 80L, 50L, 40L)), class = "data.frame", row.names = c(NA, -7L))
df %>%
mutate(Time = times(Time)) %>%
group_by(Name) %>%
mutate(Time = lag(Time) - Time,
Distance = lag(Distance) - Distance,
Load = lag(Load) - Load,
Period = LETTERS[which(LETTERS == Period) + 1]) %>%
filter(!is.na(Time))
You can use data.table too.
dt <- data.table(Name = c('Tim', 'Max', 'Leo', 'Noa', 'Tim', 'Max', 'Leo'),
Period = c('A', 'A', 'A', 'A', 'B', 'B', 'B'),
Time = c('01:06:20', '01:06:20' , '01:06:20' , '01:06:20' , '00:04:10' , '00:04:10' , '00:04:10' ),
Distance = c(6000, 5000, 5500, 6500, 500, 500, 400 ),
Load = c(680, 600, 640, 700, 80, 50, 40))
Then the first thing to do is to convert the Time var:
dt[, Time := as.POSIXct(Time, format = "%H:%M:%S")]
sapply(dt, class)
Then you use dcast.data.table:
dtCast <- dcast.data.table(dt, Name ~ Period, value.var = c('Time', 'Distance', 'Load'))
And then you create a new object:
dtFinal <- dtCast[,list(Period = 'C',
Time = Time_A - Time_B,
Distance = Distance_A - Distance_B,
Load = Load_A - Load_B),
by = 'Name']
Mind that if you want to convert the Time to the same format as above, you need to do the following:
library(hms)
dtFinal[, Time := as_hms(Time)]

Percent change for grouped subjects at multiple timepoints R

id timepoint dv.a
1 baseline 100
1 1min 105
1 2min 90
2 baseline 70
2 1min 100
2 2min 80
3 baseline 80
3 1min 80
3 2min 90
I have repeated measures data for a given subject in long format as above. I'm looking to calculate percent change relative to baseline for each subject.
id timepoint dv pct.chg
1 baseline 100 100
1 1min 105 105
1 2min 90 90
2 baseline 70 100
2 1min 100 143
2 2min 80 114
3 baseline 80 100
3 1min 80 100
3 2min 90 113
df <- expand.grid( time=c("baseline","1","2"), id=1:4)
df$dv <- sample(100,12)
df %>% group_by(id) %>%
mutate(perc=dv*100/dv[time=="baseline"]) %>%
ungroup()
You're wanting to do something for each 'id' group, so that's the group_by, then you need to create a new column, so there's a mutate. That new variable is the old dv, scaled by the value that dv takes at the baseline - hence the inner part of the mutate. And finally it's to remove the grouping you'd applied.
Try creating a helper column, group and arrange on that. Then use the window function first in your mutate function:
df %>% mutate(clean_timepoint = str_remove(timepoint,"min") %>% if_else(. == "baseline", "0", .) %>% as.numeric()) %>%
group_by(id) %>%
arrange(id,clean_timepoint) %>%
mutate(pct.chg = (dv / first(dv)) * 100) %>%
select(-clean_timepoint)
In base R you can do this:
# Assumes exactly 3 rows per subject (baseline, 1min, 2min), in that order,
# with dv in column 3; writes percent change vs. baseline into column 4.
for(i in 1:(NROW(df)/3)){
df[1+3*(i-1),4] <- 100  # baseline row: 100% by definition
df[2+3*(i-1),4] <- df[2+3*(i-1),3]/df[1+3*(i-1),3]*100  # 1min vs baseline
df[3+3*(i-1),4] <- df[3+3*(i-1),3]/df[1+3*(i-1),3]*100  # 2min vs baseline
}
colnames(df)[4] <- "pct.chg"  # name the newly created column
output:
> df
id timepoint dv.a pct.chg
1 1 baseline 100 100.0000
2 1 1min 105 105.0000
3 1 2min 90 90.0000
4 2 baseline 70 100.0000
5 2 1min 100 142.8571
6 2 2min 80 114.2857
7 3 baseline 80 100.0000
8 3 1min 80 100.0000
9 3 2min 90 112.5000
Base R solution: (assuming "baseline" always appears as first record per group)
data.frame(do.call("rbind", lapply(split(df, df$id),
function(x){x$pct.change <- x$dv/x$dv[1]; return(x)})), row.names = NULL)
Data:
df <- structure(
list(
id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
timepoint = c(
"baseline",
"1min",
"2min",
"baseline",
"1min",
"2min",
"baseline",
"1min",
"2min"
),
dv = c(100L, 105L, 90L, 70L, 100L, 80L, 80L, 80L, 90L)
),
class = "data.frame",
row.names = c(NA,-9L)
)

Complex dataframe values selection based on both rows and columns

I need to select some values on each row of the dataset below and compute a sum.
This is a part of my dataset.
> prova
key_duration1 key_duration2 key_duration3 KeyPress1RESP KeyPress2RESP KeyPress3RESP
18 3483 364 3509 b n m
19 2367 818 3924 b n m
20 3775 1591 802 b m n
21 929 3059 744 n b n
22 3732 530 1769 b n m
23 3503 2011 2932 b n b
24 3684 1424 1688 b n m
Rows are trials of the experiment and columns are the keys pressed, in temporal sequence (keypressRESP) and the amount of time of the key until the next one (key_duration).
So for example in the first trial (first row) I pressed "b" and after 3483 ms I pressed "n" and so on.
This is my dataframe
structure(list(key_duration1 = c(3483L, 2367L, 3775L, 929L, 3732L,
3503L, 3684L), key_duration2 = c(364L, 818L, 1591L, 3059L, 530L,
2011L, 1424L), key_duration3 = c(3509, 3924, 802, 744, 1769,
2932, 1688), KeyPress1RESP = structure(c(2L, 2L, 2L, 4L, 2L,
2L, 2L), .Label = c("", "b", "m", "n"), class = "factor"), KeyPress2RESP = structure(c(4L,
4L, 3L, 2L, 4L, 4L, 4L), .Label = c("", "b", "m", "n"), class = "factor"),
KeyPress3RESP = structure(c(3L, 3L, 4L, 4L, 3L, 2L, 3L), .Label = c("",
"b", "m", "n"), class = "factor")), row.names = 18:24, class = "data.frame")
I need a method to select, in each row (trial), all the "b" values, compute sum(key_duration), and write the result to a new column — and the same for "m".
How can I do this?
I think I need a function similar to 'apply()', but one that operates only on selected values in a row rather than on every value.
apply(prova[,1:3],1,sum)
Thanks
Here is a way using data.table.
library(data.table)
setDT(prova)
# melt
prova_long <-
melt(
prova[, idx := 1:.N],
id.vars = "idx",
measure.vars = patterns("^key_duration", "^KeyPress"),
variable.name = "key",
value.name = c("duration", "RESP")
)
# aggregate
prova_aggr <- prova_long[RESP != "n", .(duration_sum = sum(duration)), by = .(idx, RESP)]
# spread and join
prova[dcast(prova_aggr, idx ~ paste0("sum_", RESP)), c("sum_b", "sum_m") := .(sum_b, sum_m), on = "idx"]
prova
Result
# key_duration1 key_duration2 key_duration3 KeyPress1RESP KeyPress2RESP KeyPress3RESP idx sum_b sum_m
#1: 3483 364 3509 b n m 1 3483 3509
#2: 2367 818 3924 b n m 2 2367 3924
#3: 3775 1591 802 b m n 3 3775 1591
#4: 929 3059 744 n b n 4 3059 NA
#5: 3732 530 1769 b n m 5 3732 1769
#6: 3503 2011 2932 b n b 6 6435 NA
#7: 3684 1424 1688 b n m 7 3684 1688
The idea is to reshape your data to long format, aggregate by "RESP" per row. Spread the result and join back to your initial data.
With tidyverse you can do:
bind_cols(df %>%
select_at(vars(starts_with("KeyPress"))) %>%
rowid_to_column() %>%
gather(var, val, -rowid), df %>%
select_at(vars(starts_with("key_"))) %>%
rowid_to_column() %>%
gather(var, val, -rowid)) %>%
group_by(rowid) %>%
summarise(b_values = sum(val1[val == "b"]),
m_values = sum(val1[val == "m"])) %>%
left_join(df %>%
rowid_to_column(), by = c("rowid" = "rowid")) %>%
ungroup() %>%
select(-rowid)
b_values m_values key_duration1 key_duration2 key_duration3 KeyPress1RESP KeyPress2RESP KeyPress3RESP
<dbl> <dbl> <int> <int> <dbl> <fct> <fct> <fct>
1 3483. 3509. 3483 364 3509. b n m
2 2367. 3924. 2367 818 3924. b n m
3 3775. 1591. 3775 1591 802. b m n
4 3059. 0. 929 3059 744. n b n
5 3732. 1769. 3732 530 1769. b n m
6 6435. 0. 3503 2011 2932. b n b
7 3684. 1688. 3684 1424 1688. b n m
First, it splits the df into two: one with variables starting with "KeyPress" and one with variables starting with "key_". Second, it transforms the two dfs from wide to long format and combines them by columns. Third, it creates a summary of the "b" and "m" values by row ID. Finally, it merges the results with the original df.
You can make a logical matrix from the KeyPress columns, multiply it by the key_duration subset and then take their rowSums.
prova$b_values <- rowSums((prova[, 4:6] == "b") * prova[, 1:3])
prova$n_values <- rowSums((prova[, 4:6] == "n") * prova[, 1:3])
key_duration1 key_duration2 key_duration3 KeyPress1RESP KeyPress2RESP KeyPress3RESP b_values n_values
18 3483 364 3509 b n m 3483 364
19 2367 818 3924 b n m 2367 818
20 3775 1591 802 b m n 3775 802
21 929 3059 744 n b n 3059 1673
22 3732 530 1769 b n m 3732 530
23 3503 2011 2932 b n b 6435 2011
24 3684 1424 1688 b n m 3684 1424
It works because the logical values are coerced to numeric 1s or 0s, and only the values for individual keys are retained.
Extra: to generalise, you could instead use a function and tidyverse/purrr to map it:
# For a given key label, zero out durations where that key was not pressed
# (the logical comparison matrix is coerced to 0/1 by multiplication),
# then sum the surviving durations across the three key slots per row.
# NOTE(review): reads the global `prova`; columns 4:6 are the key presses,
# columns 1:3 the matching durations.
get_sums <- function(key) rowSums((prova[, 4:6] == key) * prova[, 1:3])
# named list: output column name -> key character to match
keylist <- list(b_values = "b", n_values = "n", m_values = "m")
library(tidyverse)
bind_cols(prova, map_dfr(keylist, get_sums))  # appends one summed column per key

How to merge two dataframes based on range value of one table

DF1
SIC Value
350 100
460 500
140 200
290 400
506 450
DF2
SIC1 AREA
100-200 Forest
201-280 Hospital
281-350 Education
351-450 Government
451-550 Land
Note: the class of SIC1 is character; we need to convert it to a numeric range.
i am trying to get the output like below
Desired output:
DF3
SIC Value AREA
350 100 Education
460 500 Land
140 200 Forest
290 400 Education
506 450 Land
I first tried to convert the character class of SIC1 to numeric,
then tried to merge, but had no luck. Can someone guide me on this?
An option can be to use tidyr::separate along with sqldf to join both tables on range of values.
library(sqldf)
library(tidyr)
DF2 <- separate(DF2, "SIC1",c("Start","End"), sep = "-")
sqldf("select DF1.*, DF2.AREA from DF1, DF2
WHERE DF1.SIC between DF2.Start AND DF2.End")
# SIC Value AREA
# 1 350 100 Education
# 2 460 500 Lan
# 3 140 200 Forest
# 4 290 400 Education
# 5 506 450 Lan
Data:
DF1 <- read.table(text =
"SIC Value
350 100
460 500
140 200
290 400
506 450",
header = TRUE, stringsAsFactors = FALSE)
DF2 <- read.table(text =
"SIC1 AREA
100-200 Forest
201-280 Hospital
281-350 Education
351-450 Government
451-550 Lan",
header = TRUE, stringsAsFactors = FALSE)
We could do a non-equi join. Split (tstrsplit) the 'SIC1' column in 'DF2' to numeric columns and then do a non-equi join with the first dataset.
library(data.table)
setDT(DF2)[, c('start', 'end') := tstrsplit(SIC1, '-', type.convert = TRUE)]
DF2[, -1, with = FALSE][DF1, on = .(start <= SIC, end >= SIC),
mult = 'last'][, .(SIC = start, Value, AREA)]
# SIC Value AREA
#1: 350 100 Education
#2: 460 500 Land
#3: 140 200 Forest
#4: 290 400 Education
#5: 506 450 Land
Or as #Frank mentioned we can do a rolling join to extract the 'AREA' and update it on the first dataset
setDT(DF1)[, AREA := DF2[DF1, on=.(start = SIC), roll=TRUE, x.AREA]]
data
DF1 <- structure(list(SIC = c(350L, 460L, 140L, 290L, 506L), Value = c(100L,
500L, 200L, 400L, 450L)), .Names = c("SIC", "Value"),
class = "data.frame", row.names = c(NA, -5L))
DF2 <- structure(list(SIC1 = c("100-200", "201-280", "281-350", "351-450",
"451-550"), AREA = c("Forest", "Hospital", "Education", "Government",
"Land")), .Names = c("SIC1", "AREA"), class = "data.frame",
row.names = c(NA, -5L))

Subsetting rows based on multiple columns using data.table - fastest way

I was wondering if there was a more elegant, less clunky and faster way to do this. I have millions of rows with ICD coding for clinical data. A short example is provided below. I want to subset the dataset based on either of the columns meeting a specific set of diagnosis codes. The code below works but takes ages in R, so I was wondering if there is a faster way.
structure(list(eid = 1:10, mc1 = structure(c(4L, 3L, 5L, 2L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("345", "410", "413.9", "I20.1",
"I23.4"), class = "factor"), oc1 = c(350, 323, 12, 35, 413.1,
345, 345, 345, 345, 345), oc2 = structure(c(5L, 6L, 4L, 1L, 1L,
2L, 2L, 2L, 3L, 2L), .Label = c("", "345", "I20.3", "J23.6",
"K50.1", "K51.4"), class = "factor")), .Names = c("eid", "mc1",
"oc1", "oc2"), class = c("data.table", "data.frame"), row.names = c(NA,
-10L), .internal.selfref = <pointer: 0x102812578>)
The code below subsets all rows that meet the code of either "I20" or "413" (this would include all codes that have, for example, been coded as "I20.4" or "413.9", etc.).
dat2 <- dat [substr(dat$mc1,1,3)== "413"|
substr(dat$oc1,1,3)== "413"|
substr(dat$oc2,1,3)== "413"|
substr(dat$mc1,1,3)== "I20"|
substr(dat$oc1,1,3)== "I20"|
substr(dat$oc2,1,3)== "I20"]
Is there a faster way to do this? For example can i loop through each of the columns looking for the specific codes "I20" or "413" and subset those rows?
We can specify the columns of interest in .SDcols, loop through the Subset of Data.table (.SD), get the first 3 characters with substr, check whether it is %in% a vector of values and Reduce it to a single logical vector for subsetting the rows
dat[dat[,Reduce(`|`, lapply(.SD, function(x)
substr(x, 1, 3) %chin% c('413', 'I20'))), .SDcols = 2:4]]
# eid mc1 oc1 oc2
#1: 1 I20.1 350.0 K50.1
#2: 2 413.9 323.0 K51.4
#3: 5 345 413.1
#4: 9 345 345.0 I20.3
For larger data it could help if we don't check all rows:
#' Subset rows of a data.table where any of the chosen columns starts with
#' one of the target ICD code prefixes.
#'
#' Works column by column, and only re-tests rows that have not already
#' matched an earlier column — this is what makes it faster than checking
#' every column on every row.
#'
#' @param dt A data.table (modified temporarily: a flag column `inn` is
#'   added with `set()` and removed before returning).
#' @param colsID Integer positions of the columns to scan (default 2:4).
#' @param codes Character vector of 3-character code prefixes to match
#'   (default preserves the original hard-coded c("413", "I20")).
#' @return The matching rows of `dt`, with the helper flag column removed.
minem <- function(dt, colsID = 2:4, codes = c("413", "I20")) {
  cols <- colnames(dt)[colsID]
  # Use TRUE/FALSE, never the reassignable shortcuts T/F.
  set(dt, j = "inn", value = FALSE)
  for (col in cols) {
    # Only rows not yet flagged need to be tested against this column.
    dt[inn == FALSE, inn := substr(get(col), 1, 3) %chin% codes]
  }
  dt[inn == TRUE][, inn := NULL][]
}
n <- 1e7
set.seed(13)
dt <- dts[sample(.N, n, replace = T)]
dt <- cbind(dt, dts[sample(.N, n, replace = T), 2:4])
setnames(dt, make.names(colnames(dt), unique = T))
dt
# eid mc1 oc1 oc2 mc1.1 oc1.1 oc2.1
# 1: 8 345 345.0 345 345 345 345
# 2: 3 I23.4 12.0 J23.6 413.9 323 K51.4
# 3: 4 410 35.0 413.9 323 K51.4
# 4: 1 I20.1 350.0 K50.1 I23.4 12 J23.6
# 5: 10 345 345.0 345 345 345 345
# ---
# 9999996: 3 I23.4 12.0 J23.6 I20.1 350 K50.1
# 9999997: 5 345 413.1 I20.1 350 K50.1
# 9999998: 4 410 35.0 345 345 345
# 9999999: 4 410 35.0 410 35
# 10000000: 10 345 345.0 345 345 345 I20.3
system.time(r1 <- akrun(dt, 2:ncol(dt))) # 22.88 sek
system.time(r2 <- minem(dt, 2:ncol(dt))) # 17.72 sek
all.equal(r1, r2)
# [1] TRUE

Resources