Variable selection in R package data.table: $ vs [,,] [closed]

I'm currently using the R package data.table to process big datasets.
I'm wondering if there is a difference between the syntax
DT[,v]
and the syntax:
DT$v
if DT is my data.table object and v the variable I want to select.
I know that the dollar sign is usually used for data frames and that [,v] is always used in data.table examples. However, both work and, in my experience with 5 million rows, take similar time to execute.
Do you know whether they are processed differently, and whether one is more efficient on even larger datasets?
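A minimal sketch of the comparison being asked about, assuming data.table is installed. The table and its values are invented for illustration; only the names DT and v come from the question. It checks that both forms return the same vector and shows one way to time them on your own data, without claiming which is faster.

```r
library(data.table)

# Hypothetical table: the column name v matches the question, the values are invented
DT <- data.table(id = 1:5, v = c(10, 20, 30, 40, 50))

by_bracket <- DT[, v]  # v is evaluated as a j expression inside the data.table
by_dollar  <- DT$v     # list-style extraction of the column

# Both forms should give the same plain vector
same_result <- identical(by_bracket, by_dollar)
print(same_result)

# Timings depend on the data and the machine, so measure rather than assume
print(system.time(for (i in seq_len(1000)) DT[, v]))
print(system.time(for (i in seq_len(1000)) DT$v))
```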

Related

Does R have a method like fillna(method = "ffill") in Python? [closed]

I am learning to use R for data cleaning. I have just run into a problem that I could solve in Python but not in R.
The dataset looks like this: [image: dataset]
I want to concatenate the first two columns and use the result as the index. The first thing I need to do is apply fillna('ffill') to the first column. Then I need to concatenate the two columns.
Could you tell me how to do this in R (preferably with the tidyverse)?
The result should look like this: [image: result]
Thanks in advance!
Try these. Be sure to read the help pages since many of them have arguments which may need to be set depending on what you want.
zoo::na.locf (last observation carried forward)
zoo::na.locf0
tidyr::fill
data.table::nafill
zoo also has na.aggregate, na.approx, na.contiguous, na.fill, na.spline, na.StructTS and na.trim for other forms of NA filling and tidyr also has replace_na.
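As a rough illustration of the tidyr route, here is a sketch that forward-fills the first column and then joins the first two columns into one key. The data frame, its column names (group, item, value) and the "_" separator are assumptions made up for this example, since the original dataset was only shown as an image.

```r
library(tidyr)

# Hypothetical data: column names and values are invented for illustration
df <- data.frame(
  group = c("A", NA, NA, "B", NA),
  item  = c("x", "y", "z", "x", "y"),
  value = 1:5
)

# Carry the last non-missing group value downward (like fillna(method = "ffill"))
filled <- fill(df, group, .direction = "down")

# Paste the first two columns into a single key column
combined <- unite(filled, "key", group, item, sep = "_")
print(combined)
```

zoo::na.locf(df$group) would be an alternative for the fill step on a single vector; check its help page for the na.rm argument before relying on it.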

Which function to choose [closed]

data <- read_delim("imported_data.csv", delim = ",")
data <- read_csv("imported_data.csv")
data <- read.csv("imported_data.csv")
data <- fread("imported_data.csv")
All these functions produce the same output; which one should I use?
When it comes to more sophisticated functions, what should I do then?
Thanks.
Use the one that's most appropriate for the situation.
If you are using dplyr and related tidyverse packages, use read_csv or read_delim (both from readr). The former is a convenience wrapper for the latter, so use whichever one seems most logical to you.
If you are using data.table, use fread. data.table has better performance than dplyr on very large datasets.
If you are not using either of those libraries, use read.csv or read.table because they are included in base R.
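The practical difference between these readers shows up in the class of the object they return. The sketch below assumes readr and data.table are installed and uses a small throwaway CSV written to a temporary file; the file contents are invented for illustration.

```r
library(readr)
library(data.table)

# Write a small hypothetical CSV to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,a", "2,b", "3,c"), path)

base_df <- read.csv(path)                    # base R: plain data.frame
tbl     <- suppressMessages(read_csv(path))  # readr: tibble, fits dplyr workflows
dt      <- fread(path)                       # data.table: data.table object

# All three hold the same rows; they differ in class and downstream tooling
print(class(base_df))
print(class(tbl))
print(class(dt))
```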

Which command saves data faster in R [closed]

Does anyone know which method of saving data is faster: fwrite from data.table or saveWorkbook from openxlsx?
Not quite an answer, but too long for a comment.
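The easy comment is: just benchmark your code with bench::mark.
library(bench)
...
mark(
  data.table::fwrite(data, tempfile(fileext = ".csv")),
  # saveWorkbook() expects a Workbook object, so write.xlsx() is used for a plain data frame
  openxlsx::write.xlsx(data, tempfile(fileext = ".xlsx")),
  check = FALSE  # the two calls return different values, so skip the equality check
)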
The slightly longer comment is: do you just want the fastest read/write? Then you might want to look into fst and/or qs.
I presented a lightning talk at our last R User Group where I benchmarked different read/write speeds, memory usage, file sizes, etc. You can find the slides here.
Hope that helps

R: understanding simplified script with brackets or hooks? [closed]

I would like to understand how this line really works:
y <- y[keep, , keep.lib.sizes=FALSE]
in:
keep <- rowSums(cpm(y)>1) >= 3
y <- y[keep, , keep.lib.sizes=FALSE]
I know d.f[a,b], but I cannot find any R documentation for d.f[a, ,b].
I tried searching for "brackets", "hooks", "commas"... :-(
(Sometimes I wish people would not abbreviate their R scripts!)
Thanks in advance.
Subscripting a data.frame takes two indices: df[rows, columns]. Leaving the column slot empty, as in df[rows, ], selects all columns. Anything after the second index is an optional named argument passed to the subsetting method.
The most common of those is drop = FALSE, as in df[1:18, 3, drop = FALSE]. It is used because selecting a single column from a data.frame otherwise simplifies the result to a plain vector, which loses the data.frame class. In your specific case, you appear to be working with another kind of object that behaves like a data.frame but comes with added functionality from a Bioconductor package (cpm and keep.lib.sizes suggest an edgeR DGEList). The documentation for that class's subsetting method will tell you how the extra argument works.
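The same shape of call can be tried on an ordinary data.frame in base R. This is a small sketch with invented data; it uses drop = FALSE as the named third argument, since keep.lib.sizes only exists for the Bioconductor object in the question.

```r
# Hypothetical data.frame, invented for illustration
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# A logical row filter, analogous to `keep` in the question
keep <- df$a >= 2

one_col_vector <- df[keep, "a"]                # simplified to a plain integer vector
one_col_frame  <- df[keep, "a", drop = FALSE]  # stays a one-column data.frame
all_cols       <- df[keep, , drop = FALSE]     # empty column slot means all columns

print(class(one_col_vector))
print(class(one_col_frame))
print(dim(all_cols))
```

The last line has the same structure as y[keep, , keep.lib.sizes=FALSE]: a row selector, an empty column slot, and a named option for the subsetting method.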

Meaning of 'cs' in R expression [closed]

I inherited legacy code from 2009 that includes many expressions of the following type:
variable %in% cs(do, ph, t, secchi)
When I try to run anything like this, I get the error: Error: could not find function "cs". I have not seen 'cs' before, and so far I can't find any information in the help files, on Google, or on this site. I'm guessing it is a deprecated way of concatenating strings, but I would like to confirm this before I update the legacy code.
There is a Cs function in the Hmisc package; I think that is it.
See this:
library(Hmisc)
Cs(a,cat,dog)
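If the legacy cs did behave like Hmisc::Cs (turning unquoted names into a character vector), a base-R stand-in could look like the sketch below. The function body is an assumption written for illustration, not the code from the 2009 project or from Hmisc.

```r
# Hypothetical stand-in for the missing cs(): returns the unquoted argument
# names as a character vector without evaluating them
cs <- function(...) {
  as.character(substitute(list(...)))[-1L]
}

params <- cs(do, ph, t, secchi)
print(params)

# Usage in the style of the legacy expression
variable <- c("ph", "temp", "secchi")
print(variable %in% params)
```

Because the arguments are never evaluated, names such as t that also exist as functions are handled as plain text.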
