A few questions about data.table in r [closed] - r

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
So I went over this tutorial, and have a few questions:
What is the exact meaning of "columns within the frame of a data.table are seen as if they are variables"?
Is there a particular meaning for the "L" after 6 in month==6L? (in the data table its only 6 not 6L).
I understand how to calculate mean for every column by something, but what if I simply want to calculate the mean of each column (assuming that I have many columns so I don't want to write all the names).
Thanks!

expanding the quote: "you don’t have to use DT$ repetitively since columns within the frame of a data.table are seen as if they are variables" referring to variables within a data.table, is like using the with function, which minimizes typing and may make lines more readable.
"L" is an R marker that says treat the preceding number as an integer (not a numeric (double)).
use the .SD method, for example to get the sum of all variables by variable byVariable in data.table dt:
myDT <- dt[, lapply(.SD, sum), by="byVariable"]

Related

Does R has method like fillna(method = "ffill") in python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
I am learning using R to do data cleaning work. Just encounter a question that I could deal with by python but not in R.
The dataset is like this.dataset
I want to concat the first two columns and assign it as index. The first thing I need to do is to fillna('ffill') the first column. Then I need concat two columns.
Could you tell me how to do this in R (tidyverse is better)?
The result should like this:
result
Thanks in advance!
Try these. Be sure to read the help pages since many of them have arguments which may need to be set depending on what you want.
zoo::na.locf (last observation carried forward)
zoo::na.locf0
tidyr::fill
data.table::nafill
zoo also has na.aggregate, na.approx, na.contiguous, na.fill, na.spline, na.StructTS and na.trim for other forms of NA filling and tidyr also has replace_na.

Is there an easy way to order a large number of columns without using dplyr::select() in r? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I'm working with a very large dataset with lots of columns (400+) and every time I create a new variable or add a new one I have to reorder it. I want it ordered so that all the related variables remain together so I've been using dplyr::select() to reorder things. Yet there are times when I have to go back into my script very early on and add a new variable. When I run the whole code after that, there tends to be one or two variables I forgot to put into preceding select() functions so it goes missing.
I use select() because selecting all the columns between two variables and referencing them by name is super easy (eg, Vfour:Vthreefifty). Do you have any tips for reordering datasets with lots of columns?
Given no reproducible example but using your 2 column names:
df %>%
select(., starts_with('V'))
You can then chain starts_with as needed.
Other options include:
ends_with, contains, matches

Working with tables using R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I need to write a function in R, since in other languages like c++ it works very slow. The function fills a 2d table with data, and then summarizes values of each row for further processing.
I am not sure if it answers your question, but if you work with data you can put them into data frames to take a look at the statistical parameters and for further processing. For example:
df = data.frame("var1" = c(5,10,15), "var2" = c(20,40,60))
#the 'summary' command gives you some statistical parameters based on the column
summary(df)
#with the 'apply' command you can addresses the rows.
#in this example you get the mean of each row:
apply(df, 1,mean)

R : understanding simplified script with brackets or hooks? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
Improve this question
I would like to understand how really works this script :
y <- y[keep, , keep.lib.sizes=FALSE]
in :
keep <- rowSums(cpm(y)>1) >= 3
y <- y[keep, , keep.lib.sizes=FALSE]
I do know d.f[a,b] but I can not find R-doc for d.f[a, ,b].
I tried "brackets", "hooks", "commas"... :-(
(Sometimes I would prefer that one does not simplifie his R script !)
Thanks in advance.
Subscripting data.Frames takes two values: df[rows, columns]. Any third value are optional arguments that you can use to subscript.
The most common of those is drop=FALSE as in df[1:18, 3, drop = FALSE]. This is done because when you subset just one column of a data.frame, it will lose the data.frame class. In your specific case, it seems like you are using another object that looks like a data.frame but with added functionalities from the bioconductor package. A look at the methods for those will tell you how these work.

Advanced subsetting r data frame [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
Improve this question
Without using a special function, can you do the following to change the values in an R dataframe from a column that have length greater than 4:
df[length(df$Column1)>4,"Column1"] = "replacement value"
This does not seem to work, is there an alternative index style I can use, or do I need to use a function?
Thanks
The function to determine the length of an entry, like a word in a dataframe, is nchar(), and not length(). The latter is typically used to determine the number of entries in a vector.
You could therefore try using:
df[nchar(df$Column1) > 4, "Column1"] <- "replacement value"

Resources