Need help understanding the 'rep()' function - r

rep (2,5)
rep
Hello everyone, I am learning 'R' by watching a Udemy tutorial and following along. I recently learned the seq() and rep() functions. However, when I run the code written above I get an additional output: the code returns 2.2.2.2.2 and .Primitive("rep"). I am using Kaggle notebooks. Please help me understand how this function works, what is going wrong here, and what will happen if we provide multiple inputs, such as rep(2,3,4,5) or (1,2,3,4,6,8).

In R, rep is a function. It is designed to replicate its first argument a number of times equal to its second argument. Thus rep(2, 5) returns a vector of length 5 with each element as 2.
In R, functions are also objects, and when you enter a function's name on its own, R prints the function itself, which shows that the input is a function and lists its expected arguments. The .Primitive("rep") part tells you that rep is a primitive function, implemented internally as part of base R.
rep
function (x, ...) .Primitive("rep")
In this case, rep requires at least one argument x, which is the object to be replicated. The ... indicates that it can take a number of other optional arguments. To learn about them, you can access the help file for rep with ?rep.
You can call rep with more arguments, but the behavior might not be what you expect.
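As a rough illustration of the optional arguments documented in ?rep (times, each and length.out), the sketch below names each argument explicitly instead of relying on argument position, which is easier to read. The variable names r1 to r4 are only placeholders for this example.

```r
# Repeat the value 2 five times (same as rep(2, 5))
r1 <- rep(2, times = 5)

# Repeat each element twice before moving on to the next one
r2 <- rep(c("a", "b"), each = 2)

# Keep cycling through 1:3 until the result has 7 elements
r3 <- rep(1:3, length.out = 7)

# A vector for `times` gives one repeat count per element of x
r4 <- rep(c("hi", "hello"), times = c(1, 2))

print(r1)
print(r2)
print(r3)
print(r4)
```

Naming the arguments avoids having to remember which position corresponds to which option.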

By typing rep without any details, you are asking R to show you the internal "guts" of what the function does. You can learn more about it by typing ?rep. The manual is probably a lot for a beginner but if you scroll to the bottom you will see some useful examples.
I hope this helps:
rep("hi", 5) # repeat "hi" five times
rep(c("hi", "hello"), 3) # repeat the whole vector c("hi", "hello") three times
rep(c("hi", "hello"), c(1, 2)) # repeat "hi" once and "hello" twice

Related

summary of row of numbers in R

I just hope to learn how to make a simple statistical summary of the random numbers from rows 1 to 5 in R (as shown in the picture).
And then assign these rows to a single variable.
Hope you can help!
When you type something like 3 on a single line and ask R to "run" it, it doesn't store that anywhere -- it just evaluates it, meaning that it tries to make sense out of whatever you've typed (such as 3, or 2+1, or sqrt(9), all of which would return the same value) and then it more or less evaporates. You can think of your lines 1 through 5 as behaving like you've used a handheld scientific calculator; once you type something like 300 / 100 into such a calculator, it just shows you a 3, and then after you have executed another computation, that 3 is more or less permanently gone.
To do something with your data, you need to do one of two things: either store it in your environment somehow, or "pipe" it directly into a useful function.
In your question, you used this script:
1
3
2
7
6
summary()
I don't think it's possible to repair this strategy in the way that you're hoping -- and if it is possible, it's not quite the "right" approach. By typing the numbers on individual lines, you've structured them so that they'll evaluate individually and then evaporate. In order to run the summary() function on those numbers, you will need to bind them together inside a single vector somehow, then feed that vector into summary(). The "store it" approach would be
my_vector <- c(1, 3, 7, 2, 6)
summary(my_vector)
The important part isn't actually the parentheses; it's the function c(), which stands for concatenate and instructs R to treat those 5 numbers as a collective object (i.e. a vector). We store that single object under the name my_vector and then pass it into summary().
To use the "piping" approach and avoid having to store something in the environment, you can do this instead (requires R 4.1.0+):
c(1, 3, 7, 2, 6) |> summary()
Note again that the use of c() is required, because we need to bind the five numbers together first. If you have an older version of R, you can get a slightly different pipe operator from the magrittr library instead that will work the same way. The point is that this "binding" part of the process is an essential part that can't be skipped.
Now, the crux of your question: presumably, your data doesn't really look like the example you used. Most likely, it's in some separate .csv file or something like that; if not, hopefully it is easy to get it into that format. Assuming this is true, this means that R will actually be able to do the heavy lifting for you in terms of formatting your data.
As a very simple example, let's say I have a plain text file, my_example.txt, whose contents are
1
3
7
2
6
In this case, I can ask R to parse this file for me. Assuming you're using RStudio, the simplest way to do this is to use the File -> Import Dataset part of the GUI. There are various options dealing with things such as headers, separators, and so forth, but I can't say much meaningful about what you'd need to do there without seeing your actual dataset.
When I import that file, I notice that it does two things in my R console:
my_example <- read.table(...)
View(my_example)
The first line stores an object (called a "data frame" in this case) in my environment; the second shows a nice view of how it's rendered. To get the summary I wanted, I just need to extract the vector of numbers I want, which I see from the view is called V1, which I can do with summary(my_example$V1).
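For readers without RStudio, here is a minimal sketch of the same idea in plain code. It assumes the file has one number per line and no header; the temporary file created with tempfile() stands in for my_example.txt and is only there to make the example self-contained.

```r
# Create a small text file with one number per line (stand-in for my_example.txt)
path <- tempfile(fileext = ".txt")
writeLines(c("1", "3", "7", "2", "6"), path)

# Parse the file into a data frame; with no header, the column is named V1
my_example <- read.table(path)

# Inspect the structure, then summarise the single column
str(my_example)
col_summary <- summary(my_example$V1)
print(col_summary)
```

With a real dataset, arguments such as header and sep would likely need adjusting to match the file.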
This example is probably not directly helpful for your actual data set, because there are so many variations on the theme here, but the theme itself is important: point R at a file, ask it to render an object, then work with that object. That's the approach I'd recommend instead of typing data as lines within an R script, as it's much faster and less error-prone.
Hopefully this will get you pointed in the right direction in terms of getting your data into R and working with it.

How can I use a list of integers as the input to a function of one integer and get a list as an output?

Apologies for asking a trivial question, but I'm stumped. Here's the situation:
I have a function of three inputs fishCounter(data, x, y) where data is a matrix and both x and y are integers.
fishCounter is in memory and works completely fine when I call it manually (e.g. fishCounter(matrix(1:4,4,4), 1, 4)). Its output is a single integer.
The relevant data and value of x are in memory. x is simply 3, and we'll call the data trout.
I want R to spit out the list of results for every value of y from 1 to 20. Crudely, what I want is fishCounter(trout, 3, 1:20).
The way that R gives me this data (e.g. array, vector, list, etc) is not of interest, I just want the output however I can get it.
Everything that I've tried to get this has failed. I could of course use a for loop and append this all to a vector, but that seems like far too much effort.
My memory insists that there is a very simple way to get what I'm after. I'm sure that some version of replicate, apply or lapply will do this job.
What I want is a single function that will give me this result. For example, I was surprised when lapply(c(1:19), fishCounter(trout, 3, y) didn't work.
No libraries should be needed and I shouldn't need to code any new functions. My memory insists that I'm either simply forgetting a function that's built into R, have forgotten a term that would've got me the answer instantly from a search engine, or I've completely misunderstood the documentation on the three functions that I've listed earlier.
What have I forgotten?
Maybe you can try lapply like below, i.e.,
lapply(1:20, function(y) fishCounter(trout, 3, y))
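or Vectorize your function fishCounter over y only (by default, Vectorize vectorizes over every argument, so it would also iterate over the individual elements of the matrix), i.e.,
Vectorize(fishCounter, vectorize.args = "y")(trout, 3, 1:20)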
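To make this concrete, here is a self-contained sketch. The real fishCounter and trout are not shown in the question, so the versions below are hypothetical stand-ins: this fishCounter simply counts how many entries in column x of the matrix are less than or equal to y.

```r
# Hypothetical stand-in for the asker's function: returns a single integer
fishCounter <- function(data, x, y) {
  sum(data[, x] <= y)
}

# Hypothetical stand-in for the asker's data
trout <- matrix(1:20, nrow = 5, ncol = 4)

# lapply returns a list with one result per value of y
res_list <- lapply(1:20, function(y) fishCounter(trout, 3, y))

# sapply does the same but simplifies the result to a vector
res_vec <- sapply(1:20, function(y) fishCounter(trout, 3, y))

# Vectorize over y only, so the matrix and x are passed through unchanged
res_vectorized <- Vectorize(fishCounter, vectorize.args = "y")(trout, 3, 1:20)

print(res_vec)
```

The anonymous function is what the failed lapply attempt was missing: lapply needs a function to call, not the result of calling one.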

Summing up list members step by step

I've got a very simple problem, but I was unable to find a simple solution in R because I'm used to solving such problems by iterating through an incrementing for-loop in other languages.
Let's say I've got a random distributed numeric list like:
rand.list <- list(4,3,3,2,5)
I'd like to change this random distributed pattern into a constantly rising pattern so the result would look like:
[4,7,10,12,17]
Try using Reduce with the accumulate argument set to TRUE:
Reduce("+", rand.list, accumulate = TRUE)
I hope this helps.
My first thought was cumsum(unlist(rand.list)), where unlist collapses the list into a plain vector. However, a lucky try shows that cumsum(rand.list) also works.
It is not entirely clear to me how this works, as cumsum is a primitive function implemented in C, which is not easy to investigate further from within R. Presumably the list is coerced to a numeric vector, which is only possible when every element has length one. A complementary experiment supports this:
x <- list(1:2,3:4,5:6)
cumsum(x) ## does not work
x <- list(c(1,2), c(3,4), c(5,6))
cumsum(x) ## does not work
In this case, we have to do cumsum(unlist(x)).
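Putting the approaches from this thread side by side, as a small sketch (the variable names other than rand.list are made up for the example):

```r
rand.list <- list(4, 3, 3, 2, 5)

# Running total via Reduce; accumulate = TRUE keeps every intermediate sum
via_reduce <- Reduce(`+`, rand.list, accumulate = TRUE)

# Running total via cumsum after flattening the list to a plain vector
via_cumsum <- cumsum(unlist(rand.list))

# For a list whose elements hold several numbers, flatten first
nested <- list(c(1, 2), c(3, 4), c(5, 6))
nested_cumsum <- cumsum(unlist(nested))

print(via_reduce)
print(via_cumsum)
print(nested_cumsum)
```

Flattening with unlist first is the safer habit, since it behaves the same whether or not every list element has length one.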

What does the "by" argument in ffbase::as.character do?

In the post below,
aggregation using ffdfdply function in R
There is a line like this.
splitby <- as.character(data$Date, by = 250000)
Just out of curiosity, I wonder what by argument means. It seems to be related to ff dataframe but I'm not sure. Google search and R documentation of as.character and as.vector provided no useful information.
I tried some examples, but the code below gives the same results.
d <- seq.Date(Sys.Date(), Sys.Date()+10000, by = "day")
as.character(d, by=1)
as.character(d, by=10)
as.character(d, by=100)
If anybody could tell me what it is, I'd appreciate it. Thank you in advance.
Since as.character.ff works using the default as.character internally, and in view of the fact that ff vectors can be larger than RAM, the data needs to be processed in chunks. The partition into chunks is facilitated by the chunk function. In this case, the relevant method is chunk.ff_vector. By default, this will calculate the chunk size by dividing getOption("ffbatchbytes") by the record size. However, this behaviour can be overridden by supplying the chunk size using by.
In the example you give, the ff vector will be converted to character 250000 members at a time.
The end result will be the same for any by or without by at all. Larger values will lead to greater temporary use of RAM but potentially quicker operation.
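The idea can be sketched in base R without the ff package. The helper as_character_chunked below is hypothetical and is not how ffbase implements it; it only illustrates why the chunk size changes memory use and speed but not the result.

```r
# Hypothetical helper: convert x to character `by` elements at a time
as_character_chunked <- function(x, by) {
  if (length(x) == 0) return(character(0))
  out <- character(length(x))
  # Start index of each chunk: 1, 1 + by, 1 + 2 * by, ...
  for (start in seq(1, length(x), by = by)) {
    idx <- start:min(start + by - 1, length(x))
    out[idx] <- as.character(x[idx])
  }
  out
}

d <- seq(as.Date("2020-01-01"), by = "day", length.out = 1000)

small_chunks <- as_character_chunked(d, by = 10)
large_chunks <- as_character_chunked(d, by = 250)

# Both agree with converting everything in one go
print(identical(small_chunks, large_chunks))
print(identical(small_chunks, as.character(d)))
```

Only one chunk needs to be held in converted form at any time, which is the point when the full vector does not fit in RAM.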
First, that function is ffbase::as.character, not plain old base::as.character.
See http://www.inside-r.org/packages/cran/ffbase/docs/as.character.ff
which says
as.character(x, ...)
Arguments:
x: a ff vector
...: other parameters passed on to chunk
So the by argument is being passed through to some chunk function.
Then you need to figure out which package's chunk function is being used. Type ?chunk, tell us which one, then go read its doc to see what its by argument does.

(R) Burrow into function to change it with Trace()

I've built a large function which calls numerous gbm functions in a big loop. All I'm trying to do is increase the thickness of the tickmarks in rug() which is called by gbm.plot.
I was hoping to use (e.g.)
body(gbm.plot)[[24]][[4]][[3]][[3]][[3]][[3]][[2]]$ylab <- "change value"
From this page's examples, which I've used successfully elsewhere, but the section in question in gbm.plot is in an IF statement, so as.list doesn't nicely recurse the lines (because arguably it's all one huge long line). You can get to them by just manually [[trying]][[successive]][[combinations]] until you get to the right place, but since I'm trying to insert a piece of code , lwd=6 into a bracketed statement, rather than assigning a value to a named subobject, I'm not sure how to get trace to do this.
?trace says:
When the at argument is supplied, it can be a vector of integers referring to the substeps of the body of the function (this only works if the body of the function is enclosed in { ...}). In this case tracer is not called on entry, but instead just before evaluating each of the steps listed in at. (Hint: you don't want to try to count the steps in the printed version of a function; instead, look at as.list(body(f)) to get the numbers associated with the steps in function f.)
The at argument can also be a list of integer vectors. In this case, each vector refers to a step nested within another step of the function. For example, at = list(c(3,4)) will call the tracer just before the fourth step of the third step of the function.
So I tried pasting the whole line with the lwd bit added in, hoping that it would overwrite it with the small addition:
trace (gbm.plot, quote(rug(quantile(data[, gbm.call$gbm.x[variable.no]], probs = seq(0, 1, 0.1), na.rm = TRUE),lwd=6)), at=c(22,4,7,3,3,3,2))
...as well as putting objects in & out of {brackets}, all to no avail. Does anyone know either the correct way of using trace for this, or can suggest a better way? Thanks
p.s. it needs to be done automatically with coding so users can load the function which will load the vanilla gbm functions from CRAN and then make tweaks as required.
EDIT: found a workaround. But a generalisable question remains: how can one insert elements into a part of a function that sits inside an if statement? e.g. From
rug(quantile(data[, gbm.call$gbm.x[variable.no]], probs=seq(0, 1, 0.1), na.rm=TRUE))
to
rug(quantile(data[, gbm.call$gbm.x[variable.no]], probs=seq(0, 1, 0.1), na.rm=TRUE),lwd=6)
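One possible way to approach the general question is to edit the call object inside body() directly, since a call accepts new named arguments through $<- just as it accepts changes to existing ones. The sketch below uses a toy function f instead of gbm.plot, so the index path [[2]][[3]][[2]] is specific to this toy and would have to be found by inspection for the real function.

```r
# Toy function: the call we want to change sits inside an if block
f <- function(x, do_round = TRUE) {
  if (do_round) {
    round(x)
  }
}

g <- f
b <- body(g)

# b[[2]] is the if statement, [[3]] its `{` block, [[2]] the first call in it
inner <- b[[2]][[3]][[2]]
print(inner)

# Add a new named argument to the call, then write everything back
inner$digits <- 1
b[[2]][[3]][[2]] <- inner
body(g) <- b

print(f(3.14159))
print(g(3.14159))
```

Because this modifies a copy, the original function stays untouched, which may suit the requirement of loading the vanilla function and then tweaking it in code.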
