How to avoid the NULL output of str() - r

I realized that str() returns NULL to the assigned object (if assigned), and after reading a bit I noticed that this is because str() uses the invisible() function under the hood. Is there any argument to str() that disables that, so it can actually return the structure of the object?

str() is called for its side effect of printing to the console, not for its return value. That said, if you want to capture that text and store it in an object rather than having it printed to the console, you can do so using the function capture.output(). Here's an example:
x <- capture.output(str(mtcars))
x[1:4]
# [1] "'data.frame':\t32 obs. of 11 variables:"
# [2] " $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ..."
# [3] " $ cyl : num 6 6 4 6 8 6 8 4 4 6 ..."
# [4] " $ disp: num 160 160 108 258 360 ..."
cat(x[1:4], sep="\n")
# 'data.frame': 32 obs. of 11 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
# $ disp: num 160 160 108 258 360 ...
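As a minimal sketch of the same idea, the capture can be wrapped in a small helper that returns the structure as a single string. The name str_text is a hypothetical helper for illustration, not part of base R:

```r
# str() prints its summary and returns NULL invisibly
res <- str(1:3)
is.null(res)

# Hypothetical helper: capture what str() prints and return it as one string
str_text <- function(x, ...) {
  paste(capture.output(str(x, ...)), collapse = "\n")
}

txt <- str_text(mtcars)
cat(txt, "\n")
```

Extra arguments such as max.level are passed through to str(), so the helper should behave like str() apart from returning text.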

Related

Is there a way to split data in r by column then run the same set of commands for each data set

I have a table of data made in Excel that I converted to a txt file.
The command I'm using will only run if I have exactly two columns. I transposed my data into columns, but now I need to somehow split it all up so that each of columns 2 to 189 becomes a different table, with column 1 staying the same in all of them.
Is it possible to then run the exact same set of commands over and over again for the 188 tables created and save the resulting data into a separate file (or, better yet, substitute some of the obtained values into an equation)?
Sorry if the question is too long or ridiculously easy. I'm a complete newbie to anything beyond basic analysis.
Happy to try and learn other programs if it will solve my problem.
You can do the following in base R (I use the built-in mtcars data.frame as an example):
df <- mtcars
lst <- apply(rbind(1, 2:ncol(df)), 2, function(idx) df[, idx])
This returns a list of data.frames with columns (1,2), (1,3), (1,4) and so on, of the original data.frame.
str(lst)
#List of 10
# $ :'data.frame': 32 obs. of 2 variables:
# ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# $ :'data.frame': 32 obs. of 2 variables:
# ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# ..$ disp: num [1:32] 160 160 108 258 360 ...
# $ :'data.frame': 32 obs. of 2 variables:
# ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# $ :'data.frame': 32 obs. of 2 variables:
# ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# $ :'data.frame': 32 obs. of 2 variables:
# ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# $ :'data.frame': 32 obs. of 2 variables:
# ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# $ :'data.frame': 32 obs. of 2 variables:
# ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# $ :'data.frame': 32 obs. of 2 variables:
# ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# $ :'data.frame': 32 obs. of 2 variables:
# ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# $ :'data.frame': 32 obs. of 2 variables:
# ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
It's now easy to operate on the list of data.frames using a function from the *apply family.
To generate the basic combinations, you could use Map:
Map(cbind, df[1], df[-1])
To apply a function to each combination you would need to edit the function a bit:
Map(function(a,b) fun(cbind(a,b)), df[1], df[-1])
Or add another level of looping with lapply if you want to keep the code compact.
lapply(Map(cbind, df[1], df[-1]), fun)
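A rough sketch of the whole workflow from the question (split, run the same commands on each table, save the results) might look like the following. The per-table analysis here is only a stand-in: slope_of is a hypothetical helper that fits a simple linear model, and the output goes to a temporary directory rather than a real project folder.

```r
df <- mtcars

# Pair column 1 with each remaining column, keeping each pair as a data.frame
pairs <- lapply(names(df)[-1], function(nm) df[, c(names(df)[1], nm)])
names(pairs) <- names(df)[-1]

# Hypothetical stand-in for "the same set of commands":
# regress the second column on the first and return the slope
slope_of <- function(d) unname(coef(lm(d[[2]] ~ d[[1]]))[2])

# One numeric result per two-column table
slopes <- vapply(pairs, slope_of, numeric(1))

# Save each two-column table to its own file in a temporary directory
out_dir <- tempdir()
for (nm in names(pairs)) {
  write.csv(pairs[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
```

Because slopes is a named numeric vector, its values can be substituted directly into a later equation, for example slopes * 2.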

writing multiple variables in a short way in r

I have a list of data.frames for the years 2005 to 2016. They are all named the same way, except for the digits of the year:
m =list(X2016_kvish_1_10t = X2016_kvish_1_10t, X2015_kvish_1_10t = X2015_kvish_1_10t, X2014_kvish_1_10t = X2014_kvish_1_10t,
X2013_kvish_1_10t = X2013_kvish_1_10t, X2012_kvish_1_10t = X2012_kvish_1_10t, X2011_kvish_1_10t = X2011_kvish_1_10t,
X2010_kvish_1_10t = X2010_kvish_1_10t, X2009_kvish_1_10t = X2009_kvish_1_10t, X2008_kvish_1_10t = X2008_kvish_1_10t,
X2007_kvish_1_10t = X2007_kvish_1_10t, X2006_kvish_1_10t = X2006_kvish_1_10t, X2005_kvish_1_10t = X2005_kvish_1_10t)
Is there a shorter way to write it, without needing to write all of them out separately?
Try mget:
df_names = paste0("X", 2005:2016, "_kvish_1_10t")
m = mget(df_names)
EDIT
As @d.b points out, you don't even need to create df_names
m = mget(ls(pattern="_kvish_1_10t$"))
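As a small self-contained sketch of the pattern-based variant, the example below creates dummy objects in a separate environment (env is an assumption made so the example does not touch the global workspace) and then shortens the list names to the year:

```r
# Dummy data: one small data.frame per year, stored in a scratch environment
env <- new.env()
for (yr in 2005:2016) {
  assign(paste0("X", yr, "_kvish_1_10t"), mtcars[1:2], envir = env)
}

# Collect every object whose name matches the naming scheme
m <- mget(ls(env, pattern = "^X[0-9]{4}_kvish_1_10t$"), envir = env)

# Optionally shorten the list names to just the year
names(m) <- sub("^X([0-9]{4})_.*$", "\\1", names(m))
names(m)
```

Anchoring the pattern with ^ and $ keeps unrelated objects with similar names out of the list.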
You can use the mget function, providing a character vector of the names of the objects in your workspace.
I made a reproducible example to show how to do it.
df_name <- paste0("x", 2005:2016, "_kvish_1_10t")
df_name
#> [1] "x2005_kvish_1_10t" "x2006_kvish_1_10t" "x2007_kvish_1_10t"
#> [4] "x2008_kvish_1_10t" "x2009_kvish_1_10t" "x2010_kvish_1_10t"
#> [7] "x2011_kvish_1_10t" "x2012_kvish_1_10t" "x2013_kvish_1_10t"
#> [10] "x2014_kvish_1_10t" "x2015_kvish_1_10t" "x2016_kvish_1_10t"
# just create some dummy tables for the example
l <- lapply(df_name, assign, value = mtcars[1:2], envir= .GlobalEnv)
# Use mget to get a list of all the objects
m <- mget(df_name, envir = .GlobalEnv)
str(m)
#> List of 12
#> $ x2005_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2006_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2007_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2008_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2009_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2010_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2011_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2012_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2013_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2014_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2015_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ x2016_kvish_1_10t:'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...

assigning attributes to data frame variables in place using *apply functions

I would like to set the attribute "mean" to the variable mean for all variables in a data frame (I'm actually working with applying proper attributes from a Stata file to data frames, but this is essentially the problem). Using a for loop, this works:
test <- mtcars
for(var in seq_along(test)) {
attr(test[[var]], "name") <- mean(test[[var]])
}
str(test)
'data.frame': 32 obs. of 11 variables:
$ mpg : atomic 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..- attr(*, "name")= num 20.1
$ cyl : atomic 6 6 4 6 8 6 8 4 4 6 ...
..- attr(*, "name")= num 6.19
However, my best attempt using apply and the super assignment operator does not work:
test <- mtcars
apply(test, 2, function(var) {
attr(var, "name") <<- mean(var)
})
mpg cyl disp hp drat
20.090625 6.187500 230.721875 146.687500 3.596563...
str(test)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
I clearly don't fully understand apply and the super assignment. Can I do this using one of the apply functions and/or dplyr?
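One possible approach, sketched here under the assumption that every column is numeric as in mtcars: apply() coerces the data frame to a matrix and hands the function a copy of each column, so changes made inside the function never reach test. Using lapply() and assigning the result back with test[] <- keeps the data.frame structure and the modified columns:

```r
test <- mtcars

# lapply() returns a list of modified columns;
# test[] <- ... puts them back while keeping test a data.frame
test[] <- lapply(test, function(x) {
  attr(x, "name") <- mean(x)
  x  # return the modified column, not just the attribute value
})

attr(test$mpg, "name")
```

The key difference from the apply() attempt is that the function returns the column itself and the result is assigned back explicitly, so no super assignment is needed.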

Dynamically redefine a function like print.data.frame

Is there a way to redefine a locked function?
What would be the best way to dynamically redefine such a globally available function while evaluating some code?
Example: I have the following code:
print(cars[1:5, ])
This usually calls print.data.frame but for whatever reasons I want it to call my.fancy.print.data.frame() instead. What would be the best way to achieve this?
In the end, I would like to have something like this:
evalWithEnvir(print(cars[1:5, ]), envir = list(print.data.frame = my.fancy.print.data.frame))
EDIT:
The question was badly asked. The problem was that I used <<- to redefine the function. This tried to set the function in the wrong environment. As @hrbrmstr pointed out below, the function can be easily redefined in the global environment.
print.data.frame is not 'locked' (or hidden). It appears among methods("print"), where the non-visible methods are also given.
If you prefer not to define a special class, you can overwrite base::print.data.frame in a defined environment and reference this in your code e.g.
e1 <- new.env(parent=.GlobalEnv)
assign("print.data.frame",
function(x) print((unclass(x))),
envir=e1)
with(e1, print(cars[1:5, ]))
giving:
$speed
[1] 4 4 7 7 8
$dist
[1] 2 10 4 22 16
attr(,"row.names")
[1] 1 2 3 4 5
and your other code should run as normal inside e1.
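The same environment trick can be wrapped in a helper that resembles the evalWithEnvir() call from the question. This is only a sketch: eval_with_method and my_print are hypothetical names, and it relies on S3 dispatch looking for methods in the environment from which the generic is called before consulting the registered methods.

```r
# Hypothetical helper: evaluate expr with a temporary print.data.frame method
eval_with_method <- function(expr, method) {
  env <- new.env(parent = parent.frame())
  assign("print.data.frame", method, envir = env)
  eval(substitute(expr), env)
}

# Hypothetical replacement method, for illustration only
my_print <- function(x, ...) {
  cat("data.frame with", nrow(x), "rows\n")
  invisible(x)
}

out <- capture.output(eval_with_method(print(cars[1:5, ]), my_print))
out
```

Only calls to print() made directly inside the evaluated expression pick up the override; code elsewhere keeps using the regular method.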
You can redefine the functionality of print.data.frame in your environment with:
print.data.frame <- function(x, ..., digits = NULL,
quote = FALSE, right = TRUE, row.names = TRUE) {
print("WOO HOO")
}
Now, that's useless, since it will just print WOO HOO rather than do something meaningful, but it should help you get started.
SabDeM's idea is a better one:
class(mtcars) <- c("myclass", class(mtcars))
print.myclass <- function(x) {
print(ls.str(x))
}
print(mtcars)
## am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
## carb : num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
## cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
## disp : num [1:32] 160 160 108 258 360 ...
## drat : num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## gear : num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
## hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
## mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## qsec : num [1:32] 16.5 17 18.6 19.4 17 ...
## vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
## wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...

Split a dataframe into a list of dataframes, but how to re-merge?

I have a big ol' data frame with two ID columns for courses and users, and I needed to split it into one dataframe per course to do some further analysis/subsetting. After eliminating quite a few rows from each of the individual course dataframes, I'll need to stick them back together.
I split it up using, you guessed it, split, and that worked exactly as I needed it to. However, unsplitting was harder than I thought. The R documentation says that "unsplit reverses the effect of split," but my reading on the web so far is suggesting that that is not the case when the elements of the split-out list are themselves dataframes.
What can I do to rejoin my modified dfs?
This is a place for do.call. Simply calling df <- rbind(split.df) will result in a weird and useless list object, but do.call("rbind", split.df) should give you the result you're looking for.
unsplit() does work in the general situation that you describe, but not in the particular situation where rows have been removed from the split data frames.
Consider
> spl <- split(mtcars, mtcars$cyl)
> str(spl, max = 1)
List of 3
$ 4:'data.frame': 11 obs. of 11 variables:
$ 6:'data.frame': 7 obs. of 11 variables:
$ 8:'data.frame': 14 obs. of 11 variables:
> str(unsplit(spl, f = mtcars$cyl))
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
As we can see, unsplit() can undo a split. However, in the case where the split data frame is further worked upon and altered to remove rows, there will be a mismatch between the total number of rows in the data frames in the split list and the variable used to split the original data frame.
If you know or can compute the changes required to the variable used to split the original data frame, then unsplit() can be deployed, though it is more than likely that this will not be trivial.
The general solution, as @Andrew Sannier mentions, is the do.call(rbind, ...) idiom:
> spl <- split(mtcars, mtcars$cyl)
> str(do.call(rbind, spl))
'data.frame': 32 obs. of 11 variables:
$ mpg : num 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26 30.4 ...
$ cyl : num 4 4 4 4 4 4 4 4 4 4 ...
$ disp: num 108 146.7 140.8 78.7 75.7 ...
$ hp : num 93 62 95 66 52 65 97 66 91 113 ...
$ drat: num 3.85 3.69 3.92 4.08 4.93 4.22 3.7 4.08 4.43 3.77 ...
$ wt : num 2.32 3.19 3.15 2.2 1.61 ...
$ qsec: num 18.6 20 22.9 19.5 18.5 ...
$ vs : num 1 1 1 1 1 1 1 1 0 1 ...
$ am : num 1 0 0 1 1 1 0 1 1 1 ...
$ gear: num 4 4 4 4 4 4 3 4 5 5 ...
$ carb: num 1 2 2 1 2 1 1 1 2 2 ...
Outside of base R, also consider:
data.table::rbindlist() with the side effect of the result being a data.table
dplyr::bind_rows() which despite its somewhat confusing name will bind rows across lists
The answer by Andrew Sannier works but has the side-effect that the rownames get changed. rbind adds the list names to them, so e.g. "Datsun 710" becomes "4.Datsun 710". One can use unname in between to avoid this problem.
Complete example:
mtcars_reorder = mtcars[order(mtcars$cyl), ] #reorder based on cyl first
l1 = split(mtcars_reorder, mtcars_reorder$cyl) #split by cyl
l1 = unname(l1) #remove list names
l2 = do.call(what = "rbind", l1) #unsplit
all(l2 == mtcars_reorder) #check if matches
#> TRUE
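To tie this back to the original question, here is a short sketch of the split, filter, re-merge cycle. The filtering rule (keep rows with mpg above the group median) is purely an assumed example of "eliminating rows" from each piece:

```r
spl <- split(mtcars, mtcars$cyl)

# Hypothetical per-group filtering: keep rows with mpg above the group median
trimmed <- lapply(spl, function(d) d[d$mpg > median(d$mpg), , drop = FALSE])

# Re-merge; unname() stops rbind from prefixing row names with the group label
merged <- do.call(rbind, unname(trimmed))

nrow(merged) == sum(vapply(trimmed, nrow, integer(1)))
```

Since do.call(rbind, ...) does not depend on the original splitting variable, it works no matter how many rows each piece has lost.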
