using a variable to specify a column in a dataframe [duplicate] - r

This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 2 years ago.
I am trying to access a column in a data frame using a variable; this variable will be populated
in a loop.
atr<-"yield_kgha"
What I want is for $atr in the second line below to act as if it were $yield_kgha.
I tried $get(atr) with no luck. How do I get the value of atr to be used as the column name?
meanis=MEAN = mean(zones[[zonename]]$yield_kgha , na.rm = TRUE) #get the mean yield_kgha in the zone
meanis=MEAN = mean(zones[[zonename]]$atr , na.rm = TRUE) #get the mean yield_kgha in the zone

To use the value stored in an object as the column name, use [[ instead of $, just as is already done with zonename:
mean(zones[[zonename]][[atr]], na.rm = TRUE)
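A minimal sketch of the loop the question describes, assuming a hypothetical `zones` list that holds one data frame per zone. The zone names, the `moisture` column and all values are made up for illustration.

```r
# Hypothetical data: a named list of data frames, one per zone
zones <- list(
  north = data.frame(yield_kgha = c(10, 20, NA), moisture = c(1, 2, 3)),
  south = data.frame(yield_kgha = c(30, 50, 40), moisture = c(4, 5, 6))
)

zonename <- "north"

# The column name changes on each pass, so index with [[ ]] rather than $
for (atr in c("yield_kgha", "moisture")) {
  zone_mean <- mean(zones[[zonename]][[atr]], na.rm = TRUE)
  cat(atr, "mean in", zonename, "=", zone_mean, "\n")
}
```

With a plain data frame, `zones[[zonename]]$atr` would return NULL here, because no column is literally named "atr".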

Related

Dynamically turn a data frame column into a variable of the same name [duplicate]

This question already has answers here:
Return elements of list as independent objects in global environment
(4 answers)
Closed 1 year ago.
Let's say that I have a data frame that looks like the following.
dt = data.frame(a = rnorm(5), b = rnorm(5))
I would like to assign a set of column variables as a vector.
colvec <- c("a","b")
So for these two columns in the vector, I'd like to dynamically take that column and create a vector of the same name.
So the normal way to do this would be to do...
a = dt$a
b = dt$b
But I want to do this dynamically. Any suggestions?
Not sure if this is a good idea, but you can do:
sapply(colvec, function(x) assign(x, dt[, x], envir = .GlobalEnv))
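An alternative sketch that avoids the explicit `assign` call: since a data frame is a list of columns, base R's `list2env` can copy the selected columns into an environment in one step. The extra column `z` and the seed are assumptions added only to make the example self-contained.

```r
set.seed(1)  # only to make the example reproducible
dt <- data.frame(a = rnorm(5), b = rnorm(5), z = rnorm(5))
colvec <- c("a", "b")

# Copy only the columns named in colvec into the global environment;
# invisible() suppresses printing of the returned environment
invisible(list2env(as.list(dt[colvec]), envir = .GlobalEnv))

identical(a, dt$a)
identical(b, dt$b)
```

As with the `assign` approach, this silently overwrites any existing global objects with the same names, so it is worth checking for clashes first.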

Insert or print a column name inside a data table call [duplicate]

This question already has answers here:
Converting multiple data.table columns to factors in R
(2 answers)
Closed 2 years ago.
I have what seems like a rather simple problem, but I cannot solve it myself.
Can I somehow insert or print a column name within a data table call? I have something like this in mind:
col_names = c("column1","column2")
for (col in col_names){
datatable$col ...
}
or
col_names = c("column1","column2")
for (col in col_names){
datatable[,col] ...
}
What I eventually would like to do is transform the variables of certain columns into ordered factors. Since there are many columns, I'm looking for a neater way as an alternative of just coding the same line 20 times with the only difference being the column name.
Are you trying to print just the column name or the entire column within the datatable?
You could try something like this
col_names = c("column1","column2")
for (i in seq_along(col_names)){
print(datatable[[col_names[[i]]]]) # [[ ]] extracts the column from a data.frame or a data.table
}
Or if you just want the names printed.
col_names = c("column1","column2")
for (i in seq_along(col_names)){
print(col_names[[i]])
}
Also, you might want to check out the iteration chapter in R for Data Science.
Perhaps you can try lapply with .SDcols to apply a function over col_names. You can try something like this:
library(data.table)
datatable[, (col_names) := lapply(.SD, function(x) factor(x, ordered = TRUE)),
.SDcols = col_names]
Here we apply factor(x, ordered = TRUE) to each column named in col_names, where x is the column itself (a vector of values), not its name.
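For comparison, a sketch of the same idea in base R on a plain data frame, in case data.table is not in use. The data frame `df` and its values are hypothetical stand-ins for `datatable`.

```r
# Hypothetical data frame standing in for `datatable`
df <- data.frame(
  column1 = c("low", "high", "mid"),
  column2 = c("b", "a", "c"),
  other   = 1:3,
  stringsAsFactors = FALSE
)
col_names <- c("column1", "column2")

# Replace only the selected columns; each x is a whole column vector
df[col_names] <- lapply(df[col_names], function(x) factor(x, ordered = TRUE))

str(df)
```

By default the levels are sorted alphabetically, so for a meaningful ordering you would probably want to pass an explicit `levels` argument to `factor`.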

Apply a function to each column in a dataframe in R [duplicate]

This question already has answers here:
Replace all occurrences of a string in a data frame
(7 answers)
Closed 2 years ago.
I would like to replace a series of "99"s in my dataframe with NA. To do this for one column I am using the following line of code, which works just fine.
data$column[data$column == "99"] = NA
However, as I have a large number of columns I want to apply this to all columns. The following line of code isn't doing it. I assume it is because the third "x" is again a reference to the dataframe and not to a specific column.
data = lapply(data, function(x) {x[x == "99"] = NA})
Any advice on what I should change?
If you want to replace all 99, simply do
data[data=="99"] <- NA
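If you want to stick with the apply function, note that it returns a matrix rather than a data frame, so assign the result and convert it back:
data <- as.data.frame(apply(data, 2, function(x) replace(x, x == "99", NA)))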
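The lapply attempt in the question fails for two reasons: the anonymous function returns the value of the assignment (NA) instead of the modified column, and lapply returns a plain list. A minimal sketch of a corrected version, using made-up data:

```r
data <- data.frame(x = c("1", "99", "3"), y = c("99", "5", "99"),
                   stringsAsFactors = FALSE)

# Return the modified column explicitly; assigning into data[] keeps
# the data frame structure instead of replacing it with a list
data[] <- lapply(data, function(x) {
  x[x == "99"] <- NA
  x
})

data
```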

[[ ]] vs $: why doesn't the latter work on this Datacamp code? [duplicate]

This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 4 years ago.
I am working through Datacamp's Intro to R course, but I do not understand why this code works:
# Define columns
columns <- c("trip_distance", "total_amount", "passenger_count")
# Create summary function
taxis_summary <- function(col, data = taxis) {
c(
mean = mean(data[[col]]),
sd = sd(data[[col]]),
quantile(data[[col]], c(0.25, 0.5, 0.75))
)
}
# Use sapply to summarize columns
sapply(columns, taxis_summary)
but this code throws a:
Unknown or uninitalised column: 'col'. Argument is not numeric or
logical: returning NA
# Define columns
columns <- c("trip_distance", "total_amount", "passenger_count")
# Create summary function
taxis_summary <- function(col, data = taxis) {
c(
mean = mean(data$col),
sd = sd(data$col),
quantile(data$col, c(0.25, 0.5, 0.75))
)
}
# Use sapply to summarize columns
sapply(columns, taxis_summary)
There are several ways to access columns in a data frame, and the issue here is how R looks up the name you give it.
One way is what Datacamp showed, data[[col]]; another is the $ accessor, as in data$col. The $ accessor does not evaluate its right-hand side: it looks for a column literally called "col", and the message reports that no such column exists. data[[col]], by contrast, evaluates col first, so it finds "trip_distance", "total_amount", and "passenger_count" in turn.
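The difference can be seen directly on a small example. The `taxis` data below is a hypothetical stand-in with made-up values. Note that a plain data frame returns NULL silently for a missing column, whereas the warning quoted in the question appears to come from a tibble.

```r
# Hypothetical stand-in for the `taxis` data
taxis <- data.frame(
  trip_distance   = c(1.2, 3.4, 0.8, 5.0),
  total_amount    = c(7.5, 15.0, 6.0, 22.5),
  passenger_count = c(1, 2, 1, 3)
)

col <- "trip_distance"

# $ looks for a column literally named "col" and finds none
is.null(taxis$col)

# [[ ]] evaluates `col` first, so it looks up "trip_distance"
mean(taxis[[col]])
```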

R sum multi columns in df technique [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 5 years ago.
I am trying to get the sum of each of the Flags 1-3 in my data frame and keep the same column names, so that I get a single row. It looks like I am missing some data frame/numeric conversion here. Can you please advise? I am not sure why I get dim(dfs) = NULL.
df <- data.frame(label=2017, F1=1:4, F2=2:5, F3=3:6)
df
dfs <- c( max(df$label), sum(df$F1), sum(df$F2), sum(df$F3))
#dfs <- data.frame(c( max(df$label), sum(df$F1), sum(df$F2), sum(df$F3)) )
dfs
str(dfs)
dim(dfs)
colnames(dfs) <-c('Label', 'F1','F2','F3')
## Error in `colnames<-`(`*tmp*`, value = c("Label", "F1", "F2", "F3")) :
## attempt to set 'colnames' on an object with less than two dimensions
Your c() creates a vector, not a data frame. If you convert your vector to a one-row data frame with as.data.frame(t(dfs)), you'll be able to set the column names.
You might also be interested in colSums(), or maybe even the How to sum variables by group? R-FAQ.
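A short sketch of the colSums() route, reusing the data frame from the question. It builds the one-row data frame directly, so the column names carry over and no separate colnames() call is needed.

```r
df <- data.frame(label = 2017, F1 = 1:4, F2 = 2:5, F3 = 3:6)

# colSums() returns a named vector; t() turns it into a one-row matrix,
# which data.frame() expands into one column per flag
dfs <- data.frame(Label = max(df$label), t(colSums(df[c("F1", "F2", "F3")])))

dfs
dim(dfs)
```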
