I need to include different numbers (the index of a loop) in the names of data frames that I am reading in inside that loop, but I do not know how.
As an example, these are the data frame names I am looking for:
data.1
data.2
data.3
An example of what I tried (it did not work, but it illustrates my question better) is:
for (a in 1:3) data.(a) <- read.csv2(file.csv, header = TRUE)
Is it possible to include different numbers in the names of data frames? If so, how?
Sorry for the beginner's question, but I have not found an answer anywhere.
Although I agree with joran in this case, sometimes it can be useful to use the assign() function, for instance as follows:
for (i in 1:3) {
  assign(paste0("data.", i), i)
}
This results in the following:
> ls()
[1] "data.1" "data.2" "data.3" "i"
> data.1
[1] 1
> data.2
[1] 2
> data.3
[1] 3
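A commonly recommended alternative to assign() is to keep the data frames in a named list, so they can be processed together later. The following is only a sketch: the file names and the contents of the files are hypothetical, written to a temporary directory so that the example runs on its own.

```r
# Hypothetical setup: write three small semicolon-separated files
files <- file.path(tempdir(), paste0("file", 1:3, ".csv"))
for (f in files) {
  write.csv2(data.frame(x = 1:2, y = c("a", "b")), f, row.names = FALSE)
}

# Read every file into one list element and name the elements data.1, data.2, ...
data_list <- lapply(files, read.csv2, header = TRUE)
names(data_list) <- paste0("data.", seq_along(files))

names(data_list)       # "data.1" "data.2" "data.3"
data_list[["data.2"]]  # the second data frame
```

Each data frame is then available as data_list[["data.1"]] or data_list$data.1, and functions such as lapply() can operate on all of them at once.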
I am a beginner at R coming from Stata, and my first headache is figuring out how to loop over a list of names and apply the same operation to each of them. The names are variables in a data frame. I tried defining a list in this way: mylist<- c("df$name1", "df$name2") and then I tried: for (i in mylist) { i } which I hoped would be equivalent to writing df$name1 and then df$name2, so that R would print the content of the variables name1 and name2 from the data frame df. I tried other commands, like deleting a variable with i=NULL inside the for loop, but that didn't work either. I would greatly appreciate it if someone could tell me what I am doing wrong. I wonder if it has something to do with the way I write the i; maybe R does not interpret it as the elements of my character vector.
For more clarification, here is the code I would use in Stata in this instance. Instead of asking Stata to print the content of a variable, I am asking it for summary statistics of a variable (the number of observations, mean, standard deviation, min and max) using the summarize command. In Stata I don't need to refer to the data frame, as I usually have only one dataset in memory, so I need only write:
// name1 and name2 are the names of the variables
foreach i in name1 name2 {
    summarize `i'
}
So far, I have not managed to do the same thing with a for loop in R, which I naively thought would be:
mylist<-c("df$name1", "df$name2")
for (i in mylist) {
summary(i)
}
You probably just need to print the name to see it. For example, if we have a data frame like this:
df <- data.frame("A" = "a", "B" = "b", "C" = "c")
df
#   A B C
# 1 a b c
names(df)
# [1] "A" "B" "C"
We can operate on the names using a for loop on the names(df) vector (no need to define a special list).
for (name in names(df)) {
  print(name)
  # your code here
}
R is a little more reluctant than Stata to let you use strings/locals as code. You can do it with functions like eval(), but in general that's not the ideal way to do it.
In the case of variable names, though, you're in luck, as you can use a string to pull out a variable from a data.frame with [[]]. For example:
df <- data.frame(a = 1:10,
                 b = 11:20,
                 c = 21:30)
for (i in c('a','b')) {
  print(i)
  print(summary(df[[i]]))
}
Notes:
If you want an object printed from inside a for loop, you need to call print() explicitly.
I'm assuming that you're using the summary() function just as an example and so need the loop. But if you really just want a summary of each variable, summary(df) will do them all, or summary(df[,c('a','b')]) to just do a and b. Or check out the stargazer() function in the stargazer package, which has defaults that will feel pretty comfortable for a Stata user.
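To get something closer to what Stata's summarize reports (observations, mean, standard deviation, min and max) for a chosen set of variables, one option is to apply a small function over the selected columns. This is a sketch only; the names vars and stats are made up for the illustration.

```r
df <- data.frame(a = 1:10,
                 b = 11:20,
                 c = 21:30)

# Character vector of the columns to summarise (plays the role of the Stata varlist)
vars <- c("a", "b")

# sapply() runs the function on each selected column and binds the results
# into a matrix with one column per variable
stats <- sapply(df[vars], function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), min = min(x), max = max(x))
})

stats
stats["mean", "a"]  # 5.5
```

Because the result is returned rather than printed inside a loop, no explicit print() is needed at the top level.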
I want to get the position (index) of a column in a data frame.
df <- data.frame(item = rep(c('a','b','c'), 3),
year = rep(c('2010','2011','2012'), each=3),
count = c(1,4,6,3,8,3,5,7,9))
Let's say the function I am looking for is called columnorder. I want to get this result:
x <- columnorder(df$count)
x
> 3
x <- columnorder(df$item)
x
> 1
It seems like a basic task but I couldn't find the answer until now. I will appreciate your help. Thank you
You said,
It seems like a basic task but I couldn't find the answer until now.
In the general sense, what you are trying to do (translate a column name into a column index) is basic, and a pretty common question. However, the particular scenario you describe above, where your input is of the form object_name$column_name, is atypical with respect to what you are trying to achieve, which is most likely why you haven't found an existing solution.
In short, the problem is that when you pass an argument as df$count, you may as well have passed c(1,4,6,3,8,3,5,7,9) instead, because df$count is evaluated to c(1,4,6,3,8,3,5,7,9) and the column name is lost. Of course, R does allow for a fair bit of metaprogramming, so with a little extra work this could be implemented as, for example:
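column_order <- function(expr) {
  # Split the unevaluated argument (e.g. "df$item") into object and column name
  x <- strsplit(deparse(substitute(expr)), "$", fixed = TRUE)[[1]]
  # Look the object up in the caller's environment and match the column name
  match(x[2], names(get(x[1], envir = parent.frame())))
}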
column_order(df$item)
#[1] 1
column_order(df$year)
#[1] 2
column_order(df$count)
#[1] 3
But as I said above, this is an atypical interface for what you are ultimately trying to do. A much more standard approach would be for this function to accept the column name (typically as a string) and the target object as arguments, in which case the solution is much simpler:
column_order2 <- function(col, obj) match(col, names(obj))
column_order2("item", df)
#[1] 1
column_order2("year", df)
#[1] 2
column_order2("count", df)
#[1] 3
As proposed in the comments by @mtoto, here is one solution:
x <- which(colnames(df) == "count")
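The two lookups shown in these answers, match() and which(), behave differently when a name is absent or when several names are looked up at once. A short sketch using the data frame from the question (the variable names idx, missing_match and missing_which are made up for the illustration):

```r
df <- data.frame(item = rep(c('a','b','c'), 3),
                 year = rep(c('2010','2011','2012'), each = 3),
                 count = c(1,4,6,3,8,3,5,7,9))

# match() is vectorised over the names being looked up and keeps their order
idx <- match(c("count", "item"), names(df))
idx  # 3 1

# A name that is not a column gives NA with match() ...
missing_match <- match("price", names(df))

# ... and an empty integer vector with which()
missing_which <- which(names(df) == "price")
```

Which behaviour is preferable depends on how the calling code wants to detect a missing column: is.na() on the match() result, or length() == 0 on the which() result.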
I need to run through a large data frame and extract a vector with the names of the variables that are of numeric type.
I got stuck in my code; perhaps someone could point me to a solution.
This is how far I have got:
numericVarNames <- function(df) {
  numeric_vars<-c()
  for (i in colnames(df)) {
    if (is.numeric(df[i])) {
      numeric_vars <- c(numeric_vars, colnames(df)[i])
      message(numeric_vars[i])
    }
  }
  return(numeric_vars)
}
To run it:
teste <-numericVarNames(semWellComb)
The is.numeric assertion is not working. There is something wrong with my syntax for catching the type of each column. What is wrong?
Rather than a looping function, how about
df <- data.frame(a = c(1,2,3),
b = c("a","b","c"),
c = c(4,5,6))
## names(df)[sapply(df, class) == "numeric"]
## Updated to be 'safer', as variables can have multiple classes:
names(df)[sapply(df, is.numeric)]
## [1] "a" "c"
This question is worth a read
Without test data it is hard to be sure, but it looks like there is just a "grammar" issue in your code.
You wrote:
numeric_vars <- c(numeric_vars, colnames(df)[i])
Here i is a column name, not a position, so colnames(df)[i] returns NA. To append the column name itself, take the name of the subset df[i] instead:
numeric_vars <- c(numeric_vars, colnames(df[i]))
Try running it with that one change and see what you get.
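Note also that is.numeric(df[i]) tests a one-column data frame, which is never numeric, so the condition in the original loop is always FALSE; df[[i]] extracts the column itself. A minimal corrected sketch of the asker's loop follows. The data frame example_df is hypothetical, standing in for semWellComb, which is not shown in the question.

```r
numericVarNames <- function(df) {
  numeric_vars <- character(0)
  for (i in colnames(df)) {
    # df[[i]] is the column vector; df[i] would be a one-column data frame
    if (is.numeric(df[[i]])) {
      # i already holds the column name, so append it directly
      numeric_vars <- c(numeric_vars, i)
    }
  }
  numeric_vars
}

# Hypothetical test data
example_df <- data.frame(a = c(1, 2, 3),
                         b = c("a", "b", "c"),
                         c = c(4, 5, 6))

numericVarNames(example_df)  # "a" "c"
```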
Does anyone know why the results of the following code differ?
a <- cbind(1:10,1:10)
b <- a
colnames(a) <- c("a","b")
colnames(b) <- c("c","d")
colnames(cbind(a,b))
> [1] "a" "b" "c" "d"
colnames(cbind(ts(a),ts(b)))
> [1] "ts(a).a" "ts(a).b" "ts(b).c" "ts(b).d"
Is this for compatibility reasons? cbind for xts and zoo does not have this feature.
I always accepted this as given, but now my code is littered with the following:
ca<-colnames(a)
cb<-colnames(b)
out <- cbind(a,b)
colnames(out) <- c(ca,cb)
This is just what the cbind.ts method does. You can see the relevant code via stats:::cbind.ts, stats:::.cbind.ts, and stats:::.makeNamesTs.
I can't explain why it was made to be different, since I didn't write it, but here's a work-around.
cbts <- function(...) {
  dots <- list(...)
  ists <- sapply(dots,is.ts)
  if(!all(ists)) stop("argument ", which(!ists), " is not a ts object")
  do.call(cbind,unlist(lapply(dots,as.list),recursive=FALSE))
}
I take it that you're interested in why this happens.
The body of stats:::.cbind.ts, the function that does column binding for time series, shows that naming is performed by .makeNamesTs. A look at stats:::.makeNamesTs reveals that the names are derived directly from the arguments you pass to cbind, and there is no obvious way to influence this. As an example, try:
cbind(ts(a),ts(b, start = 2))
You will find that the start specification of the second time series appears in the name of the respective columns.
As to why that's the way things are ... I can't help you there!
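The manual renaming from the question can be wrapped in a small helper so that it does not have to be repeated at every call site. This is a sketch, not part of the stats package: cbind_keep_names is a made-up name, and it assumes every argument is a multivariate series that already has column names.

```r
# Hypothetical helper: bind time series, then restore the original column names
cbind_keep_names <- function(...) {
  dots <- list(...)
  out <- do.call(cbind, dots)
  # Assumes each argument has column names; a series without them would
  # make the lengths disagree and the assignment fail
  colnames(out) <- unlist(lapply(dots, colnames))
  out
}

a <- cbind(1:10, 1:10)
b <- a
colnames(a) <- c("a", "b")
colnames(b) <- c("c", "d")

out <- cbind_keep_names(ts(a), ts(b))
colnames(out)  # "a" "b" "c" "d"
```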
I am just beginning to learn R and am having an issue that is leaving me fairly confused. My goal is to create an empty vector and append elements to it. Seems easy enough, but the solutions that I have seen on Stack Overflow don't seem to be working.
To wit,
> a <- numeric()
> append(a,1)
[1] 1
> a
numeric(0)
I can't quite figure out what I'm doing wrong. Anyone want to help a newbie?
append does something that is somewhat different from what you are thinking. See ?append.
In particular, note that append does not modify its argument. It returns the result.
You want the function c:
> a <- numeric()
> a <- c(a, 1)
> a
[1] 1
Your vector a is not passed by reference: append() returns a modified copy and leaves a itself untouched, so you have to store the result back into a. You cannot access a and expect it to have been updated.
You just need to assign the return value to your vector, just as Matt did:
> a <- numeric()
> a <- append(a, 1)
> a
[1] 1
Matt is right that c() is preferable (fewer keystrokes and more versatile) though your use of append() is fine.
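As a short illustration of both points, the sketch below grows a vector in a loop with c() and then uses the after argument of append(), which is the one thing append() offers that c() does not. The variable names are made up for the example.

```r
a <- numeric()

# Grow the vector by reassigning the result of c() on every iteration
for (i in 1:3) {
  a <- c(a, i^2)
}
a  # 1 4 9

# append() can insert at a position; the result must still be assigned
b <- append(a, 99, after = 1)
b  # 1 99 4 9
a  # still 1 4 9, because append() did not modify a
```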