I am trying to select a column from a dataframe using a variable that holds the column name, with the problem that the column name is non-syntactic and has to be escaped with backticks. I have a couple of workarounds, but they involve changing my code a bit too much, so I am curious whether anybody knows the solution for this kind of weird case.
My dataset is actually a list of time series (which I construct after some operations); this is a toy example:
df <- list(`01/19/17`=seq(1,10), `01/20/17`=seq(2,11))
> df
$`01/19/17`
[1] 1 2 3 4 5 6 7 8 9 10
$`01/20/17`
[1] 2 3 4 5 6 7 8 9 10 11
I don't put the escapes ` in the column names because I want to, but because they come as dates from the process I follow to construct the dataset.
If I know the column name I can access like this,
df$`01/19/17`
If I want to use a variable, looking around e.g. here I see I could rewrite it to something like this,
`$`(df, `01/19/17`)
But I cannot assign a variable like this,
> name1 <- `01/19/17`
Error: object '01/19/17' not found
and if I assign it this other way I get a NULL,
> name1 <- "01/19/17"
> `$`(df, name1)
NULL
As I say there are workarounds like e.g. changing all the column names in the list of series, but I just would like to know. Thank you so much.
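You can access the element with double brackets rather than with $. Unlike $, which takes the name literally, [[ evaluates its argument, so it works when the key is a string held in a variable: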
df <- list(`01/19/17`=seq(1,10), `01/20/17`=seq(2,11))
name1 <- "01/19/17"
df[[name1]]
# [1] 1 2 3 4 5 6 7 8 9 10
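As a small sketch of why this matters in practice, the same [[ lookup can be used to loop over every series by name. The object names first and totals are only illustrative, and the toy list is the one from the question.

```r
# Toy list from the question; the names are non-syntactic, hence the backticks
df <- list(`01/19/17` = seq(1, 10), `01/20/17` = seq(2, 11))

name1 <- "01/19/17"

# `$` does not evaluate its argument: it looks for an element literally called "name1"
is.null(df$name1)

# `[[` evaluates its argument, so the string stored in name1 is used as the key
first <- df[[name1]]

# The same lookup works inside a loop over all the names
totals <- sapply(names(df), function(nm) sum(df[[nm]]))
```

No renaming of the list elements is needed for this to work.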
I have 20 Excel files containing city-level data for each year. I imported them into a list because I thought it would be easier to loop over them.
The first task that I wanted to do is to change the name of the second column of each file.
If, for a single file I do:
#data is a list of data tables/frames. Example:
data<-list(a = data.frame(1:2,3:4),b = data.frame(5:8,15:18) )
#renaming second column of a (works)
names(data[[1]])[2]<-"ABC"
I am able to rename the column.
To do batch editing I wanted to write a function to be used in lapply. The function should be a simple version of the above thing:
rename <-function(df){
names(df)[2]<-"XYZ"}
rename(data[[1]]), however, does nothing to the second column. Any ideas why?
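You need to return the full modified object at each iteration. Your function ends with an assignment, so it modifies only a local copy of df and returns just the assigned value: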
data <- lapply( data, function(x) {names(x)[2]<-"ABC"; x})
data
#---------
$a
  X1.2 ABC
1    1   3
2    2   4

$b
  X5.8 ABC
1    5  15
2    6  16
3    7  17
4    8  18
I'm sure this is a duplicate but I don't know what the right search terms might be, so I'm just answering it .... again.
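For completeness, here is a sketch of the same fix written as a named function, next to the version from the question. The names rename_broken, rename_second and new_name are made up for this illustration.

```r
data <- list(a = data.frame(1:2, 3:4), b = data.frame(5:8, 15:18))

# Version from the question: the last expression is an assignment,
# so the function (invisibly) returns the assigned value, not the data frame
rename_broken <- function(df) {
  names(df)[2] <- "XYZ"
}
res <- rename_broken(data[[1]])   # res is the string "XYZ"; data[[1]] is unchanged

# Fixed version: modify the local copy, then return it explicitly
rename_second <- function(df, new_name = "XYZ") {
  names(df)[2] <- new_name
  df
}

# lapply collects the returned data frames; assign the result back to keep the change
data <- lapply(data, rename_second)
```

Assigning the lapply result back to data is what makes the renaming stick, since the original list is never modified in place.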
I have a data frame made like this:
data.names<-data.frame(DATA=c(1:5))
rownames(data.names)<-c("IV\xc1N","JOS\xc9","LUC\xcdA","RAM\xd3N","TO\xd1O")
data.names
# DATA
# IV\xc1N 1
# JOS\xc9 2
# LUC\xcdA 3
# RAM\xd3N 4
# TO\xd1O 5
I want the incorrect letters replaced by the right ones (Á, É, Í, ...). To be clear, I want to use apply because I read that apply is much more efficient than for. My idea is to write a function that changes these letters:
letters1<-c("\xc1","\xc9","\xcd","\xd3", "\xd1") #Á,É,Í,Ó,Ñ
letters2<-c("Á","É","Í","Ó","Ñ")
change.names <- function(x){sub(letters1[x], letters2[x],rownames(data.names))}
Now, with a for loop I don't have any problems:
for(i in 1:5) rownames(data.names)<-change.names(i)
data.names
# DATA
# IVÁN 1
# JOSÉ 2
# LUCÍA 3
# RAMÓN 4
# TOÑO 5
But I don't have much idea how to do it with apply. I've tried:
apply(matrix(c(1:5),ncol=5),2,change.names)
And the output is a matrix with 5 columns, where each column has only one letter changed, and I don't know how to assign a "mix" of them to rownames(data.names), or something else that works.
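You don't even need to use apply, because rownames(data.names) is a character vector and the Encoding assignment works on the whole vector at once (the single value 'latin1' is recycled across its elements):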
> Encoding(rownames(data.names)) <- 'latin1'
> data.names
      DATA
IVÁN     1
JOSÉ     2
LUCÍA    3
RAMÓN    4
TOÑO     5
Please read this answer for more details about the encoding.
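A minimal sketch of the same idea on a plain character vector, assuming the bytes really are Latin-1 as in the question. The extra enc2utf8() step is an optional addition that is not part of the answer above; it converts the strings to UTF-8 instead of only declaring their encoding.

```r
# Two of the names from the question, given as raw Latin-1 bytes
x <- c("IV\xc1N", "JOS\xc9")

# Declare the encoding for the whole vector in one step; no loop or apply is needed
Encoding(x) <- "latin1"

# Optionally convert the declared Latin-1 strings to UTF-8
x_utf8 <- enc2utf8(x)
```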
I have got the following problem. I have a data.frame with an x and y column representing some points in space:
X<-c(18.25743,18.25783,18.25823,18.25850,18.25863,18.25878,
18.25885,18.25912,18.25943,18.25962,18.25978,18.26000,
18.26022,18.26051,18.26070,18.26095,18.26118,18.26140,
18.26189,18.26250,18.26310,18.26390)
Y<-c(44.69561,44.69564,44.69567,44.69567,44.69586,
44.69600,44.69637,44.69671,44.69691,44.69701,44.69720,
44.69740,44.69763,44.69774,44.69787,44.69790,44.69791,
44.69795,44.69812,44.69802,44.69812,44.69834)
eDF<-data.frame(X,Y)
Now my problem is that they are "sorted" wrongly for plotting. So what I need is a function that puts together the rows of the two points which belong together (in a list of lists):
1 and 12 is ID1
2 and 13 is ID2
3 and 14 is ID3
...
11 and 22 is ID11
Every list created this way within the list of lists should have its own unique ID (just numbered from 1 to the end), because I have this problem in all my data, which have different lengths.
It would be great if the starting point of the second row selection (the 12) were flexible, always taking the first row after half of the data: (rownumber/2)+1, which is 12 in this example.
Well, I have tried some things and I think I'm on the right track, but I can't figure out a solution by myself.
This function is pretty close, but I can't manage to make it start at different rows (1 and 12):
lapply(2:nrow(eDF), function(x) eDF[(x-1):x,])
I also tried to figure it out with seq, and it would do what I need if I could make a list of lists by connecting both code samples. I also need to change the hard-coded start and end numbers to a dynamic solution.
eDF[(seq(1,to=11,by=1)),] # selecting rows 1 to 11
eDF[(seq(12,to=nrow(eDF),by=1)),] #selecting rows 12 to end
Anyone any ideas?
I don't know if you needed an ID column inside of the new list but another way would be:
#create the IDs
eDF$ID <- rep(1:11,2)
#split the data.frame according to those
mylist <- split(eDF, eDF$ID)
Output:
mylist
$`1`
          X        Y ID
1  18.25743 44.69561  1
12 18.26000 44.69740  1

$`2`
          X        Y ID
2  18.25783 44.69564  2
13 18.26022 44.69763  2

$`3`
          X        Y ID
3  18.25823 44.69567  3
14 18.26051 44.69774  3

$`4`
         X        Y ID
4  18.2585 44.69567  4
15 18.2607 44.69787  4
#and so on...
You could just do split(eDF, rep(1:11,2)) if you don't need the ID column.
We can modify the OP's lapply code
lapply(1:11, function(i) eDF[c(i, i+11),])
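Both answers hard-code 11, while the question asks for the split point to follow the data. A sketch of a dynamic variant, assuming the data always has an even number of rows; the small eDF below and the names half, pairs and by_id are only for illustration.

```r
# Small stand-in for eDF with an even number of rows
eDF <- data.frame(X = c(1, 2, 3, 4, 5, 6), Y = c(10, 20, 30, 40, 50, 60))

# The second point of pair i sits half a data set further down
stopifnot(nrow(eDF) %% 2 == 0)
half <- nrow(eDF) / 2

# lapply variant: row i together with row i + half, named by ID
pairs <- lapply(seq_len(half), function(i) eDF[c(i, i + half), ])
names(pairs) <- seq_len(half)

# split variant: the IDs 1..half are repeated once for each half of the data
by_id <- split(eDF, rep(seq_len(half), times = 2))
```

With the 22-row data from the question, half evaluates to 11, which reproduces the pairing 1 and 12, 2 and 13, and so on.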
I'm a bit confused on the filtering scheme on an R data frame.
For example, let's say we have the following data frame titled dframe:
> str(dframe)
'data.frame': 143 obs. of 3 variables:
$ Year : int 1999 2005 2007 2008 2009 2010 2005 2006 2007 2008 ...
$ Name : Factor w/ 18 levels "AADAM","AADEN",..: 1 1 2 2 2 2 3 3 3 3 ...
$ Frequency: int 5 6 10 34 38 12 10 6 10 5 ...
Now if I want to filter dframe to the rows where Name is "AADAM", the proper filter is:
dframe[dframe$Name=="AADAM",]
The part where I'm confused is why the comma doesn't come first. Why isn't it this: dframe[,dframe$Name=="AADAM"]
UPDATE: You clarified your question is really "Please give examples of what sort of logical expressions are valid for filtering columns?"
I agree with you the syntax appears weird initially, but it has the following logic.
The bottom line is that column-filter expressions are typically less rich and expressive than row-filtering expressions, and in particular you can't chain logical indexing the way you do with rows.
Best way is to think of indexing expressions as the general form:
dframe[<row-index-expression>,<col-index-expression>]
where either index-expression is optional, so you can just do one and we (crucially!) need the comma to disambiguate whether it's row- or column-indexing:
dframe[<row-index-expression>,] # such as dframe[dframe$Name=="ADAM",]
dframe[,<col-index-expression>]
Before we look at examples of col-index-expression and what's valid (and invalid) to include in one, let's review and discuss how R does indexing - I had the same confusion when I started with it.
In this example, you have three columns. You can refer to them by their string names 'Year','Name','Frequency'. You can also refer to them by column indices 1,2,3, where the numbers 1,2,3 correspond to the entries of colnames(dframe). R does indexing using the '[' operator, and also the '[[' operator. Here are some valid examples of column-indexing:
dframe[,2] # column 2 / Name
dframe[,'Name'] # column 2 / Name
dframe[,c('Name','Frequency')] # string vector - very common
dframe[,c(2,3)] # integer vector - also very common
dframe[,c(F,T,T)] # logical vector - very rarely seen, and a pain in the butt to compute
Now, if you choose to use a logical expression for the column-index, it must be a valid expression without using bare column names (inside the brackets, the data frame does not know its own column names), and it must produce one TRUE/FALSE per column.
Suppose you wanted to dynamically filter "give me only the factor columns from dframe".
Something like:
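sapply(dframe, is.factor) # one TRUE/FALSE per column; wrap in unname() to drop the colnames. (apply() would first coerce the data frame to a character matrix, so is.factor would be FALSE for every column.)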
For more help and examples on indexing look at the '[' operator help-page:
Type ?'['
dframe[,dframe$Name=="ADAM"] is an invalid attempt at column-indexing: the comparison yields one TRUE/FALSE per row, not per column, and the columns know nothing about Name=="ADAM".
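To make the "give me only the factor columns" case concrete, here is a small sketch of a logical column index computed per column. The three-row dframe is a made-up stand-in for the one in the question, and is_fac and factor_cols are illustrative names.

```r
# Stand-in for the question's data frame: two integer columns and one factor
dframe <- data.frame(Year = c(1999L, 2005L, 2007L),
                     Name = factor(c("AADAM", "AADAM", "AADEN")),
                     Frequency = c(5L, 6L, 10L))

# One logical value per column, computed without referring to column names
is_fac <- sapply(dframe, is.factor)

# Use it after the comma; drop = FALSE keeps a data frame even for a single column
factor_cols <- dframe[, is_fac, drop = FALSE]
```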
Addendum: code to generate example dataframe (because you didn't dump us a dput output)
set.seed(123)
N <- 10
# Random lowercase name of 2 to 7 letters; paste() returns the string, whereas cat() only prints it and returns NULL
randomName <- function() { paste(sample(letters, size=floor(runif(1)*6+2), replace=TRUE), collapse='') }
dframe <- data.frame(Year=round(runif(N,1980,2014)),
Name = as.factor(replicate(N, randomName())),
Frequency=round(runif(N, 2,40)))
You have to remember that when you're subsetting, the part before the comma specifies which rows you want, and the part after the comma specifies which columns you want, i.e.:
dframe[rowsyouwant, columnsyouwant]
You're filtering based on the values in a column, but you want all of the columns in your result, so the space after the comma is blank. You want some subset of rows, so your filtering specification goes before the comma, where the rows you want are specified.
As others have indicated, requesting a certain subset of a data frame requires the syntax [rows, columns]. Since dframe[has 143 rows, has 3 columns], any request for some part of dframe should be of the form
dframe[which of the 143 rows do I want?, which of the 3 columns do I want?].
Because dframe$Name is a vector of length 143, the comparison dframe$Name=='AADAM' is a vector of T/F values that also has length 143. So,
dframe[dframe$Name=='AADAM',]
is like saying
dframe[of the 143 rows I want these ones, I want all columns]
whereas
dframe[,dframe$Name=='AADAM']
generates an error because it's like saying
dframe[I want all rows, of the 143 columns I want these ones]
On a side note, you may want to look into the subset() function if you're not already familiar with it. You could get the same result by writing subset(dframe, Name=='AADAM')
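The length argument can be checked directly. This is only a sketch on a made-up three-row data frame, not the 143-row one from the question; mask, rows and same are illustrative names.

```r
dframe <- data.frame(Year = c(1999L, 2005L, 2007L),
                     Name = factor(c("AADAM", "AADAM", "AADEN")),
                     Frequency = c(5L, 6L, 10L))

# The comparison gives one TRUE/FALSE per row
mask <- dframe$Name == "AADAM"
length(mask) == nrow(dframe)

# So it belongs before the comma, in the row position
rows <- dframe[mask, ]

# subset() selects the same rows and lets you write the column name directly
same <- subset(dframe, Name == "AADAM")
```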
As others have said, the structure within brackets is row, then column.
One way I think of the syntax of selecting data from a data.frame using:
dframe[dframe$Name=="AADAM",]
is to think of a noun, then a verb where:
dframe[] is the noun. It is the object on which you want to perform an action
and
[dframe$Name=="AADAM",] is the verb. It is the action you want to perform.
I have a silly way of expressing this to myself, but it keeps things straight in my mind:
Hey, you! dframe! I am going to... ...in this case, select all of your rows in which Name is equal to AADAM!
By keeping the column portion of [dframe$Name=="AADAM",] blank you are saying you want to keep all columns.
Sometimes it can be a little difficult to remember that you have to write dframe both inside and outside the brackets.
As for exactly why row comes first and column comes second, I do not know, but row had to be either first or second.
dframe <- read.table(text = '
Year Name Frequency
1 ADAM 4
3 BOB 10
7 SALLY 5
2 ADAM 12
4 JIM 3
12 ADAM 7
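', header = TRUE, stringsAsFactors = TRUE)  # factors are needed for the Levels output below (R >= 4.0 reads strings as character by default)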
dframe[,dframe$Name=="ADAM"]
# Error in `[.data.frame`(dframe, , dframe$Name == "ADAM") :
# undefined columns selected
dframe[dframe$Name=="ADAM",]
#   Year Name Frequency
# 1    1 ADAM         4
# 4    2 ADAM        12
# 6   12 ADAM         7
dframe[,'Name']
# [1] ADAM BOB SALLY ADAM JIM ADAM
# Levels: ADAM BOB JIM SALLY
dframe[dframe$Name=="ADAM",'Name']
# [1] ADAM ADAM ADAM
# Levels: ADAM BOB JIM SALLY
I have a really simple question that I cannot find a straightforward answer for. I have a data.frame that looks like this:
df3 <- data.frame(x=c(1:10),y=c(5:14),z=c(25:34))
ID x y z
1 1 5 25
2 2 6 26
3 3 7 27
etc.
And I want to 'paste' together the different values in each column so that they form a single, combined value, as in:
ID x+y+z
1 1525
2 2626
3 3727
I'm sure that this is very easy to do, but I just don't know how!
Yep, paste() is exactly what you want to do:
df3$xyz <- with(df3, paste(x,y,z, sep=""))
# Or, if you want the result to be numeric, rather than character
df3$xyz <- as.numeric(with(df3, paste(x,y,z, sep="")))
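If the set of columns is not fixed, one possible variation is to hand the columns to paste0() with do.call(). This is a sketch rather than part of the answer above; cols is a hypothetical helper variable naming the columns to combine.

```r
df3 <- data.frame(x = 1:10, y = 5:14, z = 25:34)

# Columns to glue together, in order
cols <- c("x", "y", "z")

# paste0() is paste() with sep = ""; do.call passes each column as one argument
df3$xyz <- do.call(paste0, unname(as.list(df3[cols])))
```

The result is a character column; wrapping the call in as.numeric() gives numbers, as in the answer above.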