I have a really simple question that I cannot find a straightforward answer for. I have a data.frame that looks like this:
df3 <- data.frame(x=c(1:10),y=c(5:14),z=c(25:34))
ID x y z
1 1 5 25
2 2 6 26
3 3 7 27
etc.
And I want to 'paste' together the different values in each column so that they form a single, combined value, as in:
ID x+y+z
1 1525
2 2626
3 3727
I'm sure that this is very easy to do, but I just don't know how!
Yep, paste() is exactly what you want to do:
df3$xyz <- with(df3, paste(x,y,z, sep=""))
# Or, if you want the result to be numeric, rather than character
df3$xyz <- as.numeric(with(df3, paste(x,y,z, sep="")))
Related
Say I have a data.frame:
df <- data.frame(A=c(10,20,30),B=c(11,22,33), C=c(111,222,333))
A B C
1 10 11 111
2 20 22 222
3 30 33 333
If I select two (or more) columns I get a data.frame:
x <- df[,1:2]
A B
1 10 11
2 20 22
3 30 33
This is what I want. However, if I select only one column I get a numeric vector:
x <- df[,1]
[1] 1 2 3
I have tried to use as.data.frame(), which does not change the results for two or more columns. it does return a data.frame in the case of one column, but does not retain the column name:
x <- as.data.frame(df[,1])
df[, 1]
1 1
2 2
3 3
I don't understand why it behaves like this. In my mind it should not make a difference if I extract one or two or ten columns. IT should either always return a vector (or matrix) or always return a data.frame (with the correct names). what am I missing? thanks!
Note: This is not a duplicate of the question about matrices, as matrix and data.frame are fundamentally different data types in R, and can work differently with dplyr. There are several answers that work with data.frame but not matrix.
Use drop=FALSE
> x <- df[,1, drop=FALSE]
> x
A
1 10
2 20
3 30
From the documentation (see ?"[") you can find:
If drop=TRUE the result is coerced to the lowest possible dimension.
Omit the ,:
x <- df[1]
A
1 10
2 20
3 30
From the help page of ?"[":
Indexing by [ is similar to atomic vectors and selects a list of the specified element(s).
A data frame is a list. The columns are its elements.
You can also use subset:
subset(df, select = 1) # by index
subset(df, select = A) # by name
As mentioned in the comments you can also use dplyr::select, but you do not need to quote the variable name:
library(dplyr)
# by name
df %>%
select(A)
# by index
df %>%
select(1)
Say I have a data.frame:
df <- data.frame(A=c(10,20,30),B=c(11,22,33), C=c(111,222,333))
A B C
1 10 11 111
2 20 22 222
3 30 33 333
If I select two (or more) columns I get a data.frame:
x <- df[,1:2]
A B
1 10 11
2 20 22
3 30 33
This is what I want. However, if I select only one column I get a numeric vector:
x <- df[,1]
[1] 1 2 3
I have tried to use as.data.frame(), which does not change the results for two or more columns. it does return a data.frame in the case of one column, but does not retain the column name:
x <- as.data.frame(df[,1])
df[, 1]
1 1
2 2
3 3
I don't understand why it behaves like this. In my mind it should not make a difference if I extract one or two or ten columns. IT should either always return a vector (or matrix) or always return a data.frame (with the correct names). what am I missing? thanks!
Note: This is not a duplicate of the question about matrices, as matrix and data.frame are fundamentally different data types in R, and can work differently with dplyr. There are several answers that work with data.frame but not matrix.
Use drop=FALSE
> x <- df[,1, drop=FALSE]
> x
A
1 10
2 20
3 30
From the documentation (see ?"[") you can find:
If drop=TRUE the result is coerced to the lowest possible dimension.
Omit the ,:
x <- df[1]
A
1 10
2 20
3 30
From the help page of ?"[":
Indexing by [ is similar to atomic vectors and selects a list of the specified element(s).
A data frame is a list. The columns are its elements.
You can also use subset:
subset(df, select = 1) # by index
subset(df, select = A) # by name
As mentioned in the comments you can also use dplyr::select, but you do not need to quote the variable name:
library(dplyr)
# by name
df %>%
select(A)
# by index
df %>%
select(1)
I am trying to select a column from a dataframe using a variable as a column name, with the problem that the column name is escaped. I have a couple of workarounds for doing it, which involve changing my code a bit too much, and anyway I've been looking around and I am curious if anybody knew the solution for this kind of weird case.
My dataset is actually a list of time series (which I construct after some operations), this would be a toy example.
df <- list(`01/19/17`=seq(1,10), `01/20/17`=seq(2,11))
> df
$`01/19/17`
[1] 1 2 3 4 5 6 7 8 9 10
$`01/20/17`
[1] 2 3 4 5 6 7 8 9 10 11
I don't put the escapes ` in the column names because I want to, but because they come as dates from the process I follow to construct the dataset.
If I know the column name I can access like this,
df$`01/19/17`
If I want to use a variable, looking around e.g. here I see I could rewrite it to something like this,
`$`(df, `01/19/17`)
But I cannot assign a variable like this,
> name1 <- `01/19/17`
Error: object '01/19/17' not found
and if assign it this other way I get a NULL,
> name1 <- "01/19/17"
> `$`(df, name1)
NULL
As I say there are workarounds like e.g. changing all the column names in the list of series, but I just would like to know. Thank you so much.
You can access with brackets rather than with $, even when the key is a string:
df <- list(`01/19/17`=seq(1,10), `01/20/17`=seq(2,11))
name1 <- "01/19/17"
df[[name1]]
# [1] 1 2 3 4 5 6 7 8 9 10
I have a data frame make like this:
data.names<-data.frame(DATA=c(1:5))
rownames(data.names)<-c("IV\xc1N","JOS\xc9","LUC\xcdA","RAM\xd3N","TO\xd1O")
data.names
# DATA
# IV\xc1N 1
# JOS\xc9 2
# LUC\xcdA 3
# RAM\xd3N 4
# TO\xd1O 5
I want the incorrect letters replace by the right ones (Á,É,Í,...). Make clear that I want to use apply because I read that is much more efficient apply than for. My idea is make a function that changes these letters:
letters1<-c("\xc1","\xc9","\xcd","\xd3", "\xd1") #Á,É,Í,Ó,Ñ
letters2<-c("Á","É","Í","Ó","Ñ")
change.names <- function(x){sub(letters1[x], letters2[x],rownames(data.names))}
Now, with a for I haven't any problems:
for(i in 1:5) rownames(data.names)<-change.names(i)
data.names
# DATA
# IVÁN 1
# JOSÉ 2
# LUCÍA 3
# RAMÓN 4
# TOÑO 5
But I don't have much idea how to do it with apply. I've tried:
apply(matrix(c(1:5),ncol=5),2,change.names)
And the output is a matrix with 5 columns, where each one only changes one letter and I can't know how to assign to rownames(data.names) a "mix" of them, or something that works.
You don't even need to use apply, because rownames(data.names) is a vector and vectors may be recycled
> Encoding(rownames(data.names)) <- 'latin1'
> data.names
DATA
IVÁN 1
JOSÉ 2
LUCÍA 3
RAMÓN 4
TOÑO 5
Please read this answer for more details about the encoding.
Say I have a data.frame:
df <- data.frame(A=c(10,20,30),B=c(11,22,33), C=c(111,222,333))
A B C
1 10 11 111
2 20 22 222
3 30 33 333
If I select two (or more) columns I get a data.frame:
x <- df[,1:2]
A B
1 10 11
2 20 22
3 30 33
This is what I want. However, if I select only one column I get a numeric vector:
x <- df[,1]
[1] 1 2 3
I have tried to use as.data.frame(), which does not change the results for two or more columns. it does return a data.frame in the case of one column, but does not retain the column name:
x <- as.data.frame(df[,1])
df[, 1]
1 1
2 2
3 3
I don't understand why it behaves like this. In my mind it should not make a difference if I extract one or two or ten columns. IT should either always return a vector (or matrix) or always return a data.frame (with the correct names). what am I missing? thanks!
Note: This is not a duplicate of the question about matrices, as matrix and data.frame are fundamentally different data types in R, and can work differently with dplyr. There are several answers that work with data.frame but not matrix.
Use drop=FALSE
> x <- df[,1, drop=FALSE]
> x
A
1 10
2 20
3 30
From the documentation (see ?"[") you can find:
If drop=TRUE the result is coerced to the lowest possible dimension.
Omit the ,:
x <- df[1]
A
1 10
2 20
3 30
From the help page of ?"[":
Indexing by [ is similar to atomic vectors and selects a list of the specified element(s).
A data frame is a list. The columns are its elements.
You can also use subset:
subset(df, select = 1) # by index
subset(df, select = A) # by name
As mentioned in the comments you can also use dplyr::select, but you do not need to quote the variable name:
library(dplyr)
# by name
df %>%
select(A)
# by index
df %>%
select(1)