How to use logical values to access elements of data frame - r

Say I have this data frame:
x <- data.frame(matrix(rep(1:5, each=5), nrow=5))
Say I want to square all values that are greater than 3 and put the squared values back into x.
I identify the values that are greater than 3 by:
x > 3
Then how can I reference these values in x? Doing x[x>3] returns a vector of integers, not a data frame.
Note that I am more so interested in this particular problem of x[x>3] and not as much the actual application that I included simply as motivation.

Just use matrix indexing:
ind <- which(x > 3, arr.ind = TRUE)
x[ind] <- x[ind] * x[ind] ## or x[ind] <- x[ind]^2
x
# X1 X2 X3 X4 X5
# 1 1 2 3 16 25
# 2 1 2 3 16 25
# 3 1 2 3 16 25
# 4 1 2 3 16 25
# 5 1 2 3 16 25
Alternatively, you can do replace(x, x > 3, x[x > 3]^2), but remember that this doesn't actually modify your "x" object so it needs to be reassigned.
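To make the pieces above concrete, here is a minimal sketch (the names mask and y are only illustrative). It shows that x > 3 is a logical matrix with the same shape as x, and that replace() hands back a modified copy rather than changing x:

```r
x <- data.frame(matrix(rep(1:5, each = 5), nrow = 5))

# Comparing a data frame with a scalar gives a logical matrix of the same shape
mask <- x > 3
dim(mask)

# replace() returns a modified copy; x itself is untouched until reassigned
y <- replace(x, mask, x[mask]^2)
x$X4[1]  # still 4
y$X4[1]  # 16
```

Assigning the result back (x <- y) would give the same data frame as the indexing approach above.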

Or,
> x[x>3] <- (x[x>3])^2
> x
X1 X2 X3 X4 X5
1 1 2 3 16 25
2 1 2 3 16 25
3 1 2 3 16 25
4 1 2 3 16 25
5 1 2 3 16 25

Related

Add ID column to a list of data frames

I have a list of 142 data frames, file_content, and a list created with id_list <- list(as.character(1:length(file_content)))
I am trying to add a new column period to each data frame in file_content.
All data frames are similar to 2021-03-16 below.
`2021-03-16` <- file_content[[1]] # take a look at 1/142 dataframes in file_content
head(`2021-03-16`)
author_id created_at id tweet
1 3.304380e+09 2018-12-01 22:58:55+00:00 1.069003e+18 #Acosta I hope he didn’t really say “muck”.
2 5.291559e+08 2018-12-01 22:57:31+00:00 1.069003e+18 #Acosta I like Mattis, but why does he only speak this way when Individual-1 isn't around?
3 2.195313e+09 2018-12-01 22:56:41+00:00 1.069002e+18 #Acosta What did Mattis say about the informal conversation between Trump and Putin at the G20?
4 3.704188e+07 2018-12-01 22:56:41+00:00 1.069002e+18 #Acosta Good! Tree huggers be damned!
5 1.068995e+18 2018-12-01 22:56:11+00:00 1.069002e+18 #Acosta #NinerMBA_01
6 9.983321e+17 2018-12-01 22:55:13+00:00 1.069002e+18 #Acosta Really?
I have tried to add the period column using the following code but it adds all 142 values from the id_list to every row in every data frame in file_content.
for (id in length(id_list)) {
file_content <- lapply(file_content, function(x) { x$period <- paste(id_list[id], sep = "_"); x })
}
You were close; the mistake is that you need double brackets, as in id_list[[id]].
for (id in length(id_list)) {
file_content <- lapply(file_content, function(x) {
x$period <- paste(id_list[[id]], sep = "_")
x
})
}
# $`1`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
#
# $`2`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
#
# $`3`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
You could also try Map() and save a few lines.
Map(`[<-`, file_content, 'period', value=id_list)
# $`1`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
#
# $`2`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
#
# $`3`
# X1 X2 X3 X4 period
# 1 1 4 7 10 1
# 2 2 5 8 11 2
# 3 3 6 9 12 3
Data:
file_content <- replicate(3, data.frame(matrix(1:12, 3, 4)), simplify=F) |> setNames(1:3)
id_list <- list(as.character(1:length(file_content)))
We may use imap
library(purrr)
library(dplyr)
imap(file_content, ~ .x %>%
mutate(period = .y))
Or with Map from base R
Map(cbind, file_content, period = names(file_content))
In the OP's code, the id_list is created as a single list element by wrapping with list i.e.
list(1:5)
vs
as.list(1:5)
Here, we don't need to convert to list as a vector is enough
id_list <- seq_along(file_content)
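A quick sketch of the difference between the two calls (wrapped and split_up are illustrative names, not from the question):

```r
wrapped  <- list(1:5)     # one element that holds the whole vector
split_up <- as.list(1:5)  # five elements, one per value

length(wrapped)   # 1
length(split_up)  # 5

# Single brackets keep the list wrapper; double brackets extract the content
class(wrapped[1])    # "list"
class(wrapped[[1]])  # "integer"
```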
Also, the for loop is looping on a single element i.e. the last element with length
for (id in length(id_list)) {
^^
instead, it would be 1:length. In addition, the assignment should be on the single list element file_content[[id]] and not on the entire list
for(id in seq_along(id_list)) {
file_content[[id]]$period <- id_list[id]
}
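Putting the two points together, a small self-contained sketch on the three-frame sample data. The counter runs is a hypothetical helper added only to show how often the length() loop iterates:

```r
# Sample data mirroring the "Data:" block: three small data frames named "1".."3"
file_content <- setNames(
  replicate(3, data.frame(matrix(1:12, 3, 4)), simplify = FALSE),
  1:3
)

# length() yields a single number, so this loop body runs exactly once
runs <- 0
for (id in length(file_content)) runs <- runs + 1

# seq_along() yields 1, 2, 3, so each data frame is visited in turn
for (id in seq_along(file_content)) {
  file_content[[id]]$period <- as.character(id)
}

# One distinct period value per data frame
sapply(file_content, function(df) unique(df$period))
```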

How to replace the rows in data

Hello I have a table with 5 columns. One of the column X is:
x <- c(1,1,1,1,1,1,2,2,2,3)
How can I change the order of the numbers in vector X, for example putting the 3s first, the 1s second and the 2s third? The output should look like:
x <- c(3,1,1,1,1,1,1,2,2,2)
The reordering should apply not only to the values in column X but to all the rows that belong to each value of X.
To clarify the question:
X(old version) -> X(new version)
1 2
2 3
3 1
So, If X=1 make it X=2
If X=2 make it X=3
If X=3 make it X=1
And if for example we change X=1 to X=2 we should put all the rows for X=1 to X=2
I have two vectors:
x <- c(1,1,1,1,1,1,2,2,2,3)
z <- c(10,10,10,10,10,10,20,20,20,30)
The desired output:
x z
1 30
2 10
2 10
2 10
2 10
2 10
2 10
3 20
3 20
3 20
You could
x1 <-c(2,3,1)[x]
x[order(x1)]
# [1] 3 1 1 1 1 1 1 2 2 2
or
x[order(chartr(old="123",new="231",x))]
#[1] 3 1 1 1 1 1 1 2 2 2
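For readers unfamiliar with the idiom, here is the first variant broken into steps. The names lookup and perm are illustrative only; the idea is that indexing a lookup vector by x recodes every value, and order() then gives the row permutation:

```r
x <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3)

# Position i of the lookup vector holds the new label for old value i:
# 1 -> 2, 2 -> 3, 3 -> 1
lookup <- c(2, 3, 1)
x1 <- lookup[x]

# order() returns the positions that sort the recoded values
perm <- order(x1)

# Applying the permutation to the original x puts the 3 first, then 1s, then 2s
x[perm]
```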
Update
If you have many columns.
x <- c(1,1,1,1,1,1,2,2,2,3)
z <- c(10,10,10,10,10,10,20,20,20,30)
set.seed(14)
y <- matrix(sample(25,10*3,replace=TRUE),ncol=3)
m1 <- as.data.frame(cbind(x,z,y))
x1 <- c(2,3,1)[m1$x]
x1
# [1] 2 2 2 2 2 2 3 3 3 1
res <- cbind(x=c(2,3,1)[m1$x[order(x1)]],subset(m1[order(x1),], select=-x))
res
# x z V3 V4 V5
#10 1 30 10 15 2
#1 2 10 7 23 9
#2 2 10 16 5 11
#3 2 10 24 12 16
#4 2 10 14 22 18
#5 2 10 25 22 19
#6 2 10 13 19 16
#7 3 20 24 9 10
#8 3 20 11 17 14
#9 3 20 13 22 18
If I'm understanding correctly, it sounds as though you want to define your own order for sorting something. Is that right? Two ways you could do that:
Option #1: Make another column in your data.frame and assign values in the order you'd like. If you wanted the threes to come first, the ones to come second and the twos to come third, you'd do this:
Data$y <- rep(NA, nrow(Data))
Data$y[Data$x == 3] <- 1
Data$y[Data$x == 1] <- 2
Data$y[Data$x == 2] <- 3
Then you can sort on y and your data.frame will have the order you want.
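A compact variant of the same idea, as a sketch only: match() computes the sort key in one call instead of three assignments. The data frame Data is rebuilt here from the x and z vectors in the question, and the final recoding step assumes the 1 -> 2, 2 -> 3, 3 -> 1 mapping described there:

```r
Data <- data.frame(
  x = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3),
  z = c(10, 10, 10, 10, 10, 10, 20, 20, 20, 30)
)

# match() gives each x its position in the desired order: 3 first, 1 second, 2 third
Data$y <- match(Data$x, c(3, 1, 2))

# Sort whole rows on the helper column, then drop it
sorted <- Data[order(Data$y), c("x", "z")]

# Optionally relabel x afterwards (1 -> 2, 2 -> 3, 3 -> 1)
sorted$x <- c(2, 3, 1)[sorted$x]
sorted
```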
Option #2: If the numbers you list in x are levels in a factor, you could do this using plyr:
library(plyr)
Data$x <- revalue(Data$x, c("3" = "1", "1" = "2", "2" = "3"))
Personally, I think that the 2nd option would be rather confusing, but if you are using "1", "2", and "3" to refer to levels in a factor, that is one quick way to change things.

Parsing Delimited Data In a DataFrame Into Separate Columns in R

I have a data frame which looks as such
A B C
1 3 X1=7;X2=8;X3=9
2 4 X1=10;X2=11;X3=12
5 6 X1=13;X2=14
I would like to parse the C column into separate columns as such...
A B X1 X2 X3
1 3 7 8 9
2 4 10 11 12
5 6 13 14 NA
How would one go about doing this in R?
First, here's the sample data in data.frame form
dd<-data.frame(
A = c(1L, 2L, 5L),
B = c(3L, 4L, 6L),
C = c("X1=7;X2=8;X3=9",
"X1=10;X2=11;X3=12", "X1=13;X2=14"),
stringsAsFactors=F
)
Now I define a small helper function to take vectors like c("A=1","B=2") and change them into named vectors like c(A="1", B="2").
namev<-function(x) {
a<-strsplit(x,"=")
setNames(sapply(a,'[',2), sapply(a,'[',1))
}
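A quick check of the helper on the example input mentioned above (the function is repeated so the snippet runs on its own; res is an illustrative name):

```r
namev <- function(x) {
  a <- strsplit(x, "=")
  # second piece of each pair becomes the value, first piece becomes the name
  setNames(sapply(a, '[', 2), sapply(a, '[', 1))
}

res <- namev(c("A=1", "B=2"))
res
names(res)  # "A" "B"
```

Note that the values stay character strings at this point; any numeric conversion has to happen separately.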
and now I perform the transformations
#turn each row into a named vector
vv<-lapply(strsplit(dd$C,";"), namev)
#find list of all column names
nm<-unique(unlist(sapply(vv, names)))
#extract data from all rows for every column
nv<-do.call(rbind, lapply(vv, '[', nm))
#convert everything to numeric (optional)
class(nv)<-"numeric"
#rejoin with original data
cbind(dd[,-3], nv)
and that gives you
A B X1 X2 X3
1 1 3 7 8 9
2 2 4 10 11 12
3 5 6 13 14 NA
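The NA in the last row comes from indexing a named vector by a name it does not have. A minimal sketch of that step in isolation (v, picked and m are illustrative names):

```r
# Third row only has X1 and X2
v  <- c(X1 = "13", X2 = "14")
nm <- c("X1", "X2", "X3")

# Asking for a missing name yields NA rather than an error
picked <- v[nm]
picked

# Because every row is indexed with the same nm, the pieces line up for rbind
m <- rbind(c(X1 = "7", X2 = "8", X3 = "9")[nm], v[nm])
m
```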
My cSplit function makes solving problems like these fun. Here it is in action:
## Load some packages
library(data.table)
library(devtools) ## Just for source_gist, really
library(reshape2)
## Load `cSplit`
source_gist("https://gist.github.com/mrdwab/11380733")
First, split your values up and create a "long" dataset:
ddL <- cSplit(cSplit(dd, "C", ";", "long"), "C", "=")
ddL
# A B C_1 C_2
# 1: 1 3 X1 7
# 2: 1 3 X2 8
# 3: 1 3 X3 9
# 4: 2 4 X1 10
# 5: 2 4 X2 11
# 6: 2 4 X3 12
# 7: 5 6 X1 13
# 8: 5 6 X2 14
Next, use dcast.data.table (or just dcast) to go from "long" to "wide":
dcast.data.table(ddL, A + B ~ C_1, value.var="C_2")
# A B X1 X2 X3
# 1: 1 3 7 8 9
# 2: 2 4 10 11 12
# 3: 5 6 13 14 NA
Here's one possible approach:
dat <- read.table(text="A B C
1 3 X1=7;X2=8;X3=9
2 4 X1=10;X2=11;X3=12
5 6 X1=13;X2=14", header=TRUE, stringsAsFactors = FALSE)
library(qdapTools)
dat_C <- strsplit(dat$C, ";")
dat_C2 <- sapply(dat_C, function(x) {
y <- strsplit(x, "=")
rep(sapply(y, "[", 1), as.numeric(sapply(y, "[", 2)))
})
data.frame(dat[, -3], mtabulate(dat_C2))
## A B X1 X2 X3
## 1 1 3 7 8 9
## 2 2 4 10 11 12
## 3 5 6 13 14 0
EDIT To obtain the NA values
m <- mtabulate(dat_C2)
m[m==0] <- NA
data.frame(dat[, -3], m)
Here's a nice, somewhat hacky way to get you there.
## read your data
> dat <- read.table(h=T, text = "A B C
1 3 X1=7;X2=8;X3=9
2 4 X1=10;X2=11;X3=12
5 6 X1=13;X2=14", stringsAsFactors = FALSE)
## ---
> s <- strsplit(dat$C, ";|=")
> xx <- unique(unlist(s)[grepl('[A-Z]', unlist(s))])
> sap <- t(sapply(seq(s), function(i){
wh <- which(!xx %in% s[[i]]); n <- suppressWarnings(as.numeric(s[[i]]))
nn <- n[!is.na(n)]; if(length(wh)){ append(nn, NA, wh-1) } else { nn }
})) ## see below for explanation
> data.frame(dat[1:2], sap)
# A B X1 X2 X3
# 1 1 3 7 8 9
# 2 2 4 10 11 12
# 3 5 6 13 14 NA
Basically what's happening in sap is
1. check which values are missing
2. change each list element of s to numeric
3. remove the NA values from (2)
4. insert NA into the correct position with append
5. transpose the result
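The key step is append() with its after argument, which inserts a value after a given position. A small sketch with made-up inputs (row3 and gap are illustrative names):

```r
# Row with X3 missing: insert NA after position 2 (i.e. at the end)
row3 <- append(c(13, 14), NA, after = 2)
row3

# Hypothetical row with X2 missing: insert NA after position 1
gap <- append(c(7, 9), NA, after = 1)
gap
```

As far as I can tell, after is meant to be a single position, so the answer's approach assumes at most one key is missing per row.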

Keep columns of a data frame based on a data frame

I have a data frame, called df, which contains 4000 values. I have a list of 1000 column numbers, in a data frame called list, which is 1000 rows by 1 column. How can I keep the rows of df whose numbers are in list and throw the rest out? I already tried using:
listv <- as.vector(list)
and then using
dfnew <- df[,listv]
but I get the error
Error in .subset(x, j) : invalid subscript type 'list'
You're mixing up rows and columns subsetting. Here is a minimal example:
df <- data.frame(matrix(1:21, ncol = 3))
df
# X1 X2 X3
# 1 1 8 15
# 2 2 9 16
# 3 3 10 17
# 4 4 11 18
# 5 5 12 19
# 6 6 13 20
# 7 7 14 21
list <- data.frame(V1 = c(1, 4, 6))
list
# V1
# 1 1
# 2 4
# 3 6
df[list[, 1], ]
# X1 X2 X3
# 1 1 8 15
# 4 4 11 18
# 6 6 13 20
df[unlist(list), ]
# X1 X2 X3
# 1 1 8 15
# 4 4 11 18
# 6 6 13 20
Note also that as.vector(list) doesn't create a vector, as you thought it would. You need unlist here (as I used in the last example).
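A short sketch of why the original attempt fails. The index data frame is called keep_rows here (a hypothetical name, chosen to avoid masking the built-in list function):

```r
df <- data.frame(matrix(1:21, ncol = 3))
keep_rows <- data.frame(V1 = c(1, 4, 6))

# A data frame is a list of columns, and as.vector() leaves it list-like
is.list(as.vector(keep_rows))  # TRUE

# unlist() flattens it into an atomic vector that can be used as an index
idx <- unlist(keep_rows)
is.atomic(idx)                 # TRUE

df[idx, ]
```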

Quickly generate the cartesian product of a matrix

Let's say I have a matrix x which contains 10 rows and 2 columns. I want to generate a new matrix M that contains each unique pair of rows from x - that is, a new matrix with 55 rows and 4 columns.
E.g.,
x <- matrix (nrow=10, ncol=2, 1:20)
M <- data.frame(matrix(ncol=4, nrow=55))
k <- 1
for (i in 1:nrow(x))
for (j in i:nrow(x))
{
M[k,] <- unlist(cbind (x[i,], x[j,]))
k <- k + 1
}
So, x is:
[,1] [,2]
[1,] 1 11
[2,] 2 12
[3,] 3 13
[4,] 4 14
[5,] 5 15
[6,] 6 16
[7,] 7 17
[8,] 8 18
[9,] 9 19
[10,] 10 20
And then M has 4 columns, the first two are one row from x and the next 2 are another row from x:
> head(M,10)
X1 X2 X3 X4
1 1 11 1 11
2 1 11 2 12
3 1 11 3 13
4 1 11 4 14
5 1 11 5 15
6 1 11 6 16
7 1 11 7 17
8 1 11 8 18
9 1 11 9 19
10 1 11 10 20
Is there either a faster or simpler (or both) way of doing this in R?
The expand.grid() function is useful for this:
R> GG <- expand.grid(1:10,1:10)
R> GG <- GG[GG[,1]>=GG[,2],] # trim it to your 55 pairs
R> dim(GG)
[1] 55 2
R> head(GG)
Var1 Var2
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
R>
Now you have the 'n*(n+1)/2' subsets and you can simply index your original matrix.
I'm not quite grokking what you are doing so I'll just throw out something that may, or may not help.
Here's what I think of as the Cartesian product of the two columns:
expand.grid(x[,1],x[,2])
You can also try the "relations" package. Here is the vignette. It should work like this:
relation_table(x %><% x)
Using Dirk's answer:
idx <- expand.grid(1:nrow(x), 1:nrow(x))
idx<-idx[idx[,1] >= idx[,2],]
N <- cbind(x[idx[,2],], x[idx[,1],])
> all(M == N)
[1] TRUE
Thanks everyone!
Inspired by the other answers, here is a function implementing the Cartesian product of two matrices. With two arguments it returns the full Cartesian product; with only one argument it pairs the matrix with itself and omits one of each symmetric pair:
cartesian_prod <- function(M1, M2) {
if(missing(M2)) { M2 <- M1
ind <- expand.grid(1:NROW(M1), 1:NROW(M2))
ind <- ind[ind[,1] >= ind[,2],] } else {
ind <- expand.grid(1:NROW(M1), 1:NROW(M2))}
rbind(cbind(M1[ind[,1],], M2[ind[,2],]))
}
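A usage sketch on the 10 x 2 matrix from the question (the function is repeated so the snippet is self-contained; p1 and p2 are illustrative names). It only checks the shapes of the results:

```r
cartesian_prod <- function(M1, M2) {
  if (missing(M2)) {
    # One argument: pair the matrix with itself, keeping one of each symmetric pair
    M2 <- M1
    ind <- expand.grid(1:NROW(M1), 1:NROW(M2))
    ind <- ind[ind[, 1] >= ind[, 2], ]
  } else {
    # Two arguments: every row of M1 with every row of M2
    ind <- expand.grid(1:NROW(M1), 1:NROW(M2))
  }
  rbind(cbind(M1[ind[, 1], ], M2[ind[, 2], ]))
}

x <- matrix(1:20, nrow = 10, ncol = 2)

p1 <- cartesian_prod(x)     # 10 * 11 / 2 = 55 pairs
p2 <- cartesian_prod(x, x)  # 10 * 10 = 100 pairs
dim(p1)
dim(p2)
```

In the one-argument case the row order and the left/right placement of each pair may differ from the loop-built M in the question, so compare contents rather than positions if that matters.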
