writing character array to a table - r

I want to transpose the output of the last command below and write it to a data.frame with two columns: the first holding the column names and the second holding each column's data type, one row per column. How can I achieve this? I tried a variety of things but didn't get what I was looking for.
smoke <- matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE)
smoke <- as.data.frame(smoke)
table1=sapply (smoke, class)
table1

You could also skip the table1 part and go straight from smoke to the desired result.
> data.frame(nm = names(smoke), cl = sapply(unname(smoke), class))
# nm cl
# 1 V1 numeric
# 2 V2 numeric
# 3 V3 numeric

You could try this:
data.frame(var.name = names(table1), var.class = table1, row.names=NULL)
# var.name var.class
#1 V1 numeric
#2 V2 numeric
#3 V3 numeric

You might be looking for the melt command.
library(reshape2)
smoke <- matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE)
smoke <- as.data.frame(smoke)
table1 <- sapply (smoke, class)
smoke.melt <- melt(smoke)
levels(smoke.melt$variable) <- table1
> smoke.melt
variable value
1 numeric 51
2 numeric 92
3 numeric 68
4 numeric 43
5 numeric 28
6 numeric 22
7 numeric 22
8 numeric 21
9 numeric 9

Just convert table1 to data.frame and adjust:
dd = data.frame(table1)
dd
table1
V1 numeric
V2 numeric
V3 numeric
dd$VarName = rownames(dd)
dd
table1 VarName
V1 numeric V1
V2 numeric V2
V3 numeric V3
dd = dd[,c(2,1)]
dd
VarName table1
V1 V1 numeric
V2 V2 numeric
V3 V3 numeric
names(dd)[2] = "type"
dd
VarName type
V1 V1 numeric
V2 V2 numeric
V3 V3 numeric
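One caveat with sapply(smoke, class): a column can carry more than one class (a date-time column is both POSIXct and POSIXt), in which case sapply returns a list rather than a character vector. Below is a minimal sketch that collapses each column's classes into one string. The `when` column and the name `col_types` are hypothetical additions for illustration and are not part of the original data.

```r
smoke <- as.data.frame(matrix(c(51,43,22,92,28,21,68,22,9), ncol = 3, byrow = TRUE))

# Hypothetical extra column that has two classes: "POSIXct" and "POSIXt"
smoke$when <- as.POSIXct("2020-01-01", tz = "UTC")

# vapply() guarantees exactly one string per column, even for multi-class columns
col_types <- data.frame(
  var.name  = names(smoke),
  var.class = vapply(smoke, function(x) paste(class(x), collapse = "/"), character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
col_types
```

Taking only the first class, class(x)[1], would be an equally reasonable choice if one label per column is enough.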

Related

R Difference with previous column across multiple columns

I have a dataframe like this that resulted from a cumsum of variables:
id v1 v2 v3
1 4 5 9
2 1 1 4
I would like to get the difference between consecutive columns, so that the dataframe is transformed to:
id v1 v2 v3
1 4 1 4
2 1 0 3
So effectively "de-accumulating" the resulting values by taking the differences. This is a small example; the original df has around 150 columns.
Thx!
x <- read.table(header=TRUE, text="
id v1 v2 v3
1 4 5 9
2 1 1 4")
x[,c("v1","v2","v3")] <- cbind(x[,"v1"], t(apply(x[,c("v1","v2","v3")], 1, diff)))
x
# id v1 v2 v3
# 1 1 4 1 4
# 2 2 1 0 3
Explanation:
Up front, a note: when using apply on a data.frame, it converts the argument to a matrix. This means that if you have any character columns in the argument passed to apply, then the entire matrix will be character, likely not what you want. Because of this, it is safer to only select columns you need (and reassign them specifically).
apply(.., MARGIN=1, ...) returns its output in an orientation transposed from what you might expect, so I have to wrap it in t(...).
I'm using diff, which returns a vector of length one shorter than the input, so I'm cbinding the original column to the return from t(apply(...)).
Just as I had to be specific about which columns to pass to apply, I'm similarly specific about which columns will be replaced by the return value.
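The two points about apply in the explanation above can be checked directly. This is a small sketch on hypothetical data (three rows instead of two so the transposition is visible, plus an invented `label` column); the object names are illustrative only.

```r
# Hypothetical data: three rows, three value columns, one character column
x <- data.frame(id = 1:3,
                v1 = c(4, 1, 2), v2 = c(5, 1, 6), v3 = c(9, 4, 7),
                label = c("a", "b", "c"))

# apply() calls as.matrix() on a data.frame; one character column
# turns the whole matrix into character
m_all <- as.matrix(x)
m_num <- as.matrix(x[, c("v1", "v2", "v3")])
typeof(m_all)   # "character"
typeof(m_num)   # "double"

# With MARGIN = 1 each row's result becomes a COLUMN of the output,
# so 3 rows of input give a 2 x 3 matrix that needs t() to line up again
raw <- apply(x[, c("v1", "v2", "v3")], 1, diff)
dim(raw)        # 2 3
dim(t(raw))     # 3 2
```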
A simple for loop might do the trick, but for larger data it will be slower than other approaches.
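df <- data.frame(id = c(1,2), v1 = c(4,1), v2 = c(5,1))
df2 <- df
# start at the second value column (column 3); column 2 (v1) stays as it is
for(i in 3:ncol(df)){
  # subtract using the untouched copy df, so earlier results are not reused
  df2[,i] <- df[,i] - df[,i-1]
}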
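With around 150 columns, the same subtraction can also be done in one step by lining up each value column with its left neighbour. This is only a sketch in base R; the vector `cols` is an illustrative name for the value columns and would need to list the real column names in their cumulative order.

```r
x <- read.table(header = TRUE, text = "
id v1 v2 v3
1 4 5 9
2 1 1 4")

cols <- c("v1", "v2", "v3")   # value columns, in cumulative order

# columns 2..n minus columns 1..(n-1); the right-hand side is computed
# completely before anything is written back into x
x[, cols[-1]] <- as.matrix(x[, cols[-1]]) - as.matrix(x[, cols[-length(cols)]])
x
#   id v1 v2 v3
# 1  1  4  1  4
# 2  2  1  0  3
```

Converting to matrices first keeps the arithmetic purely positional, which is why only numeric columns should be listed in `cols`.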

Convert data frame into a matrix of "ranked lists" based on unique values in column

Let's say I have a data frame df that looks like this:
df = data.frame(c("A", "A", "B", "B", "C", "D", "D", "D", "E"),
c(0.1, 0.3, 0.1, 0.8, 0.4, 0.7, 0.5, 0.2, 0.1),
c("v1", "v2", "v1", "v3", "v4", "v2", "v3", "v4", "v2"))
colnames(df) = c("entry", "value", "point")
df = df[order(df$entry, -df$value),]
df
entry value point
2 A 0.3 v2
1 A 0.1 v1
4 B 0.8 v3
3 B 0.1 v1
5 C 0.4 v4
6 D 0.7 v2
7 D 0.5 v3
8 D 0.2 v4
9 E 0.1 v2
I would like to eventually convert it into a matrix of "ranked lists" whose rows are the unique values in the entry column and whose number of columns equals the maximum number of unique elements in the point column for any single entry. In this example that would be 3. Each row should be populated with the corresponding values from the point column, sorted in descending order of the corresponding elements in value (e.g., row A should have v2 in the first column). If an entry has fewer points than the number of columns in the matrix, the rest of the row should be filled with NAs.
So, the expected output should look something like this:
>df
1 2 3
A v2 v1 NA
B v3 v1 NA
C v4 NA NA
D v2 v3 v4
E v2 NA NA
So far I have tried to create some sort of contingency table using
with(df, table(df$point, df$entry))
but of course my actual data is on the order of millions of entries, and the above command uses huge amounts of RAM even when subsetting to 100 entries with a couple hundred unique points. I have also tried
xtabs(~ entry + point, data=df)
with the same results on my real data. Next I have tried to split it into ordered lists using
df = split(df$point, df$entry)
which works fine and is fast enough, but now I have problems converting it to the result matrix. Something along these lines, probably:
matrix(sapply(df, function(x) unlist(x)), nrow=length(df), ncol=max(sapply(df, length)))
or first initialize a matrix and do some rbind or something?
res = matrix(NA, nrow=length(df), ncol=max(sapply(df, length)))
rownames(res) = names(df)
....
Can you please assist?
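With dplyr:
library(dplyr)
df %>%
group_by(entry) %>%
# position of each point within its entry, by descending value;
# row_number() gives whole-number ranks and does not rely on df being pre-sorted
mutate(unq=row_number(desc(value))) %>%
select(-value) %>%
tidyr::spread(unq,point)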
# A tibble: 5 x 4
# Groups: entry [5]
entry `1` `2` `3`
<fct> <fct> <fct> <fct>
1 A v2 v1 NA
2 B v3 v1 NA
3 C v4 NA NA
4 D v2 v3 v4
5 E v2 NA NA
Consider using by to split the data by entry and build the needed vectors. To give every row of the final matrix the same length, pad with NA as needed; the 3 below can be changed to however many columns are required.
vec_list <- by(df, df$entry, function(sub) {
vec <- as.character(sub[order(-sub$value),]$point)
c(vec, rep(NA, 3 - length(vec)))
})
final_matrix <- do.call(rbind, vec_list)
final_matrix
# [,1] [,2] [,3]
# A "v2" "v1" NA
# B "v3" "v1" NA
# C "v4" NA NA
# D "v2" "v3" "v4"
# E "v2" NA NA
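The split() attempt from the question can also be finished off directly, without hard-coding the number of columns. A minimal sketch, assuming the data is already sorted by entry and descending value as in the question; `pts`, `width` and `res` are illustrative names.

```r
df <- data.frame(entry = c("A", "A", "B", "B", "C", "D", "D", "D", "E"),
                 value = c(0.1, 0.3, 0.1, 0.8, 0.4, 0.7, 0.5, 0.2, 0.1),
                 point = c("v1", "v2", "v1", "v3", "v4", "v2", "v3", "v4", "v2"),
                 stringsAsFactors = FALSE)
df <- df[order(df$entry, -df$value), ]

pts   <- split(df$point, df$entry)   # one ordered character vector per entry
width <- max(lengths(pts))           # widest ranked list decides the column count

# `length<-` pads a shorter vector with NA up to the requested length
res <- do.call(rbind, lapply(pts, function(p) {
  length(p) <- width
  p
}))
res
```

Row names come from the names of the split list, so the result has one row per entry and `width` columns.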

Checking if a column name exists in another dataset

So I have two different datasets and I am trying to check if a column name has a duplicate column name in another data set. For example:
V1 V2 V3
1 2 3
as one data set and
V4 V6 V1 V2
NA NA NA NA
And I am trying to make it so the second data set is like this
V4 V6 V1 V2
NA NA 1 NA
where only the minimum value in the original data set is copied over, if that makes sense. I have tried using this:
if(ncol((Session1t[grep(temp1, names(Session1t))])) != 0)
But this is not working: it returns the same value regardless of the input. After entering the if statement I copy over only the column that I want, and I have that part figured out; I just cannot get the if statement to work.
We can use ifelse and %in% to match column names and replace NA with 1.
# Create example data frame D1
D1 <- read.table(text = "V1 V2 V3
1 2 3",
header = TRUE)
# Create example data frame D2
D2 <- read.table(text = "V4 V6 V1 V2
NA NA NA NA",
header = TRUE)
# Replace NA with 1 if column names match
D2[1, ] <- ifelse(names(D2) %in% names(D1), 1, NA)
D2
# V4 V6 V1 V2
# 1 NA NA 1 1
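Another option is intersect, which copies over the values of all shared columns (df1 and df2 stand for the two data sets, D1 and D2 above):
nm1 <- intersect(names(df1), names(df2))
df2[nm1] <- df1[nm1]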
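For the if statement itself, an exact-name test with %in% yields a single TRUE or FALSE, whereas grep does pattern matching and can hit more columns than intended. The sketch below assumes one reading of the question, namely that only the column holding the minimum value of the first data set should be copied; `min_col` is an illustrative name.

```r
D1 <- data.frame(V1 = 1, V2 = 2, V3 = 3)
D2 <- data.frame(V4 = NA, V6 = NA, V1 = NA, V2 = NA)

# grep() matches patterns, so "V1" would also match a column called "V10"
grep("V1", c("V1", "V10"))   # 1 2

# Assumed reading of the question: pick the column with D1's minimum value
min_col <- names(which.min(unlist(D1[1, ])))

# %in% is an exact comparison and returns one logical value for one name
if (min_col %in% names(D2)) {
  D2[[min_col]] <- D1[[min_col]]
}
D2
```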

creating a list of dataframes

I have a dataframe that looks like this:
a <- as.data.frame(t(matrix(c('gr1','','','','gr2','','','','','gr3','','',
rep(1,12),rep(2,12)),ncol=3)))
a looks like:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
gr1 gr2 gr3
1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2
Columns V1-V4 belong to gr1, V5-V9 to gr2, and V10-V12 to gr3.
I would like to separate these groups (gr1-gr3) and their corresponding columns and put them all in a list that I can later on loop and do some analysis. So the desired output is:
list1 = (gr1,gr2,gr3), where each of gr1, gr2, and gr3 are a dataframe with their corresponding columns.
We create a grouping variable based on whether the first-row element is blank ('') or not. We then split the column names of 'a' by 'grp' into a list, use lapply to subset the columns (dropping the first row), and name the elements of 'lst' with the 'gr' values extracted from the first row of 'a'.
grp <- cumsum(as.character(unlist(a[1,]))!='')
lst <- lapply(split(names(a), grp), function(nm) a[-1, nm])
nm1 <- as.character(unlist(a[1,]))
names(lst) <- nm1[nzchar(nm1)]
NOTE: The columns in 'a' are not numeric because the second header ('gr') is stored as the first row: they are factor class, or character in R >= 4.0.0, where stringsAsFactors defaults to FALSE. To convert the columns of each data.frame in 'lst' to numeric:
lapply(lst, function(x) {
x[] <- lapply(x, function(.x) as.numeric(as.character(.x)))
x})
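Put together, the steps above give a named list that can be looped over. This is a sketch only; `first_row` and `lst_num` are illustrative names, and the row sums are a hypothetical stand-in for whatever analysis is needed per group.

```r
a <- as.data.frame(t(matrix(c('gr1','','','','gr2','','','','','gr3','','',
                              rep(1,12), rep(2,12)), ncol = 3)))

first_row <- as.character(unlist(a[1, ]))
grp <- cumsum(first_row != '')                  # 1 1 1 1 2 2 2 2 2 3 3 3
lst <- lapply(split(names(a), grp), function(nm) a[-1, nm])
names(lst) <- first_row[nzchar(first_row)]      # gr1, gr2, gr3

# convert every column to numeric (works for factor and character columns)
lst_num <- lapply(lst, function(x) {
  x[] <- lapply(x, function(.x) as.numeric(as.character(.x)))
  x
})

# Hypothetical analysis step: row sums within each group
for (g in names(lst_num)) {
  cat(g, ":", rowSums(lst_num[[g]]), "\n")
}
```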

How to swap values between two columns

I have a data frame with three variables and 250K records. As an example consider
df <- data.frame(V1=c(1,2,4), V2=c("a","a","b"), V3=c(2,3,1))
V1 V2 V3
1 a 2
2 a 3
4 b 1
and want to swap values between V1 and V3 based on the value of V2 as follows:
if V2 == 'b' then V1 <- V3 and V3 <- V1
resulting in
V1 V2 V3
1 a 2
2 a 3
1 b 4
I tried a do loop but it takes forever. If I use Perl, it takes seconds. I believe this task can be done efficiently in R as well. Any suggestions are appreciated.
Try this
df <- data.frame(V1=c(1,2,4), V2=c("a","a","b"), V3=c(2,3,1))
df[df$V2 == "b", c("V1", "V3")] <- df[df$V2 == "b", c("V3", "V1")]
which yields:
> df
V1 V2 V3
1 1 a 2
2 2 a 3
3 1 b 4
You can use transform to do this.
df <- transform(df, V3 = ifelse(V2 == 'b', V1, V3), V1 = ifelse(V2 == 'b', V3, V1))
Edited: I got tripped up with the column names, sorry. This works.
If you don't mind the rows ending up in different orders, this is kind of a 'cute' way to do this:
dat <- read.table(textConnection("V1 V2 V3
1 a 2
2 a 3
4 b 1"),sep = "",header = TRUE)
tmp <- dat[dat$V2 == 'b',3:1]
colnames(tmp) <- colnames(dat)
rbind(dat[dat$V2 != 'b',],tmp)
Basically, that just grabs the rows where V2 == 'b', reverses the columns, and slaps them back together with everything else. This can be extended if you have more columns that don't need switching; you'd just use an integer index with those values transposed, rather than just 3:1.
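On real data V2 may contain missing values, and a logical index containing NA is not allowed in this kind of subscripted assignment. A small sketch that computes the row positions once with which(), which drops NA; the fourth row with a missing V2 is a hypothetical addition to show the effect, and `idx` and `tmp` are illustrative names.

```r
# Hypothetical fourth row with a missing V2
df <- data.frame(V1 = c(1, 2, 4, 7),
                 V2 = c("a", "a", "b", NA),
                 V3 = c(2, 3, 1, 8))

idx <- which(df$V2 == "b")   # integer positions; rows where V2 is NA are skipped

# classic three-step swap, fully vectorised over the selected rows
tmp <- df$V1[idx]
df$V1[idx] <- df$V3[idx]
df$V3[idx] <- tmp
df
```

The comparison is evaluated only once, which should also help a little on the 250K-row data set.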
