Vertically printed column names - r

I believe in meaningful variable names. Unfortunately this often means that there are huge white gaps when I look at a data.frame in the R console:
Is there a way to tell R to print the column names vertically, like this:
It doesn't need to be in the console, maybe it is possible to plot a table to PDF that way?
Executable code, provided by Ben Bolker:
sample.table <- data.frame(a.first.long.variable.name=rep(1,7),
another.long.variable.name=rep(1,7),
this.variable.name.is.even.longer.maybe=rep(1,7)
)

As described in the comments, you may apply rotation via CSS:
library(DT)
df <- mtcars
names(df) <- sprintf('<div style="transform:rotate(-90deg);margin-top:30px;">%s</div>', names(df))
dt <- datatable(df, escape = FALSE)
tf <- tempfile(fileext = ".html")
htmlwidgets::saveWidget(dt, tf)
browseURL(tf)  # shell.exec() exists only on Windows; browseURL() is cross-platform
This does not work in the RStudio Viewer, but it does work in a regular web browser.

Not without using a graphics device.
A better and simpler workaround, which works in a plain old console, is to print the transpose of the table; the column names then become row names:
> t(sample.table)
1 2 3 4 5 6 7
a.first.long.variable.name 1 1 1 1 1 1 1
another.long.variable.name 1 1 1 1 1 1 1
this.variable.name.is.even.longer.maybe 1 1 1 1 1 1 1
(To suppress the useless column names you get by default, include sample.table <- data.frame(row.names=1:7, ...).)
I do this all the time. Heatmaps, dendrograms, auto-named regression variables from expanding categoricals...
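If the "1 2 ... 7" header row in the transposed output is distracting, one base-R option (a swap-in, not the answer's own row.names suggestion) is to blank the matrix column labels before printing. A minimal sketch, reusing Ben Bolker's sample.table; the name m is just an illustrative choice:

```r
sample.table <- data.frame(a.first.long.variable.name = rep(1, 7),
                           another.long.variable.name = rep(1, 7),
                           this.variable.name.is.even.longer.maybe = rep(1, 7))

# Transpose: the long column names become row names of a numeric matrix
m <- t(sample.table)

# Replace the "1".."7" column labels with empty strings so the header prints as blanks
colnames(m) <- rep("", ncol(m))
print(m)
```

The values stay aligned because print() still pads every column to a common width.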

Related

Procedural way to generate signal combinations and their output in r

I have been continuing to learn R to transition away from Excel, and I am wondering what the best way to approach the following problem is, or at least what tools are available to me:
I have a large data set (100K+ rows) with several columns that I could generate a signal from; each value in these columns ranges from 0 to 3.
sig1 sig2 sig3 sig4
1 1 1 1
1 1 1 1
1 0 1 1
1 0 1 1
0 0 1 1
0 1 2 2
0 1 2 2
0 1 1 2
0 1 1 2
I want to generate composite signals using the state of each cell in the four columns then see what each of the composite signals tell me about the returns in a time series. For this question the scope is only generating the combinations.
So, for example, one composite signal would be when all four cells are 0. I could generate a new column that is TRUE in that case and FALSE otherwise, then go on to figure out how that affects the returns in the rest of the data frame.
The thing is, I want to check all combinations of the four columns (0000, 0001, 0002, 0003 and so on), which is 4^4 = 256 combinations. With my current knowledge of R, I only know how to do that by calling mutate() once per combination and explicitly entering the condition to check. I assume there is a better way to do this, but I haven't found it yet.
Thanks for the help!
I think you could paste the columns together to get the unique combinations, then turn the result into dummy variables:
library(dplyr)
library(dummies)
# Create sample data
data <- data.frame(sig1 = c(1,1,1,1,0,0,0),
sig2 = c(1,1,0,0,0,1,1),
sig3 = c(2,2,0,1,1,2,1))
# Paste together
data <- data %>% mutate(sig_tot = paste0(sig1,sig2,sig3))
# Generate dummies
data <- cbind(data, dummy(data$sig_tot, sep = "_"))
# Turn to logical if needed
data <- data %>% mutate_at(vars(contains("data_")), as.logical)
data
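The same paste-then-flag idea can be sketched in base R without extra packages. The sketch below assumes, as the question states, that every signal only takes values 0 to 3, and uses expand.grid() to list every possible state so that combinations never seen in the data still get an (all-FALSE) column. The sig_cols vector and the sig_ column prefix are illustrative choices, not part of the original answer:

```r
# Sample data from the answer above (three signal columns)
data <- data.frame(sig1 = c(1, 1, 1, 1, 0, 0, 0),
                   sig2 = c(1, 1, 0, 0, 0, 1, 1),
                   sig3 = c(2, 2, 0, 1, 1, 2, 1))
sig_cols <- c("sig1", "sig2", "sig3")

# Every possible state combination, assuming each signal takes values 0..3
all_states <- expand.grid(rep(list(0:3), length(sig_cols)))
all_keys <- do.call(paste0, all_states)

# One key string per row (e.g. "112"), stored as a factor over all possible keys
data$sig_tot <- factor(do.call(paste0, data[sig_cols]), levels = all_keys)

# One logical column per combination: 64 here, 256 with four signals
flags <- sapply(levels(data$sig_tot), function(k) data$sig_tot == k)
colnames(flags) <- paste0("sig_", colnames(flags))
data <- cbind(data, flags)
```

With 100K+ rows and 256 combinations the logical columns add up quickly, so keeping only the sig_tot factor and grouping by it (for example with dplyr::group_by) may be the lighter option when summarising returns.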

How to correct the encoding of characters on a data.frame

I have a data frame made like this:
data.names<-data.frame(DATA=c(1:5))
rownames(data.names)<-c("IV\xc1N","JOS\xc9","LUC\xcdA","RAM\xd3N","TO\xd1O")
data.names
# DATA
# IV\xc1N 1
# JOS\xc9 2
# LUC\xcdA 3
# RAM\xd3N 4
# TO\xd1O 5
I want the incorrect letters replaced by the right ones (Á, É, Í, ...). Note that I want to use apply, because I read that apply is much more efficient than a for loop. My idea is to write a function that changes these letters:
letters1<-c("\xc1","\xc9","\xcd","\xd3", "\xd1") #Á,É,Í,Ó,Ñ
letters2<-c("Á","É","Í","Ó","Ñ")
change.names <- function(x){sub(letters1[x], letters2[x],rownames(data.names))}
With a for loop, I don't have any problems:
for(i in 1:5) rownames(data.names)<-change.names(i)
data.names
# DATA
# IVÁN 1
# JOSÉ 2
# LUCÍA 3
# RAMÓN 4
# TOÑO 5
But I don't know how to do it with apply. I've tried:
apply(matrix(c(1:5),ncol=5),2,change.names)
The output is a matrix with 5 columns, each of which changes only one letter, and I don't know how to combine them into a single vector to assign to rownames(data.names).
You don't even need apply: rownames(data.names) is a character vector and Encoding<- is vectorised, so a single call declares every name as latin1:
> Encoding(rownames(data.names)) <- 'latin1'
> data.names
DATA
IVÁN 1
JOSÉ 2
LUCÍA 3
RAMÓN 4
TOÑO 5
Please read this answer for more details about the encoding.
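Encoding<- only relabels the existing bytes as latin1; it does not change them. If the names need to be genuinely converted (for example before writing them to a file or handing them to code that expects valid UTF-8), iconv() re-encodes the latin1 bytes as UTF-8. A minimal sketch, assuming the raw names really are latin1 bytes as in the question; raw_names and fixed are illustrative variable names:

```r
# Raw latin1 byte strings, as in the question
raw_names <- c("IV\xc1N", "JOS\xc9", "LUC\xcdA", "RAM\xd3N", "TO\xd1O")

# Convert every element at once (iconv is vectorised, so no apply is needed)
fixed <- iconv(raw_names, from = "latin1", to = "UTF-8")

data.names <- data.frame(DATA = 1:5, row.names = fixed)
data.names
```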

How do I remove the last N rows from a csv using R when the total number of rows can change?

I have a model that produces an output csv with some irrelevant material at the end:
useful.data
useful.x useful.y useful.z
1 1 1
2 2 2
3 3 3
useless.data
useless.x useless.y useless.z
1 1 1
2 2 2
3 3 3
The issue is that the number of rows I want to keep can change from one model run to the next. I've never used an if statement in R, but I think that is my best bet here: stop reading once I reach the row that says 'useless.data'.
Can someone help me with this? Thanks.
Try something like this:
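all_content <- readLines("csvFileHere")
numToSkip <- 5  # number of trailing lines to drop (5 for the sample above)
# read.csv() assumes commas; use read.csv2() if the file is semicolon-separated
read.csv(text = all_content, nrows = length(all_content) - numToSkip,
         header = FALSE, stringsAsFactors = FALSE)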
With the code above you can change the number of rows to skip.
A little advice: always test an answer against your own data set to check whether it actually works!
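Because the number of useful rows varies between runs, the cut-off can also be computed from the marker line instead of being typed in, which avoids any if statement. A minimal sketch, assuming the marker line starts with "useless.data", the file is comma-separated, and its first line is the "useful.data" title; the in-memory all_content vector stands in for readLines("csvFileHere"):

```r
# Hypothetical file contents standing in for readLines("csvFileHere")
all_content <- c("useful.data",
                 "useful.x,useful.y,useful.z",
                 "1,1,1", "2,2,2", "3,3,3",
                 "useless.data",
                 "useless.x,useless.y,useless.z",
                 "1,1,1", "2,2,2", "3,3,3")

# Find the first marker line; if it is absent, keep everything
marker <- grep("^useless\\.data", all_content)[1]
if (is.na(marker)) marker <- length(all_content) + 1

# Keep only the lines before the marker
keep <- all_content[seq_len(marker - 1)]

# Drop the leading "useful.data" title line and parse the rest as CSV
useful <- read.csv(text = keep[-1], stringsAsFactors = FALSE)
useful
```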

transpose row to column in R using qdap

I have been using the wfm function in the "qdap" package to transpose text row values into columns, and ran into a problem when the data contains numbers along with text. For example, if the row value is "abcdef" the transpose works fine, but if the value is "ab1000" the numbers are truncated. Can anyone suggest a workaround?
Approach tried so far:
input <- read.table(header=F, text="101 ab0003
101 pp6500
102 sm2456")
colnames(input) <- c("id","channel")
library(qdap)
output <- t(with(input, wfm(channel, id)))
output <- as.data.frame(output)
expected_output<- read.table(header=F,text="1 1 0
0 0 1")
colnames(expected_output) <- c("ab0003","pp6500", "sm2456")
I think wfm isn't the right tool for this job. You don't really have sentences that you want to split into words, so you're using a function with a lot of unnecessary overhead. What you really want is to tabulate the values you have by another grouping variable.
Here are two approaches. One using qdapTools's mtabulate, another using base R's table:
library(qdapTools)
mtabulate(with(input, split(channel, id)))
## ab0003 pp6500 sm2456
## 101 1 1 0
## 102 0 0 1
t(with(input, table(channel, id)))
## channel
## id ab0003 pp6500 sm2456
## 101 1 1 0
## 102 0 0 1
It's possible your MWE doesn't reflect the complexity of your real data; if so, that brings us back to the original problem. wfm uses the tm package as a backend for some of its manipulations, so we need to supply an argument through the dots (...). The documentation is a bit confusing on this point (I have added this info in the dev version), but we want to pass removeNumbers=FALSE to TermDocumentMatrix, as seen here:
output <- t(with(input, wfm(channel, id, removeNumbers=FALSE)))
as.data.frame(output)
## ab0003 pp6500 sm2456
## 101 1 1 0
## 102 0 0 1
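The question's expected_output is a wide data frame. Calling as.data.frame() on the base R table result returns long format (one row per id/channel pair with a Freq column), whereas as.data.frame.matrix() keeps the wide layout. A minimal sketch using the question's input; tab and output are illustrative names, and table(id, channel) is used in place of t(table(channel, id)) since both give the same counts:

```r
input <- read.table(header = FALSE, text = "101 ab0003
101 pp6500
102 sm2456")
colnames(input) <- c("id", "channel")

# ids in rows, channels in columns
tab <- with(input, table(id, channel))

# Keep the wide layout instead of the long (Freq) layout from as.data.frame()
output <- as.data.frame.matrix(tab)
output
```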

R - Modified mosaic plot from descr package

I have a data frame db with 2 categorical variables: varA has 4 levels (0, 1, 2, 3) and varB has 2 levels (yes, no). varB has no values for level 0 of varA:
id varA varB
1 2 yes
2 3 no
3 3 no
4 1 yes
5 0 NA
6 1 no
7 2 no
8 3 yes
9 3 yes
10 2 no
I created a contingency table using CrossTable from the descr package and then a mosaic plot with the plot function:
table <- CrossTable(db$varA,db$varB, missing.include=FALSE)
plot(table,xlab="varA",ylab="varB")
I obtained this plot:
I would like to eliminate level 0 from the plot. I would also like to add two y-axes: one on the left of the plot with a scale from 0 to 1, and one on the right with a scale from 1 to 0.
Could you help me?
Well, that was annoying. There is no support for subsetting such a "CrossTable" object. If it were a well-behaved table-like object, you would have been able to just pass table[ , -1] to the plot function. Instead, you need to do the subsetting on the data before it is passed to CrossTable:
table <- with( na.omit(db), CrossTable( varA, varB, missing.include=TRUE))
plot(table, xlab="varA", ylab="varB")
BTW using the name table for a data-object is quite confusing to regular R users since the table function is one of our basic tools.
Personally, I would avoid using that CrossTable function, since its output is so weird and can't be managed with typical R functions. Yes, I know it produces SAS-like output, but R users grow to love the compact output of the table function and the many matrix operations available for working with table objects. You may need to compute your margin percentages by hand with prop.table.
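A minimal sketch of the base R route: build an ordinary table from the question's sample data and get the margin percentages with prop.table(). It assumes varA is stored as numbers and varB as character, as in the listing; wrapping varA in factor() after na.omit() re-derives its levels, so the empty 0 level does not reappear. The names sub, tab and row_pct are illustrative:

```r
db <- data.frame(id = 1:10,
                 varA = c(2, 3, 3, 1, 0, 1, 2, 3, 3, 2),
                 varB = c("yes", "no", "no", "yes", NA, "no", "no", "yes", "yes", "no"))

# Drop rows with a missing varB (this removes the only varA == 0 row)
sub <- na.omit(db)

# Plain contingency table; factor() keeps only the varA levels still present
tab <- table(varA = factor(sub$varA), varB = sub$varB)

# Row proportions: share of no/yes within each varA level
row_pct <- prop.table(tab, margin = 1)
row_pct
```

Because tab is an ordinary table, it can be subset with the usual [ , ] indexing and passed straight to mosaicplot(tab).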
