How to subset a table in R? - r

I want to create a mosaic plot in R using mosaic from the vcd library. The table that I am plotting has many many 0's in it (when I plot it, the mosaic plot is incomprehensible), and I would like to create a mosaic plot with just the top 25 highest entries. How do you subset a table in R to accomplish this? Or, how do you change every entry satisfying a certain condition to 0?
As an example:
df <- data.frame(letters=c("a","b","c","c","b","c","a","b"), end=c("x","y","x","y","x","y","y","x"))
t <- table(df)
The table looks like this:
> t
end
letters x y
a 1 1
b 2 1
c 1 2
I would like to substitute each 1 to be a 0. How should I do this?

To replace each 1 with a 0:
t[t==1] <- 0
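The same logical-indexing idea can also cover the "top N entries" part of the question. The following is only a sketch: it uses n = 2 on the small example (the question asks for 25), names the table tab instead of t so that the transpose function t() is not masked, and keeps any entries tied with the cutoff value.

```r
# Example data from the question
df <- data.frame(letters = c("a", "b", "c", "c", "b", "c", "a", "b"),
                 end     = c("x", "y", "x", "y", "x", "y", "y", "x"))
tab <- table(df)

n <- 2  # assumption for this small example; the question wants 25

# The n-th largest count is the cutoff
cutoff <- sort(as.vector(tab), decreasing = TRUE)[n]

# Zero out every cell below the cutoff (ties with the cutoff are kept)
tab_top <- tab
tab_top[tab_top < cutoff] <- 0
tab_top
```

The resulting tab_top has the same dimensions as the original table, so it can be passed to the plotting function in the same way.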

Related

Plot a Bar Chart based on Row Names

I am trying to plot a dataframe as follows:
A 1
C 5
B 4
Z 10
M 7
and would like it to show the data in the order given (i.e. the first column in the bar chart is A, the second is C, the third is B).
I have:
ggplot(pc,aes(x=Let,y=Count))+geom_bar(stat="identity")
But it plots the bars in the alphabetical order of the Let column.
df<-data.frame(c('A','C','B','Z','M'),c(1,5,4,10,7))
One way is to convert the Let column to a factor whose levels are in the order you want to see them, and then use the ggplot command.
library(tidyverse)
df$Let <- factor(df$Let, levels = df$Let)
ggplot(df,aes(x=Let,y=Count))+geom_bar(stat="identity")
data
df<-data.frame(Let = c('A','C','B','Z','M'),Count = c(1,5,4,10,7))
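If the Let column could contain repeated values, levels = df$Let would fail because factor levels must be unique. A small base-R sketch of a more defensive variant, using unique() to keep the order of first appearance (the extra duplicated "A" row is an assumption added for illustration):

```r
# Same data as above plus one hypothetical duplicated letter
df <- data.frame(Let   = c("A", "C", "B", "Z", "M", "A"),
                 Count = c(1, 5, 4, 10, 7, 2))

# Levels follow the order of first appearance, not alphabetical order
df$Let <- factor(df$Let, levels = unique(df$Let))
levels(df$Let)
```

The ggplot call from the answer can then be used unchanged on this data frame.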

R boxplot with several variables - changing variable names on x-axis

I am new to R and having issues figuring out how to plot multiple variables in the same boxplot and have the x-axis display the variable names instead of 1 2 3 4.
In other words I want 1 to be Hi_24h, 2 = Hi_mo, etc.
boxplot(project$Hi_24h, project$Hi_mo, project$Lo_24h, project$Lo_mo)
Try:
boxplot(project, names=names(project))
If you do not want all of your columns and would like to select them manually, then create a vector of names:
mynames<-c("Hi_24h", "Hi_mo", "Lo_24h", "Lo_mo")
boxplot(project$Hi_24h, project$Hi_mo, project$Lo_24h, project$Lo_mo, names=mynames)
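A shorter variant is to subset the data frame by the name vector, so the labels are picked up from the column names. This is a sketch only: the project data frame below is made-up random data standing in for the asker's, and the other column is a hypothetical extra column that should be left out.

```r
# Hypothetical stand-in for the asker's `project` data frame
set.seed(1)
project <- data.frame(
  Hi_24h = rnorm(20, mean = 10),
  Hi_mo  = rnorm(20, mean = 12),
  Lo_24h = rnorm(20, mean = 5),
  Lo_mo  = rnorm(20, mean = 6),
  other  = rnorm(20)   # a column we do not want in the plot
)

mynames <- c("Hi_24h", "Hi_mo", "Lo_24h", "Lo_mo")

# Selecting columns by name keeps the names as x-axis labels
bp <- boxplot(project[mynames])
```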

r- hist.default, 'x' must be numeric

Just picking up R and I have the following question:
Say I have the following data.frame:
v1 v2 v3
3 16 a
44 457 d
5 23 d
34 122 c
12 222 a
...and so on
I would like to create a histogram or barchart for this in R, but instead of having the x-axis be one of the numeric values, I would like a count by v3. (2 a, 1 c, 2 d...etc.)
If I do hist(dataFrame$v3), I get the error that 'x' must be numeric.
Why can't it count the instances of each different string like it can for the other columns?
What would be the simplest code for this?
OK. First of all, you should know exactly what a histogram is. It is not a plot of counts of discrete values. It is a visualization for continuous variables: it groups the values into bins and shows the count or density in each bin, which approximates the underlying probability density function. So do not try to use hist on categorical data. (That's why hist tells you that the value you pass must be numeric.)
If you just want counts of discrete values, that's just a basic bar plot. You can calculate counts of values in R for discrete data using table and then plot that with the basic barplot() command.
barplot(table(dataFrame$v3))
If you want to require a minimum number of observations, try
tbl<-table(dataFrame$v3)
atleast <- function(i) {function(x) x>=i}
barplot(Filter(atleast(10), tbl))
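For a quick check on the five rows shown in the question, the same filtering can also be written with plain logical indexing on the table. This is a sketch: the threshold of 2 is an assumption chosen so that something remains with such a small sample.

```r
# The five example rows from the question
dataFrame <- data.frame(v1 = c(3, 44, 5, 34, 12),
                        v2 = c(16, 457, 23, 122, 222),
                        v3 = c("a", "d", "d", "c", "a"))

# Count how often each string occurs
tbl <- table(dataFrame$v3)

# Keep only the categories with at least 2 observations
common <- tbl[tbl >= 2]
barplot(common)
```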

Simple line plot using R ggplot2

I have data as follows in .csv format. I am new to ggplot2 graphs and have not been able to plot it:
T L
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
I tried to plot a line graph using the following code:
data<-read.csv("sample.csv",head=TRUE,sep=",")
ggplot(data,aes(T,L))+geom_line()]
but the image I got is not what I want.
I want an image like the following:
Can anybody help me?
You want to use a variable for the x-axis that has lots of duplicated values and expect the software to guess that the order you want those points plotted is given by the order they appear in the data set. This also means the values of the variable for the x-axis no longer correspond to the actual coordinates in the coordinate system you're plotting in, i.e., you want to map a value of "L=1" to different locations on the x-axis depending on where it appears in your data.
This type of fairly non-sensical thing does not work in ggplot2 out of the box. You have to define a separate variable that has a proper mapping to values on the x-axis ("id" in the code below) and then overwrite the labels with the values for "L".
The code below shows you how to do this, but it seems like a different graphical display would probably be better suited for this kind of data.
library(ggplot2)
data <- as.data.frame(matrix(scan(text="
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
"), ncol=2, byrow=TRUE))
names(data) <- c("T", "L")
data$id <- 1:nrow(data)
ggplot(data,aes(x=id, y=T))+geom_line() + xlab("L") +
scale_x_continuous(breaks=data$id, labels=data$L)
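As one possible "different graphical display", here is a base-R sketch that shows every T value grouped by its L value. The choice of a strip chart is an assumption about what might suit the data, not something the asker requested.

```r
# Data from the question
data <- data.frame(
  T = c(141.5453333, 148.7116667, 154.7373333, 228.2396667, 148.4423333,
        131.3893333, 139.2673333, 140.5556667, 143.719, 214.3326667,
        134.4513333, 169.309, 161.1313333),
  L = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 8, 4)
)

# How many observations fall under each value of L
counts <- table(data$L)

# One column of points per distinct L value
stripchart(T ~ L, data = data, vertical = TRUE, pch = 16,
           xlab = "L", ylab = "T")
```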
You have an error in your code, try this:
ggplot(data,aes(x=L, y=T))+geom_line()
Default arguments for aes are:
aes(x, y, ...)

R - Modified mosaic plot from descr package

I have a data frame db with 2 categorical variables: varA has 4 levels (0,1,2,3), varB has 2 levels (yes,no). varB has no values for the level 0 of varA:
id varA varB
1 2 yes
2 3 no
3 3 no
4 1 yes
5 0 NA
6 1 no
7 2 no
8 3 yes
9 3 yes
10 2 no
I created a contingency table using CrossTable from the descr package and then a mosaic plot with the plot function:
table <- CrossTable(db$varA,db$varB, missing.include=FALSE)
plot(table,xlab="varA",ylab="varB")
I obtained this plot:
I would like to eliminate the level 0 from the plot. I also would like to add 2 y-axis, one on the left of the plot with a scale from 0 to 1 and one on the right with a scale from 1 to 0.
Could you help me?
Well, that was annoying. There is no support for subsetting such a "CrossTable" object. If it were a well-behaved table-like object you would have been able to just pass table[ , -1] to the plot function. Instead you need to do the subsetting on the data before it is passed to CrossTable:
table <- with( na.omit(db), CrossTable( varA, varB, missing.include=TRUE))
plot(table, xlab="varA", ylab="varB")
BTW using the name table for a data-object is quite confusing to regular R users since the table function is one of our basic tools.
Personally I would avoid using that CrossTable function since its output is so weird and not available for management with typical R functions. Yeah, I know it produces a SAS-like output, but R users grow to love the compact output of the table function and the many matrix operations that are available for working with table-objects. You may need to get your margin percentages by hand with prop.table.
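A minimal sketch of that base-R route, using the ten rows from the question: drop the incomplete row, build an ordinary table, compute row proportions with prop.table, and draw the mosaic with mosaicplot from base graphics. The names db_complete, tab and row_props are introduced here for illustration, and the sketch does not attempt the two extra y-axes.

```r
# The ten rows from the question
db <- data.frame(
  id   = 1:10,
  varA = c(2, 3, 3, 1, 0, 1, 2, 3, 3, 2),
  varB = c("yes", "no", "no", "yes", NA, "no", "no", "yes", "yes", "no")
)

# Dropping the NA row also removes level 0 of varA
db_complete <- na.omit(db)

# An ordinary contingency table: varA in rows, varB in columns
tab <- table(varA = db_complete$varA, varB = db_complete$varB)

# Proportion of yes/no within each level of varA
row_props <- prop.table(tab, margin = 1)

mosaicplot(tab, main = "", xlab = "varA", ylab = "varB")
```

Because tab is a regular table object, it can be subset with the usual bracket indexing if further levels need to be dropped.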

Resources