boxplot doesn't show all the groups in R

I wrote this code to run an ANOVA on a simple data frame, and I want to draw a boxplot from it:
DF <- read.table('chromium.txt',header=TRUE)
Chromium.aov <- aov(Concentration ~ Lab,data=DF)
print(summary(Chromium.aov))
with(DF,boxplot(Concentration,Lab))
Here is the text file:
Lab Concentration
1 26.1
1 21.5
1 22.0
1 22.6
1 24.9
1 22.6
1 23.8
1 23.2
2 18.3
2 19.7
2 18.0
2 17.4
2 22.6
2 11.6
2 11.0
2 15.7
3 19.1
3 13.9
3 15.7
3 18.6
3 19.1
3 16.8
3 25.5
3 19.7
4 30.7
However, R only shows two box plots, for labs 1 and 2, not for labs 3 and 4. How can I fix this?

boxplot(DF$Concentration ~ DF$Lab)
The syntax you used makes one box from all the values of Concentration and another from the values of Lab.

When you call with(DF, boxplot(Concentration, Lab)), you are supplying two separate sets of values to plot: Concentration and Lab. What you want is to split Concentration by the unique values of Lab and then draw the boxplot:
boxplot(split(DF$Concentration, DF$Lab))
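One point neither answer raises: because Lab is read from the file as an integer, aov(Concentration ~ Lab) fits Lab as a numeric covariate (a single slope with 1 degree of freedom) rather than running a one-way ANOVA across labs. The sketch below rebuilds DF from the data exactly as posted (instead of reading chromium.txt), converts Lab to a factor before fitting, and uses the formula interface of boxplot() so there is one box per lab.

```r
# Data as posted in the question (lab 4 has a single observation in the post)
DF <- data.frame(
  Lab = rep(1:4, times = c(8, 8, 8, 1)),
  Concentration = c(26.1, 21.5, 22.0, 22.6, 24.9, 22.6, 23.8, 23.2,
                    18.3, 19.7, 18.0, 17.4, 22.6, 11.6, 11.0, 15.7,
                    19.1, 13.9, 15.7, 18.6, 19.1, 16.8, 25.5, 19.7,
                    30.7))

# Treat Lab as a grouping factor rather than a numeric covariate
DF$Lab <- factor(DF$Lab)

# One-way ANOVA: a factor with 4 levels uses 4 - 1 = 3 degrees of freedom
Chromium.aov <- aov(Concentration ~ Lab, data = DF)
print(summary(Chromium.aov))

# Formula interface: one box per level of Lab
boxplot(Concentration ~ Lab, data = DF, xlab = "Lab", ylab = "Concentration")
```

With Lab as a factor, the ANOVA table should show 3 degrees of freedom for Lab instead of 1.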

Related

What's wrong with this code using the permtest function in R?

ID pounds Drug
1 1 46.4 B
2 2 40.4 A
3 3 27.6 B
4 4 93.2 B
5 5 28.8 A
6 6 36.0 A
7 7 81.2 B
8 8 14.4 B
9 9 64.0 A
10 10 29.6 A
My code is
test <-permtest(data1$pounds[Drug=='A'],data1$pounds[Drug=='B'])
But I get the error: object 'Drug' not found.
Help!
We need to extract the column with $ or [[. Here R is searching for an object called 'Drug' in the global environment, but no such object exists there; Drug exists only as a column of data1. So either use $ (or [[):
permtest(data1$pounds[data1$Drug=='A'],data1$pounds[data1$Drug=='B'])
Or use with():
with(data1, permtest(pounds[Drug == 'A'], pounds[Drug == 'B']))
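The question does not say which package permtest() comes from, so as a hedged illustration of the same column-extraction pattern, here is a hand-rolled two-sample permutation test in base R. perm_diff_test() is a hypothetical helper, not the permtest() from the question: it repeatedly shuffles the group labels and compares the shuffled differences in means with the observed one.

```r
# The ten rows shown in the question
data1 <- data.frame(
  ID = 1:10,
  pounds = c(46.4, 40.4, 27.6, 93.2, 28.8, 36.0, 81.2, 14.4, 64.0, 29.6),
  Drug = c("B", "A", "B", "B", "A", "A", "B", "B", "A", "A"))

# Hypothetical helper: two-sided permutation test on the difference in means
perm_diff_test <- function(x, y, n_perm = 10000) {
  obs <- mean(x) - mean(y)
  pooled <- c(x, y)
  n_x <- length(x)
  perm <- replicate(n_perm, {
    idx <- sample(length(pooled), n_x)   # random relabelling of the groups
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # Adding 1 to numerator and denominator keeps the p-value above 0
  list(observed = obs,
       p.value = (sum(abs(perm) >= abs(obs)) + 1) / (n_perm + 1))
}

set.seed(1)
# Drug is found here because with() evaluates inside data1
res <- with(data1, perm_diff_test(pounds[Drug == "A"], pounds[Drug == "B"]))
res
```

The observed difference is mean(A) - mean(B) = 39.76 - 52.56 = -12.8; the p-value depends on the random permutations drawn.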

Time-series data visualization

I have a fairly large data frame in R stored in long form. It contains body temperature data collected from 40 individuals at 10-second intervals over 16 days. The individuals were exposed to two conditions (Cond1 and Cond2). It looks essentially like this:
ID Cond1 Cond2 Day ToD Temp
1 A B 1 18.0 37.1
1 A B 1 18.3 37.2
1 A B 2 18.6 37.5
2 B A 1 18.0 37.0
2 B A 1 18.3 36.9
2 B A 2 18.6 36.9
3 A A 1 18.0 36.8
3 A A 1 18.3 36.7
3 A A 2 18.6 36.7
...
I want to create four separate line plots, one for each combination of conditions (AB, BA, AA, BB), showing mean temperature over time (days 1-16).
P.S. ToD stands for time of day. I'm not sure whether I need it to create the plot.
So far I have tried defining the dataset as a time series:
ts <- ts(data=dataset$Temp, start=1, end=16, frequency=8640)
plot(ts)
This returns a plot of Temp, but I can't figure out how to define condition values for breaking up the data.
Edit:
Essentially I want a plot like the one I linked, but one for each group separately, using mean Temp values. That plot shows just one individual in one condition; I want one that shows the mean over all individuals in the same condition.
You can use group_by() and summarise() from dplyr to average Temp by condition and day, and then plot the result with ggplot2. Is this what you're looking for?
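library(dplyr)
library(ggplot2)
## I created a data frame df that matches the example rows in the question:
df <- data.frame(ID    = rep(1:3, each = 3),
                 Cond1 = rep(c("A", "B", "A"), each = 3),
                 Cond2 = rep(c("B", "A", "A"), each = 3),
                 Day   = rep(c(1, 1, 2), times = 3),
                 ToD   = rep(c(18.0, 18.3, 18.6), times = 3),
                 Temp  = c(37.1, 37.2, 37.5, 37.0, 36.9, 36.9, 36.8, 36.7, 36.7))
## Combine the two condition columns into one label (AB, BA, AA, BB)
df$Cond <- paste0(df$Cond1, df$Cond2)
## Mean temperature for each condition combination and day
d <- summarise(group_by(df, Cond, Day), t = mean(Temp))
## One coloured line per condition combination
ggplot(d, aes(Day, t, color = Cond)) + geom_line()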
This draws one line of daily mean temperature per condition combination, all on a single plot.
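If you want separate panels rather than one plot with coloured lines, a base-R sketch follows. It assumes the same long-format columns as the question, uses aggregate() in place of the dplyr calls, and draws one panel per condition combination in a 2 x 2 grid. Averaging by Day discards the within-day ToD detail, and the toy data only contain three of the four combinations, so only three panels appear here.

```r
# Toy data in the question's long format (values copied from the example rows)
df <- data.frame(ID    = rep(1:3, each = 3),
                 Cond1 = rep(c("A", "B", "A"), each = 3),
                 Cond2 = rep(c("B", "A", "A"), each = 3),
                 Day   = rep(c(1, 1, 2), times = 3),
                 Temp  = c(37.1, 37.2, 37.5, 37.0, 36.9, 36.9, 36.8, 36.7, 36.7))

# One label per combination of conditions, e.g. "AB"
df$Cond <- paste0(df$Cond1, df$Cond2)

# Mean temperature for every Cond/Day pair (base R, no extra packages)
d <- aggregate(Temp ~ Cond + Day, data = df, FUN = mean)

# One panel per condition combination; a 2 x 2 grid fits AA, AB, BA, BB
op <- par(mfrow = c(2, 2))
for (cond in sort(unique(d$Cond))) {
  sub <- d[d$Cond == cond, ]
  sub <- sub[order(sub$Day), ]
  plot(sub$Day, sub$Temp, type = "l", main = cond,
       xlab = "Day", ylab = "Mean temperature")
}
par(op)  # restore the previous layout
```

With ggplot2, adding facet_wrap(~ Cond) to the plot above is another way to split it into panels.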

How to change a column classed as NULL to class integer?

So I'm starting with a data frame called max.mins that has 153 rows.
day Tx Hx Tn
1 1 10.0 7.83 2.1
2 2 7.7 6.19 2.5
3 3 7.1 4.86 0.0
4 4 9.8 7.37 2.7
5 5 13.4 12.68 0.4
6 6 17.5 17.47 3.5
7 7 16.5 15.58 6.5
8 8 21.5 20.30 6.2
9 9 21.7 21.41 9.7
10 10 24.4 28.18 8.0
I'm applying these statements to the data frame to find rows that meet specific criteria:
temp_warnings <- subset(max.mins, Tx >= 32 & Tn >=20)
humidex_warnings <- subset(max.mins, Hx >= 40)
Now when I open humidex_warnings, for example, I see this data frame:
row.names day Tx Hx Tn
1 41 10 31.1 40.51 20.7
2 56 25 33.4 42.53 19.6
3 72 11 34.1 40.78 18.1
4 73 12 33.8 40.18 18.8
5 74 13 34.1 41.10 22.4
6 79 18 30.3 41.57 22.5
7 94 2 31.4 40.81 20.3
8 96 4 30.7 40.39 20.2
The next step is to search the row.names column for runs of 2 or 3 consecutive numbers and count how many times this occurs (I asked about this in a previous question and have a function that should work once this problem is sorted out). The issue is that row.names is of class NULL, which prevents me from applying further functions to this data frame.
Help? :)
Thanks in advance,
Nick
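humidex_warnings$row.names is NULL because the row names are stored as an attribute of the data frame, not as a column. If you need the row names as integer data, copy them into a column:
humidex_warnings$seq <- as.integer(row.names(humidex_warnings))
If you don't need the original row names, reset them to 1, 2, 3, ...:
row.names(humidex_warnings) <- NULL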
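Once the row names are integers, counting runs of consecutive numbers needs only diff() and cumsum(). This is a sketch, not the function from the earlier question: it hard-codes the row names shown above (in practice they would come from as.integer(row.names(humidex_warnings))) and assumes an "occurrence" means a maximal run of exactly 2 or 3 consecutive rows.

```r
# Row names of humidex_warnings as shown in the question (stored as character)
rn <- c("41", "56", "72", "73", "74", "79", "94", "96")
idx <- as.integer(rn)

# Start a new run wherever the gap to the previous row is not exactly 1
run_id <- cumsum(c(1, diff(idx) != 1))
run_len <- as.vector(table(run_id))

# Count runs of length 2 or 3 (assumption: longer runs are not counted)
n_runs <- sum(run_len %in% c(2, 3))
n_runs
```

For these rows the only run is 72, 73, 74, so n_runs is 1.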

plot 3D surface with R

I am having a problem plotting my data as a 3D surface using this script:
library(lattice)
wireframe(Z~X*Y, data=FI02, xlab="X", ylab="Y", main="Surface elevation", drape=TRUE,
colorkey=TRUE, screen=list(z=-60, x=-60))
The output is just an empty cube with no data or surface drawn. What was my mistake?
"X" "Y" "Z" "Plot"
552032.707 413894.885 10.8 2
552033.707 413896.585 13.4 2
552036.907 413899.685 18.5 2
552039.307 413898.085 10.5 2
552039.807 413894.585 11.2 2
552044.107 413894.985 9 2
552044.007 413895.035 11.5 2
552043.607 413896.985 13.4 2
552047.407 413897.885 8.2 2
552045.207 413898.985 10.7 2
552042.307 413902.085 9.4 2
552040.907 413902.885 12.5 2
552036.607 413901.585 11.4 2
552036.207 413901.435 12.4 2
552039.907 413905.285 18 2
552036.707 413906.585 9.7 2
552037.407 413908.785 6.3 2
552038.907 413911.085 7.5 2
552039.607 413911.285 16.8 2
552041.107 413908.985 9.5 2
552041.307 413910.385 14.5 2
552042.207 413909.985 9.3 2
552050.707 413911.985 12.5 2
552048.907 413909.985 18.6 2
552044.507 413906.585 6.7 2
552047.807 413904.085 6.8 2
552048.007 413904.285 12.8 2
552050.407 413903.885 9.7 2
552049.107 413909.785 5.2 2
552050.507 413910.785 12.5 2
552052.407 413908.685 16.5 2
552057.907 413910.385 10.3 2
552058.707 413909.785 18.5 2
552058.907 413910.485 12.4 2
552059.707 413908.385 15.3 2
552060.307 413910.785 7.2 2
552061.207 413911.985 11.8 2
552071.007 413912.185 17 2
552068.707 413911.385 8.3 2
552069.107 413910.885 15.5 2
552068.607 413908.485 8 2
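Try this to see why I don't think these data are well suited to wireframe(), which expects Z values on a regular grid of X and Y, whereas these points are irregularly scattered:
library(lattice)
cloud(Z~X+Y, data=FI02, xlab="X", ylab="Y", main="Surface elevation",
type="l", screen=list(z=-60, x=-60))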
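For scattered points like these, one option is to interpolate Z onto a regular X-Y grid first and then draw the surface. The sketch below uses simple inverse-distance weighting, a swapped-in technique chosen only because it needs nothing beyond base R; idw() is a hypothetical helper, and the power p = 2, the 30 x 30 grid and the use of only the first ten FI02 points are arbitrary choices for brevity.

```r
# First ten points from the question's FI02 table (subset for brevity)
pts <- data.frame(
  X = c(552032.707, 552033.707, 552036.907, 552039.307, 552039.807,
        552044.107, 552044.007, 552043.607, 552047.407, 552045.207),
  Y = c(413894.885, 413896.585, 413899.685, 413898.085, 413894.585,
        413894.985, 413895.035, 413896.985, 413897.885, 413898.985),
  Z = c(10.8, 13.4, 18.5, 10.5, 11.2, 9, 11.5, 13.4, 8.2, 10.7))

# Hypothetical helper: inverse-distance-weighted estimate of Z at (x0, y0)
idw <- function(x0, y0, x, y, z, p = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)
  if (any(d == 0)) return(z[which(d == 0)[1]])  # exact hit: use the observed value
  w <- 1 / d^p
  sum(w * z) / sum(w)
}

# Regular 30 x 30 grid spanning the observed X/Y range
gx <- seq(min(pts$X), max(pts$X), length.out = 30)
gy <- seq(min(pts$Y), max(pts$Y), length.out = 30)
zmat <- outer(gx, gy, Vectorize(function(a, b) idw(a, b, pts$X, pts$Y, pts$Z)))

# Base-graphics surface; rows of zmat follow gx, columns follow gy
persp(gx, gy, zmat, theta = -60, phi = 30, xlab = "X", ylab = "Y", zlab = "Z",
      main = "Surface elevation (IDW sketch)")
```

Because each grid value is a weighted average of observed Z values, the interpolated surface never goes above or below the observed range; a dedicated interpolation method may give smoother results.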

Reshaping a data frame with more than one measure variable

I'm using a data frame similar to this one:
df<-data.frame(student=c(rep(1,5),rep(2,5)), month=c(1:5,1:5),
quiz1p1=seq(20,20.9,0.1),quiz1p2=seq(30,30.9,0.1),
quiz2p1=seq(80,80.9,0.1),quiz2p2=seq(90,90.9,0.1))
print(df)
student month quiz1p1 quiz1p2 quiz2p1 quiz2p2
1 1 1 20.0 30.0 80.0 90.0
2 1 2 20.1 30.1 80.1 90.1
3 1 3 20.2 30.2 80.2 90.2
4 1 4 20.3 30.3 80.3 90.3
5 1 5 20.4 30.4 80.4 90.4
6 2 1 20.5 30.5 80.5 90.5
7 2 2 20.6 30.6 80.6 90.6
8 2 3 20.7 30.7 80.7 90.7
9 2 4 20.8 30.8 80.8 90.8
10 2 5 20.9 30.9 80.9 90.9
The data describe grades received by students over five months, in two quizzes with two parts each.
I need to put the two quizzes in separate rows, so that each student has two rows per month (one for each quiz) and two columns (one for each part of the quiz).
When I melt the table:
library(reshape2)
dfL <- melt(df, id.vars = c("student", "month"))
I get the two parts of each quiz in separate rows too. Casting it back with
dcast(dfL, student + month ~ variable)
of course gets me right back where I started, and I can't find a way to cast the table into the required form.
Is there a way to make the melt command work something like this?
melt.data.frame(df, measure.var1=c("quiz1p1","quiz2p1"),
measure.var2=c("quiz1p2","quiz2p2"))
Here's how you could do this with reshape(), from base R:
df2 <- reshape(df, direction="long",
idvar = 1:2, varying = list(c(3,5), c(4,6)),
v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))
## Checking the output
rbind(head(df2, 3), tail(df2, 3))
# student month time p1 p2
# 1.1.quiz1 1 1 quiz1 20.0 30.0
# 1.2.quiz1 1 2 quiz1 20.1 30.1
# 1.3.quiz1 1 3 quiz1 20.2 30.2
# 2.3.quiz2 2 3 quiz2 80.7 90.7
# 2.4.quiz2 2 4 quiz2 80.8 90.8
# 2.5.quiz2 2 5 quiz2 80.9 90.9
You can also use column names (instead of column numbers) for idvar and varying. It's more verbose, but seems like better practice to me:
## The same operation as above, using just column *names*
df2 <- reshape(df, direction="long", idvar=c("student", "month"),
varying = list(c("quiz1p1", "quiz2p1"),
c("quiz1p2", "quiz2p2")),
v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))
I think this does what you want:
#Break variable into two columns, one for the quiz and one for the part of the quiz
dfL <- transform(dfL, quiz = substr(variable, 1,5),
part = substr(variable, 6,7))
#Adjust your dcast call:
dcast(dfL, student + month + quiz ~ part)
#-----
student month quiz p1 p2
1 1 1 quiz1 20.0 30.0
2 1 1 quiz2 80.0 90.0
3 1 2 quiz1 20.1 30.1
...
18 2 4 quiz2 80.8 90.8
19 2 5 quiz1 20.9 30.9
20 2 5 quiz2 80.9 90.9
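A caveat on the substr() approach: it relies on fixed character positions, so a name such as quiz10p1 would be split incorrectly. A regex-based split with base sub() avoids that. The vector below is a hypothetical example that includes such a name; the same two sub() calls could replace the substr() calls inside transform(), since sub() coerces a factor column like variable to character.

```r
# Hypothetical variable names, including a two-digit quiz number
vars <- c("quiz1p1", "quiz1p2", "quiz2p1", "quiz10p2")

# Strip the trailing part label to get the quiz, and the leading quiz label to get the part
quiz <- sub("p[0-9]+$", "", vars)
part <- sub("^quiz[0-9]+", "", vars)

data.frame(vars, quiz, part)
```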
There was a very similar question asked about half a year ago, in which I wrote the following function:
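melt.wide = function(data, id.vars, new.names) {
  require(reshape2)
  require(stringr)
  # Standard melt: one row per id combination and measure column
  data.melt = melt(data, id.vars=id.vars)
  # Extract every run of digits from each variable name, e.g. "quiz1p2" -> "1", "2"
  new.vars = data.frame(do.call(
    rbind, str_extract_all(data.melt$variable, "[0-9]+")))
  names(new.vars) = new.names
  # Attach the extracted pieces as new columns
  cbind(data.melt, new.vars)
}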
You can use the function to "melt" your data as follows:
dfL <-melt.wide(df, id.vars=1:2, new.names=c("Quiz", "Part"))
head(dfL)
# student month variable value Quiz Part
# 1 1 1 quiz1p1 20.0 1 1
# 2 1 2 quiz1p1 20.1 1 1
# 3 1 3 quiz1p1 20.2 1 1
# 4 1 4 quiz1p1 20.3 1 1
# 5 1 5 quiz1p1 20.4 1 1
# 6 2 1 quiz1p1 20.5 1 1
tail(dfL)
# student month variable value Quiz Part
# 35 1 5 quiz2p2 90.4 2 2
# 36 2 1 quiz2p2 90.5 2 2
# 37 2 2 quiz2p2 90.6 2 2
# 38 2 3 quiz2p2 90.7 2 2
# 39 2 4 quiz2p2 90.8 2 2
# 40 2 5 quiz2p2 90.9 2 2
Once the data are in this form, it is much easier to use dcast() to get whatever shape you want. For example:
head(dcast(dfL, student + month + Quiz ~ Part))
# student month Quiz 1 2
# 1 1 1 1 20.0 30.0
# 2 1 1 2 80.0 90.0
# 3 1 2 1 20.1 30.1
# 4 1 2 2 80.1 90.1
# 5 1 3 1 20.2 30.2
# 6 1 3 2 80.2 90.2
