R matplot function

As I am a beginner in R, allow me to ask R users a little question.
I want to represent in a graphic (points, lines, curves) the weight values of two groups of humans, treated and not treated with a drug (1, 0), measured ten times (months).
drug NumberIndividu Mar Apr May June July August September November October December
1 9 25.92 24.6 31.85 38.50 53.70 53.05 65.65 71.45 69.10 67.20
1 10 28.10 26.6 32.00 38.35 53.60 53.25 65.35 65.95 67.80 65.95
1 11 29.10 28.8 30.80 38.10 52.25 47.30 62.20 68.05 66.20 67.55
1 13 27.16 25.0 27.15 34.85 47.30 43.85 54.65 62.25 60.85 58.05
0 5 25.89 25.2 26.50 27.45 37.05 38.95 43.30 50.60 48.20 50.10
0 6 28.19 27.6 28.05 28.60 36.15 37.20 40.40 47.80 45.25 44.85
0 7 28.06 27.2 27.45 28.85 39.20 41.80 51.40 57.10 54.55 55.30
0 8 22.39 21.2 30.10 30.90 42.95 46.30 48.15 54.85 53.35 49.90
I tried:
w <- read.csv(file = "/file-weight.csv", header = TRUE, sep = ",")
# use the (unique) individual numbers as row names; the drug column has
# duplicates, so rownames(w) <- w[, 1] would fail
rownames(w) <- w$NumberIndividu
# colour each row by treatment group
cols <- character(nrow(w))
cols[w$drug == 1] <- "blue"
cols[w$drug == 0] <- "red"
pairs(w, col = cols)
My question is how to configure the matplot function to get a single graphic view (points, curves, or histograms plus curves).
My main goal is to visualize the distributions of all individuals, coloured by the first column (drug), for all dates in one image.
Thanks a lot for your suggestions.

Is this what you had in mind?
The code is based on the answer to this question, just using your dataset (df) instead of iris. So in that response, replace:
x <- with(iris, data.table(id=1:nrow(iris), group=Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))
with:
xx <- with(df, data.table(id=1:nrow(df), group=drug, df[3:12]))
(both lines assume library(data.table) is loaded, and df is the data frame read in above)
If all you want is the density plots/histograms, it's easier (see below). The two views are complementary: they show that weight is increasing in both the control and test groups, just faster in the test group, something you wouldn't pick up from the scatterplot matrix. There's also the suggestion that variability in weight is greater in the control group and grows over time.
library(ggplot2)
library(reshape2)   # for melt(...)
# convert df into a form suitable to use with ggplot
gg <- melt(df, id = 1:2, variable.name = "Month", value.name = "Weight")
# density plots
ggplot(gg) +
  stat_density(aes(x = Weight, y = ..scaled.., color = factor(drug)),
               geom = "line", position = "dodge") +
  facet_grid(Month ~ .) +
  scale_color_discrete("Drug")
# histograms
ggplot(gg) +
  geom_histogram(aes(x = Weight, fill = factor(drug)), position = "dodge") +
  facet_grid(Month ~ .) +
  scale_fill_discrete("Drug")

Related

read.csv error due to no column names (R)

I'm trying to read a csv file in R.
The issue is that my file has no column names except for the first column.
Using the read.csv() function gives me the error 'Error in read.table : more columns than column names'.
So I used the read_csv() function from the readr library.
However, this creates a df with just one column containing all the values.
(https://i.stack.imgur.com/Och8A.png)
What should I do to fix this issue?
A first cut at reading the data would be to use skip=1 (to not read in the first line, which appears to be descriptive only) and header=FALSE:
quux <- read.csv("path/to/file.csv", skip = 1, header = FALSE)
I find this format a bit awkward, so we may want to reshape it a bit.
quux <- setNames(data.frame(t(quux[,-1])), sub(":$", "", quux[[1]]))  # transpose, then use the first column (minus the trailing colons) as names
quux
# LON LAT MMM 1984-Nov-01 1974-Nov-05
# V2 151.0 -24.5 27.11 22.28 22.92
# V3 151.5 -24.0 27.46 22.47 22.83
# V4 152.0 -24.0 27.19 22.27 22.64
Many tools prefer to have the "month" column names as a single column, which means converting this data from "wide" format to "long" format. This is easily done with either tidyr::pivot_longer or reshape2::melt:
dat <- reshape2::melt(quux, c("LON", "LAT", "MMM"), variable.name = "date")
dat
# LON LAT MMM date value
# 1 151.0 -24.5 27.11 1984-Nov-01 22.28
# 2 151.5 -24.0 27.46 1984-Nov-01 22.47
# 3 152.0 -24.0 27.19 1984-Nov-01 22.27
# 4 151.0 -24.5 27.11 1974-Nov-05 22.92
# 5 151.5 -24.0 27.46 1974-Nov-05 22.83
# 6 152.0 -24.0 27.19 1974-Nov-05 22.64
dat <- tidyr::pivot_longer(quux, -c(LON, LAT, MMM), names_to = "date")
From here, it might be nice to have the date column be a "proper" Date object so that "number-like" things can be done with it. For example, in its present form, sorting is incorrect since Apr will land before Jan; other number-like operations include finding ranges of dates (which can be done with strings, but not these strings) and adding/subtracting days (e.g., 7 days prior to a value).
dat$date <- as.Date(dat$date, format = "%Y-%b-%d")
dat
# LON LAT MMM date value
# 1 151.0 -24.5 27.11 1984-11-01 22.28
# 2 151.5 -24.0 27.46 1984-11-01 22.47
# 3 152.0 -24.0 27.19 1984-11-01 22.27
# 4 151.0 -24.5 27.11 1974-11-05 22.92
# 5 151.5 -24.0 27.46 1974-11-05 22.83
# 6 152.0 -24.0 27.19 1974-11-05 22.64
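As a quick illustration of those "number-like" operations on dat:
sort(dat$date)     # chronological order, not alphabetical
range(dat$date)    # earliest and latest dates
dat$date - 7       # seven days prior to each date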
Sample data:
quux <- read.csv(skip = 1, header = FALSE, text = '
LON:,151.0,151.5,152.0
LAT:,-24.5,-24.0,-24.0
MMM:,27.11,27.46,27.19
1984-Nov-01,22.28,22.47,22.27
1974-Nov-05,22.92,22.83,22.64
')

Heatmap in R with geom_raster

I am trying to plot a heat map from data with three variables. I am using ggplot with geom_raster, but it doesn't seem to work, and I am unable to see what's going wrong.
library(tidyverse)
p <- read.csv("Rheatmaptest.csv", header = TRUE)
p
xdir ydir Category.1 Category.2 Category.3 Category.4
1 -10.731 10.153 0.61975 3.2650 0.19025 13.00
2 -21.462 9.847 1.77000 3.2475 0.56325 16.70
3 -32.193 9.847 1.65500 2.9900 0.51325 176.00
4 -42.924 10.000 1.34500 3.1800 0.41350 177.00
5 -16.770 20.000 0.69600 3.4975 0.22150 174.00
6 -33.541 20.000 0.68700 3.4275 0.20250 4.24
7 -50.311 20.000 0.77350 3.1575 0.24250 177.00
8 -67.082 20.000 1.09600 3.5350 0.34600 163.00
9 -18.689 30.000 0.54250 3.5875 0.18100 160.00
10 -37.378 30.000 0.63075 3.7125 0.19300 158.00
11 -56.067 30.000 0.71975 3.5425 0.22225 2.26
12 -74.756 30.000 0.79100 3.3750 0.23000 8.24
13 -20.000 40.000 0.76650 3.7200 0.24375 167.00
14 -40.000 40.000 0.68325 3.5300 0.21350 155.00
15 -60.000 40.000 0.81075 3.3400 0.25325 145.00
16 -80.000 40.000 0.68800 3.6375 0.21350 146.00
17 -19.521 50.000 0.67900 3.7150 0.21700 167.00
18 -39.043 50.000 0.69500 3.7950 0.21225 109.00
19 -58.564 49.847 0.68300 3.5575 0.20700 166.00
20 -78.085 50.000 0.67375 3.5325 0.21975 163.00
21 -17.562 60.000 0.64350 3.7025 0.19475 140.00
22 -35.585 60.000 0.56650 3.5250 0.17775 34.30
23 -54.067 60.000 0.82350 3.7700 0.24525 129.00
24 -72.090 60.000 0.85450 3.6675 0.28225 156.00
25 -15.522 70.000 0.59100 3.3475 0.18875 144.00
26 -31.044 69.847 0.56200 3.7975 0.17250 159.00
27 -46.566 70.000 0.79375 3.5350 0.24975 145.00
28 -62.088 70.000 0.64275 3.6100 0.20375 132.00
29 -11.040 80.000 0.75875 3.7450 0.23925 138.00
30 -22.081 80.000 0.81900 3.3875 0.25975 144.00
31 -33.121 80.000 0.72725 3.5825 0.22175 132.00
32 -44.161 80.000 0.83300 3.5550 0.27000 177.00
33 -4.522 90.000 1.77500 3.1250 0.57200 16.30
34 -9.440 90.000 0.96925 3.7200 0.31000 163.00
35 -13.106 90.000 0.76975 3.6600 0.23800 3.50
36 -18.089 90.000 0.86050 3.6750 0.26650 80.50
ggplot(p, aes(x = xdir, y = ydir)) +
  geom_raster(aes(fill = Category.1), interpolate = TRUE) +
  scale_fill_gradient2(limits = c(0.5, 2), low = "blue", mid = "yellow",
                       high = "red", midpoint = 1)
I am able to see points when I use geom_point instead of geom_raster. Even with geom_raster, I just see very tiny points at the corresponding locations. Interpolate doesn't seem to work.
Am I missing something?
The implied precision of your data is causing your rasters to be plotted so small they are barely visible: geom_raster sizes each tile from the spacing between neighbouring values, and your x values are not on a regular grid.
By reducing the precision, you can at least see your raster plot, though it is still probably not very useful. Posting this I see I came to the same solution as @tifu.
p %>%   # the data frame is called p above
  ggplot(aes(x = round(xdir / 2), y = round(ydir))) +
  geom_raster(aes(fill = Category.1)) +
  scale_fill_gradient2(limits = c(0.5, 2), low = "blue", mid = "yellow",
                       high = "red", midpoint = 1)
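If you would rather keep the original, irregular coordinates, a hedged alternative is geom_tile, which accepts explicit tile dimensions; the width and height below are guesses you would tune to your grid spacing:
ggplot(p, aes(x = xdir, y = ydir)) +
  geom_tile(aes(fill = Category.1), width = 10, height = 10) +   # assumed tile size
  scale_fill_gradient2(limits = c(0.5, 2), low = "blue", mid = "yellow",
                       high = "red", midpoint = 1)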

R: identifying the first value in a data frame and creating a new variable by adding/subtracting it from all values in a new column

I know this question may already have been answered elsewhere; apologies for repeating it if so, but I haven't found a workable answer yet.
I have 17 subjects, each with two variables as below:
Time (s) OD
130 41.48
130.5 41.41
131 39.6
131.5 39.18
132 39.41
132.5 37.91
133 37.95
133.5 37.15
134 35.5
134.5 36.01
135 35.01
I would like R to identify the first value in column 2 (OD) of my data frame and create a new column (OD_adjusted) by adding or subtracting it (depending on whether the first value is positive or negative) from all values in column 2, so it would look like this:
Time (s) OD OD_adjusted
130 41.48 0
130.5 41.41 -0.07
131 39.6 -1.88
131.5 39.18 -2.3
132 39.41 -2.07
132.5 37.91 -3.57
133 37.95 -3.53
133.5 37.15 -4.33
134 35.5 -5.98
134.5 36.01 -5.47
135 35.01 -6.47
The first value in column 2 is 41.48, therefore I want to subtract this value from all data points in column 2 to create a new third column (OD_adjusted).
I can use OD_adjusted <- ((df$OD) - 41.48); however, I would like to automate the process using a function, and this is where I am stuck:
library(dplyr)

AUC_OD <- function(df) {
  df %>%
    mutate(OD_adjusted = OD - OD[1])   # subtract the first OD value from every row
}
We get the first element of 'OD' and subtract it from the column:
library(dplyr)
df1 %>%
  mutate(OD_adjusted = OD - OD[1])
Or using base R
df1$OD_adjusted <- with(df1, OD - OD[1])
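Since there are 17 subjects, you will probably want the adjustment done within each subject rather than over the whole data frame. A hedged sketch, assuming a subject identifier column named Subject (the name is hypothetical):
library(dplyr)
df1 %>%
  group_by(Subject) %>%                  # hypothetical subject ID column
  mutate(OD_adjusted = OD - OD[1]) %>%   # first OD within each subject
  ungroup()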

Error message in Treeclim

I'm attempting to use the treeclim package to analyze my tree-ring growth data and climate. I measured the widths in CooRecorder, grouped them into series in CDENDRO, and read them into RStudio using the read.rwl function from dplR. However, I keep getting an error message reading:
"Error in dcc(Plot92.crn, Site92PRISM, selection = -6:9, method = "response", :
Overlapping time span of chrono and climate records is smaller than number of parameters! Consider adapting the number of parameters to a maximum of 100."
I have 100 years of monthly climate data that looks like below:
# head(Site92PRISM)
year month ppt tmax tmin tmean vpdmin..hPa. vpdmax..hPa. site
1 1915 01 0.97 26.1 12.3 19.2 0.97 2.32 92
2 1915 02 1.20 31.5 16.2 23.9 1.03 3.30 92
3 1915 03 2.51 36.0 17.0 26.5 0.97 4.69 92
4 1915 04 3.45 48.9 26.3 37.6 1.14 8.13 92
5 1915 05 3.95 44.6 29.1 36.9 0.94 5.58 92
6 1915 06 6.64 51.0 31.5 41.3 1.04 7.93 92
And my chronology, made in dplR looks like below:
#head(Plot92.crn)
CAMstd samp.depth
1840 0.7180693 1
1841 0.3175528 1
1842 0.5729651 1
1843 0.9785082 1
1844 0.7676334 1
1845 0.3633687 1
Where am I going wrong? Both files contain data from 1915-2015.
I posted a similar question to the author in the package's Google forum (i.e. https://groups.google.com/forum/#!forum/treeclim).
What you need to make sure of is that the number of parameters (n_param) is less than or equal to the sample size of your dendrochronological data. By 'number of parameters' I mean the total number of columns in the climate variable matrices.
For instance, in the following analysis:
resp <- dcc(chrono = my_chrono,
            climate = list(precip, temp),
            boot = 'stationary')
You need to make sure that the following is TRUE:
length(unique(rownames(my_chrono))) >= (ncol(precip)-1) + (ncol(temp)-1)
ncol(precip)-1 and not ncol(precip), because the first column of the matrix is YEAR. Also note that in my example the years in my_chrono are the same years as in precip and temp, which doesn't have to be the case to run the function (it will automatically take the common years).
Finally, if the previous line of code gives you FALSE, you can reduce the number of parameters with the selection argument, like this:
resp <- dcc(chrono = my_chrono,
            climate = list(precip, temp),
            selection = .range(6:12, 'prec') + .range(6:12, 'temp'),
            var_names = c('prec', 'temp'),
            boot = 'stationary')
Because the dcc function automatically takes all months from the previous June to the current September (i.e. selection = -6:9), you may need to reduce that range.
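Applied to the objects in the question, a hedged sketch of the same fix might look like the following; ppt and tmean are column names from the head(Site92PRISM) output, and the month range is only an example chosen to shrink the parameter count:
resp <- dcc(chrono = Plot92.crn,
            climate = Site92PRISM,
            selection = .range(3:9, 'ppt') + .range(3:9, 'tmean'),
            method = "response",
            boot = 'stationary')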

How to find correlation coefficients for non-numeric variables

Species Gender Weight Corneal.Diameter.Avg
Great Grey M 971 19.5
Great Grey F 1209 19.0
Great Grey M 952 20.5
Great Grey F 1793 20.5
Snowy M 1658 22.0
Snowy F 1899 22.75
Snowy F 1975 24.50
Snowy M 1646 23.00
Okay, so I have this data set. I want to see if there is a correlation between the male corneal diameter average and the female corneal diameter average. I'm not sure how to do this in R. I tried to create a subset for males and females and then use cor(x, y) to get the correlation coefficient, but it's not working. Any help would be appreciated!
Use Spearman rank correlation. If x and y are the two vectors, which may or may not be numeric:
cor(rank(x), rank(y))
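For this data set, a minimal sketch might look like the following; the data frame name owls and the column name Corneal.Diameter.Avg are assumptions based on the table above. Note that cor() needs the two vectors to be the same length, which holds here (four males, four females):
# hypothetical names: adjust to your actual data frame and columns
males   <- subset(owls, Gender == "M")$Corneal.Diameter.Avg
females <- subset(owls, Gender == "F")$Corneal.Diameter.Avg
cor(rank(males), rank(females))
# equivalently, for numeric vectors:
cor(males, females, method = "spearman")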
