How to join scatterplots together using ggplot in R?

I have been trying to join scatterplots together in one table of data:
The data is:
row Var1 Var2 Freq
1 0 Good 17
2 1 Good 479
3 2 Good 455
4 3 Good 273
5 4 Good 155
6 5 Good 9
7 0 Average 81
8 1 Average 2449
9 2 Average 4627
10 3 Average 3261
11 4 Average 3142
12 5 Average 110
13 0 Bad 74
14 1 Bad 1472
15 2 Bad 3881
16 3 Bad 3399
17 4 Bad 5431
18 5 Bad 188
Joining the points in the following plot is the problem: the lines are not drawn, and I get the chart attached. I want the points of each scatterplot linked together by a line.
(Edit) The code for the plot is below:
ggplot(dfs, aes(Var1, Freq, colour=Var2))+geom_line()+geom_point()
The result should look something like this plot, which I joined successfully using the same data but a different variable on the x axis:
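A possible cause, sketched under an assumption: the column names Var1, Var2 and Freq are what as.data.frame(table(...)) produces, in which case Var1 is a factor. With a factor on the x axis, ggplot2 places every x level in its own group, so geom_line() has nothing to connect. Setting the group aesthetic explicitly is the usual remedy. The data frame below is rebuilt from the table in the question; treating Var1 as a factor is the assumption.

```r
library(ggplot2)

# Data rebuilt from the question; Var1 as a factor is an assumption
dfs <- data.frame(
  Var1 = factor(rep(0:5, times = 3)),
  Var2 = rep(c("Good", "Average", "Bad"), each = 6),
  Freq = c(17, 479, 455, 273, 155, 9,
           81, 2449, 4627, 3261, 3142, 110,
           74, 1472, 3881, 3399, 5431, 188)
)

# group = Var2 tells geom_line() to draw one line per Var2 level
p <- ggplot(dfs, aes(Var1, Freq, colour = Var2, group = Var2)) +
  geom_line() +
  geom_point()
print(p)
```

If Var1 is really numeric in the original data, converting it with as.numeric(as.character(Var1)) would be another way to get connected lines.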

Related

How to write a loop for this case in R?

I have a dataset with 121 rows and about 10 columns. One of these columns corresponds to Station, another to depth, and the rest to chemical variables (temperature, salinity, etc.). I want to calculate the integrated value of these chemical properties by station, using the function oce::integrateTrapezoid. It's my first time writing a loop, so I don't know how. Could you help me?
dA<-matrix(data=NA, nrow=121, ncol=3)
for (Station in unique(datos$Station))
{dA[Station, cd] <- integrateTrapezoid(cd, Profundidad..m., "cA")
}
Station Depth temp
1 10 28
1 50 25
1 100 15
1 150 10
2 9 27
2 45 24
2 98 14
2 152 11
3 11 28.7
3 48 23
3 102 14
3 148 9
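One way to avoid an explicit loop is to split the data by station and integrate each piece. The sketch below uses the sample rows above and a small base R trapezoid helper, trapz(), which is a hypothetical stand-in so the example runs without oce installed. The column names Station, Depth and temp follow the sample table, not the real data (where depth appears to be called Profundidad..m.).

```r
# Sample rows from the question
datos <- data.frame(
  Station = rep(1:3, each = 4),
  Depth   = c(10, 50, 100, 150, 9, 45, 98, 152, 11, 48, 102, 148),
  temp    = c(28, 25, 15, 10, 27, 24, 14, 11, 28.7, 23, 14, 9)
)

# Hypothetical helper: trapezoid rule for y over x
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# One integrated value per station
integrated <- sapply(split(datos, datos$Station), function(d) {
  d <- d[order(d$Depth), ]   # integrate from shallow to deep
  trapz(d$Depth, d$temp)
})
integrated
```

If oce is available, the helper could presumably be replaced by oce::integrateTrapezoid(d$Depth, d$temp); it is worth checking its help page for the type argument, since a cumulative type returns a vector per station instead of a single number. Repeating the sapply() call for each chemical column gives the other integrals.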

How to write code for Level 2 data for Multilevel Modeling using nlme package

I am struggling with how to describe level 2 data in my Multilevel Model in R.
I am using the nlme package.
I have longitudinal data with repeated measures. I have repeated observations for every subject across many days.
The Goal:
Level 1 would be the individual observations within the subject ID
Level 2 would be the differences between overall means between subject IDs (Cluster).
I am trying to determine if Test scores are significantly affected by study time, and to see if it's significantly different within subjects and between subjects.
How would I write the script if I want to do "Between Subjects" ?
Here is my script for Level 1 Model
model1 <- lme(fixed = TestScore~Studytime, random =~1|SubjectID, data=dataframe, na.action=na.omit)
Below is my example dataframe
`Subject ID` Observations TestScore Studytime
1 1 1 50 600
2 1 2 72 900
3 1 3 82 627
4 1 4 90 1000
5 1 5 81 300
6 1 6 37 333
7 2 1 93 900
8 2 2 97 1000
9 2 3 99 1200
10 2 4 85 600
11 3 1 92 800
12 3 2 73 900
13 3 3 81 1000
14 3 4 96 980
15 3 5 99 1300
16 4 1 47 600
17 4 2 77 900
18 4 3 85 950
I appreciate the help!
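One common way to separate the two effects, offered here as a sketch and not as the only valid specification, is to split Studytime into a subject mean (between subjects) and a deviation from that mean (within subjects), then enter both as fixed effects. The names StudyMean and StudyDev are made up for this illustration; the data frame is rebuilt from the example rows above, with the ID column named SubjectID as in the model call.

```r
library(nlme)

# Example rows from the question
dataframe <- data.frame(
  SubjectID = rep(1:4, times = c(6, 4, 5, 3)),
  TestScore = c(50, 72, 82, 90, 81, 37, 93, 97, 99, 85,
                92, 73, 81, 96, 99, 47, 77, 85),
  Studytime = c(600, 900, 627, 1000, 300, 333, 900, 1000, 1200, 600,
                800, 900, 1000, 980, 1300, 600, 900, 950)
)

# Level 2 predictor: each subject's mean study time (hypothetical name)
dataframe$StudyMean <- ave(dataframe$Studytime, dataframe$SubjectID)
# Level 1 predictor: deviation from the subject's own mean (hypothetical name)
dataframe$StudyDev <- dataframe$Studytime - dataframe$StudyMean

model2 <- lme(fixed = TestScore ~ StudyDev + StudyMean,
              random = ~ 1 | SubjectID,
              data = dataframe, na.action = na.omit)
summary(model2)
```

In this parameterisation the StudyDev coefficient describes the within-subject association and the StudyMean coefficient the between-subject one. With only four subjects in the example, the between-subject estimate is of course very imprecise.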

R Merge part of table into one column with sum

I have the following table in R:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
162 148 108 93 67 83 44 53 37 47 25 34 17 22 11 11 5
I want to divide it into 7 parts titled 1, 2, 3, 4, 5, 6 and "7 & greater", combining all the counts from 7 onwards into the last one.
I have looked at aggregate and tapply, but neither seems to be the right function for this.
x <- c(x[1:6], "7 and above"=sum(x[-(1:6)]))
1 2 3 4 5 6 7 and above
162 148 108 93 67 83 306
data
x <- table(rep(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17), c(162,148,108,93,67,83,44,53,37,47,25,34,17,22,11,11,5)))
If you are using table to generate the output above, you can use pmin to take the minimum of each value in your data and 7, and then use table to count the frequencies.
Assuming your data frame is called df and the column name is col_name, you can do:
tab <- table(pmin(df$col_name, 7))
The count under 7 then includes all the values of 7 and above. You can rename it to make that clearer.
names(tab)[7] <- '7&above'
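As a quick check, the two approaches above can be run side by side on the counts from the question. The raw vector name ages is an assumption made for this sketch; the original post only shows the tabulated counts.

```r
# Raw values rebuilt from the counts in the question (name `ages` is hypothetical)
ages <- rep(1:17, times = c(162, 148, 108, 93, 67, 83, 44, 53, 37,
                            47, 25, 34, 17, 22, 11, 11, 5))
x <- table(ages)

# Approach 1: collapse an existing table
merged <- c(x[1:6], "7 and above" = sum(x[-(1:6)]))

# Approach 2: cap the raw values at 7 before tabulating
tab <- table(pmin(ages, 7))
names(tab)[7] <- "7 and above"

merged
tab
```

Both give the same seven counts; which one to use depends on whether the raw values or only the table are at hand.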

Creating a summary dataset with multiple objects and multiple observations per object

I have a dataset with the reports from a local shop, where each line has a client's ID, date of purchase and total value per purchase.
I want to create a new plot where for each client ID I have all the purchases in the last month or even just sample purchases in a range of dates I choose.
The main problem is that certain customers might buy once a month, while others can come daily - so the number of observations per period of time can vary.
I have tried subsetting my dataset to a specific range of time, but either I choose a specific date - and then I only get a small % of all customers, or I choose a range and get multiple observations for certain customers.
(In this case - I wouldn't mind getting the earliest observation)
An important note: I know how to create a for loop to solve this problem, but since the dataset is over 4 million observations it isn't practical since it would take an extremely long time to run.
A basic example of what the dataset looks like:
ID Date Sum
1 1 1 234
2 1 2 45
3 1 3 1
4 2 4 223
5 3 5 546
6 4 6 12
7 2 1 20
8 4 3 30
9 6 2 3
10 3 5 45
11 7 6 456
12 3 7 65
13 8 8 234
14 1 9 45
15 3 2 1
16 4 3 223
17 6 6 546
18 3 4 12
19 8 7 20
20 9 5 30
21 11 6 3
22 12 6 45
23 14 9 456
24 15 10 65
....
And the new data set would look something like this:
ID 1Date 1Sum 2Date 2Sum 3Date 3Sum
1 1 234 2 45 3 1
2 1 20 4 223 NA NA
3 2 1 5 546 5 45
...
Thanks for your help!
I think you can do this with a bit of help from dplyr and tidyr:
library(dplyr)
library(tidyr)
dd %>% arrange(ID, Date) %>%   # earliest purchase first within each client
group_by(ID) %>% mutate(seq = row_number()) %>% ungroup() %>%
pivot_wider(id_cols = ID, names_from = seq, values_from = c(Date, Sum))
Where dd is your sample data frame above.
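For the simpler variant mentioned in the question (a chosen date range, keeping only the earliest purchase per client), a vectorised base R sketch could look like the following. The range limits from and to are hypothetical values, and dd here holds only the first ten example rows.

```r
# First ten example rows from the question
dd <- data.frame(
  ID   = c(1, 1, 1, 2, 3, 4, 2, 4, 6, 3),
  Date = c(1, 2, 3, 4, 5, 6, 1, 3, 2, 5),
  Sum  = c(234, 45, 1, 223, 546, 12, 20, 30, 3, 45)
)

from <- 2   # hypothetical start of the date range
to   <- 5   # hypothetical end of the date range

# Keep purchases inside the range, sorted by client and date
in_range <- dd[dd$Date >= from & dd$Date <= to, ]
in_range <- in_range[order(in_range$ID, in_range$Date), ]

# First row per client after sorting is the earliest purchase in the range
earliest <- in_range[!duplicated(in_range$ID), ]
earliest
```

No loop is involved, so this should scale to a few million rows without trouble; clients with no purchase in the range simply do not appear in the result.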

creating heatmap of the Oxboys dataset (from nlme)

Hi all: I am currently working on a dataset that is close to the Oxboys data in structure. What I would like to achieve is a heatmap that shows, for all 26 boys, whether the percentage increase in their height at each of the 9 occasions was the same as the group average, higher than average, or lower (Amber, Green, Red respectively). So, 26 rows and 8 columns, with R-A-G in each intersecting cell. This is what I believe I need to do:
create a vector with the actual percentage increase in height (the % increase on the 2nd Occasion relative to the first, and so on)
calculate the average for each Occasion increase
write this into a matrix
use ggheat to create a heatmap
I need direction, advice, resources that I can look up to initiate this.
many thanks..
here's first 18 rows of the data
Subject numbers refer to the students // Occasion is the sequence of time points at which height measurements were taken
> head(ox_b, 18)
Subject height Occasion
1 1 140.5 1
2 1 143.4 2
3 1 144.8 3
4 1 147.1 4
5 1 147.7 5
6 1 150.2 6
7 1 151.7 7
8 1 153.3 8
9 1 155.8 9
10 2 136.9 1
11 2 139.1 2
12 2 140.1 3
13 2 142.6 4
14 2 143.2 5
15 2 144.0 6
16 2 145.8 7
17 2 146.8 8
18 2 148.3 9
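The first three steps listed above can be sketched in base R on the 18 rows shown. The tolerance tol that decides what counts as "the same as average" is a hypothetical parameter, since the question does not define it, and with only two boys in the sample every cell will be Green for one boy and Red for the other.

```r
# The 18 rows shown in the question
ox_b <- data.frame(
  Subject  = rep(1:2, each = 9),
  height   = c(140.5, 143.4, 144.8, 147.1, 147.7, 150.2, 151.7, 153.3, 155.8,
               136.9, 139.1, 140.1, 142.6, 143.2, 144.0, 145.8, 146.8, 148.3),
  Occasion = rep(1:9, times = 2)
)
ox_b <- ox_b[order(ox_b$Subject, ox_b$Occasion), ]

# Step 1: % increase relative to the previous occasion (NA at occasion 1)
ox_b$pct <- ave(ox_b$height, ox_b$Subject,
                FUN = function(h) c(NA, 100 * diff(h) / head(h, -1)))

# Step 3: subjects in rows, occasions 2..9 in columns
growth  <- ox_b[ox_b$Occasion > 1, ]
pct_mat <- tapply(growth$pct, list(growth$Subject, growth$Occasion), mean)

# Step 2: group average per occasion, then compare each boy to it
avg      <- colMeans(pct_mat)
diff_mat <- sweep(pct_mat, 2, avg)

tol <- 0.01   # hypothetical tolerance, in percentage points
rag <- ifelse(diff_mat > tol, "Green",
              ifelse(diff_mat < -tol, "Red", "Amber"))
rag
```

The rag matrix can then be reshaped to long form and drawn with any tile-based plot, for example ggplot2::geom_tile() with a manual fill scale for the three colours.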
