Converting SAS code to R code

I have been trying to convert SAS code that fits a simple regression and a mixed model. I managed to convert the simple regression, but my attempts at the mixed model have failed. The SAS code shown below is the code I am trying to convert.
"parc", "m", "dap" and "ht" are the column headers of the dataset, in that order.
data algoritmo ;
input parc m dap ht ;
lnH = LOG(ht-1.3);
lnD = LOG(dap) ;
cards ;
8 1 24.3 26.7
8 1 29.9 30.7
8 1 32.6 31.7
8 1 35.9 33.7
8 1 36.5 32.5
22 2 22.3 21.0
22 2 26.9 23.1
22 2 26.9 20.5
22 2 32.4 21.5
22 2 33.5 25.0
85 3 33.6 33.5
85 3 36.0 33.0
85 3 37.0 35.0
85 3 40.8 35.0
;
run ;
/* Simple Regression Model */
PROC REG DATA=algoritmo ;
model lnH = lnD ;
output out=out p=pred ;
run ; quit ;
/* Mixed-Effects Model */
PROC MIXED DATA=algoritmo COVTEST METHOD=REML ;
TITLE ' lnH = (B0+bok)+(B1+b1k)*lnd ' ;
MODEL lnH = lnD / S OUTPM=outpm OUTP=outp ;
RANDOM intercept lnD /SUBJECT=m s G TYPE=UN ;
RUN ;
Here is the part of the code that I have converted. This part works perfectly for me.
data1= read.table(file.choose(), header=T, sep=",")
attach(data1)
lnH=log(ht-1.3)
lnD =log(dap)
data2 = cbind(data1,lnH, lnD)
#Simple Linear Model
model1 = lm(lnH~lnD,data=data2)
summary(model1)
But for the rest I'm stuck.
model2 = lme(lnH~lnD ,data=data2,random=~1|lnD / m, method= "REML", weights=varPower(0.2,form=~dap))
summary(model2)

With the help of Roland, replacing random=~1|lnD / m with random=~lnD|m worked pretty well.
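For reference, here is a minimal sketch of how the PROC MIXED call might map onto nlme::lme with that change. It assumes the data are typed in directly instead of read from a file, that the varPower weights from the attempt above are dropped (the SAS code has no equivalent), and that lme's default general positive-definite random-effects covariance plays the role of TYPE=UN. With only three subjects the fit may not converge on every setup, so the call is wrapped in tryCatch.

```r
library(nlme)

# Data from the SAS cards block, entered by hand
algoritmo <- data.frame(
  parc = rep(c(8, 22, 85), times = c(5, 5, 4)),
  m    = factor(rep(1:3, times = c(5, 5, 4))),
  dap  = c(24.3, 29.9, 32.6, 35.9, 36.5,
           22.3, 26.9, 26.9, 32.4, 33.5,
           33.6, 36.0, 37.0, 40.8),
  ht   = c(26.7, 30.7, 31.7, 33.7, 32.5,
           21.0, 23.1, 20.5, 21.5, 25.0,
           33.5, 33.0, 35.0, 35.0)
)
algoritmo$lnH <- log(algoritmo$ht - 1.3)
algoritmo$lnD <- log(algoritmo$dap)

# Random intercept and random slope for lnD within each subject m,
# estimated by REML (assumed counterpart of RANDOM intercept lnD / SUBJECT=m)
model2 <- tryCatch(
  lme(lnH ~ lnD, data = algoritmo, random = ~ lnD | m, method = "REML"),
  error = function(e) {
    message("lme did not converge: ", conditionMessage(e))
    NULL
  }
)

if (!is.null(model2)) {
  print(summary(model2))
  # Population-level and subject-level predictions,
  # roughly what OUTPM= and OUTP= produce in SAS
  algoritmo$outpm <- fitted(model2, level = 0)
  algoritmo$outp  <- fitted(model2, level = 1)
}
```

fixef(model2) and ranef(model2) would then give the fixed-effect estimates and the per-subject deviations, similar to what the S options request in SAS.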

Related

R { : the condition has length > 1 [duplicate]

This question already has answers here:
Interpreting "condition has length > 1" warning from `if` function
(7 answers)
Closed 7 months ago.
This is my first time asking a question on Stack Overflow and also my first time coding in R, so please bear with me if my explanation is unclear :(
I have a data frame (data2000) that is 1092 x 6.
The columns are year, month, predictive horizon, company name, GDP Price Index, and Consumer Price Index.
I want to create vectors of gdppi and cpi for each month.
My ultimate goal is to get the mean, median, interquartile range, and 90th-10th percentile range for each month, and I thought this would be the first step.
This is the code I have written so far:
library(tidyverse)
data2000 <- read.csv("")
for (i in 1:12) {
i_gdppi <- c()
i_cpi <- c()
}
for (i in 1:12) {
if (data2000$month == i) {
append(i_gdppi,data2000[,gdppi])
append(i_cpi, data2000[,cpi])
}
}
Unfortunately, I got this error message:
Error in if (data2000$month == 1) { : the condition has length > 1
I googled it and found that an if statement cannot take a vector as its condition.
How can I solve this problem?
Thank you so much and have a nice day!
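For context on the error: data2000$month == i compares every row at once and returns one TRUE/FALSE per row, while if() expects a single value. A comparison like that belongs inside a subset instead. The sketch below uses made-up dummy data in place of the real file, and the object names jan_gdppi, gdppi_by_month and cpi_by_month are only illustrative.

```r
# Dummy data standing in for data2000 (values are made up for illustration)
set.seed(42)
data2000 <- data.frame(
  month = rep(1:12, each = 3),
  gdppi = runif(36) * 100,
  cpi   = runif(36) * 100
)

# The logical vector selects rows; no if() is needed
jan_gdppi <- data2000$gdppi[data2000$month == 1]

# split() builds one vector per month in a single call
gdppi_by_month <- split(data2000$gdppi, data2000$month)
cpi_by_month   <- split(data2000$cpi, data2000$month)

# Per-month summaries can then be computed over the list
sapply(gdppi_by_month, median)
```

The list elements are named after the month values, so gdppi_by_month[["1"]] is the January vector.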
If you use the group_by() function then it takes care of sub-setting your data:
library(dplyr)
data2000 <- data.frame(month = rep(c(1:12), times = 2), gdppi = runif(24)*100) # Dummy data
data2000 |>
group_by(month) |>
summarise(mean = mean(gdppi), q10 = quantile(gdppi, probs = .10), q25 = quantile(gdppi, probs = .25)) # Add the other percentiles, as needed
This gives:
# A tibble: 12 x 4
month mean q10 q25
<int> <dbl> <dbl> <dbl>
1 1 12.5 3.44 6.83
2 2 34.7 7.15 17.5
3 3 37.8 22.1 28.0
4 4 30.3 19.0 23.2
5 5 65.7 62.2 63.5
6 6 60.7 38.7 47.0
7 7 43.0 38.2 40.0
8 8 77.9 60.7 67.1
9 9 56.3 44.0 48.6
10 10 53.1 19.6 32.2
11 11 63.8 40.6 49.3
12 12 59.0 49.2 52.9
If you have both years and months, use group_by(year, month) instead.
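To cover all four statistics from the question for both indices, something along these lines might work. It is a sketch on dummy data; the column names year, month, gdppi and cpi are assumed to match the real data frame, and the output column suffixes are arbitrary.

```r
library(dplyr)

# Dummy data (made up); the real data2000 would come from read.csv()
set.seed(1)
data2000 <- data.frame(
  year  = 2000,
  month = rep(1:12, times = 5),
  gdppi = runif(60) * 100,
  cpi   = runif(60) * 100
)

monthly <- data2000 |>
  group_by(year, month) |>
  summarise(
    across(
      c(gdppi, cpi),
      list(
        mean    = mean,
        median  = median,
        iqr     = IQR,
        # 90th percentile minus 10th percentile
        p90_p10 = \(x) unname(diff(quantile(x, c(0.10, 0.90))))
      )
    ),
    .groups = "drop"
  )

monthly
```

across() names the result columns as variable_statistic, for example gdppi_median or cpi_p90_p10.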

Save survdiff output

I have run a logrank test with survdiff like below:
survdiff(formula = Surv(YearsToEvent, Event) ~ Cat, data = RegressionData)
I get the following output:
      N Observed Expected (O-E)^2/E (O-E)^2/V
0 30913      487    437.9      5.50      11.9
1  3755       56     23.2     46.19      48.0
2  3322       36     45.2      1.89       2.0
3 15796      260    332.6     15.85      27.3
Chisq= 71.9 on 3 degrees of freedom, p= 0.000000000000002
How can I save this (especially the p-value) to a .txt file? I am looping over a bunch of regressions like this and want to save them all to a single .txt file.
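One possible approach, sketched here on the lung data that ships with the survival package because RegressionData is not available: recompute the p-value from the chisq component of the survdiff object, and append the printed output to a text file with capture.output() and cat(). The helper logrank_p and the file name logrank_results.txt are made up for this illustration.

```r
library(survival)

# Hypothetical helper: p-value of a survdiff fit, computed from its
# chi-square statistic the same way the print method does
logrank_p <- function(fit) {
  expected <- if (is.matrix(fit$exp)) rowSums(fit$exp) else fit$exp
  df <- sum(expected > 0) - 1
  pchisq(fit$chisq, df, lower.tail = FALSE)
}

# Written to a temporary directory here; any .txt path would do
out_file <- file.path(tempdir(), "logrank_results.txt")
if (file.exists(out_file)) file.remove(out_file)

# Loop over several grouping variables and append each result
for (group_var in c("sex", "ph.ecog")) {
  f   <- as.formula(paste("Surv(time, status) ~", group_var))
  fit <- survdiff(f, data = lung)

  cat("Grouping variable:", group_var, "\n", file = out_file, append = TRUE)
  cat(capture.output(print(fit)), sep = "\n", file = out_file, append = TRUE)
  cat("\np-value:", format(logrank_p(fit), digits = 3), "\n\n",
      file = out_file, append = TRUE)
}
```

If only the p-values are needed, collecting logrank_p(fit) into a data frame and calling write.table() once after the loop may be tidier than appending text.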

Reverse Johnson transformation

I want to perform a regression and I have a data set with a right-skewed target variable (Murder) like this:
data("USArrests")
str(USArrests)
'data.frame': 50 obs. of 4 variables:
$ Murder : num 13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
$ Assault : int 236 263 294 190 276 204 110 238 335 211 ...
$ UrbanPop: int 58 48 80 50 91 78 77 72 80 60 ...
$ Rape : num 21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
hist(USArrests$Murder)
Since the data is right-skewed, I can log-transform the target to improve the performance of the model.
train = USArrests[1:30,]
train$Murder = log(train$Murder)
test = USArrests[31:50,]
If I want to apply this model to the test set, I have to reverse the transformation to get the actual result. I can do this with exp.
fit = lm(Murder~., data = train)
pred = predict(fit, test)
exp(pred)
However, in my case, the log transformation is not enough to get a normally distributed target, so I used the Yeo-Johnson transformation.
library(bestNormalize)
train$Murder = yeojohnson(train$Murder)$x.t
Is it possible to reverse this transformation, as with the log transformation above?
As noted by Rui Barradas, the predict function can be used here. Instead of directly pulling out x.t from the yeojohnson function, you can do the following:
# Store the transformation object
yj_obj <- yeojohnson(train$Murder)
# Perform transformation
yj_vals <- predict(yj_obj)
# Reverse transformation
orig_vals <- predict(yj_obj, newdata = yj_vals, inverse = TRUE)
# Should be the same as the original values
all.equal(orig_vals, train$Murder)
The same workflow can be done with the log and exponentiation transformation via the log_x function (together with the predict function and the inverse = TRUE argument).
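Put together with the regression from the question, the workflow could look roughly like this. It is a sketch that assumes the Yeo-Johnson transformation is applied directly to the raw Murder values (without the earlier log step) and uses the same train/test split as the question.

```r
library(bestNormalize)

train <- USArrests[1:30, ]
test  <- USArrests[31:50, ]

# Estimate the transformation on the training target only and keep the object
yj_obj <- yeojohnson(train$Murder)
train$Murder <- predict(yj_obj)

# Fit on the transformed scale
fit <- lm(Murder ~ ., data = train)

# Predict on the transformed scale, then map back to the original scale
pred_yj <- unname(predict(fit, newdata = test))
pred    <- predict(yj_obj, newdata = pred_yj, inverse = TRUE)

head(pred)
```

Keeping yj_obj around matters because the inverse needs the lambda (and the centring and scaling) estimated on the training data.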

How to make the speed profile of a moving object?

I am an R beginner and I am facing the following problem. I have this data frame:
distance speed
1 61.0 36.4
2 51.4 35.3
3 42.2 34.2
4 33.4 32.8
5 24.9 31.3
6 17.5 28.4
7 11.5 24.1
8 7.1 19.4
9 3.3 16.9
10 0.5 15.5
11 4.4 15.1
12 8.5 15.5
13 13.1 17.3
14 18.8 20.5
15 25.7 24.1
16 33.3 26.3
17 41.0 27.0
18 48.7 27.7
19 56.6 28.4
20 64.8 29.2
21 73.6 31.7
22 83.3 34.2
23 93.4 35.3
The column distance represents the distance of a moving object from a specific point, and the column speed is the object's speed. As you can see, the object gets closer to the point and then moves away from it. I am trying to plot its speed profile. I tried the following code, but it didn't give me the plot I want (I want to show how the speed changes as the moving object approaches and then passes the reference point).
ggplot(speedprofile, aes(x = distance, y = speed)) + #speedprofile is the data frame
geom_line(color = "red") +
geom_smooth() +
geom_vline(xintercept = 0) # the vline is the reference line
The plot is the following:
Then I manually set the first 10 distances, the ones before zero (0), to negative values. That gives a plot closer to what I want:
But there is a problem. The distance can't be defined as negative.
To sum up, the expected plot is the following (and I am sorry for the quality).
Do you have any ideas on how to solve this?
Thank you in advance!
You can do something like this to auto-compute the change point (to know when the distance should be negative) and then set the axis labels to be positive.
Your data (in case anyone needs it to answer):
read.table(text="distance speed
61.0 36.4
51.4 35.3
42.2 34.2
33.4 32.8
24.9 31.3
17.5 28.4
11.5 24.1
7.1 19.4
3.3 16.9
0.5 15.5
4.4 15.1
8.5 15.5
13.1 17.3
18.8 20.5
25.7 24.1
33.3 26.3
41.0 27.0
48.7 27.7
56.6 28.4
64.8 29.2
73.6 31.7
83.3 34.2
93.4 35.3", stringsAsFactors=FALSE, header=TRUE) -> speed_profile
Now, compute the "real" distance (negative for approaching, positive for receding):
speed_profile$real_distance <- c(-1, sign(diff(speed_profile$distance))) * speed_profile$distance
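To see what that line does, here is the same computation on five values taken from around the turning point of the data. The leading -1 is an assumption that the first observation is an approach, since diff() returns one element fewer than its input.

```r
distance <- c(7.1, 3.3, 0.5, 4.4, 8.5)

# diff() is negative while the distance shrinks (approaching)
# and positive while it grows (receding)
step_sign <- sign(diff(distance))
step_sign
# -1 -1  1  1

# Prepend -1 for the first row, then flip the sign of approaching rows
real_distance <- c(-1, step_sign) * distance
real_distance
# -7.1 -3.3 -0.5  4.4  8.5
```

Note that two identical consecutive distances would give a sign of 0 and therefore a real distance of 0, so this relies on the distance changing at every step.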
Now, compute the X axis breaks ahead of time:
breaks <- scales::pretty_breaks(10)(range(speed_profile$real_distance))
ggplot(speed_profile, aes(real_distance, speed)) +
geom_smooth(linetype = "dashed") +
geom_line(color = "#cb181d", size = 1) +
scale_x_continuous(
name = "distance",
breaks = breaks,
labels = abs(breaks) # make all the labels for the axis positive
)
Provided fonts are working well on your system you could even do:
labels <- abs(breaks)
labels[breaks != 0] <- sprintf("%s\n→", labels[breaks != 0]) # add an arrow under every non-zero label
ggplot(speed_profile, aes(real_distance, speed)) +
geom_smooth(linetype = "dashed") +
geom_line(color = "#cb181d", size = 1) +
scale_x_continuous(
name = "distance",
breaks = breaks,
labels = labels
)

How to fit a function for different groups in a data set using R

How can I fit a function to each group in a data set (Soil) using R? The first column is the group (Plot) and the second column is the observed variable (Depth).
Plot Depth
1 12.5
1 14.5
1 15.8
1 16.1
1 18.9
1 21.2
1 23.4
1 25.7
2 13.1
2 15.0
2 15.8
2 16.3
2 17.4
2 18.6
2 22.6
2 24.1
2 25.6
3 11.5
3 12.2
3 13.9
3 14.7
3 18.9
3 20.5
3 21.6
3 22.6
3 24.1
3 25.8
4 10.2
4 21.5
4 15.1
4 12.3
4 10.0
4 13.5
4 16.5
4 19.2
4 17.6
4 14.1
4 19.7
I used the 'for' statement but only saw output for Plot 1.
This was how I applied the 'for' statement:
After importing my data in R, I saved it as: SNq,
for (i in 1:SNq$Plot[i]) {
dp <- SNq$Depth[SNq$Plot==SNq$Plot[i]]
fit1 = fitdist(dp, "gamma") ## this is the function I'm fitting. The function is not the issue. My challenge is the 'for' statement.
fit1
}
I think this should work with one change to your code: loop over unique(SNq$Plot) instead of 1:SNq$Plot[i].
Why does it work?
Because unique() returns the distinct values of the Plot column (1, 2, 3, 4), which are exactly the groups. For each value i, SNq$Depth[SNq$Plot==i] selects the depths for that group. The fit is also passed to plot(), because a bare fit1 inside a for loop is not printed.
for (i in unique(SNq$Plot)) { # <- here
dp <- SNq$Depth[SNq$Plot==i]
fit1 = fitdist(dp, "gamma") ## this is the function I'm fitting. The function is not the issue. My challenge is the 'for' statement.
plot(fit1)
}
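If the fits are needed afterwards and not just plotted, one option is to keep them in a named list. This is a sketch that re-enters the data from the question by hand; the names depth_by_plot, fits and estimates are only illustrative.

```r
library(fitdistrplus)

# Data from the question, entered by hand
SNq <- data.frame(
  Plot  = rep(1:4, times = c(8, 9, 10, 11)),
  Depth = c(12.5, 14.5, 15.8, 16.1, 18.9, 21.2, 23.4, 25.7,
            13.1, 15.0, 15.8, 16.3, 17.4, 18.6, 22.6, 24.1, 25.6,
            11.5, 12.2, 13.9, 14.7, 18.9, 20.5, 21.6, 22.6, 24.1, 25.8,
            10.2, 21.5, 15.1, 12.3, 10.0, 13.5, 16.5, 19.2, 17.6, 14.1, 19.7)
)

# One depth vector per plot, then one gamma fit per vector
depth_by_plot <- split(SNq$Depth, SNq$Plot)
fits <- lapply(depth_by_plot, fitdist, distr = "gamma")

# Collect the estimated shape and rate into one matrix, one row per plot
estimates <- t(sapply(fits, coef))
estimates
```

Individual fits stay accessible by plot number, for example summary(fits[["3"]]) or plot(fits[["3"]]).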
A tidyverse suggestion:
library("tidyverse")
library("fitdistrplus")
fits <- SNq %>%
group_by(Plot) %>%
nest() %>%
mutate(fits = map(data, ~ fitdist(data = .$Depth, distr = "gamma")),
summaries = map(fits, summary)) # summarise the fits column created on the previous line
You could continue with print(fits$fits) and print(fits$summaries) to access the different fits and their summary. Alternatively you can use a syntax like fits$fits[[1]] and fits$summaries[[1]] to access them.
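Try:
for (i in 1:nrow(SNq)) { # note: this refits the same group once for every row in it
dp <- SNq$Depth[SNq$Plot==SNq$Plot[i]]
fit1 = fitdist(dp, "gamma")
print(fit1) # objects are not auto-printed inside a for loop
}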
