Error in sortedXyData using R

I have a dataset composed of observations Y, regressors X and a grouping variable group.
Link to the dataset : data.txt
Using the nlme library, I can build a grouped data frame with:
ex1 <- groupedData(Y ~ X | group,data=mydata)
After that, I would like to apply the function sortedXyData in order to sort my data with respect to X. When I try
sortedXyData("X","Y",ex1)
I get the following output, an empty result with a warning:
[1] x y
<0 rows> (or 0-length row.names)
Warning message:
In sortedXyData.default("X", "Y", ex1) :
NAs introduced by coercion
But if I try with a much simpler dataset such as :
X <- c(1.2,2.2,3.5,-3.8,9,3.7,4,8,7)
Y <- c(5,4,8,3,6,2,0,5,5)
group <- c(1,2,3,4,5,6,7,8,9)
group <- as.factor(group)
data1 <- data.frame(X,Y,group)
data2 <- groupedData(Y ~ X | group,data=data1)
sortedXyData("X","Y",data2)
then, the output is :
x y
1 4 0
2 7 5
3 8 5
4 9 6
Warning message:
In sortedXyData.default("X", "Y", data2) :
NAs introduced by coercion
Only the integer values of X appear in the sorted output; the decimal values are dropped. The problem seems to come from the way R handles these values. I don't know how to get all the values of X sorted.

You have something wrong with your setup, or you are not posting exactly what you did.
If I run your simple data set, I get proper sorting:
sortedXyData("X","Y",data2)
x y
1 -3.8 3
2 1.2 5
3 2.2 4
4 3.5 8
5 3.7 2
6 4.0 0
7 7.0 5
8 8.0 5
9 9.0 6
Make sure you have the latest version of R and of the packages in use.
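If sortedXyData keeps misbehaving, one base-R workaround is to sort with order(). This is only a sketch built on the toy vectors from the question; unlike sortedXyData, it does not average y over repeated x values.

```r
X <- c(1.2, 2.2, 3.5, -3.8, 9, 3.7, 4, 8, 7)
Y <- c(5, 4, 8, 3, 6, 2, 0, 5, 5)
data1 <- data.frame(X, Y)

# Confirm both columns are numeric before sorting; character or factor
# columns would have to be converted first
str(data1)

# order() returns the row permutation that sorts X in increasing order
sorted <- data1[order(data1$X), ]
sorted
```

The sorted rows should match the output shown in the answer above, decimals included.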

How to find first 3 max values in a specified column?

In R, I tried several approaches, but they all produced error messages. Even the ones that finally returned a result do not look correct; the output is a bit of a mess. Why?
I tried to find the max values in the 'no. of ratings' column for different smart water bottle products, but got the contents of the 'volumne (oz)' column instead.
> Smart_Water_Bottle_Review[which.max(Smart_Water_Bottle_Review$`no. of ratings`)]
# A tibble: 9 x 1
`volumne (oz)`
<chr>
1 16
2 17, 21
3 20
4 20.3
......
Warning message:
In which.max(Smart_Water_Bottle_Review$`no. of ratings`) :
NAs introduced by coercion
When I switched to another column, yet another column's contents showed up instead.
Smart_Water_Bottle_Review[which.max(Smart_Water_Bottle_Review$`volumne (oz)`)]
# A tibble: 9 x 1
`keep hot (hrs)`
<chr>
1 NIL
2 0
3 NIL
......
Warning message:
In which.max(Smart_Water_Bottle_Review$`volumne (oz)`) :
NAs introduced by coercion
Simply put, I asked for the max value of 'no. of ratings' and got 'volumne (oz)'; I asked for 'volumne (oz)' and got 'keep hot (hrs)'.
Also, I asked for the max, but it returned every row.
Please advise how to correct this, or what the right syntax is. Thanks.
There are several issues here.
It looks like you have your variables stored as character, rather than numeric. The NAs introduced by coercion warning is saying that the variable was converted to numeric, but that some values weren't number-like and couldn't be converted (e.g., the NIL value). You should convert your variable to numeric first with as.numeric() and verify that all values are correctly converted.
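To see which entries fail to convert, a small sketch like the following can help. The values in raw are hypothetical, chosen to resemble the columns shown in the question.

```r
# Hypothetical values resembling the character columns in the question
raw <- c("16", "20.3", "NIL", "17, 21")

# as.numeric() turns anything that is not number-like into NA (with a warning)
num <- suppressWarnings(as.numeric(raw))
num

# Entries that were present before conversion but are NA afterwards
failed <- raw[is.na(num) & !is.na(raw)]
failed
```

Here "NIL" and "17, 21" are the entries that would need cleaning before the column can be treated as numeric.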
Second, when you subset a data frame or tibble with [ using a single number and no comma, it selects the column at that index, not the row.
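The difference between the two indexing forms can be checked on a toy data frame. The names reviews, product and ratings are made up for this illustration.

```r
# Toy data frame; names are illustrative only
reviews <- data.frame(
  product = c("A", "B", "C"),
  ratings = c(10, 30, 20)
)

i <- which.max(reviews$ratings)  # position of the largest rating

by_column <- reviews[i]    # no comma: selects column number i, all rows
by_row    <- reviews[i, ]  # with comma: selects row number i, all columns

by_column
by_row
```

Because i happens to be a valid column number here, reviews[i] silently returns a whole column, which is the kind of surprise described in the question.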
Here is how I would recommend you solve this problem
library(dplyr)
Smart_Water_Bottle_Review |>
mutate(`no. of ratings` = as.numeric(`no. of ratings`)) |>
filter(`no. of ratings` == max(`no. of ratings`, na.rm = TRUE))
# alternative, notice the comma indicating to use [row, column] indexing
Smart_Water_Bottle_Review[which.max(as.numeric(Smart_Water_Bottle_Review$`no. of ratings`)) , ]
You mentioned that you wanted the top 3 values. That is most easily done by sorting the data frame:
library(dplyr)
Smart_Water_Bottle_Review |>
mutate(`no. of ratings` = as.numeric(`no. of ratings`)) |>
arrange(-`no. of ratings`) |>
slice(1:3)
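For readers who prefer base R, the same top-3 idea can be sketched with order() and head(). The data frame below is invented for the example, including one non-numeric entry.

```r
# Invented data with one entry that cannot be converted to a number
reviews <- data.frame(
  product = c("A", "B", "C", "D"),
  ratings = c("10", "NIL", "30", "20"),
  stringsAsFactors = FALSE
)

# Convert to numeric; "NIL" becomes NA
reviews$ratings <- suppressWarnings(as.numeric(reviews$ratings))

# Sort in decreasing order (NA values go last by default), keep three rows
ranked <- reviews[order(reviews$ratings, decreasing = TRUE), ]
top3 <- head(ranked, 3)
top3
```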
For no_of_ratings_100s:
smartrating %>%
mutate(no_of_ratings_100s = as.numeric(no_of_ratings_100s)) %>%
arrange(-no_of_ratings_100s)
# A tibble: 9 x 2
Products no_of_ratings_100s
<chr> <dbl>
1 Hydrate Spark 3 Tracks (not insultaed) 37.3
2 Hydrate Spark Stainless Steel 21 oz 16.5
3 CrazyCap Self Cleaning, UV water purifyer 11.9
4 Thermos 24 oz hydration bottle w smart lid 11.5
5 LARQ 8.29
6 Philips Water GoZero UV Self-Cleaning Vacuum Insulated 3.22
7 Bellabeat 0.37
8 Equa Smart Water 0.02
For amazon_ratings_5 (no mutate() call is needed):
smartamazon %>%
arrange(-amazon_ratings_5)
# A tibble: 9 x 2
Products amazon_ratings_5
<chr> <dbl>
1 LARQ 4.5
2 CrazyCap Self Cleaning, UV water purifyer 4.4
3 Hydrate Spark Stainless Steel 21 oz 4.4
4 Hydrate Spark 3 Tracks (not insultaed) 4.4
5 Philips Water GoZero UV Self-Cleaning Vacuum Insulated 4.1
6 Equa Smart Water 4
7 Bellabeat 3.8
8 Thermos 24 oz hydration bottle w smart lid 3.7

Convert comma separated decimals from character to numeric

For my exam I have to build some scatter plots in R. I created a data frame with 4 variables, and I want to add regression lines to my scatter plots.
The name of my data frame is "alle".
The variable names are: demo, tot, besch, usd
I tried to fit the regression line with this code, but got the following result:
reg1<- lm(tot~demo, data=alle)
Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
Here is the structure of "alle":
str(alle)
'data.frame': 11 obs. of 4 variables:
$ demo : chr "498.300.775" "500.297.033" "502.090.235" "503.170.618" ...
$ tot : Factor w/ 11 levels "4.846.423","4.871.049",..: 1 3 4 5 2 8 7 6 10 9 ...
$ besch: Factor w/ 9 levels "68,4","68,6",..: 5 7 3 2 2 1 1 4 6 8 ...
$ usd : Factor w/ 44 levels "0,68434","0,72584",..: 26 30 29 23 28 22 24 25 15 14 ...
I tried to convert the column "demo" to numeric with
alle$demo <- as.numeric(as.character(alle$demo))
It converted the column to numeric, but now every row is NA.
I think all columns must be numeric.
How can I convert all 4 columns to numeric and finally plot the regression lines?
Data:
> head(alle,6)
demo tot besch usd
1 498.300.775 4.846.423 69,8 1,3705
2 500.297.033 4.891.934 70,3 1,4708
3 502.090.235 4.901.358 69,0 1,3948
4 503.170.618 4.906.313 68,6 1,3257
5 502.964.837 4.871.049 68,6 1,3920
6 504.047.964 5.010.371 68,4 1,2848
thanks
Try doing it in two steps. First get rid of the dots, then replace the commas by decimal points and coerce to numeric.
alle[] <- lapply(alle, function(x) gsub("\\.", "", x))
alle[] <- lapply(alle, function(x) as.numeric(sub(",", ".", x)))
Note:
The above solution is split into two steps for readability. The following does the same in a single lapply loop and should therefore be faster on a big dataset. For a small to medium dataset, the two-step solution may be preferable.
alle[] <- lapply(alle, function(x){
as.numeric(sub(",", ".", gsub("\\.", "", x)))
})
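As a quick check of the conversion, here is a self-contained sketch on two rows copied from the question's head() output. The helper name to_number is made up for this example.

```r
# Two rows taken from the question's data, kept as character
alle <- data.frame(
  demo  = c("498.300.775", "500.297.033"),
  besch = c("69,8", "70,3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip thousands dots, then turn the decimal comma
# into a decimal point before coercing
to_number <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)  # drop thousands separators
  x <- sub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(x)
}

alle[] <- lapply(alle, to_number)
str(alle)
```

The order of the two steps matters: replacing the comma first would turn "69,8" into "69.8", and the dot removal would then destroy the decimal point.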
With dplyr:
library(dplyr)
alle %>%
mutate_all(as.character) %>%
mutate_at(c("besch","usd"),function(x) as.numeric(as.character(gsub(",",".",x)))) ->alle
demo tot besch usd
1 498.300.775 4.846.423 69.8 1.3705
2 500.297.033 4.891.934 70.3 1.4708
3 502.090.235 4.901.358 69.0 1.3948
4 503.170.618 4.906.313 68.6 1.3257
5 502.964.837 4.871.049 68.6 1.3920
6 504.047.964 5.010.371 68.4 1.2848

Error in 2 * "X2B" : non-numeric argument to binary operator

I am trying to look at baseball data from 1903 through 1960 from the Lahman database, for my own research. I want to use the batting table, which does not include batting average, slugging, OBP or OPS.
I want to calculate those, but I first need total bases. I am having trouble calculating total bases from X2B and X3B.
I looked into as.numeric, but couldn't get it to work. I am using R and RStudio. I have tried X2B and X3B (doubles and triples) both with and without quotes.
batting_1960 <- batting_1903 %>%
filter(yearID <= 1960 & G >= 90) %>%
mutate(Batting_Average = H/AB, TB = (2*"X2B")+(3*"X3B")+HR+(H-"X2B"-"X3B"-HR)) %>%
arrange(yearID, desc(Batting_Average))
I expect the total bases to be calculated in a new column for each row of data, but I get the error:
Error in 2 * "X2B" : non-numeric argument to binary operator
This is so that I can eventually calculate OPS, OBP and slugging.
Your code is trying to multiply 2 by the literal string "X2B", which is not going to work. Column names should be unquoted in mutate().
Your error:
> tibble(X2B = 1:10) %>% mutate(TB = 2 * "X2B")
Error in 2 * "X2B" : non-numeric argument to binary operator
Should be, for example:
> tibble(X2B = 1:10) %>% mutate(TB = 2 * X2B)
# A tibble: 10 x 2
X2B TB
<int> <dbl>
1 1 2
2 2 4
3 3 6
4 4 8
5 5 10
6 6 12
7 7 14
8 8 16
9 9 18
10 10 20
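Applied to the total-bases calculation, a sketch with unquoted column names might look like the following. It uses base R's transform(), which, like mutate(), takes bare column names, and the single row of numbers is invented. It also assumes the usual definition of total bases, in which a home run counts four bases; the formula in the question weights HR by one.

```r
# One invented season line, using the Lahman-style column names
batting <- data.frame(AB = 500, H = 150, X2B = 30, X3B = 5, HR = 20)

batting <- transform(
  batting,
  Batting_Average = H / AB,
  # singles + 2 * doubles + 3 * triples + 4 * home runs
  TB = (H - X2B - X3B - HR) + 2 * X2B + 3 * X3B + 4 * HR
)

# Slugging is total bases per at-bat
batting$SLG <- batting$TB / batting$AB
batting
```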

Value labels (levels) are lost when modifying a memisc::data.set in R

I use memisc::data.set because I import data from SPSS. I can get the value labels (in the SPSS sense) of an object by calling levels(). I use them as tick-mark labels in a plot.
When I modify the data.set (as in the example below), levels() no longer works.
library('memisc')
# example data
d <- data.set(a = sample(1:100))
d$a_strat <- cut(d$a, breaks=seq(1,100, by=10))
# "modify" the data.set
e <- d[,c('a_strat')]
# it is still a data.set but "a_strat" changed its type
> class(e)
[1] "data.set"
attr(,"package")
[1] "memisc"
Now have a look at the different data types of a_strat in the two data.set.
> str(d$a_strat)
Factor w/ 9 levels "(1,11]","(11,21]",..: 4 9 3 1 NA 9 5 4 9 9 ...
> str(e$a_strat)
$ Nmnl. item w/ 9 labels for 1,2,3,... int 4 9 3 1 NA 9 5 4 9 9 ...
The practical issue is that levels() no longer works on the second data.set.
> levels(e$a_strat)
NULL
But this works
> labels(e$a_strat)
Values and labels:
1 '(1,11]'
2 '(11,21]'
3 '(21,31]'
4 '(31,41]'
5 '(41,51]'
6 '(51,61]'
7 '(61,71]'
8 '(71,81]'
9 '(81,91]'
But when I use that for plotting in axis(..., labels=labels(e$a_strat)), the value labels (e.g. (31,41]) don't appear. Instead, the values (1, 2, ..., 9) appear on the tick marks.
I am not sure how to solve that.
The little helper here is as.factor().
So it could look like this
axis(..., labels=labels(as.factor(e$a_strat)))
But please don't upvote this answer. ;) I still don't understand why the type of a_strat changes in my example.
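For comparison, this is how factor levels end up as tick labels when only base R is involved. The sketch deliberately avoids memisc, so it shows the target behaviour rather than the memisc conversion itself; the plot contents are arbitrary.

```r
a <- 1:100
a_strat <- cut(a, breaks = seq(1, 100, by = 10))

lv <- levels(a_strat)                # "(1,11]", "(11,21]", ..., "(81,91]"
counts <- as.vector(table(a_strat))  # number of values in each interval

# Suppress the default x axis, then draw one tick per level with its label
plot(seq_along(lv), counts, xaxt = "n", xlab = "a_strat", ylab = "count")
axis(1, at = seq_along(lv), labels = lv)
```

Once the memisc item has been converted with as.factor(), the same levels() and axis() pattern should apply.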

R: head() fails on a SpatialPointsDataFrame

I need to convert an R data.frame object into a SpatialPointsDataFrame object in order to run spatial statistics functions on the data. However, converting a data.frame into a SpatialPointsDataFrame gives unexpected behavior when running certain functions on the converted object.
In this example I try to run the head() function on the resulting SpatialPointsDataFrame.
Why does head() fail on some SpatialPointsDataFrame objects?
Here is the code to reproduce the behavior.
Example 1, no error:
#beginning of r code
#load S Classes and Methods for Spatial Data package "sp"
library(sp)
#Load an example dataset that contains geographic coordinates
data(meuse)
#check the structure of the data, it is a data.frame
str(meuse)
#>'data.frame': 155 obs. of 14 variables: ...
#with coordinates x,y
#Convert the data into a SpatialPointsDataFrame, by function coordinates()
coordinates(meuse) <- c("x", "y")
#check structure, seems ok
str(meuse)
#Check first rows of the data
head(meuse)
#It worked!
#Now create a small own dataset
testgeo <- as.data.frame(cbind(1:10,1:10,1:10))
#set colnames
colnames(testgeo) <- c("x", "y", "myvariable")
#convert to SpatialPointsDataFrame
coordinates(testgeo) <- c("x", "y")
#Seems ok
str(testgeo)
#But try running for instance head()
head(testgeo)
#Resulting output: Error in `[.data.frame`(x@data, i, j, ..., drop = FALSE) :
#undefined columns selected
#end of example code
There is some difference between the two example datasets that I do not understand, and str() does not reveal it.
Why does the function head() fail on the dataset testgeo?
Why does head() work when adding more columns, 10 seems to be the limit:
testgeo <- as.data.frame(cbind(1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10))
coordinates(testgeo) <- c("V1", "V2")
head(testgeo)
There is no specific head method for SpatialPoints/PolygonsDataFrames, so when you call head(testgeo) or head(meuse) it falls through to the default method:
> getAnywhere("head.default")
A single object matching ‘head.default’ was found
It was found in the following places
registered S3 method for head from namespace utils
namespace:utils
with value
function (x, n = 6L, ...)
{
stopifnot(length(n) == 1L)
n <- if (n < 0L)
max(length(x) + n, 0L)
else min(n, length(x))
x[seq_len(n)]
}
<bytecode: 0x97dee18>
<environment: namespace:utils>
This returns x[1:n], but for those spatial classes, single-index square-bracket indexing like that selects columns:
> meuse[1]
coordinates cadmium
1 (181072, 333611) 11.7
2 (181025, 333558) 8.6
3 (181165, 333537) 6.5
4 (181298, 333484) 2.6
5 (181307, 333330) 2.8
6 (181390, 333260) 3.0
7 (181165, 333370) 3.2
8 (181027, 333363) 2.8
9 (181060, 333231) 2.4
10 (181232, 333168) 1.6
> meuse[2]
coordinates copper
1 (181072, 333611) 85
2 (181025, 333558) 81
3 (181165, 333537) 68
4 (181298, 333484) 81
5 (181307, 333330) 48
6 (181390, 333260) 61
7 (181165, 333370) 31
8 (181027, 333363) 29
9 (181060, 333231) 37
10 (181232, 333168) 24
So when you do head(meuse) it tries to get meuse[1] to meuse[6], which exist because meuse has lots of columns.
But testgeo doesn't. So it fails.
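The column-versus-row behaviour can be checked directly on testgeo. This sketch assumes the sp package is installed and inspects the data slot so the result does not depend on which head method is available.

```r
library(sp)  # assumes sp is installed

testgeo <- data.frame(x = 1:10, y = 1:10, myvariable = 1:10)
coordinates(testgeo) <- c("x", "y")

one_col   <- testgeo[1]      # single index: first attribute column, all points
some_rows <- testgeo[1:6, ]  # row index plus comma: first six points

dim(one_col@data)    # rows x columns of the attribute table
dim(some_rows@data)
```

With only one attribute column, any single index above 1 has nothing to select, which is why a call equivalent to testgeo[1:6] fails.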
The real fix might be to write a head.SpatialPointsDataFrame that goes:
> head.SpatialPointsDataFrame = function(x,n=6,...){x[1:n,]}
so that:
> head(meuse)
coordinates cadmium copper lead zinc elev dist om ffreq soil
1 (181072, 333611) 11.7 85 299 1022 7.909 0.00135803 13.6 1 1
2 (181025, 333558) 8.6 81 277 1141 6.983 0.01222430 14.0 1 1
3 (181165, 333537) 6.5 68 199 640 7.800 0.10302900 13.0 1 1
4 (181298, 333484) 2.6 81 116 257 7.655 0.19009400 8.0 1 2
5 (181307, 333330) 2.8 48 117 269 7.480 0.27709000 8.7 1 2
6 (181390, 333260) 3.0 61 137 281 7.791 0.36406700 7.8 1 2
lime landuse dist.m
1 1 Ah 50
2 1 Ah 30
3 1 Ah 150
4 0 Ga 270
5 0 Ah 380
6 0 Ga 470
> head(testgeo)
coordinates myvariable
1 (1, 1) 1
2 (2, 2) 2
3 (3, 3) 3
4 (4, 4) 4
5 (5, 5) 5
6 (6, 6) 6
The underlying problem here is that the spatial classes don't inherit from data.frame, so they don't behave like one.
head(meuse) didn't give you the first few rows of the dataset meuse but its first few columns (6 + the coordinate column).
Your dataset testgeo has only 1 column, so head(testgeo) fails. However, head(testgeo,1) works.
head(testgeo,1)
coordinates myvariable
1 (1, 1) 1
2 (2, 2) 2
3 (3, 3) 3
4 (4, 4) 4
5 (5, 5) 5
6 (6, 6) 6
7 (7, 7) 7
8 (8, 8) 8
9 (9, 9) 9
10 (10, 10) 10
I don't know why columns are selected instead of rows, but if you want to see the first few rows of testgeo you can use the more traditional:
testgeo[1:5, ]
coordinates myvariable
1 (1, 1) 1
2 (2, 2) 2
3 (3, 3) 3
4 (4, 4) 4
5 (5, 5) 5
sp now has a head method for all Spatial objects, implemented as
> sp:::head.Spatial
function (x, n = 6L, ...)
{
ix <- sign(n) * seq(abs(n))
x[ix, , drop = FALSE]
}
Note that it also handles negative n.
