In the following CSV file:
Species, Age
australian, 2.6
australian, 2.3
brown, 2.3
brown, 2.3
brown, 3.4
brown, 3.4
dalmatian, 5.1
dalmatian, 4.4
dalmatian, 4.4
dalmatian, 4.1
dalmatian, 4.2
dalmatian, 4.7
dalmatian, 5.5
I am attempting to calculate the mean for the Pelican species, but R is displaying an error about unequal lengths.
df <- read.csv('c:/Users/Michelle/Downloads/pelican.csv')
tapply(df$Species, df$Age, mean)
Error in tapply(df$Species, df$Age, mean) :
arguments must have same length
I assumed the tapply function would output each pelican species with the mean age of each.
Unfortunately, the director at the University of Florida is insisting I use base R functions.
Edit 1:
str(df)
'data.frame': 13 obs. of 2 variables:
 $ Species: chr "australian" "australian" "brown" "brown" ...
 $ Age    : num 2.6 2.3 2.3 2.3 3.4 3.4 5.1 4.4 4.4 4.1 ...
dput(df)
structure(list(Species = c("australian", "australian", "brown",
"brown", "brown", "brown", "dalmatian", "dalmatian", "dalmatian",
"dalmatian", "dalmatian", "dalmatian", "dalmatian"), Age = c(2.6,
2.3, 2.3, 2.3, 3.4, 3.4, 5.1, 4.4, 4.4, 4.1, 4.2, 4.7, 5.5)),
class = "data.frame", row.names = c(NA, -13L))
Thank you Pedro for the help.
Thank you for any help you can provide.
M.
Welcome Michelle! The tapply function works with two main objects (both need to be vectors), called X and INDEX. What the error message is telling you is that X and INDEX do not have the same length.
The example below reproduces the same error that you are facing. Note that the X object has 4 elements, but INDEX has only 2.
tapply(X = c(5, 6, 7, 8), INDEX = c(1, 2), mean)
This means that, to fix your error, the first and second objects that you pass to tapply() need to have the same length. In your example, these two objects are df$Species and df$Age. You can check whether they have the same length by comparing the results of length(df$Species) and length(df$Age).
What is probably going wrong in your code is that the read.csv() function is not reading your CSV file correctly. Maybe df was turned into a list rather than a data.frame. We cannot give you better help than this, because we do not know what the df object is or how it is structured in your R session.
You could give us this information by copying and pasting the output of str(df) or dput(df). Either of these commands would probably give us enough information to point out exactly what you need to do. So next time you post a question, it is a good idea to include this information.
Anyway, when I copy and paste the CSV file that you posted and try to run your code, everything works fine. So, again, your df object is probably not structured as you expected, probably because of some problem with the read.csv() call.
text <- "
Species, Age
australian, 2.6
australian, 2.3
brown, 2.3
brown, 2.3
brown, 3.4
brown, 3.4
dalmatian, 5.1
dalmatian, 4.4
dalmatian, 4.4
dalmatian, 4.1
dalmatian, 4.2
dalmatian, 4.7
dalmatian, 5.5"
data <- readr::read_csv(text)
tapply(data$Age, data$Species, mean)
Result:
australian      brown  dalmatian
  2.450000   2.850000   4.628571
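Since the question asks for base R only, here is a minimal sketch of the same calculation without readr. It assumes the CSV content is exactly what was posted. The names csv_text, means and agg are made up for illustration. Note that the call above passes Age as X and Species as INDEX, which is the reverse of the order used in the question.

```r
# Inline copy of the CSV from the question, so the sketch runs without the file.
csv_text <- "Species, Age
australian, 2.6
australian, 2.3
brown, 2.3
brown, 2.3
brown, 3.4
brown, 3.4
dalmatian, 5.1
dalmatian, 4.4
dalmatian, 4.4
dalmatian, 4.1
dalmatian, 4.2
dalmatian, 4.7
dalmatian, 5.5"

# strip.white = TRUE drops the blank that follows each comma.
df <- read.csv(text = csv_text, strip.white = TRUE)

# X is the numeric column to summarise, INDEX is the grouping column.
means <- tapply(df$Age, df$Species, mean)
print(means)

# The same summary as a data frame, also base R.
agg <- aggregate(Age ~ Species, data = df, FUN = mean)
print(agg)
```

With the real file, replacing the read.csv(text = ...) call with read.csv('c:/Users/Michelle/Downloads/pelican.csv') should give the same df, provided the file matches the posted content.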
I want to create multiple files for columns in a life table. I thought the easiest way to do this would be to save the files using their variable names (ax, Sx, lx, Lx, ...). However, I cannot get R to create two files based on the same name (one in lower case and one in upper case, e.g. lx.csv and Lx.csv).
To demonstrate the problem:
# write a csv as normal
write.csv(mtcars, "d.csv")
# next line seems to replace d.csv rather than create a new D.csv file
write.csv(iris, "D.csv")
# get iris when read back in
d <- read.csv("d.csv")
head(d)
# X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 1 5.1 3.5 1.4 0.2 setosa
# 2 2 4.9 3.0 1.4 0.2 setosa
# 3 3 4.7 3.2 1.3 0.2 setosa
# 4 4 4.6 3.1 1.5 0.2 setosa
# 5 5 5.0 3.6 1.4 0.2 setosa
# 6 6 5.4 3.9 1.7 0.4 setosa
Is this behavior normal, and is there a way to force the creation of a new file with the upper case name?
I am using Windows and R 4.1.0.
Update
Thanks to @tim for the answer. I had to go through the following steps in PowerShell (in admin mode):
1. Run Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
2. Restart the PC
3. Run cd C:\folder to get to the location where I want to enable case-sensitive file names
4. Run (Get-ChildItem -Recurse -Directory).FullName | ForEach-Object {fsutil.exe file setCaseSensitiveInfo $_ enable}
I wanted to enable case-sensitive file names for all the subdirectories. I think if I had needed it for just a single folder, I could have used fsutil.exe file setCaseSensitiveInfo C:\folder enable in place of steps 3 and 4.
By default, Windows treats file names on NTFS as case insensitive. The Windows 10 April 2018 Update introduced optional case sensitivity for specific folders:
https://www.howtogeek.com/354220/how-to-enable-case-sensitive-folders-on-windows-10/#:~:text=Windows%2010%20now%20offers%20an%20optional%20case-sensitive%20file,see%20%E2%80%9Cfile%E2%80%9D%20and%20%E2%80%9CFile%E2%80%9D%20as%20two%20separate%20files.
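From within R, one way to check how a given folder behaves, and to sidestep the problem where case sensitivity is not available, might look like the sketch below. The helper names is_case_sensitive and case_safe_name and the _lower/_upper suffix scheme are assumptions made for illustration; they are not part of R or of the answer above.

```r
# Hypothetical helper: TRUE if `dir` treats names that differ only in case
# as different files.
is_case_sensitive <- function(dir = tempdir()) {
  base  <- tempfile(tmpdir = dir)      # unique base name inside dir
  lower <- paste0(base, "_probe")
  upper <- paste0(base, "_PROBE")
  file.create(lower)
  on.exit(unlink(c(lower, upper)), add = TRUE)
  # On a case-insensitive folder the upper-case name resolves to the
  # file that was just created.
  !file.exists(upper)
}

# Hypothetical helper: build file names that stay distinct even when the
# folder ignores case, by tagging the case of the first letter.
case_safe_name <- function(x) {
  starts_upper <- grepl("^[A-Z]", x)
  paste0(x, ifelse(starts_upper, "_upper", "_lower"), ".csv")
}

sensitive <- is_case_sensitive()
print(sensitive)

files <- case_safe_name(c("lx", "Lx"))
print(files)
```

Writing to the names returned by case_safe_name() avoids the overwrite without changing any Windows setting, at the cost of slightly longer file names.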
I have the following dataset:
Class Value
Drive 9.5
Analyser 6.35
GameGUI 12.09
Drive 9.5
Analyser 5.5
GameGUI 2.69
Drive 9.5
Analyser 9.10
GameGUI 6.1
I want to retrieve the classes whose values are all the same, which in the example above is Drive. To do that, I have the following command:
dataset[as.logical(ave(dataset$Value, dataset$Class, FUN = function(x) all(x==1))), ]
But this command returns only the classes whose values are always one. What I want is different: I don't want to specify a particular value.
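A possible adjustment, assuming "similar" here means that every value within a class is identical, is to test for a single distinct value per group instead of comparing against 1. The data frame below is rebuilt from the table in the question; the names constant and result are made up for illustration.

```r
# Data from the question.
dataset <- data.frame(
  Class = rep(c("Drive", "Analyser", "GameGUI"), times = 3),
  Value = c(9.5, 6.35, 12.09, 9.5, 5.5, 2.69, 9.5, 9.10, 6.1)
)

# For each class, flag its rows when the class has exactly one distinct value.
# ave() returns numeric 1/0 here, so as.logical() turns it back into a mask.
constant <- as.logical(ave(
  dataset$Value, dataset$Class,
  FUN = function(x) length(unique(x)) == 1
))

result <- dataset[constant, ]
print(result)
print(unique(result$Class))
```

If values should count as similar when they are merely close, the test inside FUN could be swapped for something like diff(range(x)) < tol, with a tolerance tol that you would have to choose.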
I'm using R markdown to create an html document. I've written a function that produces the following data frame as its output:
          April ($) April Growth (%) Current ($) Current Growth (%) Change (%)
1 2013:3 253,963.49              0.2  251,771.20                0.7       -0.9
2 2013:4 253,466.09             -0.8  251,515.26               -0.4       -0.8
3 2014:1 255,448.95              3.2  255,300.10                6.2       -0.1
4 2014:2 259,376.84              6.3  259,919.99                7.4        0.2
5 2014:3 261,398.85              3.2  262,486.91                4.0        0.4
6 2014:4 264,309.06              4.5  266,662.59                6.5        0.9
I'm then supplying this data frame to htmlTable as shown:
html.tab <- htmlTable(sample.df, rnames=F)
print(html.tab)
However, when I knit the file, the following table is produced:
Can anyone explain what is happening? I thought perhaps it was the data class in the data frame but I didn't see anything in the htmlTable vignette saying it couldn't handle data of certain classes.
This is my first time working with R Markdown and htmlTables so hopefully I've just made some basic mistake but I haven't been able to find anyone else with the same problem.
Thanks to Benjamin for the suggestion. It turns out the problem was the data class. sample.df contained data of class factor which apparently htmlTable can't handle. By converting the data to characters the correct table is produced.
sample.df[] <- lapply(sample.df, as.character)
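The line above converts every column to character, including numeric ones. A slightly narrower variant, sketched here on a made-up data frame, converts only the factor columns. The column names Period, April and Growth are assumptions for illustration, not the real structure of sample.df.

```r
# Small stand-in for sample.df with two factor columns and one numeric column.
sample.df <- data.frame(
  Period = factor(c("2013:3", "2013:4")),
  April  = factor(c("253,963.49", "253,466.09")),
  Growth = c(0.2, -0.8)
)

# Find the factor columns and convert only those to character.
is_fac <- vapply(sample.df, is.factor, logical(1))
sample.df[is_fac] <- lapply(sample.df[is_fac], as.character)

str(sample.df)
```

The converted data frame can then be passed to htmlTable() as before.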
Perhaps someone more familiar with the package can explain why factors are a problem?
I knew it would be something basic like this!
I'm trying to create a time series plot using R, where I obtain the dates from a REST request and then want to group and count the date occurrences over one-week intervals. I followed the examples of ts() in R and tried plots, which worked great. But I couldn't find any examples that show how to aggregate dates based on existing data. Can someone point me in the proper direction?
This is a sample of my parsed REST data:
REST Response excerpt ....
"2014-01-16T14:51:50.000-0800"
"2014-01-14T15:42:55.000-0800"
"2014-01-13T17:29:08.000-0800"
"2014-01-13T16:19:31.000-0800"
"2013-12-16T16:56:39.000-0800"
"2014-02-28T08:11:54.000-0800"
"2014-02-28T08:11:28.000-0800"
"2014-02-28T08:07:02.000-0800"
"2014-02-28T08:06:36.000-0800"
....
Sincerely,
code B.
You can convert the dates with as.Date and then create a time series with xts, which lets you aggregate by any period of time.
library(xts)
REST$date <- as.Date(REST$date, format="%Y-%m-%d")
REST$variable <- seq(0,2.4,by=.3)
ts <- xts(REST[,"variable"], order.by=REST[,"date"])
> to.monthly(ts)
ts.Open ts.High ts.Low ts.Close
Dec 2013 1.2 1.2 1.2 1.2
Jan 2014 0.6 0.9 0.0 0.0
Feb 2014 1.5 2.4 1.5 2.4
> to.weekly(ts)
ts.Open ts.High ts.Low ts.Close
2013-12-16 1.2 1.2 1.2 1.2
2014-01-16 0.6 0.9 0.0 0.0
2014-02-28 1.5 2.4 1.5 2.4
Not sure if this is what you needed. Is it?
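If the goal is simply the number of timestamps per week, as the question describes, a base R sketch without xts could look like the following. It assumes the date part of each string, as written with its -0800 offset, is the date you want to count, and that weeks start on Monday (the default for cut() on dates). The names stamps, week_start and weekly_counts are made up for illustration.

```r
# Timestamps copied from the excerpt in the question.
stamps <- c(
  "2014-01-16T14:51:50.000-0800", "2014-01-14T15:42:55.000-0800",
  "2014-01-13T17:29:08.000-0800", "2014-01-13T16:19:31.000-0800",
  "2013-12-16T16:56:39.000-0800", "2014-02-28T08:11:54.000-0800",
  "2014-02-28T08:11:28.000-0800", "2014-02-28T08:07:02.000-0800",
  "2014-02-28T08:06:36.000-0800"
)

# Keep only the yyyy-mm-dd part; the time and UTC offset are ignored.
dates <- as.Date(substr(stamps, 1, 10))

# Assign each date to the Monday that starts its week, then count per week.
week_start <- cut(dates, breaks = "week")
weekly_counts <- table(week_start)

# Weeks without any timestamps appear with a count of 0; drop them for display.
print(weekly_counts[weekly_counts > 0])

# Quick look at the weekly counts as a bar chart.
barplot(weekly_counts, las = 2, ylab = "Occurrences per week")
```

Keeping the zero-count weeks in weekly_counts makes the bar chart show gaps between active weeks, which is usually what you want for a time series view.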