I want to recreate a pivot table from my Excel data set in R. I have been following this tutorial on how to do this: http://excel2r.com/pivot-tables-in-r-basic-pivot-table-columns-and-metrics/ . I used the code from the tutorial, replacing its variables with my own, but I keep getting this error message: Error: select() doesn't handle lists.
What does this error message mean, and how can I fix it?
The R script I have been using from the tutorial is:
library(dplyr)
library(tidyr)
pivot <- df %>%
select(Product.Category, Region, Customer.Segment, Sales)%>%
group_by(Product.Category, Region, Customer.Segment) %>%
summarise(TotalSales = sum(Sales))
Thank you in advance for the help!
Judging by your error message, "select() doesn't handle lists.", I suppose that your object called df isn't a data frame.
Maybe you have a dataframe inside a list.
Try this in your R console:
class(df)
If the class is list, you need to take the data frame out of the list. You can do this by position; it is probably in the first position: df[[1]]
The functions you are using generally work only on data frames (and tibbles, which are another type of data frame).
Like this example:
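A minimal sketch of that situation, using base R only. The `sales` data frame and its values are made up for illustration; the point is only that a data frame wrapped in a list reports class "list" until it is extracted with `[[1]]`.

```r
# Hypothetical data standing in for the real spreadsheet
sales <- data.frame(
  Region = c("East", "West", "East"),
  Sales  = c(10, 20, 30)
)

# Some import steps hand back a list that contains the data frame
df <- list(sales)
class(df)              # reports "list", which select() cannot handle

# Extract the data frame by position
df <- df[[1]]
is.data.frame(df)      # TRUE once the list wrapper is gone
```

After the extraction, `df` can be passed to select(), group_by() and summarise() as in the tutorial.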
I hope it works for you.
And next time, try to make a reproducible example.
You could at least print your original data frame before trying to use these functions; that way I could help you more efficiently.
I have a given df which has some column names that include a "/" (e.g. "Province/State" and "Country/Region").
I want to first group the df by "Country/Region" and then summarize it like this:
confirmed_by_country <- confirmed %>%
group_by("Country/Region") %>%
summarize_at(vars(-Lat, -Long, -"Province/State"), sum)
When I try to run this code it tells me that the column "Province/State" does not exist. I was warned about using this problem but still can't figure out what I am doing wrong.
I am also confused why I am only getting this error for "Province/State" and not "Country/Region" in the group_by() function.
Does anyone have an idea what the problem might be? Thanks!
Somehow it made a difference whether I imported the data with read.csv() or read_csv().
It didn't work with read.csv() even if I used backticks but it did when I used read_csv() and backticks. (The column names were also different depending on which one I used.)
If anyone knows why that is, I'm interested!
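One likely explanation, sketched with base R only: read.csv() defaults to check.names = TRUE, which passes the headers through make.names() and turns "/" into ".", whereas read_csv() keeps the headers as written. The CSV text below is invented for illustration.

```r
# Hypothetical two-line CSV with "/" in the headers
csv_text <- "Province/State,Country/Region,Lat\nHubei,China,30.9\n"

# Default behaviour: headers are sanitised into syntactic names
with_check <- read.csv(text = csv_text)
names(with_check)      # "Province.State" "Country.Region" "Lat"

# Keeping the original headers requires check.names = FALSE
without_check <- read.csv(text = csv_text, check.names = FALSE)
names(without_check)   # "Province/State" "Country/Region" "Lat"

# Non-syntactic names then need backticks when referenced
province <- as.character(without_check$`Province/State`)
```

So with read.csv() the column is called Province.State, and backticks around `Province/State` refer to a name that no longer exists.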
Hi all,
I am trying to rename the columns from the data frame (protein_df). As from here, the columns 'id' and 'Intensity' are shown to be located inside the data frame. However, the error message indicates that the argument to rename columns is unused. Does anyone have an idea of how this could happen?
Thanks!
When you have that type of "unexplainable" error for dplyr functions, it is usually because there is a conflict between different libraries. So use dplyr::rename and it should be good.
It's best to post your code as something that's copy/pastable text, you can format using backticks.
That error message means that the first arguments in rename() don't exist. I'm not sure if this is your goal, but my best guess is that you have the rename arguments backwards. Judging from the first print of your data frame, head(protein_df), id and Intensity are already the column names, so they need to go first in your rename():
protein_df %>%
rename(Intensity = Protien_intensity,
id = Protien_group_IDs)
You can still pipe in the rename() bit to your read_tsv and save it to that df.
This can happen when dplyr is loaded together with another package that also defines rename(). I figured out the solution for this: adding the dplyr:: prefix will help.
protein_df %>% dplyr::rename(Intensity = Protien_intensity, id = Protein_group_IDs)
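To check whether such a conflict is in play, base R's find() lists every attached package that defines a given name. The sketch below uses filter as a stand-in, since the stats package always defines it; in a real session the same call with "rename" would show which packages compete for that name.

```r
# find() returns the attached environments that define a name
filter_owners <- find("filter")
filter_owners    # includes "package:stats"; also "package:dplyr" if dplyr is attached

# The same check for rename; more than one entry means masking is possible
rename_owners <- find("rename")
rename_owners
```

If more than one package is listed, the one attached last wins, and an explicit prefix such as dplyr::rename removes the ambiguity.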
I'm using an R script within Power Query to do some data transformations and return a scaled table.
My R code is like this:
# 'dataset'
It does seem odd that this fails to return anything. A quick search online turned up a 3-minute YouTube video that uses the same method you are using. Searching further for a source leads to the Microsoft documentation, which gives a possible reason for the issue:
When preparing and running an R script in Power BI Desktop, there are a few limitations:
Only data frames are imported, so make sure the data you want to import to Power BI is represented in a data frame
Columns that are typed as Complex and Vector are not imported, and are replaced with error values in the created table
These seem like the most obvious reasons. Assuming there are no complex columns in your dataset, I believe the former is the likely reason. A quick recreation of your dataset shows that the scale function changes your dataset into a matrix class object. This is preserved by cbind, and as such output is of class matrix and not data.frame.
> dataset <- as.data.frame(abs(matrix(rnorm(1000), ncol = 4)))
> class(dataset)
[1] "data.frame"
> library(dplyr)
> df_normal <- log(dataset + 1) %>%
+   select(c(2:4)) %>%
+   scale
> class(df_normal)
[1] "matrix"
> df_normal <- cbind(dataset[,1], df_normal)
> output <- df_normal
> class(output)
[1] "matrix"
A simple fix would then seem to be adding output <- as.data.frame(output), as this is in line with the Power BI documentation. The script may also need a return-like statement at the end; adding a final line that simply states output should cover this.
Edit
For clarification, I believe the following edited script (of yours) should return the data expected
# 'dataset' holds the input data for this script
library(dplyr)
df_normal <- log(dataset+1) %>%
select(c(2:4)) %>%
scale
df_normal <-cbind(dataset[,c(1)], df_normal)
output <- as.data.frame(df_normal)
#output ## Uncommenting this line might be needed if the script has to end by stating the result
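The class change can be checked outside Power BI with base R alone. This is a sketch on randomly generated stand-in data, not the original dataset, and it uses is.matrix() and is.data.frame() rather than class() because the printed class of a matrix differs between R versions.

```r
# Stand-in for the Power BI 'dataset' object (values are arbitrary)
set.seed(1)
dataset <- as.data.frame(abs(matrix(rnorm(40), ncol = 4)))

# scale() returns a matrix even when it is given a data frame
scaled <- scale(log(dataset[, 2:4] + 1))
is.matrix(scaled)          # TRUE

# cbind() of a vector and a matrix is still a matrix
combined <- cbind(dataset[, 1], scaled)
is.data.frame(combined)    # FALSE

# Converting back gives Power BI something it can import
output <- as.data.frame(combined)
is.data.frame(output)      # TRUE
```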
I'm brand new to R and am having difficulty with something very basic. I'm importing data from an excel file like this:
data1 <- read.csv(file.choose(), header=TRUE)
When I try to look at the data in the table by column, R doesn't recognize the column headers as objects. This is what it looks like
summary(Square.Feet)
Error in summary(Square.Feet) : object 'Square.Feet' not found
I need to run a regression and I'm having the same problem. Any help would be much appreciated.
Yes, it does recognize them; you have to tell R which data frame to look in, like so:
summary(data1$Square.Feet)
Where "data1" is the name of your data frame, and the name of the variable goes after the dollar sign
Hope it helps
UPDATE
As suggested below, you can use the following:
data1 <- read.csv(file.choose(), header=TRUE)
attach(data1)
This way, by using attach, you avoid writing the name of the dataset every time, so we would go from
summary(data1$Square.Feet)
to this, after attaching the data:
summary(Square.Feet)
However, I DO NOT recommend doing this: if you load other datasets you may mess everything up, as it is quite common for variables to share the same names, among other major problems (thanks to Ben Bolker for the contribution).
If you want a summary of all data fields, then
summary(data1)
or you can use the 'with' helper function
with(data1, summary(Square.Feet))
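The same idea carries over to the regression mentioned in the question: modelling functions such as lm() take a data argument, so no attach() is needed. A small sketch, in which the Price column and all values are invented for illustration:

```r
# Hypothetical stand-in for the imported file
data1 <- data.frame(
  Square.Feet = c(850, 1200, 1500, 2000),
  Price       = c(100, 150, 180, 240)
)

# Two equivalent ways to reach a column without attach()
mean_by_dollar <- mean(data1$Square.Feet)
mean_by_with   <- with(data1, mean(Square.Feet))

# lm() looks the variables up in 'data', so bare column names work here
fit <- lm(Price ~ Square.Feet, data = data1)
coef(fit)
```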
I can't figure out why this version of plyr's rename function isn't working.
I have a dataframe where I have a single column that ends up being named seq(var_slcut_trucknumber_min, var_slcut_trucknumber_max) because I made it like this:
df_metbal_slcut <- as.data.frame(seq(var_slcut_trucknumber_min,var_slcut_trucknumber_max))
The terms var_slcut_trucknumber_min and var_slcut_trucknumber_max are defined as the min and max of another column.
However, when trying to rename it by the following code,
var_temp <- names(df_metbal_slcut)
df_metbal_slcut <- rename(df_metbal_slcut, c(var_temp="trucknumber"))
I get an error as follows:
The following `from` values were not present in `x`: var_temp
I don't understand why. I know that I can easily do this as colnames(df_metbal_slcut)[1] <- "trucknumber", but I'm an R n00b, and I was looking at a data manipulation tutorial that said that learning plyr was the way to go, so here I am stuck on this.
Try this instead:
df_metbal_slcut <- rename(df_metbal_slcut, setNames("trucknumber",var_temp))
The reason it wasn't working was that c(var_temp = "trucknumber") creates a named vector with the name var_temp, which is not what you were intending. When creating named objects using the tag = value syntax, R won't evaluate variables. It assumes that you literally want the name to be var_temp.
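The difference is easy to see in base R without plyr. In this sketch, "old_name" is a placeholder for whatever names(df_metbal_slcut) actually returns.

```r
# Placeholder for the real column name stored in var_temp
var_temp <- "old_name"

# tag = value syntax: the tag is taken literally, not evaluated
literal <- c(var_temp = "trucknumber")
names(literal)      # "var_temp"

# setNames() evaluates its second argument, so the stored name is used
evaluated <- setNames("trucknumber", var_temp)
names(evaluated)    # "old_name"
```

Only the second vector maps the real old column name to the new one, which is the form rename() expects.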
More broadly, it might make sense to name the column sensibly when initially creating the data frame, again using setNames.
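A sketch of naming the column at creation time, so that no rename is needed afterwards. The bounds 3 and 7 are arbitrary stand-ins for the min and max computed from the other column.

```r
# Arbitrary stand-ins for var_slcut_trucknumber_min / _max
var_min <- 3
var_max <- 7

# Option 1: name the column directly in data.frame()
df_named <- data.frame(trucknumber = seq(var_min, var_max))

# Option 2: keep as.data.frame() and wrap it in setNames()
df_setnames <- setNames(as.data.frame(seq(var_min, var_max)), "trucknumber")

names(df_named)       # "trucknumber"
names(df_setnames)    # "trucknumber"
```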