Pivot_longer on all columns - r

I am using pivot_longer from tidyr to transform a data frame from wide to long. I want to use all the columns and keep the row names in a column as well. The older melt function works perfectly for this call:
w1 <- reshape2::melt(w)
str(w1)
'data.frame': 900 obs. of 3 variables:
$ Var1 : Factor w/ 30 levels "muscle system process",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Var2 : Factor w/ 30 levels "muscle system process",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value: num NA NA NA NA NA NA NA NA NA NA ...
But pivot_longer does not:
w %>% pivot_longer()
Error in UseMethod("pivot_longer") :
no applicable method for 'pivot_longer' applied to an object of class "c('matrix', 'array', 'double', 'numeric')"
Any suggestions are appreciated.

Some example data would have been helpful, but your problem is that you are calling pivot_longer() on an object of class matrix, not data.frame:
library(tidyr)
# your error
mycars <- as.matrix(mtcars)
pivot_longer(mycars)
Error in UseMethod("pivot_longer") :
no applicable method for 'pivot_longer' applied to an object of class
"c('matrix', 'array', 'double', 'numeric')"
pivot_longer() will work on a data frame
> class(mycars)
[1] "matrix" "array"
> class(mtcars)
[1] "data.frame"
Remember to specify the cols argument, which was not required in reshape2::melt() (more info in the documentation). You want all the columns, so use cols = everything():
pivot_longer(mtcars, cols = everything())
(Disclaimer: Of course, mtcars is not the best dataset to convert to long format)
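To also keep the row names, as the question asks, one possible sketch is to convert the matrix to a data frame and move the row names into an ordinary column before pivoting. The small matrix m and the column names rowname, column and value are illustrative choices, not part of the original question.

```r
library(tidyr)
library(tibble)

# Illustrative matrix with row names (a stand-in for the asker's w)
m <- as.matrix(mtcars[1:3, 1:4])

# A matrix must become a data frame before pivot_longer() can dispatch on it
df <- as.data.frame(m)

# Move the row names into a regular column so they survive the pivot
df <- rownames_to_column(df, var = "rowname")

# Pivot every column except the row-name column
long <- pivot_longer(df, cols = -rowname,
                     names_to = "column", values_to = "value")
long
```

In base R, as.data.frame(as.table(m)) gives a similar three-column result without any extra packages.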

Related

Losing information after converting from factor to numeric in R

I have a data frame in which some of the numeric columns are stored as factors, and I want to convert them to numeric values. I tried the code below, but it still loses information.
> str(xdate1)
'data.frame': 6 obs. of 1 variable:
$ Amount.in.doc..curr.: Factor w/ 588332 levels "-0.5","-1","-1,000",..: 5132 57838 81064 98277 76292 71982
After converting to numeric I am losing information. Below is the output:
> xdate1$Amount.in.doc..curr.<-as.numeric(as.character(xdate1$Amount.in.doc..curr.))
Warning message:
NAs introduced by coercion
> str(xdate1)
'data.frame': 6 obs. of 1 variable:
$ Amount.in.doc..curr.: num -150 NA NA NA NA NA
You have values with commas (',') which turn into NA when converted to numeric. Remove them before converting:
xdate1$Amount.in.doc..curr. <- as.numeric(gsub(',', '', xdate1$Amount.in.doc..curr.))
Or use parse_number from readr:
xdate1$Amount.in.doc..curr. <- readr::parse_number(as.character(xdate1$Amount.in.doc..curr.))
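A minimal base R sketch of the same idea, using a made-up factor (the values below are illustrative and are not taken from the asker's data). It also shows why the as.character() step matters: calling as.numeric() directly on a factor returns the internal level codes rather than the printed values.

```r
# Illustrative factor; the values are made up for this sketch
amount <- factor(c("-150", "-1,000", "2,500.75"))

# Pitfall: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(amount)

# Convert to character, drop the thousands separators, then parse
cleaned <- as.numeric(gsub(",", "", as.character(amount), fixed = TRUE))
cleaned
```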

How to group daily data into months in a dataframe using dplyr

I have a dataframe containing daily counts of the number of group members seen present. I want to get a monthly mean of the number of group members seen (produced in a data frame). I've been trying to use dplyr, as it is much simpler than creating a new data frame and filling it using a for loop. I'm very new to coding and would like to be able to do this for multiple groups. My dataframe looks like this:
'data.frame': 148 obs. of 7 variables:
$ Date : Date, format: "2013-05-01" "2013-05-02" ...
$ Group : chr "WK" "WK" "WK" "WK" ...
$ Session : Factor w/ 12 levels "AM","AM1","AM2",..: 9 1 9 9 1 9 9 1 1 1 ...
$ Group.Members.Seen : num 7 6 8 9 9 6 8 9 4 9 ...
$ Roving.Males : num NA NA NA NA NA NA NA NA NA NA ...
$ Undyed.Group.Members.Seen: num NA NA NA NA NA NA NA NA NA NA ...
$ Non.group.Other : num NA NA NA NA NA NA NA NA NA NA ...
I don't have an observation for every day, and sometimes have multiple observations for a day. In this particular instance, there is only data in the Group.Members.Seen column; in other datasets, however, I do have numbers in the Roving.Males, Undyed.Group.Members.Seen, and Non.group.Other columns.
For this particular dataset, I only want to work with the Date and Group.Members.Seen columns, as I only have data in those columns. I've used select to pick those columns, then tried to use mutate, group_by, and summarise to get what I want. However, I think the problem is with the dates. I have also tried aggregate, but I don't think that is the best approach.
test <- WK.2013 %>%
select(Date, Group.Members.Seen) %>%
mutate(mo = Date(format="%m"), mean.num.members = mean(Group.Members.Seen)) %>%
group_by(Date(format="%m")) %>%
summarise(mean = mean(Group.Members.Seen))
The error message says it cannot find the function "Date", which is probably the beginning of a long string of problems with that code.
You can try the lubridate package and round dates to the month, year, or another unit.
library(lubridate)
mydate <- today()
> floor_date(today(),unit = "month")
[1] "2019-07-01"
> floor_date(mydate,unit = "month")
[1] "2019-07-01"
> round_date(mydate,unit = "month")
[1] "2019-08-01"
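Applied to the question, floor_date can supply the grouping key for dplyr. The following is a sketch only: the data frame reuses the asker's names WK.2013, Date and Group.Members.Seen, but the rows are made up for illustration, and Month is a hypothetical column name.

```r
library(dplyr)
library(lubridate)

# Made-up rows that mimic the structure described in the question
WK.2013 <- data.frame(
  Date = as.Date(c("2013-05-01", "2013-05-02", "2013-05-02",
                   "2013-06-10", "2013-06-15")),
  Group.Members.Seen = c(7, 6, 8, 9, 4)
)

monthly <- WK.2013 %>%
  select(Date, Group.Members.Seen) %>%
  # Round each date down to the first day of its month
  mutate(Month = floor_date(Date, unit = "month")) %>%
  group_by(Month) %>%
  # na.rm = TRUE skips missing counts instead of returning NA
  summarise(mean.num.members = mean(Group.Members.Seen, na.rm = TRUE))

monthly
```

To handle several groups at once, adding the Group column to group_by (for example group_by(Group, Month)) should give one monthly mean per group, assuming Group is kept in the select step.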
It's hard to say for sure whether this will work without seeing the actual data, but could you try the apply.monthly function from the xts package?

Convert comma separated decimals from character to numeric

For my exam I have to build some scatter plots in R. I created a data frame with 4 variables, and I want to add regression lines to my scatter plots.
The name of my data frame is "alle".
The variable names are: demo, tot, besch, usd.
I tried to fit the regression line with this code but got the following result:
reg1<- lm(tot~demo, data=alle)
Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
Here is the structure of "alle":
str(alle)
'data.frame': 11 obs. of 4 variables:
$ demo : chr "498.300.775" "500.297.033" "502.090.235" "503.170.618" ...
$ tot : Factor w/ 11 levels "4.846.423","4.871.049",..: 1 3 4 5 2 8 7 6 10 9 ...
$ besch: Factor w/ 9 levels "68,4","68,6",..: 5 7 3 2 2 1 1 4 6 8 ...
$ usd : Factor w/ 44 levels "0,68434","0,72584",..: 26 30 29 23 28 22 24 25 15 14 ...
I tried to convert the column "demo" to numeric with
alle$demo <- as.numeric(as.character(alle$demo))
It converted the column to numeric, but now the rows are full of NAs.
I think that all columns must be numeric.
How can I convert all 4 columns to numeric and finally plot the regression lines?
Data:
> head(alle,6)
demo tot besch usd
1 498.300.775 4.846.423 69,8 1,3705
2 500.297.033 4.891.934 70,3 1,4708
3 502.090.235 4.901.358 69,0 1,3948
4 503.170.618 4.906.313 68,6 1,3257
5 502.964.837 4.871.049 68,6 1,3920
6 504.047.964 5.010.371 68,4 1,2848
thanks
Try doing it in two steps. First get rid of the dots, then replace the commas by decimal points and coerce to numeric.
alle[] <- lapply(alle, function(x) gsub("\\.", "", x))
alle[] <- lapply(alle, function(x) as.numeric(sub(",", ".", x)))
Note:
The above solution is split in two for readability. The following does the same in a single lapply loop and should therefore be faster if the dataset is big. If the dataset is small to medium, the two-step solution may be preferable.
alle[] <- lapply(alle, function(x){
as.numeric(sub(",", ".", gsub("\\.", "", x)))
})
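As an end-to-end sketch, the conversion can be followed by the regression and plot the question asks for. The data frame below is rebuilt from the six rows printed in the question, and stringsAsFactors = TRUE is an assumption made here to mimic the factor columns shown in str(alle).

```r
# The six rows shown in the question, stored as factors
alle <- data.frame(
  demo  = c("498.300.775", "500.297.033", "502.090.235",
            "503.170.618", "502.964.837", "504.047.964"),
  tot   = c("4.846.423", "4.891.934", "4.901.358",
            "4.906.313", "4.871.049", "5.010.371"),
  besch = c("69,8", "70,3", "69,0", "68,6", "68,6", "68,4"),
  usd   = c("1,3705", "1,4708", "1,3948", "1,3257", "1,3920", "1,2848"),
  stringsAsFactors = TRUE
)

# Drop the thousands dots, turn the decimal comma into a point, then parse
alle[] <- lapply(alle, function(x) {
  as.numeric(sub(",", ".", gsub("\\.", "", x)))
})

# With numeric columns, lm() no longer warns about a factor response
reg1 <- lm(tot ~ demo, data = alle)

# Scatter plot with the fitted regression line
plot(tot ~ demo, data = alle)
abline(reg1)
```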
With dplyr (this converts only besch and usd; demo and tot stay character, as the output shows):
library(dplyr)
alle %>%
mutate_all(as.character) %>%
mutate_at(c("besch","usd"),function(x) as.numeric(as.character(gsub(",",".",x)))) ->alle
demo tot besch usd
1 498.300.775 4.846.423 69.8 1.3705
2 500.297.033 4.891.934 70.3 1.4708
3 502.090.235 4.901.358 69.0 1.3948
4 503.170.618 4.906.313 68.6 1.3257
5 502.964.837 4.871.049 68.6 1.3920
6 504.047.964 5.010.371 68.4 1.2848

Convert delimited string to numeric vector in dataframe

This is such a basic question, I'm embarrassed to ask.
Let's say I have a dataframe full of columns which contain data of the following form:
test <-"3000,9843,9291,2161,3458,2347,22925,55836,2890,2824,2848,2805,2808,2775,2760,2706,2727,2688,2727,2658,2654,2588"
I want to convert this to a numeric vector, which I have done like so:
test <- as.numeric(unlist(strsplit(test, split=",")))
I now want to convert a large dataframe containing a column full of this data into a numeric vector equivalent:
mutate(data,
converted = as.numeric(unlist(strsplit(badColumn, split=","))),
)
This doesn't work because presumably it's converting the entire column into a numeric vector and then replacing a single row with that value:
Error in mutate_impl(.data, dots) : Column converted must be
length 20 (the number of rows) or one, not 1274
How do I do this?
Here's some sample data that reproduces your error:
data <- data.frame(a = 1:3,
badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
stringsAsFactors = FALSE)
Here's the error:
library(tidyverse)
mutate(data, converted = as.numeric(unlist(strsplit(badColumn, split=","))))
# Error in mutate_impl(.data, dots) :
# Column `converted` must be length 3 (the number of rows) or one, not 18
A straightforward way would be to just use strsplit on the entire column, and lapply ... as.numeric to convert the resulting list values from character vectors to numeric vectors.
x <- mutate(data, converted = lapply(strsplit(badColumn, ",", TRUE), as.numeric))
str(x)
# 'data.frame': 3 obs. of 3 variables:
# $ a : int 1 2 3
# $ badColumn: chr "10,20,30,40,50" "1,2,3,4,5,6" "9,8,7,6,5,4,3"
# $ converted:List of 3
# ..$ : num 10 20 30 40 50
# ..$ : num 1 2 3 4 5 6
# ..$ : num 9 8 7 6 5 4 3
This might help:
library(purrr)
mutate(data, converted = map(badColumn, function(txt) as.numeric(unlist(strsplit(txt, split = ",")))))
What you get is a list column which contains the numeric vectors.
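A short sketch of how such a list column can be used afterwards, built here with base R on the sample data from the answer above. The column names n and total are hypothetical and chosen only for illustration.

```r
# Sample data from the answer above
data <- data.frame(a = 1:3,
                   badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
                   stringsAsFactors = FALSE)

# One numeric vector per row, stored as a list column
data$converted <- lapply(strsplit(data$badColumn, ",", fixed = TRUE), as.numeric)

# Per-row summaries computed from the list column
data$n     <- lengths(data$converted)                    # number of values per row
data$total <- vapply(data$converted, sum, numeric(1))    # sum of values per row

data[, c("a", "n", "total")]
```

vapply is used instead of sapply so that the result is guaranteed to be a numeric vector with one element per row.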
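Base R:
A <- as.numeric(strsplit(test, ',')[[1]])
A
[1] 3000 9843 9291 2161 3458 2347 22925 55836 2890 2824 2848 2805 2808 2775 2760 2706 2727 2688 2727 2658 2654 2588
For a data frame column (here df$NEw), split each element separately:
df$NEw2 <- lapply(df$NEw, function(x) as.numeric(strsplit(x, ',')[[1]]))
With dplyr, add rowwise() so that each row is split on its own; without it, [[1]] picks only the first row's values and recycles them to every row:
df %>% rowwise() %>% mutate(NEw2 = list(as.numeric(strsplit(NEw, ',')[[1]])))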

Dplyr - Error: column '' has unsupported type

I have an odd issue when using dplyr on a data.frame to compute the number of missing observations for each group of a character variable. It raises the error "Error: column '' has unsupported type".
To replicate it I have created a subset. The subset rdata file is available here:
rdata file including dftest data.frame
First, using the subset I have provided, the code:
dftest %>%
group_by(file) %>%
summarise(missings=sum(is.na(v131)))
will create the error:
Error: column 'file' has unsupported type
The str(dftest) returns:
'data.frame': 756345 obs. of 2 variables:
$ file: atomic bjir31fl.dta bjir31fl.dta bjir31fl.dta bjir31fl.dta ...
..- attr(*, "levels")= chr
$ v131: Factor w/ 330 levels "not of benin",..: 6 6 6 6 1 1 1 9 9 9 ...
However, taking a subset of the subset, and running the dplyr command again, will create the expected output.
dftest <- dftest[1:756345,]
dftest %>%
group_by(file) %>%
summarise(missings=sum(is.na(v131)))
The str(dftest) now returns:
'data.frame': 756345 obs. of 2 variables:
$ file: chr "bjir31fl.dta" "bjir31fl.dta" "bjir31fl.dta" "bjir31fl.dta" ...
$ v131: Factor w/ 330 levels "not of benin",..: 6 6 6 6 1 1 1 9 9 9 ...
Does anyone have any suggestions about what might cause this error and what to do about it? In my original file I have 300 variables, and dplyr states that most of these are of unsupported type.
Thanks.
This seems to be an issue with using filter when a column of the data frame has an attribute. For example,
> df = data.frame(x=1:10, y=1:10)
> filter(df, x==3) # Works
x y
1 3 3
Add an attribute to the x column. Notice that str(df) shows x as atomic now, and filter doesn't work:
> attr(df$x, 'width')='broad'
> str(df)
'data.frame': 10 obs. of 2 variables:
$ x: atomic 1 2 3 4 5 6 7 8 9 10
..- attr(*, "width")= chr "broad"
$ y: int 1 2 3 4 5 6 7 8 9 10
> filter(df, x==3)
Error: column 'x' has unsupported type
To make it work, remove the attribute:
> attr(df$x, 'width') = NULL
> filter(df, x==3)
x y
1 3 3
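With many affected columns, as in the question, the same fix can be applied to every column at once. The helper below is a hypothetical sketch (strip_attr is not a dplyr or base R function) that removes one named attribute from each column and leaves everything else, such as factor levels, untouched. Whether the error appears at all may depend on the dplyr version in use.

```r
# Hypothetical helper: remove one named attribute from every column
strip_attr <- function(data, which) {
  data[] <- lapply(data, function(col) {
    attr(col, which) <- NULL   # no effect on columns that lack the attribute
    col
  })
  data
}

# Reproduce the example above, then clean it
df <- data.frame(x = 1:10, y = 1:10)
attr(df$x, "width") <- "broad"

clean <- strip_attr(df, "width")
str(clean)
```

Assigning with data[] keeps the object a data frame while replacing each column in place.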
