Get distinct values from columns in R [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 5 years ago.
I have the following kind of data in my CSV file:
DriveNo  Date and Time        Longitude
156      2014-01-31 23:00:00  41.88367183
187      2014-01-31 23:00:01  41.92854
The data are noisy. Sometimes a driver (DriveNo is unique per driver) appears in two different locations at the same time, which is impossible, so such rows are noise. I tried to remove them with
distinct(select(five, DriveNo, Date and Time))
but I get the following error:
Error: unexpected symbol in "distinct(select(five,DriveNo,Date and"
However, when I try
distinct(select(five, DriveNo, Longitude))
it works. But I need it with DriveNo and Date and Time.

You can escape a column name that contains spaces with backticks, like:
df %>%
  select(DriveNo, `Date and Time`, Longitude) %>%
  distinct()
or using group_by, like:
df %>%
  group_by(DriveNo, `Date and Time`) %>%
  select(Longitude) %>%
  unique()
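If the goal is one row per driver and timestamp, a minimal sketch along these lines may help. It assumes dplyr is installed. The data frame `five` is rebuilt by hand here, and its third row is an invented duplicate, not data from the question. `check.names = FALSE` is used so that the column keeps its spaces; by default, `read.csv` would rename it to `Date.and.Time`.

```r
library(dplyr)

# Hypothetical sample: the third row is an invented duplicate (same driver,
# same timestamp, different longitude) standing in for the noise.
five <- data.frame(
  DriveNo = c(156, 187, 156),
  `Date and Time` = c("2014-01-31 23:00:00", "2014-01-31 23:00:01",
                      "2014-01-31 23:00:00"),
  Longitude = c(41.88367183, 41.92854, 41.90000),
  check.names = FALSE   # keep "Date and Time" instead of "Date.and.Time"
)

# Keep the first row for each DriveNo / `Date and Time` combination,
# together with all other columns.
deduped <- five %>%
  distinct(DriveNo, `Date and Time`, .keep_all = TRUE)

deduped
```

Note that `.keep_all = TRUE` simply keeps the first row it encounters for each combination; deciding which of the conflicting locations is correct is a separate question.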

Related

Why don't we have statistics for each group? [closed]

Closed 1 year ago.
Input:
library("UsingR")
library("dplyr")
data("kid.weights")
attach(kid.weights)
df <- data.frame(gender[1:6],
weight[1:6],
height[1:6],
Kg=weight[1:6] * 0.453592,
M=height[1:6] * 0.0254)
df
df %>%
group_by(df$gender) %>%
summarise(mean(df$weight))
Output:
> df %>%
+ group_by(df$gender) %>%
+ summarise(mean(df$weight))
# A tibble: 2 x 2
`df$gender` `mean(df$weight)`
<fct> <dbl>
1 F 58.3
2 M 58.3
I want a data frame with the mean (or median) weight in kg for each gender, but it does not seem to work: both groups show the same mean.
How can I solve this?

Once you pipe df in with %>%, you don't need to refer to df anymore:
df %>%
  group_by(gender) %>%
  summarise(mean(weight))
Inside dplyr verbs, bare column names are evaluated separately for each group, whereas df$gender and df$weight always return the whole column, which is why every group got the same mean.
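The difference can be seen on a tiny example. This is only a sketch: the data frame `kids` and its values are made up for illustration (it is not the `kid.weights` data), and the columns are named explicitly so that the bare names `gender` and `weight` resolve to columns rather than to attached variables.

```r
library(dplyr)

# Hypothetical data with explicitly named columns.
kids <- data.frame(
  gender = c("F", "F", "M", "M"),
  weight = c(40, 50, 60, 80)
)

# kids$weight ignores the grouping: every group gets the overall mean.
wrong <- kids %>%
  group_by(gender) %>%
  summarise(mean_wt = mean(kids$weight))

# The bare column name is evaluated within each group.
right <- kids %>%
  group_by(gender) %>%
  summarise(mean_wt = mean(weight), median_wt = median(weight))

wrong
right
```

Naming the summary columns (`mean_wt`, `median_wt`) also avoids headers such as `mean(weight)` in the result.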

In my CSV file, it shows I have 1 column when I actually have 15 columns [closed]

Closed 2 years ago.
> df <- read.csv("DATA ONLY.csv", header = TRUE, sep = ";")
> dim(df)
[1] 439 1
This is the code I use and this is the CSV
https://docs.google.com/spreadsheets/d/1SOqDKXZ7BAMW5LdqBcBIvQE9_PnFcNIHDfYDty3cTto/edit?usp=sharing

I am 99% sure that you have specified the wrong field separator. data.table::fread is really good at detecting the format of CSV files, and I quite often use fread even if I just convert the resulting data.table back to a plain data frame, i.e.
library(data.table)
df <- fread("DATA ONLY.csv")
df <- as.data.frame(df)
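The symptom is easy to reproduce in base R. The sketch below uses a temporary file with made-up contents (not the original CSV) and assumes the real file is comma-separated, which the question does not confirm. Looking at the first line of the file with `readLines` is a simple way to see which separator is actually in use.

```r
# Write a small comma-separated file to a temporary location (made-up data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6"), csv_path)

# Inspect the raw header line to see the real separator.
readLines(csv_path, n = 1)

# Wrong separator: each line is read as a single field.
wrong_sep <- read.csv(csv_path, header = TRUE, sep = ";")
dim(wrong_sep)

# Matching separator: the columns are split as expected.
right_sep <- read.csv(csv_path, header = TRUE, sep = ",")
dim(right_sep)
```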

A weird problem that group_by() doesn't work? [closed]

Closed 3 years ago.
I have a dataset with 3 factor columns and 4 numeric columns. I want to use group_by() to summarize it, but no matter what I try, it doesn't work: the result has no groups.
freetick <- read.csv("FreeTickAll.csv", stringsAsFactors=FALSE)
library(dplyr)
group1 <- freetick %>% group_by(Habitat, Month) %>% summarize(
meanAd = mean(Adult),
meanNy = mean(Nymph),
meanLa = mean(Larva)
)
group1
The result:
> group1
meanAd meanNy meanLa
1 0.6129032 4.258065 20.1129
My group1 data.frame also shows:
mean Ad mean Ny mean La
1 0.6129032 4.258065 20.1129

If a function with the same name exists in several loaded packages, the version from the most recently loaded package masks the others. In such cases, either restart the R session and load only the package of interest (dplyr in this case), or call the function explicitly from that package (dplyr::summarise):
freetick %>%
  dplyr::group_by(Habitat, Month) %>%
  dplyr::summarise(meanAd = mean(Adult),
                   meanNy = mean(Nymph),
                   meanLa = mean(Larva))
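To check whether masking is really the cause, one can ask R where it finds the function. This is a sketch using base R helpers and assumes dplyr is attached. The question does not say which other packages were loaded, so the output in the asker's session is unknown; plyr is a common culprit because it also exports `summarise`.

```r
library(dplyr)

# Entries on the search path that define summarise(), in lookup order.
# The first entry is the one a bare summarise() call will use.
find("summarise")

# Namespace of the summarise() that is currently visible;
# anything other than "dplyr" here indicates masking.
environmentName(environment(summarise))
```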

string detection in dplyr filter results in evaluation error [closed]

Closed 3 years ago.
I am using R and the dplyr library to search for strings in a character data.frame that consists of one column. I want to combine two string tests on location data with an AND, but I get an error when dplyr::filter evaluates the condition.
str(dec2013.data)
'data.frame': 10481 obs. of 1 variable:
$ Routing.SYSID: chr "L839ITSRLX001TGU3" "L839SMARLX001TGU3"
"L839CRJRLX001TGU3" "L839BUARLX001TGU3"
dec2013.data.route1 <- data.frame()
dec2013.data.route1 <- dec2013.data %>%
filter(str_detect(Routing.SYSID,"L839" &
str_detect(Routing.SYSID,"TGU3")))
dput()
"L839CHHFNL626TGU3", "L839HPHFNL626TGU3", "L839NHBFNL626TGU3",
"L839BMQFNL626TGU3", "L839JUCFNL626TGU3", "L839KJYFNL626TGU3",
"L839KPPFNL626TGU3", "L839IWHFNL626TGU3", "L839NOFFNL626TGU3",
"L839DXQFNL626TGU3", "L839TMUFNL626TGU3", "L839RGCFNL626TGU3"
Error in filter_impl(.data, quo) : Evaluation error: operations are
possible only for numeric, logical or complex types.

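You just have a misplaced parenthesis: the first str_detect() call is not closed before the &, so R tries to apply & to the string "L839". This should work:
dec2013.data %>%
  filter(str_detect(Routing.SYSID, "L839"), str_detect(Routing.SYSID, "TGU3"))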
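For comparison, here is a sketch of the explicit `&` form with the parentheses balanced, next to a single-pattern alternative. It assumes dplyr and stringr are installed. The first two IDs come from the question; the last two are invented non-matching values, and the anchored pattern assumes the IDs always start with L839 and end with TGU3, which the question only suggests.

```r
library(dplyr)
library(stringr)

routes <- data.frame(
  Routing.SYSID = c("L839ITSRLX001TGU3", "L839CHHFNL626TGU3",
                    "X000AAAFNL626TGU3",   # hypothetical: wrong prefix
                    "L839BBBFNL626ZZZ9"),  # hypothetical: wrong suffix
  stringsAsFactors = FALSE
)

# Each str_detect() call is closed before the & operator.
both <- routes %>%
  filter(str_detect(Routing.SYSID, "L839") & str_detect(Routing.SYSID, "TGU3"))

# One regular expression: starts with L839 and ends with TGU3.
anchored <- routes %>%
  filter(str_detect(Routing.SYSID, "^L839.*TGU3$"))

both
```

Separating conditions with a comma inside `filter()`, as in the answer, is equivalent to joining them with `&`.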

Calculation inaccuracy in date difference [closed]

Closed 4 years ago.
I am trying to compute the number of days from each start date until today, using Sys.Date(). I use the following command, but the output in R is completely wrong.
person$duration <- lubridate::interval(as.Date(person$create_date, "%m/%d/%y"),Sys.Date()) %/% days()
The output:
Name create_date duration
A 09/23/2014 -811
B 05/05/2014 -670
It is supposed to be 1380 days, not -811. I am not sure why it is negative, or why it is specifically -811 or -670.

You were very close. Since your year has four digits, you need a capital Y (%Y).
library(lubridate)
interval(as.Date("09/23/2014", "%m/%d/%Y"), Sys.Date()) %/% days()
gives 1380.
In your code, %y read only the first two digits of the year (20) and interpreted them as 2020. To be exact: for a two-digit year, values between 69 and 99 are converted to 1969-1999, and values between 00 and 68 to 2000-2068.
interval(as.Date("09/23/2020", "%m/%d/%Y"), Sys.Date()) %/% days()
gives -811 as well.
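The parsing difference can be checked directly in base R, without lubridate. This is a small sketch using the date string from the question; the final day count depends on the day the code is run, so no value is shown for it.

```r
# %y consumes only two digits, so "2014" is read as "20" -> year 2020.
two_digit <- as.Date("09/23/2014", format = "%m/%d/%y")

# %Y consumes the full four-digit year.
four_digit <- as.Date("09/23/2014", format = "%m/%d/%Y")

format(two_digit)   # "2020-09-23"
format(four_digit)  # "2014-09-23"

# Days elapsed from the start date up to today.
as.numeric(Sys.Date() - four_digit)
```

Printing the parsed dates before doing any arithmetic on them is a quick way to catch a wrong format string.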

Use a simple difference:
as.numeric(as.Date("2018-07-05") - Sys.Date())
# negative if the date is in the past; wrap in abs() for the absolute number of days
