Why does R sometimes think ASCII characters are non-ASCII? [closed] - r

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 19 days ago.
I am trying to identify elements in a dataframe that contain non-ASCII characters. For example, in the dataframe below I would want all rows in the unicode_only column and the last two rows in the mixed column.
library(tidyverse)

example_dataset <- tribble(
  ~ascii_only, ~unicode_only, ~mixed,
  "a",         "表",          "c",
  "b",         "外",          "表",
  "c",         "字",          "外"
)
When I try filtering elements using the regular expression "[^[:ascii]]", however, some ASCII-only elements are included.
example_dataset %>%
  mutate(row_number = row_number()) %>%
  pivot_longer(c(everything(), -row_number),
               names_to = "variable") %>%
  select(variable, row_number, value) %>%
  arrange(variable, row_number) %>%
  filter(str_detect(value, "[^[:ascii]]"))
variable      row_number  value
ascii_only    2           b
mixed         2           表
mixed         3           外
unicode_only  1           表
unicode_only  2           外
unicode_only  3           字
Why would "[^[:ascii]]" match b?

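The pattern is [:ascii:], not [:ascii]. Without the closing colon, the bracket expression is no longer the ASCII class; it appears to be read as a plain set of the literal characters :, a, s, c and i, so the negated pattern matches every character outside that set. That is consistent with the output above: b matches, while a and c do not.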
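A minimal sketch of the difference between the two patterns, assuming the stringr package is installed. The vector x is made up for illustration and is not part of the question's data.

```r
library(stringr)

# Hypothetical test vector: three ASCII letters and one non-ASCII character.
x <- c("a", "b", "c", "表")

# Typo: "[:ascii]" lacks the closing colon, so it is not the ASCII class.
typo_hits <- str_detect(x, "[^[:ascii]]")

# Intended: "[:ascii:]" is the POSIX-style class; negating it matches
# any character outside the ASCII range.
fixed_hits <- str_detect(x, "[^[:ascii:]]")

# Only the non-ASCII element remains.
x[fixed_hits]
```

Comparing typo_hits with fixed_hits on a small vector like this is a quick way to spot a malformed character class before applying it to a whole data frame.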

Related

Why don't we have statistics for each group? [closed]

Closed 1 year ago.
Input:
library("UsingR")
library("dplyr")
data("kid.weights")
attach(kid.weights)
df <- data.frame(gender[1:6],
                 weight[1:6],
                 height[1:6],
                 Kg = weight[1:6] * 0.453592,
                 M = height[1:6] * 0.0254)
df
df %>%
  group_by(df$gender) %>%
  summarise(mean(df$weight))
Output:
> df %>%
+   group_by(df$gender) %>%
+   summarise(mean(df$weight))
# A tibble: 2 x 2
  `df$gender` `mean(df$weight)`
  <fct>                   <dbl>
1 F                        58.3
2 M                        58.3
I want to build a data frame with the mean (or median) weight in kg for each gender, but it does not seem to work: both groups get the same value.
How can I solve this?
Once you pipe df in with %>%, you don't need to refer to df anymore:
df %>%
  group_by(gender) %>%
  summarise(mean(weight))
Inside the pipeline, the dplyr verbs look up bare column names in the data and evaluate them one group at a time. df$gender and df$weight, by contrast, always return the whole column, so every group ends up with the same overall mean.
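The difference can be sketched on a small data frame. The toy data below is hypothetical and stands in for the kid.weights data from the question; dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data, not the kid.weights data set from the question.
toy <- data.frame(
  gender = c("F", "F", "M", "M"),
  weight = c(10, 20, 30, 50)
)

# Bare column name: mean() sees only the rows of the current group.
per_group <- toy %>%
  group_by(gender) %>%
  summarise(mean_weight = mean(weight))

# toy$weight bypasses the grouping and always returns all four values.
overall <- toy %>%
  group_by(gender) %>%
  summarise(mean_weight = mean(toy$weight))

per_group$mean_weight  # 15 40
overall$mean_weight    # 27.5 27.5
```

Naming the summary column (mean_weight here) also avoids awkward headers such as `mean(df$weight)` in the result.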

Selecting column in dataframe returns NULL [closed]

Closed 1 year ago.
I am trying to access a column in my dataframe using the dataframe$column format, but it returns NULL. What am I doing wrong?
As you can see from the output, you don't have a column called Ozone; the column, and the only one, you have is called V1. You will have to split the data in V1 into columns. This can be done using tidyr's separate, like so:
Data:
df <- data.frame(
  V1 = c("Ozone,Solar.R,Wind,Temp,Month,Day",
         "41,190,7.4,67,5,1")
)
First, get your column names:
col_names <- unlist(strsplit(df$V1[1], ","))
The column names are now stored in a vector:
col_names
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
Now transform df:
library(dplyr)
library(tidyr)
df %>%
  # first rename the col to be transformed:
  rename("Ozone,Solar.R,Wind,Temp,Month,Day" = V1) %>%
  # remove the first row, which is now redundant:
  slice(2:nrow(.)) %>%
  # separate into columns using the `col_names`:
  separate(1, into = col_names, sep = ",")
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
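As an alternative to separate(), the single column can be re-parsed as CSV text in base R. This is only a sketch of a different technique, not the method from the answer, and it assumes the values in V1 are plain comma-separated lines as in the example data.

```r
# Same two-line example as above: everything ended up in one column, V1.
df <- data.frame(
  V1 = c("Ozone,Solar.R,Wind,Temp,Month,Day",
         "41,190,7.4,67,5,1")
)

# Re-parse the raw lines as CSV text; the first line becomes the header.
parsed <- read.csv(text = as.character(df$V1), header = TRUE)

names(parsed)
parsed$Ozone  # the column now exists, so this is no longer NULL
```

Unlike separate(), which leaves every new column as character, read.csv() also guesses a type for each column. If the data originally came from a file, passing the right sep to the initial read is likely the simpler fix.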

A weird problem that group_by() doesn't work? [closed]

Closed 3 years ago.
I have a dataset with 3 factor columns and 4 numeric columns. I want to summarize it with group_by(), but whatever I try, the result is not grouped.
freetick <- read.csv("FreeTickAll.csv", stringsAsFactors = FALSE)
library(dplyr)
group1 <- freetick %>%
  group_by(Habitat, Month) %>%
  summarize(
    meanAd = mean(Adult),
    meanNy = mean(Nymph),
    meanLa = mean(Larva)
  )
group1
The result:
> group1
     meanAd   meanNy  meanLa
1 0.6129032 4.258065 20.1129
And my group1 data.frame also shows:
    mean Ad  mean Ny mean La
1 0.6129032 4.258065 20.1129
When several loaded packages export a function with the same name, the version from the package loaded last masks the others. In that case, either restart the R session and load only the package you need (dplyr here), or call the function with an explicit namespace (dplyr::summarise):
freetick %>%
  dplyr::group_by(Habitat, Month) %>%
  dplyr::summarise(meanAd = mean(Adult),
                   meanNy = mean(Nymph),
                   meanLa = mean(Larva))
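Masking can be reproduced without a second package. In the sketch below, a hypothetical summarise() defined in the global environment plays the role of the masking function; the toy data frame and the stand-in function are assumptions for illustration, not part of the question.

```r
library(dplyr)

# Stand-in for a masking function: this hypothetical summarise() shadows
# dplyr's version, much as a package loaded later would.
summarise <- function(.data, ...) "masked version called"

toy <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 10))

# The unqualified call resolves to the stand-in, not to dplyr.
masked <- summarise(toy, m = mean(v))

# The namespace-qualified call always reaches dplyr's function.
explicit <- dplyr::summarise(dplyr::group_by(toy, g), m = mean(v))

masked      # "masked version called"
explicit$m  # 2 10

# Remove the stand-in so later code finds dplyr's function again.
rm(summarise)
```

In an affected session, conflicts(detail = TRUE) or environment(summarise) shows which package an unqualified name currently resolves to.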

Get distinct values from columns in R [closed]

Closed 5 years ago.
I have the following kind of data in my csv file
DriveNo  Date and Time        Longitude
156      2014-01-31 23:00:00  41.88367183
187      2014-01-31 23:00:01  41.92854
These data have a lot of noise. Sometimes a driver (the DriveNo is unique) appears in two different locations at the same time, which is impossible and therefore noise. I tried to remove such rows using
distinct(select(five,DriveNo,Date and Time))
but I get the following error:
Error: unexpected symbol in "distinct(select(five,DriveNo,Date and"
However, when I try
distinct(select(five,DriveNo,Longitude))
it works. But I need it with DriveNo and Date and Time.
A column name that contains spaces is not a valid R name, so quote it with backticks:
df %>%
  select(DriveNo, `Date and Time`, Longitude) %>%
  distinct()
or, using group_by:
df %>%
  group_by(DriveNo, `Date and Time`) %>%
  select(Longitude) %>%
  unique()
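To keep a single row per driver and timestamp even when the longitudes differ, one option is distinct() with .keep_all = TRUE. The sketch below uses made-up rows (the second longitude value is hypothetical) and assumes dplyr is installed; which of the conflicting rows is kept is simply the first one encountered.

```r
library(dplyr)

# Hypothetical rows; check.names = FALSE keeps the name with spaces intact.
five <- data.frame(
  DriveNo = c(156, 156, 187),
  `Date and Time` = c("2014-01-31 23:00:00",
                      "2014-01-31 23:00:00",
                      "2014-01-31 23:00:01"),
  Longitude = c(41.88367183, 41.9, 41.92854),
  check.names = FALSE
)

# One row per driver and timestamp; .keep_all retains the other columns,
# taking the values from the first row of each combination.
deduped <- five %>%
  distinct(DriveNo, `Date and Time`, .keep_all = TRUE)

nrow(deduped)  # 2
```

If the first row is not necessarily the right one, the duplicated combinations would need to be inspected or resolved by some other rule before dropping them.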

dplyr, create a column conditional on presence or absence of text in another column [closed]

Closed 6 years ago.
I use dplyr. I would like to create a new column called "disease" with a yes or no designation based on another column called Description. If the description is NA, the value in the new column should be "N"; if the description contains any text, the value should be "Y". I tried the following code:
data%>%
mutate(disease= ifelse( is.na(Description)),"N", "Y")
There is a really simple solution using data.table (the examples here use a column named cyl where the question has Description):
library(data.table)
setDT(data)[, disease := ifelse(is.na(cyl), "N", "Y")]
We can also do this in base R:
transform(data, disease = c("Y", "N")[is.na(cyl) + 1])
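The code in the question closes ifelse() right after its first argument, so "N" and "Y" are passed to mutate() instead of ifelse(). A corrected dplyr version might look like the sketch below; the three-row data frame is hypothetical and only the Description column name comes from the question.

```r
library(dplyr)

# Hypothetical data with the Description column from the question.
data <- data.frame(
  id = 1:3,
  Description = c("asthma", NA, "diabetes"),
  stringsAsFactors = FALSE
)

# All three arguments (test, yes, no) go inside ifelse();
# the closing parenthesis of mutate() comes last.
result <- data %>%
  mutate(disease = ifelse(is.na(Description), "N", "Y"))

result$disease  # "Y" "N" "Y"
```

dplyr::if_else() could be used in the same place when stricter type checking of the two outcomes is wanted.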
