tibble to dataframe different lengths - r

I am working with a dataset that I first converted from long to wide, because I need the rows (variables) to become columns.
I used:
library(tidyr)
cfa_model<-pivot_wider(cfa_data, names_from= variable, values_from = value)
and got:
I need this data as a data frame with 48 rows and 65 columns. Not every column has the same length, and I don't know whether that is a problem; where a variable has fewer observations, an NA would be fine.
The problem when converting to a data frame is that the values come out as lists, with all 48 rows collapsed into one. I need each column to be an ordinary numeric data frame column.
Do you guys know how to fix this?
Thanks a lot! :)

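If you want to use pivot_wider, one approach is to first add a row number within each variable, so that every value is uniquely identified:
library(tidyverse)
cfa_data %>%
  group_by(variable) %>%
  mutate(row = row_number()) %>%  # position of each observation within its variable
  ungroup() %>%
  # 'variable' is already used by names_from, so only 'row' identifies the output rows
  pivot_wider(id_cols = row, names_from = variable, values_from = value)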
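To see how the row-number approach handles variables of different lengths, here is a minimal sketch on made-up data. The toy `cfa_data` (columns `variable` and `value`) and the name `cfa_wide` are assumptions for illustration, not the asker's data:

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: variable "a" has 3 observations, "b" only 2
cfa_data <- tibble(
  variable = c("a", "a", "a", "b", "b"),
  value    = c(1, 2, 3, 10, 20)
)

cfa_wide <- cfa_data %>%
  group_by(variable) %>%
  mutate(row = row_number()) %>%  # 1, 2, 3 within each variable
  ungroup() %>%
  pivot_wider(id_cols = row, names_from = variable, values_from = value) %>%
  select(-row) %>%                # drop the helper column
  as.data.frame()                 # plain data frame instead of a tibble

str(cfa_wide)
```

The shorter variable is padded with NA, and each column is an ordinary numeric vector rather than a list.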

Related

Merging many columns in R

I have an issue with merging many columns by the same ID. I know that this is possible for two lists, but I need to combine all species columns into one, so that the first column is species (combined), followed by w, w.1, w.2, w.3, w.4... The species columns all contain the same species but not in the same order, so I can't just drop every other column: the w values would no longer be associated with the right species. This is an extremely large dataset of 10000 rows and 2000 columns, so this would need to be automated. I need the w values to stay associated with the corresponding species. Dataset attached.
Thank you for any help
If your data is in a frame called dt, you can use lapply() along with bind_rows() like this:
library(dplyr)
library(tidyr)
bind_rows(
  lapply(seq(1, ncol(dt), 2), function(x) {
    dt[, c(x, x + 1)] %>%
      rename_with(~ c("Species", "value")) %>%
      mutate(w = colnames(dt)[x + 1])
  })
) %>%
  pivot_wider(id_cols = Species, names_from = w)

Convert a column of list to dummy in R

I have a column with lists of variables.
They are separated by commas, and sometimes a value for a variable is set with "=".
See picture.
I want the variables as columns and within the columns TRUE/FALSE or 1/0 values plus if there is a value set by "=" an extra column for this value.
I guess it's a similar question to Pandas convert a column of list to dummies but I need it in R.
Since you haven't provided explicit data, I had to recreate some from your screenshots (please provide at least textual data next time; it helps others reproduce your task).
The code chunks below are explained with comments; they use tidyverse functions from the packages loaded at the top. The result is what you asked for, except that the columns eventnumber_value are named value_eventnumber, since starting a variable or column name with a number is not good practice.
I don't know what you need the data for, but in my experience the wide format is less useful than the long format in most cases. Especially here, since I expect that one event may happen only for one ID. Thus, dat_pivoted is more convenient to operate on.
library(tibble)
library(tidyr)
library(dplyr)
library(stringr)
dat <- tribble(
~post_event_list, ~date_time,
"239=20.00,200,20149,100,101,102,103,104,105,106,107,108,114,198", "2022-03-01 00:23:50",
"257,159", "2022-03-01 00:02:51",
"201,109,110,111,112", "2022-03-01 00:57:23"
)
dat_pivoted <- dat %>%
  # transform comma-separated strings into character vectors
  mutate(post_event_list = str_split(post_event_list, ",")) %>%
  # split the vectors into separate rows
  unnest_longer(post_event_list) %>%
  # separate variables from values (case of 'X=Y'); val is NA if there is no value
  separate(post_event_list, sep = "=", into = c("var", "val"), fill = "right") %>%
  # treat the 'val' column as numeric
  mutate(val = as.numeric(val))
dat_values <- dat_pivoted %>%
  # turn data into wide format: one column per event value, present or not
  pivot_wider(id_cols = date_time, names_from = var, names_prefix = "value_", values_from = val) %>%
  # keep only those value columns where not every element is NA
  select(!where(~ all(is.na(.x))))
dat_indicator <- dat_pivoted %>%
  # each row indicates the presence of an event, so change all values to TRUE
  mutate(val = TRUE) %>%
  # pivot again, replacing the resulting NAs with FALSE
  pivot_wider(id_cols = date_time, names_from = var, values_from = val, values_fill = FALSE)
dat_transformed <- left_join(dat_indicator, dat_values, by = "date_time")

2 Numeric Values In A Dataframe Field In R

I have a dataset in R with a little under 100 columns.
Some of the columns hold numeric values written as expressions, such as 87+3 as opposed to 90.
I have been able to update each column with the following piece of code:
library(dplyr)
new_dataframe = dataframe %>%
  rowwise() %>%
  mutate(new_value = eval(parse(text = value)))
However, I would like to be able to update a list of 60 columns in a more efficient way than simply repeating this line for each column.
Can someone help me find a more efficient way?
We can use mutate_at:
library(dplyr)
dataframe %>%
  rowwise() %>%
  mutate_at(1:60, list(new_value = ~ eval(parse(text = .))))
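Since dplyr 1.0.0, mutate_at is superseded by across(). The following is a sketch of the same idea without rowwise(), under stated assumptions: the toy `dataframe`, its columns `a` and `b`, and the helper `eval_text` are hypothetical, and the expression columns are assumed to be character vectors.

```r
library(dplyr)

# Hypothetical data frame with arithmetic stored as text
dataframe <- tibble(
  id = 1:2,
  a  = c("87+3", "90"),
  b  = c("10+5", "7")
)

# Hypothetical helper: evaluate each string in a character vector as an R expression
eval_text <- function(x) {
  vapply(x, function(s) eval(parse(text = s)), numeric(1), USE.NAMES = FALSE)
}

new_dataframe <- dataframe %>%
  # apply the helper to every selected column; the originals are kept
  mutate(across(c(a, b), eval_text, .names = "{.col}_new_value"))

new_dataframe
```

For the real data, the selection `c(a, b)` could be replaced by positions such as `1:60`. Note that eval(parse()) runs arbitrary code, so it is only reasonable on input you trust.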

Combine values with column names as new column

I would like to combine my values with column names (see current set and required set):
Current set =
- ncol = 9
- nrow = 26814
I want to add the values from SheetNaam to the columns XYEAR to expand my columns and decrease my rows, without losing data or 'NA'. Is this possible in R?
Difficult to explain by text, hope someone will understand my explanation.
We can try gather and spread: gather the columns whose names start with 'X' followed by digits, unite 'SheetNaam' and 'key' into a single column, and spread back to wide format.
library(tidyverse)
gather(df1, key, val, matches("^X\\d+$"), na.rm = TRUE) %>%
  unite(SheetNaam, SheetNaam, key, sep = "") %>%
  group_by(SheetNaam) %>%
  mutate(rn = row_number()) %>%
  spread(SheetNaam, val)
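gather and spread are superseded in current tidyr by pivot_longer and pivot_wider. A rough equivalent of the pipeline above might look like this. The toy `df1` is an assumption (the real data has 9 columns that are not shown), and `wide` is a hypothetical name:

```r
library(dplyr)
library(tidyr)

# Hypothetical input: one sheet-name column plus year columns X2015, X2016
df1 <- tibble(
  SheetNaam = c("A", "A", "B", "B"),
  X2015     = c(1, 2, 3, 4),
  X2016     = c(5, NA, 7, 8)
)

wide <- df1 %>%
  # long format: one row per (SheetNaam, year column) pair, dropping NAs
  pivot_longer(matches("^X\\d+$"), names_to = "key", values_to = "val",
               values_drop_na = TRUE) %>%
  # build the new column names, e.g. "A" + "X2015" -> "AX2015"
  unite(SheetNaam, SheetNaam, key, sep = "") %>%
  group_by(SheetNaam) %>%
  mutate(rn = row_number()) %>%  # row index within each new column
  ungroup() %>%
  pivot_wider(names_from = SheetNaam, values_from = val)

wide
```

The helper column `rn` plays the same role as in the gather/spread version: it keeps the values of each new column on separate rows.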

Normalize specified columns in dplyr by value in first row

I have a data frame with four rows, 23 numeric columns and one text column. I'm trying to normalize all the numeric columns by subtracting the value in the first row.
I've tried getting it to work with mutate_at, but I couldn't figure out a good way to get it to work.
I got it to work by converting to a matrix and converting back to a tibble:
## First, did some preprocessing to get out the group I want
totalNKFoldChange <- filter(signalingFrame,
                            Population == "Total NK") %>% ungroup
totalNKFoldChange_mat <- select(totalNKFoldChange, signalingCols) %>%
  as.matrix()
normedNKFoldChange <- sweep(totalNKFoldChange_mat,
                            2, totalNKFoldChange_mat[1,])
normedNKFoldChange %<>% cbind(Timepoint =
                                levels(totalNKFoldChange$Timepoint)) %>%
  as.tibble %>%
  mutate(Timepoint = factor(Timepoint,
                            levels = levels(totalNKFoldChange$Timepoint)))
I'm so certain there's a nicer way to do it that would be fully dplyr native. Anyone have tips? Thank you!!
If we want to normalize all the numeric columns by subtracting the value in the first row, use mutate_if:
library(dplyr)
df1 %>%
  mutate_if(is.numeric, list(~ . - first(.)))
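With dplyr 1.0.0 or later, the same normalization can be written with across(), which supersedes mutate_if. This is a small sketch on made-up data; the toy `df1` and its column names are assumptions:

```r
library(dplyr)

# Hypothetical data: one text column and two numeric columns, four rows
df1 <- tibble(
  Timepoint = c("0h", "1h", "2h", "4h"),
  s1        = c(2, 3, 5, 4),
  s2        = c(10, 12, 9, 15)
)

normed <- df1 %>%
  # subtract the first-row value from every numeric column
  mutate(across(where(is.numeric), ~ .x - first(.x)))

normed
```

The text column is left untouched because where(is.numeric) only selects numeric columns.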
