I would like to combine my values with column names (see current set and required set):
Current set =
- ncol = 9
- nrow = 26814
I want to combine the values of SheetNaam with the XYEAR column names, so that I get more columns and fewer rows without losing data or NA values. Is this possible in R?
This is difficult to explain in text; I hope someone will understand my explanation.
We can try gather and spread: gather the columns whose names are 'X' followed by digits, unite 'SheetNaam' and 'key' into a single column, then spread back to 'wide' format.
library(tidyverse)
gather(df1, key, val, matches("^X\\d+$"), na.rm = TRUE) %>%
unite(SheetNaam, SheetNaam, key, sep = "") %>%
group_by(SheetNaam) %>%
mutate(rn = row_number()) %>%
spread(SheetNaam, val)
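For what it's worth, here is a minimal sketch of the same idea using pivot_longer and pivot_wider, the newer tidyr replacements for gather and spread. The toy data is made up: only SheetNaam and the X-plus-year naming pattern come from the question, while the Land column and all values are assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; only SheetNaam and the XYEAR pattern are from the question
df1 <- tibble(
  Land      = c("NL", "NL", "BE", "BE"),
  SheetNaam = c("A", "B", "A", "B"),
  X2010     = c(1, 2, 3, NA),
  X2011     = c(5, 6, 7, 8)
)

wide <- df1 %>%
  # stack the year columns into key/value pairs
  pivot_longer(matches("^X\\d+$"), names_to = "key", values_to = "val") %>%
  # paste SheetNaam and the year column name together, e.g. "AX2010"
  unite(SheetNaam, SheetNaam, key, sep = "") %>%
  # one column per SheetNaam/year combination; missing cells stay NA
  pivot_wider(names_from = SheetNaam, values_from = val)
```

Here the remaining column (Land) identifies each output row, so no helper row number is needed. If the remaining columns do not identify rows uniquely, a row-number column as in the answer above would still be required.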
I have a column with lists of variables.
The variables are separated by commas, and sometimes a value is set for a variable with "=".
See picture.
I want the variables as columns and within the columns TRUE/FALSE or 1/0 values plus if there is a value set by "=" an extra column for this value.
I guess it's a similar question to Pandas convert a column of list to dummies but I need it in R.
Since you haven't provided explicit data, I had to recreate some from your screenshots (please provide at least textual data next time; it makes the task easier to reproduce).
The chunks of code below are explained with comments; they use tidyverse functions from the packages loaded at the top. The result is what you asked for, except that the columns eventnumber_value are named value_eventnumber, since starting a variable or column name with a number is not good practice.
I don't know what you need the data for, but in my experience the wide format is less useful than the long format in most cases. Especially here, since I expect that one event may happen only for one ID. Thus, dat_pivoted is more convenient to operate on.
library(tibble)
library(tidyr)
library(dplyr)
library(stringr)
dat <- tribble(
~post_event_list, ~date_time,
"239=20.00,200,20149,100,101,102,103,104,105,106,107,108,114,198", "2022-03-01 00:23:50",
"257,159", "2022-03-01 00:02:51",
"201,109,110,111,112", "2022-03-01 00:57:23"
)
dat_pivoted <- dat %>%
mutate(post_event_list = str_split(post_event_list, ",")) %>% # transform comma separated strings into character vectors
unnest_longer(post_event_list) %>% # split characters into separate rows
separate(post_event_list, sep = "=", into = c("var", "val"), fill = "right") %>% # separate variables from values (case of 'X=Y'), put NA as value if there is no value
mutate(val = as.numeric(val)) # treat 'val' column as numeric
dat_values <- dat_pivoted %>%
pivot_wider(id_cols = date_time, names_from = var, names_prefix = "value_", values_from = val) %>% # turn data into wide format -- make a column per each event value, present or not
select(!where(~ all(is.na(.x)))) # select only those values columns, where not every element is NA
dat_indicator <- dat_pivoted %>%
mutate(val = TRUE) %>% # each row indicates a presence of event -- change all values to TRUE
pivot_wider(id_cols = date_time, names_from = var, values_from = val, values_fill = FALSE) # pivot columns again, replacing resulting NAs with FALSE
dat_transformed <- left_join(dat_indicator, dat_values, by = "date_time") # combine indicator and value columns per timestamp
I am working with a dataset that I first converted from long to wide, because I need the rows (variables) to be columns:
I used:
library(tidyr)
cfa_model<-pivot_wider(cfa_data, names_from= variable, values_from = value)
and got:
I need this data as a data frame with 48 rows and 65 columns. Not every column has the same length, and I don't know if this is a problem; where there are fewer observations, an NA would be just fine.
The problem when converting to a data frame is that I get the values as a list, with all 48 rows in one cell. I need each column to be a normal numeric data frame variable.
Do you guys know how to fix this?
Thanks a lot! :)
If you want to use pivot_wider, one approach is to first add a row number within each variable, so that every value is uniquely identified:
library(tidyverse)
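cfa_data %>%
group_by(variable) %>%
mutate(row = row_number()) %>% # position of each value within its variable
ungroup() %>%
pivot_wider(id_cols = row, names_from = variable, values_from = value) # 'variable' supplies the names, so it cannot also be an id column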
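To see why the row number helps, here is a small sketch on made-up data. The tibble below is a hypothetical stand-in for cfa_data (the real one was not posted), with two variables of unequal length. Without the row column, each cell would be a list of all values for that variable; with it, shorter variables are padded with NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for cfa_data: long format, unequal number of values
cfa_data <- tibble(
  variable = c("v1", "v1", "v1", "v2", "v2"),
  value    = c(1.5, 2.5, 3.5, 10, 20)
)

cfa_model <- cfa_data %>%
  group_by(variable) %>%
  mutate(row = row_number()) %>%   # 1, 2, 3 within v1; 1, 2 within v2
  ungroup() %>%
  pivot_wider(id_cols = row, names_from = variable, values_from = value) %>%
  select(-row) %>%                 # helper column is no longer needed
  as.data.frame()                  # plain data.frame with numeric columns
```

In this sketch v2 has only two values, so its third row is filled with NA, which matches what the question says would be acceptable.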
I have a data.frame that looks like this:
UID<-c(rep(1:25, 2), rep(26:50, 2))
Group<-c(rep(5, 25), rep(20, 25), rep(-18, 25), rep(-80, 25))
Value<-sample(100:5000, 100, replace=TRUE)
df<-data.frame(UID, Group, Value)
But I need the values separated into new columns so I run this:
df<-pivot_wider(df, names_from = Group,
values_from = Value,
values_fill = list(Value = 0))
Which introduces NULL into the dataset. Sorry, could not figure out a way to get an example dataset with NULL values. Note: this is now a tbl_df tbl data.frame
These aren't great variable names so I run this:
colnames(df)[which(names(df) == "20")] <- "pos20"
colnames(df)[which(names(df) == "5")] <- "pos5"
colnames(df)[which(names(df) == "-18")] <- "neg18"
colnames(df)[which(names(df) == "-80")] <- "neg80"
What I want to be able to do is create a new column (variable) that rowSums across columns. So I run this:
df<-df%>%
replace(is.na(.), 0) %>%
mutate(rowTot = rowSums(.[2:5]))
Which of course works on the example dataset but not on the one with NULL values. I have tried converting NULL to NA using df[df== "NULL"] <- NA but the values do not change. I have tried converting the lists to numeric using as.numeric(as.character(unlist(df[[2]]))) but I get an error telling me I have unequal number of rows, which I guess would be expected.
I realize there might be a better process to get my desired end result, so any suggestions to any of this is most appreciated.
EDIT: Here is a link to the actual dataset which will introduce Null values after using pivot_wider. https://drive.google.com/file/d/1YGh-Vjmpmpo8_sFAtGedxzfCiTpYnKZ3/view?usp=sharing
Difficult to answer with confidence without an actual reproducible example where the error occurs but I am going to take a guess.
I think your pivot_wider step produces list columns (meaning some cells hold vectors of several values), and that is why you are getting NULL values. Create a unique row number within each Group and then use pivot_wider. Also, rowSums has an na.rm parameter, so you don't need replace.
library(dplyr)
library(tidyr) # pivot_wider lives in tidyr
df %>%
group_by(temp) %>%
mutate(row = row_number()) %>%
pivot_wider(names_from = temp, values_from = numseeds) %>%
mutate(rowTot = rowSums(.[3:6], na.rm = TRUE))
Please change the column numbers according to your data in rowSums if needed.
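As a hedged illustration on made-up data (the UID/Group/Value names are taken from the question, the values and the "grp_" prefix are my own assumptions), the sketch below follows the same row-number approach and selects the pivoted columns by name rather than by position, so the rowSums call does not depend on column order.

```r
library(dplyr)
library(tidyr)

# Toy data with a duplicated UID/Group pair, which is what creates list columns
df <- tibble(
  UID   = c(1, 1, 1, 2, 2),
  Group = c(5, 5, 20, 5, -18),
  Value = c(100, 200, 300, 400, 500)
)

result <- df %>%
  group_by(Group) %>%
  mutate(row = row_number()) %>%   # makes every UID/row/Group combination unique
  ungroup() %>%
  pivot_wider(names_from = Group, values_from = Value, names_prefix = "grp_") %>%
  # sum only the pivoted columns, treating missing combinations as 0
  mutate(rowTot = rowSums(across(starts_with("grp_")), na.rm = TRUE))
```

Because every original value ends up in exactly one cell, the row totals add up to the sum of the original Value column, which is a quick sanity check that nothing was dropped.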
I have a dataset in R with a little under 100 columns.
Some of the columns have numeric values written as expressions, such as 87+3 as opposed to 90.
I have been able to update each column with the following piece of code:
library(dplyr)
new_dataframe = dataframe %>%
rowwise() %>%
mutate(new_value = eval(parse(text = value)))
However, I would like to be able to update a list of 60 columns in a more efficient way than simply repeating this line for each column.
Can someone help me find a more efficient way?
We can use mutate_at to apply the same function to a set of columns, here selected by position:
library(dplyr)
dataframe %>%
rowwise() %>%
mutate_at(1:60, list(new_value = ~eval(parse(text = .))))
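In dplyr 1.0.0 and later, mutate_at is superseded by across. A minimal sketch of the equivalent follows; the toy columns a and b and the helper eval_text are hypothetical names introduced only for this example. The helper evaluates each string on its own, so rowwise() is not needed. Note that eval(parse()) runs arbitrary code, so this is only reasonable for data you trust.

```r
library(dplyr)

# Hypothetical toy data: arithmetic expressions stored as text
dataframe <- tibble(
  a = c("87+3", "10"),
  b = c("1+1", "5*2")
)

# Hypothetical helper: evaluate each string and return one number per element
eval_text <- function(x) {
  vapply(x, function(s) eval(parse(text = s)), numeric(1), USE.NAMES = FALSE)
}

# Overwrite the selected columns in place with their evaluated values
result <- dataframe %>%
  mutate(across(c(a, b), eval_text))
```

For 60 columns the selection could be written as across(1:60, eval_text), or with a tidyselect helper if the columns share a naming pattern.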
I have a data frame with four rows, 23 numeric columns and one text column. I'm trying to normalize all the numeric columns by subtracting the value in the first row.
I've tried getting it to work with mutate_at, but I couldn't figure out a good way to get it to work.
I got it to work by converting to a matrix and converting back to a tibble:
## First, did some preprocessing to get out the group I want
totalNKFoldChange <- filter(signalingFrame,
Population == "Total NK") %>% ungroup
totalNKFoldChange_mat <- select(totalNKFoldChange, signalingCols) %>%
as.matrix()
normedNKFoldChange <- sweep(totalNKFoldChange_mat,
2, totalNKFoldChange_mat[1,])
normedNKFoldChange %<>% cbind(Timepoint =
levels(totalNKFoldChange$Timepoint)) %>%
as_tibble %>%
mutate(Timepoint = factor(Timepoint,
levels = levels(totalNKFoldChange$Timepoint)))
I'm so certain there's a nicer way to do it that would be fully dplyr native. Anyone have tips? Thank you!!
If we want to normalize all the numeric columns by subtracting the value in the first row, we can use mutate_if, which leaves the text column untouched:
library(dplyr)
df1 %>%
mutate_if(is.numeric, list(~ .- first(.)))
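The same thing can be written with across, which supersedes mutate_if in dplyr 1.0.0 and later. This is a small sketch on invented data: Timepoint comes from the question, while the pSTAT1 and pERK columns and all values are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data: one text column and two numeric signaling columns
df1 <- tibble(
  Timepoint = c("0h", "1h", "2h", "4h"),
  pSTAT1    = c(1, 1.5, 2, 4),
  pERK      = c(10, 8, 12, 10)
)

# Subtract the first-row value from every numeric column
normed <- df1 %>%
  mutate(across(where(is.numeric), ~ .x - first(.x)))
```

The first row of every numeric column becomes 0 and the text column is left unchanged, so no round trip through a matrix is needed.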