Is there a quick way to replace column values in R?

Suppose we have a data frame containing numeric values which looks like:
Temperature Height
32 157
31 159
33 139
I want to replace Height values with pic_00001, pic_00002 etc. so that the end result is:
Temperature Height
32 pic_00001
31 pic_00002
33 pic_00003
There are 10,000+ rows in the full data frame, hence I need a quicker way than doing this manually.

You can use sprintf:
# recreate the example from the question
dat <- data.frame(Temperature = c(32, 31, 33),
                  Height = c(157, 159, 139))
# use sprintf along with seq_len
dat$Height <- sprintf("pic_%05d", seq_len(NROW(dat)))
# show the result
dat
#R>   Temperature    Height
#R> 1          32 pic_00001
#R> 2          31 pic_00002
#R> 3          33 pic_00003
You can change the 05d if you want more leading zeros. E.g., 07d will give a seven-digit sequence. The manual page for sprintf has further details.
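For example, a wider pad (a small sketch with %07d):
sprintf("pic_%07d", seq_len(3))
#> [1] "pic_0000001" "pic_0000002" "pic_0000003"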

You can do:
id <- seq_len(nrow(data))
new_values <- paste("pic_",id,sep = "")
data$Height <- new_values

To get exactly the final output requested (building on the answer above by monjeanjean; I can't comment yet):
id <- seq_len(nrow(data))
new_values <- paste("pic_", formatC(id, width = 5, flag = "0", format = "fg"), sep = "")
data$Height <- new_values
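As a quick check of the padding (a minimal sketch on a three-element id):
id <- 1:3
paste("pic_", formatC(id, width = 5, flag = "0", format = "fg"), sep = "")
#> [1] "pic_00001" "pic_00002" "pic_00003"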

You can also use the following solution:
library(dplyr)
library(stringr)
df %>%
  mutate(across(Height, ~ str_c("pic_", str_pad(row_number(), 5, "left", "0"))))
  Temperature    Height
1          32 pic_00001
2          31 pic_00002
3          33 pic_00003
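Since the whole column is overwritten, across() is not strictly required; a plain mutate() sketch gives the same result:
library(dplyr)
library(stringr)
df %>%
  mutate(Height = str_c("pic_", str_pad(row_number(), 5, pad = "0")))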

Related

Merge named vectors in different sizes into data frame

I have several named vectors of different lengths, and I want to combine them into one data frame that sums the actions.
adjust balance drive idle other pick putdown replace sort wait
4 9 16 82 4 350 61 16 26 18
walk
14
adjust balance drive idle pick putdown replace sort unload walk
1 42 14 47 385 118 4 83 19 7
I want the result to look like this:
adjust balance drive
5 51 30
and so on for the remaining names. I find this challenging because these are named vectors.
I would be grateful for your help, thank you!
We can use aggregate + stack like below
aggregate(. ~ ind, rbind(stack(vec1), stack(vec2)), sum)
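As a self-contained check, assuming vec1 and vec2 hold the counts shown in the question:
# reconstruct the two named vectors from the question
vec1 <- c(adjust = 4, balance = 9, drive = 16, idle = 82, other = 4, pick = 350,
          putdown = 61, replace = 16, sort = 26, wait = 18, walk = 14)
vec2 <- c(adjust = 1, balance = 42, drive = 14, idle = 47, pick = 385,
          putdown = 118, replace = 4, sort = 83, unload = 19, walk = 7)
# stack() turns each named vector into a two-column data frame (values, ind),
# rbind() combines them, and aggregate() sums the values per name
aggregate(. ~ ind, rbind(stack(vec1), stack(vec2)), sum)
#>       ind values
#> 1  adjust      5
#> 2 balance     51
#> 3   drive     30
#> ...   (one row per action name)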
You could convert to a data.frame and use the dplyr package to group by the names and sum the numbers together.
library(dplyr)
vec <- c(4, 9, 16, 1, 42, 14)
names(vec) <- c("adjust", "balance", "drive", "adjust", "balance", "drive")
data.frame(values = vec, name = names(vec)) %>%
  group_by(name) %>%
  summarise(values = sum(values))
If we want to add all elements that match between the two vectors:
# Resolve the matching names of the vectors:
# vec_nm_order => character vector
vec_nm_order <- intersect(
names(vec1),
names(vec2)
)
# Add the related scalars together:
# named integer vector => stdout(console)
vec1[vec_nm_order] + vec2[vec_nm_order]
If we only want to add values for adjust, balance, drive:
# Choose the names (keys) of elements we want to add together:
# vec_nm_order => character vector
vec_nm_order <- c(
"adjust", "balance", "drive"
)
# Add the related scalars together:
# named integer vector => stdout(console)
vec1[vec_nm_order] + vec2[vec_nm_order]
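If instead you want the sum over the union of names, so that actions appearing in only one vector are kept as well, a short base R sketch with tapply (reusing vec1 and vec2 from the question):
# concatenate the vectors and sum the values by name
combined <- c(vec1, vec2)
tapply(combined, names(combined), sum)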

How do I Compute Binary String Permutations in R?

I have a binary string like this:
0010000
I'd like to have all these permutations:
1010000
0110000
0011000
0010100
0010010
0010001
Does anybody know which function in R could give me these results?
R has functions for bitwise operations, so we can get the desired numbers with bitwOr:
bitwOr(16, 2^(6:0))
#> [1] 80 48 16 24 20 18 17
...or if we want to exclude the original,
setdiff(bitwOr(16, 2^(6:0)), 16)
#> [1] 80 48 24 20 18 17
However, bitwOr works on integers, not on binary strings. That's OK, though; we can build some conversion functions:
bin_to_int <- function(bin){
  vapply(strsplit(bin, ''),
         function(x){sum(as.integer(x) * 2 ^ seq(length(x) - 1, 0))},
         numeric(1))
}
int_to_bin <- function(int, bits = 32){
  vapply(int,
         function(x){paste(as.integer(rev(head(intToBits(x), bits))), collapse = '')},
         character(1))
}
Now:
input <- bin_to_int('0010000')
output <- setdiff(bitwOr(input, 2^(6:0)),
input)
output
#> [1] 80 48 24 20 18 17
int_to_bin(output, bits = 7)
#> [1] "1010000" "0110000" "0011000" "0010100" "0010010" "0010001"
library(stringr)
bin <- '0010000'
ones <- str_locate_all(bin, '1')[[1]][,1]
zeros <- (1:str_length(bin))[-ones]
sapply(zeros, function(x){
  str_sub(bin, x, x) <- '1'
  bin
})
[1] "1010000" "0110000" "0011000" "0010100" "0010010" "0010001"
We assume that the problem is to successively replace each 0 in the input with a 1, for an input string of 0's and 1's.
Replace each character successively with a "1", regardless of its value, and then remove any components of the result equal to the input. No packages are used.
input <- "0010000"
setdiff(sapply(1:nchar(input), function(i) `substr<-`(input, i, i, "1")), input)
## [1] "1010000" "0110000" "0011000" "0010100" "0010010" "0010001"

Remove a character from elements in a dataframe

I have a set of data where some elements are preceded by "<" and I need to remove "<" so that I can perform some data analysis. The data is saved in a .txt file and I'm bringing it into R using read.table. Below is an example of what the text file looks like.
Background: 18 <10 27 22 <3
Site: 30 44 23 <16 13
I used x=read.file to make a dataframe, then tried gsub("<","",x) to remove the "<" and the result is something completely unexpected, at least to me. This is what I get as a result.
[1] "1:2" "c(18, 30)" "1:2" "c(27, 23)" "c(2, 1)" "1:2"
I have no idea what that means or why it's happening. I would greatly appreciate explanation both of what is going on here, and how I should go about accomplishing my goal.
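As for what is going on: gsub() coerces its x argument with as.character(); for a data frame (which is a list of columns) that deparses each whole column into a single string such as "c(18, 30)" (and, for factor columns, the underlying codes such as "1:2"), and those strings are what the pattern matching sees. A minimal illustration (hypothetical column names):
x <- data.frame(a = c(18, 27), b = c("<10", "<3"), stringsAsFactors = FALSE)
as.character(x)
#> [1] "c(18, 27)"           "c(\"<10\", \"<3\")"
gsub("<", "", x)
#> [1] "c(18, 27)"       "c(\"10\", \"3\")"
Working on the columns individually, as the answers below do, avoids this.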
df <- read.table(header = TRUE, text = "Background Site
18 30
<10 44
27 23
22 <16
<3 13", stringsAsFactors = FALSE)
You can use mutate_at and apply gsub to the variables (i.e. Background and Site) from which you wish to remove the preceding < sign.
library(dplyr)
df %>% mutate_at(vars(Background, Site),
                 funs(as.numeric(gsub("^<", "", .))))
The output is:
  Background Site
1         18   30
2         10   44
3         27   23
4         22   16
5          3   13
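Note that mutate_at() and funs() have since been superseded in dplyr; an equivalent sketch using across() (assuming dplyr >= 1.0):
library(dplyr)
df %>%
  mutate(across(c(Background, Site), ~ as.numeric(gsub("^<", "", .x))))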
Read the file with readLines, perform the gsub and then re-read it with read.table. No packages are used:
read.table(text = gsub("<", "", readLines("myfile")), as.is = TRUE)
If the data does not come from a file but is already in a data frame DF, then define a clean function which cleans a column of DF and apply it to each numeric column:
clean <- function(x) as.numeric(gsub("<", "", x))
DF[-1] <- lapply(DF[-1], clean)  # [-1] skips the first column; use DF[] if every column needs cleaning

Problems subsetting columns based on values from two separate dataframes

I am using data obtained from a spatially gridded system, for example a city divided up into equally spaced squares (e.g. 250m2 cells). Each cell possesses a unique column and row number with corresponding numerical information about the area contained within this 250m2 square (say temperature for each cell across an entire city). Within the entire gridded section (or the example city), I have various study sites and I know where they are located (i.e. which cell row and column each site is located within). I have a dataframe containing information on all cells within the city, but I want to subset this to only contain information from the cells where my study sites are located. I previously asked a question on this 'Matching information from different dataframes and filtering out redundant columns'. Here is some example code again:
###Dataframe showing cell values for my own study sites
Site <- as.data.frame(c("Site.A","Site.B","Site.C"))
Row <- as.data.frame(c(1,2,3))
Column <- as.data.frame(c(5,4,3))
df1 <- cbind(Site,Row, Column)
colnames(df1) <- c("Site","Row","Column")
###Dataframe showing information from ALL cells
eg1 <- rbind(c(1,2,3,4,5),c(5,4,3,2,1)) ##Cell rows and columns
eg2 <- as.data.frame(matrix(sample(0:50, 15*10, replace=TRUE), ncol=5)) ##Numerical information
df2 <- rbind(eg1,eg2)
rownames(df2)[1:2] <- c("Row","Column")
From this, I used the answer from the previous questions which worked perfectly for the example data.
output <- df2[, (df2['Row', ] %in% df1$Row) & (df2['Column', ] %in% df1$Column)]
names(output) <- df1$Site[mapply(function(r, c){which(r == df1$Row & c == df1$Column)}, output[1,], output[2,])]
However, I cannot apply this to my own data and cannot figure out why.
EDIT: Initially, I thought there was a problem with naming the columns (i.e. the 'names' function). But it would appear there may be an issue with the 'output' line of code, whereby columns are being included from df2 that shouldn't be (i.e. the output contained columns from df2 which possessed column and row numbers not specified within df1).
I have also tried:
output <- df2[, (df2['Row', ] == df1$Row) & (df2['Column', ] == df1$Column)]
But when using my own (seemingly comparable) data, I don't get information from all cells specified in the 'df1' equivalent (although again works fine in the example data above). I can get my own data to work if I do each study site individually.
SiteA <- df2[, which(df2['Row', ] == 1) & (df2['Column', ] == 5)]
SiteB <- df2[, which(df2['Row', ] == 2) & (df2['Column', ] == 4)]
SiteC <- df2[, which(df2['Row', ] == 3) & (df2['Column', ] == 3)]
But I have 1000s of sites and was hoping for a more succinct way. I am sure that I have maintained the same structure, double checked spellings and variable names. Would anyone be able to shed any light on potential things which I could be doing wrong? Or failing this an alternative method?
Apologies for not providing an example code for the actual problem (I wish I could pinpoint what the specific problem is, but until then the original example is the best I can do)! Thank you.
The only apparent issue I can see is that the mapply call is not wrapped in unlist. mapply returns a list here, which is not what you're after for subsetting purposes. So, try:
output <- df2[, (df2['Row', ] %in% df1$Row) & (df2['Column', ] %in% df1$Column)]
names(output) <- df1$Site[unlist(mapply(function(r, c){which(r == df1$Row & c == df1$Column)}, output[1,], output[2,]))]
Edit:
If the goal is to grab columns whose first 2 rows match the 2nd and 3rd elements of a given row in df1, you can try the following:
output_df <- Filter(function(x) !all(is.na(x)),
                    data.frame(do.call(cbind, apply(df2, 2, function(x) {
  ## Create a condition vector for an if-statement or for subsetting
  condition <- paste0(x[1:2], collapse = "") ==
    apply(df1[, c('Row', 'Column')], 1, function(y) {
      paste0(y, collapse = "")
    })
  ## Return a column if it meets the condition (first 2 rows are matched in df1)
  if (sum(condition) != 0) {
    tempdf <- data.frame(x)
    names(tempdf) <- df1[condition, ]$Site[1]
    tempdf
  } else {
    ## If they are not matched, then return an empty column
    data.frame(rep(NA, nrow(df2)))
  }
}))))
It is quite a condensed piece of code, so I hope the following explanation will help clarify some things:
This basically goes through every column in df2 (with apply(df2, 2, FUN)) and checks if its first 2 rows can be found in the 2nd and 3rd elements of every row in df1. If the condition is met, then it returns that column in a data.frame format with its column name being the value of Site in the matching row in df1; otherwise an empty column (with NA's) is returned. These columns are then bound together with do.call and cbind, and then coerced into a data.frame. Finally, we use the Filter function to remove columns whose values are NA's.
All that should give the following:
Site.A Site.B Site.C
1 2 3
5 4 3
40 42 33
13 47 25
23 0 34
2 41 17
10 29 38
43 27 8
31 1 25
31 40 31
34 12 43
43 30 46
46 49 25
45 7 17
2 13 38
28 12 12
16 19 15
39 28 30
41 24 30
10 20 42
11 4 8
33 40 41
34 26 48
2 29 13
38 0 27
38 34 13
30 29 28
47 2 49
22 10 49
45 37 30
29 31 4
25 24 31
I hope this helps.
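For completeness, the same row/column pairing can be written more compactly in base R with match() on pasted keys (a sketch, not from the original answers); unlike the %in% version, it cannot mix a Row value from one site with a Column value from another, which is the cross-matching problem described in the question's edit:
# build "row column" keys for every df2 column and for every df1 site
keys_df2 <- paste(unlist(df2["Row", ]), unlist(df2["Column", ]))
keys_df1 <- paste(df1$Row, df1$Column)
# for each df2 column, find the df1 site (if any) with the same row/column pair
hit <- match(keys_df2, keys_df1)
output <- df2[, !is.na(hit), drop = FALSE]
names(output) <- df1$Site[hit[!is.na(hit)]]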

R : Extract a Specific Number out of a String

I have a vector as below
data <- c("6X75ML","24X37.5ML (KKK)", "6X2X75ML", "168X5CL (UUU)")
Here I want to extract the first number before the "X" for each of the elements.
In cases with two "X"s, e.g. "6X2X75ML", the product 12 (6 multiplied by 2) should be calculated.
Expected output:
6, 24, 12, 168
Thank you for the help...
Here's a possible solution using regular expressions:
data <- c("6X75ML","24X37.5ML (KKK)", "6X2X75ML", "168X5CL (UUU)")
# this regular expression finds any group of digits followed
# by an upper-case 'X' in each string and returns a list of the matches
tokens <- regmatches(data, gregexpr('[[:digit:]]+(?=X)', data, perl = TRUE))
res <- sapply(tokens,function(x)prod(as.numeric(x)))
> res
[1] 6 24 12 168
Here is a method using base R:
dataList <- strsplit(data, split="X")
sapply(dataList, function(x) Reduce("*", as.numeric(head(x, -1))))
[1] 6 24 12 168
strsplit breaks up the vector along "X". The resulting list is fed to sapply, which then performs an operation on all but the final element of each vector in the list: the elements are converted to numerics and multiplied together. The final element is dropped using head(x, -1).
As #zheyuan-li comments, prod can fill in for Reduce and will probably be a bit faster:
sapply(dataList, function(x) prod(as.numeric(head(x, -1))))
[1] 6 24 12 168
We can also use str_extract_all
library(stringr)
sapply(str_extract_all(data, "\\d+(?=X)"), function(x) prod(as.numeric(x)))
#[1] 6 24 12 168
ind=regexpr("X",data)
val=as.integer(substr(data, 1, ind-1))
data2=substring(data,ind+1)
ind2=regexpr("[0-9]+X", data2)
if (!all(ind2!=1)) {
val2 = as.integer(substr(data2[ind2==1], 1, attr(ind2,"match.length")[ind2==1]-1))
val[ind2==1] = val[ind2==1] * val2
}
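Run on the example data vector from the question, val then holds the desired products:
val
[1]   6  24  12 168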
