How to Edit a Row in JuliaDB?

Is there a way to edit a field in JuliaDB? I want to modify column y where column x == 4.
using JuliaDB
t = table((x = [1,4,5,6], y = [2,2,24,5]))
I found an example somewhere showing how to do this, but it no longer works; it used merge or push, I believe.

OK, after a lot of digging and mashing of keys, I figured it out.
using JuliaDB
t = table((x = [1,4,5,6], y = [2,2,24,5]))
Table with 4 rows, 2 columns:
x  y
─────
1  2
4  2
5  24
6  5
t = transform(t, :y => (:x, :y) => row -> row.x == 4 ? 99.0 : row.y)
Table with 4 rows, 2 columns:
x  y
───────
1  2
4  99.0
5  24
6  5

Related

Shifting positions of values in a single column

This is my first question, so please let me know if I made any mistakes in the ask.
I am trying to create a dataframe which has multiple columns all containing the same values in the same order, but shifted in position: the first value of each column is moved to the end, and everything else is shifted up.
For example, I would like to convert a data frame like this:
example = data.frame(x=c(1,2,3,4), y=c(1,2,3,4), z=c(1,2,3,4), w=c(1,2,3,4))
Which looks like this
x y z w
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
into this:
x y z w
1 2 3 4
2 3 4 1
3 4 1 2
4 1 2 3
In the new dataframe, the "peak" (the value 4) has moved progressively up the rows.
I've seen advice on how to shift columns up and down, but only with the vacated values replaced by zeroes or NA. I don't know how to shift a column up and replace the bottom-most value with what was formerly at the top.
Thanks in advance for any help.
In base R, we can update with Map by removing the leading elements of each column and appending them back at the end:
example[-1] <- Map(function(x, y) c(tail(x, -y), head(x, y)),
                   example[-1], head(seq_along(example), -1))
example
# x y z w
#1 1 2 3 4
#2 2 3 4 1
#3 3 4 1 2
#4 4 1 2 3
Another option is embed():
example[] <- embed(unlist(example), 4)[1:4, 4:1]
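To see why the embed() line works, it can help to look at the intermediate matrix; a minimal, self-contained sketch of the same data:

```r
# To inspect what the embed() one-liner is doing:
example <- data.frame(x = c(1,2,3,4), y = c(1,2,3,4), z = c(1,2,3,4), w = c(1,2,3,4))
# embed(v, 4) row i is c(v[i+3], v[i+2], v[i+1], v[i]); reversing the
# columns with 4:1 makes row i equal to v[i], v[i+1], v[i+2], v[i+3],
# so each successive column is the original rotated one step further
m <- embed(unlist(example), 4)[1:4, 4:1]
example[] <- m
```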

Generating unique ids and group ids using dplyr and concatenation

I have a problem that I suspect has arisen from a dplyr update combined with my hacky code. Given a data frame in which every row is duplicated, I want to assign each row a unique id by combining the entries of two columns with either "_" or "a_" in the middle. I also want to assign a group id by combining the entries of one column with either "" or "a". Because these formats are important for lining up with another data frame, I can't use solutions based on interaction() and factor() that I've seen in other posts.
So I want to go from this:
Generation Identity
1 1 X
2 1 Y
3 1 Z
4 2 X
5 2 Y
6 2 Z
7 3 X
8 3 Y
9 3 Z
10 1 X
11 1 Y
12 1 Z
13 2 X
14 2 Y
15 2 Z
16 3 X
17 3 Y
18 3 Z
to this:
Generation Identity Unique_id Group_id
1 1 X 1_X X
2 1 Y 1_Y Y
3 1 Z 1_Z Z
4 2 X 2_X X
5 2 Y 2_Y Y
6 2 Z 2_Z Z
7 3 X 3_X X
8 3 Y 3_Y Y
9 3 Z 3_Z Z
10 1 X 1a_X Xa
11 1 Y 1a_Y Ya
12 1 Z 1a_Z Za
13 2 X 2a_X Xa
14 2 Y 2a_Y Ya
15 2 Z 2a_Z Za
16 3 X 3a_X Xa
17 3 Y 3a_Y Ya
18 3 Z 3a_Z Za
The minimal example below is based on code that previously worked for me and others in setting the unique id but that now causes RStudio to crash with a seg fault (Exception Type: EXC_BAD_ACCESS (SIGSEGV)). When I call a function containing this code it generates the message
Error in match(vector, df$Unique_id) : 'translateCharUTF8' must be
called on a CHARSXP
which I've read can be symptomatic of memory issues.
library(dplyr)
dff <- data.frame(Generation = rep(1:3, each = 3),
                  Identity = rep(LETTERS[24:26], times = 3))
dff <- rbind(dff, dff) # duplicate rows
dff <- group_by_(dff, ~Generation, ~Identity) %>%
  mutate(Unique_id = c(paste0(Identity[1], "_", Generation[1]),
                       paste0(Identity[1], "a", "_", Generation[1]))) %>%
  ungroup
I think the problem is related to an update of dplyr (I'm using the latest release versions of RStudio and all packages, on OSX Sierra). In any case, my solution above is something of a hack. I'd very much appreciate suggestions for improved code, preferably using either base R or dplyr (since the code is part of a package that currently depends on dplyr).
Here is how you can approach the problem:
First, find the duplicated rows of your data (I called my data A):
dup <- duplicated(A)
Then add a counter column:
A$count <- 1:nrow(A)
n <- ncol(A) # the column just added
Now obtain the two new columns and cbind them to the original dataframe:
B <- data.frame(t(apply(A, 1, function(x)
  if (dup[as.numeric(x[n])]) c(paste0(x["Identity"], "a"), paste(x[-n], collapse = "a_"))
  else c(x["Identity"], paste(x[-n], collapse = "_")))))
`names<-`(cbind(A[-n], B), c(names(A[-1]), "Group_ID", "Unique_ID"))
Identity count Group_ID Unique_ID
1 1 X X 1_X
2 1 Y Y 1_Y
3 1 Z Z 1_Z
4 2 X X 2_X
5 2 Y Y 2_Y
6 2 Z Z 2_Z
7 3 X X 3_X
8 3 Y Y 3_Y
9 3 Z Z 3_Z
10 1 X Xa 1a_X
11 1 Y Ya 1a_Y
12 1 Z Za 1a_Z
13 2 X Xa 2a_X
14 2 Y Ya 2a_Y
15 2 Z Za 2a_Z
16 3 X Xa 3a_X
17 3 Y Ya 3a_Y
18 3 Z Za 3a_Z
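The backticked `names<-`(...) call in the answer above is the replacement function names<- written in ordinary prefix form; a tiny sketch of the idiom:

```r
# `names<-`(x, value) is the prefix form of names(x) <- value, and it
# returns the renamed object instead of modifying x in place:
x <- c(a = 1, b = 2)
y <- `names<-`(x, c("p", "q"))
```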
Here's my amended version of Onyambu's solution, which refers to columns by name rather than number (and so can handle data frames that have additional columns):
dup <- duplicated(dff) # identify duplicates
dff$count <- 1:nrow(dff) # add count column to the dataframe
# create a new dataframe containing the unique and group ids:
B <- data.frame(t(apply(dff, 1, function(x)
  if (dup[as.numeric(x["count"])]) c(paste0(x["Identity"], "a"),
                                     paste(x["Generation"], x["Identity"], sep = "a_"))
  else c(x["Identity"], paste(x["Generation"], x["Identity"], sep = "_")))))
# combine the dataframes:
colnames(B) <- c("Group_id", "Unique_id")
dff <- cbind(dff[-ncol(dff)], B)

How to mark if a value is continuously frozen for 4 or more rows

Hi, I have a df as below.
How do I create a column "rep" that flags rows where the value in the Value column repeats >= 4 times?
We can create the RLE_created column with the code below:
df$RLE_created <- sequence(rle(as.character(df[, grep("Value", colnames(df))]))$lengths)
Value  RLE_created  rep
1      1
3      1            y
3      2            y
3      3            y
3      4            y
7      1
8      1
8      2
9      1            y
9      2            y
9      3            y
9      4            y
9      5            y
Thanks in advance
One option would be
library(data.table)
setDT(df)[, rep := c("", "Y")[(.N >= 4)+1], Value]
NOTE: It is better not to use function names (such as rep) for object names.
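For data where the same value can also reappear later in a separate, shorter run, grouping by Value alone would count the two runs together; a variant using data.table's rleid() that groups by consecutive runs instead (the df below is reconstructed from the question's table, since no dput output was posted):

```r
library(data.table)
# sample data reconstructed from the question's table (assumption)
df <- data.frame(Value = c(1, 3, 3, 3, 3, 7, 8, 8, 9, 9, 9, 9, 9))
# rleid(Value) assigns one group id per consecutive run, so .N is the run
# length; the c("", "y")[...] indexing trick then flags runs of length >= 4
setDT(df)[, rep := c("", "y")[(.N >= 4) + 1], by = rleid(Value)]
```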

Create a new column whose formula depends on a cell value of another row

How do I create a new column whose formula depends on a cell value in another row?
  x y  z
1 a 1 10
2 a 2 20
3 a 3 30
4 b 1 40
This is my sample data. I want the final output to be as follows
  x y  z prevY
1 a 1 10     0
2 a 2 20    10
3 a 3 30    20
4 b 1 40     0
where prevY is the z value for x = current_x_val and y = current_y_val - 1, or 0 if not available.
How do I achieve this?
My progress so far :
data[data$x == "a" & data$y==2-1,3]
I manually enter the values and get prevY for each row, but how do I do it for all rows in a single shot?
Or a data.table solution (similar to MrFlick's), but faster for a big data set:
library(data.table)
setDT(dat)[, prevY := c(0, z[-length(z)]), by = x]
Here you can use the ave() function for group-level transformations (here, a separate transformation for each value of x).
dd$prevY <- with(dd, ave(z, x, FUN=function(x) head(c(0,x),-1)))
Here we take the values of z for each value of x and add a zero on the front and remove the last value. Then we assign this back to the data.frame.
This assumes that all the y values are sorted within each x group.
The result is
  x y  z prevY
1 a 1 10     0
2 a 2 20    10
3 a 3 30    20
4 b 1 40     0
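An equivalent using dplyr's lag(), shown as a sketch since the question didn't mandate base R; it makes the same assumption that y is sorted within each x group:

```r
library(dplyr)
dd <- data.frame(x = c("a", "a", "a", "b"),
                 y = c(1, 2, 3, 1),
                 z = c(10, 20, 30, 40))
# lag(z, default = 0) shifts z down one row within each x group,
# filling the first row of each group with 0
dd <- dd %>%
  group_by(x) %>%
  mutate(prevY = lag(z, default = 0)) %>%
  ungroup()
```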

Return row number(s) for a particular value in a column in a dataframe

I have a data frame (df) and I was wondering how to return the row number(s) for a particular value (2585) in the 4th column (height_chad1) of the same data frame?
I've tried:
row(mydata_2$height_chad1, 2585)
and I get the following error:
Error in factor(.Internal(row(dim(x))), labels = labs) :
a matrix-like object is required as argument to 'row'
Is there an equivalent line of code that works for data frames instead of matrix-like objects?
Any help would be appreciated.
Use which(mydata_2$height_chad1 == 2585)
Short example
df <- data.frame(x = c(1,1,2,3,4,5,6,3),
                 y = c(5,4,6,7,8,3,2,4))
df
x y
1 1 5
2 1 4
3 2 6
4 3 7
5 4 8
6 5 3
7 6 2
8 3 4
which(df$x == 3)
[1] 4 8
length(which(df$x == 3))
[1] 2
count(df, vars = "x")
x freq
1 1 2
2 2 1
3 3 2
4 4 1
5 5 1
6 6 1
df[which(df$x == 3),]
x y
4 3 7
8 3 4
As Matt Weller pointed out, you can use the length function.
The count function in plyr can be used to return the count of each unique column value.
which(df == my.val, arr.ind = TRUE)
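To search for a value across every column at once, which() also accepts arr.ind = TRUE, because df == my.val produces a logical matrix; a short sketch with the example data frame from above redefined locally:

```r
df <- data.frame(x = c(1, 1, 2, 3, 4, 5, 6, 3),
                 y = c(5, 4, 6, 7, 8, 3, 2, 4))
# df == 3 is a logical matrix, so which(..., arr.ind = TRUE) returns
# (row, column) index pairs for every matching cell:
hits <- which(df == 3, arr.ind = TRUE)
# rows 4 and 8 match in column x (col 1); row 6 matches in column y (col 2)
```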
