Generate list of strings - python-3.6

Here's what is inside my CSV file:
Symbol
0 AACAF
1 AACAY
2 AACTF
3 AAGC
4 AAGIY
5 AAIGF
6 AAMAF
7 AAPH
8 AAPT
9 AAST
10 AATDF
11 AATGF
12 AATRL
13 AAUKF
14 AAWC
15 ABBY
16 ABCAF
17 ABCCF
18 ABCE
19 ABCFF
20 ABCZF
21 ABCZY
22 ABEPF
23 ABHD
24 ABHI
25 ABLT
26 ABLYF
27 ABNAF
28 ABNK
29 ABNRY
I would like to build a function that creates strings from batches of three symbols, e.g.:
'AACAF,AACAY,AACTF'
'AAGC,AAGIY,AAIGF'
'AAMAF,AAPH,AAPT'
'AAST,AATDF,AATGF'
'AATRL,AAUKF,AAWC'
'ABBY,ABCAF,ABCCF'
'ABCE,ABCFF,ABCZF'
'ABCZY,ABEPF,ABHD'
'ABHI,ABLT,ABLYF'
'ABNAF,ABNK,ABNRY'
I started writing this in Python, but I don't know how to finish it. I think I could use the csv module for this.
with open(path, 'r') as csvfile:
    rows=[row for row in csvfile]
batch_size = 100
listing = []
string = ''
count = 0
for index, row in enumerate(rows):
    if count >= batch_size:
        listing.append(string)
        string = ''
        count = 0
    ','.join((string,row))
    count += 1
How can I do this with Python 3.6?

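import numpy
import pandas
arr = pandas.read_csv(path).Symbol.values
# split before every third element: groups of 3, the last group may be shorter
symbol_groups = numpy.split(arr, range(3, len(arr), 3))
result = [','.join(symbols) for symbols in symbol_groups]
This should do what you're looking for.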

with open(path, 'r') as csvfile:
    rows = [row.strip('\n') for row in csvfile]
batch_size = 3
listing = []
string = ''
count = 0
for row in rows[1:]:  # skip the "Symbol" header line
    if count >= batch_size:
        listing.append(string)
        string = ''
        count = 0
    if count == 0:
        string = row
    else:
        string = ','.join((string, row))
    count += 1
if string:  # keep the final (possibly partial) batch
    listing.append(string)
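Since the question mentions the csv module, here is a minimal sketch that uses csv.DictReader plus a small batching generator. It assumes the file has a header row named Symbol and no extra index column; batched_strings and read_symbols are hypothetical helper names, and io.StringIO only stands in for the real file.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List


def batched_strings(symbols: Iterable[str], size: int = 3) -> Iterator[str]:
    """Yield comma-joined strings of up to `size` symbols each."""
    it = iter(symbols)
    while True:
        batch = list(islice(it, size))  # take the next `size` items (fewer at the end)
        if not batch:
            return
        yield ','.join(batch)


def read_symbols(csvfile) -> List[str]:
    """Read the Symbol column from an open CSV file that has a header row."""
    reader = csv.DictReader(csvfile)
    return [row['Symbol'] for row in reader]


if __name__ == '__main__':
    # in-memory stand-in; for a real file use open(path, newline='')
    sample = io.StringIO("Symbol\nAACAF\nAACAY\nAACTF\nAAGC\nAAGIY\n")
    print(list(batched_strings(read_symbols(sample))))
    # ['AACAF,AACAY,AACTF', 'AAGC,AAGIY']
```

Using a generator with islice means the batch size is a parameter and the last, shorter batch is kept without a separate clean-up step.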

Related

In R, how to write a list to a file with a set number of elements on each line?

Let's say I have a list of 23 elements.
ls <- list(1:23)
I want to write it to a file with 5 elements on each line, separated by tabs, until fewer than 5 remain:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23
How would I go about doing this? I don't see any options in writeLines or write.table.
The code by @akrun works best:
cat(gsub("\\s*((\\d+\\s+){1,4}\\d+)", "\\1\n",
paste(unlist(ls), collapse="\t")), '\n', file = 'file1.txt')
It has a minor problem with decimal values, because the pattern \\d+ does not match the decimal point; the resulting file1.txt looks like this:
0.0005862 0.0005983 0.0006225 0.0006637 0
.0006622 0.0006197 0.000599 0.0005983 0
.0006247 0.0006707 0.0006641 0.0006253 0
.0006087 0.0006234 0.0006807 0.0007485 0
.0007546 0.0007 0.000643 0.0006183 0
.0006264 0.0006819 0.000697 0.0006453 0
It can be done with cat and gsub: unlist the list, paste the elements into a single string, insert a newline (\n) after every block of n numbers separated by whitespace, and use cat to write to the console:
cat(gsub("\\s*((\\d+\\s+){1,4}\\d+)", "\\1\n",
paste(unlist(ls), collapse="\t")), '\n')
#1 2 3 4 5
#6 7 8 9 10
#11 12 13 14 15
#16 17 18 19 20
#21 22 23
or write to a file
cat(gsub("\\s*((\\d+\\s+){1,4}\\d+)", "\\1\n",
paste(unlist(ls), collapse="\t")), '\n', file = 'file1.txt')
If the data are more complex (decimals, scientific notation, etc.), we can split the vector into a list of chunks and pad the shorter chunks with NA at the end:
v1 <- unlist(ls)
lst1 <- split(v1, (seq_along(v1) - 1) %/% 5 + 1)   # chunks of 5 elements
mat1 <- do.call(rbind, lapply(lst1, `length<-`, max(lengths(lst1))))   # pad with NA
# write() fills column-wise, so transpose to keep one chunk per line
write(t(mat1), 'file2.txt', ncolumns = ncol(mat1), sep = '\t')
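The split-and-pad idea is not specific to R. For comparison, a minimal Python sketch using the "grouper" pattern built on itertools.zip_longest; None plays the role of NA, and padded_rows is a hypothetical name.

```python
from itertools import zip_longest
from typing import Iterable, List, Optional


def padded_rows(values: Iterable[float], width: int = 5) -> List[List[Optional[float]]]:
    """Split values into rows of `width`, padding the last row with None (like NA)."""
    args = [iter(values)] * width  # the same iterator repeated `width` times
    return [list(row) for row in zip_longest(*args, fillvalue=None)]


if __name__ == '__main__':
    print(padded_rows(range(1, 24))[-1])  # [21, 22, 23, None, None]
```

Because every slot of zip_longest pulls from the same iterator, each output tuple takes the next `width` values in order.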
You first need to define the chunks. I used BBmisc, which has a chunk function to obtain chunks of N elements (five in your case).
Then you can use write.table, which has an append option.
library(BBmisc)
x <- list(1:20)
n <- 5
splited <- chunk(x[[1]], n)   # list of vectors with n elements each
for (i in seq_along(splited)) {
  line <- paste(splited[[i]], collapse = "\t")
  write.table(line, file = "output.txt", sep = "\t",
              row.names = FALSE, col.names = FALSE, quote = FALSE, append = TRUE)
}
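All three answers chunk the values and join each chunk with tabs. A minimal Python sketch of the same chunk-then-join step (format_rows is a hypothetical name) shows why working on converted strings rather than a digit regex keeps decimals intact: str() never splits a number across lines.

```python
from typing import Sequence


def format_rows(values: Sequence, per_line: int = 5, sep: str = '\t') -> str:
    """Join values into lines of at most `per_line` items separated by `sep`."""
    lines = [
        sep.join(str(v) for v in values[i:i + per_line])
        for i in range(0, len(values), per_line)
    ]
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    print(format_rows(list(range(1, 24))), end='')
```

To write a file instead of printing, the returned string can be passed to the write method of a file opened in text mode.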
Regards

Wide to long with many different columns

I have used pivot_longer before, but this time I have a much more complex wide data frame and I cannot sort it out. The code below creates a reproducible data frame. I haven't dealt with anything like this before, so I'm not sure whether it even makes sense to put this kind of data frame in long format.
df <- data.frame(
ID = as.numeric(c("7","8","10","11","13","15","16")),
AGE = as.character(c("45 – 54","25 – 34","25 – 34","25 – 34","25 – 34","18 – 24","35 – 44")),
GENDER = as.character(c("Female","Female","Male","Female","Other","Male","Female")),
SD = as.numeric(c("3","0","0","0","3","2","0")),
GAMING = as.numeric(c("0","0","0","0","2","2","0")),
HW = as.numeric(c("2","2","0","2","2","2","2")),
R1_1 = as.numeric(c("10","34","69","53","79","55","28")),
M1_1 = as.numeric(c("65","32","64","53","87","55","27")),
P1_1 = as.numeric(c("65","38","67","54","88","44","26")),
R1_2 = as.numeric(c("15","57","37","54","75","91","37")),
M1_2 = as.numeric(c("90","26","42","56","74","90","37")),
P1_2 = as.numeric(c("90","44","33","54","79","95","37")),
R1_3 = as.numeric(c("5","47","80","27","61","19","57")),
M1_3 = as.numeric(c("30","71","80","34","71","15","57")),
P1_3 = as.numeric(c("30","36","81","35","62","8","56")),
R2_1 = as.numeric(c("10","39","75","31","71","80","59")),
M2_1 = as.numeric(c("90","51","74","15","70","75","61")),
P2_1 = as.numeric(c("90","52","35","34","69","83","60")),
R2_2 = as.numeric(c("10","45","31","54","39","95","77")),
M2_2 = as.numeric(c("60","70","40","78","5","97","75")),
P2_2 = as.numeric(c("60","40","41","58","9","97","76")),
R2_3 = as.numeric(c("5","38","78","45","25","16","22")),
M2_3 = as.numeric(c("30","34","84","62","33","52","20")),
P2_3 = as.numeric(c("30","34","82","45","32","16","22")),
R3_1 = as.numeric(c("10","40","41","42","62","89","41")),
M3_1 = as.numeric(c("90","67","37","40","27","89","42")),
P3_1 = as.numeric(c("90","34","51","44","38","84","43")),
R3_2 = as.numeric(c("10","37","20","54","8","93","69")),
M3_2 = as.numeric(c("60","38","21","62","5","95","71")),
P3_2 = as.numeric(c("60","38","23","65","14","92","69")),
R3_3 = as.numeric(c("5","30","62","11","60","32","52")),
M3_3 = as.numeric(c("30","67","34","55","45","25","45")),
P3_3 = as.numeric(c("30","28","41","24","53","23","52")),
R1_4 = as.numeric(c("10","40","61","17","39","72","25")),
M1_4 = as.numeric(c("45","20","63","25","62","70","23")),
P1_4 = as.numeric(c("45","52","56","16","26","72","27")),
R2_4 = as.numeric(c("5","21","70","33","80","68","30")),
M2_4 = as.numeric(c("35","21","69","27","85","69","23")),
P2_4 = as.numeric(c("35","32","34","25","79","63","29")),
R3_4 = as.numeric(c("10","29","68","21","8","71","41")),
M3_4 = as.numeric(c("50","37","66","28","33","65","41")),
P3_4 = as.numeric(c("50","38","47","28","24","71","41"))
)
I would like to reshape it as in the following table, where the new column names are extracted from the old ones. For example, in R1_1:
R is the name of the column that will contain the value previously stored in R1_1
1 (the first character after 'R' in R1_1) is the value used in column Speed
1 (the last character of 'R1_1') is the value used in column Sound
Basically, each row should correspond to one question answered by one person, and each question was answered with 3 different ratings (R, M, P).
Thank you!
If I understood you correctly, the following should work:
library(tidyr)
df %>%
  pivot_longer(
    cols = matches('[RMP]\\d_\\d'),
    names_to = c('RMP', 'Speed', 'Sound'),
    values_to = 'Data',
    names_pattern = '([RMP])(\\d)_(\\d)'
  ) %>%
  pivot_wider(names_from = RMP, values_from = Data)
This assumes that both “speed” and “sound” are single-digit values. If there’s the possibility of multiple digits, the occurrences of \\d in the patterns above need to be replaced by \\d+.
A solution using our good ol' workhorse reshape. First we grep the names matching a "Wd_d" pattern (letter, digit, underscore, digit), as well as their "d_d" suffixes for later use in reshape.
nm <- names(df)[grep("_\\d", names(df))]
times <- unique(substr(nm, 2, 4))
# one element per rating, so that R, M and P are each paired with their own columns
varying <- split(nm, substr(nm, 1, 1))[c("R", "M", "P")]
res <- reshape(df, idvar="ID", varying=varying, v.names=names(varying),
               times=times, direction="long")
This gets us close to the result; we just need to strsplit the newly created "time" variable at the "_" and cbind the pieces to the result.
res <- cbind(res, setNames(type.convert(do.call(rbind.data.frame,
                                                strsplit(res$time, "_")), as.is = TRUE),
                           c("Speed", "Sound")))
res <- res[order(res$AGE), ] ## some ordering
Result
head(res)
# ID AGE GENDER SD GAMING HW time R M P Speed Sound
# 15.1_1 15 18 – 24 Male 2 2 2 1_1 55 55 44 1 1
# 15.1_2 15 18 – 24 Male 2 2 2 1_2 91 90 95 1 2
# 15.1_3 15 18 – 24 Male 2 2 2 1_3 19 15 8 1 3
# 15.2_1 15 18 – 24 Male 2 2 2 2_1 80 75 83 2 1
# 15.2_2 15 18 – 24 Male 2 2 2 2_2 95 97 97 2 2
# 15.2_3 15 18 – 24 Male 2 2 2 2_3 16 52 16 2 3
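Both answers hinge on parsing each wide column name into (rating, Speed, Sound). A minimal pure-Python sketch of that parsing and regrouping for a single wide record, assuming the same ([RMP])(\d+)_(\d+) naming; to_long and COL_RE are hypothetical names, not part of tidyr or reshape.

```python
import re
from typing import Dict, List, Tuple

# assumed naming: rating letter, Speed digit(s), "_", Sound digit(s)
COL_RE = re.compile(r'^([RMP])(\d+)_(\d+)$')


def to_long(record: Dict[str, object]) -> List[Dict[str, object]]:
    """Turn one wide record into one row per (Speed, Sound) with R, M, P columns."""
    ids = {k: v for k, v in record.items() if not COL_RE.match(k)}  # ID, AGE, ...
    grouped: Dict[Tuple[int, int], Dict[str, object]] = {}
    for key, value in record.items():
        m = COL_RE.match(key)
        if m is None:
            continue
        rating, speed, sound = m.group(1), int(m.group(2)), int(m.group(3))
        row = grouped.setdefault((speed, sound), {**ids, 'Speed': speed, 'Sound': sound})
        row[rating] = value  # R, M or P becomes its own column
    return [grouped[k] for k in sorted(grouped)]
```

For example, to_long({'ID': 15, 'R1_1': 55, 'M1_1': 55, 'P1_1': 44}) gives one row with Speed 1, Sound 1 and the three ratings side by side.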

Constructing numeric flag with switch command

I have a data frame with a variable ind_percentiles, which takes values from 1 to 100. I want to create a numeric categorical variable, ind_Q, which takes the values:
1 if ind_percentiles 1 - 25
2 if ind_percentiles 26 - 50
3 if ind_percentiles 51 - 75
4 if ind_percentiles 76 - 100
I want to accomplish this using a switch statement rather than if/else.
Data$IMD_Q = 0
switch(Data$Index_Q,
1 = {Data$ind_percentiles <= 25},
2 = {Data$ind_percentiles > 25},
3 = {Data$ind_percentiles > 50},
4 = {Data$ind_percentiles > 75})
Is this possible? How do I achieve this?
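R's switch() selects one branch for a single, length-one value, so it cannot evaluate a whole column of conditions at once. Since the bands are equal-width, binning by arithmetic is an alternative; in R, ceiling(x / 25) or cut(x, breaks = c(0, 25, 50, 75, 100), labels = FALSE) would produce the same codes vectorised. For illustration, a minimal Python sketch of the arithmetic, assuming integer percentiles 1..100 and the bands 1-25, 26-50, 51-75, 76-100 (quartile_flag is a hypothetical name):

```python
import math


def quartile_flag(percentile: int) -> int:
    """Map a percentile in 1..100 to 1..4 using the bands 1-25, 26-50, 51-75, 76-100."""
    if not 1 <= percentile <= 100:
        raise ValueError(f'percentile out of range: {percentile}')
    return math.ceil(percentile / 25)  # 25 -> 1, 26 -> 2, ..., 100 -> 4
```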

How to write a single text file from two data frames (one has names, the other has values) using R

I have two data frames and want to create a single text file from them; it is an input file for a hydrological model. Thanks for the help. I tried to write the file in code format, but it is not accepted. It is also a single column with equals signs. I think the required output file explains it well.
The first data frame is:
> lfz
readLines("G:/Rlearning/wrds.txt")
1 HYDRUS_Version =
2 WaterFlow =
3 SoluteTransport =
4 Unsatchem =
5 Unsatchem =
6 HP1 =
7 HeatTransport =
8 EquilibriumAdsorption =
9 MobileImmobile =
10 RootWaterUptake =
11 RootGrowth =
12 MaterialNumbers =
13 SubregionNumbers =
14 SpaceUnit =
15 TimeUnit =
16 PrintTimes =
17 NumberOfSolutes =
18 InitialCondition =
19 NumberOfNodes =
20 ProfileDepth =
21 ObservationNodes =
22 GridVisible =
23 SnapToGrid =
24 ProfileWidth =
25 LeftMargin =
26 GridOrgX =
27 GridOrgY =
28 GridDX =
29 GridDY =
The second data frame is:
Here, 3 is the row number from a back file that I created: I took row 3 from it and converted it into a column, so the values here are different. The question is simple: I want to write each value after its equals sign and save the result to a text file.
> C1
3
1 4
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 cm
15 days
16 160
17 0
18 1
19 101
20 120
21 160
22 160
23 160
24 160
25 160
26 160
27 160
28 160
29 160
The required output text file is:
HYDRUS_Version=4
WaterFlow=3
SoluteTransport=0
Unsatchem=0
Unsatchem=0
HP1=2
HeatTransport=0
EquilibriumAdsorption=1
MobileImmobile=0
RootWaterUptake=1
RootGrowth=0
MaterialNumbers=1
SubregionNumbers=1
SpaceUnit=cm
TimeUnit=days
PrintTimes=180
NumberOfSolutes=0
InitialCondition=1
NumberOfNodes=101
ProfileDepth=1.2E+02
ObservationNodes=5
GridVisible=1
SnapToGrid=1
ProfileWidth=80
LeftMargin=40
GridOrgX=0
GridOrgY=0
You just have to paste the strings together, remove the white space and then write it to a file:
outVec <- gsub("\\s*", "", paste(lfz[,1], C1[,1]))
writeLines(outVec, "outfile.txt")
Your data should then look like this:
HYDRUS_Version=4
WaterFlow=0
SoluteTransport=0
Unsatchem=0
Unsatchem=0
HP1=0
HeatTransport=0
EquilibriumAdsorption=0
MobileImmobile=0
RootWaterUptake=0
RootGrowth=0
MaterialNumbers=0
SubregionNumbers=0
SpaceUnit=cm
TimeUnit=days
PrintTimes=160
NumberOfSolutes=0
InitialCondition=1
NumberOfNodes=101
ProfileDepth=120
ObservationNodes=160
GridVisible=160
SnapToGrid=160
ProfileWidth=160
LeftMargin=160
GridOrgX=160
GridOrgY=160
GridDX=160
GridDY=160
Let me give you an example:
Let's create two dataframes:
a <- data.frame(col1 = c('a = ','b = ','c = '))
b <- data.frame(col2 = c(1,2,3))
> a
col1
1 a =
2 b =
3 c =
> b
col2
1 1
2 2
3 3
Let's copy the column from b to a:
a$col2 <- b$col2
Let's concatenate both columns:
a$final <- paste0(a$col1, a$col2)
a$col1 <- NULL
a$col2 <- NULL
final
1 a = 1
2 b = 2
3 c = 3
Let's write this to a file:
write.csv(a, file = 'output.csv', row.names = FALSE, quote = FALSE)
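The core step in both answers is pairing each name with its value and dropping the spaces around "=". A minimal Python sketch of that pairing (key_value_lines is a hypothetical name); unlike removing all whitespace with gsub, it only trims around the name, so a value that contains spaces would survive.

```python
from typing import Iterable, List


def key_value_lines(names: Iterable[str], values: Iterable[object]) -> List[str]:
    """Pair each 'Name =' entry with a value and emit 'Name=value' lines."""
    names = list(names)
    values = list(values)
    if len(names) != len(values):
        raise ValueError('names and values must have the same length')
    lines = []
    for name, value in zip(names, values):
        key = name.strip().rstrip('=').strip()  # drop the trailing ' =' from each name
        lines.append(f'{key}={value}')
    return lines


if __name__ == '__main__':
    print('\n'.join(key_value_lines(['HYDRUS_Version =', 'SpaceUnit ='], [4, 'cm'])))
```

Checking the lengths first guards against the two data frames having a different number of rows, which would otherwise silently truncate the output.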

Alternating between reading forwards and backwards in a loop

My array is one-dimensional with length m, say m = 16:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
I actually interpret the array as an n x n grid, where n x n = m:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Because of the way my physical environment is set up, I need to read the array in this order:
0 4 8 12 13 9 5 1 2 6 10 14 15 11 7 3
What I came up with works, but I don't think it is the best way to do this:
bool isFlipped = false;              // column 0 is read top-down
for (int x = 0; x < m; x++) {
    if (x != 0 && x % n == 0)        // starting a new column: reverse direction
        isFlipped = !isFlipped;
    if (isFlipped)
        newLine[x] = line[((n-1) - x%n)*n + x/n];
    else
        newLine[x] = line[(x%n)*n + x/n];
}
This gives me the required result, but I think there is a way to get rid of the boolean by using a purely arithmetic formula. I am fitting this onto an 8 KB microcontroller and need to conserve as much space as I can, because Bluetooth communication and more math will be added later on.
Edit:
Thanks to a user, I arrived at a more or less one-line solution (the lines below replace the body of the for loop):
int c = x / n;   // column index in the n x n grid
newLine[x] = line[((c+1)%2)*((x%n)*n + c) + (c%2)*(((n-1) - x%n)*n + c)];
You can use the fact that odd columns of the n*n matrix are read bottom-up and even columns are read top-down.
A number at index x in newLine is located in column number c=floor(x/n) in the n*n matrix. c%2 is 0 for even columns and 1 for odd columns. So something like this should work:
int c = x/n;
newLine[x] = line[(x%n)*n + (c%2)*((n-1)-2*(x%n))*n + c];
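The branch-free index can be sanity-checked off-device before it goes onto the microcontroller. A minimal Python sketch (for verification only; the device code stays in C) that factors the same formula as row*n + c, where the row is r for even columns and (n-1)-r for odd ones; serpentine_index and serpentine_read are hypothetical names.

```python
from typing import List


def serpentine_index(x: int, n: int) -> int:
    """Row-major index of the element read at position x of the column-serpentine order."""
    c, r = divmod(x, n)                     # c: column being read, r: step within it
    parity = c % 2                          # 0 = top-down column, 1 = bottom-up column
    row = r + parity * ((n - 1) - 2 * r)    # r for even columns, n-1-r for odd ones
    return row * n + c


def serpentine_read(line: List[int], n: int) -> List[int]:
    return [line[serpentine_index(x, n)] for x in range(n * n)]


if __name__ == '__main__':
    print(serpentine_read(list(range(16)), 4))
    # [0, 4, 8, 12, 13, 9, 5, 1, 2, 6, 10, 14, 15, 11, 7, 3]
```

Expanding row*n + c gives (x%n)*n + (c%2)*((n-1)-2*(x%n))*n + c, which is term for term the expression in the answer.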
