Group a File & Pull Random Values by Time Intervals

Question

This is a bit complex, and I greatly appreciate any help! I am trying to randomly sample rows from a .csv file. Essentially, I want a resulting file of unique locations (Locations are specified by Easting and Northing columns of the data file, below). I want to randomly pull 1 location per 12 hour period per SessionDate in this file (12 hour periods divided into: between 0631 and 1829 hours and between 1830 and 0630 hours; Given as Start: and End: in Data File, below); BUT if any 2 locations are within 6 hours of each other (based on their Start: time), for that location to be tossed, and a new location to be randomly drawn, and for this sampling to continue until no new locations are drawn (i.e., sampling WITHOUT replacement). I have been trying to do this with Python, but my experience is very limited. I tried first putting each row into a dictionary, and recently each row into a list, as follows:

import random

import csv

f=open('file.csv',"U")

list=[]

for line in f:

list.append(line.split(','))

I'm unsure where to go from here - how to sample from these lists the way I need to, and then write them to an output file with my'unique' locations.

Here is the top few lines of my data file:

SessionDate  Start:  End:  Easting  Northing

27-04-07  18:00   21:45  174739  9785206

28-04-07  18:00   21:30  171984  9784738

28-04-07  18:00   21:30  171984  9784738

28-04-07  18:00   21:30  171984  9784738

28-04-07  18:00   21:30  171984  9784738

It gets a bit complicated as some of the observations span midnight, so they may be on different dates, but can be within 6 hours of each other (which is why I have this criterion), for example:

SessionDate  Start:  End:  Easting  Northing

27-04-07  22:30   23:25  171984  9784738

28-04-07  0:25   1:30  174739  9785206

 

Answer

It’s convenient to achieve the whole algorithm in SPL (Structured Process Language):

A

B

C

D

1

=file("D:\\source.txt").import@t(SessionDate:   date:"dd-MM-yy",Start:time:"HH:mm",End:time:"HH:mm",Easting,Northing).derive(datetime(SessionDate,Start):DateStart)

2

for   A1.group(Easting,Northing)

=null

=A2.sort(DateStart).group@o(between(Start,time("6:30","HH:mm"):time("18:30","HH:mm")))

3

for   C2

=if(B2,B3.select(interval@s(B3.DateStart,DateStart)>60*60*6),B3)

4

if   C3!=[]

=@|(B2=C3(rand(C3.len())+1))

5

=file("D:\\result.txt").export@t(D4)

 

A1: Read in the data file where the first line is used as column names, set the data format for SessionDate, Start and End, and then join SessionDate and Start up into a complete DateStart value.

undefined

A2: Group A1’s table by Easting and Northing and loop each group to process data. The working scope of A2’s loop is B2-D4 (A2 references a loop variable); similarly the working scope of B3’s loop is C3-D4.

B2: Stores records selected by each previous loop, where the initial value is null.

C2: Sort A2 by DateStart and group the table every 12 hours for each Start. @o option enables a grouping without sorting.

B3: Loop each group in C2.

C3: Return the first group of records where B2==null; otherwise get the records where Start is 6 hours later from B2’s Start.

C4-D4: If C3 gets certain records, pull a random one from them to store in B2 and then append it to D4. The sign @ represents the current cell D4.

D4’s result:

undefined

A5: Write D4’s records to the target file result.txt.