The post Build your own discrete event simulation framework in Python Part III appeared first on Enholm Heuristics.

Previously we’d gone over adding latitudes and longitudes to a position object to track the locations of our patrol units. Now we’re going to introduce some math for calculating the distance between two positions using latitude and longitude. Our first method will be the spherical law of cosines for calculating great circle distance on the surface of a sphere. The law of cosines for great circle distance *g* is:

*g* = arccos(sin(*lat1*)sin(*lat2*) + cos(*lat1*)cos(*lat2*)cos(*lon2* − *lon1*)) × *R*

where *lat1, lon1* is the latitude and longitude of the first position, *lat2, lon2* is the latitude and longitude of the second position, and *R* is the radius of the Earth in miles. *R* can be given in kilometers instead, in which case the great circle distance comes out in kilometers. Latitude and longitude must be converted from degrees to radians to use this, and we’ll cover that here. To do all this in our simulation, we need to add this to the top of our Python script:

`import math`

which will import Python’s math library. Unfortunately, now things get complicated. First we’ll need a function to convert degrees to radians; it’s actually not too bad. Here it is:

```python
# function: degrees to radians
def deg2rad(deg):
    return deg * math.pi / 180
```

So we’re going to pass it a degree measurement, and it will return radians for our great circle calculations. Not too complicated. Here’s our great circle distance function in Python:

```python
def gcd_slc(long1, lat1, long2, lat2):
    # convert degrees to radians
    long1 = deg2rad(long1)
    lat1 = deg2rad(lat1)
    long2 = deg2rad(long2)
    lat2 = deg2rad(lat2)
    R = 3959  # Earth mean radius [miles]
    d = math.acos(math.sin(lat1) * math.sin(lat2)
                  + math.cos(lat1) * math.cos(lat2)
                  * math.cos(long2 - long1)) * R
    return d  # distance in miles
```

It’s pretty much identical to the formula I showed you above, but it incorporates the *deg2rad* function for degree conversion, and we have to call the math library to access the trigonometric functions in Python.

There’s one more thing I want to incorporate here when we’re calculating distance, and that’s the notion of *Manhattan distance*. Manhattan distance is also known as taxicab geometry, rectangular distance, and a few other names, but it’s basically the distance you have to travel in a region where the roads predominantly run north-south or east-west. In an ideal world we’d incorporate all the roads and features of the area we’re covering, but we don’t have months to do this, so we’ll just use Manhattan distance. For our area (Phoenix) it’s sufficient. If you’re working with data in Boston, which was engineered by the Mad Hatter’s association, then it’s not going to work. In fact, if you’re working with Boston data then almost nothing is going to work, but let’s not worry about that for now.

All we have to do for our Manhattan distance is place a “dummy” point between our two positions that has the latitude of one position and the longitude of the other. Then we can use our great circle function above to go from our first position to the dummy position, then from the dummy position to our final position.

Here’s how it’s implemented in Python:

```python
def manhattan_dist(latlon1, latlon2):
    v_dist = gcd_slc(latlon1.lon, latlon1.lat, latlon1.lon, latlon2.lat)
    h_dist = gcd_slc(latlon1.lon, latlon2.lat, latlon2.lon, latlon2.lat)
    return v_dist + h_dist
```
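As a quick sanity check (a standalone sketch, not part of the original script, with the helper functions repeated so it runs on its own), the squared-off Manhattan route through the dummy point should never come out shorter than the direct great circle distance. The coordinates are the call position used later in this post and car one’s position from Part II:

```python
import math
from collections import namedtuple

LatLon = namedtuple("LatLon", ["lat", "lon"])

def deg2rad(deg):
    return deg * math.pi / 180

def gcd_slc(long1, lat1, long2, lat2):
    long1, lat1 = deg2rad(long1), deg2rad(lat1)
    long2, lat2 = deg2rad(long2), deg2rad(lat2)
    R = 3959  # Earth mean radius [miles]
    # min() guards acos against tiny floating-point overshoot for nearby points
    return math.acos(min(1.0, math.sin(lat1) * math.sin(lat2)
                     + math.cos(lat1) * math.cos(lat2)
                     * math.cos(long2 - long1))) * R

def manhattan_dist(latlon1, latlon2):
    v_dist = gcd_slc(latlon1.lon, latlon1.lat, latlon1.lon, latlon2.lat)
    h_dist = gcd_slc(latlon1.lon, latlon2.lat, latlon2.lon, latlon2.lat)
    return v_dist + h_dist

call = LatLon(33.51305, -112.0856)  # call position from later in this post
car = LatLon(33.448, -112.083)      # car_one's position from Part II

direct = gcd_slc(call.lon, call.lat, car.lon, car.lat)
rect = manhattan_dist(call, car)
assert rect >= direct  # the squared-off route can't beat the direct one
```

The inequality holds in general: the two legs through the dummy point form two sides of a spherical triangle whose third side is the direct route.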

Finally, we’re going to make a *Call* class to hold the data objects that we need:

```python
class Call(object):
    """The call process (each call has a ``name``) calls the district
    and requests a patrol car.
    """
    def __init__(self, name, call_length, latlon, historic_response_time):
        self.name = name
        self.call_length = call_length
        self.latlon = latlon
        self.call_log = None
        self.historic_response_time = historic_response_time
        self.response_time = None
```

A lot of the attributes of the class aren’t going to be used right now; the most important thing in the *Call* class is the *latlon* object, which we’ll use to position the call event. We’ll place the event at minute 13 of our prototype simulation script, and it will look something like this:

```python
if i == 13:
    call_1 = Call(1, 23, LatLon(33.51305, -112.0856), 25)
    print("Call %d called in at %s %s %s " % (call_1.name,
          current_day_hour_minute.day,
          current_day_hour_minute.hour,
          current_day_hour_minute.minute))
    dist1 = manhattan_dist(call_1.latlon, car_one.current_latlon)
    dist2 = manhattan_dist(call_1.latlon, car_two.current_latlon)
    if dist1 <= dist2:
        print("Car one is closer at distance %.2f miles versus %.2f miles"
              % (dist1, dist2))
    else:
        print("Car two is closer at distance %.2f miles versus %.2f miles"
              % (dist2, dist1))
```

So the call is made 13 minutes into the simulation, and we’re going to determine which of the patrol units is closer so we can dispatch it to the call. Next time, we’ll look at populating our simulation with real world events via text file. Here’s the entire script for this tutorial:

https://github.com/jenholm/MontePy/blob/master/Sim5.py


The post Build your own discrete event simulation framework in Python Part II appeared first on Enholm Heuristics.

In the second part of the tutorial, we’ll go over queues. Queues are an integral part of many simulations, and are used where the simulation examines processes in which a server or servers are serving customers, handling events, and so forth.

Our eventual goal is to have patrol cars service calls, to see how fast a car will respond when it’s placed in a particular location in a city. To do so, the first thing we’ll add to the script we built in Part I is a queue. Python has a queue class that’s perfectly suitable for our purposes, and can even handle multi-threading, which we aren’t going to use and you don’t need to worry about right now. At least, I’m not going to go into it. With all that being said, let’s totally ignore the Python queue and write our own queue class. It’s going to look something like this:

```python
class Queue:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return len(self.items) == 0

    def enqueue(self, item):
        self.items.insert(0, item)

    def dequeue(self):
        return self.items.pop()

    def get(self, name):
        return_item = None
        object_list = [item for item in self.items if item.name == name]
        if len(object_list) > 0:
            return_item = object_list[0]
            # remove item from list
            self.items = [item for item in self.items if item.name != name]
        return return_item

    def size(self):
        return len(self.items)

    def peek(self, name):
        return_item = None
        object_list = [item for item in self.items if item.name == name]
        if len(object_list) > 0:
            return_item = object_list[0]
        return return_item
```

Since this will be at the heart of our sim, some explanation is required. First, the *items* list will contain the objects waiting for service. get() is not just going to fetch the item you want from the queue; it’s going to remove it as well. If you just want to look at an item without removing it, that’s what peek() is for.

You add items to the queue by writing:

```python
queue = Queue()
queue.enqueue(my_item)
```

Enqueue is different from the plain Python call *queue.items.append(my_item)*: it places things at the back of the line. Actually at position zero, but that’s the back for our purposes.
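A minimal standalone demonstration (a sketch with hypothetical item names, not from the original script) that inserting at index zero plus pop() from the end gives first-in, first-out order:

```python
class Queue:
    """Minimal FIFO queue: new items go in at index 0 (the back);
    pop() removes from the end of the list (the front)."""
    def __init__(self):
        self.items = []

    def enqueue(self, item):
        self.items.insert(0, item)

    def dequeue(self):
        return self.items.pop()

q = Queue()
q.enqueue("car_one")
q.enqueue("car_two")
q.enqueue("car_three")

assert q.dequeue() == "car_one"   # first in is first out
assert q.dequeue() == "car_two"
assert q.items == ["car_three"]   # still waiting at index 0, the back
```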

We are also using latitude and longitude positions for our patrol cars, so we need a latitude-longitude object to hold our positions:

```python
class LatLon(object):
    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon
```

Finally, we’ll need a Patrol Car class as a container for our patrol cars:

```python
class PatrolCar(object):
    def __init__(self, car_number, latlon):
        self.name = car_number
        print("New Car %d assigned pos %.4f %.4f"
              % (car_number, latlon.lat, latlon.lon))
        self.current_latlon = latlon
        self.patrol_latlon = latlon
        self.call_wait = 0
        self.move_wait = 0
        self.on_call = False
        self.call = None
        self.car_removed = False
```

There are some other things in there that you don’t need to worry about right now. Just know that every PatrolCar object will need a LatLon object. When you instantiate the PatrolCar in your script, it stores the LatLon that you give it in both the PatrolCar’s current position and its patrol position. We’ve also given it a number, which will be synonymous with an id. Here’s what it looks like when you instantiate it:

`car_one = PatrolCar(1, LatLon(33.448, -112.083))`

For my model, I’m using the same datum that Google Maps uses, WGS84. Off topic: if you want to know what the WGS84 lat-long is for a particular point on a Google map, right click on that spot of the map and choose *“What’s here?”*, and one of the things that pops up will be the WGS84 lat-long.

Here’s the instantiation and enqueuing:

```python
car_pool = Queue()
car_one = PatrolCar(1, LatLon(33.448, -112.083))
car_pool.enqueue(car_one)
car_two = PatrolCar(2, LatLon(33.466, -112.100))
car_pool.enqueue(car_two)
```

Here’s the entire script:

https://github.com/jenholm/MontePy/blob/master/Sim4.py

In the next installment, we’ll set up a call queue that the patrol cars will respond to. Here’s Part I:

http://enholm.net/2017/09/20/build-discrete-event-simulation-framework-python-part/


The post Build your own discrete event simulation framework in Python Part I appeared first on Enholm Heuristics.

I wouldn’t say a million. SimPy seems to be number one right now. There are a few others. And SimPy is a nice little framework. It has events that process, queues that automatically increment, all kinds of things. But after spending a few weeks trying to do what I wanted to do, SimPy was a little too SimPle. I tried all kinds of ways to process the complicated set of events and state changes that needed to occur, but for a medium-sized, industrial discrete event simulation, SimPy just didn’t have the fidelity and flexibility that you might need.

Here’s the system I need to simulate: I have a bunch of patrol cars roaming around the city, answering calls from people who need help. Obviously, we want to place our patrol cars to minimize the amount of time it takes to respond to a call. So I have a heuristic that places the patrol cars in the best spots to respond to a call, depending on what day of the week and what hour of the day it is. The calls on Saturday night come in at a different frequency, and in different places, than the calls on Thursday morning. So we need a dynamic solution where the patrol cars move to a new patrol position every hour. We also need the ability for the cars to be dispatched to the nearest call, and the ability for the cars to go from call to call in case they get too busy and the calls go into a queue. And the cars need to have several different states. This is where SimPy failed. The cars in SimPy are a resource; once a resource is assigned, you have limited-to-no control over that resource until it’s released by the requesting entity, in this case the caller. There was just no simple way to do this in SimPy. I’m sure I could have figured it out if I banged my head on my desk a few more times, but if I had known the limitations of the framework, and how simple it was to create my own, I never would have gone down that road.

So, without further ado, let’s get started with the most important ingredient of our discrete event simulation: a clock. Without a clock, you can’t have discrete events; the notion of time is inherent in an event. An event will always take place at a certain time. It may not be on time (in fact, if it’s a flight I’m on, it will almost never be on time), but eventually there will be a time when either the flight takes off, or it is canceled.

But to go back to our simulation: we need a clock to control the state of these events. So the first thing I’m going to do is create the simplest clock possible in my Python script, a for loop. Because my patrol car simulation’s basic unit of time measurement is the minute, I’m going to pretend that each iteration of my for loop is one minute. You can, of course, change this; the unit of measurement is arbitrary. It can be a second, a millisecond, or a year, whatever works for your own needs. Here’s what it looks like:

```python
# our unit of time here is going to be
# one minute, and we're going to run for one week
SIM_TIME = 7 * 24 * 60

#### START SIM RUN
for i in range(1, SIM_TIME):
    print("Sim minute: %d" % i)
```

And here’s a little bit of sample output:

```
Sim minute: 1
Sim minute: 2
Sim minute: 3
Sim minute: 4
Sim minute: 5
Sim minute: 6
Sim minute: 7
Sim minute: 8
Sim minute: 9
Sim minute: 10
…
```

Very simple, right? A very basic simulation of a clock, but not very useful. For my patrol car simulation, the cars will be relocated every hour; that’s every sixty minutes for those of you who work for my favorite airline. How do we tell our simulation to do something every hour? We’ll use the modulo operator in Python: %. It looks something like this:

```python
# our unit of time here is going to be
# one minute, and we're going to run for one week
SIM_TIME = 7 * 24 * 60

#### START SIM RUN
hour = 0
for i in range(1, SIM_TIME):
    print("Sim minute: %d" % i)
    if i % 60 == 0:
        print("Another hour has passed. Last hour %d" % hour)
        hour += 1
        print("This hour: %d" % hour)
```

Now the output looks like this:

```
Sim minute: 56
Sim minute: 57
Sim minute: 58
Sim minute: 59
Sim minute: 60
Another hour has passed. Last hour 0
This hour: 1
Sim minute: 61
…
```

So now we have a way of making something happen every hour, which we need in order to move our patrol cars. Counting minutes is fine, and we could probably write our entire simulation by counting minutes, but it makes readability and debugging a lot harder. Let’s make our sim more readable and compatible with the outside world by making our clock look more like the outside world’s. First I’ll set up a couple of arrays that I can use to track hours and days:

```python
DOW = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
hour_array = ["00", "01", "02", "03", "04", "05", "06", "07",
              "08", "09", "10", "11", "12", "13", "14", "15",
              "16", "17", "18", "19", "20", "21", "22", "23"]
```

As you’ve noticed, my hours are going to be on the twenty-four hour clock system, as most law enforcement agencies use. I’m going to put these together in a class called ScheduleHour, which looks something like this:

```python
class ScheduleHour(object):
    def __init__(self, day, hour, index):
        self.day = day
        self.hour = hour
        self.index = index
```

ScheduleHour is going to be used to frame a weekly schedule to be used as a guide. To translate between the schedule and the clock, I’m going to use another class called DayHourMinute. This will look very similar, but everything’s a string:

```python
class DayHourMinute(object):
    def __init__(self, day_string, hour_string, minute_string):
        self.day = day_string
        self.hour = hour_string
        self.minute = minute_string
```

So here’s what it looks like now, when I put it all together:

```python
# our unit of time here is going to be
# one minute, and we're going to run for one week
SIM_TIME = 7 * 24 * 60
DOW = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
hour_array = ["00", "01", "02", "03", "04", "05", "06", "07",
              "08", "09", "10", "11", "12", "13", "14", "15",
              "16", "17", "18", "19", "20", "21", "22", "23"]
current_day_hour_minute = None

class DayHourMinute(object):
    def __init__(self, day_string, hour_string, minute_string):
        self.day = day_string
        self.hour = hour_string
        self.minute = minute_string

class ScheduleHour(object):
    def __init__(self, day, hour, index):
        self.day = day
        self.hour = hour
        self.index = index

#### START SIM RUN
hour = 0
schedule = []
h = 0
for this_day in DOW:
    for this_hour in hour_array:
        temp_hour = ScheduleHour(this_day, this_hour, h)
        schedule.append(temp_hour)
        h += 1

for i in range(1, SIM_TIME):
    if i % 60 == 0:
        print("Another hour has passed. Last hour %d" % hour)
        hour += 1
        print("This hour: %d" % hour)
    day_index = DOW.index(schedule[hour].day)
    current_day_hour_minute = DayHourMinute(
        schedule[hour].day,
        schedule[hour].hour,
        str(i - int(schedule[hour].hour) * 60 - (1440 * day_index)))
    print("Day %s Hour %s Minute %s " % (current_day_hour_minute.day,
          current_day_hour_minute.hour, current_day_hour_minute.minute))
```

And here’s what the output looks like:

```
…
Day Sun Hour 00 Minute 57
Day Sun Hour 00 Minute 58
Day Sun Hour 00 Minute 59
Another hour has passed. Last hour 0
This hour: 1
Day Sun Hour 01 Minute 0
Day Sun Hour 01 Minute 1
Day Sun Hour 01 Minute 2
…
```
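The minute arithmetic in the script can be sanity-checked in isolation. This is a hedged standalone sketch (the helper name `sim_minute` is mine, not the script's) that mirrors the expression `i - int(schedule[hour].hour) * 60 - (1440 * day_index)`:

```python
def sim_minute(i, hour_string, day_index):
    # mirrors: i - int(schedule[hour].hour) * 60 - (1440 * day_index)
    return i - int(hour_string) * 60 - 1440 * day_index

assert sim_minute(59, "00", 0) == 59   # Day Sun Hour 00 Minute 59
assert sim_minute(61, "01", 0) == 1    # Day Sun Hour 01 Minute 1
assert sim_minute(1440, "00", 1) == 0  # Monday rolls back to minute 0
```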

Much more readable, isn’t it? Notice the minute count goes from 0 to 59, instead of 1 to 60. This sim tutorial will be continued. Here’s the source code for this tutorial:

https://github.com/jenholm/MontePy/blob/master/Sim3.py


The post Excel to R Transhipment Problem Optimization Tutorial Part II appeared first on Enholm Heuristics.

The R library we’re going to use for the linear program optimization is called lpSolveAPI. First we’ll put down the call to the library:

`library(lpSolveAPI)`

Then we’re going to set down the number of constraints, the number of decision variables, and the type of error reporting. We have four positions we’re traveling FROM and four positions we’re traveling TO, and we need to model each combination of FROM position to TO position, so we’re going to have sixteen decision variables. For the constraints, we’re going to have nine, for reasons I’ll go into below. For the error reporting, we’re going to say “full,” because this is our first time, so hey, tell me everything. The call to set these parameters is:

```r
# make.lp(number of constraints, number of decision variables, error reporting)
lpmodel <- make.lp(9, 16, "full")
```

Now we’re going to set our objective function. Remember, we want to minimize the distance these four police vehicles have to travel to the next patrol position. I’m not going to go into how this was calculated because I’m saving that pain for later. For now, the call will consist of basically the same thing as the Excel model, a vector of distances between every combination of points.

Which is done like this:

```r
set.objfn(lpmodel, c(0.767, 14.478, 20.088, 24.415,
                     16.534, 2.928, 3.358, 12.426,
                     22.115, 18.007, 18.431, 2.657,
                     17.545, 3.824, 2.343, 18.130))
```

There are a few different ways to do the constraints, including ones much shorter than what we’re going to use here, but for instructional purposes we won’t use sparse matrices or any shortcuts. These constraint matrices make sure that each FROM position will give up one car, and one car only, and they are tied to the above distance matrix. They assume we’re using the same format as the Excel model, where the first four entries run from FROM position one to each different TO position, the next four from FROM position two to each different TO position, and so forth.

```r
add.constraint(lpmodel, c(1, 1, 1, 1,
                          0, 0, 0, 0,
                          0, 0, 0, 0,
                          0, 0, 0, 0), "=", 1)
add.constraint(lpmodel, c(0, 0, 0, 0,
                          1, 1, 1, 1,
                          0, 0, 0, 0,
                          0, 0, 0, 0), "=", 1)
add.constraint(lpmodel, c(0, 0, 0, 0,
                          0, 0, 0, 0,
                          1, 1, 1, 1,
                          0, 0, 0, 0), "=", 1)
add.constraint(lpmodel, c(0, 0, 0, 0,
                          0, 0, 0, 0,
                          0, 0, 0, 0,
                          1, 1, 1, 1), "=", 1)
```

Now we’re going to make sure that the TO positions only will have one car moving to each of them:

```r
add.constraint(lpmodel, c(1, 0, 0, 0,
                          1, 0, 0, 0,
                          1, 0, 0, 0,
                          1, 0, 0, 0), "=", 1)
add.constraint(lpmodel, c(0, 1, 0, 0,
                          0, 1, 0, 0,
                          0, 1, 0, 0,
                          0, 1, 0, 0), "=", 1)
add.constraint(lpmodel, c(0, 0, 1, 0,
                          0, 0, 1, 0,
                          0, 0, 1, 0,
                          0, 0, 1, 0), "=", 1)
add.constraint(lpmodel, c(0, 0, 0, 1,
                          0, 0, 0, 1,
                          0, 0, 0, 1,
                          0, 0, 0, 1), "=", 1)
```

Now we’re going to add a constraint to make sure that the model variables don’t go negative:

```r
add.constraint(lpmodel, c(1, 1, 1, 1,
                          1, 1, 1, 1,
                          1, 1, 1, 1,
                          1, 1, 1, 1), ">=", 0)
```

And we’re going to define the variables as integers, setting up a columns variable that lpSolveAPI needs for the call:

```r
columns <- seq(1, 16)
set.type(lpmodel, columns, type = c("integer"))
```

To solve the model, we make this call:

`solve(lpmodel)`

To look at the objective:

`get.objective(lpmodel)`

Look at which variables were picked for the actual position-to-position transfers:

`get.variables(lpmodel)`

We can also print out the lp format of our model for debugging:

`write.lp(lpmodel, 'model.lp', type='lp')`
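As a cross-check on the model (a hedged sketch in Python, not part of the original post), the same 4×4 assignment is small enough to brute-force over all 24 permutations; the minimum should agree with the lpSolveAPI objective and the Excel model’s total distance of 8.695:

```python
from itertools import permutations

# distance matrix from the set.objfn call above:
# dist[i][j] = distance from FROM position i+1 to TO position j+5
dist = [
    [0.767, 14.478, 20.088, 24.415],
    [16.534, 2.928, 3.358, 12.426],
    [22.115, 18.007, 18.431, 2.657],
    [17.545, 3.824, 2.343, 18.130],
]

# try every possible one-to-one assignment of FROM to TO positions
best = min(sum(dist[i][p[i]] for i in range(4))
           for p in permutations(range(4)))
assert round(best, 3) == 8.695  # matches the Excel model's total distance
```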

Here’s the .R file for the model. I had to change the file extension to .txt; WordPress is having a conniption about a .R extension:


The post Excel to R Transhipment Problem Optimization Tutorial Part I appeared first on Enholm Heuristics.

The problem I’m working on is: I have a list of police patrol car positions that I’ve put together, and I need these positions to change every hour so we can minimize the response time when the cars respond to calls. These calls come from different parts of the city depending on the time of day and day of week; that’s why we need to move them.

This type of problem is a variation of the classic transportation problem called the transshipment problem, and is commonly used to schedule shipments of cargo between a series of points.

To make the problem simpler, I’m going to isolate a small number of the transfers: four patrol cars transferring between two sets of positions. I’m going to skip a complete mathematical description of the transshipment problem for brevity, and just talk through our particular problem. In Excel, we’re going to

- Minimize the total distance between the two sets of points by
- Changing a set of binary selection variables subject to:
- Each car is transferred to one, and only one point

This can be modeled in Excel, using the Excel Solver, by arranging each set of possible points in a matrix, with the selection variable, the **From** points, the **To** points, and the distance between each possible combination of points. Since I have four **From** points and four **To** points, I’ll have a matrix with sixteen rows for the different combinations. The **From** points are labeled **one to four** and the **To** points are labeled **five to eight**. The **Select** variable in the left column will be utilized by the optimization engine to select the row, by putting a one or zero in the column. The **Distance** column defines the distance between the two points; we total the distance on the bottom when the **Select** column has a one in it. The total of the selected distances will define our optimization objective.

| Select | From | To | Distance |
|---|---|---|---|
| 1.0 | 1 | 5 | 0.767 |
| 0.0 | 1 | 6 | 14.478 |
| 0.0 | 1 | 7 | 20.088 |
| 0.0 | 1 | 8 | 24.415 |
| 0.0 | 2 | 5 | 16.534 |
| 1.0 | 2 | 6 | 2.928 |
| 0.0 | 2 | 7 | 3.358 |
| 0.0 | 2 | 8 | 12.426 |
| 0.0 | 3 | 5 | 22.115 |
| 0.0 | 3 | 6 | 18.007 |
| 0.0 | 3 | 7 | 18.431 |
| 1.0 | 3 | 8 | 2.657 |
| 0.0 | 4 | 5 | 17.545 |
| 0.0 | 4 | 6 | 3.824 |
| 1.0 | 4 | 7 | 2.343 |
| 0.0 | 4 | 8 | 18.130 |
To model the constraint that each **From** position will be assigned to one and only one **To** position, I’m going to set up two more tables in Excel. One table will track all the **From** assignments, and make sure the assignments add up to exactly 1 for each **From** variable by using a sum constraint.

| FromPosition | Position Sum | Sum Constraint |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |

The other table will track the **To** variables and use the same sum constraints to make sure that each **To** position also sums to exactly 1.

| ToPosition | Transferred | Sum Constraint |
|---|---|---|
| 5 | 1 | 1 |
| 6 | 1 | 1 |
| 7 | 1 | 1 |
| 8 | 1 | 1 |

Finally, I’m going to program this into the Excel Solver. My objective is to minimize the total distance covered when I move the cars between their patrol positions.

| Total Distance |
|---|
| 8.695 |

Then I need to put in a constraint to make sure all my binary selection variables stay positive, so there’s no funny business by the Excel Solver making the selection variables go negative; add my two table sum constraints; and finally, since this is a linear program, make sure that Simplex LP is chosen for the solving method.

Press solve, and we’re done. Now that we’ve modeled this very simple model in Excel, next time we’re going to transfer this problem to R, so we can scale it to more realistic schedules and situations.
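The solved total can be checked by hand from the **Select** column in the matrix above (a quick arithmetic sketch, not part of the original post):

```python
# the four rows of the matrix above with Select = 1.0:
# 1->5, 2->6, 3->8, 4->7
selected = [0.767, 2.928, 2.657, 2.343]
total = round(sum(selected), 3)
assert total == 8.695  # matches the Total Distance cell
```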

The entire problem on Excel looks like this:

Here’s the Excel file if you want to play with it:

Part II of this tutorial, where we’ll turn the Excel model, into an R model is here:

Excel to R Transhipment Problem Optimization Tutorial Part II


The post The Analyst’s Dilemma, R vs. Python, or How I Stopped Worrying and Hated Them Both appeared first on Enholm Heuristics.

Unfortunately, defeat was snatched from the jaws of victory when I installed RMonkey and found that SurveyMonkey.com had changed its API, as companies are wont to do, without making the changes backward compatible with the old API. OK, this isn’t RMonkey’s fault, it’s SurveyMonkey’s fault. But this isn’t an isolated incident, and it reflects a pattern across R developers of:

- Building a cool R library
- Put it out there
- Fuhgetaboutit

So we have all these R libraries out there on GitHub, like the island of unwanted toys, or former celebrities on Dancing with the Stars, that no longer work, but hover in the source repositories like middle-aged men at Vegas MTV pool parties, no longer wanted, dysfunctional, and kinda creepy.

Even worse, all of SurveyMonkey’s documentation is in Python, making it tougher still to write my own library in R to access the API. Eventually I ended up using the Python surveymonty package to access the API. After extracting the site data, I did the analysis of the survey results with R, which did have some great (and functioning) libraries to sift out questionable results, using the Mahalanobis distance as a basis for multivariate outlier detection.

Fast forward several months, and I found the need for a discrete event simulation library in R. There seemed to be a good one, called simmer, based on the SimPy library in Python.

Uh-oh, I thought.

Once again, after spending a day or so working the examples in simmer, I found the library to be buggy, so much so that the documentation’s examples didn’t even run without exceptions. Apparently the author(s) of the library had made some changes without updating the documentation, making the entire shebang invalid. Back I went to Python and the SimPy library, which ran without incident, and in accordance with the supplied examples.

There seems to be a recurring theme here. Maybe several. First: don’t use projects that have been kicked out of the CRAN repository due to bad maintenance; that’s my fault. The second issue is the passing of bad information about libraries that might work for your project. The third is a lack of policing of the internet, or maybe I should say of rating the projects you can use from the internet. Perhaps we need a better method of rating projects, or in some cases any method of rating projects and passing that information to others, particularly on GitHub. While we are thankful that builders took the time to build their projects and put them out there, we don’t want the house they built to fall on our heads.

Joel Spolsky has described how they test software at Microsoft, and the hoops they jump through to try and make sure the system works with as much legacy software as humanly possible. While we don’t have the assets that Microsoft does for testing R projects, we do have the assets to at least rate the libraries. This is implemented in SourceForge, but not on GitHub or CRAN. I guess I could write an app to do that, but then who would maintain it? Now, as we say when we use regex, we have two problems.


The post Crime Analysis Series: Manhattan Distance in R appeared first on Enholm Heuristics.


Calculating driving distance for an officer to respond to a call has to be done using the “Manhattan” or “taxicab” distance method. This method assumes that there are no direct routes between points, and thus you have to “square off” the distance by using right angles between the points.

This isn’t as hard as it sounds. If you have a pair of lat-long coordinates, all you have to do is use an intermediate point with the 1st point’s longitude and the 2nd point’s latitude, calculate the distance to that point from the first using your great circle distance function, then calculate the distance from your intermediate point to your second point, and add the two intermediate distances together.

Using our R function from this post:

We can calculate the Manhattan distance using this function:

```r
################################################################################
# manhattan.dist calculates the rectangular travel distance between two points
# by using north-south great circle distance and then east-west great circle
# distance
################################################################################
manhattan.dist <- function(long1, lat1, long2, lat2) {
  v_dist <- gcd.slc(long1, lat1, long1, lat2)
  h_dist <- gcd.slc(long1, lat2, long2, lat2)
  v_dist + h_dist
}
```


The post Crime Analysis Series Calculating Great Circle Distance Between Two Points in R Using Haversine Formula appeared first on Enholm Heuristics.

Warning: this formula assumes the Earth is a perfect sphere, which is not the case, and it results in inaccuracies compared to more rigorous methods. For my purposes it works fine.

First, since the function I’m going to create uses radians, and latitudes and longitudes are basically measurements in degrees, we need to create a helper function to translate back and forth. Here it is:

```r
# function to convert degrees to radians
deg2rad <- function(deg) return(deg * pi / 180)
```

Now we’re going to create the actual function. One of the things that messed with my head a little bit in this whole process is the fact that every R function that deals with GIS uses longitude first, then latitude. As a former navigator, I did the opposite for years. But the reason R does it this way is that when you’re plotting latitude and longitude, longitude ends up in the X plane, which is always first in plotting functions.

Also take note that this function is going to return the distance in statute miles. This is easy to change: just change the Earth mean radius to your measurement of choice, and the function will return the distance in that metric. Here’s the function:

```r
gcd.slc <- function(long1, lat1, long2, lat2) {
  # convert degrees to radians
  long1 <- deg2rad(long1)
  lat1 <- deg2rad(lat1)
  long2 <- deg2rad(long2)
  lat2 <- deg2rad(lat2)
  R <- 3959  # Earth mean radius [miles]
  d <- acos(sin(lat1) * sin(lat2) +
            cos(lat1) * cos(lat2) * cos(long2 - long1)) * R
  return(d)  # distance in miles
}
```

And that’s it. To call it:

```r
gcd.slc(-112.0738, 33.448266, -111.9322, 33.477)
# [1] 8.400443
```

Notice the WGS84 coordinates; that’s the system Google uses in its maps, so it’s convenient to flip back and forth. For this call you should get 8.4 statute miles.
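For comparison (a hedged cross-check, not part of the original post), the same spherical law-of-cosines calculation in Python reproduces the R result, since the formula and radius are identical:

```python
import math

def deg2rad(deg):
    return deg * math.pi / 180

def gcd_slc(long1, lat1, long2, lat2):
    long1, lat1 = deg2rad(long1), deg2rad(lat1)
    long2, lat2 = deg2rad(long2), deg2rad(lat2)
    R = 3959  # Earth mean radius [miles]
    return math.acos(math.sin(lat1) * math.sin(lat2)
                     + math.cos(lat1) * math.cos(lat2)
                     * math.cos(long2 - long1)) * R

d = gcd_slc(-112.0738, 33.448266, -111.9322, 33.477)
assert abs(d - 8.400443) < 0.001  # agrees with the R output above
```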


The post The Political Forecast Massacre of 2016 appeared first on Enholm Heuristics.

Here’s a post from a gentleman who claims that Nate Silver at 538 actually got it right. His reasoning is that Silver’s expected margin of victory was only wrong for Ohio:

That may be true, but as a political forecaster, should I be forecasting the margin of victory for each candidate, or the electoral vote win?

His chart means absolutely nothing. And even if it could mean something, he’s making the assumption that Ohio counts as much as Maine. From a political battleground perspective, this is completely faulty logic. Ohio has eighteen electoral votes, Maine has four. Are they equal? Not even close. Should they be treated equally in the election? If you, as a military commander, attacked a position with your division, would a difference of eighteen battalions versus four make a difference? If you don’t think so, you’re not going to be around for long.

A presidential candidate must win the electoral vote, not the popular vote, to win the office. The reason: the smaller states would never have ratified the Constitution if they thought they would have no voice. It’s the same reason we have a Senate. If you think this isn’t still important, try to pass a constitutional amendment abolishing the electoral vote and see how many of the smaller states sign off on it and give up their Senate seats in the bargain. Here’s a forecast for you: none of them will. And yet all we saw in a lot of polls were straight voting percentages from the national perspective.

Here’s Nate Silver’s electoral forecast map for the election:

Actually, you can see he called Ohio for Trump, or more precisely, close and leaning Trump. So the article above about Silver’s “triumph” is misleading in two ways. What killed Silver’s forecast (and everyone else’s) were the states of Florida, Pennsylvania, Wisconsin, Michigan, and North Carolina. Here’s what actually happened:

**Interestingly enough, Silver predicted Clinton getting 302 electoral votes and Trump getting 235, almost the exact opposite of what happened.**

So, let’s treat the entire massacre as a statistical crime scene. What happened? Why was everyone so wrong? Most of the forecasts, including Silver’s, were based on aggregates of polls from different states. Obviously, polling isn’t an exact science. If we’re going to talk about polls, we have three possible explanations:

1. People didn’t tell the pollsters who they were really going to vote for

This is called the “Bradley Effect,” a theory to explain discrepancies between polls and actual results. Rasmussen Reports actually looked into this and found some credence to the theory in this election. But none of the major forecasters took it into account. Is there a way they could have? Well, Rasmussen did.

2. The pollsters were biased

The Trump campaign complained about this from the beginning, and many statisticians complained that some of the polls seemed to be oversampling Democrats. While bias is difficult, and in some cases impossible, to prove, even Nate Silver printed a mea culpa about his own bias during the primaries. But even though Silver admitted there was a problem, there’s ample evidence that it was repeated, undercutting his claim that you can “fail forward” to become a good scientist. Scientists and researchers are supposed to hold their objectivity high, particularly in the twenty-first century. But there is evidence that research methods are becoming increasingly biased, especially in the social sciences, with one study claiming that sixty-five percent of research papers draw faulty conclusions due to misconduct or outright fraud. As an amusing side note, here’s Politico refuting the “Shy Trump” voter notion. But even Politico noticed the difference between the online polls and the phone polls, which leads to my third possibility.

3. The pollsters were incompetent

While many pollsters called Florida and North Carolina as tossup states, none of the majors called Pennsylvania, Wisconsin, or Michigan as battleground states. Silver had flagged Pennsylvania back in May, but it didn’t seem to figure in his map above. In any case, Silver was wrong about Pennsylvania deciding the election in that post: it was Florida, and Pennsylvania, and Ohio, and the rest. There wasn’t a single pivotal state as in 2000. It wasn’t a landslide, but it wasn’t close from an electoral perspective.

Let’s discount the Bradley Effect, since there was opportunity, especially after Brexit, to figure it into the models. Then we’re stuck with incompetence, or malice in the form of bias, or a combination of both.

What does this mean?

In terms of polls, we’re going to have to assume that everything in the political polls, or that the pollsters claim, was either exaggerated or plain wrong until rigorously proven otherwise. I know we’re talking about statistics here, and there’s no strict right or wrong, but when you claim one party is going to get 235 electoral votes, and the other party is going to get 302, and nearly the exact opposite happens, I think the word “wrong” is warranted. If you insist on a probabilistic measure, Silver gets a Brier score of .49, which isn’t great. Most of the other pollsters get a Brier score close to one, which is terrible.
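For readers unfamiliar with the Brier score: it is just the squared difference between the forecast probability and the 0/1 outcome, with 0 being a perfect forecast and 1 the worst possible. A minimal Python sketch (the 30% figure is my own back-of-envelope probability consistent with the .49 score above, not a number taken from the original post):

```python
def brier_score(forecast_prob, outcome):
    # squared error between the predicted probability and the 0/1 outcome
    return (forecast_prob - outcome) ** 2

# A forecaster who gave Trump roughly a 30% chance of winning (outcome = 1):
print(round(brier_score(0.3, 1), 2))   # 0.49

# A pollster who put Trump at 5% fares far worse:
print(round(brier_score(0.05, 1), 4))  # 0.9025
```

The asymmetry is the point: a hedged 30% forecast for the winner scores about .49, while a confident 5% forecast lands near the worst possible score of one.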

Now, confidence in the overall reliability of these poll measurements must be questioned. Some things just don’t fit, like the President having a high approval rating in the polls. If the Democrats lost an election to a mandate at the same time the candidate carrying the President’s legacy forward lost, that approval poll doesn’t quite fit. A year ago, I would have accepted it without question. I have no evidence it’s wrong, but the inaccuracy of the election polls calls the whole system into question. Then there are the supposed effects on the polls of the FBI investigation, or the debates, or any of the other items that came up during the election, where the pollsters said the polls went up by this much, or down by that much. We can no longer gauge those effects, because we have no gauge.


http://enholm.net/iSnake/OpsResearch.html

The post New app showing feeds from our favorite Operations Research and Data Science Blogs appeared first on Enholm Heuristics.
