Notes on Scientific and Mathematical Programming: 2014

Friday, December 26, 2014

Understanding Voter Updates

Some of the Age and Precinct charting code updated 12:55 PM 12/29/2014 -RMF
Political piece will be here. Output differentiated from R (3.1.2) code with text color red.

Analysis of 2012 - 2014 General Election Matchbacks: Part II

Analysis of 2012 - 2014 General Election Matchbacks: Part II By Age. Political piece here. The data for this comes from Part I.

Analysis of 2012 - 2014 General Election Matchbacks: Part I

Updated 5:00 PM 12/14/2014 -RMF. Political piece under construction.

Analysis of Final Matchbacks for 2014

Analysis of Final Matchbacks for 2014. Political piece here.

This R code is designed to accumulate "MatchBacks" from an election and return results. I then bind a recent voterdb to see how many and what percentage (on a precinct level) of votes were "left on the table".

Three ballotcounted dispositions mapped per precinct for two elections plus their difference

precinctid | ge2013blank | ge2012blank | bdiff | ge2013count | ge2012count | cdiff | ge2013notcount | ge2012notcount | ncdiff

801 | 38 | 63 | -25 | 311 | 504 | -193 | 339 | 121 | 218
701 | 31 | 56 | -25 | 387 | 578 | -191 | 340 | 124 | 216
611 | 32 | 54 | -22 | 343 | 468 | -125 | 206 | 59 | 147
610 | 42 | 72 | -30 | 567 | 734 | -167 | 267 | 70 | 197
609 | 44 | 82 | -38 | 530 | 740 | -210 | 335 | 87 | 248
608 | 35 | 66 | -31 | 413 | 570 | -157 | 270 | 82 | 188
607 | 28 | 49 | -21 | 429 | 555 | -126 | 218 | 71 | 147
606 | 46 | 68 | -22 | 449 | 637 | -188 | 312 | 102 | 210
605 | 21 | 36 | -15 | 289 | 386 | -97 | 158 | 46 | 112
604 | 45 | 73 | -28 | 389 | 578 | -189 | 298 | 81 | 217
603 | 50 | 77 | -27 | 341 | 529 | -188 | 298 | 83 | 215
...

This Postgres SQL (9.5) takes the three ballotcounted dispositions (null, 0, 1), counts each per precinct for two elections, and shows the difference for each disposition between the two elections. It ran to over 140 lines of SQL, so there must be a simpler and better-optimized way to get the same result with OVER or WITH clauses. My joins are lacking in sophistication. However, even though my code is wordy and hard to read, there is no noticeable delay when it runs. Political piece here.
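For comparison, the same kind of tally can be sketched in a few lines of R. This is only a minimal sketch on made-up data, not the SQL from the post: the data frame `votes` and its `election` column are assumptions for illustration, and only `precinctid` and `ballotcounted` are names taken from the voter database.

```r
# Hypothetical toy data: one row per voter per election.
# ballotcounted is NA (blank), 0 (not counted) or 1 (counted).
votes <- data.frame(
  precinctid    = c(801, 801, 801, 801, 701, 701, 701, 701),
  election      = c("ge2012", "ge2012", "ge2013", "ge2013",
                    "ge2012", "ge2012", "ge2013", "ge2013"),
  ballotcounted = c(1, NA, 0, 1, 1, 1, NA, 0)
)

# Label the three dispositions so that NA gets its own level.
votes$disposition <- factor(
  ifelse(is.na(votes$ballotcounted), "blank",
         ifelse(votes$ballotcounted == 1, "count", "notcount")),
  levels = c("blank", "count", "notcount")
)

# Three-way count: precinct x disposition x election.
tally <- table(votes$precinctid, votes$disposition, votes$election)

# Difference per precinct and disposition between the two elections.
diffs <- tally[, , "ge2013"] - tally[, , "ge2012"]
print(diffs)
```

The three-way `table()` does the work of the repeated joins: each slice along the third dimension is one election, so subtracting two slices gives the bdiff, cdiff and ncdiff columns in one step.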

Measure per precinct turnout for active voters

Political piece is here.

Compare Precinct Voting Lists over time

precinctid	cnt_db090514	cnt_db091914	cnt_db100214	cnt_db102014	difference	variance
101	939	907	911	911	28	1.0307354555
102	632	622	624	627	5	1.0079744817
103	689	680	681	682	7	1.0102639296
104	428	426	427	427	1	1.0023419204
105	414	404	408	408	6	1.0147058824
106	778	768	772	774	4	1.0051679587
107	946	914	920	922	24	1.0260303688
108	1082	1054	1056	1063	19	1.0178739417
110	687	683	685	688	-1	0.9985465116

...
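The last two columns can be reproduced from the counts. Judging from the numbers, "difference" is the first count minus the last, and "variance" appears to be the ratio of the first count to the last rather than a statistical variance; that reading is inferred from the table, not stated in the post. A minimal R sketch using the first three rows:

```r
# Counts copied from the first three rows of the table above.
counts <- data.frame(
  precinctid   = c(101, 102, 103),
  cnt_db090514 = c(939, 632, 689),
  cnt_db091914 = c(907, 622, 680),
  cnt_db100214 = c(911, 624, 681),
  cnt_db102014 = c(911, 627, 682)
)

# Assumed definitions, inferred from the published figures.
counts$difference <- counts$cnt_db090514 - counts$cnt_db102014
counts$variance   <- counts$cnt_db090514 / counts$cnt_db102014

print(counts, digits = 11)
```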

Pro I594 Contributions

Post on pro I594 Contributions is here. Code below.

Scripts for PDC Data:10.07.2014 Contributions to WA LD 42

Political piece is here.

SQL for flushing Inactive Voters

count | ballotcounted_1 | ballotcounted_2
------+-----------------+-----------------
634 | 0 | 1
506 | 0 | 0
333 | 1 | 1
211 | 0 |
153 | |
51 | 1 |
22 | 1 | 0
8 | | 1
7 | | 0

We can filter 'lost' voters like this:

~14K inactive in Whatcom County
~8K inactive in 42nd LD
~2K marked inactive since Certification of Primary e.g '08/20/2014'

That filter gives us these targets listed by priority

333 of that remaining ~2K who voted in both of the last General Elections.
73 of that remaining ~2K who voted only in the last General Election.
642 of that remaining ~2K who voted only in the General Election before last.
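The three target figures follow from the disposition table above. Here is a minimal R sketch of that arithmetic, assuming (from the way the numbers line up) that ballotcounted_1 is the most recent General Election and ballotcounted_2 the one before it; the helper `voted()` is a name made up for this example.

```r
# Disposition counts copied from the table above; NA marks a blank field.
disp <- data.frame(
  count           = c(634, 506, 333, 211, 153, 51, 22, 8, 7),
  ballotcounted_1 = c(0, 0, 1, 0, NA, 1, 1, NA, NA),
  ballotcounted_2 = c(1, 0, 1, NA, NA, NA, 0, 1, 0)
)

# Hypothetical helper: TRUE only where a ballot was counted (value 1, not NA).
voted <- function(x) !is.na(x) & x == 1

v1 <- voted(disp$ballotcounted_1)  # assumed: last General Election
v2 <- voted(disp$ballotcounted_2)  # assumed: General Election before last

both      <- sum(disp$count[v1 & v2])
only_last <- sum(disp$count[v1 & !v2])
only_prev <- sum(disp$count[!v1 & v2])

c(both = both, only_last = only_last, only_prev = only_prev)
```

Treating a blank (NA) the same as "did not vote" is a choice made here so that the sums match the list above.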

Political piece is here.

Summary view of a voterdb

Summary voterdb statistics for this political piece.

avg(age::int4) OVER (partition by precinctid)

Political piece is here.

precinctid | activeaverage | inactiveaverage | diffaverage
------------+---------------+-----------------+-------------
268 | 53.6 | 36.1 | 17.5
203 | 52.9 | 36.4 | 16.5
139 | 45.9 | 30.6 | 15.4
202 | 60.1 | 45.3 | 14.8
167 | 53.9 | 39.2 | 14.7
103 | 58.9 | 44.5 | 14.4
205 | 50.7 | 36.4 | 14.3
201 | 52.5 | 39.2 | 13.3
144 | 49.6 | 36.6 | 13.1
131 | 52.2 | 39.5 | 12.7
....

There are several Postgres moves here that are new to me, and I expect my joins to become more sophisticated in the future. The Postgres syntax:

avg(age::int4) OVER (partition by precinctid)

computes a statistic for each level of a factor (here, the average age within each precinct), similar to what xtabs (cross tabulation) does for counts in R. Quite frankly, I think R does this with greater fluidity and less code.
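For a rough idea of the R side of that comparison, base R's ave() behaves much like the window function (one group average repeated on every row), while tapply() collapses to one value per group. A minimal sketch on made-up ages; the data frame `voters` is hypothetical:

```r
# Hypothetical toy data with the two fields used in the SQL above.
voters <- data.frame(
  precinctid = c(101, 101, 102, 102, 102),
  age        = c(30, 50, 40, 60, 80)
)

# ave() returns the group mean on every row, like avg(...) OVER (PARTITION BY ...).
voters$precinct_avg <- ave(voters$age, voters$precinctid)

# tapply() returns one mean per precinct, like avg(...) with GROUP BY.
per_precinct <- tapply(voters$age, voters$precinctid, mean)

print(voters)
print(per_precinct)
```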

SQL for comparing Active vs Inactive voters over time

/* 6:20 PM 9/23/2014 -RMF Political piece for this code is here.
I create queries to dump the various (voter history) databases I wish to compare into Postgres 'views'. A 'view' in Postgres is essentially a stored query that can be used like a table. There's more to it than that but...:

Select * from voterdb where statuscode = 'I'; -- 'I' for 'inactive'

The unique voter registration number isn't quite reliable as a primary key over time (in my humble opinion), so I use a 'unique tuple' (ARRAY[lastname,firstname,middlename]) of my own invention.
Something like ARRAY[lastname,firstname,middlename,registrationnumber::TEXT] would be even less likely to collide. You also need the 119 precincts of LD 42.
*/

SQL to query active vs. inactive voters, primary vs. general election participation

Postgres SQL code to query a Whatcom County Voter database to compare previous and current elections for active vs. inactive voters and primary vs. general election participation via the ballotcounted and precinctid fields. Political piece is here.

Code to Parse PDC (financial disclosure) data for the State of WA

Quantile Regression for PDC Funding for the 42nd WA LD as of 09/01/2014. Republicans in Blue, Democrats in Red, All 42nd funding in grey

This is code to parse PDC data for candidates from a WA legislative district. See political piece here (link coming). Note: Updated 2:21 PM 9/6/2014 -RMF

Code to Parse Census County and Block Group data

Code to Parse Census County and Block Group data. Political piece here and here.

42nd District By The Numbers

Political piece is here. My attempt to develop a comprehensive suite to look at all aspects of a voter database.

City,County,Census (CVAP) data from ESRI shp files in R 3.1.0

Political piece is here. This post is under construction. I am using R to read ESRI shp with City, County, Precinct, CVAP (Census) data.

More Matchback Code

Political piece for this code is here. "Match backs" are the database rendition of the first part of the Vote By Mail process in Whatcom County. Before your ballots are run through a Sequoia 400C Optical Scanner, the outer envelope is processed: verification and validation procedures are applied to the incoming ballot. A number of interesting fields are returned:

Whatcom County Matchback Report 08.01.2014

Political piece is here.

The Campaign Financing War 07/27/2014 : Part II

Where the Votes Are...

Political piece for this code is here.

How RED or BLUE?

The political post for this code is here.

Voter Stability as a Predictor

Political piece for this is here.

PDC Database : Frequency vs. Donation size in the 42nd LD to date

The political piece for this code is here. This R (3.1.0) code helped me dig into the PDC database to understand campaign financing for the 42nd LD in WA. The first example gives a jpeg_create() function I used for automated printing, but I had some trouble applying it to all my charts configured with par(mfrow). Hadley Wickham's Advanced R text on data structures gave me the stringsAsFactors = FALSE setting for read.csv. I have an intuition that it could be a useful flag. A QQ plot seems to portray very well certain types of data that would otherwise be complicated to understand. One such case is comparing the frequency vs. donation size of both in-state and out-of-state donations in a specific political district.
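As a small illustration of what the stringsAsFactors flag changes, here is a minimal sketch on made-up CSV text (the column names and values are hypothetical, not PDC data). Note that R 4.0.0 and later default to stringsAsFactors = FALSE, so the flag mainly matters on older versions such as the 3.1.0 used here.

```r
# Hypothetical CSV content standing in for a PDC export.
csv_text <- "donor,state,amount\nA,WA,50\nB,OR,250\nC,WA,1000"

# Text columns become factors ...
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# ... or stay plain character vectors.
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$state)
class(as_char$state)
```

Character columns are usually easier to clean and subset; they can be converted with factor() later, once the levels are known.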

05.20.2014 Whatcom County Voter General Election Database: The Precincts....

The political piece for this code is here. I received quite a bit of practice in the skill of sending multivariate graphs to file for this piece. Election databases nearly scream for raster and GIS files. Something to learn next....

Boomers vs. Millennials ...

Political post for this code is here.

Hadley Wickham has said that one of the problems with R is that it isn't very "programmerly". In working through R loops in complex functions, I have discovered this. Here is what I want to do:

for (i in 1:5) {VDB_M <- as.data.frame(table(VDB_N%i%$BirthDate)) ... }

where %i% is the set of variables I want to loop through. If there is a method for doing that, I cannot find it yet. Clearly a Python or PowerShell wrapper might be the answer. But R is a big language, and maybe applying functions in R for the data analyst needs more research on my part.
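One way this kind of loop can be written in base R is to build the variable name as a string and fetch the object with get(), or to collect the data frames in a list with mget() and use lapply(). A minimal sketch, with two tiny made-up data frames standing in for the voter databases:

```r
# Hypothetical stand-ins for the numbered voter databases.
VDB_N1 <- data.frame(BirthDate = c("1950", "1950", "1985"))
VDB_N2 <- data.frame(BirthDate = c("1962", "1990"))

# Option 1: paste the name together and look the object up with get().
for (i in 1:2) {
  VDB_M <- as.data.frame(table(get(paste0("VDB_N", i))$BirthDate))
  print(VDB_M)
}

# Option 2: gather the data frames into a named list, then lapply over it.
vdbs  <- mget(paste0("VDB_N", 1:2))
freqs <- lapply(vdbs, function(d) as.data.frame(table(d$BirthDate)))
print(freqs$VDB_N1)
```

Keeping the databases in a list from the start (Option 2) avoids the name lookup entirely and keeps the results together in one object.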

05.20.2014 Whatcom County Voter General Election Database: What Happened...

Corresponding political piece here.

Whatcom County Voter Database 05.20.2014

The corresponding political piece for this article is here.
The active voter Whatcom County database as of 05.20.2014 is (row) 126283 * (column) 36 = (ncell) 4546188. On an 8 GB i5 laptop running 64-bit R 3.1, this isn't much of a problem. There are a number of different approaches/packages I have found for using R with large data:

data.table
RPostgres
plyr
sqldf

For my laptop, ncell < 5M is handled well enough by the base stats and graphics packages, although I really like the plyr count function. Both RPostgres and data.table manipulate the data faster, at the cost of some added complexity, and offer increased functionality. Being able to keep multiple voter databases in Postgres, perform complex SQL queries from the RPostgres DBI interface, and display the results/graphs in R is probably the best bet. I will discuss this in another post. For the graphs above, I rely heavily on an awkward and nearly unreadable subsetting and concatenation as in this line:

Using par,mfrow, cex, pch, rgb , col

I needed more complicated graphing to compare some Geiger-Mueller testing of air filters and understand radioactivity in our local community. I needed some visual way to compare/contrast "background" or NORM radiation with multiple samples. The 'par' settings really help here (mfrow, pch, cex). The data I am plotting looks like this:

s1.MicroRads_HR s2.MicroRads_HR s3.MicroRads_HR s4.MicroRads_HR
1 8.47 8.47 8.47 0.00
2 0.00 8.47 0.00 0.00
3 8.47 16.95 0.00 8.47
4 0.00 8.47 8.47 0.00
5 16.95 8.47 16.95 8.47
6 16.95 8.47 8.47 8.47

..

Using the rgb function as input to the 'col' parameter allows me to create custom colors with the first three arguments.

col=rgb(.075,.075,.075,0.25)
col=rgb(.6,.4,.9,0.4)

The fourth argument for 'rgb' is an alpha parameter ('transparency'). By varying the transparency and the size ('cex') of the plotting symbol ('pch'), I was able to nest a transparent and slightly smaller series of circles from the control test inside the sample results.
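A minimal sketch of that nesting technique, using random numbers in place of the Geiger counter readings (the values, the variable names and the cex sizes are all made up for illustration):

```r
set.seed(1)
# Hypothetical readings standing in for the control and one sample.
control <- rnorm(50, mean = 8, sd = 4)
sample1 <- rnorm(50, mean = 10, sd = 4)

# The two custom colors from above: red, green, blue, then alpha.
grey_faint  <- rgb(.075, .075, .075, 0.25)
violet_soft <- rgb(.6, .4, .9, 0.4)

pdf(NULL)  # off-screen device so the sketch runs anywhere; omit to see the plot
plot(sample1, pch = 19, cex = 2.0, col = violet_soft,
     ylim = range(c(control, sample1)), ylab = "MicroRads_HR")
# Smaller, more transparent control points drawn over the sample points.
points(control, pch = 19, cex = 1.2, col = grey_faint)
invisible(dev.off())
```

rgb() returns an ordinary color string of the form "#RRGGBBAA", so the result can be stored once and reused across panels.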

Curve, Constants, lockBinding,locator()

# Show how to choose floating point length, create the constant e and bind it to the global environment
# Use curve() function to create a sine wave plot with constants pi and e
# use locator() to select arbitrary points which can be plotted again

options(digits=22)
e <- exp(1)
lockBinding("e", globalenv())
e
# [1] 2.718281828459

curve(sin, e^pi, -pi^-e) # produces plot
z <- as.data.frame(locator()) # select points on plot, then right click and choose 'stop'

# locator() returns a list of x and y coordinates; as.data.frame() gives one row per point

z
#                         x                          y
# 1 1.906552221196785357193 -0.4129490482535249085139
# 2 4.856457177229445143496 -0.3680698252087759581030
# 3 7.107700433149108043551 -0.3643298899550468927799
# ...

plot(z)

Thursday, April 17, 2014

The Rain Over Hazel: raster, adehabitat,zoom

The R Code for this exercise is far below.

Inspired by a Dan McShane post, I looked at an Atlas 2 (NOAA) GIS data set covering Precipitation Frequency in WA State from 1897 - 1970. I am going to have to dig a little deeper at http://www.climate.gov/datasearch/ to get the later data. The Pacific Northwest region does not have updated data for precipitation frequency in this particular format yet. NOAA admits that such updated reports for select regions are overdue. Western WA receives tremendous amounts of rainfall in comparison to other areas of the country. Most of us who have lived here more than ten years have surmised that the rainy season seems to be increasing in length, with more 'warm' rain coming from the South ("Chinooks"). Plenty of hard, cold rain still comes from the North as well.

The first chart below shows the drenching the Olympic Peninsula and Olympic mountains receive. You will notice the distinctive "rain shadow" that appears north and east of the Olympics. Don't let this fool you. You can get caught in a serious rain storm driving through Sequim or Port Angeles or any of the small cities on the Peninsula long before you decide to spend the day hiking Hurricane Hill in the Olympics. All of western Washington and especially the Cascades receive substantial amounts of moisture. In Whatcom County, Mt. Baker is among the last of the glaciated mountains in the contiguous 48 states. Sometimes Artist Point is not cleared of snow until July 4th. Here in Western WA, we don't waste our brief but beautiful two months of summer, nor any sunny days that flirt in between storms before then. But we are not afraid of the rain, wind, or snow either.

The two charts below show that the area of the Hazel mudslide appears to (historically) have existed at the bottom of a "wet pocket". The approximate location for the Hazel headscarp is 48.2848171, -121.8494478. (For a picture of the headscarp, please see this outstanding March 26th photograph from Earth Fix!) The slopes above Hazel appear to have been particularly rainy. Indeed, it almost appears as if Hazel herself was the southern gatekeeper of this rainy pocket. This data has been developed in conjunction with some algorithms, but we can take it at face value for now.

Function for using RM80 to detect/compare CSV data

RM80 <- function() {
require(plyr)
# RMF Media/ RMF Network Security 7:00 PM 3/20/2014. Tested on R 3.0.3
# Takes three CSV samples (One Control and Two Samples) from Aware Electronics RM-80 GM Counter.
# Configured to read data from "1 TBU per line" for any Time Base Unit
# Samples must have headers replaced with "Time" and "MicroRads_HR" like this:
# Time MicroRads_HR
#1 41715.85 10.17
#2 41715.85 15.25
# ...

Hallquist script to show memory use...

A really nice script to show memory use of objects by Michael Hallquist:

showMemoryUse <- function(sort="size", decreasing=FALSE, limit) {

  objectList <- ls(parent.frame())

  oneKB <- 1024
  oneMB <- 1048576
  oneGB <- 1073741824

  memoryUse <- sapply(objectList, function(x) as.numeric(object.size(eval(parse(text=x)))))

  memListing <- sapply(memoryUse, function(size) {
        if (size >= oneGB) return(paste(round(size/oneGB,2), "GB"))
        else if (size >= oneMB) return(paste(round(size/oneMB,2), "MB"))
        else if (size >= oneKB) return(paste(round(size/oneKB,2), "kB"))
        else return(paste(size, "bytes"))
      })

  memListing <- data.frame(objectName=names(memListing),memorySize=memListing,row.names=NULL)

  if (sort=="alphabetical") memListing <- memListing[order(memListing$objectName,decreasing=decreasing),] 
  else memListing <- memListing[order(memoryUse,decreasing=decreasing),] #will run if sort not specified or "size"

  if(!missing(limit)) memListing <- memListing[1:limit,]

  print(memListing, row.names=FALSE)
  return(invisible(memListing))
}

Saturday, March 8, 2014

Using Options in R

To list all options in R:

as.matrix(.Options[1:length(.Options)])
