Friday, December 26, 2014

Understanding Voter Updates



Some of the Age and Precinct charting code updated  12:55 PM 12/29/201 -RMF
Political piece will be here. Output is shown in red text to distinguish it from the R (3.1.2) code.

Sunday, December 14, 2014

Analysis of 2012 - 2014 General Election Matchbacks: Part II

Part II of the 2012 - 2014 General Election matchback analysis breaks the results down by age. Political piece here. The data for this analysis comes from Part I.

Thursday, November 6, 2014

Votes left on the Table

This R code is designed to accumulate "MatchBacks" from an election and return results. I then bind a recent voterdb to see how many votes, and what percentage (on a precinct level), were "left on the table".

Thursday, October 30, 2014

Three ballotcounted dispositions mapped per precinct for two elections plus their difference


 precinctid | ge2013blank | ge2012blank | bdiff | ge2013count | ge2012count | cdiff | ge2013notcount | ge2012notcount | ncdiff
------------+-------------+-------------+-------+-------------+-------------+-------+----------------+----------------+--------
        801 |          38 |          63 |   -25 |         311 |         504 |  -193 |            339 |            121 |    218
        701 |          31 |          56 |   -25 |         387 |         578 |  -191 |            340 |            124 |    216
        611 |          32 |          54 |   -22 |         343 |         468 |  -125 |            206 |             59 |    147
        610 |          42 |          72 |   -30 |         567 |         734 |  -167 |            267 |             70 |    197
        609 |          44 |          82 |   -38 |         530 |         740 |  -210 |            335 |             87 |    248
        608 |          35 |          66 |   -31 |         413 |         570 |  -157 |            270 |             82 |    188
        607 |          28 |          49 |   -21 |         429 |         555 |  -126 |            218 |             71 |    147
        606 |          46 |          68 |   -22 |         449 |         637 |  -188 |            312 |            102 |    210
        605 |          21 |          36 |   -15 |         289 |         386 |   -97 |            158 |             46 |    112
        604 |          45 |          73 |   -28 |         389 |         578 |  -189 |            298 |             81 |    217
        603 |          50 |          77 |   -27 |         341 |         529 |  -188 |            298 |             83 |    215
...

This Postgres SQL (9.5) takes the three ballotcounted dispositions (null, 0, 1), counts each one per precinct for two elections, and shows the difference for each disposition between the two elections. This was over 140 lines of SQL, so there must be a simpler and better-optimized way of obtaining the same result with the OVER or WITH clauses. My joins are lacking in sophistication. However, even though my code is wordy and hard to read, there is no noticeable delay when it runs. Political piece here.
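For comparison with the SQL, here is a minimal R sketch of the same idea: count each disposition per precinct and election, then subtract one election from the other. The toy data frame, its `election` column and the labels "blank", "count" and "notcount" are assumptions made for illustration; this is not the original 140-line query.

```r
# Hypothetical toy data: one row per voter per election.
# ballotcounted is NA (blank), 0 (not counted) or 1 (counted).
votes <- data.frame(
  precinctid    = c(801, 801, 801, 801, 701, 701, 701, 701),
  election      = c("ge2012", "ge2012", "ge2013", "ge2013",
                    "ge2012", "ge2012", "ge2013", "ge2013"),
  ballotcounted = c(1, NA, 0, 1, 1, 1, NA, 0),
  stringsAsFactors = FALSE
)

# Give each of the three dispositions a readable label (labels are assumed).
votes$disposition <- ifelse(is.na(votes$ballotcounted), "blank",
                     ifelse(votes$ballotcounted == 1, "count", "notcount"))

# Cross tabulate: precinct x disposition x election.
counts <- xtabs(~ precinctid + disposition + election, data = votes)

# Difference per precinct and disposition between the two elections.
diffs <- counts[, , "ge2013"] - counts[, , "ge2012"]
print(diffs)
```

Each cell of `diffs` plays the role of one of the bdiff, cdiff and ncdiff columns above.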

Wednesday, October 22, 2014

Compare Precinct Voting Lists over time

precinctid cnt_db090514 cnt_db091914 cnt_db100214 cnt_db102014 difference variance
101 939 907 911 911 28 1.0307354555
102 632 622 624 627 5 1.0079744817
103 689 680 681 682 7 1.0102639296
104 428 426 427 427 1 1.0023419204
105 414 404 408 408 6 1.0147058824
106 778 768 772 774 4 1.0051679587
107 946 914 920 922 24 1.0260303688
108 1082 1054 1056 1063 19 1.0178739417
110 687 683 685 688 -1 0.9985465116

...
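The last two columns appear to be derived from the first and last snapshots: "difference" matches cnt_db090514 minus cnt_db102014, and "variance" matches their ratio. A minimal R sketch under that assumption, using three rows copied from the table:

```r
# Three rows copied from the table above (first and last snapshots only).
snap <- data.frame(
  precinctid   = c(101, 102, 110),
  cnt_db090514 = c(939, 632, 687),
  cnt_db102014 = c(911, 627, 688)
)

# Assumed derivations, inferred from the values shown in the table.
snap$difference <- snap$cnt_db090514 - snap$cnt_db102014
snap$variance   <- snap$cnt_db090514 / snap$cnt_db102014

print(snap, digits = 11)
```

A ratio above 1 means the precinct list shrank between the two snapshots; below 1 means it grew.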

Monday, October 6, 2014

SQL for flushing Inactive Voters



count | ballotcounted_1 | ballotcounted_2
------+-----------------+-----------------
  634 | 0               | 1
  506 | 0               | 0
  333 | 1               | 1
  211 | 0               |
  153 |                 |
   51 | 1               |
   22 | 1               | 0
    8 |                 | 1
    7 |                 | 0

We can filter 'lost' voters like this:

~14K inactive in Whatcom County
~8K inactive in 42nd LD
~2K marked inactive since Certification of Primary e.g '08/20/2014'

That filter gives us these targets listed by priority

  • 333 of that remaining ~2K who voted in both of the last General Elections.
  • 73 of that remaining ~2K who voted only in the last General Election.
  • 642 of that remaining ~2K who voted only in the General Election before last.
Political piece is here.
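The three target counts can be recovered from the count table above. This is a minimal R sketch that assumes ballotcounted_1 is the most recent General Election and ballotcounted_2 the one before it (an inference from the numbers, not something the query output states):

```r
# The count table from above; blank cells are entered as NA.
tab <- data.frame(
  count           = c(634, 506, 333, 211, 153, 51, 22, 8, 7),
  ballotcounted_1 = c(0, 0, 1, 0, NA, 1, 1, NA, NA),
  ballotcounted_2 = c(1, 0, 1, NA, NA, NA, 0, 1, 0)
)

# TRUE only where the ballot was counted (1); NA and 0 both count as "did not vote".
voted <- function(x) !is.na(x) & x == 1

last_ge  <- voted(tab$ballotcounted_1)
prior_ge <- voted(tab$ballotcounted_2)

both       <- sum(tab$count[last_ge & prior_ge])    # voted in both General Elections
only_last  <- sum(tab$count[last_ge & !prior_ge])   # voted only in the last one
only_prior <- sum(tab$count[!last_ge & prior_ge])   # voted only in the one before last

c(both = both, only_last = only_last, only_prior = only_prior)
```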




Thursday, September 25, 2014

avg(age::int4) OVER (partition by precinctid)

Political piece is here.

 precinctid | activeaverage | inactiveaverage | diffaverage
------------+---------------+-----------------+-------------
        268 |  53.6         |  36.1           |  17.5
        203 |  52.9         |  36.4           |  16.5
        139 |  45.9         |  30.6           |  15.4
        202 |  60.1         |  45.3           |  14.8
        167 |  53.9         |  39.2           |  14.7
        103 |  58.9         |  44.5           |  14.4
        205 |  50.7         |  36.4           |  14.3
        201 |  52.5         |  39.2           |  13.3
        144 |  49.6         |  36.6           |  13.1
        131 |  52.2         |  39.5           |  12.7
....

There are several new Postgres moves here for me. I expect my joins to become more sophisticated in the future. The Postgres syntax:

avg(age::int4) OVER (partition by precinctid)

computes a statistic for each group (here, the average age per precinct) while keeping every row, much like slicing by a factor with xtabs (cross tabulation) in R. Quite frankly, I think R does this with greater fluidity and less code.
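As a rough illustration of the R side of that comparison, here is a minimal sketch in base R. The toy data and the status code 'A' for active voters are assumptions; `ave()` is the closest base R match to the window function (one value per input row), and `tapply()` gives the per-precinct table.

```r
# Hypothetical toy voter data; 'A' = active is assumed, 'I' = inactive.
voters <- data.frame(
  precinctid = c(101, 101, 101, 101, 102, 102, 102, 102),
  statuscode = c("A", "A", "I", "I", "A", "A", "I", "I"),
  age        = c(60, 50, 30, 40, 70, 50, 45, 39),
  stringsAsFactors = FALSE
)

# Like avg(age) OVER (partition by precinctid): every row keeps its precinct mean.
voters$precinct_avg <- ave(voters$age, voters$precinctid)

# Average age per precinct, split by status, as a precinct x status matrix.
avg <- tapply(voters$age, list(voters$precinctid, voters$statuscode), mean)

result <- data.frame(
  precinctid      = rownames(avg),
  activeaverage   = avg[, "A"],
  inactiveaverage = avg[, "I"],
  diffaverage     = avg[, "A"] - avg[, "I"]
)

# Largest active/inactive age gap first, as in the table above.
result[order(-result$diffaverage), ]
```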

Tuesday, September 23, 2014

SQL for comparing Active vs Inactive voters over time

/* 6:20 PM 9/23/2014 -RMF Political piece for this code is here.
I create queries to dump the various (voter history) databases I wish to compare into Postgres 'views'. A 'view' in Postgres is essentially a stored query that can be read like a table; it is re-run each time it is referenced rather than held in memory. There's more to it than that but...:

Select * from voterdb where statuscode = 'I'; -- 'I' for 'inactive' 

The unique voter registration number isn't quite reliable as a primary key over time (in my humble opinion), so I use a 'unique tuple' (ARRAY[lastname,firstname,middlename]) of my own invention.
Something like ARRAY[lastname,firstname,middlename,registrationnumber::TEXT] would be even less likely to collide. You also need the 119 precincts of LD 42.
*/
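The same composite-key idea can be sketched in R. This is an illustration only: the two toy snapshots, the helper name `make_key` and the status code 'A' for active are hypothetical, and the real comparison was done with Postgres views.

```r
# Build a composite key from name parts plus registration number (hypothetical helper).
make_key <- function(df) {
  paste(df$lastname, df$firstname, df$middlename, df$registrationnumber, sep = "|")
}

# Two hypothetical snapshots of the same voter database.
db_old <- data.frame(
  lastname = c("Doe", "Roe", "Poe"), firstname = c("Jane", "Rick", "Ann"),
  middlename = c("Q", "", "L"), registrationnumber = c(1001, 1002, 1003),
  statuscode = c("A", "A", "I"), stringsAsFactors = FALSE
)
db_new <- db_old
db_new$statuscode <- c("A", "I", "I")

db_old$key <- make_key(db_old)
db_new$key <- make_key(db_new)

# Join the snapshots on the composite key and compare status over time.
both <- merge(db_old[, c("key", "statuscode")], db_new[, c("key", "statuscode")],
              by = "key", suffixes = c("_old", "_new"))

# Voters who were active in the old snapshot and inactive in the new one.
newly_inactive <- both$key[both$statuscode_old == "A" & both$statuscode_new == "I"]
newly_inactive
```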

Monday, September 15, 2014

SQL to query active vs. inactive voters, primary vs. general election participation

Postgres SQL code to query a Whatcom County Voter database to compare previous and current elections for active vs. inactive voters, precincts vs. general election participation via ballotcounted and precinctid fields. Political piece is here.

Saturday, September 6, 2014

Code to Parse PDC (financial disclosure) data for the State of WA

Quantile Regression for PDC Funding for the 42nd WA LD as of 09/01/2014. Republicans in Blue, Democrats in Red, All 42nd funding in grey
This is code to parse PDC data for candidates from a WA legislative district. See political piece here (link coming). Note: Updated 2:21 PM 9/6/2014 -RMF

Friday, August 22, 2014

Tuesday, August 19, 2014

42nd District By The Numbers



Political piece is here. My attempt to develop a comprehensive suite to look at all aspects of a voter database.

Wednesday, August 13, 2014

City,County,Census (CVAP) data from ESRI shp files in R 3.1.0

Political piece is here. This post is under construction. I am using R to read ESRI shp files containing City, County, Precinct, and CVAP (Census) data.

Wednesday, August 6, 2014

More Matchback Code

Political piece for this code is here. "Match backs" are the database rendition of the first part of the Vote By Mail process in Whatcom County. Before your ballots are run through a Sequoia 400C Optical Scanner, the outer envelope is processed: verification and validation procedures are applied to the incoming ballot. A number of interesting fields are returned:

Friday, June 13, 2014

PDC Database : Frequency vs. Donation size in the 42nd LD to date

The political piece for this code is here. This R (3.1.0) code helped me dig into the PDC database to understand campaign financing for the 42nd LD in WA. The first example gives a jpeg_create() I used for automated printing, but I had some trouble applying it to all my charts configured with par(mfrow). Hadley Wickham's Advanced R text on data structures gave me the stringsAsFactors = FALSE setting for read.csv. I have an intuition that it could be a useful flag. A QQ plot seems to portray very well certain types of data that would otherwise be complicated to understand. One such case is comparing the frequency vs. donation size of both in-state and out-of-state donations in a specific political district.
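To show what the stringsAsFactors flag changes, here is a minimal sketch that reads the same small CSV both ways. The column names and values are made up for illustration and are not the actual PDC export format.

```r
# Hypothetical donation records (not the real PDC layout).
csv_text <- "donor,state,amount
Smith,WA,50
Jones,OR,900
Lee,WA,100
Kim,WA,50"

# With stringsAsFactors = FALSE, text columns stay plain character vectors.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With stringsAsFactors = TRUE, text columns become factors.
fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(raw$state)   # character
class(fac$state)   # factor

# Frequency of each donation size, the kind of table the charts are built from.
freq <- as.data.frame(table(amount = raw$amount))
freq
```

Character columns are easier to clean and paste together; they can still be converted with factor() at the point where a grouping variable is actually needed.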

Saturday, June 7, 2014

05.20.2014 Whatcom County Voter General Election Database: The Precincts....





The political piece for this code is here.  I received quite a bit of practice in the skill of sending multivariate graphs to file for this piece. Election databases nearly scream for raster and GIS files.  Something to learn next....

Wednesday, June 4, 2014

Boomers vs. Millennials ...


Political post for this code is here.



Hadley Wickham has said that one of the problems with R is that it isn't very "programmerly".  In working through R loops in complex functions, I have discovered this. Here is what I want to do:

for (i in 1:5)  {VDB_M <- as.data.frame(table(VDB_N%i%$BirthDate)) ... }

where %i% is the set of variables I want to loop through. If there is a method for doing that, I have not found it yet. A Python or Powershell wrapper might be the answer. But R is a big language, and applying functions in R as a data analyst probably needs more research on my part.
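One approach that seems to fit is to build the variable name as a string and fetch the object with `get()`, or better, to keep the data frames in a list and use `lapply()`. A minimal sketch with three made-up data frames standing in for the real VDB_N1 ... VDB_N5:

```r
# Hypothetical stand-ins for the real voter databases.
VDB_N1 <- data.frame(BirthDate = c("1950", "1950", "1985"), stringsAsFactors = FALSE)
VDB_N2 <- data.frame(BirthDate = c("1962", "1990", "1990"), stringsAsFactors = FALSE)
VDB_N3 <- data.frame(BirthDate = c("1948", "1975", "1975"), stringsAsFactors = FALSE)

# Option 1: paste0() builds the name, get() looks the object up by that name.
for (i in 1:3) {
  vdb   <- get(paste0("VDB_N", i))
  VDB_M <- as.data.frame(table(vdb$BirthDate))
  print(VDB_M)
}

# Option 2: collect the data frames in a named list once, then lapply() over it.
vdb_list <- lapply(1:3, function(i) get(paste0("VDB_N", i)))
names(vdb_list) <- paste0("VDB_N", 1:3)

freq_list <- lapply(vdb_list, function(d) as.data.frame(table(d$BirthDate)))
freq_list$VDB_N1
```

The list version keeps every result instead of overwriting VDB_M on each pass through the loop.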

Wednesday, May 28, 2014

Whatcom County Voter Database 05.20.2014





The corresponding political piece for this article is here.
The active voter Whatcom County database as of 05.20.2014 is 126283 rows * 36 columns = 4546188 cells (ncell). On an 8 GB i5 laptop running 64-bit R 3.1 this isn't much of a problem. There are a number of different approaches/packages I have found for using R with large data:
  • data.table
  • RPostgres
  • plyr
  • sqldf
For my laptop, ncell < 5M is handled well enough by the base stats and graphics packages, although I really like the plyr count function. Both RPostgres and data.table manipulate the data faster, with some added complexity and increased functionality. Being able to keep multiple voter databases in Postgres, perform complex SQL queries from the RPostgres DBI interface, and display the results/graphs in R is probably the best bet. I will discuss this in another post. For the graphs above, I rely heavily on an awkward and nearly unreadable subsetting and concatenation as in this line:

Saturday, May 17, 2014

Using par, mfrow, cex, pch, rgb, col

I needed some fairly complicated graphing to compare Geiger-Mueller tests of air filters, in order to understand radioactivity in our local community. I needed a visual way to compare/contrast "background" or NORM radiation with multiple samples. The 'par' settings really help here (mfrow, pch, cex). The data I am plotting looks like this:


  s1.MicroRads_HR s2.MicroRads_HR s3.MicroRads_HR s4.MicroRads_HR
1            8.47            8.47            8.47            0.00
2            0.00            8.47            0.00            0.00
3            8.47           16.95            0.00            8.47
4            0.00            8.47            8.47            0.00
5           16.95            8.47           16.95            8.47
6           16.95            8.47            8.47            8.47

..

Using the rgb function as input to the 'col' parameter allows me to create custom colors with the first three arguments.

col=rgb(.075,.075,.075,0.25)
col=rgb(.6,.4,.9,0.4)

The fourth argument for 'rgb' is an alpha parameter (opacity: 0 is fully transparent, 1 is fully opaque). By varying the transparency and the size ('cex') of the plotting symbol ('pch'), I was able to nest a transparent and slightly smaller series of circles from the control test inside the sample results.
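A minimal sketch of that nesting technique, using the first six rows shown above. Treating s1 as the control and s2/s3 as the samples is an assumption made for illustration, as are the specific cex values.

```r
# First six rows of three of the columns shown above.
samples <- data.frame(
  s1.MicroRads_HR = c(8.47, 0.00, 8.47, 0.00, 16.95, 16.95),
  s2.MicroRads_HR = c(8.47, 8.47, 16.95, 8.47, 8.47, 8.47),
  s3.MicroRads_HR = c(8.47, 0.00, 0.00, 8.47, 16.95, 8.47)
)

control_col <- rgb(.075, .075, .075, 0.25)  # dark grey, 25% opaque
sample_col  <- rgb(.6, .4, .9, 0.4)         # violet, 40% opaque

# When run as a script, draw to a null device so no file is written.
if (!interactive()) pdf(NULL)

op <- par(mfrow = c(1, 2))  # two panels side by side
for (s in c("s2.MicroRads_HR", "s3.MicroRads_HR")) {
  # Large transparent circles for the sample...
  plot(seq_len(nrow(samples)), samples[[s]], pch = 19, cex = 3, col = sample_col,
       xlab = "reading", ylab = "MicroRads/HR", main = s, ylim = c(0, 20))
  # ...with smaller transparent circles for the assumed control (s1) nested inside.
  points(seq_len(nrow(samples)), samples$s1.MicroRads_HR,
         pch = 19, cex = 2, col = control_col)
}
par(op)  # restore the previous layout
```

Where control and sample readings coincide, the smaller circle sits inside the larger one and the overlap darkens.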


Saturday, May 10, 2014

Curve, Constants, lockBinding,locator()


# Show how to choose floating point length, create the constant e  and bind it to the global environment
# Use curve() function to create a sine wave plot with constants pi and e
# use locator() to select arbitrary points which can be plotted again

options(digits=22)
e <- exp(1)
lockBinding("e", globalenv())
e
# [1] 2.718281828459

curve(sin, e^pi, -pi^-e)  # produces plot
z <- as.matrix(locator()) # select points on plot then right click and choose 'stop'

# locator() creates x,y matrix

z
#                        x                         y
# 1  1.906552221196785357193 -0.4129490482535249085139
# 2  4.856457177229445143496 -0.3680698252087759581030
# 3  7.107700433149108043551 -0.3643298899550468927799
# ...

plot(z)
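To see what lockBinding() actually protects against, here is a minimal sketch that tries to overwrite the locked constant and catches the resulting error. It uses environment() rather than globalenv() so it also behaves when sourced into another environment; at the top level the two are the same.

```r
e <- exp(1)
lockBinding("e", environment())

# Any attempt to reassign a locked binding raises an error, which we catch here.
attempt <- tryCatch({
  e <- 3
  "changed"
}, error = function(err) "blocked")

attempt                              # the assignment did not go through
bindingIsLocked("e", environment())  # TRUE
e                                    # still exp(1)

# unlockBinding() reverses the lock if the value really needs to change.
unlockBinding("e", environment())
```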

Thursday, April 17, 2014

The Rain Over Hazel: raster, adehabitat,zoom

The R Code for this exercise is far below. 

Inspired by a Dan McShane post, I looked at an Atlas 2 (NOAA) GIS data set covering Precipitation Frequency in WA State from 1897 - 1970. I am going to have to dig a little deeper at http://www.climate.gov/datasearch/ to get the later data. The Pacific Northwest region does not have updated data for precipitation frequency in this particular format yet. NOAA admits that such updated reports for select regions are overdue. Western WA receives tremendous amounts of rainfall in comparison to other areas of the country. Most of us who have lived here more than ten years have surmised that the rainy season seems to be increasing in length, with more 'warm' rain coming from the South ("Chinooks"). Plenty of hard, cold rain still comes from the North as well.

The first chart below shows the drenching the Olympic Peninsula and Olympic mountains receive. You will notice the distinctive "rain shadow" that appears north and east of the Olympics. Don't let this fool you. You can get caught in a serious rain storm driving through Sequim or Port Angeles or any of the small cities on the Peninsula long before you decide to spend the day hiking Hurricane Hill in the Olympics. All of western Washington, and especially the Cascades, receives substantial amounts of moisture. In Whatcom County, Mt. Baker is among the last of the glaciated mountains in the contiguous 48 states. Sometimes Artist Point is not cleared of snow until July 4th. Here in Western WA, we don't waste our brief but beautiful two months of summer, nor any sunny days that flirt in between storms before then. But we are not afraid of the rain, wind, or snow either.


The two charts below show that the area of the Hazel mudslide appears to (historically) have existed at the bottom of a "wet pocket". The approximate location for the Hazel headscarp is 48.2848171, -121.8494478. (For a picture of the headscarp, please see this outstanding March 26th photograph from Earth Fix!) The slopes above Hazel appear to have been particularly rainy. Indeed, it almost appears as if Hazel herself was the southern gatekeeper of this rainy pocket. This data has been developed in conjunction with some algorithms, but we can take it at face value for now.

Thursday, March 20, 2014

Function for using RM80 to detect/compare CSV data


RM80 <- function() {
require(plyr)
# RMF Media/ RMF Network Security 7:00 PM 3/20/2014. Tested on R 3.0.3
# Takes three CSV samples (one control and two samples) from an Aware Electronics RM-80 GM Counter.
# Configured to read data from "1 TBU per line" for any Time Base Unit
# Samples must have their headers replaced with "Time" and "MicroRads_HR" like this:
#       Time MicroRads_HR
#1  41715.85        10.17
#2  41715.85        15.25
# ...

Sunday, March 16, 2014

Hallquist script to show memory use...

A really nice script to show memory use of objects by Michael Hallquist :

showMemoryUse <- function(sort="size", decreasing=FALSE, limit) {

  objectList <- ls(parent.frame())

  oneKB <- 1024
  oneMB <- 1048576
  oneGB <- 1073741824

  memoryUse <- sapply(objectList, function(x) as.numeric(object.size(eval(parse(text=x)))))

  memListing <- sapply(memoryUse, function(size) {
        if (size >= oneGB) return(paste(round(size/oneGB,2), "GB"))
        else if (size >= oneMB) return(paste(round(size/oneMB,2), "MB"))
        else if (size >= oneKB) return(paste(round(size/oneKB,2), "kB"))
        else return(paste(size, "bytes"))
      })

  memListing <- data.frame(objectName=names(memListing),memorySize=memListing,row.names=NULL)

  if (sort=="alphabetical") memListing <- memListing[order(memListing$objectName,decreasing=decreasing),] 
  else memListing <- memListing[order(memoryUse,decreasing=decreasing),] #will run if sort not specified or "size"

  if(!missing(limit)) memListing <- memListing[1:limit,]

  print(memListing, row.names=FALSE)
  return(invisible(memListing))
}

Saturday, March 8, 2014

Using Options in R

To list all options in R:

as.matrix(.Options[1:length(.Options)])
 > as.matrix(.Options[1:length(.Options)])
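Beyond listing everything, a few related base R calls are handy. This is a minimal sketch of reading one option, changing it temporarily, and restoring it:

```r
# options() with no arguments returns every option as a named list.
opts <- options()
head(names(opts))

# getOption() reads a single option by name.
getOption("digits")

# Setting an option returns the previous value, which makes restoring easy.
old <- options(digits = 10)
print(pi)       # printed with up to 10 significant digits
options(old)    # put the previous setting back
```

Saving the return value of options() and passing it back is the usual way to avoid leaving a session with changed settings.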