R, Julia, SQL, Octave and others: Personal notes on data analysis, computation, data access most especially for querying voter history, Census, PDC, and other election data. Reader is advised to just paste the code text into Notepad++.
Friday, December 26, 2014
Understanding Voter Updates
Some of the Age and Precinct charting code updated 12:55 PM 12/29/201 -RMF
Political piece will be here. Output is differentiated from the R (3.1.2) code by red text.
Sunday, December 14, 2014
Analysis of 2012 - 2014 General Election Matchbacks: Part II
Analysis of 2012 - 2014 General Election Matchbacks: Part II By Age. Political piece here. The data for this comes from Part I.
Tuesday, December 9, 2014
Analysis of 2012 - 2014 General Election Matchbacks: Part I
Updated 5:00 PM 12/14/2014 -RMF. Political piece under construction.
Friday, December 5, 2014
Thursday, November 6, 2014
Votes left on the Table
This R code is designed to accumulate "MatchBacks" from an election and return results. I then bind a recent voterdb to see how many and what percentage (on a precinct level) of votes were "left on the table".
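A minimal sketch of that idea in base R, not the code from the post itself. The data frames `voterdb` and `matchbacks`, their column names, and every value in them are hypothetical stand-ins for the real files.

```r
# Toy stand-ins for the real files; all names and values are hypothetical.
voterdb <- data.frame(
  regnum     = 1:8,
  precinctid = c(101, 101, 101, 102, 102, 102, 102, 103)
)
matchbacks <- data.frame(
  regnum        = c(1L, 2L, 4L, 5L, 6L),
  ballotcounted = c(1, 1, 1, 0, 1)
)

# Bind the matchbacks to the voter list; voters with no returned ballot get NA.
merged <- merge(voterdb, matchbacks, by = "regnum", all.x = TRUE)
merged$voted <- !is.na(merged$ballotcounted) & merged$ballotcounted == 1

# Per precinct: registered voters, counted ballots, and votes "left on the table".
registered <- tapply(merged$voted, merged$precinctid, length)
counted    <- tapply(merged$voted, merged$precinctid, sum)
left <- data.frame(
  precinctid = as.integer(names(registered)),
  registered = as.vector(registered),
  counted    = as.vector(counted)
)
left$left_on_table <- left$registered - left$counted
left$pct_left      <- round(100 * left$left_on_table / left$registered, 1)
print(left)
```

The left join (`all.x = TRUE`) is the important choice here: it keeps voters who never returned a ballot, and those are exactly the votes being counted as "left on the table".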
Thursday, October 30, 2014
Three ballotcounted dispositions mapped per precinct for two elections plus their difference
precinctid | ge2013blank | ge2012blank | bdiff | ge2013count | ge2012count | cdiff | ge2013notcount | ge2012notcount | ncdiff
801 | 38 | 63 | -25 | 311 | 504 | -193 | 339 | 121 | 218
701 | 31 | 56 | -25 | 387 | 578 | -191 | 340 | 124 | 216
611 | 32 | 54 | -22 | 343 | 468 | -125 | 206 | 59 | 147
610 | 42 | 72 | -30 | 567 | 734 | -167 | 267 | 70 | 197
609 | 44 | 82 | -38 | 530 | 740 | -210 | 335 | 87 | 248
608 | 35 | 66 | -31 | 413 | 570 | -157 | 270 | 82 | 188
607 | 28 | 49 | -21 | 429 | 555 | -126 | 218 | 71 | 147
606 | 46 | 68 | -22 | 449 | 637 | -188 | 312 | 102 | 210
605 | 21 | 36 | -15 | 289 | 386 | -97 | 158 | 46 | 112
604 | 45 | 73 | -28 | 389 | 578 | -189 | 298 | 81 | 217
603 | 50 | 77 | -27 | 341 | 529 | -188 | 298 | 83 | 215
...
This Postgres SQL (9.5) takes the three ballotcounted dispositions (null, 0, 1), counts each per precinct for two elections, and shows the difference for each disposition between the two elections. It ran to over 140 lines of SQL, so there must be a simpler, better optimized way to get the same result with OVER or WITH clauses. My joins are lacking in sophistication. However, even though my code is wordy and unreadable, there is no noticeable delay when it runs. Political piece here.
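For comparison, here is a rough sketch of the same tabulation in R rather than SQL. It is not the 140-line query. The data frames `ge2012` and `ge2013` and their values are invented, and the sketch assumes both elections cover the same set of precincts.

```r
# Hypothetical toy extracts: one row per voter, ballotcounted is NA, 0, or 1.
ge2012 <- data.frame(
  precinctid    = c(801, 801, 801, 701, 701, 701),
  ballotcounted = c(NA, 1, 1, 0, 1, NA)
)
ge2013 <- data.frame(
  precinctid    = c(801, 801, 801, 701, 701, 701),
  ballotcounted = c(NA, NA, 1, 0, 0, 1)
)

# Count the three dispositions per precinct. The factor levels pin the column
# order so both elections give tables of identical shape.
dispositions <- function(db) {
  label <- ifelse(is.na(db$ballotcounted), "blank",
                  ifelse(db$ballotcounted == 1, "count", "notcount"))
  label <- factor(label, levels = c("blank", "count", "notcount"))
  table(precinctid = db$precinctid, disposition = label)
}

t2012  <- dispositions(ge2012)
t2013  <- dispositions(ge2013)
change <- t2013 - t2012   # valid only because both tables share the same precincts

print(t2013)
print(t2012)
print(change)
```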
Thursday, October 23, 2014
Wednesday, October 22, 2014
Compare Precinct Voting Lists over time
precinctid | cnt_db090514 | cnt_db091914 | cnt_db100214 | cnt_db102014 | difference | variance |
101 | 939 | 907 | 911 | 911 | 28 | 1.0307354555 |
102 | 632 | 622 | 624 | 627 | 5 | 1.0079744817 |
103 | 689 | 680 | 681 | 682 | 7 | 1.0102639296 |
104 | 428 | 426 | 427 | 427 | 1 | 1.0023419204 |
105 | 414 | 404 | 408 | 408 | 6 | 1.0147058824 |
106 | 778 | 768 | 772 | 774 | 4 | 1.0051679587 |
107 | 946 | 914 | 920 | 922 | 24 | 1.0260303688 |
108 | 1082 | 1054 | 1056 | 1063 | 19 | 1.0178739417 |
110 | 687 | 683 | 685 | 688 | -1 | 0.9985465116 |
...
Thursday, October 16, 2014
Friday, October 10, 2014
Monday, October 6, 2014
SQL for flushing Inactive Voters
count | ballotcounted_1 | ballotcounted_2
------+-----------------+-----------------
634 | 0 | 1
506 | 0 | 0
333 | 1 | 1
211 | 0 |
153 | |
51 | 1 |
22 | 1 | 0
8 | | 1
7 | | 0
~14K inactive in Whatcom County
~8K inactive in 42nd LD
~2K marked inactive since Certification of Primary e.g '08/20/2014'
That filter gives us these targets listed by priority
- 333 of that remaining ~2K who voted in both of the last General Elections.
- 73 of that remaining ~2K who voted only in the last General Election.
- 642 of that remaining ~2K who voted only in the General Election before last.
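The three priority counts can be rebuilt from the table above. This is a small R check, assuming that ballotcounted_1 is the most recent General Election and ballotcounted_2 the one before it (the totals only work out under that reading), and that blank cells are nulls.

```r
# The count table above, with blank cells entered as NA.
# Assumption: ballotcounted_1 = last General Election,
#             ballotcounted_2 = the General Election before last.
tab <- data.frame(
  count           = c(634, 506, 333, 211, 153, 51, 22, 8, 7),
  ballotcounted_1 = c(0, 0, 1, 0, NA, 1, 1, NA, NA),
  ballotcounted_2 = c(1, 0, 1, NA, NA, NA, 0, 1, 0)
)

# A ballot counts as "voted" only when the flag is exactly 1 (not 0, not null).
voted1 <- !is.na(tab$ballotcounted_1) & tab$ballotcounted_1 == 1
voted2 <- !is.na(tab$ballotcounted_2) & tab$ballotcounted_2 == 1

both        <- sum(tab$count[voted1 & voted2])    # 333
last_only   <- sum(tab$count[voted1 & !voted2])   # 51 + 22 = 73
before_only <- sum(tab$count[!voted1 & voted2])   # 634 + 8 = 642
total       <- sum(tab$count)                     # 1925, the "~2K"

print(c(both = both, last_only = last_only,
        before_only = before_only, total = total))
```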
Political piece is here.
Thursday, October 2, 2014
Thursday, September 25, 2014
avg(age::int4) OVER (partition by precinctid)
Political piece is here.
precinctid | activeaverage | inactiveaverage | diffaverage
------------+---------------+-----------------+-------------
268 | 53.6 | 36.1 | 17.5
203 | 52.9 | 36.4 | 16.5
139 | 45.9 | 30.6 | 15.4
202 | 60.1 | 45.3 | 14.8
167 | 53.9 | 39.2 | 14.7
103 | 58.9 | 44.5 | 14.4
205 | 50.7 | 36.4 | 14.3
201 | 52.5 | 39.2 | 13.3
144 | 49.6 | 36.6 | 13.1
131 | 52.2 | 39.5 | 12.7
....
There are several new Postgres moves here for me. I expect my joins to become more sophisticated in the future. Postgres syntax:
avg(age::int4) OVER (partition by precinctid)
computes a statistic for each level of a grouping column, similar to xtabs (cross tabulation) in R. Quite frankly, I think R does this with greater fluidity and less code.
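To illustrate the R side of that comparison, here is a minimal base-R sketch. The data frame `voters` and its values are made up. I use ave() and tapply() here instead of xtabs because they return group means directly.

```r
# Hypothetical voter extract; ages and status codes are made up.
voters <- data.frame(
  precinctid = c(268, 268, 268, 268, 203, 203, 203),
  statuscode = c("A", "A", "I", "I", "A", "I", "I"),
  age        = c(60, 50, 30, 40, 52, 36, 38)
)

# ave() is the closest base-R analogue of avg(...) OVER (PARTITION BY ...):
# every row keeps its place and receives the mean of its own group.
voters$precinct_avg <- ave(voters$age, voters$precinctid)

# Active vs. inactive averages per precinct, and their difference.
avg <- tapply(voters$age, list(voters$precinctid, voters$statuscode), mean)
result <- data.frame(
  precinctid      = as.integer(rownames(avg)),
  activeaverage   = avg[, "A"],
  inactiveaverage = avg[, "I"],
  row.names       = NULL
)
result$diffaverage <- result$activeaverage - result$inactiveaverage
result <- result[order(-result$diffaverage), ]   # largest gap first
print(result)
```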
Tuesday, September 23, 2014
SQL for comparing Active vs Inactive voters over time
/* 6:20 PM 9/23/2014 -RMF Political piece for this code is here.
I create queries to dump the various (voter history) databases I wish to compare into Postgres 'views'. A 'view' in Postgres is essentially a stored query that can be read like a table. There's more to it than that but...:
Select * from voterdb where statuscode = 'I'; -- 'I' for 'inactive'
The unique voter registration number isn't quite reliable as a primary key over time (in my humble opinion) so I use a 'unique tuple' (ARRAY[lastname,firstname,middlename]) of my own invention.
Something like ARRAY[lastname,firstname,middlename,registrationnumber::TEXT] would be even more unique. You also need the 119 precincts of LD 42.
*/
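The composite-key idea from the comment above can be sketched in R as well. The snapshots `db_sep` and `db_oct`, the helper `make_key()`, and all names in them are invented for illustration. The "|" separator is my own choice and is not part of the Postgres ARRAY syntax.

```r
# Two hypothetical snapshots of inactive voters; every name is invented.
db_sep <- data.frame(
  lastname   = c("Doe", "Roe", "Poe"),
  firstname  = c("Jane", "Rick", "Pat"),
  middlename = c("A", "", "L"),
  registrationnumber = c(1001, 1002, 1003),
  stringsAsFactors = FALSE
)
db_oct <- data.frame(
  lastname   = c("Doe", "Poe", "Moe"),
  firstname  = c("Jane", "Pat", "Max"),
  middlename = c("A", "L", "Q"),
  registrationnumber = c(1001, 2003, 1004),
  stringsAsFactors = FALSE
)

# Build one key string per voter, optionally including the registration number.
make_key <- function(db, with_regnum = FALSE) {
  parts <- list(db$lastname, db$firstname, db$middlename)
  if (with_regnum) parts <- c(parts, list(as.character(db$registrationnumber)))
  do.call(paste, c(parts, sep = "|"))
}

# Name-only key: who left the inactive list, and who joined it?
dropped <- setdiff(make_key(db_sep), make_key(db_oct))
added   <- setdiff(make_key(db_oct), make_key(db_sep))

# Stricter key: a changed registration number now counts as a different voter.
dropped_strict <- setdiff(make_key(db_sep, TRUE), make_key(db_oct, TRUE))

print(dropped)
print(added)
print(dropped_strict)
```

The trade-off is visible in the toy data: the name-only key treats "Poe" as the same person even though the registration number changed, while the stricter key does not.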
Monday, September 15, 2014
SQL to query active vs. inactive voters, primary vs. general election participation
Postgres SQL code to query a Whatcom County Voter database to compare previous and current elections for active vs. inactive voters, primary vs. general election participation via ballotcounted and precinctid fields. Political piece is here.
Saturday, September 6, 2014
Code to Parse PDC (financial disclosure) data for the State of WA
Friday, August 22, 2014
Tuesday, August 19, 2014
42nd District By The Numbers
Political piece is here. My attempt to develop a comprehensive suite to look at all aspects of a voter database.
Wednesday, August 13, 2014
City,County,Census (CVAP) data from ESRI shp files in R 3.1.0
Political piece is here. This post is under construction. I am using R to read ESRI shp files with City, County, Precinct, and CVAP (Census) data.
Wednesday, August 6, 2014
More Matchback Code
Political piece for this code is here. "Match backs" are the database rendition of the first part of the Vote By Mail process in Whatcom County. Before your ballots are run through a Sequoia 400C Optical Scanner, the outer envelope is processed: verification and validation procedures are applied to the incoming ballot. A number of interesting fields are returned:
Friday, August 1, 2014
Monday, July 28, 2014
Sunday, July 20, 2014
Monday, July 14, 2014
Friday, June 13, 2014
PDC Database : Frequency vs. Donation size in the 42nd LD to date
The political piece for this code is here. This R (3.1.0) code helped me dig into the PDC database to understand campaign financing for the 42nd LD in WA. The first example gives a jpeg_create() I used for automated printing, but I had some trouble applying it to all my charts configured with par(mfrow). Hadley Wickham's Advanced R text on data structures gave me the stringsAsFactors = FALSE setting for read.csv. I have an intuition that could be a useful flag. A QQ plot seems to portray very well certain types of data that would otherwise be complicated to understand. One such case is comparing the frequency vs. donation size of both in-state and out-of-state donations in a specific political district.
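A small sketch of those two ideas together, using an invented PDC-style extract. The column names `contributor_state` and `amount` and all the amounts are assumptions, not the real PDC schema. Note that since R 4.0.0 `stringsAsFactors = FALSE` is already the default for read.csv, so the flag matters mainly on older versions such as the 3.1.0 used here.

```r
# Hypothetical PDC-style extract; column names and amounts are invented.
csv_text <- "contributor_state,amount
WA,25
WA,50
WA,100
WA,250
WA,50
OR,500
CA,950
ID,100
"

# Keep text columns as plain character vectors instead of factors.
pdc <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(class(pdc$contributor_state))

in_state  <- pdc$amount[pdc$contributor_state == "WA"]
out_state <- pdc$amount[pdc$contributor_state != "WA"]

pdf(NULL)   # null device so the sketch runs anywhere; drop this line to see the plot
qq <- qqplot(in_state, out_state,
             xlab = "In-state donation size",
             ylab = "Out-of-state donation size")
abline(0, 1, lty = 2)   # points far from this line mean the distributions differ
invisible(dev.off())
```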
Saturday, June 7, 2014
05.20.2014 Whatcom County Voter General Election Database: The Precincts....
The political piece for this code is here. I received quite a bit of practice in the skill of sending multivariate graphs to file for this piece. Election databases nearly scream for raster and GIS files. Something to learn next....
Wednesday, June 4, 2014
Boomers vs. Millennials ...
Political post for this code is here.
Hadley Wickham has said that one of the problems with R is that it isn't very "programmerly". In working through R loops in complex functions, I have discovered this. Here is what I want to do:
for (i in 1:5) {VDB_M <- as.data.frame(table(VDB_N%i%$BirthDate)) ... }
where %i% is the set of variables I want to loop through. If there is a method for doing that, I have not found it yet. Clearly a Python or PowerShell wrapper might be the answer. But it is a big language and maybe applying functions in R for the data analyst needs more research on my part.
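For what it is worth, base R can do this without a wrapper. The sketch below assumes the data frames are named VDB_N1, VDB_N2, ... in the workspace. The toy contents are invented. get() and mget() look objects up by a name built as a string.

```r
# Toy stand-ins for VDB_N1 ... VDB_N3; the real objects are voter data frames.
VDB_N1 <- data.frame(BirthDate = c("1950", "1950", "1985"))
VDB_N2 <- data.frame(BirthDate = c("1948", "1990", "1990"))
VDB_N3 <- data.frame(BirthDate = c("1962", "1962", "1962"))

# Option 1: build each name with paste0() and fetch the object with get().
for (i in 1:3) {
  VDB_M <- as.data.frame(table(get(paste0("VDB_N", i))$BirthDate))
  print(VDB_M)
}

# Option 2 (usually tidier): gather the data frames into a named list once,
# then apply the same function to each element.
vdb_list  <- mget(paste0("VDB_N", 1:3))
freq_list <- lapply(vdb_list, function(db) as.data.frame(table(db$BirthDate)))
print(freq_list$VDB_N2)
```

Keeping the data frames in a list from the start avoids the name lookup altogether, which is probably the more "programmerly" habit.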
Saturday, May 31, 2014
Wednesday, May 28, 2014
Whatcom County Voter Database 05.20.2014
The corresponding political piece for this article is here.
The active voter Whatcom County database as of 05.20.2014 is (row) 126283 * (column) 36 = (ncell) 4546188. On an 8 GB i5 laptop running 64-bit R 3.1 this isn't much of a problem. There are a number of different approaches/packages I have found for using R with large data:
- data.table
- RPostgres
- plyr
- sqldf
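A quick back-of-the-envelope check on why a table of this size is comfortable in memory. It assumes the larger figure counts voter records and treats every cell as an 8-byte double, which is only a rough floor because the real columns are mostly text.

```r
# Dimensions quoted above; which figure is rows vs. columns is an assumption.
n_records <- 126283
n_fields  <- 36

cells <- n_records * n_fields
print(cells)              # 4546188

# Rough floor on memory: 8 bytes per cell if everything were numeric.
bytes <- cells * 8
mb    <- bytes / 1024^2
print(round(mb, 1))       # roughly 35 MB, small next to 8 GB of RAM
```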
Monday, May 19, 2014
Saturday, May 17, 2014
Using par,mfrow, cex, pch, rgb , col
I needed complicated graphing to find a way to compare some Geiger-Mueller testing of air filters to understand radioactivity in our local community. I needed some visual way to compare/contrast "background" or NORML radiation with multiple samples. The 'par' commands really help here (mfrow, pch, cex) . The data I am plotting with looks like this:
s1.MicroRads_HR s2.MicroRads_HR s3.MicroRads_HR s4.MicroRads_HR
1 8.47 8.47 8.47 0.00
2 0.00 8.47 0.00 0.00
3 8.47 16.95 0.00 8.47
4 0.00 8.47 8.47 0.00
5 16.95 8.47 16.95 8.47
6 16.95 8.47 8.47 8.47
..
Using the rgb function as input to the 'col' parameter allows me to create custom colors with the first three arguments.
col=rgb(.075,.075,.075,0.25)
col=rgb(.6,.4,.9,0.4)
The fourth argument for 'rgb' is an alpha parameter ('transparency'). By varying the transparency and size ('cex') of the plotting symbol ('pch'), I was able to nest a transparent and slightly smaller series of circles from the control test inside the sample results.
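A minimal sketch of that nesting, using the first two columns of the sample rows shown above. Treating s1 as the control and s2 as the sample is an assumption made for illustration, as are the panel layout and the cex values.

```r
# First six readings of s1 and s2 from the rows above.
# Assumption: s1 is the control ("background") and s2 is a sample.
control <- c(8.47, 0.00, 8.47, 0.00, 16.95, 16.95)
sample1 <- c(8.47, 8.47, 16.95, 8.47, 8.47, 8.47)

sample_col  <- rgb(.6, .4, .9, 0.4)          # purple, 40% opaque
control_col <- rgb(.075, .075, .075, 0.25)   # near-black, 25% opaque

pdf(NULL)                      # null device; drop this line to see the plot
op <- par(mfrow = c(1, 2))     # two panels side by side

# Panel 1: large sample circles with smaller control circles nested inside.
plot(sample1, pch = 19, cex = 3, col = sample_col,
     ylim = c(0, 20), xlab = "Reading", ylab = "MicroRads/HR",
     main = "Sample vs. control")
points(control, pch = 19, cex = 2, col = control_col)

# Panel 2: the control on its own for reference.
plot(control, pch = 19, cex = 2, col = control_col,
     ylim = c(0, 20), xlab = "Reading", ylab = "MicroRads/HR",
     main = "Control only")

par(op)                        # restore the previous par settings
invisible(dev.off())
```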
Saturday, May 10, 2014
Curve, Constants, lockBinding,locator()
# Show how to choose floating point length, create the constant e and bind it to the global environment
# Use curve() function to create a sine wave plot with constants pi and e
# use locator() to select arbitrary points which can be plotted again
options(digits=22)
e <- exp(1)
lockBinding("e", globalenv())
e
# [1] 2.718281828459
curve(sin, e^pi, -pi^-e) # produces plot
z <- as.data.frame(locator()) # select points on plot then right click and choose 'stop'
# locator() returns a list of x and y; as.data.frame() turns it into x,y columns
z
#                        x                      y
# 1 1.906552221196785357193 -0.4129490482535249085139
# 2 4.856457177229445143496 -0.3680698252087759581030
# 3 7.107700433149108043551 -0.3643298899550468927799
# ...
plot(z)
Thursday, April 17, 2014
The Rain Over Hazel: raster, adehabitat,zoom
The R Code for this exercise is far below.
Inspired by a Dan McShane post, I looked at an Atlas 2 (NOAA) GIS data set covering Precipitation Frequency in WA State from 1897 - 1970. I am going to have to dig a little deeper at http://www.climate.gov/datasearch/ to get the later data. The Pacific Northwest region does not have updated data for precipitation frequency in this particular format yet. NOAA admits that such updated reports for select regions are overdue. Western WA receives tremendous amounts of rainfall in comparison to other areas of the country. Most of us who have lived here more than ten years have surmised that the rainy season seems to be increasing in length with more 'warm' rain coming from the South ("Chinooks"). Plenty of hard, cold rain still comes from the North as well.
The first chart below shows the drenching the Olympic Peninsula and Olympic mountains receive. You will notice the distinctive "rain shadow" that appears north and east of the Olympics. Don't let this fool you. You can get caught in a serious rain storm driving through Sequim or Port Angeles or any of the small cities on the Peninsula long before you decide to spend the day hiking Hurricane Hill in the Olympics. All of western Washington and especially the Cascades receive substantial amounts of moisture. In Whatcom County, Mt. Baker holds some of the last of the glaciated mountains in the contiguous 48 states. Sometimes Artist Point is not cleared of snow until July 4th. Here in Western WA, we don't waste our brief but beautiful two months of summer, nor any sunny days that flirt in between storms before then. But we are not afraid of the rain, wind, or snow either.
The two charts below show that the area of the Hazel mudslide appears to (historically) have existed at the bottom of a "wet pocket". The approximate location for the Hazel headscarp is 48.2848171, -121.8494478. (For a picture of the headscarp, please see this outstanding March 26th photograph from Earth Fix!) The slopes above Hazel appear to have been particularly rainy. Indeed, it almost appears as if Hazel herself was the southern gatekeeper of this rainy pocket. This data has been developed in conjunction with some algorithms, but we can take it at face value for now.
Thursday, March 20, 2014
Function for using RM80 to detect/compare CSV data
RM80 <- function() {
require(plyr)
# RMF Media/ RMF Network Security 7:00 PM 3/20/2014. Tested on R 3.0.3
# Takes three CSV samples (One Control and Two Samples) from Aware Electronics RM-80 GM Counter.
# Configured to read data from "1 TBU per line" for any Time Base Unit
# Samples must have headers replaced with "Time" and "MicroRads_HR" like this:
# Time MicroRads_HR
#1 41715.85 10.17
#2 41715.85 15.25
# ...
Sunday, March 16, 2014
Hallquist script to show memory use...
A really nice script to show memory use of objects by Michael Hallquist :
showMemoryUse <- function(sort="size", decreasing=FALSE, limit) {
  objectList <- ls(parent.frame())
  oneKB <- 1024
  oneMB <- 1048576
  oneGB <- 1073741824
  memoryUse <- sapply(objectList, function(x) as.numeric(object.size(eval(parse(text=x)))))
  memListing <- sapply(memoryUse, function(size) {
    if (size >= oneGB) return(paste(round(size/oneGB,2), "GB"))
    else if (size >= oneMB) return(paste(round(size/oneMB,2), "MB"))
    else if (size >= oneKB) return(paste(round(size/oneKB,2), "kB"))
    else return(paste(size, "bytes"))
  })
  memListing <- data.frame(objectName=names(memListing),memorySize=memListing,row.names=NULL)
  if (sort=="alphabetical") memListing <- memListing[order(memListing$objectName,decreasing=decreasing),]
  else memListing <- memListing[order(memoryUse,decreasing=decreasing),] #will run if sort not specified or "size"
  if(!missing(limit)) memListing <- memListing[1:limit,]
  print(memListing, row.names=FALSE)
  return(invisible(memListing))
}
Saturday, March 8, 2014
Using Options in R
To list all options in R:
as.matrix(.Options[1:length(.Options)])
> as.matrix(.Options[1:length(.Options)])
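Beyond listing them, options can be read and changed one at a time. A short sketch using only base R calls. The choice of `digits` as the example option is mine.

```r
# Read a single option by name.
print(getOption("digits"))

# options() returns the previous values invisibly, so they can be restored later.
old <- options(digits = 4)
print(pi)                  # 3.142

# Put things back the way they were.
options(old)
print(getOption("digits"))

# Just the option names, without their values.
print(head(names(options())))
```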