Wednesday, May 28, 2014

Whatcom County Voter Database 05.20.2014

The corresponding political piece for this article is here.
The active voter Whatcom County database as of 05.20.2014 is 126,283 rows by 36 columns, or 4,546,188 cells. On an 8 GB i5 laptop running 64-bit R 3.1 this isn't much of a problem. There are a number of different approaches/packages I have found for using R with large data:
  • data.table
  • RPostgres
  • plyr
  • sqldf
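As a rough sketch of what counting looks like at this scale, the snippet below builds a small made-up table and counts rows per group with base R, then repeats the count with plyr if it happens to be installed. The data frame `voters` and its columns `precinct` and `status` are hypothetical stand-ins, not the real voter file layout.

```r
# Hypothetical stand-in for the voter table: column names and values are made up
set.seed(1)
voters <- data.frame(
  precinct = sample(c("101", "102", "103"), 1000, replace = TRUE),
  status   = sample(c("A", "I"), 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

dim(voters)        # number of rows, number of columns
prod(dim(voters))  # total number of cells (rows * columns)

# Base R: count rows per precinct and return the result as a data frame
base_counts <- as.data.frame(table(precinct = voters$precinct),
                             responseName = "freq")

# The same count with plyr, only if the package is available
if (requireNamespace("plyr", quietly = TRUE)) {
  plyr_counts <- plyr::count(voters, "precinct")
}
```

Both versions give one row per precinct with a frequency column; which one reads better is mostly a matter of taste at this size.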
For my laptop, ncell < 5M is handled well enough by the base stats and graphics packages, although I really like the plyr count function. Both RPostgres and data.table manipulate the data faster, at the cost of some added complexity, and offer more functionality. Being able to keep multiple voter databases in Postgres, run complex SQL queries through the RPostgres DBI interface, and display the results and graphs in R is probably the best bet. I will discuss this in another post. For the graphs above, I rely heavily on awkward and nearly unreadable subsetting and concatenation, as in this line:

Saturday, May 17, 2014

Using par, mfrow, cex, pch, rgb, col

I needed some fairly complicated graphing to compare Geiger-Mueller tests of air filters, as a way to understand radioactivity in our local community. I needed a visual way to compare and contrast "background" or NORML radiation with multiple samples. The 'par' settings really help here (mfrow, pch, cex). The data I am plotting looks like this:

  s1.MicroRads_HR s2.MicroRads_HR s3.MicroRads_HR s4.MicroRads_HR
1            8.47            8.47            8.47            0.00
2            0.00            8.47            0.00            0.00
3            8.47           16.95            0.00            8.47
4            0.00            8.47            8.47            0.00
5           16.95            8.47           16.95            8.47
6           16.95            8.47            8.47            8.47


Using the rgb function as input to the 'col' parameter allows me to create custom colors: the first three arguments are the red, green, and blue intensities.
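A minimal sketch of that idea follows. The color names `pure_red` and `orange` are just illustrative variables, and the plotted values are placeholders rather than the Geiger-Mueller data.

```r
# Red, green, and blue intensities on the default 0-1 scale
pure_red <- rgb(1, 0, 0)

# The same idea on a 0-255 scale, set with maxColorValue
orange <- rgb(255, 165, 0, maxColorValue = 255)

# rgb() returns a hex color string, which any 'col' argument accepts
plot(1:10, pch = 19, cex = 2, col = orange)
points(1:10, rep(5, 10), pch = 19, cex = 2, col = pure_red)
```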


The fourth argument for 'rgb' is an alpha parameter that controls transparency (0 is fully transparent, 1 is fully opaque). By varying the transparency and the size ('cex') of the plotting symbol ('pch'), I was able to nest a transparent and slightly smaller series of circles from the control test inside the sample results.
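Here is a sketch of how such a nested plot might be put together, using the six rows shown in the table above. Treating s4 as the control series is purely an assumption for illustration, as are the colors, sizes, and axis limits.

```r
# First six readings from the table above (microrads per hour)
rads <- data.frame(
  s1 = c(8.47, 0.00, 8.47, 0.00, 16.95, 16.95),
  s2 = c(8.47, 8.47, 16.95, 8.47, 8.47, 8.47),
  s3 = c(8.47, 0.00, 0.00, 8.47, 16.95, 8.47),
  s4 = c(0.00, 0.00, 8.47, 0.00, 8.47, 8.47)
)

control <- rads$s4               # assumption: s4 plays the role of the control
samples <- c("s1", "s2", "s3")

old_par <- par(mfrow = c(1, 3))  # one row, three panels; keep the old settings
for (s in samples) {
  # Large, semi-transparent red circles for the sample
  plot(rads[[s]], pch = 19, cex = 3, col = rgb(1, 0, 0, 0.4),
       ylim = c(0, 20), xlab = "Reading", ylab = "MicroRads/hr", main = s)
  # Smaller, semi-transparent blue circles for the control, drawn on top
  points(control, pch = 19, cex = 2, col = rgb(0, 0, 1, 0.4))
}
par(old_par)                     # restore the previous layout
```

Where a sample and the control have the same reading, the smaller circle sits inside the larger one, and the overlapping transparency makes the match easy to see.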

Saturday, May 10, 2014

Curve, Constants, lockBinding, locator()

# Show how to choose the printed floating point length, create the constant e, and lock its binding in the global environment
# Use the curve() function to create a sine wave plot with the constants pi and e
# Use locator() to select arbitrary points, which can then be plotted again

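e <- exp(1)                    # create the constant e
lockBinding("e", globalenv())  # make e read-only in the global environment
print(e, digits = 13)          # choose how many significant digits to print
# [1] 2.718281828459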
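
To see what the lock actually does, here is a small sketch that repeats the steps in a scratch environment (the name `env` is hypothetical, used so the global environment stays untouched) and then tries to overwrite the locked value.

```r
env <- new.env()                  # hypothetical scratch environment
assign("e", exp(1), envir = env)
lockBinding("e", env)

# Assigning to a locked binding raises an error; catch it and report
attempt <- tryCatch({
  assign("e", 3, envir = env)
  "changed"
}, error = function(err) "locked")
attempt

locked_before <- bindingIsLocked("e", env)  # TRUE while the lock is in place
unlockBinding("e", env)                     # remove the lock again
locked_after <- bindingIsLocked("e", env)
```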

curve(sin, e^pi, -pi^-e)       # plot sin(x) between e^pi (about 23.14) and -(pi^-e) (about -0.04)
z <- as.data.frame(locator())  # click points on the plot, then right-click and choose 'Stop'

# locator() returns a list of x and y coordinates; as.data.frame() turns it into an x,y table

print(z, digits = 22)
#                         x                         y
# 1  1.906552221196785357193 -0.4129490482535249085139
# 2  4.856457177229445143496 -0.3680698252087759581030
# 3  7.107700433149108043551 -0.3643298899550468927799
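
Plotting the selected points again could look like the sketch below. Because locator() is interactive, the three points printed above are typed in by hand here (rounded to double precision), and the colors and line style are arbitrary choices.

```r
# The three points from the locator() output above, entered by hand
z <- data.frame(
  x = c(1.906552221196785, 4.856457177229445, 7.107700433149108),
  y = c(-0.4129490482535249, -0.3680698252087760, -0.3643298899550469)
)

e <- exp(1)
curve(sin, e^pi, -pi^-e)                 # redraw the sine curve
points(z$x, z$y, pch = 19, col = "red")  # mark the selected points
lines(z$x, z$y, lty = 2)                 # join them with a dashed line
```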