Wednesday, May 28, 2014

Whatcom County Voter Database 05.20.2014





The corresponding political piece for this article is here.
The active voter database for Whatcom County as of 05.20.2014 is 126283 rows * 36 columns = 4546188 cells (ncell). On an 8 GB i5 laptop running 64-bit R 3.1 this isn't much of a problem. There are a number of different approaches/packages I have found for using R with large data:
  • data.table
  • RPostgres
  • plyr
  • sqldf
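
Before reaching for any of these packages, it helps to check how big the table really is. The sketch below is a minimal base R illustration on synthetic data, not the actual voter file: the `voters` data frame and its `precinct` and `status` columns are hypothetical stand-ins. It repeats the rows-times-columns arithmetic and builds a frequency table similar to what the plyr count function returns.

```r
# Synthetic stand-in for the voter table; column names are hypothetical.
set.seed(1)
n <- 1000
voters <- data.frame(
  precinct = sample(101:110, n, replace = TRUE),
  status   = sample(c("A", "I"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Cell count is rows times columns.
ncell <- prod(dim(voters))

# The same arithmetic for the dimensions quoted in the post.
post_ncell <- 126283 * 36

# Base R frequency count per precinct;
# plyr::count(voters, "precinct") produces a similar two-column table.
precinct_counts <- as.data.frame(table(precinct = voters$precinct),
                                 responseName = "freq")

# Approximate memory footprint of the object.
print(format(object.size(voters), units = "Kb"))
```

On a table of a few million cells, all of these calls should return quickly on a laptop like the one described.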
For my laptop, ncell < 5M is handled well enough by the base stats and graphics packages, although I really like the plyr count function. Both RPostgres and data.table manipulate the data faster, at the cost of some added complexity but with increased functionality. Being able to keep multiple voter databases in Postgres, perform complex SQL queries from the RPostgres DBI interface, and display the results/graphs in R is probably the best bet. I will discuss this in another post. For the graphs above, I rely heavily on awkward and nearly unreadable subsetting and concatenation, as in this line:

Saturday, May 17, 2014

Using par, mfrow, cex, pch, rgb, col

I needed some fairly complicated graphing to compare Geiger-Mueller tests of air filters, as a way to understand radioactivity in our local community. I needed a visual way to compare/contrast "background" or NORML radiation with multiple samples. The 'par' settings really help here (mfrow, pch, cex). The data I am plotting looks like this:


  s1.MicroRads_HR s2.MicroRads_HR s3.MicroRads_HR s4.MicroRads_HR
1            8.47            8.47            8.47            0.00
2            0.00            8.47            0.00            0.00
3            8.47           16.95            0.00            8.47
4            0.00            8.47            8.47            0.00
5           16.95            8.47           16.95            8.47
6           16.95            8.47            8.47            8.47

..

Using the rgb function as input to the 'col' parameter allows me to create custom colors with the first three arguments.

col=rgb(.075,.075,.075,0.25)
col=rgb(.6,.4,.9,0.4)

The fourth argument for 'rgb' is an alpha parameter ('transparency'). By varying the transparency and the size ('cex') of the plotting symbol ('pch'), I was able to nest a transparent and slightly smaller series of circles from the control test inside the sample results.
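
A rough sketch of that nesting, under stated assumptions: the readings are randomly generated from the three values visible in the table above, the `control` vector is a hypothetical background run, and which of the two colors goes with the control is a guess. Only the column names and the two `rgb` calls come from the post.

```r
set.seed(42)
levels_urh <- c(0, 8.47, 16.95)  # values seen in the table above
n <- 60

# Synthetic sample readings, one column per sample.
samples <- data.frame(
  s1.MicroRads_HR = sample(levels_urh, n, replace = TRUE),
  s2.MicroRads_HR = sample(levels_urh, n, replace = TRUE),
  s3.MicroRads_HR = sample(levels_urh, n, replace = TRUE),
  s4.MicroRads_HR = sample(levels_urh, n, replace = TRUE)
)

# Hypothetical control ("background") run.
control <- sample(levels_urh, n, replace = TRUE, prob = c(0.6, 0.3, 0.1))

sample_col  <- rgb(0.6, 0.4, 0.9, 0.4)          # r, g, b, alpha
control_col <- rgb(0.075, 0.075, 0.075, 0.25)

# 2 x 2 grid of panels; par() returns the old settings so they can be restored.
old_par <- par(mfrow = c(2, 2))
for (name in names(samples)) {
  # Large transparent circles for the sample.
  plot(samples[[name]], pch = 19, cex = 2.5, col = sample_col,
       main = name, xlab = "Reading", ylab = "MicroRads/hr", ylim = c(0, 20))
  # Smaller transparent circles for the control, drawn on top.
  points(control, pch = 19, cex = 1.5, col = control_col)
}
par(old_par)
```

Where a sample and the control agree, the smaller circle sits inside the larger one; where they differ, the two circles land on different rows of the panel.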


Saturday, May 10, 2014

Curve, Constants, lockBinding, locator()


# Show how to set the number of printed digits, create the constant e, and lock its binding in the global environment
# Use the curve() function to plot a sine wave between limits built from the constants pi and e
# Use locator() to select arbitrary points, which can then be plotted again

options(digits=22)
e <- exp(1)
lockBinding("e", globalenv())
e
# [1] 2.718281828459

curve(sin, e^pi, -pi^-e)  # produces plot
z <- as.data.frame(locator()) # click points on the plot, then right click and choose 'stop'

# locator() returns a list with x and y components; as.data.frame() turns it into x,y columns

z
#                          x                         y
# 1  1.906552221196785357193 -0.4129490482535249085139
# 2  4.856457177229445143496 -0.3680698252087759581030
# 3  7.107700433149108043551 -0.3643298899550468927799
# ...

plot(z)
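
As a small sketch of what lockBinding() buys you: once a binding is locked, an attempt to reassign it raises an error instead of silently overwriting the constant. The example uses a scratch environment (`env`, a name made up for this illustration) rather than the global environment, so it can be run without touching your workspace.

```r
# Scratch environment so the demo does not touch globalenv().
env <- new.env()
assign("e", exp(1), envir = env)
lockBinding("e", env)

# Try to overwrite the locked constant; the error is caught and reported.
result <- tryCatch({
  assign("e", 3, envir = env)
  "overwritten"
}, error = function(err) "blocked")

locked_before <- bindingIsLocked("e", env)
value_kept <- get("e", envir = env)

# unlockBinding() makes the name assignable again.
unlockBinding("e", env)
assign("e", 3, envir = env)
locked_after <- bindingIsLocked("e", env)
```

Here `result` ends up as "blocked" and `value_kept` is still exp(1); only after unlockBinding() does the assignment go through.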