It’s been a while…

I haven’t posted much lately since I have been pretty busy with a lot of things.  I recently joined the Mac world with the purchase of a MacBook Air 13″.  I needed something a little more portable, since my HP laptop was a beast.  I also chose to make the move to Mac since I had heard that they were very nice to develop on, so I had to check this out.

At first I was thinking of going with the MacBook Pro 15″, which is what I see a lot of my peers at the Denver Java Users Group and Denver Open Source Users Group using.  I was really leaning toward the 15″ Pro until I heard about the brand new MacBook Air outperforming the 2010 MacBook Pros.  I thought I’d head down to the Apple Store and compare them side by side.  After an hour of playing around with it, I was sold on the Air.

So for the past three weeks I have been coming up to speed on OS X and the subtleties of how its hot-keys and navigation differ from those of Windows, where I have been for years.  I do, however, have a bit of experience with Linux, which helped tremendously.  It was also good timing that my purchase coincided with the release of Lion, so I am learning along with everyone else.

 

Posted in General | Leave a comment

Automating R Scripts on Amazon EC2

Overview:

  • How to set up R on an EC2 instance of Ubuntu 11.04 (Natty Narwhal)
  • How to set up the Apache Tomcat 6.0 web server and configure it with basic authentication so that we can view our output from R on a password-protected webpage
  • How to automate your R scripts to run as a daily cron job

Lately, my new hobby of algorithmic stock trading has necessitated running nightly R scripts which take about an hour to complete.  Most of this time is spent on single-threaded web scraping, which I could run in parallel to speed up the process, but that might flood those websites with requests and lead them to block my IP address from getting all that free data.  I’m also hesitant to run something on my home PC for fear of random Windows updates which could interrupt my program.  Another problem is random loss of connection from Comcast, so the run-at-home option was out.  I decided to turn to the cloud!

I’ve been playing around a lot with Amazon EC2 lately, and have been really happy with how powerful it is at such a reasonable cost.  It’s also nice to play around with the micro instances since Amazon has the AWS free usage tier where you can have up to 750 free hours per month of usage.  I decided that the micro instances would be the best place to run my nightly R jobs.  Since you can do pretty much anything with your instances, I thought it would be good to put a simple web server on there so I could view my results from anywhere.  This also allows me to put some basic authentication in front of my files, so as not to give away my quant strategies.

Creating the EC2 instance

Log in to the AWS management console.  I started by launching an instance of a community AMI, ami-1aad5273, a 64-bit Ubuntu 11.04 Natty EBS boot image.

For the Instance Details, you should choose an instance type of Micro, unless you have a need for it to be higher.  This will also keep you eligible for the AWS free usage tier.

For Advanced Instance Options, keep the defaults and click continue.

Next, enter in a descriptive name for this instance.

Enter in a name for your Key Pair. Press ‘Create & Download your Key Pair’.  Remember where you save this, as you will need it later.

Enter in a Security Group with ports 22 and 8080.  You will need port 22 for SSH access to the VM and you will need 8080 open for web access to Apache Tomcat.

Then press ‘Launch’ on the Review page.  Your instance will be ready to use in seconds.

Logging on to your EC2 instance

There are several ways to log on to your EC2 instance, and they differ quite a bit between environments.  Since I did this from a Windows machine, I posted instructions for how to SSH to your Amazon EC2 instance using a free tool called PuTTY.  For the AMI we are using, the login will be ‘ubuntu‘.

Installing R

Update your apt-get package list so you get the latest stable version for your OS.

sudo apt-get update

Install R using apt-get.  I like using the -y argument since it does not prompt you if you are sure you want to install.  At the time of writing this, apt-get was using R version 2.12.1 (2010-12-16).

sudo apt-get -y install r-base

At this point you need to see if R runs correctly by typing ‘R’ at the prompt.

To exit from R on the Linux command line, press Ctrl+D or type q().

Here is a good link about R on Ubuntu from UCSB if you would like more information.

Installing and configuring Apache Tomcat

Install Tomcat using apt-get.  As before, the -y argument skips the confirmation prompt.  At the time of writing this, apt-get installed Tomcat 6.0.28-10.

sudo apt-get -y install tomcat6

Now you need to configure Tomcat to allow browsing of the directories on the server.  You need to first edit the default servlet in the /etc/tomcat6/web.xml file to have ‘listings’ be ‘true’.

sudo vi /etc/tomcat6/web.xml
The default servlet is nested in the web.xml file at about line number 104:

    <servlet>
        <servlet-name>default</servlet-name>
        <servlet-class>org.apache.catalina.servlets.DefaultServlet</servlet-class>
        <init-param>
            <param-name>debug</param-name>
            <param-value>0</param-value>
        </init-param>
        <init-param>
            <param-name>listings</param-name>
            <param-value>true</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>

Now create a directory that will be viewable in a web browser where you will save output from your R scripts.

cd /var/lib/tomcat6/webapps/ROOT
sudo mkdir testdir

Now check in a web browser to see if the directory is viewable.  You should see a directory listing and not a 404 Not Found error.  Be sure you are using port 8080:  http://IPADDRESS:8080/testdir/
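
If the listing does not show up, one thing worth checking is whether the ‘listings’ edit was actually saved.  The snippet below is only a minimal sketch: listings_enabled is a made-up helper name (not part of Tomcat), and it assumes the param-value line directly follows the param-name line, as in the layout shown above.

```shell
#!/bin/sh
# Sketch: report whether the 'listings' init-param is set to true in a web.xml.
# listings_enabled is a hypothetical helper name, not something Tomcat provides.
listings_enabled() {
    # Take the line after the listings param-name and look for a 'true' value.
    grep -A1 '<param-name>listings</param-name>' "$1" \
        | grep -q '<param-value>true</param-value>'
}

# Example call (not run here), using the path from this post:
# listings_enabled /etc/tomcat6/web.xml && echo "listings on" || echo "listings off"
```

Remember that Tomcat only picks up web.xml changes after a restart.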

Now we need to set up the basic authentication.  We will need to edit the /etc/tomcat6/web.xml file again.  This time we will be adding a security-constraint and a login-config after the default servlet in the file.

sudo vi /etc/tomcat6/web.xml
    <servlet>
        <servlet-name>default</servlet-name>
        <servlet-class>org.apache.catalina.servlets.DefaultServlet</servlet-class>
        <init-param>
            <param-name>debug</param-name>
            <param-value>0</param-value>
        </init-param>
        <init-param>
            <param-name>listings</param-name>
            <param-value>true</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>

    <security-constraint>
        <web-resource-collection>
            <web-resource-name>R Test</web-resource-name>
            <url-pattern>/testdir/*</url-pattern>
        </web-resource-collection>
        <auth-constraint>
            <role-name>member</role-name>
        </auth-constraint>
    </security-constraint>

    <login-config>
        <auth-method>BASIC</auth-method>
        <realm-name>Secure Area</realm-name>
    </login-config>

Now we need to add users to the /etc/tomcat6/tomcat-users.xml file with the same role that we set up in the web.xml file.  We used a role called ‘member’ in web.xml, so we will need to use the same one in tomcat-users.xml.

sudo vi /etc/tomcat6/tomcat-users.xml
<tomcat-users>
    <user username="user" password="password" roles="member"/>
</tomcat-users>

Restart Tomcat for the changes to take effect.

sudo /etc/init.d/tomcat6 restart

Now check to see if the authentication works.



You’re in!  Additional information about Apache Tomcat on Ubuntu 11.04 can be found here.
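
The same check can be done from a terminal instead of a browser.  This is a rough sketch rather than part of the original setup: describe_status is a hypothetical helper, and the commented curl call assumes curl is installed and uses the placeholder address and the example user/password from above.

```shell
#!/bin/sh
# Hypothetical helper: explain the HTTP status code returned for /testdir/.
describe_status() {
    case "$1" in
        200) echo "authenticated: listing served" ;;
        401) echo "authentication required" ;;
        404) echo "not found: check the directory and port" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Usage (not run here; IPADDRESS is a placeholder):
# code=$(curl -s -o /dev/null -w '%{http_code}' -u user:password http://IPADDRESS:8080/testdir/)
# describe_status "$code"
```

Without the -u option the request should come back as 401 if the security-constraint is active, and with valid credentials it should come back as 200.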

Running a batched R script

In your home directory, put the R file you want to run in batch mode.  I wrote this quick test program that saves the output to the Tomcat web directory with the date in the filename.

sudo vi TestBatch.R
# This is a test program to save file with today's date
# Author: Travis Nelson
#######################################################
setwd("/var/lib/tomcat6/webapps/ROOT/testdir")
filename <- "_output.txt"
filename <- paste(as.character(Sys.Date()), filename, sep="")
data <- paste("Output for ", as.character(Sys.Date()), sep="")
write(x=data,file=filename)

Test the batch program to see if it works.  If you do not use sudo, you will see “Permission denied” and “Execution halted” in the TestBatch.Rout file, but nothing at the prompt.

sudo R CMD BATCH TestBatch.R

Check that the file was written to the web directory (/var/lib/tomcat6/webapps/ROOT/testdir/), either on the server or in your web browser.
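
To check from the shell, it helps to build the expected filename the same way the R script does.  A small sketch, assuming the date format produced by Sys.Date() (YYYY-MM-DD) and that R and the shell agree on the system date; todays_output is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: path of the file TestBatch.R is expected to write today.
# $1 = output directory.
todays_output() {
    echo "$1/$(date +%Y-%m-%d)_output.txt"
}

# Example (not run here):
# ls -l "$(todays_output /var/lib/tomcat6/webapps/ROOT/testdir)"
```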

Setting up a cron job to run your batch script

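Now let’s automate the job to run Monday through Friday at 5:00am Denver time.  Since the instance keeps time in GMT and I am in Denver (GMT-7 in standard time, GMT-6 during daylight saving time), we need to take the offset into account when setting up the cron job: 5:00am Mountain Daylight Time is 11:00 GMT.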

First, create the script file that will be used to call the R batch command.  I just named it test.sh with the contents:

#!/bin/sh
sudo R CMD BATCH TestBatch.R

Then change the permissions for the file so that it can be executed.

sudo chmod 750 test.sh

Verify that running the script will update the timestamp of your R output file in your output directory.  Here you see that the timestamp changes for this file so we know the script is working.

$ ls -l /var/lib/tomcat6/webapps/ROOT/testdir/2011-06-05_output.txt
-rw-r--r-- 1 root root 22 2011-06-05 08:55 /var/lib/tomcat6/webapps/ROOT/testdir/2011-06-05_output.txt
$ sudo /home/ubuntu/test.sh
$ ls -l /var/lib/tomcat6/webapps/ROOT/testdir/2011-06-05_output.txt
-rw-r--r-- 1 root root 22 2011-06-05 09:19 /var/lib/tomcat6/webapps/ROOT/testdir/2011-06-05_output.txt

Add a crontab for your job:

sudo crontab -u ubuntu -e

This will bring up an editor where you will need to add this line to the bottom of the file.

0 11 * * 1-5 sudo /home/ubuntu/test.sh

Save the file and you should be good to go.  For testing this, you might want to try for the hourly option of “55 * * * * sudo /home/ubuntu/test.sh”, where 55 is the number of minutes after the hour to see if it is running correctly.  Also, if you need additional help, I found this site with some helpful information about the crontab options.
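
The hour field in the crontab line above is 11, which corresponds to 5:00am at GMT-6 (Denver during daylight saving time); at GMT-7 the same local time would be hour 12.  The sketch below shows that arithmetic.  The helper names gmt_hour and cron_line are made up for illustration, and the sketch assumes the instance clock is on GMT and ignores the case where the conversion crosses midnight and shifts the weekday.

```shell
#!/bin/sh
# Hypothetical helper: convert a local hour to the GMT hour for a crontab entry.
# $1 = local hour (0-23), $2 = local offset from GMT in hours (e.g. -6 or -7).
gmt_hour() {
    echo $(( ($1 - $2 + 24) % 24 ))
}

# Build the weekday crontab line for the script used in this post.
cron_line() {
    echo "0 $(gmt_hour "$1" "$2") * * 1-5 sudo /home/ubuntu/test.sh"
}

cron_line 5 -6   # prints: 0 11 * * 1-5 sudo /home/ubuntu/test.sh
```

Since cron does not follow daylight saving changes for you here, the hour needs to be updated by hand when the local offset changes.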

I hope this tutorial is helpful and please leave me comments/questions.

 

 

Posted in Cloud, Computational Finance, R | 15 Comments

EC2 Micro instance of RStudio

I wanted to see if I could set up RStudio on a micro instance on Amazon EC2.  I thought it would be nice to have my own instance running in the cloud, so why not use the AWS free usage tier to host it.

The first time I tried to create a micro EC2 instance with RStudio, I ran into some problems because I tried it with a “Quick Start” instance (Basic 64-bit Amazon Linux AMI 2011.02.1 Beta – AMI Id: ami-8e1fece7), a platform that RStudio does not test or certify.  From their documentation:

Note that while it is likely that RStudio will correctly compile and install on your target platform the only platforms currently tested and certified are Ubuntu and RedHat/CentOS.

I decided to go with a micro instance of Ubuntu 11.04 (ami-1aad5273 – Ubuntu 11.04 Natty EBS boot – 64-bit) with R version 2.12.1 (2010-12-16) that I loaded using apt-get.

Creating the EC2 instance

Press ‘Launch Instance’ button

Select “Community AMIs”, and search for ami-1aad5273

 

Select ‘Micro’ instance type, since the AWS free usage tier is for micro instances only.

 

Use the defaults on the ‘Advanced Instance Options’.

Enter a name for the instance so you know what it is from your AWS console.

 

Enter in a Key Pair. If you need help with this, see my previous posting on how to do this.

Enter in a Security Group with ports 22, 80, and 8787.  You will need 8787 open for RStudio, and will need 80 open if you set up a proxy (highly suggested by RStudio).  If you need help with this, see my previous posting on how to do this.

Then launch instance.

 

Logging on to your EC2 instance

There are several ways to log on to your EC2 instance, and they differ quite a bit between environments.  Since I did this from a Windows machine, I posted instructions for how to SSH to your Amazon EC2 instance using a free tool called PuTTY.  For the AMI we are using, the login will be ‘ubuntu‘.

 

Installing R and RStudio

I found the instructions to download RStudio from their server download page which also references the easiest way to download R using apt-get.

Install R on the instance

sudo apt-get install r-base

 

Add new user.  This will be your login to RStudio.  I chose ‘travis’, but you might want to change it…unless you really like my name.

sudo adduser travis


Install RStudio and start it.

wget https://s3.amazonaws.com/rstudio-server/rstudio-server-0.93.89-amd64.deb
sudo dpkg -i rstudio-server-0.93.89-amd64.deb

Check to see if you can log in from a browser using your server address and port 8787.  For this instance, my address was http://ec2-50-16-128-37.compute-1.amazonaws.com:8787

 

Now see if you can play around with your own personal cloud instance of RStudio

 

 

Running with a Proxy

Since I am running RStudio on a public network, the RStudio guys strongly recommended that I deploy RStudio behind another web server.  They said that this will greatly improve performance and security, and they have clear instructions in their documentation.

 

Posted in Cloud, R | Leave a comment

Build instructions for R on Amazon EC2

In this post, I will show:

- How to create an Amazon EC2 micro instance

- How to log in to the EC2 instance using PuTTY

- How to install the R source and build it.

- Use R in the cloud!

 

Creating your EC2 instance

Go to the AWS console, Instances.

Press ‘Launch Instance’ button

Select ‘Basic 64-bit Amazon Linux AMI 2011.02.1 Beta (AMI Id: ami-8e1fece7)’

Build details: Basic 64-bit Amazon Linux AMI 2011.02.1 Beta (AMI Id: ami-8e1fece7)
Amazon Linux AMI Base 2011.02.1, EBS boot, 64-bit architecture with Amazon EC2 AMI Tools.
Root Device Size: 8 GB

 

Select ‘Micro’ instance type, since the AWS free usage tier is for micro instances only.

 

Use the defaults on the ‘Advanced Instance Options’.

 

Put in a descriptive name of what the instance is going to be used for.  I made the mistake of not doing this with my first instances, and now I have no idea what they contain.

 

Create a new key (unless you want to use an existing key) and download it.  It will save the file newkey.pem

 

Create a new Security Group with port 22 open.  Port 22 is for the SSH access.

 

Then launch instance.

 

Logging on to your EC2 instance

There are several ways to log on to your EC2 instance, and they differ quite a bit between environments.  Since I did this from a Windows machine, I posted instructions for how to SSH to your Amazon EC2 instance using a free tool called PuTTY.  For the AMI we are using, the login will be ‘ec2-user‘.

 

Building R from the source code

cd /opt
sudo mkdir R
cd R
sudo wget http://cran.at.r-project.org/src/base/R-2/R-2.13.0.tar.gz
sudo tar -xvf R-2.13.0.tar.gz
sudo rm -f R-2.13.0.tar.gz

Now we will install gcc, gcc-c++, gcc-gfortran, readline-devel, and make.  Whenever you are prompted with “Is this ok [y/N]:” enter y and press enter

sudo yum install gcc
sudo yum install gcc-c++
sudo yum install gcc-gfortran
sudo yum install readline-devel
sudo yum install make
cd R-2.13.0/
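
Before running configure, it can save time to confirm that the build tools actually landed on the PATH.  This is just a sketch: missing_tools is a hypothetical helper, and the command names in the example (gcc, g++, gfortran, make) are the executables those yum packages normally provide.  readline-devel is a library, so it has no command to check this way.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (not run here): prints nothing when all four tools are installed.
# missing_tools gcc g++ gfortran make
```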

Since we are in a virtual environment, we will need to turn off the X Window System using the command line argument --with-x=no

sudo ./configure --with-x=no
sudo make

 

The make will take about 30 minutes to complete.

export PATH=$PATH:/opt/R/R-2.13.0/bin
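
Note that the export above only lasts for the current shell session.  One option, sketched here under the assumption that you log in with a bash-like shell, is to put something like the following in ~/.bashrc.  add_to_path is a made-up helper that avoids appending the same directory twice.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH only if it is not already there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Example (not run here), using the build location from this post:
# add_to_path /opt/R/R-2.13.0/bin
```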

 

Check to see if R is working with:

R --version

Type ‘R’ to start the R console.  To exit the console, press ctrl+d.

R in the cloud!

Posted in Cloud, R | 2 Comments

Accessing your EC2 instance from Windows using PuTTY

Logging into your EC2 instance from Windows can be a little tricky.  Here is a set of instructions I put together for how to do this using the open source SSH tool called PuTTY.

You will need to download the following PuTTY binaries for Windows: putty.exe and puttygen.exe

Open puttygen and go to Conversions, Import Key, to import your newkey.pem that you created in your AWS management console.

Save private key with or without a passphrase (whatever you prefer) as newkey.ppk

Open PuTTY and navigate to Connection, SSH, Auth.  Point to where you put newkey.ppk

 

For Host Name (or IP Address) in the Session category, add the public DNS name (obtained from the instance details in the AWS console) or the Elastic IP address if you have one set up for this instance.  You can add an Elastic IP in the AWS console, but if you leave it unassociated with an instance, you will get charged $.01/hour (see Elastic IP Addresses).  I have been burned by this a couple of times, so just an FYI.

 

Press ‘Yes’ when this screen appears since it is your first time logging into this server.

 

When the terminal appears, you can log in using the default user for that instance.  The Amazon Linux AMI that I used has ‘ec2-user’ as the default user.  The Ubuntu images I have used have ‘ubuntu’ as the default user.  You might need to do some research for the image you chose.

You are in!
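
If you keep notes or scripts for several instances, the default users mentioned above can be captured in a small lookup.  This is only a sketch: default_user and the labels ‘ubuntu’ and ‘amazon-linux’ are made up for illustration, and the commented ssh line assumes a machine with an OpenSSH client rather than PuTTY.

```shell
#!/bin/sh
# Hypothetical helper: default login for the two image families used in these posts.
default_user() {
    case "$1" in
        ubuntu)       echo "ubuntu" ;;
        amazon-linux) echo "ec2-user" ;;
        *)            echo "unknown"; return 1 ;;
    esac
}

# Example with an OpenSSH client (not run here; PUBLIC_DNS is a placeholder):
# ssh -i newkey.pem "$(default_user amazon-linux)@PUBLIC_DNS"
```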

 

 

Posted in Cloud | 4 Comments

Fun with twitteR: Osama bin Laden tweets

I thought it would be fun to play around with the R package twitteR, an R API into Twitter.  I decided to take the most prominent news story of the past few days, Osama bin Laden’s death, to see the progression of tweets as news spread.  I quickly found that the max results for the searchTwitter() function were 1500 per day since this is such a big story right now.  I decided to narrow the results by location using the geocode parameter to look at how people in Denver were tweeting.  I used the Colorado State Capitol Building (39.739567,-104.984794) with a radius of 6 miles.  I am still maxing out the results per day, but it looks like a better representation than at a 5-mile radius.  I’m still new to this API so maybe with a little tweaking, I can improve the results.

 

 

 

library(twitteR)
# Colorado State Capitol Building, 6-mile radius
geoLocation <- '39.739567,-104.984794,6mi'
# searchTwitter() returns a list of status objects, so combine the days with c()
tweets <- searchTwitter("Osama Bin Laden", n=1500,
     geocode=geoLocation, since='2011-04-30', until='2011-05-01')
tweets <- c(tweets, searchTwitter("Osama Bin Laden", n=1500,
     geocode=geoLocation, since='2011-05-01', until='2011-05-02'))
tweets <- c(tweets, searchTwitter("Osama Bin Laden", n=1500,
     geocode=geoLocation, since='2011-05-02', until='2011-05-03'))
tweets <- c(tweets, searchTwitter("Osama Bin Laden", n=1500,
     geocode=geoLocation, since='2011-05-03'))
# Hour bucket for each tweet, in Denver local time
times <- sapply(tweets, function(x) format(x$created,
     "%m-%d %H:00", tz="America/Denver"))
users <- sapply(tweets, function(x) x$screenName)
times <- times[!duplicated(users)] # keep only the first tweet per user
counts <- table(times)
bp <- barplot(counts,
     main="Counts of 'Osama Bin Laden' Tweets by Hour\nwithin 6 miles of Colorado State Capitol Building",
     col="lightblue", border=NA, ylim=c(0,300), las=3)
lines(spline(counts ~ bp), lwd=3, lty="dashed", col="darkblue")


Posted in R | Leave a comment

R/Finance conference in Chicago – April 29, 2011 to April 30, 2011

This was my first year attending the R/Finance conference, which focuses on the use of R programming in applied finance.  I was unable to get out there until mid-morning on Friday, so I missed Jeff Ryan’s tutorial on Automated Trading with R.  I guess he showed how to use the R package IBrokers (of which he is the author and maintainer) to connect to Interactive Brokers Trader Workstation.  I’ve been using the Java API to Interactive Brokers, so I would have liked to see how it differs.

As for the rest of the conference, which I was able to attend, I thought there was a lot of really great content, much of it way over my head in the applied math realm.  I did get a lot of ideas for things I would like to experiment with using R.  I really enjoyed the lightning talks that were limited to 10 minutes.  There was a lot of really great content packed into a short amount of time.  At the conference dinner at the Rivers Restaurant, I sat with the RStudio guys, where I got to hear more about their amazing, open-source IDE.  I’m still using Eclipse/StatET, but I’m seriously considering switching now.

Here’s the agenda for the talks that were given:

Friday, April 29th, 2011
9:00 - 11:00 Optional Pre-Conference Tutorials
Ryan: Automated Trading with R
Yollin: High-Frequency Financial Data Analysis with R
Zivot: Financial Risk Models with R
12:15 - 12:30 Welcome and opening remarks
12:30 - 13:20 Faber: Global Tactical Investing
13:20 - 13:40 Boudt: Intraday Liquidity Dynamics Of The DJIA Around Price Jumps
13:40 - 14:00 Dunand-Chatellet: Mutually Exciting Hawkes Processes …
14:00 - 14:20 Kane: Evaluating the Effect of FINRA’s New Circuit Breaker Regulation
14:20 - 14:50 Break
14:50 - 15:40 Iacus: Statistical Analysis of Financial Time Series and Option Pricing in R
15:40 - 16:00 Switanek: The Impact of News Readability on Market Response Times
16:00 - 16:20 Break
16:20 - 16:40 Lewis: The betfair Package
16:40 - 17:00 Kumar: Carry Trades – Don’t Get Carried Away
17:00 - 17:30 Nelson: Beyond Vignettes: Dexy for Documenting R and More
Rothermich: Alt. Data Sources for Measuring Market Sentiment and Events
Long: The Segue Package for R
17:30 - 22:00 Conference Reception and optional Dinner (East Terrace and Rivers Restaurant)
Saturday, April 30th, 2011
8:00 - 9:00 Continental Breakfast
9:00 - 9:30 Rowe: A Beautiful Paradigm: Functional Programming in Finance
Ryan: High Performance Time Series in R: xtime, xts, and indexing
Peterson: Building and Testing Quantitative Strategy Models in R
9:30 - 9:50 Zivot: Factor Risk and Performance Attribution
9:50 - 10:10 Gramacy: Shrinkage Regression for Multivariate Inference …
10:10 - 10:30 Break
10:30 - 10:50 Martin: Tail Risk Budgeting versus Modern Portfolio Theory
10:50 - 11:10 Niemenmaa: Benchmarking Parallel Loops Without Data Dependency in R
11:10 - 12:00 Bollinger: Yesterday, Today and Tomorrow: A Trip Through Computational Finance
12:00 - 13:30 Sponsor Lunch with presentations by Revolution, OneTick and RStudio
13:30 - 14:00 Teetor: Better Hedge Ratios
Ang: The Impact of Oil Prices on the Houston Housing Market and Economy
Yadav: Modeling Low Default Credit Portfolios in R
14:00 - 14:20 Wildi: Multivariate DFA
14:20 - 14:40 Matteson: Independent Component Analysis via Distance Covariance
14:40 - 15:00 Break
15:00 - 15:50 Kates: R and proto
15:50 - 16:10 Vermes: Stochastic Volatility Models Massively Parallel in R
16:10 - 16:30 Pfaff: Interfacing NEOS from R: The rneos Package
16:30 - 17:00 Horner: Rack: A Web Server Interface for R
Haynold: RserveCLI: An Rserve Client Implementation for CLI/.NET
North: Repast Simphony
17:00 - 17:15 Closing remarks
Posted in Computational Finance, R | 3 Comments

Creating this blog

 

I wanted to create this blog in a space that I had complete control over, so I felt this would be a great opportunity to play around with Amazon EC2.

I started off by looking at all the available blogging software stacks and narrowed it to the two most prominent ones, WordPress and Blogger.  I know several people on both platforms, but I have seen a shift of people moving to WordPress from Blogger since Blogger doesn’t upgrade its service.  I thought a good metric for choosing which software stack to use would be to perform a Google search for “moving from wordpress to blogger” vs “moving from blogger to wordpress”.  The searches returned 3,560 and 53,800 results respectively.  I felt this was a fairly good indicator of which platform to move forward with, so I chose WordPress.

I found an excellent posting on how to set up an EC2 Linux instance for BitNami WordPress at A Technical Discourse’s site.  I followed the instructions fairly closely, except I went for a newer image (ami-3c0af955) of bitnami-wordpress than the one they used (ami-e0d62389).

I also didn’t follow his instructions for removing /wordpress from the URL, but instead took the advice of some of the people that commented on his page.  DocumentRoot needs to be /opt/bitnami/apps/wordpress/htdocs instead of what he had.  Here is what I did:

$ sudo chmod 777 /opt/bitnami/apache2/conf/httpd.conf
$ vi /opt/bitnami/apache2/conf/httpd.conf

changed DocumentRoot to be:  DocumentRoot "/opt/bitnami/apps/wordpress/htdocs"

$ sudo chmod 544 /opt/bitnami/apache2/conf/httpd.conf
$ sudo apachectl -k restart
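
The same DocumentRoot change could also be scripted instead of edited by hand.  The sketch below is not what I ran; set_document_root is a hypothetical helper that assumes the directive starts at the beginning of a line and that the new path contains no ‘|’ or ‘&’ characters.  It writes through a temp file rather than relying on sed -i, whose syntax differs between platforms.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the DocumentRoot directive in an Apache config.
# $1 = config file, $2 = new document root.
set_document_root() {
    tmp=$(mktemp) || return 1
    # Replace the whole DocumentRoot line, then copy back over the original
    # so the file keeps its existing owner and permissions.
    sed "s|^DocumentRoot .*|DocumentRoot \"$2\"|" "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (not run here; needs write access to the file):
# set_document_root /opt/bitnami/apache2/conf/httpd.conf /opt/bitnami/apps/wordpress/htdocs
```

Apache still needs the restart shown above for the change to take effect.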

 

I also changed the Site address (URL) in General Settings to not have /wordpress.

 

It is also useful to know that the default bitnami-wordpress login is:

username: user
password: bitnami

It might be a good idea to change this to avoid unauthorized access.

 

Additional information can be found at BitNami’s Cloud / Amazon EC2 FAQ.

Posted in Cloud | Leave a comment

A little late to the game…

I’m a bit behind the curve with starting a blog, but I thought it was better late than never.  I wanted to have something out there where I could showcase some of the projects I have been working on and get some feedback from the community.  I haven’t quite decided if I will use this for just software development, or if I will have it be for all interesting facets of my life.  I guess we will see.

Posted in General | Leave a comment