Pivotal Labs

Monthly Archives: April 2009

David Stevenson

GoGaRuCo '09 – Using ruby to fight AIDS – Jacqui Maher

David Stevenson
Saturday, April 18, 2009

Using ruby to fight AIDS – Jacqui Maher

Links:

  • Github: baobabhealth
  • Twitter: baobabhealth
  • IRC: irc://irc.freenode.net/baobab
  • Website: http://baobabhealth.org/
  • Blog: http://www.baobabhealth.org/feed/

Baobab Health

A Malawi-based non-profit organization founded in 2000.

Baobab is a tree found throughout Africa and Australia. According to local legend, the hyena was given the baobab tree at the time of creation and planted it upside-down.

Baobab presented at RailsConf Europe in 2007. They knew of Jacqui’s interest in epidemiology, programming, and Africa. She subsequently flew to Africa and visited the Kamuzu Central Hospital in Malawi.

She got to know the guys working there and what they do. Jeff Rafter was the main contact.

The main focus of Baobab is AIDS

GoGaRuCo '09 - Jacqui Maher

AIDS in Africa

  • 6.7 million
  • 33 million 2 millions AIDS related deaths last year
  • 1.5 million AIDS related deaths last year
  • 1.9 million new HIV infections last year
  • 5% of adults

What does that mean? Africa post-colonial was on the upswing, but the AIDS epidemic took a giant toll, lowering the life expectancy from about 60 to almost 40 years old!

AIDS Impact:

  • Lowered life expectancy
  • Children orphaned
  • Economic impacts

Malawi is a land-locked country in sub-Saharan Africa, with the 2nd fastest growing economy in Africa.

In 2002 there was a major famine, and AIDS was a major contributing factor to the deaths.

  • population 14 million
  • 84000 deaths per year
  • 250 new infections daily
  • 8 people die per hour from AIDS, leaving 1.5 million children orphaned.
  • 280 doctors only
  • 3500 HIV patients per doctor!
    • long lines
    • people leave
    • complex registration form
    • incorrect or missing data
    • incorrect treatment

What can be done?

  • more verifiable data
  • accessible data (faster/shared)

Solution: Digitize important data:

  • portable hardware
  • touch screen laptops
  • software: easy to use, validation, treatment protocols
  • network connectivity: between clinics & the internet
  • power: power outages happen often (several times per day), some places have generators/batteries
  • collaboration: between clinics and organizations
  • authority: your solution must be recognized, trusted, and respected

Baobab’s Solution

  • save lives by improving patient treatment
  • computerized data entry + retrieval
  • portable work stations
  • system based treatment protocols

The hardware, known as the I-Opener, is a portable tablet with a 56k modem. It bombed in the US and Europe, but they got a bunch and have hacked them to have Ethernet, Power-over-Ethernet, a touchscreen, and a bar code scanner.

The government has instituted a national health ID, issued as a bar code, to help facilitate treatment. With a bar code scanner plugged in, you can read a patient’s data without even typing their name.

They bought I-Openers from the USA on eBay, and the owner eventually donated 2000 units. They set up a wireless mesh network, which uses ad-hoc, node-based routing. It’s also self-healing: if one of the nodes is down, traffic just skips right over it (very good for frequent and sporadic power outages). Power is provided by rechargeable batteries, which can be used when the power goes out.

Software

  • Ubuntu Linux servers
  • Ruby On Rails
  • MySQL
  • Custom systems monitoring library

BART: Baobab Anti-Retroviral Treatment

  • Data model: OpenMRS (Medical Research System)
  • Templating using ERB
  • Application calls via AJAX
  • Testing with Rspec
  • Reporting

The data model was complex.

The system as a whole accomplishes the following goals:

  • Patient registration
  • Encounters
  • Observations
  • Prescriptions

Registration

  • enter a new patient data
  • generate national id bar code
  • scan an existing bar code

Encounters

  • interactions with patients
  • forms

Observations

  • diagnoses
  • disease progression
  • vital stats
  • patient compliance
  • regimen progress

Prescriptions

  • Drugs
  • Drug ingredients
  • Dosage and formulas
  • Inventory
  • Orders

When you are making $2 per day, you cannot afford a pill that costs $100.

Cool, so we’re done, right?

  • working on refactoring for reliability
  • Lots of tests in Rspec, but they are fighting on many fronts.

Barriers to contribution

They presented at RailsConf in Berlin, but there was no response, because they were not set up for people around the world to contribute:

  • No public repository (SVN)
  • No reliable internet access
  • Patient data security
  • Feature & Infrastructure development schedule

What did they do?

  • Github – baobabhealth
  • IRC – irc://irc.freenode.net/baobab
  • Twitter – baobabhealth
  • Employee Blogs – http://www.baobabhealth.org/feed/
  • Website – http://baobabhealth.org/

Benefits of using

  • More people see doctors
  • Application constraints
    • validation
    • workflow guidance
  • Easy to use interface: More people can help
  • Gem the Janitor even learned to register patients (system is so easy to learn)
  • Data collection enables extensive reporting
  • International agencies can make decisions strategically based on this data
  • Comparative observations

Results

  • experiment was a success
  • electronic patient administration is possible even in the developing world

  • and it’s better than the typical first world paper records

  • and you can accomplish it using new state-of-the-art technology

Impact locally:

  • creates a local development community
  • inspire kids to program
  • training in associated technologies

Ruby Community

  • community consensus on best practices
  • actively contribute to OSS
  • accessible info on full stack
  • superb interactive tutorials (like peepcode)

Why ruby?

  • Elegant & readable
  • Easier to learn offline
  • Self contained documentation
  • ActiveRecord: complex data models easier
  • Execute SQL directly for more complex queries

Innovation

  • urgent need for solutions
  • old-school patient admin doesn’t work amidst an epidemic
  • no existing infrastructure
  • getting basic tools often requires thinking outside the box
  • alternative: death & disorder

If you have no existing infrastructure, you might as well start with the latest and greatest thing!

Questions

Q: Is the mesh network the same as the OLPC mesh network?
A: As far as I know, no. It is local to Malawi. It is an infrastructure mesh, not laptop to laptop.

Q: How widely is it deployed? Of the 280 doctors?
A: About 265 of them, so almost all. It has plans to go outside Malawi.

Q: How is Baobab involved with education and prevention of AIDS?
A: Baobab’s main focus is working with doctors and patients. It is not directly involved in prevention and education, which are handled by other groups.

Q: How are the African engineers learning about Ruby and Rails?
A: Some of them had no programming experience whatsoever, others knew .NET or PHP. They learned everything from scratch with peepcode and other tutorials. One of the best contributions we can make is to publish information on these best practices.

Chad Woolley

GoGaRuCo '09 – Using Shoes to create better iPhone apps – Tim Elliott

Chad Woolley
Saturday, April 18, 2009

Using Shoes to Have Fun – Tim Elliott

Intro

He rode his bicycle from Chico. He works at Trevidia and is a Rails developer.

Links:

  • The Shoes website – To get and learn about Shoes
  • The Shoe Box – A collection of user apps written in Shoes
  • Nobody Knows Shoes (comic book)

GoGaRuCo '09 - Tim Elliott

Shoes

The highlight is the fun that he has found by programming in Shoes. This was originally a talk about iPhone, but he wanted to put more fun into it.

why the lucky stiff wrote a comic about Shoes. In 2003, why wrote a blog post called “The Little Coder’s Predicament”. It is a call to arms for all programmers, beginning and expert, to have more fun.

Such as the old Commodore 64 program, written in BASIC:
10 PRINT "TIM RULES"
20 GOTO 10

You could do really fun things really easily, even if you didn’t own a C64!

However, Ruby is harder for kids (especially on windows) to get started with:

  • Ruby one click installer
  • RubyGems
  • Sqlite

    rails cat_app
    script/generate
    etc…

That lets you “make a cat”, but not too exciting for kids. Involves too many other languages (HTML, CSS, Javascript). They want cats to jump around the screen and do cool stuff.

Shoes

  • One installer
  • draw and animate

Shoes is not a Gem

Shoes couldn’t use a standard Ruby distribution, so it installs its own. This is because Shoes is geared not towards developers but towards people who are installing Ruby for the first time.

GoGaRuCo '09 - Tim Elliott

Shoes is a GUI toolkit that embeds Ruby. It includes a packager and a few gems.

Very flexible but understandable layout engine.

Example of Shoes:
    Shoes.app do
      stack do
        para 'wanna click a button?'
        button('sure') { alert 'woot' }
      end
    end

“Stacks” and “Flows”. You can do simple or complex layouts using these two principles.

Flows act like left floated html elements.

Everything is wrapped in a Shoes.app block, and does a class eval.

You can put your own classes outside the app block and use them inside, but there’s a gotcha. Once you use classes from outside the app block, the class eval doesn’t work, but there is a workaround.

Shoes keeps an array in the form of an in-memory stack that remembers everything. So when you start putting controls in, Shoes knows where to put them. It is always tracking which container you are in, so you can get an idea of where everything shows up.

Also watch out for long-running tasks at the OS level. These can kill the app’s performance (due to green threads).

Demo of Shoes:

Drawing some ovals

Shoes online manual

Shoes itself comes with online documentation when you install it, which has a nice search tool and examples.

Shoes is also good for rapid prototyping, such as desktop apps or iPhone apps. The advantage is that you get to do it all in Ruby.

Sharing with friends

It is easy to share. If you are running shoes, you can use a packager that comes with it to create an executable installer which runs on Windows, Linux, or OSX.

It will be small, a few meg, and still under ten even if you use video.

You can use ruby-debug to interactively debug.

    Shoes.setup do
      gem 'ruby-debug'
    end

    require 'ruby-debug'

    Shoes.app do
      debugger
    end

Shoes has a gem installer GUI and will install the gems included in the Shoes setup block. These get included in the package.

It includes native HTTP download libraries for all platforms, because Ruby’s HTTP library is slow and has other issues.

In the git distro, it includes “bloopsaphone”, which lets you create Atari-like sounds and noises, and it also has an easy API.

Be creative and have fun, make robot apps, make the robots eat each other. This is a great way to connect your passion with non-programmers, because everyone likes robots eating other robots.

Q: You mentioned development for the iPhone. Can you go into more detail?
A: You can’t run Ruby on the iPhone yet, but it’s still useful for prototyping.

Pivotal Labs

GoGaRuCo '09 – CloudKit: Hacking the Open Stack with Ruby and Rack – Jon Crosby

Pivotal Labs
Saturday, April 18, 2009

CloudKit: Hacking the Open Stack with Ruby and Rack – Jon Crosby

Intro

Thanks for the votes, his talk is here because of GoGaRuCo attendee votes.

He works for Engine Yard, and they are hiring.

This talk will be “lightning-talk” style, so that means it will be very fast (and also means this live-blog will be pretty sparse)

GoGaRuCo '09 - Jon Crosby

Cloudkit

CloudKit is an Open Web JSON Appliance.
It can quickly and easily spin up an API for RESTful collections of JSON documents.

Similar to CouchDB and Persevere
Implemented in Ruby (unlike CouchDB)…

Now Frameworks are basically another MVC framework

So why wouldn’t you want to do a new MVC architecture?

gem install cloudkit

Radar

“If your RESTful API cannot be accessed with curl, you lose”

Resource Composition in the Browser

If you have two widgets in the browser doing different tasks, you can point them at different resources.
Example: 280Slides
Example: SproutCore

Mobile apps can benefit from this style of restful architecture as well.

ESI caching layers – like Old Skool SSI, except that they are cache includes.

Cloudkit is built on Rack. Rack is awesome.

HTTP Intermediaries – such as Rack Middleware.
Rack Is The Web
The spec for rack middleware is runnable and readable

Build an App! Create config.ru:
require 'cloudkit'
expose :todos, :profiles

Cloudkit bootstraps so you can query it
You can ask it for its OPTIONS and it’ll tell you what you can do with it

Hypermedia as the Engine of Application State

Cloudkit is read-optimized

No SQL, no ORM, uses Tokyo Cabinet Tables instead

Schema Free, HTTP and JSON are the schema

Can do a PUT to place a new record at a specific location

Can do POST to update. By supplying the version etag the server can solve the “lost update” problem

Auto-versioning: any time you update a resource, the previous version is archived. That’s reflected in the URL – :collection/:version. This solves the lost-update problem when 2 users update the same document at once. If you try to update a resource without providing the version, it will return 400 Bad Request. If two clients try to update the same version, the second gets a 412 Precondition Failed response.
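
The version check described above is a form of optimistic concurrency control. Here is a minimal in-memory sketch of the idea. It is not CloudKit's implementation: the `VersionedStore` class, its `put` method and the way the ETag is derived are all hypothetical, and only the status codes come from the talk notes.

```ruby
require 'digest'
require 'json'

# Hypothetical in-memory store illustrating ETag-checked updates.
# Not CloudKit's code; it only mimics the status codes described above.
class VersionedStore
  def initialize
    @current = {}                              # uri => { etag:, doc: }
    @history = Hash.new { |h, k| h[k] = [] }   # uri => archived versions
  end

  # Returns [status, etag]. if_match is the ETag the client last saw.
  def put(uri, doc, if_match: nil)
    existing = @current[uri]
    if existing
      return [400, nil] if if_match.nil?                            # no version supplied
      return [412, existing[:etag]] if if_match != existing[:etag]  # stale version
      @history[uri] << existing                                     # archive previous version
    end
    # Assumed ETag scheme: digest of the content plus the revision count.
    etag = Digest::SHA1.hexdigest(JSON.generate(doc) + @history[uri].size.to_s)
    @current[uri] = { etag: etag, doc: doc }
    [existing ? 200 : 201, etag]
  end

  # ETags of the archived (superseded) versions, oldest first.
  def versions(uri)
    @history[uri].map { |version| version[:etag] }
  end
end
```

Two clients that both read the same ETag and then both call `put` with it would see the first succeed and the second receive 412, because the stored ETag changed in between.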

Cloudkit also solves the batch GET problem, where you can access the resource with id “_resolved” to get multiple documents at once (and their complete contents).

Finally, with DELETE, you can’t delete things that are out of date, similar to update. The 410 Gone response will get returned in this case.

“Rewrite in Scala… or solve the problem”

What’s missing?
The ability to ask questions
Pagination
Querying – solved with JSONQuery. (/todos[0:10][?priority=3])

jQuery plugin for Cloudkit

All code is up at Jon’s Github

Because it’s OpenWeb, you can easily add OAuth, OpenID, etc. A desktop application might use OAuth, whereas a web application could use OpenID for authentication.

Q: Isn’t querying slow?
A: Yeah, it can be slow. There’s indexing work that needs to be done on write to optimize read. Tokyo Cabinet might come to the rescue here about searching data with regular expressions.

Q: Are there real world apps using cloudkit?
A: Not that I know of. One company might be trying it.

Q: What kind of apps are good for cloudkit?
A: I’m personally using it for Actiontastic, a synchronizing web service that provides a REST interface.

Q: Are there plans to abstract away the key/value storage system so other systems can be used?
A: Yehuda has a library called Moneta that’s an abstraction for Key/value stores that I’d like to move to.

Q: How does CouchDB map/reduce compare to CloudKit’s JSONQuery?
A: It first started as a Sinatra app that sat in front of CouchDB, but I found JSONQuery to be better suited.

Chad Woolley

GoGaRuCo '09 – Meta Meta – LiveBlogging the LiveBlogging – Coda/SubEtha

Chad Woolley
Saturday, April 18, 2009

For the second day of GoGaRuCo, my fellow Pivots David Stevenson, Zach Brock, and Ryan Dy are helping out with the live-blogging duties (Tom Sawyer says live blogging is SOO FUN!).

We are ALL writing the blog posts collaboratively, using the Coda editor which is based on the SubEthaEngine:

GoGaRuCo '09 - MetaMeta - LiveBlogging the LiveBlogging - Coda/SubEtha

Pivotal Labs

GoGaRuCo '09 – Hypertable and Rails: DB Scaling Solutions with HyperRecord – Josh Tyler & Rusty Burchfield

Pivotal Labs
Saturday, April 18, 2009

Intro

Hypertable and Rails: DB Scaling Solutions with HyperRecord

Links:
Hypertable
HyperRecord

Rusty is from Zvents, a local search engine

Presentation

Showing example of hourly data for the last month for a single event

GoGaRuCo '09 - Rusty Burchfield

Old benchmark was over 1M rows inserted per second sustained

Hypertable is an open-source implementation of Google’s BigTable.

Hypertable is a Column-Oriented DBMS

Data Model
5-part key:
Row Key
Column Family
Column Qualifier
Timestamp
Revision

One index per table (on the row key)
Only stores strings

Architecture
Master server – tracks range servers and where data is stored (spare master is also usually run, as it’s a single point of failure)
Range servers – data is broken up into individual range servers
Hyperspace – Handles locking and master recovery
HDFS – Stores redundant copies of data

GoGaRuCo '09 - Rusty Burchfield

ThriftBroker – An RPC wrapper for Hypertable for many languages using the Thrift Wrapper

HyperRecord

HyperRecord is a subclass of ActiveRecord for Hypertable
Supported by the Hypertable

Example
Loading data into simple pages app
Loading first 10,000 articles of wikipedia
150MB of data infiled in 14 seconds
Loads all the data into a rails scaffold and browses it

Design considerations
Denormalization – can’t do joins so you have to put your data in an appropriate format for querying. Can use MapReduce to interact with data.
Column families/qualifiers – You can store data in the key part of the key value pair
Revisions – deletes are represented as inserted delete cells

Questions

Q: How do you break down data by hours in the example?
A: Broken down by Ruby and aggregated

Q: It looks like the keys in that list were strings, not timestamps, did you have to take the timestamp and convert it to a string yourself?
A: Pretty much
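
Since Hypertable only stores strings and has a single index on the row key, the timestamp-to-string conversion mentioned in this answer matters: the string has to sort the same way the times do. A minimal sketch of one way to do that follows. The `hourly_row_key` helper and the `event_id:hour` key layout are assumptions for illustration, not the format Zvents actually used.

```ruby
# Hypothetical row-key builder: "<event_id>:<UTC hour>".
# Zero-padded, most-significant-first fields make plain string order
# match chronological order.
def hourly_row_key(event_id, time)
  # getutc returns a new Time, leaving the caller's object untouched.
  format('%s:%s', event_id, time.getutc.strftime('%Y-%m-%d %H'))
end

hours = [
  Time.utc(2009, 12, 1, 0, 0),
  Time.utc(2009, 4, 18, 9, 30),
  Time.utc(2009, 4, 18, 10, 5)
]

# Sorting the strings yields the same order as sorting the times.
hours.map { |t| hourly_row_key('event42', t) }.sort.each { |key| puts key }
```

With keys shaped like this, all hours for one event sit next to each other in row-key order, so a range of hours can be read as one contiguous range of keys.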

Q: Did the wikipedia articles contain any of the sub-data like images, links, etc?
A: No, just a sql dump as a demo of querying the database through a rails scaffold

Q: Does hypertable select support SQL limits, order, etc?
A: HQL supports a lot of things you’d expect from SQL, but it’s still somewhat limited.

Q: What do you do with it?
A: We store all of our log data and process it using Cascading to gather hourly data for all our pages. We then put it in Hypertable so we can query it quickly to generate reports.

Rusty:
Cascading is Java code
You can easily construct complicated MapReduce jobs using it

Josh:
Some other uses of Hypertable at Zvents
Changelog
We deal with a lot of user-created content; things change often and we don’t always know what changed.
We log everything that ever happens to our data so that we can track it. From uploaded images to deleted links to edited descriptions, we can see what changed, when and how.

Zvents and Baidu are the primary sponsors of the Hypertable project. Hypertable and HyperRecord are both on Github.

Hypertable development started 2 years ago as a forward looking solution to analytics problems.

The search problem for Zvents is many dimensional: Time, Location, Description, User Data and User Behavior and Hypertable is a way to inform a lot of that data.

Q: What kind of problems are well suited to Hypertable?
A: We’re trying to move our entire site over. A canonical example for this kind of database is a crawl database.
A2: Anything where you have mountains and mountains of data and want to query over it.

Example of Crawl Database stored in Hypertable.

Chad Woolley

GoGaRuCo '09 – Josh Susser and Leah Silber

Chad Woolley
Saturday, April 18, 2009

Conference Organizers Extraordinaire!

GoGaRuCo '09 - Josh Susser and Leah Silber

Edward Hieatt

GoGaRuCo

Edward Hieatt
Saturday, April 18, 2009

We had a great time at GoGaRuCo yesterday. If you don’t know already, we’re live-blogging the conference at pivotallabs.com/gogaruco/blog. Follow along again today as we continue documenting the conference!

GoGaRuCo is being held in the Swedish American hall in San Francisco. It’s a great venue. Here we are demoing Tracker at lunchtime yesterday.

GoGaRuCo 09 - Ruby People!

David Stevenson

GoGaRuCo '09 – Magic scaling sprinkles – Nick Kallen

David Stevenson
Saturday, April 18, 2009

Magic Scaling Sprinkles

GoGaRuCo '09 - Nick Kallen

Nick works at Twitter and is part of the team that makes it scale.

With respect to the title, magic sprinkles are the fundamentals of computer science, irrespective of languages. Nick is disappointed when people point to simply a technology to solve their scaling concerns, like erlang.

Scaling comes down to 3 main issues, which are really computer science issues:

  • Distribution
  • Balance
  • Locality

First, Nick builds a very simple echo network service, called the JokeServer. You connect to it and it echoes the input. Then he does a simple load test on it, using a custom TCP benchmarking system based on Apache Benchmark. On the first try, it shows 8450 req/sec, but admittedly it does almost nothing.

If you have a simple single threaded worker that completes 2 jobs/sec, the throughput is 2 req/sec and the latency is 0.5 sec/req. Things get more interesting when we add more threads, where the latency stays approximately the same, but the throughput goes up. Latency is an efficiency question, and throughput is a scalability question.
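
The arithmetic in this paragraph can be written down directly. This is an idealized sketch: the `throughput` helper is a hypothetical name, and it assumes workers never slow each other down, which real threads sharing a core will not fully honor.

```ruby
# Idealized model: each worker finishes one job every `latency` seconds,
# and workers do not interfere with each other.
def throughput(workers:, latency:)
  workers / latency.to_f
end

puts throughput(workers: 1, latency: 0.5)  # one worker at 0.5 sec/req
puts throughput(workers: 4, latency: 0.5)  # same latency, four workers
```

Latency is unchanged between the two calls; only the number of workers, and therefore the throughput, differs.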

He then modifies the JokeServer, adding the following code:

10000.times { Time.now }
sleep rand

These contrived inefficiencies are representative of two kinds of work that application servers usually perform. The first uses a lot of system calls and object creation, and the second blocks on some I/O for a specific time.

When he reruns the benchmark, the throughput drops to around 1 req/sec. If thousands of users need to use this service at once, Nick asks the basic question: “How many of these can we run per machine, and how many per core?”

Distribution

To answer this question, he adds Statisaurus to collect statistics from the critical section of server code. It outputs timestamps, transaction ID, and 3 measurements of elapsed time (wall clock, system time, user time). First, he points out that the wall clock != (system time + user time). The excess is referred to as “stall time”, and it can be caused by waiting on I/O or by context switching. Context switching is very expensive at the CPU level, so the goal here is to find the optimal number of processes per core.
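
The three measurements and the derived stall time can be collected with Ruby's standard `Process` module. The sketch below is not Statisaurus itself; the `measure` helper and the `Measurement` struct are hypothetical names, and `Process.times` reports CPU time for the whole process, so it is only meaningful when one request runs at a time.

```ruby
# Hypothetical helper, not Statisaurus: wall, user and system time for a block.
Measurement = Struct.new(:wall, :user, :system) do
  # Time spent neither in user code nor in the kernel on our behalf.
  def stall
    wall - (user + system)
  end
end

def measure
  cpu_before  = Process.times
  wall_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  wall_after = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_after  = Process.times
  Measurement.new(
    wall_after - wall_before,
    cpu_after.utime - cpu_before.utime,
    cpu_after.stime - cpu_before.stime
  )
end

# Sleeping stands in for blocking I/O: it consumes wall time but almost no CPU.
m = measure { sleep 0.05 }
puts format('wall=%.3f user=%.3f system=%.3f stall=%.3f', m.wall, m.user, m.system, m.stall)
```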

Suppose a worker takes 0.5 sec of CPU time and 0.5 sec of “stall time”. What is the optimal scheduling for 2 workers on 1 core? It’s obviously to run process 1’s CPU and process 2’s stall, then vice versa. That sort of optimal scheduling is what we’re hoping to achieve in the real world by controlling the # of processes we run. In general, you want to take the wall clock time and divide it by the CPU time, which will yield the number of processes to run per core. In Nick’s example JokeServer, he shows about 6 processes per core. Since he has 2 cores, he’s going to run 10 processes total (some room for error).
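
The rule of thumb above is a single division. A small sketch, where the `processes_per_core` name and the choice to round to the nearest whole process are assumptions:

```ruby
# Rule of thumb from the talk: wall clock time divided by CPU time.
# Rounding to the nearest whole process is an assumption of this sketch.
def processes_per_core(wall_time:, cpu_time:)
  (wall_time / cpu_time.to_f).round
end

# The worker from the example: 0.5 sec CPU + 0.5 sec stall = 1.0 sec wall.
puts processes_per_core(wall_time: 1.0, cpu_time: 0.5)
```

For that worker the answer is 2, which matches the interleaved schedule described in the paragraph: while one process is stalled, the other uses the core.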

What distribution strategy should we use to divide client requests among our processes? We can try out a simple TCP proxy to divide requests between our many workers. The proxy introduces a point of failure, but is completely transparent to the server and the client. Another model is the DNS model, where the client talks to a “nameserver”, which gives the client the address of an available worker. The client can then talk directly to the server, removing the extra proxy latency. In a third model, the client uses a distributed hash table (like memcached) to determine which server to communicate with directly. The advantages are obvious, but the disadvantages can be logistical nightmares.

In his demo, Nick is going to build a simple proxy. Adding a proxy adds another part to a giant moving system. To keep things from becoming impossible to debug, Nick suggests that you use logging with transaction IDs. The proxy will generate a transaction GUID, then pass it to the backend, which uses this ID in its logs as well (great for debugging and correlating requests).
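
A rough sketch of the transaction-ID idea, using only Ruby's standard library. The `LoggingProxy` and `Backend` classes are hypothetical stand-ins with no networking; the point is only that one ID is generated at the edge and appears in every log line for that request.

```ruby
require 'securerandom'
require 'logger'

# Hypothetical backend: logs with whatever transaction ID it is handed.
class Backend
  def initialize(logger)
    @logger = logger
  end

  def handle(transaction_id, request)
    @logger.info("txn=#{transaction_id} backend handling #{request.inspect}")
    request.upcase
  end
end

# Hypothetical proxy: generates the ID once and passes it downstream.
class LoggingProxy
  def initialize(backend, logger)
    @backend = backend
    @logger = logger
  end

  def call(request)
    transaction_id = SecureRandom.uuid
    @logger.info("txn=#{transaction_id} proxy received #{request.inspect}")
    response = @backend.handle(transaction_id, request)
    @logger.info("txn=#{transaction_id} proxy responding")
    response
  end
end

logger = Logger.new($stdout)
LoggingProxy.new(Backend.new(logger), logger).call('knock knock')
```

Searching the combined logs for a single `txn=` value then shows the full path of one request through both layers.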

Balancing

First, he demonstrates the “random” strategy. With a concurrency level of 10, we can now get a throughput of 4 req/sec. It’s better than non-concurrent, but random is clearly inefficient.

Next, he tries round-robin, where we load balance sequentially across each worker in order. This sounds really good, but it assumes that each job takes exactly the same time. With a concurrency level of 10, we get 7 req/sec.

Last, he points out that the jobs are not of the same duration, so round robin is not a good strategy. Instead, we try a “least busy” strategy, where the proxy forwards the request to the worker with the least # of currently open connections. With a concurrency level of 10, the throughput jumps to about 8 req/sec.
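
The three strategies differ only in how the proxy picks the next worker. The sketch below shows that choice in isolation. The class names and the `BalancedWorker` struct are hypothetical, and a real proxy would also update `open_connections` as connections open and close.

```ruby
# Hypothetical worker record: a name plus a count of open connections.
BalancedWorker = Struct.new(:name, :open_connections)

# Picks any worker, regardless of load.
class RandomBalancer
  def initialize(workers)
    @workers = workers
  end

  def pick
    @workers.sample
  end
end

# Cycles through the workers in order.
class RoundRobinBalancer
  def initialize(workers)
    @workers = workers
    @next = 0
  end

  def pick
    worker = @workers[@next]
    @next = (@next + 1) % @workers.size
    worker
  end
end

# Picks the worker with the fewest currently open connections.
class LeastBusyBalancer
  def initialize(workers)
    @workers = workers
  end

  def pick
    @workers.min_by(&:open_connections)
  end
end

workers = [
  BalancedWorker.new('a', 3),
  BalancedWorker.new('b', 0),
  BalancedWorker.new('c', 1)
]
puts LeastBusyBalancer.new(workers).pick.name
```

Round-robin ignores `open_connections` entirely, which is why it suffers when job durations vary; least-busy uses it as a cheap proxy for how loaded each worker is.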

Locality

By introducing memoization into the JokeServer, we never do the same work twice. This is where caching comes into play, and can reduce the response time tremendously. Nick then adds a primitive cache to the joke server that can only store 2 cached values (to simulate resource starvation). Since there are 10 total values, we expect roughly a 20%-30% cache hit ratio. When tested, that value is achieved more or less. We’d prefer to get 99-100%, of course. By associating a certain class of request with a certain server, we can achieve that goal. This takes advantage of locality (just as writing consecutive hard disk blocks is faster than writing random blocks).
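
A deliberately starved cache like the one described can be sketched in a few lines. The notes do not say which eviction policy Nick used, so the least-recently-used policy here is an assumption, as are the `TinyCache` name and its `fetch` interface.

```ruby
# Hypothetical bounded memoizing cache.
# Least-recently-used eviction is an assumption of this sketch.
class TinyCache
  attr_reader :hits, :misses

  def initialize(capacity)
    @capacity = capacity
    @store = {}          # Ruby hashes keep insertion order
    @hits = 0
    @misses = 0
  end

  # Returns the cached value for key, computing it with the block on a miss.
  def fetch(key)
    if @store.key?(key)
      @hits += 1
      value = @store.delete(key)                # re-insert to mark as most recent
      @store[key] = value
    else
      @misses += 1
      @store.shift if @store.size >= @capacity  # evict the oldest entry
      @store[key] = yield(key)
    end
  end

  def hit_ratio
    total = hits + misses
    total.zero? ? 0.0 : hits.to_f / total
  end
end

cache = TinyCache.new(2)
[1, 2, 1, 3, 2].each { |n| cache.fetch(n) { |k| k * 10 } }
puts cache.hit_ratio
```

With only 2 slots and requests spread across many keys, most lookups miss, which is the starvation the talk simulates.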

To try this out, Nick uses a sticky proxy strategy. Similar requests are funneled to the same server consistently. Our throughput jumps to several hundred req/sec as our cache hit ratio gets close to 100%.
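
One simple way to make a proxy sticky is to hash the request and use the hash to choose the server. The sketch below assumes that approach; the `StickyRouter` class is hypothetical and the notes do not say how Nick's proxy decided which requests were similar.

```ruby
require 'zlib'

# Hypothetical sticky router: hashes the request key so that identical
# requests always land on the same server (and therefore the same cache).
class StickyRouter
  def initialize(servers)
    @servers = servers
  end

  def server_for(request_key)
    @servers[Zlib.crc32(request_key.to_s) % @servers.size]
  end
end

router = StickyRouter.new(%w[worker-0 worker-1 worker-2])
puts router.server_for('joke-7')
puts router.server_for('joke-7')  # same server as the line above
```

A plain modulo like this remaps most keys whenever the number of servers changes. Consistent hashing is the usual way to soften that, at the cost of more bookkeeping.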

Chad Woolley

Writing Fast Ruby: Learning from Merb & Rails 3 – Carl Lerche

Chad Woolley
Saturday, April 18, 2009

Intro

Works for Engine Yard. Getting to Rails 2.3, will be at RailsConf. Engine Yard is hiring.

GoGaRuCo '09 - Carl Lerche

Does Ruby Scale?

Yes. So does LOLCODE.

Scaling != Speed

Is Ruby Fast?

Ruby/JRuby rank around 23-30 in the Great Language Shootout.

In reality, Ruby is fast enough for the vast majority of use cases. Odds are slow code is your fault.

GoGaRuCo '09 - Carl Lerche

How do you write fast code?

Step 1. Write Slow Code

Don’t worry about performance the first time around. Odds are you don’t know what will be slow. Just write the codebase

Step 2. Use Science

Don’t stab in the dark. Use the scientific method. It is the most important tool science gives us, we should use it.

Scientific Method

Step 1. Define the Question (It needs to be specific)

Step 2. Gather Information

Step 3. Form a Hypothesis

Step 4. Perform experiment and collect data

Step 5. Analyze and interpret

Step 6. Publish results and retest

Scientific Method reworded for code

  1. Why is my code so slow?
  2. Where is the time/memory being spent?
  3. Why is the chunk of code slow / a memory hog?
  4. Change code. Collect before/after metrics
  5. Compare metrics
  6. Deploy
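
Steps 4 and 5 above can be sketched with Ruby's standard `Benchmark` module. The talk used Rbench; this example swaps in `Benchmark.realtime` instead, and the `collect_metrics` helper and the two lambdas are hypothetical placeholders for a real before/after pair.

```ruby
require 'benchmark'

# Times each labelled implementation over the same number of iterations.
def collect_metrics(iterations, implementations)
  implementations.each_with_object({}) do |(label, code), results|
    results[label] = Benchmark.realtime { iterations.times { code.call } }
  end
end

# Placeholder "before" and "after" versions that produce the same string.
before = -> { (1..100).map { |i| i.to_s }.join(',') }
after  = -> { (1..100).to_a.join(',') }

metrics = collect_metrics(1_000, { before: before, after: after })
metrics.each { |label, seconds| puts format('%-6s %.4fs', label, seconds) }
```

Checking that both versions return the same result before comparing their timings keeps the experiment honest: a faster version that changes behavior answers a different question.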

Defining the Question

“My app feels slow” is not specific enough.

“Why is action X taking more than 100ms on average?” is a better question.

“Why is 60% of the merb dispatch cycle in content negotiation?” is a good one too.

“Why are my Mongrel processes growing to 300MB of memory?” (Gets laughs)

Our Scenario and Question

QUESTION: “Is route generation as fast in rack-router as it is in Merb and Rails?”

Gather Information

Tools: Rbench, ruby-prof/kcachegrind, EXPLAIN ANALYZE, log files, New Relic/Fiveruns, memory_usage_logger, bleak_house

He shows some data from the benchmarks, comparing rack-router and merb routing, and rails routing. This answers the question that Merb route generation is NOT as fast.

So, rephrase the question. How do you make it fast? Benchmarks aren’t good for this. He used ruby-prof, which provides a lot of ways to set up tests and output data. He uses the “call graph” output, which he can open in kcachegrind. It is available via MacPorts.

He then shows the graphical output of kcachegrind. Top left shows the methods which take longest, lower left is the call stack, showing aggregate (‘incl’) and individual (‘self’) time spent in methods.

Sort by “self”, and it turns out Array::map is the one taking most individual time. Most of the calls occur in Rack::Router::Condition::generate_from_segments. This is a good place to look and spend time trying to speed up.

Hypothesis

Most of this logic can be removed and moved somewhere else. You can check Git logs to see how he did it.

Perform Experiment, collect data, analyze, interpret

Rewrote the code, it was faster.

Publish and retest

Twitter and let everyone know about it.

More Examples

Shows more kcachegrind examples. Shows how it can show source code annotated with performance data.

Remember – you don’t have to go through all the steps or feel bad because you didn’t in the past, just keep them in mind to figure out quicker where things are happening.

The Garbage Collector

Conservative mark and sweep. Every time it runs, none of your ruby gets executed. The goal is to get the garbage collector to run as little as possible.

The way object allocation works is ruby boots and gives you 8 meg of memory. When your code runs, it allocates memory. If it can’t, it will run the garbage collector.

Avoid creating unnecessary objects

Don’t need to do this:

records.dup.values # records is a Hash
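
To see why the `dup` is unnecessary: `Hash#values` already returns a fresh Array, so copying the Hash first only creates an extra object for the garbage collector to clean up. A small illustration with made-up data:

```ruby
records = { 1 => 'alice', 2 => 'bob' }

copied = records.dup.values   # allocates a throwaway Hash first
direct = records.values       # same result, one fewer object

# Mutating the returned Array leaves the original Hash untouched,
# so the defensive dup buys nothing here.
direct << 'carol'
puts records.size
```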

Use DataMapper’s identity map. It will not create a new object if it doesn’t need to. This will drastically reduce the number of objects created.

Beware of modifying Large Strings

Don’t do parse time operations

For example, slash at end of line to split up code

Beware of closures

Be lazy

“No code is faster than no code” – merb motto:

  • Cookie handling
  • memoize in the reader
  • Procs as method arguments (instead of just arguments)
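
“Memoize in the reader” can be sketched as follows: the expensive work is deferred until someone calls the reader, and then done at most once. The `LazyRequest` class and its naive cookie parsing are hypothetical and are not Merb's code.

```ruby
# Hypothetical request object: cookies are parsed only if someone asks.
class LazyRequest
  attr_reader :parse_count

  def initialize(cookie_header)
    @cookie_header = cookie_header
    @parse_count = 0
  end

  # The reader memoizes: the first call parses, later calls reuse the result.
  def cookies
    @cookies ||= parse_cookies
  end

  private

  # Naive parsing for illustration; assumes every pair contains '='.
  def parse_cookies
    @parse_count += 1
    @cookie_header.split(/;\s*/).map { |pair| pair.split('=', 2) }.to_h
  end
end

request = LazyRequest.new('session=abc; theme=dark')
puts request.cookies['theme']
```

A request that never touches `cookies` pays nothing for them, which is the “no code is faster than no code” idea applied per request.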

Lambdas

Sexy, but slow. Sometimes you need them, but keep in mind they are more expensive than method dispatch.

def my_method
  yield
end

vs.

def my_method(&block)
  yield
end

“Compiling” your code

  • Iterating is slow
  • Ruby’s AST is fast

class_eval For The Win.
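
One reading of the “compiling” advice is: do the iteration once, up front, by generating real methods with `class_eval`, so the hot path is a plain method call instead of a loop. The `Ticket` class below is a hypothetical illustration of that pattern, not code from Merb or rack-router.

```ruby
# Hypothetical example: generate one predicate method per status at load
# time, instead of looping over STATUSES on every call.
class Ticket
  STATUSES = %w[open closed pending].freeze

  attr_reader :status

  def initialize(status)
    @status = status
  end

  STATUSES.each do |name|
    # The string is parsed into a normal method definition once.
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def #{name}?
        @status == #{name.inspect}
      end
    RUBY
  end
end

puts Ticket.new('open').open?
```

Passing `__FILE__` and `__LINE__ + 1` keeps backtraces pointing at the generating code, which makes generated methods easier to debug.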

Chad Woolley

GoGaRuCo '09 – Discussion Panel: Ruby Application Frameworks

Chad Woolley
Saturday, April 18, 2009

Intro

Josh Susser does the intro. He thought it would be interesting to get authors of popular frameworks together.

Speakers

  • Tim Elliott – Shoes
  • Greg Borenstein – RAD
  • Jay Phillips – Adhearsion
  • Blake Mizerany – Sinatra
  • Yehuda Katz – Merb /Rails
  • Josh Peek – Rails

GoGaRuCo '09 - Framework Creators Panel

Bios

Tim: Been a contributor to Shoes since December. Kind of scary to put code out there, and speak publicly. One thing about Shoes is that people are very receptive. It is fun, a framework/toolkit for creating GUI apps. Written mostly in C, it embeds Ruby. Lowers the barrier to programming by making it super easy to make fun apps. Shapes, animation, music. It is NOT an MVC framework, it is more like writing a script. You often just need one file, it is straightforward. It is written to be compiled and shared with your friends. Unlike a web app, your Ruby code actually runs on other people’s machines. That makes it more unpredictable due to library, OS and environment differences, which is a challenge to Shoes. Since Ruby usually runs on Unix, there are challenges to running it on Windows. It has a main loop, so since Ruby uses green threads, if the OS does disk access, it can make your app pause, even if you are using Ruby threads in your app.

Greg: RAD – Ruby Arduino Development – an open source hardware platform for hardware hacking. Can easily program physical devices, sensors, actuators, things that jump and blink and do things. It is based off the ‘processing’ project, which is in C++, a predecessor of Shoes. He wanted to do it in Ruby, though, so it would be easy for new people without a lot of hardware experience to get started. Ruby is great for that. Unlike other frameworks, you have things above and below you in the stack. There are contributors who are old-school hardware guys who submit complex C code against a shoebox full of obscure hardware, so that is a difference as well. Hard to test: if things blink and beep in the right order, then the tests are passing. He wants to get experience about how to manage complexity, especially when other people add things to it.

Jay Phillips: Adhearsion. He was a web developer, but wanted to get into telephony development, because it was a foreign technology he knew he could do if he tried. One of the first things he did was hook into Arduino to control his locks remotely via telephony. He encourages everyone to think out of the box and play with new technologies. Adhearsion is a framework for backend telephony development – you call into a phone network, and it does stuff like recording, playing hold music, etc. Adhearsion rests above Asterisk, and a couple of other things. Asterisk has a couple of different options for servicing phone calls – peer-to-peer, UDP protocol. Conferencing as well, but that is more complicated. Sometimes need special hardware. One of the most interesting things about Ruby is the ability to play with other people’s code. Adhearsion has a plugin system. The new version is really exciting, and it exemplifies the dynamism of Ruby. It lets you pick up anonymous modules based on file and directory conventions, and extend them with features from a bunch of other different objects. It is a really impressive technology, and the conventions and features of Ruby support this. At a philosophical level, Adhearsion is trying to bring open source and modern software principles to the telephony industry, which is very old-skool and backward. It is a big opportunity and a new frontier for Ruby.

Blake: Sinatra. He created Sinatra because he had an itch, and he needed speed. He is a big fan of MVC, but it is not for everything. It was far too much for what he needed, and so Sinatra was born. Ruby is great for Sinatra because of closures. They are cool, and he wanted to make a framework which leveraged the power of closures in ruby. It is a microframework in the truest sense of the word, 800 lines right now. It is getting smaller too, by pulling logic out of Sinatra and moving it into Rack. It is a good rack citizen, and Sinatra doesn’t hide the power of Rack, it exposes it to you.
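To illustrate why closures suit this style of framework, here is a toy route table that stores blocks as handlers. `TinyApp`, `get`, and `call_route` are hypothetical names for this sketch and are not Sinatra's implementation or API.

```ruby
# Hypothetical micro-router; not Sinatra's actual code.
class TinyApp
  def initialize
    @routes = {}
  end

  # Store the block itself; it keeps access to the scope it was written in.
  def get(path, &handler)
    @routes[path] = handler
  end

  def call_route(path)
    handler = @routes[path]
    handler ? handler.call : "404 Not Found"
  end
end

greeting = "Hello"
app = TinyApp.new
app.get("/hi") { "#{greeting}, world" }

# The handler closes over the variable, not a copy of its value,
# so a later reassignment is visible when the route runs.
greeting = "Howdy"
app.call_route("/hi")
```

Because each handler carries its surrounding scope with it, a route can be a few lines in a single file without controller classes around it.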

Yehuda: Merb. He didn’t create it, but was the lead for a year or so. The hardest thing about maintaining a framework is that they start with a strong sense of identity. Then, people try to do different things, and want the framework to do it. It is hard to make the decision of whether a given feature should be added to the framework/api, or whether it is application-specific code that only one guy ever needed and it will have to be supported forever for just a few people. One of the best things about ruby is that all code is executed code. This means you can define methods anywhere in your app, and have them do things. That is really powerful. Ruby is not a slow language, but nothing is free. For example, in C, an if statement is cheap. In ruby, however, ‘cheap’ things add up. As you add support for a lot of cases, you end up with a lot of code just testing for application edge cases. Unless you do a lot of mind-warping complex things to deal with these situations, you will end up with a lot of slow code.
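A small sketch of "all code is executed code": a class body runs like any other code, so a check can be made once at definition time instead of on every call. The `Notifier` class and the `NOTIFIER_VERBOSE` environment variable are invented for this example.

```ruby
# Hypothetical class; the environment variable name is made up.
class Notifier
  # This conditional runs once, while the class is being defined.
  # Only one version of #log ever exists, so calls pay no per-call check.
  if ENV["NOTIFIER_VERBOSE"]
    def log(message)
      warn("[notifier] #{message}")
    end
  else
    def log(message)
      nil
    end
  end

  def deliver(text)
    log("delivering #{text}")
    "sent: #{text}"
  end
end

Notifier.new.deliver("build finished")
```

This is one way to keep edge-case tests out of hot paths: decide up front which method body to define, rather than branching inside it.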

Josh: Rack. Rails contributor, working on Rack integration to Rails. Best thing about rack is that even though it is a port of Python WSGI, it doesn’t split up methods and have nasty wrappers. In ruby, you can really distill the API into just a call and environment hash. This is really nice and simple, and shows the power of Ruby. Going forward in Rails 3.0, with the Merb/Rails merge, he is looking forward to figuring out how to share code with other frameworks. Prior to the merge, he was interested in how to share code with merb, but even though that is moot, it still strengthens the Ruby ecosystem of web frameworks if things work together.
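The "call and environment hash" shape can be shown without loading Rack at all. In this sketch, `HELLO_APP` and `Upcase` are made-up names, and the env hash is hand-built with only two keys rather than a complete Rack environment.

```ruby
# A Rack-style app: anything that responds to call(env) and returns
# [status, headers, body], where body yields strings.
HELLO_APP = lambda do |env|
  body = "You asked for #{env['PATH_INFO']}"
  [200, { "Content-Type" => "text/plain" }, [body]]
end

# A middleware has the same interface and wraps the next app in the chain.
class Upcase
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers, body.map { |chunk| chunk.upcase }]
  end
end

# Hand-built, partial env for illustration only.
env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/hello" }
status, headers, body = Upcase.new(HELLO_APP).call(env)
```

Since apps and middleware share one tiny interface, code written for one framework can sit in front of another, which is the sharing Josh describes.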

Questions

Q: Josh Susser: What is the language feature about ruby that is most helpful in making a framework?
A: Blake: Metaclasses and Closures. Allows cool tricks like has_many, etc.
A: Yehuda: Languages without closures are not really an option for frameworks. Most languages in the world don’t have closures and that is painful.
A: Greg: We need to craft classes with just the methods we want, such as the main loop in Arduino. Being able to grab code and rip the guts out at the last minute would not be possible without method definition and evaling.
A: Josh: Open classes. Really nice, such as the goodies ActiveSupport adds to Ruby.
A: Yehuda: Community. Not that the community is awesome, but the Rack thing. A grass-roots effort about sharing code. For example, WSGI is not supported fully by Django. In Ruby with Rack, everyone agreeing to standardize is awesome and powerful. That is a feature of the community in that we love to cooperate and share code.
A: Jay: Ruby has one of the best communities of any programming language. Rubygems is a great example. Agility, such as switching to Github. It is like a school of fish darting around, making “tron-like” right turns and chasing new and exciting stuff. Rspec is another example – people come to ruby without any Test-driven development experience, and new people can pick it up. Cucumber and other tools build on Rspec.
A: Tim: Ruby brings in amazing creative people, like Why the lucky stiff.
A: Yehuda: When rspec came out, it could hook into rails in a way that would not be possible with Java. You can make many little mutations quickly. With the little mutations, the good ones win quickly. In other languages, it is harder.
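The language features named in these answers (metaclasses, closures, method definition, open classes) can be combined in a few lines. This is a toy version only: the `Model.has_many` macro below is hypothetical and far simpler than ActiveRecord's, and `String#shout` is an invented extension.

```ruby
# Hypothetical base class with a has_many-style class macro.
class Model
  # Defined on the class itself (its metaclass), so subclasses can call it
  # directly inside their class bodies.
  def self.has_many(name)
    ivar = "@#{name}"
    # The block is a closure over ivar; define_method turns it into a method.
    define_method(name) do
      instance_variable_get(ivar) || instance_variable_set(ivar, [])
    end
  end
end

class Author < Model
  has_many :books
end

# Open classes: reopen a core class and add a method, in the spirit of
# the extensions ActiveSupport provides.
class String
  def shout
    upcase + "!"
  end
end

author = Author.new
author.books << "Dune"
author.books    # the same array on every call
"hello".shout
```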

Q: Josh Susser: You’ve talked about community, and even non-rubyists recognize this. All these frameworks are open source. Is there anything about Ruby that encourages open source contribution?
A: Jay: It is a scripting language. Everything is open and revealed for the most part. This supports the Open source mindset.
A: Yehuda: Rails and Ruby are MIT licensed. This is corporate friendly, so big corporations get into it. Even if 20% of the big corporations contribute back, this is still more than GPL licensed communities. It is easy to just monkey patch and just make it work, without having a lot of protocols for contributing back.

Q: Does the extensibility of the language itself have an effect, e.g. C extensions to MRI?
A: Jay: Extensibility of JRuby is great – you can write Java code. That’s more powerful than C extensions.
A: Greg: JRuby and Rubinius are bringing a new world of interoperability.
A: Yehuda: When you make a new Ruby implementation, it is expected that there is good integration with the host language.

Q: You all write frameworks, and Rails has abstractions like ActiveSupport core extensions. Other frameworks have similar libraries. Do you have a library of core extensions for the framework, and do you have plans to share it?

A: Jay: Adhearsion will use ActiveSupport. It introduces a moving part, though, which has caused at least one bug. For example, Adhearsion can load a rails app, but this will pull in a random version of activesupport.
A: Josh: In the future, it will be easier to pick and choose parts of Activesupport
A: Yehuda: Merb created Extlib to solve this problem; it is a smaller, more extensible ActiveSupport. A precondition. Also, the gem version dependency problem SHOULD be licked in Rubygems 1.4.
A: Blake: Sinatra should be able to interoperate, and where it doesn’t, patches are accepted
A: Yehuda: Problem is that we operate in a global space. We need to make it easier to use ActiveSupport without pain.
A: Tim: For Shoes it is a low-level C problem; it hasn’t needed ActiveSupport.

Q: Frameworks need to be modular and reusable. Yehuda talks about stable APIs. On Rspec, there’s no separation between DSL level and Library. How do you address that?
A: Blake: Good question. At one point in Sinatra, we talked about what we wanted in 1.0, but the important thing is what we DIDN’T want in 1.0. If we trim things out prior to 1.0, you can always add them back. Hard to pick though.
A: Greg: In RAD, there is a balancing act between doing it in native Ruby, and the C level. Hard decision. Would like to have a URL where the Arduino libraries live, and keep the logic out of the framework. Sometimes it is hard, because that is ugly and you want to simplify it. You can write inline assembler in a RAD class, which has different problems than other plugin architectures.
A: Yehuda: Biggest thing for frameworks is to write things like you would expect your plugin authors to do. Don’t write something massive with a plugin API. Instead, write something small, and use your own plugin API to do more complex things. It is hard to do, and tempting to make a “secret” thing that only you use. But then you are forced to give people access to your magic api.
A: Greg: I added a secret API for my talk (gets laughs)
A: Jay: Agreed. Also you need to allow people to write tests for their own extensions. If the framework changes (new rails version, for example), you want your clients to be able to catch it. Should be simple for app developers to write tests.
A: Blake: With the github fork queue, it is easy to accept patches, but then you get a flood. It is a balance between accepting stuff and protecting the framework.

Q: Have you ever wanted to overload a method?
A: Yehuda: You can use *args and option hashes and optional arguments.
A: Greg: In C you can do that, and I hate it.
A: Blake: Sometimes I want to, such as recursive operations in erlang.
A: Yehuda: If people could do overloading, they wouldn’t use optional arguments. It is easy to have one method that calls out to multiple methods. People would create scary ruby code because of idioms they bring from other languages.
A: Jay: You can unbind methods in Ruby too. This would be complex.
A: Yehuda: Ruby is guessable. If you add complexity, it is harder to find the right thing to do
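The alternative to overloading that Yehuda mentions (*args, option hashes, and one method that calls out to several) might look like the sketch below. The `area` method and its helpers are invented for illustration.

```ruby
# Hypothetical example: one public entry point instead of overloads.
def area(*args)
  # A trailing hash is treated as options, in the classic Ruby idiom.
  options = args.last.is_a?(Hash) ? args.pop : {}
  unit = options.fetch(:unit, "cm")

  # Dispatch on argument count to small, separately named methods.
  value =
    case args.length
    when 1 then square_area(args[0])
    when 2 then rectangle_area(args[0], args[1])
    else raise ArgumentError, "expected 1 or 2 dimensions, got #{args.length}"
    end

  "#{value} #{unit}^2"
end

def square_area(side)
  side * side
end

def rectangle_area(width, height)
  width * height
end

area(3)                    # one dimension, default unit
area(2, 5, :unit => "m")   # two dimensions plus an options hash
```

The dispatch is explicit and lives in one place, so a reader can see every accepted call shape by reading a single method.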

Q: How do you feel about the proliferation of language implementations?
A: Totally badass
A: Greg: RAD could never have worked without Ruby2C.
A: Tim: Exciting.
A: Yehuda: People got burned by Javascript. Ruby has done it right by holding all implementations to a high standard of compatibility. JRuby spends a lot of time to work with Rails and Merb, and other stuff. We have to hold them to a high standard of compatibility.
A: Yehuda: Ok to import Java’s concurrent hash. Not OK to hack around MRI implementations, or layers like jQuery or Prototype for Ruby.

Q: Community – everyone moving to github. Why? Is it because Chris wrote github, or what?
A: Greg: Got a ping to stick Rad on Github. A month later, I got a pull request that did tons of work for me that got me started. The power of that completely sold me on Github.
A: Blake: Heard from Chris, saying it was cool when I saw him pushing it to my SVN repo. It was big to see Linus talk about it too.
A: Jay: Less excited by git than github. The social network is the important thing. When you put code on github, you are not creating an open source project, you are just putting code out there and letting the community know about it and how to find it. The ability to watch people, fork, submit patches, etc is good. There’s a tangible benefit to Adhearsion after the switch to Github, people see other people contributing and there was a big increase in patches.
A: Yehuda: It is a mistake to say it is better than Subversion; early Merb started on Subversion. But the big thing about distributed version control is that people can have copies of your repo and constantly pull changes. For example, I just merged months of changes back to Rails, and it wasn’t too bad; we wrote a script to help. It would not have been possible with Subversion, I would have gone home and cried. Github is awesome, but it isn’t git – it could have been Mercurial.

Q: What can we do to move community to Ruby 1.9?
A: Jay: Get Rails on 1.9.
A: Yehuda: Why do you want the community on Ruby 1.9?
Q: Performance and speed improvements
A: Yehuda: You could use JRuby or Ruby 1.9. In some random cases, like ERB templates, 1.9 can be many times slower. I benchmark everything, and there is really weird 1.9 behavior, outlying blips of slowness. We need to encourage people to look at all implementations and find one that works. It’s not obvious that there will be a big benefit of 1.9.
A: Jay: Gem compatibility is a big thing. There’s a website “is it 1.9 compatible”. We’ve got ourselves into a rut with a lot of gem code out there written on 1.6. People should write for compatibility. If I wrote for 1.9 compatibility, I’d break JRuby support.
A: Yehuda: JRuby 1.9 should work for most things now.
A: Blake: This is a great job for a duplex proxy.
A: Yehuda: We should try to make sure our code works on as many interpreters as possible, but it is not the most important thing.
