Ian McFarland and I did an interview with John Cox of Network World for his recent article on Palm’s webOS.
You can read the article at:
http://www.networkworld.com/news/2009/041709-palm-pre-webos-lives-up-to-claims.html
Building Custom Web Proxies in Ruby
Ilya has recently fallen in love with the proxy server, and has built a cool one in Ruby. He works for PostRank, an information finding and formatting system. They are currently using this sort of proxy server in production.
Myth: All web frameworks are slow (Rails, Django, Seaside…)
Reality: Independent of frameworks, application servers can scale using reverse proxy solutions by simply adding more instances. A proxy server provides horizontal scalability by allowing any number of application servers behind a single facade. 90% of proxies are transparent, in that the client is unaware of the proxy. They are also mostly cut-through, in that the proxy streams data in real time from the inside server to the outside client (no buffering or caching).
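As a rough illustration of the "single facade" idea, here is a minimal round-robin sketch. The class name `RoundRobinFacade` and the backend addresses are hypothetical, and a real reverse proxy would also forward the bytes; this only shows how requests could be spread over any number of application servers.

```ruby
# Hypothetical facade: hands each incoming request to the next app server in turn.
class RoundRobinFacade
  def initialize(backends)
    raise ArgumentError, "need at least one backend" if backends.empty?
    @backends = backends
    @next = 0
  end

  # Returns the backend that should serve the next request.
  def pick
    backend = @backends[@next]
    @next = (@next + 1) % @backends.size
    backend
  end
end

# Made-up addresses; adding capacity means adding entries to this list.
facade = RoundRobinFacade.new(%w[app1:3000 app2:3000 app3:3000])
4.times { puts facade.pick }
```

Scaling out then amounts to growing the backend list; the client still sees only the facade.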
The remaining 10% case of non-transparent or caching proxies are also interesting and unappreciated. Ilya first ran across this when setting up a staging environment. It was difficult to duplicate a full production environment (complete with all components), and even more difficult to simulate realistic traffic flows. The traffic issue is typically solved by recording or guessing at traffic patterns and playing them back. Whatever simulations you run on a staging server, they are usually out of date for one reason or another.
Ilya wrote autoperf, a ruby tool for running httperf over and over with different concurrency options and parsing the output. It’s useful, but still relies on a realistic load scenario being passed to httperf. One way he tried to solve this was to record some traffic and write a text file that plays it back.
Finally, he discovered that a benchmarking proxy could be used to inject real traffic patterns into a staging server. Using EventMachine, he wrote a proxy server that listens on one port and duplicates the traffic onto multiple hosts at once. Each endpoint server answers the request, but the proxy server only returns one of the responses to the client. The client never knows that the requests are being routed to multiple machines at once.
This benchmarking proxy is called em-proxy and is available on Ilya’s github account: igrigorik. It’s 300 lines of code and simply does the following:
EventMachine makes the proxy server trivial by handling the connection cycle automatically. The interesting part of the proxy is what analysis gets run after the connections are terminated. In his example, Ilya just prints out the differences between the response times from each machine.
Nontransparent proxies can also be useful. A proxy server can even perform benchmarking validation, comparing performance numbers and determining whether new code is significantly slower than old code. The data can even be modified in flight, for example by a spam detector between mail servers or an encryption layer in front of S3.
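The duplication idea can be sketched without EventMachine. This is a simulation only, not em-proxy's API: the backends are plain lambdas standing in for real servers, they are called one after another rather than concurrently, and the names `duplicate_request` and `DuplexResult` are made up for illustration.

```ruby
# Simulation only: backends are lambdas standing in for real servers.
# DuplexResult and duplicate_request are hypothetical names.
DuplexResult = Struct.new(:response, :timings)

# Sends the same request to every backend, times each one, and returns only
# the primary backend's response, as the benchmarking proxy is described to do.
def duplicate_request(request, backends, primary:)
  responses = {}
  timings = {}
  backends.each do |name, handler|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    responses[name] = handler.call(request)
    timings[name] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  DuplexResult.new(responses.fetch(primary), timings)
end

backends = {
  production: ->(req) { "prod:#{req}" },
  staging:    ->(req) { sleep 0.01; "staging:#{req}" }
}
result = duplicate_request("GET /", backends, primary: :production)
puts result.response
result.timings.each { |name, secs| puts format("%-10s %.4fs", name, secs) }
```

The client only ever sees `result.response`; the timings are what the post-connection analysis would compare.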
At PostRank, they used Beanstalkd for job injection. They had 80 million jobs that needed to be done in memory, at 93 bytes each, which turns out to be 30GB of RAM! Beanstalk couldn’t use disk, so they were stuck with an in-memory solution. Each job was being rescheduled several times, 95% of them to a time more than 3 hours in the future. It was a waste of memory to keep all these far-future jobs in memory, so they added an intercepting proxy server. The proxy buffered “schedule” requests and archived them to MySQL for cheap storage. When the execution time got near, a background process added the jobs back to the queue. They chose to implement this by adding a new command to the beanstalk protocol to indicate that a job was being archived (though they could have worked within the existing protocol). At the end of the day, they had 79 of 80 million jobs stored in MySQL and only 1 million in Beanstalk.
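A minimal sketch of that archiving decision, under stated assumptions: arrays stand in for Beanstalk and MySQL, times are plain integers in seconds, and the class and method names are hypothetical. It does not speak the beanstalk protocol; it only shows the routing rule and the background restore step.

```ruby
# Hypothetical names throughout; arrays stand in for Beanstalk and MySQL.
class ArchivingScheduler
  ARCHIVE_THRESHOLD = 3 * 60 * 60 # seconds; jobs further out than this are archived

  attr_reader :queue, :archive

  def initialize
    @queue = []    # stands in for the in-memory Beanstalk queue
    @archive = []  # stands in for the MySQL table
  end

  # Intercepts a "schedule" request: far-future jobs go to the archive.
  def schedule(job, run_at, now)
    target = (run_at - now > ARCHIVE_THRESHOLD) ? @archive : @queue
    target << { job: job, run_at: run_at }
  end

  # Background step: move archived jobs back once their run time is near.
  # Returns the number of jobs restored.
  def restore_due(now)
    due, @archive = @archive.partition { |j| j[:run_at] - now <= ARCHIVE_THRESHOLD }
    @queue.concat(due)
    due.size
  end
end

scheduler = ArchivingScheduler.new
scheduler.schedule("refresh-feed", 60, 0)            # due soon: stays in memory
scheduler.schedule("recrawl-site", 4 * 60 * 60, 0)   # far future: archived
puts scheduler.queue.size, scheduler.archive.size
```

With most jobs scheduled hours ahead, most entries would sit in the cheap store and only the near-term ones in memory.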
Intercepting proxies can also be used for authentication, caching, and more!
Slides available at http://bit.ly/em-proxy
Q: How do you use this benchmarking idea with a real application?
A: We just benchmarked small incremental changes in our app, like testing out a new library.
Q: Have you played with sharding using proxies?
A: I haven’t done anything yet, but it would be easy. You’re kind of crazy to do it, though, because it’s dependent on the particular database protocol and there are other proxy solutions out there (MySQL proxy).
Q: Does event machine help you deal with application layer protocols?
A: EventMachine is a translation of Twisted, which means that there are existing implementations of protocols already out there (such as EmMySQL).
Q: How do you know that your benchmarking proxy server isn’t overloaded?
A: Benchmarking a very basic EventMachine connection implementation will give you baseline numbers. We found about a 5% overhead from adding the proxy server for our application.
Here are some Rubyists at GoGaRuCo. Look for more pictures to be added to this slideshow later.
I THINK the following slideshow will eventually show everyone’s photos tagged with ‘gogarucorubyists’, but it doesn’t look like Flickr has them indexed yet:
Hampton is talking “roughly” about working with Ruby at Wikimedia. Everyone calls him “HAML Guy”
This is the least planned talk we’ve ever seen, but he does 4 things well.
He has a lot of opinions, so he’ll just show his philosophy.
How he got involved: A friend complained about not having Wikipedia on the iPhone, so he learned Objective C and wrote it up with a Ruby backend. It was the first Wikipedia app on the iPhone store, and he made tens of dollars on it. Well, maybe a little more, enough to move away from Florida and work on it.
He called it “iWick”, which got him in trademark trouble with the Wikimedia lawyers. They were nice, and he eventually got in touch with the Wikimedia CTO. So, they bought it and hired him to be a lead developer – all within the span of a couple of months.
His job is to “make sure information gets into as many hands as possible”
So, he built a big Merb and Nokogiri backed platform – and they didn’t give him a hard time.
Well, the developers didn’t give him a hard time, but the IT guys did. They thought it would be a performance problem, need more servers, etc. So, that was a challenge to prove that Ruby could be fast. When you have millions of users, milliseconds matter.
Good news, it is really fast (Thanks wycats!)
This is a major part of wikipedia, and it will continue to get bigger.
“Visionary” is a big word, you get grants, etc. All that means, though, is that you “see” something. Everyone comes up with those things ALL THE TIME, but few people do it.
He didn’t even own an iPhone when he wrote the original app, but that didn’t stop him.
Checklist for being a visionary:
Yehuda writes a million things on Github. He is a great programmer, better than most people here. However, the important thing is that he DOES it. You can’t just fork a project, hack for 20 minutes and give up. You JUST HAVE TO DO IT. GROW SOME BALLS. Just make it happen. It’s THAT simple.
If you are a new developer, it might suck, but just do it. Do it again and again. You will get better!
If it is a BIG idea that needs a lot of people to do it, then that’s not a great one. Some people make money doing a blog post, but that is labor intensive. Find ideas where you don’t have to do a lot of work.
We don’t need business people. We are consumers, we know what people want, so go do it.
Find what you are passionate about. He is really excited about Wikipedia, so that’s what he does. If you are into Dog Breeding, then make a site about it. Whatever your weird hobby is, then do it!
He was a total nerd in high school, nobody liked him, he was really shy and didn’t talk.
He decided later that there was no difference between cool people and not-cool people, the difference is just doing it. When he walked into Wikipedia, he was going for an iPhone developer, even though he didn’t have a lot of experience in Objective C, etc. But, he told himself “I can do that!”, and he did. He still gets nervous after a talk, but that’s OK. If you have an idea or passion, write it down, and just do it.
Say there was no Ebay. You could write an auction site in a weekend – not a great one, but the basic functionality. What if nobody had thought of ebay yet? It would be a good idea. It is not that hard.
Just do it!
Take a weekend of your life. A lot of caffeine. A lot of alcohol. A lot of cigarettes. Whatever does it for you, just do it!
If you want to learn programming, start from scratch. Many people learn programming in a company, but that isn’t always good – you get constrained. It is better to write something on your own, and learn it on your own.
Just do it!
Q: I was told you are going to sing a song
A: Goes into “Space Oddity” by David Bowie
Q: What is the most awesome code you ever wrote, and the worst code you ever wrote?
A: The iPhone stuff is the best. But then he talked about the worst code – there was some really bad code (logic in views, etc) he wrote early on which was about to get taken over by some other developers. So, he volunteered to rewrite and clean it up, so people wouldn’t twitter about the bad code that “The HAML Guy” wrote.
Q: Mediawiki is php and mysql, how do you plug into it?
A: It is all on the server farm, so it pulls it and parses it with Nokogiri. Currently with no caching. Server is 1.5 million hits a day, load is less than zero.
He also took a couple of questions about Rhodes, which made it easy for him to write cross-platform code. He can write one bit of code and have it run on 90% of the phones in the market. One thing with Rhodes is that you can’t eval – you can’t download and execute code (other than Javascript).
I personally am really excited about this talk. I worked on the OSDV prototype at Pivotal last year, when we made a small prototype in a few weeks. This was subsequently presented to congress. It was an incredible experience. As a programmer, you write a lot of code which isn’t that exciting, counting beans or Yet Another Social Networking Website.
OSDV, however, is something that is REALLY important. It has the potential to revolutionize the way Democracy works, and really change the world for the better.
Here goes the talk, with Matthew Douglass running the slides and Gregory Miller talking.
First is a video about how democracy used to work, when we trusted the outcome of votes. Now, after the 2000 Presidential Election, people have lost confidence.
Now, states are getting funding to update their voting system. However, now that we are past the “Hanging Chad”, we are seeing MORE, not fewer problems. The companies that make proprietary digital voting do not make the required investment to make their machines trustworthy, and rely on PC technology and proprietary code.
Shouldn’t we be able to say “I count”? We should not expect the Government or Private Sector to fix this. It must be a Grass-roots movement, something big. We need to completely rethink the lifecycle of our ballots.
We have to shift away from companies guarding proprietary, black box voting to a world of “glass-box” voting. Blueprints and designs are freely available.
We need the Open Source Digital Voting Foundation.
It is not just another think tank or group of lobbyists. It is technology professionals teaming up with volunteers. Everyone can see, touch, and try it out.
This is a digital public works project, calling people from all over the country and world to help out, take a hands-on approach, and do something.
We are the real stakeholders in our Democracy. We can all make our votes count. The time to begin is NOW.
Q: Federal guidelines for how votes are counted?
A: FALSE
Q: California’s absentee ballots always counted?
A: FALSE
Q: Major voting vendors’ systems rely on commodity hardware/software?
A: Sort of. They use “Windows 95”.
He then shows “Clippy” helpfully offering to finish your vote for you…
Horribly dysfunctional market. There are FOUR vendors of voting systems in the US, and there may be two by the end of the year.
Very high barriers to entry, hard to get it approved and legal.
When you have no competition and barriers to entry, there is no incentive to innovate. You end up with closed proprietary systems with inconsistencies and irregularities. There is a natural conflict of interest between shareholder interest and public interest.
Guess who wins every time when shareholder interest meets public interest?
The pillar of democracy is transparency, and the substance of the pillar is technology.
“Sunlight is the best disinfectant”
This stuff is so imperative and essential to our Democracy, it needs to be lifted up to the level of a public works project.
Why not the commercial sector? They will do as little as possible, and have a conflict of interest.
Why not the government? Slow, and at risk of losing funding.
Bringing together two approaches – fault tolerance and high-availability computing, with the dynamics of open source community.
Rather than being a think tank, they have a group of people in Silicon Valley making things that we can see and touch.
Public Technology Repository – State and local govt, Fed govt, Commercial Vendors, test suites, dynamic continuous testing, everyone is giddy!
Two commercial vendors who are deploying with a commercial deployment license, and are being delivered open source solutions based on draft standards that the consortium is building.
Rails is a major part of their work. They are assembling a great core team.
It has been below the radar, but it will be more public in the future.
Q: How do we advance or improve the system?
A: Yes, look over the horizon at what the future looks like – Instant runoff, etc. However, there is another half of the question. They DON’T want to build the ‘perfect’ system, and have it be a relic. They have to be driven by real requirements and real adoption. They have to take the EXISTING processes, and make them better. That will get their attention, and drive adoption.
Q: Are the Hardware and Interface designs open source?
A: Absolutely everything is open. Everything will be transparent and funneled through the RFC process. The goal is to build an entire software ecosystem that runs against a known, virgin, commodity hardware system. Then they will examine on a device-by-device basis to plug in new parts. “Open Source Hardware” has never been done, but they will try.
Q: What are the obstacles (e.g. politicians)
A: Lots of them, but their position is that they are technologists, making the best solutions. Senator Patrick Leahy said “please don’t waste time trying to change systems, make things that people can touch and try”.
There are “horrifying” ways the system is designed to preserve incumbency. If this works, it really changes the landscape in a big way.
Q: What percentage of elections are corrupt?
A: They have been doing due diligence, and have found “remarkable” inconsistencies, some of which have resulted in criminal elections. We may think that Obama got elected, things are great, but we dodged a bullet. We are 170 days into the congressional session, and no senator from Minnesota is seated. Politicians will no longer be able to hide and say “the box did it”.
Q: It seems like a huge complex problem to solve, shouldn’t it be bite-sized?
A: They thought about componentizing it, but the only way to do it right is to start with a clean slate. Forget incumbency, and legacy. We need open data and open processes. They are partitioning the process to different buckets, and have different teams working on them. They are laying the foundation for a pluggable, XML-based framework. They are going in a procedural fashion, and really focusing on the 2010 election.
Rapid prototyping, Agile Development approaches with Structured Approaches.
HUGE APPLAUSE AND WHISTLES!
Playing With Fire: Running Uploaded Ruby Code in a Sandbox – David Stevenson
It is still new, but we will get a chance to interact with it live. There will be a competition to see who can compromise the sandbox first.
The prize is a Cupcake, but he has not bought it yet, because he doesn’t think anyone will break out.
Rules are you must break out of the sandbox itself, not compromise his box or the OS.
Say you want to make a decision about which folder to use for a user’s mail? You can write a bunch of complex rules, or you could allow your users to upload code to do it.
He makes a reference to the Neal Stephenson book about the Metaverse, where everyone uploads code.
Second Life also has a C metalanguage which allows players to create their own code and three-dimensional objects. In this type of game, the sky is the limit.
Google’s AppEngine is another example. Users can write their own code and run it in a sandbox, but Google handles all the scalability and hidden bits.
Dangerous operations: Code could have errors, or not finish. Someone will upload an infinite loop almost immediately, so you need to deal with it.
Knowledge: Are users programmers? Maybe they don’t want to learn a language, even one as easy and nice as Ruby.
API Manipulation: Maybe there are ways that users could manipulate your API in ways you have not even thought of yet…
Freaky-freaky sandbox gem (MRI ruby): By why the lucky stiff with some contributions from David, written in C. It is a big hack, a bit of a disaster, but it works. We’ll get to play with it.
JavaSand gem (JRuby): Same API as Freaky-freaky, but not as much of a hack. JRuby provides more hooks into the internals, so you can do some of the same things that Freaky-freaky does, but without as much hackery and violation of internals.
Rubinius in the future? – Sub-virtual-machines could be used to create a sandbox, maybe even 20 lines of Rubinius. The C implementation is about 2000 lines.
Expression Evaluator: 2+2 -> 4, etc.
He is creating the rails application from scratch, hopefully the bandwidth holds up. He’s not using Sinatra, because he doesn’t know how to get something scaffolded fast enough in the time constraints of a presentation.
Some dangerous things are NOT accessible in the sandbox, such as File and Kernel.
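require 'test_helper'
require 'redgreen'

class ExprTest < ActiveSupport::TestCase
  # The sandbox should evaluate simple arithmetic.
  test "two plus two equals four" do
    assert_equal 4, Expr.new(:expr => "2 + 2").value
  end
end

class Expr < ActiveRecord::Base
  # Evaluate the stored expression inside the sandbox rather than with a bare eval.
  def value
    Sandbox.safe.eval(expr)
  end
end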
He then goes on to implement exception handling (test driven, of course), and also implements code to prevent infinite loops with a timeout.
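The notes do not record the code he wrote for this step, so the following is only a sketch of the general shape, using Ruby's standard Timeout module. The method name `guarded_value` is hypothetical, and the evaluator is passed in as a callable so the example runs without the sandbox gem. The plain `eval` used for the demo is NOT a sandbox.

```ruby
require 'timeout'

# Hypothetical wrapper: runs the evaluator with a time limit and turns
# failures into a message instead of letting them crash the request.
def guarded_value(expr, evaluator, limit: 1)
  Timeout.timeout(limit) { evaluator.call(expr) }
rescue Timeout::Error
  "Timeout: gave up after #{limit}s"
rescue StandardError, ScriptError => e
  # ScriptError covers SyntaxError raised by evaluating malformed code.
  "#{e.class}: #{e.message}"
end

# Stand-in evaluator for demonstration only; plain eval is NOT a sandbox.
demo = ->(expr) { eval(expr) }
puts guarded_value("2 + 2", demo)
puts guarded_value("1/0", demo)
puts guarded_value("sleep 5", demo, limit: 0.1)
```

In the talk's app the callable would presumably wrap the sandbox's own eval; that wiring is an assumption here.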
He then wraps up the coding of the initial app, and he is exposing it to the audience. He has to do the standard rails stuff to make a new app work, delete index.html, set up routes, etc.
Now, the fun begins. Here’s some examples that are showing up within a minute:
Listing exprs
Expr
open testfile
return
`ls`
context.freeze
Dir.entries('.')
while true; end
`ls`
self.instance_eval{while true; end}
1/0
`rm -rf ./'
arr = ['a'] * 0xFFFFFFFFFFFFFFFFFFFFF
ObjectSpace.count_objects
p=lambda { 'yo' }; p.call
`sudo reboot`
a = 2; a+3
4*4
%x[tail log/production.log]
File.new
$*
"HELLO GOGARUCO. YOUR ZIPPER IS DOWN. YES YOU. YEAH, ON THE RIGHT"
while true; puts 'are we there yet'; end
New expr
David is now discussing the restricted set of objects in the sandbox. The problem is that you need to reference things like Net::HTTP, but that is not in the set of restricted objects.
The solution is to reference some classes into the sandbox, and copy others. It runs the unsafe things “outside” of the sandbox, but users still cannot access these restricted classes.
http://hangman.sandbox.flouri.sh/
# API methods
def word
def guesses
def guess!(char)
def all_words
There is a cron job. Every minute, all the algorithms run, and everyone can make a guess.
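To give a feel for what an uploaded algorithm might do with those API methods, here is a small sketch of a letter-picking strategy. It is written as a standalone function so it can run outside the sandbox. The function name is made up, and it assumes (the notes do not say) that `word` returns the masked word such as "r_b_", that `guesses` returns the letters tried so far, and that `all_words` returns the dictionary.

```ruby
# Picks the next letter to guess. The return shapes of the API methods are
# assumptions: pattern uses "_" for unknown letters, guessed is an array of
# letters already tried, dictionary is an array of candidate words.
def next_guess(pattern, guessed, dictionary)
  source = pattern.chars.map { |c| c == "_" ? "." : Regexp.escape(c) }.join
  regex = Regexp.new("\\A#{source}\\z")
  candidates = dictionary.select { |w| w.match?(regex) }

  # Count, for each untried letter, how many candidate words contain it.
  counts = Hash.new(0)
  candidates.each do |w|
    (w.chars.uniq - guessed).each { |c| counts[c] += 1 }
  end

  best = counts.max_by { |_char, n| n }
  best && best.first # nil when no candidate word is left
end

puts next_guess("r_b_", %w[r b], %w[ruby rube rack])
```

Inside the sandbox this would presumably be called as `guess!(next_guess(word, guesses, all_words))`, though that wiring is also an assumption.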
Sorry, no backtick:
Expr: `rm -rf ./` Value: "#Sandbox::Exception: SyntaxError: (eval):3:in `_eval': compile error\n(eval):3: unterminated string meets end of file"
He then asks if anyone has broken out of the sandbox. NO hands go up. Win! Everyone claps. Great preso!
Aaron starts out with a really bad joke about ‘eyhelp’, which puts him in good standing with me, at least (I like bad jokes).
He’s from Brooklyn, NY, which gets a few woots.
Most everyone in the audience has used Sinatra. It is “The Classy Web Framework”
You can make a very simple controller, just require ‘sinatra’, and define a simple get to make the most basic app, and you can define routes as well.
There’s been a lot of effort to reorganize the Sinatra codebase. Now there is a base class, and you can define apps which extend Sinatra::Default. He mentions Pat Nakajima’s Rack::Flash for rack and flash message integration.
The nice thing about this is that you can define multiple classes and apps, which allows you to encapsulate things better.
Sinatra is not a framework. Rails is a framework. Rails makes a lot of assumptions about how you will write an app, convention over configuration. It is like a “shelving unit” with places to put everything you need.
Sinatra, on the other hand, is like a wooden board. It is NOT MVC, it is really simple and straightforward.
WDNNSP = “We Don’t Need No Stinkin’ Pattern”
Aaron would like to remove the idea of Sinatra as a framework for doing “smaller” Rails apps. That works, and you can use something like Sinatra Generator to make small apps. However, he’d like us to think of it as a different way to build apps.
We can have our awesome Ruby project, but as an aside, it has Sinatra and can run on the web. You can think about HTTP as a language. It is a protocol, but with REST and other conventions, you can think of HTTP as a way for two apps or distinct pieces of code to talk to each other.
It’s up to you, but if you do, then Sinatra is an easy way to do it. One box saying “GET”, and another box giving a “response”.
The first box on the “GET” side is a User (RestClient), and the other box is Rack and Sinatra. Sinatra is really good at handling the box on the “response” side.
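The two boxes can be sketched in plain Ruby, without Sinatra installed. This is not Sinatra's API; it is a bare Rack-style callable (an object that takes the request environment and returns status, headers, and body), which is the kind of interface Sinatra apps sit on. The constant name and the `/hello` path are made up for illustration.

```ruby
# The "response" box: a Rack-style callable returning [status, headers, body].
# RESPONSE_BOX and the /hello route are hypothetical.
RESPONSE_BOX = lambda do |env|
  if env["REQUEST_METHOD"] == "GET" && env["PATH_INFO"] == "/hello"
    [200, { "content-type" => "text/plain" }, ["Hello from the response box"]]
  else
    [404, { "content-type" => "text/plain" }, ["Not found"]]
  end
end

# The "GET" box: a client would normally send this over HTTP; here the
# request is handed over directly as a hash.
status, _headers, body = RESPONSE_BOX.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/hello" })
puts status
puts body.join
```

Sinatra's appeal, as described in the talk, is that it lets you write the response side with far less ceremony than this.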
Think of our local computers as a place to interact with apps.
Aaron discusses CouchDB now. It has a local interface which lets you interact with it, and it is a simple way to interact with the app, which runs on the web on your local machine.
He’s now describing a scenario with “Jane” in the Accounting department of “Megacorp”. When she needs to run reports, she usually sends them to the developers who have to run them from the command line. Instead, why not just make a simple webapp with a field for Jane to type into and generate her report by herself?
He would like to see all developers turn their apps into Sinatra web apps – Gems, everything. What if all our gems and code came packaged with web interfaces?
An example of this that he wrote is “Gembox”. It is a simple interface for browsing your gems. 99% of the code is RubyGems, with just a little Sinatra file to present the data via the web. He shows the code layout. The important part just consists of a file which runs the “gem list” command, with some view helpers around it. There aren’t a lot of assumptions about the directory tree.
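For a sense of how little code the data side needs, here is a sketch that lists installed gems. It is not Gembox's code: it swaps in the RubyGems API that ships with Ruby instead of shelling out to “gem list”, and the method name `installed_gems` is made up.

```ruby
require 'rubygems'

# Groups installed gems by name, with each gem's versions in ascending order.
# installed_gems is a hypothetical helper, not part of Gembox or RubyGems.
def installed_gems
  Gem::Specification.group_by(&:name).map do |name, specs|
    [name, specs.map(&:version).sort.map(&:to_s)]
  end.sort_by(&:first)
end

# Print a few entries in roughly the style of "gem list".
installed_gems.first(5).each do |name, versions|
  puts "#{name} (#{versions.join(', ')})"
end
```

A small web layer would then only need to render this list, which matches the talk's point that almost all of the work is already done by RubyGems.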
“Super Simple Sinatra Starter”.
Vegas is a bin file which wraps your app, and allows you to run it on whatever Sinatra server you have available. It makes it really easy to write and start embedded apps.
Gems + Vegas as a distribution platform
Even easier workflow for mounting
Vegas as Central Brain
Distribute Tasks across local network
See more at http://code.quirkey.com
Q: You showed code: ‘set :sessions, true’. Can you talk more about how sessions work, because they are really complex.
A: This is just a shortcut to including the Rack Sessions middleware in Sinatra. It is based on cookies, but you could write more complicated ones for distributed db sessions, etc.
Q: (from Alex Chaffee) Vegas is cool, but what about the security implications of running on localhost?
A: Yeah, it is insecure. But, you can do really cool things like running the command line. Maybe it could be sandboxed?