|
Topics
How well does the Bamboo router work?
In emulated networks under ModelNet, the Bamboo router
has shown the ability to successfully perform lookups under incredibly high
churn rates. For median session times as short as 1.4 minutes, it returns
consistent results 95% of the time, with a mean response time two seconds.
For much lighter churn (median session times of three hours), the router
returns consistent results 99.99% of the time, with a mean response time of
220 milliseconds. For comparison, the MIT Chord code from August 8, 2003 also
returns consistent results 99.99% of the time but has a mean lookup time of
780 milliseconds under the same test, and FreePastry 1.3 running over RMI
returns consistent results 99.9% of the time and has a mean lookup time of 1.8
seconds.
For more information about these tests, see our technical report in
the Publications section of this web site.
As far as resource usage goes, the router is also very frugal. In our tests,
we have run as many as 100 instances of the router per physical machine on a
cluster of 40 dual-processor, 1 GHz machines with 1 GB of memory each, all
while seeing 1-minute loads of less than 2.
We're also running all of Bamboo, including the router, on PlanetLab 24/7.
See the next question for details.
How well does the rest of Bamboo work?
The main component other than the router included with Bamboo is the put/get
layer, which provides a hash-table-like interface. Clients can put key-value
pairs into the DHT, and then later get them back out. All key-value pairs
have an associated time-to-live (TTL) in seconds, and are discarded after
being stored for that amount of time.
We have been running both the Bamboo router and put/get layer on PlanetLab
24/7 since April 2004, and measuring its performance continually since August
2004. To test the reliability of the layer, we put a new value into the DHT
every second from a client node at Intel Research, Berkeley. Values' sizes
are randomly chosen to be one of 32, 64, 128, 256, 512, or 1024~bytes, and
their TTLs are randomly chosen to be one of an hour, a day, or a week. As of
4 November 2004, we've gone eight weeks without losing a value, and we've gone
three weeks without any get RPCs timing out (i.e., the values were still
there, but temporarily unavailable).
Is Bamboo secure?
That depends on what you mean by secure. If you mean, "by running Bamboo on
my system, am I going to get hacked?", then the answer is probably no,
although I won't guarantee it. Bamboo is written in Java, so most of the
standard buffer overflow security holes just aren't possible.
On the other hand, a number of other attacks are possible. For example,
someone could certainly pick their DHT put() operations carefully so as to
fill up the disk partition where Bamboo stores its blocks. Future Bamboo
versions will include resource management to protect against this sort of
attack, but they certainly aren't in the code right now.
Another attack that is possible is to use Bamboo's function as a router to
forward packets to someone you don't like, in a sort of denial-of-service
style attack. This attack would be a little tricky to arrange, but I'm sure
it can be done. Security in DHTs is an active area of research, and we plan
to integrate new results into the Bamboo code as they're available, but the
bottom line for now is that this code should be used for research, rather than
production, environments.
Why does the router keep changing its neighbors?
This question is probably the one I hear most often about Bamboo: "I have a
network of Bamboo hosts with no churn. Why do the individual routers keep
changing their routing tables when no hosts are joining or leaving the
network?" In short, the answer is that they're trying to optimize their
routing tables, and you can turn this feature off by setting
"ignore_proximity" to true in the router's configuration settings. There's
more to the problem then that, though, so I'll take a few paragraphs to
explain the situation. I'll also present some less drastic solutions.
Bamboo implements what's called proximity neighbor selection (PNS),
which means that for all possible choices of a node to fill a given spot in
its routing table, a Bamboo node will try and use the one that is closest to
it in network latency. This technique is a great one in theory: it makes
lookups faster, lowers network stress, and allows for the construction of
efficient multicast trees on top of the Bamboo network (see the Bamboo,
Pastry, and Tapestry papers for explanations), but in practice it presents
several problems.
The first problem with PNS is how to find the latency between two network
hosts. In general, a single ping won't do the trick, since dynamic network
conditions such as queuing, cross traffic, and packet drops can all influence
the result. Using more pings and averaging the result works better, but if
many hosts all ping each other at the same time, the resulting load on the
network can skew the measurements.
Another problem is that many network hosts no longer respond to actual ICMP
pings for fear of denial of service attacks, so Bamboo has to do its pings
over UDP at user level, introducing another level of uncertainty into the
measurement, both from operating system scheduling issues and from garbage
collection. Because of these difficulties, Bamboo sees neighbor latencies
that vary over time, making it tricky to optimize its routing tables.
The goal with PNS is to make the router responsive enough to incorporate new
nodes that genuinely are closer than existing neighbors, but not so sensitive
that it abandons existing neighbors due to transient network conditions. In
practice, this balance has proven difficult to achieve. For now, I have two
solutions. First, as mentioned above, you can simply turn PNS off by setting
"ignore_proximity" to true in the router's configuration settings. If you do
so, Bamboo will never replace an existing neighbor until it experiences 15
consecutive message timeouts to that neighbor. (Currently, each successive
timeout is twice as long as the previous one, to a maximum of five seconds per
timeout, so 15 timeouts takes around 1-2 minutes. Bamboo will route around an
existing neighbor after only 5 consecutive timeouts, however, meaning that it
generally adapts to suspected outages after only a second or two.)
As a less drastic way to reduce how often Bamboo changes its neighbors, you
can change the value of "rt_scale" in the router's configuration settings.
When Bamboo goes to replace an existing neighbor with a newly discovered one,
it first multiplies the estimated latency to the existing neighbor by the
value of "rt_scale"; if the new node still has a lower latency than this
adjusted latency for the existing neighbor, then the replacement is performed;
otherwise, no change is made. On PlanetLab, I usually run Bamboo with
"rt_scale" set to 0.9.
How do I use the storage layer (put/get interface)?
First of all, make sure you're working with the February 13 snapshot or later.
All previous versions have pretty severe bugs. Next, let me say that at Intel
Research we've just started a new project, called OpenHash, to provide a
public DHT deployment, hosted on PlanetLab and available for use by
everyone. As such, the storage layer is getting a lot of my attention lately,
so you can expect it to be reasonably well supported.
If you run the location-test-menu.pl script (described in the User's Guide), it will now run a copy of the bamboo.dht.Gateway stage, which provides a Sun RPC gateway over
TCP to the put/get functionality of the DHT. I've written two simple tests,
one in C and one in Java. To run the C one, start up a node in the
location-test-menu.pl:
$ cd ~/bamboo/test
$ perl -w location-test-menu.pl
Menu:
1. Check node status
2. Check object pointers
3. Check for exceptions
4. Start a node
5. Stop a node
6. Quit
Your choice: [4]
Starting localhost:3630 with gateway localhost:3630.
cfg=/tmp/experiment-1855-localhost-3630.cfg
log=/tmp/experiment-1855-localhost-3630.log
pid=1856
Then from another prompt, do this:
$ ~/bamboo/src/bamboo/dht/gateway_test localhost 3632
Doing a null call.
Null call successful.
Doing a put
Put successful
Doing a get
Get successful.
Doing a put
Put successful
Doing a get
Get returned value Hello, world..
Get returned first value.
Doing a get
Get returned value Goodbye, world..
Get successful.
Doing a put
Put successful
Doing a get
Get successful.
To run the Java test, start up a node as before, and then do this:
$ ~/bamboo/bin/run-java bamboo.dht.GatewayTest localhost 3632 1 1
/home/srhea/bamboo/bin/run-java
PID: 1942
LD_LIBRARY_PATH=:/home/srhea/ns/ns-allinone-2.1b7a/otcl-1.0a6:/home/srhea/ns
/ns-allinone-2.1b7a/lib:/home/srhea/bamboo/bin/../lib/linux:/usr/local/lib
CLASSPATH=:/home/srhea/bamboo/bin/../src:/home/srhea/bamboo/bin/../lib/ostor
e-seda-emu.jar:/home/srhea/bamboo/bin/../lib/db-4.0.14.jar:/home/srhea/bambo
o/bin/../lib/diva-26Sep02.jar:/home/srhea/bamboo/bin/../lib/log4j-1.2.8.jar:
/home/srhea/bamboo/bin/../lib/jrpcgen.jar:/home/srhea/bamboo/bin/../lib/oncr
pc.jar
/usr/local/j2sdk1.4.1_03/bin/java -verify -mx4M -ea:bamboo... bamboo.dht.Gat
ewayTest localhost 3632 1 1
2004-02-17 12:56:37,786 INFO bamboo.dht.GatewayTest$ClientThread: Connectin
g to host.
2004-02-17 12:56:37,973 INFO bamboo.dht.GatewayTest$ClientThread: Put succe
ssful: size=687 key=35e9f1c7 ttl_sec=480 lat=21 ms
2004-02-17 12:56:42,033 INFO bamboo.dht.GatewayTest$ClientThread: Put succe
ssful: size=916 key=09eec5f6 ttl_sec=180 lat=5 ms
2004-02-17 12:56:48,263 INFO bamboo.dht.GatewayTest$ClientThread: Get succe
ssful: key=09eec5f6 lat=9 ms
The syntax for the put/get interface can be found in the RPC interface
definition in
bamboo/src/bamboo/dht/gateway.x
The semantics of the interface, at least for now, are as follows. All puts
are treated as append operations; two puts to the same key but with different
values will result in both values being stored in the DHT. All puts are
stored for as many seconds as specified in the TTL, with starting time
measured from the gateway node. If two puts have the same value, the TTL will
be extended to the longer of the two values, measured from the time the
corresponding put was received at the gateway.
A get should initially have a zero-byte placemark, with the all
field set to true. Up to maxvals will be returned. The get
response will also contain a placemark. If a get response contains at least
one value, than this placemark can be used to retrieve subsequent values.
Otherwise, all values have been returned.
This interface is evolving a bit, but we're not clear exactly how it will turn
out yet. Please email any suggested changes to the mailing list. Once it
settles down, I'll write up a more detailed user's guide and specification.
Can the router use multiple gateways?
Yes. Instead of a gateway parameter, you can specify a
gateway_count, then specify gateway_0,
gateway_1, etc. There's only one
problem: if a node has itself as the only gateway, it thinks it's the
root of the network and joins immediately. Otherwise, it throws itself
out of the list and tries to join through the others. So you have to
start one special node first--the node with only itself as a gateway. After
that, you can start the rest in any order, so long as they each contain the
original node as one of their gateways. I'm open to trying to work around
this, however, so let me know if it's a huge hassle for anyone.
Update (2004/02/08):
I have a new solution for this problem in the 2/4 CVS and later, but it
takes a bit of background to explain. Bamboo already has code to heal
the network from partitions that works as follows. Each node keeps a
set of the last 20 or so nodes that used to be its neighbors but have
since become unreachable. This list is called the down_nodes set.
Every minute, it randomly selects one of these nodes and sends it a join
message. If the node is reachable again, and it's part of the same
network or a smaller partition, then the leaf set returned in the join
response will not be useful to the first node and nothing will change.
On the other hand, if the node sending the join message is the one in
the smaller partition, then the returned leaf set will be made up of
nodes much closer to it (in the ID space) than its existing leaf set.
As a result, it will use those new nodes instead, healing the partition.
The trick to fixing the multiple gateways problem is to reuse this
mechanism. If you set the config variable immediate_join to true, on
startup the Router will add all of the gateways in the gateway list to
its down_nodes set (except itself, in the case where it's a gateway,
too) and then consider itself joined into a network of one node. It
also sends its first partition-healing join message immediately.
Eventually, it will contact one of the gateways and join that network,
and if all of the gateways come up, the partition-healing code
guarantees that they'll all eventually be in the same network. I've
been using this mechanism on PlanetLab for a week or so now and it seems
to work fine.
Does Bamboo use Vivaldi?
No. Bamboo comes with an implementation of Vivaldi, and our PlanetLab
deployment is running that implementation, but for curiosity only. Bamboo
uses recursive routing, which has no need for network coordinates. All
neighbor selection in Bamboo is done using direct latency measurements, and
all timeouts are calculated based on direct observations of past performance.
Our USENIX paper showed the negative impact that using
network coordinates for timeout estimation can have on lookup performance
under churn.
How does the storage layer maintain replicas?
Basically, the same way that DHASH does, as described in
"Robust and Efficient Data Management for a Distributed
Hash Table" by Josh Cates [Master's thesis, MIT, May 2003 (PDF)]. We uses replicas instead of erasure codes, and we sort
values by timestamp rather than key before building Merkle-tree summaries, but
other than that they're virtually identical. The (rather obscure) code is in
the class bamboo.dmgr.DataManager.
Bamboo crashed; can you help me?
Yes, I can, but I need
- a stack trace, if available
- the version of the code you are running
- any relevant .cfg files
For code versions, I prefer that you tell me the snapshot you're using. If you've forgotten which one you downloaded, search through the files mentioned in the stack track for the string "$Id" and send me the contents of that line. It might look like this:
$Id: faq.content,v 1.18 2005/09/29 13:16:42 srhea Exp $
Bamboo won't compile; can you help me?
Yes, I can, but I need
- the version of the code you are running
- the compiler output
For code versions, I prefer that you tell me the snapshot you're using. If
you've forgotten which one you downloaded, search through the files mentioned
in the compiler errors for the string "$Id" and send me the contents of that
line. It might look like this:
$Id: faq.content,v 1.18 2005/09/29 13:16:42 srhea Exp $
Last modified 2005/09/29 13:16:42.
|