Apache Drill Design Meeting

The MapR folks invited me to participate in the Apache Drill design meeting. The Meetup site indicates that 60 people participated, which sounds about right.

Tomer Shiran started the meeting with an overview of the Apache Drill project. Then I (Camuel here) presented our team's view of the Apache Drill architecture. Jason Frantz of MapR continued with the technical aspects in the follow-on discussion. After a pizza break, Julian Hyde presented his view on logical/physical query plan separation and suggested using the Optiq framework for the DrQL optimizer.

Overall, my takeaways are as follows:

  1. There is a very healthy interest in interactive querying of BigData.
  2. There was not a single voice calling for adapting vanilla Hadoop to this task.
  3. There is a general consensus on supporting a plurality of query languages and a plurality of data formats.
  4. There is a general consensus that the user should always be free to supply a manually authored physical query plan for execution, bypassing the optimizer altogether, as opposed to hardcore hinting.
  5. No one except me tried to challenge the “common logical query model” concept. Since there are no real joins in Dremel, no indexes, and only one data source with exactly one access path – a single full table scan – I cannot see the justification for the complexity of optimizers and the logical query model. Dremel is an antidote to all of this.

Thank you, MapR, for the Drill initiative, the great design meeting, and the invitation.

Apache Drill Progress

We are continuing our efforts to contribute our OpenDremel code to the Apache Drill project and look forward to being active in it afterwards.

Right now the effort is going into our ANTLR-based parser: we want to make it work with the new grammar of the BigQuery language. That should be done within a few days, and the parser will then be committed to the new Drill repository as the first phase of the OpenDremel-Drill merge.

Next, we plan to refactor and contribute the Semantic Analyzer, which processes the output of the parser into an intermediate form, resolving references and rewriting (flattening) the query into a single full-table-scan operation. That is expected within a week or two; it will depend on when the Drill architecture doc is published. We still don’t know what the schema language/format will be. Will it be Protobuf? Avro? OpenDremel supports Avro right now and has initial support for Protobuf.
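To make the flattening step concrete, here is a minimal sketch of the idea, in Python with purely illustrative names (the real Semantic Analyzer is not written this way): the fields referenced by a query are resolved against a nested, Avro-like schema, and the whole query is rewritten as one full table scan over just those column paths.

```python
# Hypothetical sketch of query flattening: resolve dotted field references
# against a nested schema, then emit a single full-table-scan "plan".

NESTED_SCHEMA = {                      # Avro-like nested record schema
    "name": {"language": {"code": None, "country": None}},
    "links": {"forward": None, "backward": None},
    "url": None,
}

def column_paths(schema, prefix=""):
    """Enumerate all leaf column paths of a nested schema."""
    for field, sub in schema.items():
        path = f"{prefix}.{field}" if prefix else field
        if sub is None:
            yield path
        else:
            yield from column_paths(sub, path)

def flatten(selected_fields):
    """Resolve references and rewrite the query as one full table scan."""
    known = set(column_paths(NESTED_SCHEMA))
    unknown = [f for f in selected_fields if f not in known]
    if unknown:
        raise ValueError(f"unresolved references: {unknown}")
    return {"op": "FULL_TABLE_SCAN", "columns": selected_fields}

plan = flatten(["name.language.code", "url"])
print(plan)  # {'op': 'FULL_TABLE_SCAN', 'columns': ['name.language.code', 'url']}
```

Since Dremel-style execution has exactly one access path, the "plan" really is this simple: a scan plus the list of column stripes to read.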

The final phase of the OpenDremel-Drill merge will be the contribution of the code generator based on Apache Velocity templates. We have two sets of templates for now: one is Java-based and executed with the Janino executor; the second uses C/asm and is executed with the ZeroVM executor.
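The principle behind template-based code generation can be sketched in a few lines of Python, with string.Template standing in for Apache Velocity and a trivial filter loop standing in for the real templates (all names here are illustrative, not the actual OpenDremel templates):

```python
from string import Template

# A toy "template" for a scan-and-filter loop; Velocity plays this role in
# the real generator, emitting Java (for Janino) or C (for ZeroVM) instead.
SCAN_TEMPLATE = Template("""
def generated_scan(rows):
    out = []
    for row in rows:
        if row["$column"] $op $value:
            out.append(row["$column"])
    return out
""")

# Fill the template from the query, then "compile" the generated source,
# much as Janino compiles generated Java on the fly.
source = SCAN_TEMPLATE.substitute(column="age", op=">", value="30")
namespace = {}
exec(source, namespace)
scan = namespace["generated_scan"]

rows = [{"age": 25}, {"age": 42}, {"age": 31}]
print(scan(rows))  # [42, 31]
```

The payoff of this design is that the same template set can be swapped out per backend: one set emits JVM code, another emits C for ZeroVM, while the planner above them stays unchanged.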

Everyone who wishes to help is welcome. The OpenDremel code resides in its usual Google Code repo – http://code.google.com/p/dremel/. Be sure to locate and use the repo combo box in the upper part of the page.

We will probably use the https://github.com/ApacheDrill repo as a staging area, or the Apache git repo directly; it all depends on what is proposed by Ted Dunning – the Apache Drill Champion.

We also continue work on our generic execution backend built on top of OpenStack Swift and integrated with ZeroVM. We are contributing to both projects here.

We look ahead to an Apache Drill with pluggable frontends and pluggable backends. It would then be able to run on top of a toy single-JVM Janino backend, or under YARN management on HDFS with a Janino or ZeroVM backend, or even on a Zwift backend (that’s how we codenamed the OpenStack Swift + ZeroVM combo).

On the other hand, the frontends will be pluggable too, so, in the future, support for new languages such as Apache Pig or Apache Hive can be added easily. Another option would be to create a single frontend with pluggable language handlers, which would allow us to embed functionality from other projects such as Apache Mahout or R.
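One way to picture the pluggable-backend idea is a narrow contract that every execution backend implements, with the rest of the engine dispatching plans through it. The sketch below is Python with invented names – a shape, not the actual Drill interfaces:

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Narrow contract every execution backend would implement."""
    @abstractmethod
    def execute(self, physical_plan):
        ...

class ToyLocalBackend(Backend):
    """Stand-in for a toy single-process backend (think single-JVM Janino)."""
    def execute(self, physical_plan):
        data = physical_plan["data"]
        predicate = physical_plan["predicate"]
        return [row for row in data if predicate(row)]

# A YARN/HDFS or Zwift (Swift + ZeroVM) backend would register here too,
# each implementing the same execute() contract.
BACKENDS = {"toy": ToyLocalBackend()}

def run(plan, backend="toy"):
    return BACKENDS[backend].execute(plan)

plan = {"data": [1, 5, 10], "predicate": lambda x: x > 4}
print(run(plan))  # [5, 10]
```

The same registry trick works on the frontend side: pluggable language handlers would all compile down to the one physical-plan format that every backend accepts.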

Apache Drill

We are no longer alone in implementing Google Dremel and BigQuery technology. A proposal was recently made to the Apache Foundation suggesting a similar project. Moreover, Ted Dunning kindly invited us to take part in the project.

The project is just starting: there is no source code yet, and not even a consensus design. So we sat together this evening and wrote a proposed design for Apache Drill. We have already been working for about two years on a Dremel and BigQuery implementation. It has been a fascinating journey; we have learned quite a lot and would be more than happy to share our experience and accumulated knowledge.

All our code (OpenDremel/Dazo/ZeroVM) has carried the Apache License from the beginning and uses several Apache technologies, from Avro to Velocity. Apache seems to be the best home for the Drill project and we are looking forward to contributing to it.

Start-Up Chile

I’ve frequently been asked about my experience in the Start-Up Chile program. Having participated in the program for the past half year, I can say that it has been an interesting and fulfilling experience.

On top of the provided seed capital you get a supporting framework of mentors and fellow startupists. You can literally “feel” the surrounding entrepreneurial spirit. And although I was unlucky in finding peer support for my infrastructure BigData@Cloud idea (most folks were doing consumer-web startups), I did find the framework highly encouraging.

The provided capital is equity-free, which is especially nice and makes negotiating the next financing round easier. Getting the money is a paperwork-intensive process, but the staff are friendly and helpful.

I found Chileans hospitable and friendly to foreigners. Yet minimal Spanish seems to be mandatory. I found myself speaking Spanish after a few months in Santiago, which was unplanned initially.

Santiago is a nice, modern, mountain-surrounded city, and pretty safe I would say. I cannot count how many times locals warned me about how unsafe Santiago really is, but apart from the permanently ongoing strikes/riots in the central part of the city I never experienced, witnessed, or heard about any incident. And I usually work deep into the night and walk extensively before retiring to bed. I lived in Centro but especially enjoyed walking in the north-western part of town. The underground is quite an efficient way to get around, if a little hot at mid-day in February, as I remember. I was mostly fully consumed by my startup, so I haven’t had enough time to tour the rest of the country; I know even Santiago only from walks guided by the GPS in my Nokia. I really should rent a car one weekend and get out for a couple of days… In fact I did spend one weekend in Viña del Mar / Valparaíso and found it quite a nice and relaxing place.

The local entrepreneurship and geek community is also thriving, and that is not counting the very visible Start-Up Chile folks. Go to meetup.com and choose your favorite topic or technology, and I bet you will find a packed Santiago interest group there.

Apache Hadoop over OpenStack Swift

This is a post by Constantine Peresypkin and David Gruzman.
Lately we have been working on integrating Hadoop with OpenStack Swift. Hadoop doesn’t need an introduction, and neither does OpenStack. Swift is an object-storage system and the technology behind RackSpace CloudFiles (and quite a few others, like Korea Telecom object storage, Internap, etc.).
Before we go into the details of Hadoop-Swift integration, let’s get some relevant background:
  1. Hadoop already has integration with Amazon S3 and is widely used to crunch S3-stored data. http://wiki.apache.org/hadoop/AmazonS3
  2. The NameNode is a known SPOF in Hadoop. If it can be avoided, so much the better.
  3. The current S3 integration stages all data as temporary files on local disk before uploading to S3. That is because S3 needs to know the content length in advance; it is one of the required headers.
  4. The current S3 integration also suffers from the 5 GB max-file limitation, which is slightly annoying.
  5. Hadoop requires seek support, which means that HTTP range support is required if it runs over an object store. S3 supports it.
  6. Append support is optional for Hadoop, but it is required for HBase. S3 doesn’t have any append support, so the native integration cannot run HBase over S3.
  7. While OpenStack Swift is compatible with S3, RackSpace CloudFiles is not, because RackSpace CloudFiles disables the S3-compatibility layer in Swift. This prevents existing Swift users from integrating with Hadoop.
  8. The only information available on the Internet about Hadoop-Swift integration is that it should work using Apache Whirr. But to the best of our knowledge this covers only rolling out a block filesystem on top of Swift, not a native filesystem. In other words, we haven’t found any solution for processing data that is already stored in RackSpace CloudFiles without costly re-importing.
So, armed with the above information, let’s examine what we’ve got here:
  1. In general, we instrumented Hadoop to run over Swift natively, without resorting to the S3-compatibility layer. This means it works with CloudFiles, which lacks the S3-compatibility layer.
  2. The CloudFiles client SDK doesn’t support HTTP range functionality, so we hacked it to allow HTTP ranges; this is a must for Hadoop to work.
  3. We removed the need for a NameNode, in a way similar to how it is removed in the S3 integration for Amazon.
  4. As opposed to the S3 implementation, we avoided staging files on local disk on the way to and from CloudFiles/Swift. In other words, data is streamed directly between compute-node RAM and CloudFiles/Swift.
  5. The data is still processed remotely, though: extensive data shipping takes place between the compute nodes and CloudFiles/Swift. As frequent readers of this blog know, we are working on technology that will allow running code snippets directly in Swift. Look here for more details: http://www.zerovm.com. As a next step we plan to perform predicate-pushdown optimization to process most of the data completely locally, inside a ZeroVM-enabled object-storage system.
  6. Support for native Swift large objects is also planned (something that is absent in Amazon S3).
  7. We are also working on append support for Swift (this could easily be done through the Swift large-object support, which uses versioning), so even HBase will work on top of Swift – which is not the case with S3 now.
  8. As with Hadoop on S3, storing BigData in its native format on Swift provides options for multi-site replication and CDN.
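To make the HTTP-range point above concrete: a Hadoop seek(pos) followed by a read over an object store translates into a ranged GET. The sketch below is plain Python and purely illustrative – it is not the actual patched CloudFiles SDK – but it shows the header a client must send and how the Content-Range answer comes back:

```python
def range_header(offset, length=None):
    """Build the HTTP Range header that a seek()+read() translates into."""
    if length is None:
        return {"Range": f"bytes={offset}-"}          # read to end of object
    return {"Range": f"bytes={offset}-{offset + length - 1}"}

def parse_content_range(value):
    """Parse 'bytes start-end/total' from a 206 Partial Content response."""
    unit, _, rest = value.partition(" ")
    span, _, total = rest.partition("/")
    start, _, end = span.partition("-")
    return int(start), int(end), int(total)

# A seek(1024) followed by a 4 KB read becomes:
hdr = range_header(1024, 4096)
print(hdr)  # {'Range': 'bytes=1024-5119'}

# ...and the object store answers with, e.g., 'bytes 1024-5119/1048576':
print(parse_content_range("bytes 1024-5119/1048576"))  # (1024, 5119, 1048576)
```

Without this, every seek would force re-downloading the object from the start, which is exactly why missing range support in a client SDK is a blocker for Hadoop.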

Futility of “tooling” a proprietary cloud

I’ve been pitched by a lot of entrepreneurs trying to build better-than-original “tooling” for a proprietary cloud, particularly for AWS. Isn’t the attempt futile from the beginning? Amazon is smart, innovative, and working hard to make its cloud offering comprehensive, and it has a much larger arsenal with which to outdo anyone who dares to compete on its own turf. It’s their party; an invitation cannot be taken for granted.

Let’s take NoSQL data stores and DBMS vendors as examples. There are VC-backed companies out there exclusively focused on outdoing Amazon at running MySQL/NoSQL on its own cloud – Xeround comes to mind – and many others are also hoping their product will catch fire on EC2.

Well, if branding and plain convenience alone are not enough, how about these two exclusive and “unfair” competitive advantages in Amazon’s arsenal:

  • [My unverified assumption is] that DynamoDB has storage integrated into its fabric, whereas everyone else must use slower EBS.
  • Not only is it integrated but, as announced by Amazon, it uses SSD-backed storage. SSD-backed storage is not available, as of today, to DynamoDB competitors running on AWS, so competitors must continue to use ordinary EBS. That is in fact a double kick: first, the mere fact of using different hardware for a competitive advantage; and second, the announcement itself as a catalyst to trigger migration.

So a future EMR may also have integrated storage, as well as other hardware optimizations, making Hadoop more efficient on AWS – and good if so. The same goes for RDS and other current and future PaaS-related services.

Do I accuse Amazon of wrongdoing? Of course not! They brought the cloud to the main street while others were only talking about it; they made large-scale computing affordable to all; they keep dropping prices, passing their economies-of-scale savings on to customers; they constantly optimize and enhance their infrastructure; and they have been good to their shareholders too. However, like any proprietary and monopolistic platform, they do hinder some outside-of-Amazon innovation. No matter how good they are, we don’t want only one company in the world doing cloud-infrastructure work for everyone else. That’s why OpenStack is so extremely important for the industry. If OpenStack is widely adopted, then infrastructural and “tooling” innovation can go directly into OpenStack, for the greater good and with a fairer monetization model for the author.

OpenDremel update and Dremel vs. Tenzing

I haven’t blogged for the whole of 2011… I’m not dead; quite the contrary, we were pretty active with the OpenDremel project in 2011. First, we are renaming it to Dazo to avoid using a trademarked name; second, we did a good job implementing a secure generic execution engine and integrating it into OpenStack Swift. It also turned out that the engine is actually quite a useful virtualization technology in itself, and it could potentially deserve a better fate than being buried as an OpenDremel subcomponent. So we do plan to release it as an independent project, and we are quite busy with that now; as a result, the work on OpenDremel itself is all but stalled, unfortunately. As for storage infrastructure, we settled on OpenStack Swift: we fell in love with Swift from the day it was released, and now that we have integrated ZeroVM into it we like it even more. So right now we have a fully scalable storage backend with the unique capability to run arbitrary native code securely inside it, close to the data. What’s left is to take our old Metaxa query compiler and integrate it with that backend; after many iterations it should bake into something pretty similar to Google Dremel and BigQuery. Even better, it will always process data locally (I am not sure BigQuery does that now), and it will not be limited to BQL on nested records but will support any query on any data, with full multi-tenant semantics. That’s how interesting 2011 was…

That was the preamble; now back to the main feature:

Google released a paper on Tenzing at VLDB last year. Tenzing is an SQL query system implemented on top of the MapReduce infrastructure; it can be thought of as the Google way to do Hive, and as always it is full of juicy details. There is already a quality post published on this, and another one here. On top of those, my additional takeaways are:

1. It is possible to build an MPP-grade system on top of MapReduce with relatively low latency (10 seconds). However, it requires quite a number of patches to MapReduce. Hive and Hadoop certainly have a lot to learn from Tenzing.

2. Even with Google’s patched and leaner-than-Hadoop implementation of MapReduce, getting down to Dremel latencies was not achievable. On the other hand, 10 seconds as a minimal latency is not that bad, and is in the same ballpark as Netezza/Greenplum/Aster and other MPP gear.

3. For a general Sawzall vs. Dremel vs. Tenzing comparison, there is a nice YouTube data-warehousing presentation published. In fact, Dremel beats both of them on latency, and were it not for the limited expressive power of its query language, it would end up the complete winner on all metrics considered there. Sawzall, having an imperative query language, scores highest on the power metric. I guess that when OpenDremel is released it will be a unique combination of low-latency querying with the full expressive power of imperatively augmented SQL.

4. Tenzing can query MySQL databases as well as many other popular data formats. What we are witnessing here is query engines being decoupled from storage engines. Ten years ago this was only the case in the MySQL ecosystem, and anyone who tried the Oracle external-table interface knows how unfriendly past DBMSes were to external data sources. Dremel’s columnar encoding component was released internally at Google as the separate ColumnIO storage engine. Then Google open-sourced their LevelDB key-value engine, a-la Hadoop’s RCFiles. So we can see the emergence of multiple storage engines working with multiple query engines – quite an interesting phenomenon.

5. Queries are compiled into native code (with LLVM), and this gave a significant acceleration, by a factor of six to twelve. This means that SQL-to-native-code compilation is a must for high-performance BigData query engines.
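The win from compiling queries rather than interpreting them can be illustrated even in Python. In the sketch below (illustrative only – Tenzing emits native code via LLVM, not Python), an expression tree is either walked per row or compiled once into a function, eliminating the per-row dispatch overhead:

```python
# Interpreting an expression tree: every row pays tree-walking dispatch cost.
def interpret(expr, row):
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "lit":
        return expr[1]
    if op == "+":
        return interpret(expr[1], row) + interpret(expr[2], row)
    if op == "*":
        return interpret(expr[1], row) * interpret(expr[2], row)
    raise ValueError(op)

# Compiling the same tree once removes that per-row overhead; Tenzing does
# the analogous thing with LLVM, emitting native machine code.
def compile_expr(expr):
    def emit(e):
        if e[0] == "col":
            return f"row[{e[1]!r}]"
        if e[0] == "lit":
            return repr(e[1])
        return f"({emit(e[1])} {e[0]} {emit(e[2])})"
    return eval(f"lambda row: {emit(expr)}")

tree = ("+", ("*", ("col", "a"), ("lit", 2)), ("col", "b"))  # a*2 + b
row = {"a": 10, "b": 5}
fn = compile_expr(tree)
print(interpret(tree, row), fn(row))  # 25 25
```

Both paths compute the same value; the compiled function simply front-loads the dispatch work, which is where the six-to-twelve-fold speedup in Tenzing comes from.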


Upcoming hardware renaissance era: part #2

Some examples of the upcoming hardware renaissance era:

1. Virtually all server vendors are pitching modularized data centers (MDCs) by now. MDCs are boxes resembling shipping containers that accommodate a complete virtualized data center inside. With an MDC, one just connects power, network, and chilled water and gets access to the cloud in a box. Most MDCs are fit to be deployed outdoors and have built-in protection against the weather. Of course, all current offerings are based on x86 commodity servers, but here is a hint: once competition moves to comparing whose shipping container can stuff more storage and computing power inside, and who has better price/performance and energy efficiency, we will see innovation in hardware skyrocket.

2. On the processor front… the ARM architecture has all the ingredients to become the next Intel. If I am not mistaken, ARM processors outnumber x86 ten to one, with tens of billions of processors shipped; 95% of cellphones and advanced gadgets use ARM. ARM’s power efficiency puts x86 to shame. However, until now ARM was focused on gadgets and dismissed the data-center market. Not anymore! With the newer Cortex-A15, ARM has taken aim at x86 on data-center territory. Calxeda has already raised ~$50M in venture money for commercializing ARM in the data center. ARM is not alone here, though: Tilera, with their server-vendor partner Quanta, are already shipping a 512-core server in a 2U form factor. Tilera took a lean MIPS processor core and put some 100 of them on a single die, together with eight 10Gbit Ethernet channels and four DDR3 memory channels. Nvidia has also declared that they are no longer a GPU-only vendor and are readying general-purpose processors based on the ARM architecture with an ample amount of GPU muscle inside. That said, Intel and AMD are far from stagnating either, and are moving into heterogeneous many-core designs. I think we have never witnessed more innovation in the processor space than now.

3. On the memory front… Flash is making inroads to claim space in the memory hierarchy between DRAM and HDD, disrupting the DRAM market and the high-performance 15K-RPM HDD market. I think 15K-RPM HDD and DRAM-based SSD products can already safely be declared dead. The same goes for HDDs smaller than the 2.5-inch form factor, and I think even 2.5-inch HDDs are at risk; only capacity-optimized HDDs will survive. Even without flash, DRAM has reached such capacities that most datasets fit in RAM completely – and if not in the RAM of a single server, then surely in shared cluster RAM. These solid-state advancements in DRAM and flash disrupt the storage market, especially by making high-performance SAN redundant. The only storage tomorrow’s server will need is capacity-optimized and energy-optimized. That fact, among others, forced EMC to move into computing… and to provide a complete cloud in a box instead of just storage in a box as it did in the past.

4. Networking… in my view, networking is the most stagnant hardware market here. InfiniBand is finally moving into the mainstream, and that is good. Or is it? Will it succumb to 10GbE instead? That remains to be seen; my bet is on InfiniBand, due to its architectural superiority. Network virtualization is still on whiteboards… unfortunately. So in networking there are no signs of renaissance yet, but the potential is there.

Emerging Proprietary Hardware Renaissance

I cannot count the number of times I have heard that cloud computing means innovation stagnation in the proprietary hardware business, and that with cloud computing hardware doesn’t matter anymore and will sooner or later succumb to being a boring, razor-thin-margin, oligopolistic commodity industry.

Why do folks think like that? Well… there is one reason that dominates their thinking: hardware products became components, and worst of all, they became well-standardized components. As such, certain low-wage countries can quickly master assembling them in large quantities and win the competition purely on cost; nothing else seriously matters in the component business. No one cares about a component’s extended enterprise feature set, premium brand, long list of ISV partnerships, and so on – what matters is very well-defined functionality and price, price, price. In fact, this has already happened to low-end gear like entry-level basic servers and routers.

The situation is very different for enterprise IT products. I estimate that if IT costs tripled overnight for most enterprises, it would not matter to their bottom line. So the IT departments of most enterprises are cost-insensitive regarding IT gear. They will not blindly overpay in most cases, but cost is not high on their priority list either. IT for most enterprises is a more-or-less fixed cost amortized over a very large number of products or services. I guess that if Coca-Cola’s IT costs were tripled, the price of a single can of Coke would rise by less than a cent – unlikely to be life-threatening to their business. Therefore the game was, and largely still is, to market hardware products directly to enterprise IT departments, competing on enterprise feature set rather than price.

Now, with the emerging cloud-computing paradigm, hardware products are components marketed to cloud operators, which are essentially server farmers. And they are extremely cost-sensitive: marketing premium computing gear to them will be about as successful as marketing premium booze as fuel to the owner of an alcohol-fueled car. If a cloud operator’s server costs tripled overnight, the next morning it would be out of business. So, without a doubt, it is game over for fat margins in the hardware manufacturing/assembly business.
What happened to enterprise software 5 years ago is happening to enterprise hardware right now: commoditization.

So the common thinking goes: in a boring commodity business no one is going to invest in innovation, so no one is going to invest much in proprietary hardware, because no cloud vendor is going to buy premium hardware. The only hope is the private cloud, a freshly invented loophole for continuing to sell premium gear to enterprises.

Well, let’s consider the following situation. Startup XYZ manages to build a proprietary appliance – essentially a cloud-in-the-box solution – that, through tight internal integration and optimization for one particular task, achieves an order-of-magnitude better price-performance. Let’s say it is a KV-store appliance. Would cloud operators be interested? I bet they would. From the outside it doesn’t matter whether the functionality is backed by Cassandra on generic hardware or by a custom hardware appliance. So a cloud provider quietly rolling out such appliances can compete well, both on price and on functionality (like latency), with other cloud providers running a software-based KV store on generic hardware. Another startup, ABC, may produce a computing appliance that can run Ruby-on-Rails, Java, or Python applications an order of magnitude more efficiently than generic hardware, and a cloud provider deploying such appliances from ABC could compete better at serving RoR clients. Yet another startup may build a custom rack-sized box filled with Fermi chips specifically designed for video processing. I could give more examples, but the trend is obvious: use a large chunk of dedicated hardware to do one specific task extremely efficiently, and you can have nice margins as a hardware manufacturer.

Before the cloud, hardware had to be generic, because a single enterprise server had to be able to run a variety of different workloads. With cloud computing this is no longer the case. A vendor can build a dedicated hardware appliance optimized for one specific workload and serve the whole world with it, raising high barriers for competitors.
So, despite popular belief, I think cloud computing presents unique opportunities to creative hardware engineers – though not in the premium enterprise-feature-set area as it used to be, but in the extreme-efficiency, acceleration, and specialization areas.