@ebyhr

Posts

Showing posts from 2019

Presto Conference Tokyo 2019

7/11に開催されたPresto Conference Tokyo 2019について書こう書こうと思いつつ放置していたところ、ちょうど3ヶ月後の10/11にコミッターになったので、ご報告もかねて下書きを開きました。この記事では当日話そうと思ってスライドから削った部分や最近のコミュニティについて書きたいと思います。現在 Presto Software Foundation (PSF)とPresto Foundationという2つの組織があり、前者はPrestoを最初に作り始めたクリエイター達および Starburst のメンバーを中心に、 Arm Treasure Data 、 Varada 、 Qubole などその他にも多くの企業・開発者から支持されながら運営されています。後者はFacebookを中心にTwitter, Uber, Alibabaが支持していて、Linux Foundationにホストされることが先日発表されました。こう書くとどららを選ぶべきか悩むかもしれませんが、前者の方が開発の速度は早くコミュニティが非常に活発に動いてるので、特別な理由がなければPSF側のPrestoを使用することやコミュニティへの参加をお勧めします。Facebook側のリポジトリやSlackも見るようにしているのですが、対応が遅く残念な気持ちになります。メーリングリストは両者で同じアドレスが使用されているのですが、回答者の多くはPSFのコミュニティメンバーなのでSlackで直接質問するとすぐ回答を得られます。 PSFのSlackにはこちらのページにあるリンクから参加できます。 https://prestosql.io/slack.html チャンネルは結構多くて戸惑いそうですが、個人的にお勧めするチャンネルは以下の通りです。 #troubleshooting #generalで質問しても問題ないのですがトラブル等はこちらで質問すると素早く回答を得ることができます #community-announcement ミートアップなどの情報がポストされます #dev 開発に興味がある方はぜひ！ #general-jp 日本語で気軽に話せるチャンネルです実際に開発に参加しなくても、もっと日本からコミュニティに参加してくれる方が増えてくれるとと...

Matsushima 2019

7月に休みをとって宮城の松島海岸に1泊2日で旅行してきました。いつも一人で旅行に行くときは本を持っていくのですが、今回は本屋で目にとまった「マチネの終わりに」と共に。恋愛小説を読むのは初めてでしたが、いわゆるドロドロとした内容で読んでいて辛い部分もありつつも先が気になる展開が続き、結局初日の夕ご飯前には読み切ってしまいました。作中で何度か出てくる「未来は常に過去を変えている」という文章がとても心に残っています。良かった思い出がふとしたきっかけで悲しい思い出になったり、その逆もあったりしますもんね。ギタリストと国際ジャーナリストの恋愛ということもあり、芸術、イラク情勢、原爆やサブプライムローンなど様々な話題が散りばめられていて、読んだあとに自分でもう一度学びたいと感じる本でした。11月1日には映画が公開される予定とのことで、できれば初日に観に行きたいなぁと思っています。松島湾福浦橋福浦島遊覧船からの眺め松島自体は初日は小雨が降っていて少し残念でしたが綺麗なアジサイの写真が撮れて満足です。行きの新幹線と仙台駅でご飯を食べ過ぎたこともあり早めにホテルへチェックインしてのんびりしてました。2日目は朝露天風呂に入っていたら松島湾が眩し過ぎて目がちゃんと開かないぐらいには天気が良かったです。チェックアウトしてからは海岸通りをぶらぶらして福浦島に向かいました。木が生い茂っていてちょっとしたジャングルみたいで散策を楽しめました。松島湾を1周する遊覧船に乗ろうと思ってたのですが、受付の方に仙台方面に戻るのであれば電車が少ないので塩川港まで行くルートがお勧めですよと教えていただき、そっちに乗ってみました。最後に仙台駅でお寿司を食べて新幹線で帰宅です。お寿司以外にもたくさん美味しいもの、牛タン、穴子、牡蠣、笹かまぼこ、ずんだジェラート、ずんだシェイクなどなどを食べて終始満腹でした。

Machine Learning Connector in Presto

This is quick tutorial for presto-ml connector. The connector isn't maintenanced actively and the supported model is only SVM. You can see below sample query in the test directory. As the same as Teradata Aster and BigQuery ML, there're two kinds of functions. learn_classifier: receives training data and generates the model classify: receives the model & test data and returns the prediction SELECT classify (features(1, 2), model) FROM ( SELECT learn_classifier (labels, features) AS model FROM ( VALUES (1, features(1, 2))) t(labels, features) ) t2 → 1 SELECT classify (features(1, 2), model) FROM ( SELECT learn_classifier (labels, features) AS model FROM ( VALUES ('cat', features(1, 2))) t(labels, features) ) t2 → 'cat' Let's try using Iris data sets. CREATE TABLE iris ( id int , sepal_length double , sepal_width double , petal_length double , petal_width double , species varchar ) INSERT INT...

INSERT OVERWRITE in Presto

If you are hive user and ETL developer, you may see a lot of INSERT OVERWRITE. Though it's not yet documented, Presto also supports OVERWRITE mode for partitioned table. Currently, there are 3 modes, OVERWRITE, APPEND and ERROR. OVERWRITE overwrites existing partition. APPEND appends rows in existing partition. ERROR fails when the partition already exists. You can change the mode by set session command. set session hive.insert_existing_partitions_behavior = 'overwrite'; set session hive.insert_existing_partitions_behavior = 'append'; set session hive.insert_existing_partitions_behavior = 'error'; The enhanced feature for an unpartitioned table is ongoing in this PR ( https://github.com/prestosql/presto/pull/648 ) by James Xu . The enhancement was merged as https://github.com/prestosql/presto/pull/924

MSCK in Trino

Presto SQL release 304 contains new procedure system.sync_partition_metadata() developed by @luohao . This is similar to hive's MSCK REPAIR TABLE . Document about Hive Connector Procedures is https://prestosql.io/docs/current/connector/hive.html#procedures The syntax is `system.sync_partition_metadata(schema_name, table_name, mode)`. The supported mode are add, drop and full. Example query is call system.sync_partition_metadata('default', 'test_partition', 'add'); call system.sync_partition_metadata('default', 'test_partition', 'drop'); call system.sync_partition_metadata('default', 'test_partition', 'full'); # Mode DROP hive> create table default.test_partition (c1 int) partitioned by (dt string); hive> insert overwrite table default.test_partition partition(dt = '20190101') values (1); hive> dfs -mv hdfs://hadoop-master:9000/user/hive/warehouse/test_partition/dt=20190101 /...