@ebyhr

Posts

Showing posts from April, 2019

Machine Learning Connector in Presto

This is a quick tutorial for the presto-ml connector. The connector isn't actively maintained, and SVM is the only supported model. Sample queries like the ones below can be found in the test directory. As in Teradata Aster and BigQuery ML, there are two kinds of functions:

learn_classifier: receives training data and generates the model
classify: receives the model and test data and returns the prediction

    SELECT classify(features(1, 2), model)
    FROM (
      SELECT learn_classifier(labels, features) AS model
      FROM (VALUES (1, features(1, 2))) t(labels, features)
    ) t2
    → 1

    SELECT classify(features(1, 2), model)
    FROM (
      SELECT learn_classifier(labels, features) AS model
      FROM (VALUES ('cat', features(1, 2))) t(labels, features)
    ) t2
    → 'cat'

Let's try it with the Iris data set.

    CREATE TABLE iris (
      id int,
      sepal_length double,
      sepal_width double,
      petal_length double,
      petal_width double,
      species varchar
    )

    INSERT INT...

INSERT OVERWRITE in Presto

If you are a Hive user and an ETL developer, you have probably seen a lot of INSERT OVERWRITE. Though it's not yet documented, Presto also supports an OVERWRITE mode for partitioned tables. Currently, there are three modes: OVERWRITE, APPEND and ERROR.

OVERWRITE: replaces the existing partition
APPEND: appends rows to the existing partition
ERROR: fails when the partition already exists

You can change the mode with the SET SESSION command:

    set session hive.insert_existing_partitions_behavior = 'overwrite';
    set session hive.insert_existing_partitions_behavior = 'append';
    set session hive.insert_existing_partitions_behavior = 'error';

Support for unpartitioned tables was proposed in this PR ( https://github.com/prestosql/presto/pull/648 ) by James Xu. The enhancement was merged as https://github.com/prestosql/presto/pull/924
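To make the OVERWRITE mode concrete, here is a minimal sketch of how it might be used. The table page_views, its columns, and the default schema are hypothetical names chosen for illustration, and the session property prefix assumes the Hive catalog is named hive, as in the commands above.

```sql
-- Hypothetical partitioned table; partition columns must come last.
CREATE TABLE hive.default.page_views (
    user_id bigint,
    url varchar,
    dt varchar
)
WITH (partitioned_by = ARRAY['dt']);

-- First load creates the partition dt = '2019-04-01'.
INSERT INTO hive.default.page_views VALUES (1, '/a', '2019-04-01');

-- Switch the behavior for inserts into partitions that already exist.
SET SESSION hive.insert_existing_partitions_behavior = 'overwrite';

-- Second load targets the same partition and replaces its contents.
INSERT INTO hive.default.page_views VALUES (2, '/b', '2019-04-01');

-- Inspect what the partition holds after the overwrite.
SELECT * FROM hive.default.page_views WHERE dt = '2019-04-01';
```

With the mode set to 'append' the second insert would add its row to the partition, and with 'error' it would fail because the partition already exists. Note that the statement is still a plain INSERT INTO; the overwrite behavior comes from the session property, not from separate syntax.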