Note: This article is a translation of "Bringing More to the Table: Azure and UDTF Support with Snowpark," originally published on October 19, 2021.

In June, Snowflake announced that Java UDFs and the Snowpark API were available in preview on AWS. Today we are announcing several additions to that preview: Snowpark is expanding in both regional availability and functionality.



Today's announcement is another step toward that goal. Java UDFs and the Snowpark API are now available in preview across Azure.



Support for Java UDFs is also expanding significantly: table functions are now available in preview on both AWS and Azure.



  • Return multiple rows for each input row
  • Maintain state across multiple rows
  • Return a single result for a group of rows
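
As a minimal sketch of the first capability, the handler below returns one output row per word of each input row. The class names `SplitWords` and `WordRow` are hypothetical and chosen only for illustration; the method names follow the handler shape used for Java table functions in this post.

```java
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical output row: one word per row.
class WordRow {
  public final String word;

  public WordRow(String word) {
    this.word = word;
  }
}

// Hypothetical handler that fans one input row out into many output rows.
class SplitWords {
  // Tells the system which class describes the output columns.
  public static Class<?> getOutputClass() {
    return WordRow.class;
  }

  // Called once per input row; returns zero or more rows for that input.
  public Stream<WordRow> process(String text) {
    return Arrays.stream(text.toLowerCase().split("[.,!\"\\s]+"))
        .filter(word -> !word.isEmpty())
        .map(WordRow::new);
  }

  // Nothing is held back for the end of the partition.
  public Stream<WordRow> endPartition() {
    return Stream.empty();
  }
}
```

Because every result is produced inside process(), each output row stays tied to the input row that produced it, and no state is carried between rows.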






title | author | first_line
Moby Dick | Herman Melville | Call me Ishmael.
Billy Budd, Sailor | Herman Melville | In the time before steamships, or then more frequently than now, a stroller along the docks of any considerable seaport would occasionally have his attention arrested by a group of bronzed mariners, man-of-war’s men or merchant sailors in holiday attire ashore on liberty.
Gravity’s Rainbow | Thomas Pynchon | A screaming comes across the sky.
Inherent Vice | Thomas Pynchon | She came along the alley and up the back steps the way she always used to.
A Portrait of the Artist as a Young Man | James Joyce | Once upon a time and a very good time it was there was a moocow coming down along the road and this moocow that was coming down along the road met a nicens little boy named baby tuckoo.
Ulysses | James Joyce | Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and a razor lay crossed.



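  • A static getOutputClass() method, which the system uses to discover the table function's schema. (This is needed because of Java type erasure: Stream<OutputRow> effectively compiles down to Stream<Object>. We hope to cover this in more detail another time.)
  • A default constructor, which the system calls for each partition in the query. In a scalar function, the handler method may be called concurrently and the class state is not modified; a table function's process() method, by contrast, is called sequentially for each row in a partition and may accumulate state.
  • A process() method, which takes one row, performs whatever processing is needed, and returns a possibly empty stream of results. The system then associates those results with the input row.
  • An endPartition() method, which is called at the end of the partition and returns a stream of rows. The system associates those rows with the partition as a whole rather than with any particular row in it.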
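
The lifecycle described above can be imitated locally. The sketch below is not how Snowflake invokes handlers internally; it is a hypothetical driver (`PartitionDriver.run`) that constructs one handler per partition, calls process() on each row in order, and then calls endPartition(). `LineStats` and `StatRow` are illustrative names that do not appear in the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical output row: a label and a number.
class StatRow {
  public final String label;
  public final int value;

  public StatRow(String label, int value) {
    this.label = label;
    this.value = value;
  }
}

// Hypothetical handler: emits a running character total for every row,
// plus one summary row for the whole partition.
class LineStats {
  private int totalChars = 0; // state accumulated across rows
  private int rowCount = 0;

  public static Class<?> getOutputClass() {
    return StatRow.class;
  }

  // Per-row result: belongs to the input row that produced it.
  public Stream<StatRow> process(String text) {
    rowCount++;
    totalChars += text.length();
    return Stream.of(new StatRow("running_total", totalChars));
  }

  // Per-partition result: belongs to the partition as a whole.
  public Stream<StatRow> endPartition() {
    return Stream.of(new StatRow("rows_seen", rowCount));
  }
}

// Stand-in for the system (an assumption for illustration only).
class PartitionDriver {
  static List<StatRow> run(List<String> partitionRows) {
    LineStats handler = new LineStats(); // default constructor, once per partition
    List<StatRow> out = new ArrayList<>();
    for (String row : partitionRows) {   // sequential calls, in row order
      out.addAll(handler.process(row).collect(Collectors.toList()));
    }
    out.addAll(handler.endPartition().collect(Collectors.toList()));
    return out;
  }
}
```

Running the driver on a two-row partition yields two per-row results followed by a single partition-level result, which is the split between process() and endPartition() that the list above describes.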




import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class WordCount {
  // A class to define the schema of our output: pairings of words
  // and counts.
  static class OutputRow {
    public final String word;
    public final int count;

    public OutputRow(String word, int count) {
      this.word = word;
      this.count = count;
    }
  }

  // The getOutputClass routine is what actually sets this class as
  // the return type for the table function.
  public static Class<?> getOutputClass() {
    return OutputRow.class;
  }

  // Keeps partition-wide counts for each word seen.
  private final Map<String, Integer> wordCounts;

  // Each partition will have a new instance of WordCount
  // constructed, and we are allowed to maintain state over the
  // partition. In this case, we're going to start the partition with
  // an empty map, and keep running counts as we go.
  public WordCount() {
    wordCounts = new HashMap<>();
  }

  // The process method is called on each record. In this case, we'll
  // split up the line and add the words to our partition-wide count.
  // This method could return per-row values if we wished, but in our
  // case, we'll just return an empty Stream because our results are
  // really for the partition as a whole.
  public Stream<OutputRow> process(String text) {
    // Update the counts with the words in this line.
    for (String word : text.toLowerCase().split("[.,!\"\\s]+")) {
      // If we don't have an entry for the word, set the count to 1.
      // Otherwise, increment the count.
      wordCounts.compute(word, (key, value) ->
        (value == null) ? 1 : value + 1);
    }
    // We're waiting until the end of the partition to return per-
    // word counts and return nothing here.
    return Stream.empty();
  }

  // The endPartition routine is called at the end of the partition.
  // In our case, this will return the total counts across all lines
  // in the input.
  public Stream<OutputRow> endPartition() {
    // Stream back the word counts for the whole partition. Calling
    // stream() on the keySet enables us to iterate lazily over the
    // contents of the map.
    return wordCounts.keySet().stream().map(word ->
      new OutputRow(word, wordCounts.get(word)));
  }
}


create or replace function wordcount(s string)
returns table(word string, count int)
language java
handler = 'WordCount'
as
$$
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class WordCount {
  // A class to define the schema of our output: pairings of words
  // and counts.
  static class OutputRow {
    public final String word;
    public final int count;

    public OutputRow(String word, int count) {
      this.word = word;
      this.count = count;
    }
  }

  // The getOutputClass routine is what actually sets this class as
  // the return type for the table function.
  public static Class<?> getOutputClass() {
    return OutputRow.class;
  }

  // Keeps partition-wide counts for each word seen.
  private final Map<String, Integer> wordCounts;

  // Each partition will have a new instance of WordCount
  // constructed, and we are allowed to maintain state over the
  // partition. In this case, we're going to start the partition with
  // an empty map, and keep running counts as we go.
  public WordCount() {
    wordCounts = new HashMap<>();
  }

  // The process method is called on each record. In this case, we'll
  // split up the line and add the words to our partition-wide count.
  // This method could return per-row values if we wished, but in our
  // case, we'll just return an empty Stream because our results are
  // really for the partition as a whole.
  public Stream<OutputRow> process(String text) {
    // Update the counts with the words in this line.
    for (String word : text.toLowerCase().split("[.,!\"\\s]+")) {
      // If we don't have an entry for the word, set the count to 1.
      // Otherwise, increment the count.
      wordCounts.compute(word, (key, value) ->
        (value == null) ? 1 : value + 1);
    }
    // We're waiting until the end of the partition to return per-
    // word counts and return nothing here.
    return Stream.empty();
  }

  // The endPartition routine is called at the end of the partition.
  // In our case, this will return the total counts across all lines
  // in the input.
  public Stream<OutputRow> endPartition() {
    // Stream back the word counts for the whole partition. Calling
    // stream() on the keySet enables us to iterate lazily over the
    // contents of the map.
    return wordCounts.keySet().stream().map(word ->
      new OutputRow(word, wordCounts.get(word)));
  }
}
$$;


select author, word, count
from books,
   table(wordcount(books.first_line) over (partition by author))
order by count desc;
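
To see what `over (partition by author)` implies for this query, here is a rough local imitation in plain Java. It is a sketch under assumptions, not Snowflake's execution model: `PartitionByAuthor.countWords` is a hypothetical helper that groups first lines by author and keeps a separate word-count map per group, mirroring the idea that each partition gets its own handler instance and its own state.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartitionByAuthor {
  // Input: rows of {author, first_line}. Output: author -> (word -> count).
  static Map<String, Map<String, Integer>> countWords(List<String[]> rows) {
    Map<String, Map<String, Integer>> perAuthor = new LinkedHashMap<>();
    for (String[] row : rows) {
      // One map per author plays the role of one handler instance per partition.
      Map<String, Integer> counts =
          perAuthor.computeIfAbsent(row[0], author -> new LinkedHashMap<>());
      for (String word : row[1].toLowerCase().split("[.,!\"\\s]+")) {
        if (!word.isEmpty()) {
          // Start at 1 for a new word, otherwise add 1 to the running count.
          counts.merge(word, 1, Integer::sum);
        }
      }
    }
    return perAuthor;
  }
}
```

Fed the two Thomas Pynchon first lines from the books table, this helper counts "the" four times within that author's group, and the count is kept apart from any occurrences in the other authors' lines.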





To get started with Java table functions, see the documentation. The documentation for Java UDFs and the Snowpark API is also worth reviewing.

Happy hacking!