In Feuertip #22, I introduced a really special and useful type of function: the table function. It’s a function you can query from as if it were a relational table. How cool is that?
In Feuertip #23 (this very post), I move on to an even more special type of table function: the pipelined table function.
I’ll refer to it as PTF for the duration of this post.
A PTF addresses two significant drawbacks to the “regular” table function:
- The SQL engine has to stop and wait for the table function to execute and return its entire collection before continuing. That's to be expected, but it serializes execution, which can wreak havoc on a parallel query.
- You build and return a collection. If you have lots of data, you have a big collection, which is stored in your session-specific memory, a.k.a. the Process Global Area (PGA). The PGA has limits. You could blow past the limit and make your users unhappy.
The basic idea behind a PTF is that instead of building a collection and returning it to be treated like a table, you "pipe" each piece of data back to the calling query as it is produced.
Let’s take a look at the differences between the table function and the PTF. Here’s a very simple demonstration:
```sql
create or replace type list_of_names_t is table of varchar2(100);
/

create or replace function big_brained_species
   return list_of_names_t
is
   l_list list_of_names_t := list_of_names_t ();
begin
   l_list.extend (3);
   l_list(1) := 'Octopus';
   l_list(2) := 'Raven';
   l_list(3) := 'Human';
   return l_list;
end;
/
```
And here’s that same silly function in its pipelined version:
```sql
create or replace function big_brained_species
   return list_of_names_t pipelined
is
begin
   pipe row ('Octopus');
   pipe row ('Raven');
   pipe row ('Human');
   return;
end;
/

select column_value species_name
  from table (big_brained_species ())
 order by species_name desc
/
```

```
SPECIES_NAME
------------
Raven
Octopus
Human
```
And from this very basic example, you see all the key features of a PTF:
- The function header must include the pipelined keyword
- No need to declare a local collection (and consume PGA)
- Use pipe row to send that data back to the calling query
- Execute a return statement that returns control but no data
Of course, you can pipe out more than a scalar value. In fact, pipelined table functions usually return multiple columns of information. This is often done by relying on a nested table of an object type.
Here’s an example:
```sql
create type ticker_ot as object
(
   ticker    varchar2(20),
   pricedate date,
   pricetype varchar2(1),
   price     number
);
/

create type tickers_nt as table of ticker_ot;
/

create or replace package stock_mgr
   authid definer
is
   type stocks_rc is ref cursor return stocks%ROWTYPE;
end stock_mgr;
/

create or replace function doubled_pl (rows_in stock_mgr.stocks_rc)
   return tickers_nt pipelined
   authid definer
is
   type stocks_aat is table of stocks%ROWTYPE index by pls_integer;

   l_stocks stocks_aat;
begin
   loop
      fetch rows_in bulk collect into l_stocks limit 100;
      exit when l_stocks.COUNT = 0;

      for l_row in 1 .. l_stocks.COUNT
      loop
         pipe row (ticker_ot (l_stocks(l_row).ticker,
                              l_stocks(l_row).trade_date,
                              'O',
                              l_stocks(l_row).open_price));
         pipe row (ticker_ot (l_stocks(l_row).ticker,
                              l_stocks(l_row).trade_date,
                              'C',
                              l_stocks(l_row).close_price));
      end loop;
   end loop;

   return;
end;
/
```
Performance and memory impact of PTFs
I don’t have a parallel-enabled database handy, but I can still show you the difference a PTF makes over a regular, old table function.
I’m not going to include all the code in this post. It’s available in the LiveSQL script down below. The “executive summary” is that I am using a table function to move rows from the stocks table to the tickers table, doubling the count of rows along the way. In my test block, I will simply add a “rownum < 10” where clause:
```sql
begin
   utils.initialize ('Pipelined');

   insert into tickers
      select *
        from table (doubled_pl (cursor (select * from stocks)))
       where rownum < 10;

   utils.show_results ('First 9 rows');

   utils.initialize ('Not Pipelined');

   insert into tickers
      select *
        from table (doubled_nopl (cursor (select * from stocks)))
       where rownum < 10;

   utils.show_results ('First 9 rows');
end;
/
```
Here are the results:
```
Pipelined
Ticker row count: 9
"First 9 rows" completed in: 18 centisecs; pga at: 327680 bytes

Not Pipelined
Ticker row count: 9
"First 9 rows" completed in: 517 centisecs; pga at: 1287847936 bytes
```
It should be clear what’s going on here: with the pipelined table function, each row is piped immediately back. So as soon as the calling query gets 9 rows, it stops executing the PTF and proceeds. With the non-pipelined version, the calling query has to wait and wait and wait while all 1M rows in stocks are doubled to 2M rows, then they are passed back to the query.
At which point the SQL engine says: “Just want those first 9, thanks!” and ignores all the rest. But oh, the price that is paid along the way …
That’s rather convincing, is it not? Lots less memory and lots faster to get the first 9 rows!
Always go pipelined?
So why not always use a pipelined table function? I can think of these possible reasons why not:
- You can only call it from inside the from clause of a select statement. You cannot call it directly in PL/SQL.
- You don’t need the pipelining (no parallel query, PGA consumption not an issue)
- The pipe row syntax is unusual, and you don't want to put off the future developers who will maintain your code
- A pipelined table function might be slower than a regular table function
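To make reason #1 concrete, here is a sketch using the big_brained_species PTF from earlier. Attempting to invoke a pipelined function directly in PL/SQL fails at compile time (with an error along the lines of PLS-00653, "aggregate/table functions are not allowed in PL/SQL scope"); you must go through a SELECT instead:

```sql
declare
   l_list list_of_names_t;
begin
   -- This direct call will NOT compile: a pipelined function has no
   -- collection to return to PL/SQL.
   l_list := big_brained_species ();
end;
/

-- The workaround: query the PTF in a FROM clause and BULK COLLECT the rows.
declare
   l_list list_of_names_t;
begin
   select column_value
     bulk collect into l_list
     from table (big_brained_species ());

   dbms_output.put_line (l_list.COUNT);
end;
/
```

Note that the BULK COLLECT workaround builds a collection in your session's PGA anyway, so if you need the data inside PL/SQL (not just in a query), a regular table function may be the simpler choice.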
OK, remember, I said “possible,” so first let’s examine #4 – could it be that a PTF pays a performance penalty?
I modified the previous performance test block to repeatedly query from the pipelined and non-pipelined table functions.
```sql
declare
   l_iterations integer := 10000;
   l_count      integer := 0;
begin
   utils.initialize ('Pipelined');

   for indx in 1 .. l_iterations
   loop
      for rec in (select *
                    from table (doubled_pl (
                            cursor (select * from stocks where rownum < 100))))
      loop
         l_count := l_count + 1;
      end loop;
   end loop;

   dbms_output.put_line (l_count);
   utils.show_results (l_iterations || ' invocations');

   utils.initialize ('Not Pipelined');

   for indx in 1 .. l_iterations
   loop
      for rec in (select *
                    from table (doubled_nopl (
                            cursor (select * from stocks where rownum < 100))))
      loop
         l_count := l_count + 1;
      end loop;
   end loop;

   dbms_output.put_line (l_count);
   utils.show_results (l_iterations || ' invocations');
end;
/
```
Here are the results:
```
Pipelined
1980000
Ticker row count: 0
"10000 invocations" completed in: 753 centisecs; pga at: 65536 bytes

Not Pipelined
3960000
Ticker row count: 0
"10000 invocations" completed in: 1004 centisecs; pga at: 56098816 bytes
```
Now, I confess this is a rough and incomplete performance test. Still, I believe it gives us sufficient confidence to say that you will not sacrifice performance by switching to pipelining, even if you do not benefit directly from the asynchronous return of data to the calling query.
How about the pipelining syntax being weird and off-putting to future developers?
First, I doubt it will be that much of an issue. Second, professional software developers should know how to make the most of the language they work with. If they see unfamiliar syntax, they should take it in stride and learn what that syntax does for them. In short order, it will be more than familiar enough.
LiveSQL tutorial demonstrating performance benefit
Note: the performance code will not actually compile and run in LiveSQL, because it queries v$ views (specifically, v$mystat and v$statname) to show memory allocation.
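For the curious, the PGA numbers shown above come from those two views. Here is a minimal sketch of the kind of query my utils helper package runs (assuming your session has SELECT privileges on the v$ views; the exact statistic names can vary by database version):

```sql
-- Current and peak PGA memory for the current session.
select n.name, s.value
  from v$mystat s
       join v$statname n on n.statistic# = s.statistic#
 where n.name in ('session pga memory', 'session pga memory max');
```

Run it before and after the test block and subtract to estimate how much PGA the table function consumed.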