This report explores our contributions to the Feldera Continuous Analytics Platform, aimed at enhancing its real-time data processing capabilities. Our primary advancements include the integration of advanced User-Defined Functions (UDFs) and the enhancement of SQL functionality. Specifically, we introduced Rust-based UDFs for high-performance data transformations and extended SQL to support inline table queries and aggregate functions within INSERT INTO statements. These developments significantly improve Feldera’s ability to handle complex data manipulations and transformations, making it a more versatile and powerful tool for real-time analytics. Through these enhancements, Feldera is now better equipped to support sophisticated continuous data processing needs, enabling users to execute complex analytics with greater efficiency and flexibility.
Optimizing Feldera: Integrating Advanced UDFs and Enhanced SQL Functionality for Real-Time Data Processing
DBSP: Automatic Incremental View Maintenance
Kostas Mparmparousis
University of Athens
Athens, Greece
mpkostas@uoa.gr
Panagiotis Dimakopoulos
University of Athens
Athens, Greece
panosdimako@uoa.gr
Abstract
In the evolving landscape of data processing, incremental computation plays a crucial role in optimizing performance and efficiency. The DBSP (Database Stream Processor) framework offers a comprehensive solution for incremental computation by processing data streams through a specialized language and algorithms. Building upon the principles of DBSP, the Feldera Continuous Analytics Platform (Feldera Platform) advances this capability by providing a high-performance computational engine for continuous analytics over dynamic data. Feldera allows users to configure data pipelines as standing SQL programs (DDLs), which are continuously evaluated as new data arrives, enabling real-time data analytics.

A distinguishing feature of Feldera is its ability to evaluate arbitrary SQL programs incrementally, which enhances both expressiveness and performance compared to traditional streaming engines. This functionality abstracts the complexities of querying changing data, allowing software engineers and data scientists to focus on business logic rather than the intricacies of incremental computation.

As part of this project, our contributions include enhancing the platform's User-defined Functions (UDFs) to support inline table queries and extending the functionality of INSERT INTO statements to incorporate aggregate functions. These enhancements provide users with greater flexibility and power in defining complex data transformations and analytics directly within SQL. By supporting these advanced operations, the platform further optimizes performance and scalability. Additionally, we have attempted to integrate Rust-based UDFs into the Feldera platform, in order to enable developers to leverage Rust's performance benefits directly within SQL programs.

These improvements significantly bolster Feldera's capability to handle sophisticated real-time data analytics, making it a more robust solution for continuous data processing needs.
CCS Concepts
• Information systems → Stream management; Database views; Database query processing; • Software and its engineering → Real-time systems software.
Keywords
Incremental View Maintenance, DBSP, Feldera Platform, Real-time Streaming Analytics, UDFs, Insert Into with Aggregates, Rust-Based Functions
1 Introduction
In the realm of database management, incremental view maintenance (IVM) stands as a critical challenge: maintaining the contents of a view, defined by a query over a database, efficiently as the database changes. Traditional approaches reevaluate the entire query from scratch, which becomes prohibitively expensive as the database grows. Hence, there is a need for more sophisticated methods that compute only the effect of each incremental change.
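The core idea can be shown with a toy sketch (ours, not Feldera's implementation): a per-key COUNT view maintained by applying weighted deltas, rather than rescanning the base table after every change:

```rust
use std::collections::HashMap;

// Toy incremental view: "number of rows per key". Each change to the
// base table arrives as a (key, weight) delta, with weight +1 for an
// insert and -1 for a delete, and is applied directly to the view.
fn apply_delta(view: &mut HashMap<String, i64>, key: &str, weight: i64) {
    let count = view.entry(key.to_string()).or_insert(0);
    *count += weight;
    if *count == 0 {
        view.remove(key); // a zero count means the group left the view
    }
}

fn main() {
    let mut ages = HashMap::new();
    apply_delta(&mut ages, "25", 1);  // insert a 25-year-old
    apply_delta(&mut ages, "25", 1);  // and another
    apply_delta(&mut ages, "25", -1); // one of them is deleted
    println!("{:?}", ages.get("25"));
}
```

Each delta is processed in constant time per affected group, independent of the size of the base table, which is precisely the property full re-evaluation lacks.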
This report explores a novel approach to IVM through the Data-
base Stream Processor (DBSP) framework and its application within
the Feldera Continuous Analytics Platform. DBSP leverages princi-
ples from Digital Signal Processing (DSP) to model changes over
time as streams, providing an efficient and expressive way to com-
pute incremental views.
The Feldera Continuous Analytics Platform builds upon DBSP,
offering a robust engine for continuous analytics over dynamic data
streams. It enables users to configure data pipelines as standing
SQL programs (DDLs) that are continuously evaluated with incom-
ing data, thereby facilitating real-time analytics and data-driven
decision-making.
1.1 Our Contribution
Within the context of the Feldera platform and DBSP framework,
our team has made significant contributions aimed at enhancing
data processing capabilities:
• Enhanced User-defined Functions (UDFs): We extended
UDFs to support inline table queries, enabling more com-
plex and flexible data transformations directly within SQL.
• Extended INSERT INTO Statements: We introduced sup-
port for aggregate functions in INSERT INTO statements,
allowing for sophisticated data manipulations and analyt-
ics.
• Integration of Rust-based UDFs: We aimed to advance
the platform by enabling User-defined Functions (UDFs)
to be written in Rust. This capability opens new doors for
performance-oriented functions and integrates seamlessly
into the existing SQL-to-DBSP compiler workflow.
These contributions enhance the utility and performance of the
Feldera platform, empowering users to leverage advanced data
processing techniques seamlessly. By combining theoretical foun-
dations with practical implementations, our work contributes to
the evolution of incremental computation and real-time analytics.
2 Current Implementations and Limitations in
Feldera
Feldera is a robust platform that supports various functionalities
including User-defined Functions (UDFs), INSERT INTO statements,
and Rust-based UDFs. This section explores the current capabilities
and limitations of each feature within the Feldera ecosystem.
2. Mparmparousis and Dimakopoulos
2.1 User-defined Functions (UDFs)
User-defined Functions (UDFs) in Feldera allow developers to ex-
tend SQL capabilities with custom logic. Currently, Feldera sup-
ports UDFs written in SQL, enabling complex computations and
data transformations directly within SQL queries. Here are some
key aspects of UDFs in Feldera:
2.1.1 UDFs in SQL. UDFs in SQL within Feldera can be defined
and utilized as follows:
-- Example: Define a UDF to calculate the area of a circle
CREATE FUNCTION CalculateArea(radius DECIMAL)
RETURNS DECIMAL
AS (3.14159 * radius * radius);
UDFs like CalculateArea can then be used in SQL queries:
SELECT id, CalculateArea(radius) AS area
FROM circles;
2.1.2 Limitations. While powerful, UDFs in Feldera have certain
limitations:
• They cannot contain SQL queries within their bodies.
• They are inline functions: the compiler incorporates their logic directly into the calling SQL code at every call site, which can affect performance and maintainability.
These limitations restrict the complexity and types of operations
UDFs can perform within the Feldera platform.
2.2 INSERT INTO Statements
INSERT INTO statements in Feldera are essential for adding new
data into tables. They support various forms of data insertion:
2.2.1 Supported Operations.
• Table Scans: Directly inserting data from another table.
Example:
INSERT INTO target_table (column1, column2)
SELECT source_column1, source_column2
FROM source_table;
• Value Insertion: Inserting specific values into a table.
Example:
INSERT INTO target_table (column1, column2)
VALUES (value1, value2), (value3, value4);
2.2.2 Limitations. However, there are limitations to INSERT INTO
statements in Feldera:
• They do not support aggregate functions, DISTINCT, or GROUP BY in the SELECT statement of an INSERT INTO clause.
Examples of unsupported operations:
-- Unsupported: Using DISTINCT in INSERT INTO
INSERT INTO tmp (user_age)
SELECT DISTINCT age FROM persons;
-- Unsupported: Using COUNT(*) and GROUP BY in INSERT INTO
INSERT INTO tmp (user_age)
SELECT COUNT(*) FROM persons GROUP BY age;
These restrictions ensure data integrity and align with Feldera’s
architecture but may limit certain advanced data manipulation
tasks.
2.3 UDFs in Rust
Feldera is also exploring the integration of User-defined Functions
(UDFs) written in Rust, a systems programming language known
for its performance and safety guarantees.
2.3.1 Rust-based UDFs. Here’s an example of how Rust-based
UDFs might be integrated into Feldera:
use sqllib::*;

// Average of a slice of integers; an empty slice returns 0.0 rather
// than the NaN that a 0.0 / 0.0 division would otherwise produce.
pub fn calculate_average(numbers: &[i32]) -> f64 {
    if numbers.is_empty() {
        return 0.0;
    }
    let sum: i64 = numbers.iter().map(|&n| i64::from(n)).sum();
    sum as f64 / numbers.len() as f64
}
With Feldera’s SQL-to-DBSP compiler, Rust-based UDFs can
potentially be integrated as follows:
./sql-to-dbsp program.sql --udf rust_functions.rs --output program.dbsp
This feature is under development and not yet available on
Feldera’s web platform or API.
2.4 Conclusion
In conclusion, this section has provided an overview of the cur-
rent implementations and limitations of Feldera concerning User-
defined Functions (UDFs), INSERT INTO statements, and the in-
tegration of Rust-based UDFs. Despite the constraints observed
with INSERT INTO statements and the ongoing development of
Rust-based UDFs, Feldera has demonstrated robust capabilities in
handling UDFs within SQL, particularly with our enhancement to
support inline table queries. This advancement allows for more
intricate and adaptable data transformations directly within SQL,
underscoring our contribution to extending the functionality of
UDFs in Feldera.
In the subsequent sections, we will delve deeper into each contri-
bution, detailing the methodologies employed, challenges encoun-
tered, and the impact of these enhancements within the broader
framework of Feldera.
3. DBSP: Automatic Incremental View Maintenance
3 Enhancing UDFs in Feldera
Initially, our goal was to enable robust support for SQL table queries and multi-statement capabilities within User-Defined Functions (UDFs) in Feldera. However, upon delving into Feldera's UDF compilation process, we discovered that functions are presently restricted to inline methods without accommodating intermediate representations. This realization prompted us to pivot strategically by concentrating on enhancing UDFs through the direct integration of inline table queries within SQL programs.
3.1 Understanding Feldera’s UDF Compilation
Feldera employs a proxy-based method for compiling User-Defined
Functions (UDFs), which involves creating intermediary structures
to manage user-defined logic within SQL queries. Here’s a detailed
explanation of how this approach works:
3.2 Function Definition and Compilation
When defining a UDF in Feldera, you specify:
CREATE FUNCTION fun(a type0, b type1)
RETURNS type2
AS expression;
• Function Name and Parameters: The function is named
fun, and it accepts parameters a of type type0 and b of
type type1.
• Function Body (expression): This contains the logic that
computes the result based on the input parameters.
After defining the function, Feldera sets up proxy structures to
manage its input and output:
CREATE TABLE tmp(a type0, b type1);
CREATE VIEW TMP0 AS
SELECT expression FROM tmp;
• Proxy Table (tmp): This table temporarily stores the func-
tion’s input arguments (a, b). It acts as a placeholder to
capture the values provided when the function is invoked.
• Proxy View (TMP0): This view encapsulates the execution
logic (expression) operating on the data stored in tmp,
computing the function’s output based on the input param-
eters stored within.
This proxy-based approach lets the compiler treat a UDF invocation as ordinary relational operations over tmp and TMP0, allowing UDFs to integrate directly with Feldera's SQL querying capabilities.
3.3 Proxy Relations Logic
The logic behind our inline table queries follows the established
proxy method used for inline functions, ensuring seamless integra-
tion and efficient data processing. Here’s how it works:
• We create a proxy table (COUNTUSERBYAGE_INPUT) to man-
age the function’s input arguments:
CREATE TABLE COUNTUSERBYAGE_INPUT("USERAGE" INT64);
• A proxy view (COUNTUSERBYAGE_OUTPUT) is established to
store the function’s output:
CREATE VIEW COUNTUSERBYAGE_OUTPUT AS
SELECT COUNT(1)
FROM PERSON, COUNTUSERBYAGE_INPUT
WHERE (PERSON.AGE = COUNTUSERBYAGE_INPUT.USERAGE) AND
(PERSON.PRESENT = TRUE)
GROUP BY USERAGE;
And when the function is invoked, we seamlessly integrate it
into view creation:
• We insert the arguments into the input table:
INSERT INTO COUNTUSERBYAGE_INPUT(USERAGE)
SELECT DISTINCT AGE FROM PERSON;
• Finally, we fetch the function output from the view:
CREATE VIEW PERSONAGECOUNTS AS
SELECT USERAGE AS AGE, (SELECT * FROM
COUNTUSERBYAGE_OUTPUT) AS function_output
FROM COUNTUSERBYAGE_INPUT;
This approach extends the versatility of UDFs in Feldera and streamlines the integration of complex SQL operations into view definitions.
4 INSERT INTO Statement Enhancements
During the development process, we identified a significant limita-
tion with the INSERT INTO statements in Feldera. Initially, INSERT
INTO statements were restricted to basic operations such as:
• Table Scans:
INSERT INTO table
SELECT * FROM otherTable;
• Value Insertion:
INSERT INTO table
VALUES (A, B, C), (X, Y, Z);
These limitations prevented the use of aggregate functions within
INSERT INTO statements, rendering the following operations in-
valid:
INSERT INTO TMP(USERAGE)
SELECT DISTINCT AGE FROM PERSON;
INSERT INTO TMP(USERAGE)
SELECT COUNT(*) FROM PERSON
GROUP BY AGE;
4.1 Utilizing Z-Sets for Enhanced INSERT INTO
Statements
To overcome this limitation, we leveraged the power of Z-sets
within DBSP programs. Z-sets are an abstraction that associates
each unique record with a weight indicating its frequency in the
dataset. This feature enables more sophisticated data manipulations.
For instance, a single occurrence of a record is represented as
(Joe, 25, active) -> 1, while duplicates are indicated by higher
weights, such as (Alice, 19, inactive) -> 2 for two occur-
rences.
To aggregate data based on a person’s age, we can derive a new
collection Z-set from the existing one:
Original Z-Set:
(Joe, 25, active) -> 1
(Alice, 19, inactive) -> 2
(Bob, 25, active) -> 1
Aggregated Z-Set by Age:
(25) -> 2
(19) -> 2
This approach utilizes the inherent structure of Z-sets to enable
complex data manipulation operations previously infeasible with
standard INSERT INTO statements.
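The example above can be sketched directly in code (a toy model of Z-sets for illustration, not Feldera's internal representation):

```rust
use std::collections::HashMap;

// Toy Z-set: each record maps to an integer weight giving its
// multiplicity in the collection.
type Record = (&'static str, u32, &'static str); // (name, age, status)

// Derive the "aggregated by age" Z-set by summing the weights of
// all records that share an age.
fn aggregate_by_age(zset: &HashMap<Record, i64>) -> HashMap<u32, i64> {
    let mut by_age = HashMap::new();
    for (&(_, age, _), &w) in zset {
        *by_age.entry(age).or_insert(0) += w;
    }
    by_age
}

fn main() {
    // The example Z-set from the text.
    let mut zset = HashMap::new();
    zset.insert(("Joe", 25, "active"), 1);
    zset.insert(("Alice", 19, "inactive"), 2); // weight 2: two occurrences
    zset.insert(("Bob", 25, "active"), 1);
    println!("{:?}", aggregate_by_age(&zset)); // (25) -> 2, (19) -> 2
}
```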
4.2 Supported Aggregation Functions
With these enhancements, INSERT INTO statements in Feldera
now support a range of aggregation functions, expanding their
capabilities significantly. The supported functions include:
• DISTINCT:
INSERT INTO TMP(USERAGE)
SELECT DISTINCT AGE FROM PERSON;
• COUNT(*):
INSERT INTO TMP(USERCOUNT)
SELECT COUNT(*) FROM PERSON
GROUP BY AGE;
• COUNT(column):
INSERT INTO TMP(USERAGECOUNT)
SELECT AGE, COUNT(NAME) FROM PERSON
GROUP BY AGE;
• MIN(column):
INSERT INTO TMP(MINAGE)
SELECT MIN(AGE) FROM PERSON;
• MAX(column):
INSERT INTO TMP(MAXAGE)
SELECT MAX(AGE) FROM PERSON;
• SUM(column):
INSERT INTO TMP(TOTALAGE)
SELECT SUM(AGE) FROM PERSON;
• AVG(column):
INSERT INTO TMP(AVERAGEAGE)
SELECT AVG(AGE) FROM PERSON;
4.3 Testing the New INSERT INTO TMP1 (SELECT
aggregate() FROM TMP2) Command
During testing, we encountered a limitation with the INSERT INTO
TMP1 (SELECT aggregate() FROM TMP2) command in Feldera’s
web console environment. Unfortunately, any attempt to use INSERT
INTO statements to populate tables through the web console proved
ineffective. This issue is likely a bug that may be addressed in
upcoming platform updates.
Despite this limitation, you can successfully test these features
using a compiler that translates SQL into DBSP programs.
4.3.1 Executing the Test. To evaluate the INSERT INTO with aggre-
gate functionality and review the results, execute the sql-to-dbsp
script:
cd feldera/sql-to-dbsp-compiler/SQL-compiler/
mvn clean && mvn package -DskipTests
./sql-to-dbsp insertInto/tests.sql --handles -o ../temp/src/lib.rs -q
Each aggregation result appears as a Z-set of tuples of the form Tup1::new(value) => weight, where value carries the column's data type.
Note: It’s essential to ensure that both the source column and
the target column share the same data type and are either both
nullable or non-nullable.
These enhancements make INSERT INTO statements in Feldera
more versatile, enabling the execution of complex queries and data
transformations directly within SQL. This improvement is a sig-
nificant step forward in enhancing Feldera’s capability to handle
real-time data analytics and continuous data processing.
5 Rust-Based UDF Integration
The objective was to enhance the existing pipeline manager by
introducing an API feature enabling users to create and compile
SQL functions using Rust, through a User-Defined Function (UDF)
mechanism. Here are the steps and changes made:
5.1 UDF Request and Response Structures
During the enhancement process, a new file, udf.rs, was intro-
duced to define two critical structures: UdfRequest and UdfResponse.
The UdfRequest structure captures essential details about the user-
defined function (UDF), such as its name and the corresponding
Rust code that implements its logic. Meanwhile, the UdfResponse
structure provides feedback to users regarding the status of their
UDF creation request, signaling success or any encountered errors.
These structures play a fundamental role in facilitating seamless
interaction between the client and server for UDF operations.
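A minimal sketch of the two structures follows, with hypothetical field names (the report specifies only that the request carries the UDF's name and its Rust code; the real udf.rs presumably also derives serde's Serialize/Deserialize so the types can travel as JSON):

```rust
// Hypothetical sketch of the structures described above.
pub struct UdfRequest {
    pub name: String, // used to name the generated source file
    pub code: String, // Rust implementation of the UDF body
}

pub struct UdfResponse {
    pub success: bool,   // did creation and compilation succeed?
    pub message: String, // status text or compiler error output
}

fn main() {
    let req = UdfRequest {
        name: "calculate_average".to_string(),
        code: "pub fn calculate_average(xs: &[i32]) -> f64 { 0.0 }".to_string(),
    };
    let resp = UdfResponse {
        success: true,
        message: format!("UDF {} created", req.name),
    };
    println!("{}: {}", resp.success, resp.message);
}
```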
5.2 Implementing UDF Creation Endpoint
In the udf.rs file, we implemented the create_udf function to
manage the creation of User Defined Functions (UDFs). This func-
tion executes several key steps:
(1) Writing the UDF Definition to a File: The function ex-
tracts the UDF name and definition from the UdfRequest
structure and saves this information in a file named after
the UDF.
(2) Executing an External Command: This command com-
piles the SQL function along with its corresponding Rust
implementation, seamlessly integrating the new UDF into
the existing system.
(3) Providing Feedback: Depending on the outcome of the
command execution, the function delivers either a suc-
cess response or an error response encapsulated within
the UdfResponse structure. This feedback informs users
whether the UDF creation process was successful or en-
countered any errors.
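The three steps above can be sketched as follows (a simplified illustration, not the actual udf.rs: the file layout is an assumption, and the external compiler invocation in step 2 is left as a comment since the binary is not generally installed):

```rust
use std::fs;
use std::path::PathBuf;

fn create_udf(name: &str, code: &str) -> Result<PathBuf, String> {
    // 1. Write the UDF definition to a file named after the function
    //    (a temp directory keeps this sketch self-contained).
    let path = std::env::temp_dir().join(format!("{name}.rs"));
    fs::write(&path, code).map_err(|e| e.to_string())?;

    // 2. Compile it together with the SQL program, roughly:
    //      ./sql-to-dbsp program.sql --udf <path> ...
    //    via std::process::Command; on a non-zero exit status, return
    //    Err carrying the compiler output.

    // 3. Success: the caller wraps the result in a success UdfResponse.
    Ok(path)
}

fn main() {
    let p = create_udf("my_udf", "pub fn my_udf() -> i32 { 1 }").unwrap();
    println!("wrote {}", p.display());
}
```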
5.3 Route Configuration
mod.rs serves as the main module file in our Rust project, func-
tioning as the entry point for defining and managing the project’s
modules and routes. It consolidates and configures various applica-
tion components such as API endpoints, middleware, and services.
By centralizing configuration in mod.rs, we maintain routing logic
and module definitions in a unified location, enhancing project
manageability and scalability.
In mod.rs, we updated the route configuration to include a new
endpoint for UDF creation, involving the following steps:
(1) Adding the init_routes Function: This function centralizes route configuration in one place.
(2) Configuring the /udf Endpoint: Within init_routes,
we incorporated the route for the /udf endpoint using
web::post. This ensures that the UDF creation functional-
ity is accessible via a POST request.
(3) Including UDF Creation Endpoint: We integrated the
create_udf function from udf.rs into init_routes to
handle requests directed to the /udf endpoint.
These updates ensure seamless integration of the new UDF cre-
ation feature into the application’s routing logic, enabling users to
add custom SQL functions implemented in Rust via API access.
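The routing idea can be sketched without the web framework (the real mod.rs registers handlers with actix-web, using web::post to route to create_udf; here a plain map stands in for the framework's routing table):

```rust
use std::collections::HashMap;

// Crate-free stand-in for the routing table: (HTTP method, path)
// mapped to a handler name.
fn init_routes() -> HashMap<(&'static str, &'static str), &'static str> {
    let mut routes = HashMap::new();
    routes.insert(("POST", "/udf"), "create_udf"); // the new UDF endpoint
    routes
}

fn main() {
    let routes = init_routes();
    println!("{}", routes[&("POST", "/udf")]);
}
```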
5.4 Server Setup
Finally, in mod.rs, we configured the server to initialize routes and
start listening for incoming requests on port 8080. This process
included:
(1) Initializing Routes: We added the init_routes function
to configure API endpoints. This included setting up the
new /udf endpoint specifically for UDF creation.
(2) Starting the Server: Implemented the start_server func-
tion to establish and run the Actix Web server. This function
binds the server to port 8080, ensuring it listens for incom-
ing requests and processes them accordingly.
These configurations enable the server to effectively handle
requests and manage the new functionalities seamlessly.
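As a crate-free illustration of the bind-and-listen step (the actual server is an actix_web::HttpServer configured with init_routes and bound to port 8080):

```rust
use std::net::TcpListener;

// Bind a TCP listener; passing port 0 asks the OS for any free port,
// so this sketch runs anywhere. The real server passes 8080.
fn bind_server(port: u16) -> std::io::Result<TcpListener> {
    TcpListener::bind(("127.0.0.1", port))
}

fn main() {
    let listener = bind_server(0).expect("bind failed");
    println!("listening on {}", listener.local_addr().unwrap());
}
```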
5.5 Other minor additions
program.rs
In the program.rs file, the focus is on managing program-related
API endpoints. Changes were implemented to introduce User-Defined
Function (UDF) handling capabilities, seamlessly integrating these
new features into the existing API structure. This included:
• Adding necessary imports and dependencies to support
UDF functions.
• Ensuring the system can compile and manage user-defined
SQL functions effectively.
service.rs
The service.rs file oversees service-related operations and
configurations within the API. Updates were applied to ensure
compatibility with UDF creation and management. Key adjustments
included:
• Integrating UDF functionalities with existing service oper-
ations.
• Adapting service endpoints and handlers to accommodate
UDF-related requests.
These changes were essential for maintaining a cohesive service
management system while incorporating the new UDF features.
error.rs
In error.rs, which defines the API’s error handling mecha-
nisms and custom error responses, extensions were made to cover
potential UDF-related errors. Specific enhancements included:
• Adding error messages and types for UDF creation and
compilation failures.
• Enhancing the error handling infrastructure to effectively
manage new UDF operations.
These improvements ensure that errors related to the expanded UDF
functionality are captured and communicated effectively within
the API.
5.6 Alternative Approach
An alternative approach involves reading the UDF declaration and the accompanying Rust code directly from a JSON request. This method
simplifies the process by embedding the UDF’s Rust code and SQL
definition within the API request, potentially streamlining develop-
ment and deployment workflows. However, due to time constraints,
this approach has not been fully explored or implemented in the
current version. It is documented in the openapi.json file, high-
lighting its potential to enhance flexibility and efficiency in inte-
grating custom Rust logic into SQL programs. Further exploration
and development are needed to fully realize its benefits.
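Under this approach a single request body would carry both pieces. The field names below are hypothetical; the exact schema is documented in openapi.json:

```json
{
  "name": "calculate_average",
  "declaration": "CREATE FUNCTION calculate_average(...) RETURNS DOUBLE AS ...",
  "code": "pub fn calculate_average(numbers: &[i32]) -> f64 { ... }"
}
```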
5.7 Challenges and Future Directions
Despite successfully compiling the code, the newly implemented
API feature for UDF creation did not function as intended. The
process involved significant changes, including:
• Creating new request and response structures.
• Implementing the UDF creation logic.
• Configuring routes and server settings.
However, due to time constraints, we were unable to fully
troubleshoot and resolve the issues preventing the feature from
running correctly. Future work will focus on debugging the API
endpoint and ensuring the UDF functionality integrates seamlessly
into the system. Once functional, this addition would make Feldera markedly more user-friendly and versatile.
6 Conclusion
Given the enhancements and advancements made to the Feldera
Continuous Analytics Platform, particularly in the areas of User-
defined Functions (UDFs), INSERT INTO statements, and the inte-
gration of Rust-based UDFs, it is evident that these developments
significantly bolster the platform’s capability for real-time data
analytics and continuous data processing.
The introduction of enhanced UDFs, supporting inline table
queries and expanding INSERT INTO statements to include ag-
gregate functions, represents a crucial leap forward in functional-
ity. These features empower users to perform more complex data
transformations directly within SQL, streamlining workflows and
enhancing overall efficiency.
Moreover, the potential integration of Rust-based UDFs intro-
duces a new dimension of performance optimization, leveraging
Rust’s capabilities for high-performance computing directly within
SQL programs. This integration not only enhances computational
efficiency but also broadens the scope of applications that can ben-
efit from Feldera’s analytical capabilities.
In conclusion, these enhancements underscore Feldera’s com-
mitment to innovation in data processing technologies, offering a
robust platform capable of meeting the demands of modern data-
driven enterprises. By combining theoretical advancements with
practical implementations, Feldera continues to pave the way for
more sophisticated and efficient data analytics solutions.