Developer Experience
With tech stacks becoming increasingly diverse and AI and automation continuing to take over everyday tasks and manual workflows, the tech industry at large is experiencing a heightened demand to support engineering teams. As a result, the developer experience is changing faster than organizations can consciously maintain. We can no longer rely on DevOps practices or tooling alone — there is even greater power recognized in improving workflows, investing in infrastructure, and advocating for developers' needs. This nuanced approach brings developer experience to the forefront, where devs can begin to regain control over their software systems, teams, and processes. We are happy to introduce DZone's first-ever Developer Experience Trend Report, which assesses where the developer experience stands today, including team productivity, process satisfaction, infrastructure, and platform engineering. Taking all perspectives, technologies, and methodologies into account, we share our research and industry experts' perspectives on what it means to effectively advocate for developers while simultaneously balancing quality and efficiency. Come along with us as we explore this exciting chapter in developer culture.
Testing the various elements on a webpage, such as headings and links, can be challenging: traditional code-based tests quickly become complex and brittle. I am not a huge fan of using screenshots for testing, either, as they can break with minor layout changes. Playwright's ARIA Snapshots offer a powerful method to validate the accessibility tree of your web applications, ensuring that content is both accessible and correctly structured. By capturing the hierarchical representation of accessible elements, you can compare snapshots over time to detect unintended changes. This article delves into how to utilize ARIA Snapshots for content validation, explores the features provided by Playwright's recorder, and provides practical examples to enhance your testing strategy.

Playwright also offers a feature to streamline updating expectations in test cases: running npx playwright test --update-snapshots creates a rebaseline — a git patch file — to apply the necessary changes. This automates the adjustment of expectations when fundamental changes occur in the application.

Understanding ARIA Snapshots

An ARIA Snapshot in Playwright provides a YAML representation of a page's accessibility tree. This snapshot details the roles, names, and attributes of accessible elements, reflecting the structure that assistive technologies perceive. By comparing these snapshots, you can verify that the page's accessible structure remains consistent or meets defined expectations.

Plain Text

- banner:
  - heading /Playwright enables reliable/ [level=1]
  - link "Get started"
  - link "Star microsoft/playwright on GitHub"
- main:
  - img "Browsers (Chromium, Firefox, WebKit)"
  - heading "Any browser • Any platform • One API"

In this snapshot, each node represents an accessible element, with indentation indicating hierarchy. The format includes the role, accessible name, and any pertinent attributes.

Validating Page Content With ARIA Snapshots

To validate page content, Playwright provides the expect(locator).toMatchAriaSnapshot() assertion method. This method compares the current accessibility tree of a specified locator against a predefined ARIA snapshot template.

Example:

JavaScript

await expect(page.locator('body')).toMatchAriaSnapshot(`
  - heading "Welcome to Our Site" [level=1]
  - navigation:
    - link "Home"
    - link "About"
    - link "Contact"
`);

In this example, the test verifies that the body of the page contains a level 1 heading with the text "Welcome to Our Site" and a navigation section with links to "Home," "About," and "Contact."

Aria snapshot | Accessibility | Chrome DevTools

Partial Matching

ARIA Snapshots support partial matching, allowing you to validate specific parts of the accessibility tree without requiring an exact match. This is particularly useful for dynamic content.

Example:

JavaScript

await expect(page.locator('nav')).toMatchAriaSnapshot(`
  - link "Home"
  - link
  - link "Contact"
`);

Here, the test checks that the navigation contains links to "Home" and "Contact," with the middle link's accessible name being irrelevant to the test.

Updating ARIA Snapshots

When using the Playwright test runner (@playwright/test), you can automatically update snapshots by running tests with the --update-snapshots flag:

Shell

npx playwright test --update-snapshots

This command regenerates snapshots for assertions, including ARIA snapshots, replacing outdated ones. It's useful when application structure changes require new snapshots as a baseline.
Note that Playwright will wait for the maximum expected timeout specified in the test runner configuration to ensure the page is settled before taking the snapshot. It might be necessary to adjust the --timeout if the test hits the timeout while generating snapshots.

Utilizing Playwright's Recorder Features

Playwright's recorder is a tool that allows you to generate tests by recording your interactions with a web application. It captures clicks, typing, submissions, and navigation, generating code that can be used for testing.

Key Features

- Action recording – Captures user interactions to generate test scripts.
- Live code generation – Displays the generated code in real time as actions are performed.
- Editing capabilities – Allows deletion and reordering of recorded actions.
- Cross-browser support – Compatible with multiple browsers for comprehensive testing.

Example Workflow

1. Start recording. Launch the recorder to begin capturing interactions.
2. Perform actions. Navigate through the application, performing actions such as clicking buttons or entering text.
3. Review generated code. Observe the real-time generation of code corresponding to your actions.
4. Edit as needed. Modify the recorded actions by deleting unnecessary steps or reordering them.
5. Export and use. Save the generated code for integration into your testing framework.

Practical Examples

Example 1: Validating a Login Form

Suppose you have a login form with username and password fields and a submit button.

ARIA Snapshot template:

Plain Text

- form:
  - textbox "Username"
  - textbox "Password"
  - button "Log In"

Test implementation:

JavaScript

await expect(page.locator('form#login')).toMatchAriaSnapshot(`
  - textbox "Username"
  - textbox "Password"
  - button "Log In"
`);

This test ensures that the login form contains the appropriate fields and buttons with the correct accessible names.

Example 2: Validating a Dynamic List

Consider a dynamic list where items can be added or removed.

ARIA Snapshot template:

Plain Text

- list:
  - listitem "Item 1"
  - listitem "Item 2"
  - listitem "Item 3"

Test implementation:

JavaScript

await expect(page.locator('ul#dynamic-list')).toMatchAriaSnapshot(`
  - listitem "Item 1"
  - listitem "Item 2"
  - listitem "Item 3"
`);

This test verifies that the list contains three items with the specified accessible names.

Conclusion

ARIA Snapshots in Playwright provide a robust mechanism for validating the accessible structure of your web applications. Using the recorder's features, you can efficiently generate and manage tests, ensuring that your content remains accessible and correctly structured. Incorporating these tools into your testing workflow enhances the reliability and inclusivity of your applications.
Fraud detection has become a top priority for businesses across industries. With fraudulent activities growing more sophisticated, traditional rule-based approaches often fall short in addressing the constantly evolving threats. Detecting and preventing fraud — be it financial scams, identity theft, or insurance fraud — is crucial, especially when global fraud losses run into billions of dollars annually. This guide discusses how deep learning can improve fraud detection with AWS SageMaker and AWS Glue.

Deep learning, a branch of machine learning, excels at uncovering complex patterns in large datasets and adapting to emerging fraud techniques. AWS SageMaker, a comprehensive machine learning platform, equips businesses with the tools to build, train, and deploy advanced deep learning models, while AWS Glue streamlines data preparation and integration with its fully managed ETL capabilities. Together, these AWS services enable organizations to create end-to-end fraud detection pipelines that are robust, scalable, and seamlessly fit within existing data ecosystems.

In this document, I'll walk you through the process of developing a powerful fraud detection system using deep learning on AWS. We'll cover best practices, common challenges, and real-world applications to help you build an efficient and future-ready solution.

AWS SageMaker for Deep Learning With XGBoost

AWS SageMaker offers a powerful platform for developing machine learning models, and when it comes to fraud detection, XGBoost (Extreme Gradient Boosting) stands out as a particularly effective algorithm. Let's dive into how SageMaker and XGBoost work together to create robust fraud detection systems.

XGBoost on AWS SageMaker is a popular machine learning algorithm known for its speed and performance, especially in fraud detection scenarios. SageMaker provides a built-in XGBoost algorithm optimized for the AWS ecosystem.

- High accuracy. XGBoost often outperforms other algorithms in binary classification tasks like fraud detection.
- Handles imbalanced data. Fraud cases are typically rare, and XGBoost can handle this imbalance effectively.
- Feature importance. XGBoost provides insights into which features are most crucial for detecting fraud.
- Scalability. SageMaker's implementation can handle large datasets efficiently.

Architecture

- Data sources – Various input streams, including transactional databases, log files, and real-time data feeds that provide raw information for fraud detection.
- AWS Glue – A fully managed ETL (Extract, Transform, Load) service that catalogs your data, cleans it, enriches it, and moves it reliably between various data stores and data streams.
- Amazon S3 (processed data) – Highly scalable object storage service that stores both the processed data from Glue and the model artifacts from SageMaker.
- AWS SageMaker (XGBoost model) – A fully managed machine learning platform that will be leveraged to build, train, and deploy the XGBoost machine learning model.
- SageMaker endpoints – A scalable compute environment for deploying the model for real-time fraud detection.
- API Gateway – Acts as a protection layer in front of the model deployed in the SageMaker endpoint. It helps secure calls, track requests, and manage model deployments with minimal impact on the consuming application.
- Fraud detection application – A custom application or serverless function that integrates the deployed model into the business workflow, making fraud predictions on new transactions.
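To make the last two architecture components (the SageMaker endpoint and the fraud detection application) more concrete, here is a minimal sketch of how an application layer might request a real-time prediction once the model is deployed later in this article. It is an illustration, not the article's production code: the endpoint name, the feature values, and the 0.5 decision threshold are placeholders you would replace with your own.

Python

import boto3

# Hypothetical endpoint name -- replace with the name of your deployed SageMaker endpoint.
ENDPOINT_NAME = "fraud-detection-xgboost-endpoint"

runtime = boto3.client("sagemaker-runtime")

def score_transaction(features):
    """Send one transaction's engineered features to the endpoint and return the fraud probability."""
    # The built-in XGBoost container accepts CSV rows of numeric features.
    payload = ",".join(str(value) for value in features)
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=payload,
    )
    # With the binary:logistic objective, the response body is a probability between 0 and 1.
    return float(response["Body"].read().decode("utf-8"))

# Example features: amount, amount_zscore, hour_of_day, day_of_week, card_type_encoded
probability = score_transaction([123.45, 0.12, 9, 7, 1])
decision = "flag for review" if probability > 0.5 else "allow"
print(f"Fraud probability: {probability:.4f} -> {decision}")

In the architecture above, this call would typically sit behind API Gateway (for example, inside a Lambda function) rather than being made directly from the consuming application.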
Implementation

Here is a small snippet of the data that I am considering for training the ML model, in CSV format. The data has the following columns:

- transaction_id – Unique identifier for each transaction
- customer_id – Identifier for the customer making the transaction
- merchant_id – Identifier for the merchant receiving the payment
- card_type – Type of card used (credit or debit)
- amount – Transaction amount
- transaction_date – Date and time of the transaction
- is_fraud – Binary indicator of whether the transaction is fraudulent (1) or not (0)

transaction_id,customer_id,merchant_id,card_type,amount,transaction_date,is_fraud
1001,C5678,M345,credit,123.45,2023-04-15 09:30:15,0
1002,C8901,M567,debit,45.67,2023-04-15 10:15:22,0
1003,C2345,M789,credit,789.01,2023-04-15 11:45:30,1
1004,C6789,M123,credit,56.78,2023-04-15 13:20:45,0
1005,C3456,M234,debit,234.56,2023-04-15 14:55:10,0
1006,C9012,M456,credit,1234.56,2023-04-15 16:30:05,1
1007,C4567,M678,debit,23.45,2023-04-15 17:45:30,0
1008,C7890,M890,credit,345.67,2023-04-15 19:10:20,0
1009,C1234,M012,credit,567.89,2023-04-15 20:25:15,0
1010,C5678,M345,debit,12.34,2023-04-15 21:40:55,0

The Glue ETL script in PySpark below demonstrates how I processed and prepared data for fraud detection using XGBoost in SageMaker. I am using transactional data and performing some basic data cleaning and feature engineering. Remember to replace the bucket name in the code snippet below. Also, the source database and table names are as configured in the Glue crawler:

Python

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from pyspark.sql.functions import col, when, to_timestamp, hour, dayofweek, mean, stddev

# Initialize the Glue context
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read input data
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database = "fraud_detection_db",   # Database name configured in the Glue crawler
    table_name = "raw_transactions"    # Table name configured in the Glue crawler
)

# Convert to DataFrame for easier manipulation
df = datasource0.toDF()

# Data cleaning
df = df.dropDuplicates().na.drop()

# Feature engineering
df = df.withColumn("amount", col("amount").cast("double"))
df = df.withColumn("timestamp", to_timestamp(col("transaction_date"), "yyyy-MM-dd HH:mm:ss"))
df = df.withColumn("hour_of_day", hour(col("timestamp")))
df = df.withColumn("day_of_week", dayofweek(col("timestamp")))

# Encode categorical variables
df = df.withColumn("card_type_encoded", when(col("card_type") == "credit", 1).otherwise(0))

# Calculate statistical features
df = df.withColumn("amount_zscore",
    (col("amount") - df.select(mean("amount")).collect()[0][0]) /
    df.select(stddev("amount")).collect()[0][0]
)

# Select final features for model training
final_df = df.select(
    "transaction_id",
    "amount",
    "amount_zscore",
    "hour_of_day",
    "day_of_week",
    "card_type_encoded",
    "is_fraud"  # this is my target variable
)

# Write the processed data back to S3
output_dir = "s3://<bucket_name>/processed_transactions/"
glueContext.write_dynamic_frame.from_options(
    frame = DynamicFrame.fromDF(final_df, glueContext, "final_df"),
    connection_type = "s3",
    connection_options = {"path": output_dir},
    format = "parquet"
)

job.commit()

Now, once the processed data is stored in S3, the next step will be to use a SageMaker Jupyter notebook to train the model.
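Before moving on to the notebook, you may want to confirm that the Glue job actually wrote the Parquet output where the training step expects it. The snippet below is an optional sanity check, sketched with boto3; the bucket name is a placeholder for the same <bucket_name> used above, and the prefix matches the Glue job's output path.

Python

import boto3

# Placeholder -- use the same bucket you configured in the Glue job's output path.
BUCKET = "<bucket_name>"
PREFIX = "processed_transactions/"

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)

# List the Parquet part files and their sizes so you know the ETL job produced output.
for obj in response.get("Contents", []):
    print(f"{obj['Key']}  ({obj['Size']} bytes)")

If the listing comes back empty, check the Glue job run status before proceeding to training.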
Again, remember to replace your bucket name in the code.

Python

import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.inputs import TrainingInput
from sagemaker.serializers import CSVSerializer
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Set up the SageMaker session
role = get_execution_role()
session = sagemaker.Session()
bucket = session.default_bucket()
prefix = 'fraud-detection-xgboost'

# Specify the S3 location of your processed data
s3_data_path = 's3://<bucket_name>/processed_transactions/'

# Read the data from S3
df = pd.read_parquet(s3_data_path)

# Split features and target
X = df.drop('is_fraud', axis=1)
y = df['is_fraud']

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Combine target and features for SageMaker
# (the built-in XGBoost algorithm expects the label in the first column for CSV input)
train_data = pd.concat([y_train.reset_index(drop=True), pd.DataFrame(X_train_scaled)], axis=1)
test_data = pd.concat([y_test.reset_index(drop=True), pd.DataFrame(X_test_scaled)], axis=1)

# Save the data locally and upload it to S3
train_data.to_csv('train.csv', index=False, header=False)
test_data.to_csv('test.csv', index=False, header=False)
train_path = session.upload_data(path='train.csv', bucket=bucket, key_prefix=f'{prefix}/train')
test_path = session.upload_data(path='test.csv', bucket=bucket, key_prefix=f'{prefix}/test')

# Set up the XGBoost estimator
container = sagemaker.image_uris.retrieve('xgboost', session.boto_region_name, version='1.0-1')
xgb = sagemaker.estimator.Estimator(container,
                                    role,
                                    instance_count=1,
                                    instance_type='ml.m5.xlarge',
                                    output_path=f's3://<bucket_name>/{prefix}/output',
                                    sagemaker_session=session)

# Set hyperparameters
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        objective='binary:logistic',
                        num_round=100)

# Train the model
xgb.fit({'train': TrainingInput(train_path, content_type='text/csv'),
         'validation': TrainingInput(test_path, content_type='text/csv')})

# Deploy the model
xgb_predictor = xgb.deploy(initial_instance_count=1,
                           instance_type='ml.m4.xlarge',
                           serializer=CSVSerializer())

# Test the deployed model
test_data_np = test_data.drop('is_fraud', axis=1).values.astype('float32')
predictions = np.array(xgb_predictor.predict(test_data_np).decode('utf-8').split(','), dtype=float)

# Evaluate the model
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = [1 if p > 0.5 else 0 for p in predictions]

print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"Precision: {precision_score(y_test, y_pred)}")
print(f"Recall: {recall_score(y_test, y_pred)}")
print(f"F1 Score: {f1_score(y_test, y_pred)}")

# Clean up
xgb_predictor.delete_endpoint()

Result

Here is the output of the test run on the model.

Plain Text

Accuracy: 0.9982
Precision: 0.9573
Recall: 0.8692
F1 Score: 0.9112

Let me break down what these metrics mean in the context of fraud detection:

- Accuracy: 0.9982 (99.82%) – This high accuracy might look impressive, but in fraud detection, it can be misleading due to class imbalance. Most transactions are not fraudulent, so even always predicting "not fraud" could give high accuracy.
- Precision: 0.9573 (95.73%) – This indicates that when the model predicts a transaction is fraudulent, it's correct 95.73% of the time. High precision is important to minimize false positives, which could lead to legitimate transactions being blocked.
- Recall: 0.8692 (86.92%) – This shows that the model correctly identifies 86.92% of all actual fraudulent transactions.
While not as high as precision, it's still good. In fraud detection, we often prioritize recall to catch as many fraudulent transactions as possible.
- F1 Score: 0.9112 – This is the harmonic mean of precision and recall, providing a balanced measure of the model's performance. An F1 score of 0.9112 indicates a good balance between precision and recall.

These metrics suggest a well-performing model that's good at identifying fraudulent transactions while minimizing false alarms. However, there's always room for improvement, especially in recall. In a real-world scenario, you might want to adjust the model's threshold to increase recall, potentially at the cost of some precision, depending on the specific business requirements and the cost associated with false positives versus false negatives.

Conclusion

The implementation of a deep learning-based fraud detection system using AWS SageMaker and Glue represents a significant leap forward in the fight against financial fraud. Using XGBoost algorithms and the scalability of cloud computing, businesses can now detect and prevent fraudulent activities with unprecedented accuracy and efficiency. Throughout this document, we've explored the intricate process of building such a system, from data preparation with AWS Glue to model training and deployment with SageMaker. The architecture we've outlined has shown how the combination of AWS SageMaker, Glue, and XGBoost offers a formidable weapon in the arsenal against fraud.
Hey, DZone Community! We have an exciting year ahead of research for our beloved Trend Reports. And once again, we are asking for your insights and expertise (anonymously if you choose) — readers just like you drive the content we cover in our Trend Reports. Check out the details for our research survey below.

Comic by Daniel Stori

Generative AI Research

Generative AI is revolutionizing industries, and software development is no exception. At DZone, we're diving deep into how GenAI models, algorithms, and implementation strategies are reshaping the way we write code and build software. Take our short research survey (~10 minutes) to contribute to our latest findings. We're exploring key topics, including:

- Embracing generative AI (or not)
- Multimodal AI
- The influence of LLMs
- Intelligent search
- Emerging tech

And don't forget to enter the raffle for a chance to win an e-gift card of your choice!

Join the GenAI Research

Over the coming month, we will compile and analyze data from hundreds of respondents; results and observations will be featured in the "Key Research Findings" of our Trend Reports. Your responses help inform the narrative of our Trend Reports, so we truly cannot do this without you. Stay tuned for each report's launch and see how your insights align with the larger DZone Community. We thank you in advance for your help!

—The DZone Content and Community team
Scalability is a fundamental concept in both technology and business that refers to the ability of a system, network, or organization to handle a growing volume of requests, or its capacity to grow. This characteristic is crucial for maintaining performance and efficiency as demand increases. In this article, we will explore the definition of scalability, its importance, types, methods to achieve it, and real-world examples.

What Is Scalability in System Design?

Scalability encompasses the capacity of a system to grow and manage increasing workloads without compromising performance. This means that as user traffic, data volume, or computational demands rise, a scalable system can maintain or even enhance its performance. The essence of scalability lies in its ability to adapt to growth without necessitating a complete redesign or significant resource investment.

Why This Is Important

- Managing growth. Scalable systems can efficiently handle more users and data without sacrificing speed or reliability. This is particularly important for businesses aiming to expand their customer base.
- Performance enhancement. By distributing workloads across multiple servers or resources, scalable systems can improve overall performance, leading to faster processing times and better user experiences.
- Cost-effectiveness. Scalable solutions allow businesses to adjust resources according to demand, helping avoid unnecessary expenditures on infrastructure.
- Availability assurance. Scalability ensures that systems remain operational even during unexpected spikes in traffic or component failures, which is essential for mission-critical applications.
- Encouraging innovation. A scalable architecture supports the development of new features and services by minimizing infrastructure constraints.

Types of Scalability in General

- Vertical scaling (scaling up). This involves enhancing the capacity of existing hardware or software components. For example, upgrading a server's CPU or adding more RAM allows it to handle increased loads without changing the overall architecture.
- Horizontal scaling (scaling out). This method involves adding more machines or instances to distribute the workload. For instance, cloud services allow businesses to quickly add more servers as needed.

Challenges

- Complexity. Designing scalable systems can be complex and may require significant planning and expertise.
- Cost. Initial investments in scalable technologies can be high, although they often pay off in the long run through improved efficiency.
- Performance bottlenecks. As systems scale, new bottlenecks may emerge that need addressing, such as database limitations or network congestion.

Scalability in Spring Boot Projects

Scalability refers to the ability of an application to handle growth — whether in terms of user traffic, data volume, or transaction loads — without compromising performance. In the context of Spring Boot, scalability can be achieved through both vertical scaling (enhancing existing server capabilities) and horizontal scaling (adding more instances of the application).

Key Strategies

Microservices Architecture

- Independent services. Break your application into smaller, independent services that can be developed, deployed, and scaled separately. This approach allows for targeted scaling; if one service experiences high demand, it can be scaled independently without affecting others.
- Spring Cloud integration. Utilize Spring Cloud to facilitate microservices development.
It provides tools for service discovery, load balancing, and circuit breakers, enhancing resilience and performance under load.

Asynchronous Processing

Implement asynchronous processing to prevent thread blockage and improve response times. Utilize features like CompletableFuture or message queues (e.g., RabbitMQ) to handle long-running tasks without blocking the main application thread.

Asynchronous processing allows tasks to be executed independently of the main program flow. This means that tasks can run concurrently, enabling the system to handle multiple operations simultaneously. Unlike synchronous processing, where tasks are completed one after another, asynchronous processing helps in reducing idle time and improving efficiency. This approach is particularly advantageous for tasks that involve waiting, such as I/O operations or network requests. By not blocking the main execution thread, asynchronous processing ensures that systems remain responsive and performant.

Stateless Services

Design your services to be stateless, meaning they do not retain any client data between requests. This simplifies scaling since any instance can handle any request without needing session information. There is no stored knowledge of or reference to past transactions; each transaction is made as if from scratch for the first time. Stateless applications provide one service or function and use a content delivery network (CDN), web, or print servers to process these short-term requests.

Database Scalability

Database scalability refers to the ability of a database to handle increasing amounts of data, numbers of users, and types of requests without sacrificing performance or availability. A scalable database tackles these database server challenges and adapts to growing demands by adding resources such as hardware or software, by optimizing its design and configuration, or by some combined strategy.

Types of Databases

1. SQL Databases (Relational Databases)
- Characteristics: SQL databases are known for robust data integrity and support for complex queries.
- Scalability: They can be scaled both vertically by upgrading hardware and horizontally through partitioning and replication.
- Examples: PostgreSQL supports advanced features like indexing and partitioning.

2. NoSQL Databases
- Characteristics: Flexible schema designs allow for handling unstructured or semi-structured data efficiently.
- Scalability: Designed primarily for horizontal scaling using techniques like sharding.
- Examples: MongoDB uses sharding to distribute large datasets across multiple servers.

Below are some techniques that enhance database scalability:

- Use indexes. Indexes help speed up queries by creating an index of frequently accessed data. This can significantly improve performance, particularly for large databases. Timescale indexes work just like PostgreSQL indexes, removing much of the guesswork when working with this powerful tool.
- Partition your data. Partitioning involves dividing a large table into smaller, more manageable parts. This can improve performance by allowing the database to access data more quickly. Read how to optimize and test your data partitions' size in Timescale.
- Use buffer cache. In PostgreSQL, buffer caching involves storing frequently accessed data in memory, which can significantly improve performance. This is particularly useful for read-heavy workloads, and while it is always enabled in PostgreSQL, it can be tweaked for optimized performance.
- Consider data distribution.
In distributed databases, data distribution or sharding is an extension of partitioning. It splits the database into smaller, more manageable partitions and then distributes (shards) them across multiple cluster nodes. This can improve scalability by allowing the database to handle more data and traffic. However, sharding also requires more design work up front to work correctly.
- Use a load balancer. Sharding and load balancing often conflict unless you use additional tooling. Load balancing involves distributing traffic across multiple servers to improve performance and scalability. A load balancer that routes traffic to the appropriate server based on the workload can do this; however, it will only work for read-only queries.
- Optimize queries. Optimizing queries involves tuning them to improve performance and reduce the load on the database. This can include rewriting queries, creating indexes, and partitioning data.

Caching Strategies

Caching is vital in enhancing microservices' performance and resilience. It is a technique in which frequently and recently used data is stored in a separate storage location, known as a cache, for quicker retrieval than from the main store. If caching is incorporated correctly into the system architecture, there is a marked improvement in the microservice's performance and a lessened impact on the other systems. When implementing caching:

- Identify frequently accessed data that doesn't change often — ideal candidates for caching.
- Use appropriate annotations (@Cacheable, @CachePut, etc.) based on your needs.
- Choose a suitable cache provider depending on whether you need distributed capabilities (like Hazelcast) or simple local storage (like ConcurrentHashMap).
- Monitor performance improvements after implementing caches to ensure they're effective without causing additional overheads like stale data issues.

Performance Optimization

Optimize your code by avoiding blocking operations and minimizing database calls. Techniques such as batching queries or using lazy loading can enhance efficiency. Regularly profile your application using tools like Spring Boot Actuator to identify bottlenecks and optimize performance accordingly.

Steps to Identify Bottlenecks

Monitoring Performance Metrics
- Use tools like Spring Boot Actuator combined with Micrometer for collecting detailed application metrics.
- Integrate with monitoring systems such as Prometheus and Grafana for real-time analysis.

Profiling CPU and Memory Usage
- Utilize profilers like VisualVM, YourKit, or JProfiler to analyze CPU usage, memory leaks, and thread contention.
- These tools help identify methods that consume excessive resources.

Database Optimization
- Analyze database queries using tools like Hibernate statistics or database monitoring software.
- Optimize SQL queries by adding indexes, avoiding N+1 query problems, and optimizing connection pool usage.

Thread Dump Analysis for Thread Issues
- Use the jstack <pid> command or visual analysis tools like yCrash to debug deadlocks or blocked threads in multi-threaded applications.

Distributed Tracing (If Applicable)
- For microservices architectures, use distributed tracing tools such as Zipkin or Elastic APM to trace latency issues across services.

Common Bottleneck Scenarios

High latency – Analyze each layer of the application (e.g., controller, service) for inefficiencies.
Scenario – Tools/Techniques
- High CPU usage – VisualVM, YourKit
- High memory usage – Eclipse MAT, VisualVM
- Slow database queries – Hibernate statistics
- Network latency – Distributed tracing tools

Monitoring and Maintenance

Continuously monitor your application's health using tools like Prometheus and Grafana alongside Spring Boot Actuator. Monitoring helps identify performance issues early and ensures that the application remains responsive under load.

Load Balancing and Autoscaling

Use load balancers to distribute incoming traffic evenly across multiple instances of your application. This ensures that no single instance becomes a bottleneck. Implement autoscaling features that adjust the number of active instances based on current demand, allowing your application to scale dynamically.

Handling 100 TPS in Spring Boot

1. Optimize Thread Pool Configuration

Configuring your thread pool correctly is crucial for handling high TPS. You can set the core and maximum pool sizes based on your expected load and system capabilities. Example configuration:

Properties files

spring.task.execution.pool.core-size=20
spring.task.execution.pool.max-size=100
spring.task.execution.pool.queue-capacity=200
spring.task.execution.pool.keep-alive=120s

This configuration allows for up to 100 concurrent threads with sufficient capacity to handle bursts of incoming requests without overwhelming the system. Each core of a CPU can handle about 200 threads, so you can configure it based on your hardware.

2. Use Asynchronous Processing

Implement asynchronous request handling using @Async annotations or Spring WebFlux for non-blocking I/O operations, which can help improve throughput by freeing up threads while waiting for I/O operations to complete.

3. Enable Caching

Utilize caching mechanisms (e.g., Redis or EhCache) to store frequently accessed data, reducing the load on your database and improving response times.

4. Optimize Database Access

Use connection pooling (e.g., HikariCP) to manage database connections efficiently. Optimize your SQL queries and consider using indexes where appropriate.

5. Load Testing and Monitoring

Regularly perform load testing using tools like JMeter or Gatling to simulate traffic and identify bottlenecks. Monitor application performance using Spring Boot Actuator and Micrometer.

Choosing the Right Server

Choosing the right web server for a Spring Boot application to ensure scalability involves several considerations, including performance, architecture, and specific use cases. Here are key factors and options to guide your decision:

1. Apache Tomcat
- Type: Servlet container
- Use case: Ideal for traditional Spring MVC applications.
- Strengths: Robust and widely used with extensive community support. Simple configuration and ease of use. Well-suited for applications with a straightforward request-response model.
- Limitations: May face scalability issues under high loads due to its thread-per-request model, leading to higher memory consumption per request.

2. Netty
- Type: Asynchronous event-driven framework
- Use case: Best for applications that require high concurrency and low latency, especially those using Spring WebFlux.
- Strengths: Non-blocking I/O allows handling many connections with fewer threads, making it highly scalable. Superior performance in I/O-bound tasks and real-time applications.
- Limitations: More complex to configure and requires a different programming model compared to traditional servlet-based applications.
3. Undertow
- Type: Lightweight web server
- Use case: Suitable for both blocking and non-blocking applications; often used in microservices architectures.
- Strengths: High performance with low resource consumption. Supports both traditional servlet APIs and reactive programming models.
- Limitations: Less popular than Tomcat, which may lead to fewer community resources being available.

4. Nginx (As a Reverse Proxy)
- Type: Web server and reverse proxy
- Use case: Often used in front of application servers like Tomcat or Netty for load balancing and serving static content.
- Strengths: Excellent at handling high loads and serving static files efficiently. Can distribute traffic across multiple instances of your application server, improving scalability.

Using the Right JVM Configuration

1. Heap Size Configuration

The Java heap size determines how much memory is allocated for your application. Adjusting the heap size can help manage large amounts of data and requests.

Shell

-Xms1g -Xmx2g

- -Xms: Sets the initial heap size (1 GB in this example).
- -Xmx: Sets the maximum heap size (2 GB in this example).

2. Garbage Collection

Choosing the right garbage collector can improve performance. The default G1 garbage collector is usually a good choice, but you can experiment with others like ZGC or Shenandoah for low-latency requirements.

Shell

-XX:+UseG1GC

For low-latency applications, consider using:

Shell

-XX:+UseZGC           # Z Garbage Collector
-XX:+UseShenandoahGC  # Shenandoah Garbage Collector

3. Thread Settings

Adjusting the number of threads can help handle concurrent requests more efficiently. Set the number of threads in the Spring Boot application properties:

Properties files

server.tomcat.max-threads=200

Adjust the JVM's thread stack size if necessary:

Shell

-Xss512k

4. Enable JIT Compiler Options

JIT (Just-In-Time) compilation can optimize the performance of your code during runtime.

Shell

-XX:+TieredCompilation -XX:CompileThreshold=1000

-XX:CompileThreshold controls how many times a method must be invoked before it's considered for compilation. Adjust it according to profiling metrics.

Hardware Requirements

To support 100 TPS, the underlying hardware infrastructure must be robust. Key hardware considerations include:

- High-performance servers. Use servers with powerful CPUs (multi-core processors) and ample RAM (64 GB or more) to handle concurrent requests effectively.
- Fast storage solutions. Implement SSDs for faster read/write operations compared to traditional hard drives. This is crucial for database performance.
- Network infrastructure. Ensure high-bandwidth, low-latency networking equipment to facilitate rapid data transfer between clients and servers.

Conclusion

Performance optimization in Spring Boot applications is not just about tweaking code snippets; it's about creating a robust architecture that scales with growth while maintaining efficiency. By implementing caching, asynchronous processing, and scalability strategies — alongside careful JVM configuration — developers can significantly enhance their application's responsiveness under load. Moreover, leveraging monitoring tools to identify bottlenecks allows for targeted optimizations that ensure the application remains performant as user demand increases. This holistic approach improves user experience and supports business growth by ensuring reliability and cost-effectiveness over time.
If you're interested in more detailed articles or references on these topics:

- Best Practices for Optimizing Spring Boot Application Performance
- Performance Tuning Spring Boot Applications
- Introduction to Optimizing Spring HTTP Server Performance
Theming is not just a choice but an important feature in modern web and mobile applications, as it allows for a more personalized, seamless, and tailored user experience. Theming becomes a must-have feature for SaaS applications integrating into other applications. Traditional theming techniques often involve hardcoded styles and rigid code configurations. This creates issues in the maintenance and scaling of applications. Furthermore, complexity grows even more when an application supports multiple themes across web and mobile devices. Modern CSS frameworks come with built-in theming support, but often, the themes they provide are fixed and don't fit the needs of enterprise applications.

The Shift Towards CSS Variables for Theming

In recent years, CSS variables have emerged as a front-runner choice for dynamic theming. Unlike CSS preprocessors such as SASS, which compile at build time, CSS variables are applied at runtime. This makes them ideal for handling real-time updates to styles — a necessary prerequisite for responsive theming. Furthermore, they enhance code maintenance by centralizing theme definitions and provide the flexibility of applying themes based on user preference or system settings.

In this article, you will learn to create dynamic themes using CSS variables in a Next.js-based data visualization application. You will learn to:

- Create a dynamic theme switcher between dark and light modes
- Save user theme preferences
- Create dynamic data visualizations based on the user-selected theme

Note on Prerequisites

The code developed throughout the article is available at this GitHub repo. However, you are encouraged to follow along for a deeper understanding. To develop code locally, you will need the prerequisites below:

- Node.js and NPM installed on your machine. You can verify this by running node -v and npm -v in your terminal.
- A code editor of your choice. This application was developed using Visual Studio Code, which is the recommended editor.
- Experience with JavaScript, CSS, and React.

Setting Up a New Next.js Application

You will be creating an expense tracker application. This application will have two pages — a login page and an expense tracker dashboard page with data visualization of your expenses for the last month. You will leverage the Next.js ecosystem to build this application. Next.js uses React under the hood and provides clean route management, server-side rendering, and other cool features.

Run the following commands to create a new Next.js application and install the necessary dependencies.

PowerShell

npx create-next-app@latest theme-visualization-app
cd theme-visualization-app
npm install chart.js react-chartjs-2 js-cookie
npm install --save-dev @types/js-cookie

During the creation of the Next.js app, please ensure you select the presets shown in the illustration below. You will notice that additional packages were installed during the setup process. These packages will allow you to build chart data visualizations and access application cookies, which will be useful during actual application development. You will learn more about these in detail soon.

On your terminal, use the command npm run dev to start the development server, then navigate to http://localhost:3000 in your browser. You will see the default Next.js application displayed.

Implementing the Expense Tracker in Next.js

Now, you are ready to add theming logic using CSS variables and build the expense tracker functionality.
In the first part of this section, you will set up the theming logic. Set up CSS variables, which will define the look and feel of the user interface (UI) under dark and light themes. Update the src\app\globals.css file under the project root with the following code. CSS /* src\app\globals.css */ @import "tailwindcss"; @layer base { :root { /* Light theme variables */ --background-color: #ffffff; --text-color: #0f172a; --card-bg: #ffffff; --card-text: #0f172a; --border-color: #e2e8f0; --primary-color: #3b82f6; --primary-text: #ffffff; --secondary-color: #f1f5f9; --secondary-text: #1e293b; --muted-color: #f1f5f9; --muted-text: #64748b; --accent-color: #f1f5f9; --accent-text: #1e293b; --destructive-color: #ef4444; --destructive-text: #ffffff; --input-border: #e2e8f0; --ring-color: #3b82f6; /* Chart Colors - Light Theme */ --chart-color-1: #3b82f6; --chart-color-2: #8b5cf6; --chart-color-3: #d946ef; --chart-color-4: #ec4899; --chart-color-5: #f59e0b; --chart-color-6: #10b981; --chart-text: #0f172a; --chart-grid: #e2e8f0; } [data-theme="dark"] { --background-color: #0f172a; --text-color: #f8fafc; --card-bg: #0f172a; --card-text: #f8fafc; --border-color: #1e293b; --primary-color: #3b82f6; --primary-text: #0f172a; --secondary-color: #1e293b; --secondary-text: #f8fafc; --muted-color: #1e293b; --muted-text: #94a3b8; --accent-color: #1e293b; --accent-text: #f8fafc; --destructive-color: #7f1d1d; --destructive-text: #f8fafc; --input-border: #1e293b; --ring-color: #3b82f6; /* Chart Colors - Dark Theme */ --chart-color-1: #60a5fa; --chart-color-2: #a78bfa; --chart-color-3: #e879f9; --chart-color-4: #f472b6; --chart-color-5: #fbbf24; --chart-color-6: #34d399; --chart-text: #f8fafc; --chart-grid: #1e293b; } } @layer components { body { background-color: var(--background-color); color: var(--text-color); transition: background-color 0.3s ease-in-out, color 0.3s ease-in-out; } .app-header { border-bottom: 1px solid var(--border-color); } .app-footer { border-top: 1px solid var(--border-color); } .theme-toggle { background-color: var(--secondary-color); color: var(--secondary-text); border-radius: 0.375rem; cursor: pointer; transition: opacity 0.2s; } .theme-toggle:hover { opacity: 0.9; } .text-muted { color: var(--muted-text); } .text-primary { color: var(--primary-color); } .themed-card { background-color: var(--card-bg); color: var(--card-text); border: 1px solid var(--border-color); border-radius: 0.5rem; box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06); } .themed-table { width: 100%; text-align: left; } .themed-table th { color: var(--muted-text); border-bottom: 1px solid var(--border-color); padding-bottom: 0.5rem; } .themed-table td { border-bottom: 1px solid var(--border-color); padding: 0.75rem 0; } .themed-table tr:last-child td { border-bottom: none; } .themed-input { background-color: var(--background-color); color: var(--text-color); border: 1px solid var(--input-border); border-radius: 0.375rem; padding: 0.5rem; width: 100%; transition: all 0.2s; } .themed-input:focus { outline: none; border-color: var(--primary-color); box-shadow: 0 0 0 2px var(--ring-color); } .themed-label { color: var(--muted-text); display: block; font-size: 0.875rem; font-weight: 500; margin-bottom: 0.25rem; } .themed-button { background-color: var(--primary-color); color: var(--primary-text); border: none; border-radius: 0.375rem; padding: 0.5rem 1rem; cursor: pointer; transition: opacity 0.2s; } .themed-button:hover { opacity: 0.9; } .themed-button-secondary { background-color: 
var(--secondary-color); color: var(--secondary-text); } } This file defines various CSS properties, for example, background color and text color for both themes. The theme style is applied based on the data-theme custom property, which will be applied to the body tag. The theme setup also includes styling for UI components such as cards, buttons, tables, etc. Finally, the file also imports TailwindCSS, which will be leveraged for layout and spacing between components.Next, set up a theme context and theme toggler for the application. Create two files: src\context\ThemeContext.tsx src\components\ThemeToggle.tsx These files will allow setting up the context and the toggle button. Update them with the following code. TypeScript-JSX // src\context\ThemeContext.tsx 'use client'; import React, { createContext, useContext, useState, useEffect, ReactNode } from 'react'; import Cookies from 'js-cookie'; type Theme = 'light' | 'dark'; interface ThemeContextType { theme: Theme; toggleTheme: () => void; } const ThemeContext = createContext<ThemeContextType | undefined>(undefined); export const ThemeProvider = ({ children }: { children: ReactNode }) => { const [theme, setTheme] = useState<Theme>('light'); useEffect(() => { const savedTheme = Cookies.get('theme') as Theme; if (savedTheme) { setTheme(savedTheme); } else if (window.matchMedia && window.matchMedia('(prefers-color-scheme: dark)').matches) { setTheme('dark'); } }, []); useEffect(() => { document.body.dataset.theme = theme; Cookies.set('theme', theme, { expires: 365 }); }, [theme]); const toggleTheme = () => { setTheme((prevTheme) => (prevTheme === 'light' ? 'dark' : 'light')); }; return ( <ThemeContext.Provider value={{ theme, toggleTheme }> {children} </ThemeContext.Provider> ); }; export const useTheme = () => { const context = useContext(ThemeContext); if (context === undefined) { throw new Error('useTheme must be used within a ThemeProvider'); } return context; }; TypeScript-JSX // src\components\ThemeToggle.tsx import React from 'react'; import { useTheme } from '@/context/ThemeContext'; export const ThemeToggle = () => { const { theme, toggleTheme } = useTheme(); return ( <button onClick={toggleTheme} className="theme-toggle p-2" aria-label={`Switch to ${theme === 'light' ? 'dark' : 'light'} mode`} > {theme === 'light' ? ( <svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" strokeWidth={1.5} stroke="currentColor" className="w-5 h-5" > <path strokeLinecap="round" strokeLinejoin="round" d="M21.752 15.002A9.718 9.718 0 0118 15.75c-5.385 0-9.75-4.365-9.75-9.75 0-1.33.266-2.597.748-3.752A9.753 9.753 0 003 11.25C3 16.635 7.365 21 12.75 21a9.753 9.753 0 009.002-5.998z" /> </svg> ) : ( <svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" strokeWidth={1.5} stroke="currentColor" className="w-5 h-5" > <path strokeLinecap="round" strokeLinejoin="round" d="M12 3v2.25m6.364.386l-1.591 1.591M21 12h-2.25m-.386 6.364l-1.591-1.591M12 18.75V21m-4.773-4.227l-1.591 1.591M5.25 12H3m4.227-4.773L5.636 5.636M15.75 12a3.75 3.75 0 11-7.5 0 3.75 3.75 0 017.5 0z" /> </svg> )} </button> ); }; ThemeContext leverages React Context hook to pass theme values down the component tree of the application without the need for explicit sharing. It uses a state variable conveniently named as theme to save the applied theme. On the load of the application, this context provider also reads the cookie and checks for any previously applied theme or system defaults. 
This is where the previously installed dependency js-dom comes in handy.ThemeToggle provides a button for toggling between the light and dark themes, switching between them using the export provided by ThemeContext.The next step is to build the layout. Create a new component ClientWrapper and add the following code to it. TypeScript-JSX // src\components\ClientWrapper.tsx 'use client'; import React, { useEffect, useState } from 'react'; import { useTheme } from '@/context/ThemeContext'; import { ThemeToggle } from './ThemeToggle'; export const ClientWrapper = ({ children }: { children: React.ReactNode }) => { const { theme } = useTheme(); const [mounted, setMounted] = useState(false); useEffect(() => { setMounted(true); }, []); if (!mounted) { return null; } return ( <div className="min-h-screen flex flex-col"> <header className="app-header p-4"> <div className="container mx-auto flex justify-between items-center"> <h1 className="text-xl font-bold">Expense Tracker</h1> <div className="flex items-center gap-4"> <div className="text-sm text-muted"> Current theme: {theme} </div> <ThemeToggle /> </div> </div> </header> <main className="flex-grow container mx-auto p-4"> {children} </main> <footer className="app-footer p-4"> <div className="container mx-auto text-center text-sm text-muted"> © {new Date().getFullYear()} Expense Tracker - Dynamic Themes Demo </div> </footer> </div> ); }; ClientWrapper wraps the page components that you will implement soon. Having this component ensures consistent styling across the entire application. The component also provides a nice footer for your application. You are ready to build the login page. Add the code below to the src\app\page.tsx file. TypeScript-JSX // src\app\page.tsx 'use client'; import React, { useState } from 'react'; import { useRouter } from 'next/navigation'; import { ClientWrapper } from '@/components/ClientWrapper'; export default function Login() { const [username, setUsername] = useState(''); const [password, setPassword] = useState(''); const router = useRouter(); const handleSubmit = (e: React.FormEvent) => { e.preventDefault(); router.push('/dashboard'); }; return ( <ClientWrapper> <div className="flex justify-center items-center min-h-[80vh]"> <div className="themed-card w-full max-w-md p-6"> <h2 className="text-2xl font-bold mb-6 text-center">Login</h2> <form onSubmit={handleSubmit}> <div className="mb-4"> <label htmlFor="username" className="themed-label"> Username </label> <input type="text" id="username" className="themed-input" value={username} onChange={(e) => setUsername(e.target.value)} required /> </div> <div className="mb-6"> <label htmlFor="password" className="themed-label"> Password </label> <input type="password" id="password" className="themed-input" value={password} onChange={(e) => setPassword(e.target.value)} required /> </div> <button type="submit" className="themed-button w-full py-2" > Login </button> </form> </div> </div> </ClientWrapper> ); } The Login page provides a simple login form, and upon entering username and password, it redirects the user to the dashboard page.Next in the journey, create the dashboard page to display the expense tracker. The page will need two additional components, ExpenseChart and ExpenseTable. Here, you will see the use of the previously installed Chart.js dependency. Additionally, dummy data to show expenses have been added. Create these three files in their respective locations and augment them with the following code. 
TypeScript-JSX // src\app\dashboard\page.tsx 'use client'; import React from 'react'; import { ClientWrapper } from '@/components/ClientWrapper'; import { ExpenseChart } from '@/components/ExpenseChart'; import { ExpenseTable } from '@/components/ExpenseTable'; const expenseData = [ { category: 'Housing', amount: 1200 }, { category: 'Food', amount: 400 }, { category: 'Transportation', amount: 250 }, { category: 'Utilities', amount: 180 }, { category: 'Entertainment', amount: 150 }, { category: 'Savings', amount: 300 }, ]; export default function Dashboard() { return ( <ClientWrapper> <div className="mb-6"> <h1 className="text-2xl font-bold">Monthly Expense Dashboard</h1> <p className="text-muted mt-1"> Overview of your expenses for the current month </p> </div> <div className="grid md:grid-cols-2 gap-6"> <ExpenseChart data={expenseData} /> <ExpenseTable data={expenseData} /> </div> </ClientWrapper> ); } TypeScript-JSX // src\components\ExpenseChart.tsx 'use client'; import React, { useEffect, useState } from 'react'; import { Chart as ChartJS, ArcElement, Tooltip, Legend } from 'chart.js'; import { Pie } from 'react-chartjs-2'; import { useTheme } from '@/context/ThemeContext'; ChartJS.register(ArcElement, Tooltip, Legend); export type ExpenseData = { category: string; amount: number; }; interface ExpenseChartProps { data: ExpenseData[]; } export const ExpenseChart: React.FC<ExpenseChartProps> = ({ data }) => { const { theme } = useTheme(); const [chartData, setChartData] = useState<any>(null); const [chartOptions, setChartOptions] = useState<any>(null); useEffect(() => { const getThemeColor = (variable: string) => { const computedStyle = getComputedStyle(document.body); return computedStyle.getPropertyValue(variable).trim(); }; const chartColors = [ getThemeColor('--chart-color-1'), getThemeColor('--chart-color-2'), getThemeColor('--chart-color-3'), getThemeColor('--chart-color-4'), getThemeColor('--chart-color-5'), getThemeColor('--chart-color-6'), ]; setChartData({ labels: data.map(item => item.category), datasets: [ { data: data.map(item => item.amount), backgroundColor: chartColors, borderColor: theme === 'dark' ? 
'rgba(0, 0, 0, 0.1)' : 'rgba(255, 255, 255, 0.1)', borderWidth: 1, }, ], }); setChartOptions({ responsive: true, plugins: { legend: { position: 'bottom' as const, labels: { color: getThemeColor('--chart-text'), font: { size: 12, }, }, }, tooltip: { backgroundColor: getThemeColor('--card-bg'), titleColor: getThemeColor('--card-text'), bodyColor: getThemeColor('--card-text'), borderColor: getThemeColor('--border-color'), borderWidth: 1, padding: 12, displayColors: true, callbacks: { label: function(context: any) { const label = context.label || ''; const value = context.raw || 0; const total = context.dataset.data.reduce((a: number, b: number) => a + b, 0); const percentage = Math.round((value / total) * 100); return `${label}: $${value} (${percentage}%)`; } } }, }, }); }, [data, theme]); if (!chartData || !chartOptions) { return <div className="themed-card p-6 flex items-center justify-center h-full">Loading chart...</div>; } return ( <div className="themed-card p-6"> <h2 className="text-xl font-bold mb-4">Monthly Expenses Breakdown</h2> <div className="h-80"> <Pie data={chartData} options={chartOptions} /> </div> </div> ); }; TypeScript-JSX // src\components\ExpenseTable.tsx import React from 'react'; import { ExpenseData } from './ExpenseChart'; interface ExpenseTableProps { data: ExpenseData[]; } export const ExpenseTable: React.FC<ExpenseTableProps> = ({ data }) => { const totalExpenses = data.reduce((sum, item) => sum + item.amount, 0); return ( <div className="themed-card p-6"> <h2 className="text-xl font-bold mb-4">Expense Details</h2> <div className="overflow-x-auto"> <table className="themed-table"> <thead> <tr> <th>Category</th> <th className="text-right">Amount</th> <th className="text-right">Percentage</th> </tr> </thead> <tbody> {data.map((item, index) => { const percentage = ((item.amount / totalExpenses) * 100).toFixed(1); return ( <tr key={index}> <td>{item.category}</td> <td className="text-right">${item.amount.toLocaleString()}</td> <td className="text-right">{percentage}%</td> </tr> ); })} <tr className="font-bold"> <td>Total</td> <td className="text-right">${totalExpenses.toLocaleString()}</td> <td className="text-right">100%</td> </tr> </tbody> </table> </div> </div> ); }; Now, putting everything together, update the root layout component with the ThemeContext so the entire application is aware of the theming. Update the layout component as per below. TypeScript-JSX // src\app\layout.tsx import type { Metadata } from 'next'; import { Inter } from 'next/font/google'; import './globals.css'; import { ThemeProvider } from '@/context/ThemeContext'; const inter = Inter({ subsets: ['latin'] }); export const metadata: Metadata = { title: 'Expense Tracker with Dynamic Themes', description: 'A Next.js application with dynamic theming for data visualizations', }; export default function RootLayout({ children, }: { children: React.ReactNode; }) { return ( <html lang="en"> <body className={inter.className}> <ThemeProvider> {children} </ThemeProvider> </body> </html> ); } Start the application with npm run dev if it is not already running, and navigate to http://localhost:3000/. You should see the application loaded as given in the illustration below. Notice the theme value in the upper right corner. Conclusion Congratulations on learning about the importance of CSS variables, theming in applications, and building an entire application to use them. You are now fully equipped with the knowledge to leverage CSS variables for theming. 
Using them in enterprise applications requires some additional considerations, such as modularity, consistent variable naming, and well-defined base variables, to keep themes scalable and maintainable. Following these best practices ensures that theming remains flexible as the application grows.
Let's look at how to integrate Redis with different message brokers. In this article, we will point out the benefits of this integration. We will talk about which message brokers work well with Redis. We will also show how to set up Redis as a message broker with practical code examples, and discuss how to handle message persistence and how to monitor Redis when it is used as a message broker. What Are the Benefits of Using Redis With Message Brokers? Using Redis with message brokers gives many good benefits. These benefits help improve how we handle messages. Here are the main advantages: High performance. Redis works in memory. This means it has very low delays and can handle many requests at once. This is great for real-time messaging apps where speed is very important.Pub/Sub messaging. Redis has a feature called publish/subscribe (Pub/Sub). This lets us send messages to many subscribers at the same time without needing direct connections. It is helpful for chat apps, notifications, or event-driven systems.Data structures. Redis has many data structures like strings, lists, sets, sorted sets, and hashes. We can use these structures for different messaging tasks. For example, we can use lists for queues and sets for unique message IDs.Scalability. Redis can grow by using clustering. This helps it manage more work by spreading data across many nodes. This is good for apps that need to be available all the time and handle problems.Persistence options. Redis has different options to save data, like RDB snapshots and AOF logs. This helps keep message data safe even if something goes wrong. We can balance good performance with saving data.Ease of use. Redis commands are simple. There are also many good client libraries for different programming languages like Python, Java, and Node.js. This makes it easy to add Redis to our apps.Monitoring and management. Redis has tools to check how well it is working. Tools like Redis CLI and RedisInsight help us improve the message broker setup and find problems.Lightweight. Redis uses fewer resources compared to older message brokers like RabbitMQ or Kafka. This makes it a good choice for microservices and container setups.Support for streams. Redis Streams is a strong feature that lets us work with log-like data. This helps with complex message processing and managing groups of consumers. It is useful for event sourcing and CQRS patterns. By using these benefits, we can build strong and efficient messaging systems with Redis. For more information on what Redis can do, you can check What is Redis? and What are Redis Streams? Which Message Brokers Are Compatible With Redis? We can use Redis with many popular message brokers. This makes them work better and faster. Here are some main message brokers that work well with Redis: RabbitMQ We can use Redis to store messages for RabbitMQ. By using Redis for message storage, RabbitMQ can handle its tasks better. This is especially useful when we need quick access to message queues. Apache Kafka Kafka can use Redis for keeping messages temporarily. With Redis streams, Kafka producers can save messages before sending them to consumers. This can help increase throughput. ActiveMQ We can set up ActiveMQ to use Redis for storing messages in a queue. This can make retrieving and processing messages faster. NATS NATS can use Redis to keep messages safe and to manage state in a distributed system. This lets us store messages in Redis for later use. Celery Celery is a tool for managing tasks. 
We can use Redis as a broker for Celery. This helps us manage background tasks and scheduling better. Code Example for Using Redis With Celery To connect Redis as a message broker with Celery, we can set it up in the Celery configuration like this: Python from celery import Celery app = Celery('tasks', broker='redis://localhost:6379/0') @app.task def add(x, y): return x + y This code shows a simple Celery task using Redis as the broker. This lets us do asynchronous message processing very well. Apache Pulsar Like Kafka, Apache Pulsar can also use Redis for caching and quick message retrieval. This can make message processing more efficient. How Do I Set Up Redis As a Message Broker? To set up Redis as a message broker, we can follow these steps: 1. Install Redis First, we need to make sure Redis is installed on our server. We can check the Redis installation guide. 2. Configure Redis Next, we open the Redis configuration file. This file is usually called redis.conf. We need to set these properties for message brokering: Shell # Enable persistence for durability save 900 1 save 300 10 save 60 10000 # Set the max memory limit maxmemory 256mb maxmemory-policy allkeys-lru # Enable Pub/Sub messaging notify-keyspace-events Ex 3. Start the Redis server Now, we can start Redis with this command: Shell redis-server /path/to/redis.conf 4. Use Redis for Pub/Sub We can publish and subscribe to channels using the Redis CLI or client libraries. Here is an example using Python: Python import redis # Connect to Redis r = redis.StrictRedis(host='localhost', port=6379, db=0) # Subscriber def message_handler(message): print(f"Received message: {message['data']}") pubsub = r.pubsub() pubsub.subscribe(**{'my-channel': message_handler}) # Listen for messages pubsub.run_in_thread(sleep_time=0.001) # Publisher r.publish('my-channel', 'Hello, Redis!') 5. Use Message Queues For task queues, we can use Redis lists. Here is how we can make a simple queue: Producer Python r.lpush('task_queue', 'Task 1') r.lpush('task_queue', 'Task 2') Consumer Python while True: task = r.brpop('task_queue')[1] print(f'Processing {task.decode()}') By following these steps, we can easily set up Redis as a message broker. We can use both Pub/Sub and list-based message queuing. For more insights on Redis data types, we can check the article on Redis data types. Practical Code Examples for Integrating Redis With Message Brokers Integrating Redis with message brokers helps us improve messaging abilities. We can use Redis's speed and efficiency. Below, we show simple code examples for using Redis with popular message brokers like RabbitMQ and Kafka. Example 1: Using Redis with RabbitMQ In this example, we will use Python with the pika library. We will send and receive messages through RabbitMQ, using Redis to store data. Installation Shell pip install pika redis Producer Code Python import pika import redis # Connect to RabbitMQ connection = pika.BlockingConnection(pika.ConnectionParameters('localhost')) channel = connection.channel() channel.queue_declare(queue='task_queue', durable=True) # Connect to Redis redis_client = redis.Redis(host='localhost', port=6379, db=0) message = 'Hello World!' 
# Publish message to RabbitMQ channel.basic_publish(exchange='', routing_key='task_queue', body=message, properties=pika.BasicProperties( delivery_mode=2, # make message persistent )) # Store message in Redis redis_client.lpush('messages', message) print(" [x] Sent %r" % message) connection.close() Consumer Code Python import pika import redis # Connect to RabbitMQ connection = pika.BlockingConnection(pika.ConnectionParameters('localhost')) channel = connection.channel() channel.queue_declare(queue='task_queue', durable=True) # Connect to Redis redis_client = redis.Redis(host='localhost', port=6379, db=0) def callback(ch, method, properties, body): message = body.decode() print(" [x] Received %r" % message) # Store received message in Redis redis_client.lpush('processed_messages', message) ch.basic_ack(delivery_tag=method.delivery_tag) channel.basic_consume(queue='task_queue', on_message_callback=callback) print(' [*] Waiting for messages. To exit press CTRL+C') channel.start_consuming() Example 2: Using Redis With Kafka In this example, we will use Java with Apache Kafka and Redis to send messages. Dependencies (Maven) XML <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka-clients</artifactId> <version>3.2.0</version> </dependency> <dependency> <groupId>redis.clients</groupId> <artifactId>jedis</artifactId> <version>4.0.1</version> </dependency> Producer Code Java import org.apache.kafka.clients.producer.KafkaProducer; import org.apache.kafka.clients.producer.ProducerRecord; import redis.clients.jedis.Jedis; import java.util.Properties; public class RedisKafkaProducer { public static void main(String[] args) { Properties props = new Properties(); props.put("bootstrap.servers", "localhost:9092"); props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); KafkaProducer<String, String> producer = new KafkaProducer<>(props); Jedis jedis = new Jedis("localhost"); String message = "Hello Kafka!"; producer.send(new ProducerRecord<>("my-topic", message)); jedis.lpush("messages", message); producer.close(); jedis.close(); } } Consumer Code Java import org.apache.kafka.clients.consumer.ConsumerRecord; import org.apache.kafka.clients.consumer.KafkaConsumer; import redis.clients.jedis.Jedis; import java.time.Duration; import java.util.Collections; import java.util.Properties; public class RedisKafkaConsumer { public static void main(String[] args) { Properties props = new Properties(); props.put("bootstrap.servers", "localhost:9092"); props.put("group.id", "test"); props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props); consumer.subscribe(Collections.singletonList("my-topic")); Jedis jedis = new Jedis("localhost"); while (true) { for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) { System.out.printf("Consumed message: %s%n", record.value()); jedis.lpush("processed_messages", record.value()); } } } } These examples show how we can connect Redis with message brokers like RabbitMQ and Kafka. This gives us strong messaging solutions. How Do I Handle Message Persistence in Redis? To handle message persistence in Redis, we can use two main ways: RDB (Redis Database Backup) and AOF (Append-Only File). 
RDB Persistence RDB saves snapshots of your data at set times. This is good for backups. But if Redis crashes between snapshots, we can lose some data. Configuration Shell save 900 1 # Save the DB if at least 1 key changed in 900 seconds save 300 10 # Save the DB if at least 10 keys changed in 300 seconds AOF Persistence AOF logs every write action the server gets. This helps us recover data more up-to-date. But the files become larger. Configuration Shell appendonly yes appendfsync everysec # Fsync every second for a balance of performance and durability Command for Enabling Persistence To turn on persistence, we can change the `redis.conf` file or use commands in the Redis CLI: Shell # Enable RDB CONFIG SET save "900 1" # Enable AOF CONFIG SET appendonly yes Choosing Between RDB and AOF RDB is good when speed is very important and losing some data is okay.AOF is better when we need to keep data safe. We can also use both methods together. RDB will take snapshots and AOF will log changes. Monitoring Persistence We can check the status and performance of persistence using Redis commands: Shell INFO persistence This command shows us the current state of RDB and AOF. It includes the last save time and AOF file size. For more details on Redis persistence, we can look at what Redis persistence is and learn how to set up RDB and AOF well. How Do I Monitor Redis in a Message Broker Setup? Monitoring Redis in a message broker setup is very important. It helps us make sure Redis works well and is reliable. We have many tools and methods to monitor Redis. These include built-in commands, external tools, and custom scripts. Built-in Monitoring Commands Redis has some built-in commands for monitoring: INFO This command gives us server stats and config. Shell redis-cli INFO MONITOR This command shows all commands that the Redis server gets in real-time. Shell redis-cli MONITOR SLOWLOG This command shows slow commands. It helps us find performance problems. Shell redis-cli SLOWLOG GET 10 External Monitoring Tools Redis monitoring tools. We can use tools like RedisInsight, Datadog, or Prometheus with Grafana. These tools help us see important data like memory use and command run time.Redis Sentinel. This tool helps with high availability and monitoring. It can tell us when there are failures and can do automatic failovers. Key Metrics to Monitor Memory usage. We need to watch memory use to avoid running out of memory.CPU usage. We should track CPU use to use resources well.Command latency. We measure how long commands take to run. This helps us find slow commands.Connection count. We need to monitor active connections to stay within limits.Replication lag. If we use replication, we should check the lag between master and slave instances. Example Monitoring Setup With Prometheus To set up Prometheus for Redis monitoring, we can use the Redis Exporter. 1. Install Redis Exporter Shell docker run -d -p 9121:9121 --name=redis-exporter oliver006/redis_exporter 2. Configure Prometheus We add the following job in our prometheus.yml: YAML scrape_configs: - job_name: 'redis' static_configs: - targets: ['localhost:9121'] 3. Visualize in Grafana We connect Grafana to our Prometheus and create dashboards to see Redis data. Custom Monitoring Scripts We can also make our own scripts using Python with the redis library. This helps us check and alert automatically. 
Python import redis client = redis.StrictRedis(host='localhost', port=6379, db=0) info = client.info() # Check memory usage if info['used_memory'] > 100 * 1024 * 1024: # 100 MB threshold print("Memory usage is too high!") Using these monitoring methods helps us keep our Redis environment healthy in our message broker setup. For more info on Redis commands and settings, check Redis CLI usage. Frequently Asked Questions 1. What are the best practices for integrating Redis with message brokers? To integrate Redis with message brokers, we should follow some best practices. First, we can use Redis Pub/Sub for messaging in real-time. Also, we can use Redis Streams for message queuing (see the sketch after these FAQs). We need to set up message persistence correctly. It is also good to use Redis data types in a smart way. 2. How do I ensure message persistence when using Redis with message brokers? To keep messages safe in Redis, we can set it to use RDB (Redis Database Backup) or AOF (Append-Only File) methods. RDB snapshots help us recover data fast. AOF saves every write action to make sure we do not lose any data. 3. Is Redis a reliable message broker? Yes, Redis can work as a reliable message broker if we set it up right. It has low delay and high speed, so it is good for real-time use. But we need to add things like acknowledgments and re-sending messages to make sure it is reliable. 4. Which programming languages support Redis integration with message brokers? Redis works with many programming languages. Some of them are Python, Java, Node.js, PHP, and Ruby. Each language has its own Redis client libraries. This makes it easy to connect with message brokers. 5. How can I monitor Redis performance in a message broker setup? It is important for us to check how Redis performs in a message broker setup. We can use tools like RedisInsight or the built-in Redis commands to see key things like memory use, command stats, and delay.
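The answers above recommend Redis Streams for message queuing, but the examples in this article only cover Pub/Sub and lists. The sketch below shows the Streams pattern with a consumer group and explicit acknowledgments, assuming a local Redis instance and the redis-py client; the stream name, group name, and payload fields are illustrative. Python
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

STREAM = 'orders'        # illustrative stream name
GROUP = 'order-workers'  # illustrative consumer group name

# Create the consumer group once; ignore the error if it already exists
try:
    r.xgroup_create(STREAM, GROUP, id='0', mkstream=True)
except redis.ResponseError:
    pass

# Producer: append an entry to the stream
r.xadd(STREAM, {'order_id': '42', 'status': 'created'})

# Consumer: read one new entry for this consumer, process it, and acknowledge it
for stream_name, entries in r.xreadgroup(GROUP, 'worker-1', {STREAM: '>'}, count=1, block=1000):
    for entry_id, fields in entries:
        print(f"Processing {entry_id}: {fields}")
        r.xack(STREAM, GROUP, entry_id)  # acknowledgment gives at-least-once delivery

Entries that are read but never acknowledged stay in the group's pending list, which is what provides the re-delivery behavior mentioned in question 3.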
In this blog post, we'll explore how to build a service that interacts with a PostgreSQL database and uses OpenAI's GPT-4 model to generate SQL queries based on natural language input. This service, called NorthwindServicefromDB, is designed to make it easier for users to query a database without needing to write SQL themselves. We'll walk through the code step by step, explaining each component and how it fits into the overall architecture. Overview The NorthwindServicefromDB service is a .NET class that provides a method called AnswerFromDB. This method takes a natural language query as input, generates a corresponding SQL query using OpenAI's GPT-4 model, executes the SQL query against a PostgreSQL database, and returns the results. The service is designed to work with the Northwind database, a sample database often used for learning and testing. Key Features Natural language to SQL conversion. The service uses OpenAI's GPT-4 model to convert natural language queries into SQL queries.Database schema retrieval. The service dynamically retrieves the schema of all tables in the database to provide context for the GPT-4 model.SQL query execution. The service executes the generated SQL query and returns the results in a structured format.Security. The service only allows SELECT queries to be executed, preventing any modifications to the database. Code Walkthrough Let's dive into the code and understand how each part works. 1. Setting Up the Environment The service uses environment variables to securely store sensitive information, such as the API key for OpenAI. The DotNetEnv package is used to load these variables from a .env file. C# Env.Load(".env"); string githubKey = Env.GetString("GITHUB_KEY"); 2. Initializing the OpenAI Chat Client The service uses the AzureOpenAIClient to interact with OpenAI's GPT-4 model. The client is initialized with the API endpoint and the API key. C# IChatClient client = new AzureOpenAIClient( new Uri("https://models.inference.ai.azure.com"), new AzureKeyCredential(githubKey)) .AsChatClient(modelId: "gpt-4o-mini"); 3. Retrieving Database Schema To generate accurate SQL queries, the service needs to know the structure of the database. The GetAllTableSchemas method retrieves the schema of all tables in the database. C# static async Task<string> GetAllTableSchemas() { using var connection = new NpgsqlConnection(_connectionString); await connection.OpenAsync(); var tableNames = new List<string>(); using (var command = new NpgsqlCommand("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'", connection)) using (var reader = await command.ExecuteReaderAsync()) { while (await reader.ReadAsync()) { tableNames.Add(reader["table_name"].ToString()); } } var allSchemas = new StringBuilder(); foreach (var tableName in tableNames) { allSchemas.AppendLine(await GetTableSchema(tableName)); } return allSchemas.ToString(); } The GetTableSchema method retrieves the schema for a specific table, including the column names and data types.
C# static async Task<string> GetTableSchema(string tableName) { using var connection = new NpgsqlConnection(_connectionString); await connection.OpenAsync(); using var command = new NpgsqlCommand($"SELECT column_name, data_type FROM information_schema.columns WHERE table_name = '{tableName}'", connection); using var reader = await command.ExecuteReaderAsync(); var schema = new StringBuilder(); schema.AppendLine($"Table: {tableName}"); schema.AppendLine("Columns:"); while (await reader.ReadAsync()) { schema.AppendLine($"- {reader["column_name"]} ({reader["data_type"]})"); } return schema.ToString(); } 4. Generating SQL Queries With OpenAI The AnswerFromDB method combines the database schema with the user's natural language query and sends it to the GPT-4 model to generate a SQL query. C# var response = await client.CompleteAsync($"{allTableSchemas}\n{query}"); var sqlQuery = ExtractSqlQuery(response.Message.Text); Console.WriteLine("Generated Query: " + sqlQuery); The ExtractSqlQuery method extracts the SQL query from the model's response, which is expected to be enclosed in triple backticks (```sql). C# static string ExtractSqlQuery(string response) { var startIndex = response.IndexOf("```sql", StringComparison.OrdinalIgnoreCase); if (startIndex == -1) return ""; startIndex += 7; // Move past "```sql" and the newline that follows var endIndex = response.IndexOf("```", startIndex, StringComparison.OrdinalIgnoreCase); if (endIndex == -1) return ""; return response.Substring(startIndex, endIndex - startIndex).Trim(); } 5. Executing the SQL Query Once the SQL query is generated, the service checks if it is a SELECT query (to prevent any modifications to the database) and then executes it. C# if (sqlQuery.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase)) { var result = await ExecuteQuery(sqlQuery); return $"Query result: {JsonSerializer.Serialize(result)}"; } else { return "Only SELECT queries are supported."; } The ExecuteQuery method executes the SQL query and returns the results as a list of dictionaries, where each dictionary represents a row in the result set. C# static async Task<List<Dictionary<string, object>>> ExecuteQuery(string query) { using var connection = new NpgsqlConnection(_connectionString); await connection.OpenAsync(); using var command = new NpgsqlCommand(query, connection); using var reader = await command.ExecuteReaderAsync(); var results = new List<Dictionary<string, object>>(); while (await reader.ReadAsync()) { var row = new Dictionary<string, object>(); for (int i = 0; i < reader.FieldCount; i++) { row[reader.GetName(i)] = reader.GetValue(i); } results.Add(row); } return results; } Conclusion The NorthwindServicefromDB service is a powerful tool that bridges the gap between natural language and database queries. Using OpenAI's GPT-4 model, users can interact with a database using plain English, making it accessible to non-technical users. The service is designed with security in mind, ensuring that only read-only queries are executed. Potential Enhancements Error handling. Add more robust error handling to manage cases where the GPT-4 model generates invalid SQL queries.Caching. Implement caching for the database schema to reduce the number of database calls.User authentication. Add user authentication and authorization to restrict access to the service. This service is a great example of how AI can be integrated into traditional software development to create more intuitive and user-friendly applications.
DZone events bring together industry leaders, innovators, and peers to explore the latest trends, share insights, and tackle industry challenges. From Virtual Roundtables to Fireside Chats, our events cover a wide range of topics, each tailored to provide you, our DZone audience, with practical knowledge, meaningful discussions, and support for your professional growth.
DZone Events Happening Soon
Below, you’ll find upcoming events that you won't want to miss.
DevOps for Oracle Applications: Automation and Compliance Made Easy
Date: March 11, 2025
Time: 1:00 PM ET
Register for Free!
Join Flexagon and DZone as Flexagon's CEO unveils how FlexDeploy is helping organizations future-proof their DevOps strategy for Oracle Applications and Infrastructure. Explore innovations for automation through compliance, along with real-world success stories from companies who have adopted FlexDeploy.
Make AI Your App Development Advantage: Learn Why and How
Date: March 12, 2025
Time: 10:00 AM ET
Register for Free!
The future of app development is here, and AI is leading the charge. Join OutSystems and DZone, on March 12th at 10am ET, for an exclusive Webinar with Luis Blando, CPTO of OutSystems, and John Rymer, industry analyst at Analysis.Tech, as they discuss how AI and low-code are revolutionizing development. You will also hear from David Gilkey, Leader of Solution Architecture, Americas East at OutSystems, and Roy van de Kerkhof, Director at NovioQ. This session will give you the tools and knowledge you need to accelerate your development and stay ahead of the curve in the ever-evolving tech landscape.
Developer Experience: The Coalescence of Developer Productivity, Process Satisfaction, and Platform Engineering
Date: March 12, 2025
Time: 1:00 PM ET
Register for Free!
Explore the future of developer experience at DZone’s Virtual Roundtable, where a panel will dive into key insights from the 2025 Developer Experience Trend Report. Discover how AI, automation, and developer-centric strategies are shaping workflows, productivity, and satisfaction. Don’t miss this opportunity to connect with industry experts and peers shaping the next chapter of software development.
Unpacking the 2025 Developer Experience Trends Report: Insights, Gaps, and Putting it into Action
Date: March 19, 2025
Time: 1:00 PM ET
Register for Free!
We’ve just seen the 2025 Developer Experience Trends Report from DZone, and while it shines a light on important themes like platform engineering, developer advocacy, and productivity metrics, there are some key gaps that deserve attention. Join Cortex Co-founders Anish Dhar and Ganesh Datta for a special webinar, hosted in partnership with DZone, where they’ll dive into what the report gets right—and challenge the assumptions shaping the DevEx conversation. Their take? Developer experience is grounded in clear ownership. Without ownership clarity, teams face accountability challenges, cognitive overload, and inconsistent standards, ultimately hampering productivity. Don’t miss this deep dive into the trends shaping your team’s future.
Accelerating Software Delivery: Unifying Application and Database Changes in Modern CI/CD
Date: March 25, 2025
Time: 1:00 PM ET
Register for Free!
Want to speed up your software delivery? It’s time to unify your application and database changes. Join us for Accelerating Software Delivery: Unifying Application and Database Changes in Modern CI/CD, where we’ll teach you how to seamlessly integrate database updates into your CI/CD pipeline.
Petabyte Scale, Gigabyte Costs: Mezmo’s ElasticSearch to Quickwit Evolution
Date: March 27, 2025
Time: 1:00 PM ET
Register for Free!
For Mezmo, scaling their infrastructure meant facing significant challenges with ElasticSearch. That's when they made the decision to transition to Quickwit, an open-source, cloud-native search engine designed to handle large-scale data efficiently. This is a must-attend session for anyone looking for insights on improving search platform scalability and managing data growth.
What's Next? DZone has more in store! Stay tuned for announcements about upcoming Webinars, Virtual Roundtables, Fireside Chats, and other developer-focused events. Whether you’re looking to sharpen your skills, explore new tools, or connect with industry leaders, there’s always something exciting on the horizon. Don’t miss out — save this article and check back often for updates!
The traditional way of building Docker images using the docker build command is simple and straightforward, but when working with complex applications consisting of multiple components, this process can become tedious and error-prone. This is where Docker Bake comes in — a powerful and flexible tool for organizing multi-stage and parallel image building. In this article, we'll look at the capabilities of Docker Bake, its advantages over the standard approach, and practical examples of its use for various development scenarios. What Is Docker Bake? Docker Bake is a BuildKit feature that allows you to organize and automate the Docker image-building process using configuration files. The main advantages of Docker Bake: Declarative syntax. Instead of multiple commands in scripts, you describe the desired result in HCL (HashiCorp Configuration Language), JSON, or YAML (Docker Compose files).Parallel building. BuildKit automatically performs image building in parallel where possible.Cache reuse. Efficient use of cache between different builds.Grouping and targeted builds. Ability to define groups of images and build only the targets needed at the moment.Variables and inheritance. A powerful system of variables and property inheritance between build targets.CI/CD integration. Easily integrates into continuous integration and delivery pipelines. Anatomy of a Bake File Let's look at the main components of a bake file: 1. Variables Variables allow you to define values that can be used in different parts of the configuration and easily redefined at runtime: Shell variable "TAG" { default = "latest" } variable "DEBUG" { default = "false" } Variables can be used in other parts of the configuration through string interpolation: ${TAG}. 2. Groups Groups allow you to combine multiple targets for simultaneous building: Shell group "default" { targets = ["app", "api"] } group "backend" { targets = ["api", "database"] } 3. Targets Targets are the main units of building, each defining one Docker image: Shell target "app" { dockerfile = "Dockerfile.app" context = "./app" tags = ["myorg/app:${TAG}"] args = { DEBUG = "${DEBUG}" } platforms = ["linux/amd64", "linux/arm64"] } Main target parameters:
dockerfile – path to the Dockerfile
context – build context
tags – tags for the image
args – arguments to pass to the Dockerfile
platforms – platforms for multi-platform building
target – target for multi-stage building in Dockerfile
output – where to output the build result
cache-from and cache-to – cache settings
4. Inheritance One of the most powerful features of Bake is the ability to inherit parameters: Shell target "base" { context = "." args = { BASE_IMAGE = "node:16-alpine" } } target "app" { inherits = ["base"] dockerfile = "app/Dockerfile" tags = ["myapp/app:latest"] } The app target will inherit all parameters from the base and overwrite or supplement them with its own. 5. Functions In HCL, you can define functions for more flexible configuration: Shell function "tag" { params = [name, version] result = ["${name}:${version}"] } target "app" { tags = tag("myapp/app", "v1.0.0") } Installation and Setup Docker Bake is part of BuildKit, a modern engine for building Docker images. Starting with Docker 23.0, BuildKit is enabled by default, so most users don't need additional configuration. However, if you're using an older version of Docker or want to make sure BuildKit is activated, follow the instructions below. Checking the Docker Version Make sure you have an up-to-date version of Docker (23.0 or higher).
You can check the version with the command: Plain Text docker --version If your Docker version is outdated, update it following the official documentation. Activating BuildKit (for old Docker versions) For Docker versions below 23.0, BuildKit needs to be activated manually. This can be done in one of the following ways: Via environment variable: Shell export DOCKER_BUILDKIT=1 In the Docker daemon configuration file: Edit the /etc/docker/daemon.json file and add the following parameters: JSON { "features": { "buildkit": true } } Via command line: When using the docker build or docker buildx bake command, you can explicitly specify the use of BuildKit: Shell DOCKER_BUILDKIT=1 docker buildx bake Installing Docker Buildx Docker Buildx is an extension of the Docker CLI that provides additional capabilities for building images, including support for multi-platform building. Starting with Docker 20.10, Buildx is included with Docker, but for full functionality, it's recommended to ensure it's installed and activated. Check Buildx Installation Shell docker buildx version If Buildx is not installed, follow the instructions below. Installing Buildx For Linux: Shell mkdir -p ~/.docker/cli-plugins curl -sSL https://github.com/docker/buildx/releases/latest/download/buildx-linux-amd64 -o ~/.docker/cli-plugins/docker-buildx chmod +x ~/.docker/cli-plugins/docker-buildx For macOS (using Homebrew): Shell brew install docker-buildx Creating and Using a Buildx Builder By default, Docker uses the built-in builder, but for full functionality, it's recommended to create a new builder: Shell docker buildx create --use --name my-builder Check that the builder is active: Shell docker buildx ls Docker Bake Basics Configuration Files Docker Bake uses configuration files that can be written in HCL (default), JSON, or YAML formats. Standard names for these files: docker-bake.hcl and docker-bake.json. You can also use docker-compose.yml with some extensions. HCL File Structure A typical Docker Bake configuration file has the following structure: Shell // Defining variables variable "TAG" { default = "latest" } // Defining groups group "default" { targets = ["app", "api"] } // Defining common settings target "docker-metadata-action" { tags = ["user/app:${TAG}"] } // Defining build targets target "app" { inherits = ["docker-metadata-action"] dockerfile = "Dockerfile.app" context = "./app" } target "api" { inherits = ["docker-metadata-action"] dockerfile = "Dockerfile.api" context = "./api" } Executing the Build Build all targets from the default group: Plain Text docker buildx bake Build a specific target or group: Plain Text docker buildx bake app Pass variables: Plain Text TAG=v1.0.0 docker buildx bake Practical Examples Example 1: Simple Multi-Component Application Suppose we have an application consisting of a web frontend, API, and database service. Here's what a docker-bake.hcl file might look like: Shell variable "TAG" { default = "latest" } group "default" { targets = ["frontend", "api", "db"] } group "services" { targets = ["api", "db"] } target "base" { context = "."
args = { BASE_IMAGE = "node:16-alpine" } } target "frontend" { inherits = ["base"] dockerfile = "frontend/Dockerfile" tags = ["myapp/frontend:${TAG}"] args = { API_URL = "http://api:3000" } } target "api" { inherits = ["base"] dockerfile = "api/Dockerfile" tags = ["myapp/api:${TAG}"] args = { DB_HOST = "db" DB_PORT = "5432" } } target "db" { context = "./db" dockerfile = "Dockerfile" tags = ["myapp/db:${TAG}"] } Example 2: Multi-Platform Building One of the powerful aspects of Docker Bake is the ease of setting up multi-platform building: Shell variable "TAG" { default = "latest" } group "default" { targets = ["app-all"] } target "app" { dockerfile = "Dockerfile" tags = ["myapp/app:${TAG}"] } target "app-linux-amd64" { inherits = ["app"] platforms = ["linux/amd64"] } target "app-linux-arm64" { inherits = ["app"] platforms = ["linux/arm64"] } target "app-all" { inherits = ["app"] platforms = ["linux/amd64", "linux/arm64"] } Example 3: Different Development Environments Docker Bake makes it easy to manage builds for different environments (e.g., development, testing, and production). For this, you can use variables that are overridden via the command line: Shell variable "ENV" { default = "dev" } group "default" { targets = ["app-${ENV}"] } target "app-base" { dockerfile = "Dockerfile" args = { BASE_IMAGE = "node:16-alpine" } } target "app-dev" { inherits = ["app-base"] tags = ["myapp/app:dev"] args = { NODE_ENV = "development" DEBUG = "true" } } target "app-stage" { inherits = ["app-base"] tags = ["myapp/app:stage"] args = { NODE_ENV = "production" API_URL = "https://api.stage.example.com" } } target "app-prod" { inherits = ["app-base"] tags = ["myapp/app:prod", "myapp/app:latest"] args = { NODE_ENV = "production" API_URL = "https://api.example.com" } } To build an image for a specific environment, use the command: Plain Text ENV=prod docker buildx bake Advanced Docker Bake Features Matrix Builds Docker Bake allows you to define matrices for creating multiple build variants based on parameter combinations: Shell variable "REGISTRY" { default = "docker.io/myorg" } target "matrix" { name = "app-${platform}-${version}" matrix = { platform = ["linux/amd64", "linux/arm64"] version = ["1.0", "2.0"] } dockerfile = "Dockerfile" tags = ["${REGISTRY}/app:${version}-${platform}"] platforms = ["${platform}"] args = { VERSION = "${version}" } } This code will create four image variants for each combination of platform and version. You can build them all with a single command. Using External Files and Functions Docker Bake allows you to use external files and functions for more flexible configuration: Shell // Import variables from a JSON file variable "settings" { default = {} } function "tag" { params = [name, tag] result = ["${name}:${tag}"] } target "app" { dockerfile = "Dockerfile" tags = tag("myapp/app", "v1.0.0") args = { CONFIG = "${settings.app_config}" } } Then you can pass a settings file: Plain Text docker buildx bake --file settings.json Integration With Docker Compose Docker Bake can be integrated with Docker Compose, which is especially convenient for existing projects: YAML # docker-compose.yml services: app: build: context: ./app dockerfile: Dockerfile args: VERSION: "1.0" image: myapp/app:latest api: build: context: ./api dockerfile: Dockerfile image: myapp/api:latest Shell # docker-bake.hcl target "default" { context = "." dockerfile-inline = <<EOT FROM docker/compose:1.29.2 WORKDIR /app COPY docker-compose.yml . 
RUN docker-compose build EOT } Conditional Logic For more complex scenarios, you can use conditional logic: Shell variable "DEBUG" { default = "false" } target "app" { dockerfile = "Dockerfile" tags = ["myapp/app:latest"] args = { DEBUG = "${DEBUG}" EXTRA_PACKAGES = DEBUG == "true" ? "vim curl htop" : "" } } Using Docker Bake in CI/CD Docker Bake is perfect for use in CI/CD pipelines. Here's an example of integration with GitHub Actions, using secrets for secure authentication with Docker Hub: Shell # .github/workflows/build.yml name: Build and Publish on: push: branches: [main] tags: ['v*'] jobs: build: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v3 - name: Set up Docker Buildx uses: docker/setup-buildx-action@v2 - name: Login to DockerHub uses: docker/login-action@v2 with: username: ${{ secrets.DOCKERHUB_USERNAME }} password: ${{ secrets.DOCKERHUB_TOKEN }} - name: Docker Metadata id: meta uses: docker/metadata-action@v4 with: images: myapp/app tags: | type=ref,event=branch type=ref,event=pr type=semver,pattern={{version}} - name: Build and push uses: docker/bake-action@v2 with: files: | ./docker-bake.hcl targets: app push: true set: | *.tags=${{ steps.meta.outputs.tags }} Debugging and Monitoring Builds Docker Bake provides several useful options for debugging the build process: View configuration without building: Plain Text docker buildx bake --print Detailed logs: Plain Text docker buildx bake --progress=plain Export to JSON for analysis: Plain Text docker buildx bake --print | jq Comparison With Other Tools Docker Bake vs. Docker Compose
Feature | Docker Bake | Docker Compose
Main purpose | Building images | Container management
Parallel building | Yes, automatically | Limited
Matrix builds | Yes | No
Inheritance | Yes, powerful system | Limited (extends)
Multi-platform | Yes, integrated | No
Configuration format | HCL, JSON | YAML
Docker Bake vs. Build Scripts
Aspect | Docker Bake | Bash/scripts
Declarativeness | High | Low
Maintenance complexity | Low | High
Reusability | Simple | Complex
Parallelism | Automatic | Manual
CI/CD integration | Simple | Requires effort
Best Practices Organize targets into logical groups: Shell group "all" { targets = ["app", "api", "worker"] } group "backend" { targets = ["api", "worker"] } Use inheritance for common settings: Shell target "common" { context = "." args = { BASE_IMAGE = "node:16-alpine" } } target "app" { inherits = ["common"] dockerfile = "app/Dockerfile" } Organize complex configurations into multiple files: Shell docker buildx bake \ -f ./common.hcl \ -f ./development.hcl \ app-dev Use variables for flexibility: Shell variable "REGISTRY" { default = "docker.io/myorg" } target "app" { tags = ["${REGISTRY}/app:latest"] } Apply matrices for complex build scenarios: Shell target "matrix" { matrix = { env = ["dev", "prod"] platform = ["linux/amd64", "linux/arm64"] } name = "app-${env}-${platform}" tags = ["myapp/app:${env}-${platform}"] } Common Problems and Solutions Problem 1: Cache is Not Used Efficiently Solution Properly structure your Dockerfile, placing layers that change less frequently at the beginning of the file: Dockerfile FROM node:16-alpine # First copy only dependency files COPY package.json package-lock.json ./ RUN npm install # Then copy the source code COPY . .
Problem 2: Environment Variable Conflicts Solution Use explicit values in Docker Bake: Shell target "app" { args = { NODE_ENV = "production" } } Problem 3: Difficult to Debug Builds Solution Use detailed logs and inspection: Plain Text docker buildx bake --progress=plain --print app Conclusion Docker Bake provides a powerful, flexible, and declarative approach to organizing Docker image building. It solves many problems that teams face when using traditional build approaches, especially in complex multi-component projects. The main advantages of Docker Bake:
Declarative approach
Efficient cache usage
Parallel and multi-platform building
Powerful variable and inheritance system
Excellent integration with CI/CD pipelines
Implementing Docker Bake in your workflow can significantly simplify and speed up image-building processes, especially for teams working with microservice architecture or complex multi-component applications. Useful Resources
Official Docker Bake documentation
BuildKit GitHub repository
HCL documentation
Docker Buildx GitHub repository
As discussed in my previous article about data architectures emphasizing emerging trends, data processing is one of the key components in the modern data architecture. This article discusses various alternatives to the Pandas library for better performance in your data architecture. Data processing and data analysis are crucial tasks in the field of data science and data engineering. As datasets grow larger and more complex, traditional tools like pandas can struggle with performance and scalability. This has led to the development of several alternative libraries, each designed to address specific challenges in data manipulation and analysis. Introduction The following libraries have emerged as powerful tools for data processing:
Pandas – The traditional workhorse for data manipulation in Python
Dask – Extends pandas for large-scale, distributed data processing
DuckDB – An in-process analytical database for fast SQL queries
Modin – A drop-in replacement for pandas with improved performance
Polars – A high-performance DataFrame library built on Rust
FireDucks – A compiler-accelerated alternative to pandas
Datatable – A high-performance library for data manipulation
Each of these libraries offers unique features and benefits, catering to different use cases and performance requirements. Let's explore each one in detail: Pandas Pandas is a versatile and well-established library in the data science community. It offers robust data structures (DataFrame and Series) and comprehensive tools for data cleaning and transformation. Pandas excels at data exploration and visualization, with extensive documentation and community support. However, it faces performance issues with large datasets, is limited to single-threaded operations, and can have high memory usage for large datasets. Pandas is ideal for smaller to medium-sized datasets (up to a few GB) and when extensive data manipulation and analysis are required. Dask Dask extends pandas for large-scale data processing, offering parallel computing across multiple CPU cores or clusters and out-of-core computation for datasets larger than available RAM. It scales pandas operations to big data and integrates well with the PyData ecosystem. However, Dask only supports a subset of the pandas API and can be complex to set up and optimize for distributed computing. It's best suited for processing extremely large datasets that don't fit in memory or require distributed computing resources. Python import dask.dataframe as dd import pandas as pd import time # Sample data data = {'A': range(1000000), 'B': range(1000000, 2000000)} # Pandas benchmark start_time = time.time() df_pandas = pd.DataFrame(data) result_pandas = df_pandas.groupby('A').sum() pandas_time = time.time() - start_time # Dask benchmark start_time = time.time() df_dask = dd.from_pandas(df_pandas, npartitions=4) result_dask = df_dask.groupby('A').sum() dask_time = time.time() - start_time print(f"Pandas time: {pandas_time:.4f} seconds") print(f"Dask time: {dask_time:.4f} seconds") print(f"Speedup: {pandas_time / dask_time:.2f}x") For better performance, load the data into Dask directly with dd.from_dict(data, npartitions=4) instead of converting the pandas DataFrame with dd.from_pandas(df_pandas, npartitions=4). Output Plain Text Pandas time: 0.0838 seconds Dask time: 0.0213 seconds Speedup: 3.93x DuckDB DuckDB is an in-process analytical database that offers fast analytical queries using a columnar-vectorized query engine. It supports SQL with additional features and has no external dependencies, making setup simple.
DuckDB provides exceptional performance for analytical queries and easy integration with Python and other languages. However, it's not suitable for high-volume transactional workloads and has limited concurrency options. DuckDB excels in analytical workloads, especially when SQL queries are preferred. Python import duckdb import pandas as pd import time # Sample data data = {'A': range(1000000), 'B': range(1000000, 2000000)} df = pd.DataFrame(data) # Pandas benchmark start_time = time.time() result_pandas = df.groupby('A').sum() pandas_time = time.time() - start_time # DuckDB benchmark start_time = time.time() duckdb_conn = duckdb.connect(':memory:') duckdb_conn.register('df', df) result_duckdb = duckdb_conn.execute("SELECT A, SUM(B) FROM df GROUP BY A").fetchdf() duckdb_time = time.time() - start_time print(f"Pandas time: {pandas_time:.4f} seconds") print(f"DuckDB time: {duckdb_time:.4f} seconds") print(f"Speedup: {pandas_time / duckdb_time:.2f}x") Output Plain Text Pandas time: 0.0898 seconds DuckDB time: 0.1698 seconds Speedup: 0.53x Modin Modin aims to be a drop-in replacement for pandas, utilizing multiple CPU cores for faster execution and scaling pandas operations across distributed systems. It requires minimal code changes to adopt and offers potential for significant speed improvements on multi-core systems. However, Modin may have limited performance improvements in some scenarios and is still in active development. It's best for users looking to speed up existing pandas workflows without major code changes. Python import modin.pandas as mpd import pandas as pd import time # Sample data data = {'A': range(1000000), 'B': range(1000000, 2000000)} # Pandas benchmark start_time = time.time() df_pandas = pd.DataFrame(data) result_pandas = df_pandas.groupby('A').sum() pandas_time = time.time() - start_time # Modin benchmark start_time = time.time() df_modin = mpd.DataFrame(data) result_modin = df_modin.groupby('A').sum() modin_time = time.time() - start_time print(f"Pandas time: {pandas_time:.4f} seconds") print(f"Modin time: {modin_time:.4f} seconds") print(f"Speedup: {pandas_time / modin_time:.2f}x") Output Plain Text Pandas time: 0.1186 seconds Modin time: 0.1036 seconds Speedup: 1.14x Polars Polars is a high-performance DataFrame library built on Rust, featuring a memory-efficient columnar memory layout and a lazy evaluation API for optimized query planning. It offers exceptional speed for data processing tasks and scalability for handling large datasets. However, Polars has a different API from pandas, requiring some learning, and may struggle with extremely large datasets (100 GB+). It's ideal for data scientists and engineers working with medium to large datasets who prioritize performance. 
Python import polars as pl import pandas as pd import time # Sample data data = {'A': range(1000000), 'B': range(1000000, 2000000)} # Pandas benchmark start_time = time.time() df_pandas = pd.DataFrame(data) result_pandas = df_pandas.groupby('A').sum() pandas_time = time.time() - start_time # Polars benchmark start_time = time.time() df_polars = pl.DataFrame(data) result_polars = df_polars.group_by('A').sum() polars_time = time.time() - start_time print(f"Pandas time: {pandas_time:.4f} seconds") print(f"Polars time: {polars_time:.4f} seconds") print(f"Speedup: {pandas_time / polars_time:.2f}x") Output Plain Text Pandas time: 0.1279 seconds Polars time: 0.0172 seconds Speedup: 7.45x FireDucks FireDucks offers full compatibility with the pandas API, multi-threaded execution, and lazy execution for efficient data flow optimization. It features a runtime compiler that optimizes code execution, providing significant performance improvements over pandas. FireDucks allows for easy adoption due to its pandas API compatibility and automatic optimization of data operations. However, it's relatively new and may have less community support and limited documentation compared to more established libraries. Python import fireducks.pandas as fpd import pandas as pd import time # Sample data data = {'A': range(1000000), 'B': range(1000000, 2000000)} # Pandas benchmark start_time = time.time() df_pandas = pd.DataFrame(data) result_pandas = df_pandas.groupby('A').sum() pandas_time = time.time() - start_time # FireDucks benchmark start_time = time.time() df_fireducks = fpd.DataFrame(data) result_fireducks = df_fireducks.groupby('A').sum() fireducks_time = time.time() - start_time print(f"Pandas time: {pandas_time:.4f} seconds") print(f"FireDucks time: {fireducks_time:.4f} seconds") print(f"Speedup: {pandas_time / fireducks_time:.2f}x") Output Plain Text Pandas time: 0.0754 seconds FireDucks time: 0.0033 seconds Speedup: 23.14x Datatable Datatable is a high-performance library for data manipulation, featuring column-oriented data storage, native-C implementation for all data types, and multi-threaded data processing. It offers exceptional speed for data processing tasks, efficient memory usage, and is designed for handling large datasets (up to 100 GB). Datatable's API is similar to R's data.table. However, it has less comprehensive documentation compared to pandas, fewer features, and is not compatible with Windows. Datatable is ideal for processing large datasets on a single machine, particularly when speed is crucial. 
Python import datatable as dt import pandas as pd import time # Sample data data = {'A': range(1000000), 'B': range(1000000, 2000000)} # Pandas benchmark start_time = time.time() df_pandas = pd.DataFrame(data) result_pandas = df_pandas.groupby('A').sum() pandas_time = time.time() - start_time # Datatable benchmark start_time = time.time() df_dt = dt.Frame(data) result_dt = df_dt[:, dt.sum(dt.f.B), dt.by(dt.f.A)] datatable_time = time.time() - start_time print(f"Pandas time: {pandas_time:.4f} seconds") print(f"Datatable time: {datatable_time:.4f} seconds") print(f"Speedup: {pandas_time / datatable_time:.2f}x") Output Plain Text Pandas time: 0.1608 seconds Datatable time: 0.0749 seconds Speedup: 2.15x Performance Comparison
Data loading: 34 times faster than pandas for a 5.7GB dataset
Data sorting: 36 times faster than pandas
Grouping operations: 2 times faster than pandas
Datatable excels in scenarios involving large-scale data processing, offering significant performance improvements over pandas for operations like sorting, grouping, and data loading. Its multi-threaded processing capabilities make it particularly effective for utilizing modern multi-core processors. Conclusion In conclusion, the choice of library depends on factors such as dataset size, performance requirements, and specific use cases. While pandas remains versatile for smaller datasets, alternatives like Dask and FireDucks offer strong solutions for large-scale data processing. DuckDB excels in analytical queries, Polars provides high performance for medium-sized datasets, and Modin aims to scale pandas operations with minimal code changes. The bar diagram below shows the performance of the libraries, using the DataFrame benchmark for comparison. The data is normalized to show percentages. Benchmark: Performance comparison For the Python code that shows the above bar chart with normalized data, refer to the Jupyter Notebook (a simplified sketch also follows the comparison chart below). Use Google Colab, as FireDucks is available only on Linux. Comparison Chart
Library | Performance | Scalability | API Similarity to Pandas | Best Use Case | Key Strengths | Limitations
Pandas | Moderate | Low | N/A (Original) | Small to medium datasets, data exploration | Versatility, rich ecosystem | Slow with large datasets, single-threaded
Dask | High | Very High | High | Large datasets, distributed computing | Scales pandas operations, distributed processing | Complex setup, partial pandas API support
DuckDB | Very High | Moderate | Low | Analytical queries, SQL-based analysis | Fast SQL queries, easy integration | Not for transactional workloads, limited concurrency
Modin | High | High | Very High | Speeding up existing pandas workflows | Easy adoption, multi-core utilization | Limited improvements in some scenarios
Polars | Very High | High | Moderate | Medium to large datasets, performance-critical | Exceptional speed, modern API | Learning curve, struggles with very large data
FireDucks | Very High | High | Very High | Large datasets, pandas-like API with performance | Automatic optimization, pandas compatibility | Newer library, less community support
Datatable | Very High | High | Moderate | Large datasets on single machine | Fast processing, efficient memory use | Limited features, no Windows support
This table provides a quick overview of each library's strengths, limitations, and best use cases, allowing for easy comparison across different aspects such as performance, scalability, and API similarity to pandas.
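As a rough illustration of the normalized comparison described above, here is a minimal matplotlib sketch that plots the speedup figures reported in the benchmark outputs earlier in this article; the Jupyter Notebook mentioned above remains the authoritative version, and normalizing to the fastest library is only one reasonable choice. Python
import matplotlib.pyplot as plt

# Speedups over pandas taken from the groupby benchmark outputs above (1.0 = pandas baseline)
speedups = {
    'Pandas': 1.0,
    'Dask': 3.93,
    'DuckDB': 0.53,
    'Modin': 1.14,
    'Polars': 7.45,
    'FireDucks': 23.14,
    'Datatable': 2.15,
}

# Normalize to percentages of the fastest library so the bars share a 0-100 scale
fastest = max(speedups.values())
normalized = {lib: 100 * value / fastest for lib, value in speedups.items()}

plt.figure(figsize=(8, 4))
plt.bar(list(normalized.keys()), list(normalized.values()))
plt.ylabel('Speedup relative to the fastest library (%)')
plt.title('GroupBy benchmark: speedup over pandas, normalized')
plt.tight_layout()
plt.show()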