Which file format is used for storing Delta Lake Table?
Delta Lake tables use the Parquet format as their underlying storage format. Delta Lake enhances Parquet by adding a transaction log that keeps track of all the operations performed on the table. This allows features like ACID transactions, scalable metadata handling, and schema enforcement, making it an ideal choice for big data processing and management in environments like Databricks.
Reference: Databricks documentation on Delta Lake: Delta Lake Overview
Which query is performing a streaming hop from raw data to a Bronze table?
A)
B)
C)
D)
The query performing a streaming hop from raw data to a Bronze table is identified by using the Spark streaming read capability and then writing to a Bronze table. Let's analyze the options:
Option A: Utilizes .writeStream but performs a complete aggregation which is more characteristic of a roll-up into a summarized table rather than a hop into a Bronze table.
Option B: Also uses .writeStream but calculates an average, which again does not typically represent the raw to Bronze transformation, which usually involves minimal transformations.
Option C: This uses a basic .write with .mode('append') which is not a streaming operation, and hence not suitable for real-time streaming data transformation to a Bronze table.
Option D: It employs spark.readStream.load() to ingest raw data as a stream and then writes it out with .writeStream, which is a typical pattern for streaming data into a Bronze table where raw data is captured in real-time and minimal transformation is applied. This approach aligns with the concept of a Bronze table in a modern data architecture, where raw data is ingested continuously and stored in a more accessible format.
Reference: Databricks documentation on Structured Streaming: Structured Streaming in Databricks
What is stored in a Databricks customer's cloud account?
In a Databricks customer's cloud account, the primary elements stored include:
Data: This is the central type of content stored in the customer's cloud account. Data might include various datasets, tables, and files that are used and managed through Databricks platforms.
Notebooks: These are also stored within a customer's cloud account. Notebooks include scripts, notes, and other information necessary for data analysis and processing tasks.
Cluster management metadata is indeed managed through the cloud, but it's primarily handled by Databricks rather than stored directly in the customer's account. The Databricks web application itself is not stored within the customer's cloud account; rather, it's a service provided by Databricks.
Reference: Databricks documentation: Data in Databricks
A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:
CREATE TABLE jdbc_customer360
USING
OPTIONS (
url "jdbc:sqlite:/customers.db", dbtable "customer360"
)
Which line of code fills in the above blank to successfully complete the task?
To create a table in Databricks using data from an SQLite database, the correct syntax involves specifying the format of the data source. The format in the case of using JDBC (Java Database Connectivity) with SQLite is specified by the org.apache.spark.sql.jdbc format. This format allows Spark to interface with various relational databases through JDBC. Here is how the command should be structured:
CREATE TABLE jdbc_customer360
USING org.apache.spark.sql.jdbc
OPTIONS (
url 'jdbc:sqlite:/customers.db',
dbtable 'customer360'
)
The USING org.apache.spark.sql.jdbc line specifies that the JDBC data source is being used, enabling Spark to interact with the SQLite database via JDBC.
Reference: Databricks documentation on JDBC: Connecting to SQL Databases using JDBC
A data engineer wants to create a new table containing the names of customers who live in France.
They have written the following command:
CREATE TABLE customersInFrance
_____ AS
SELECT id,
firstName,
lastName
FROM customerLocations
WHERE country = 'FRANCE';
A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (Pll).
Which line of code fills in the above blank to successfully complete the task?
To include a property indicating that a table contains personally identifiable information (PII), the TBLPROPERTIES keyword is used in SQL to add metadata to a table. The correct syntax to define a table property for PII is as follows:
CREATE TABLE customersInFrance
USING DELTA
TBLPROPERTIES ('PII' = 'true')
AS
SELECT id,
firstName,
lastName
FROM customerLocations
WHERE country = 'FRANCE';
The TBLPROPERTIES ('PII' = 'true') line correctly sets a table property that tags the table as containing personally identifiable information. This is in accordance with organizational policies for handling sensitive information.
Reference: Databricks documentation on Delta Lake: Delta Lake on Databricks
Full Exam Access, Actual Exam Questions, Validated Answers, Anytime Anywhere, No Download Limits, No Practice Limits
Get All 100 Questions & Answers