
Databricks Databricks-Certified-Data-Engineer-Associate Exam Actual Questions

The questions for Databricks-Certified-Data-Engineer-Associate were last updated on Oct 1, 2024.
Question No. 1

Which file format is used for storing a Delta Lake table?

Correct Answer: A

Delta Lake tables use Parquet as their underlying storage format. Delta Lake enhances Parquet by adding a transaction log that records every operation performed on the table, which enables features like ACID transactions, scalable metadata handling, and schema enforcement, making it an ideal choice for big data processing and management in environments like Databricks.

Reference: Databricks documentation on Delta Lake: Delta Lake Overview
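
To make the Parquet-plus-log layout concrete, here is a minimal PySpark sketch; the path /tmp/delta/demo is an illustrative assumption, not part of the original question:

from pyspark.sql import SparkSession

# Assumes a Delta-enabled Spark session (e.g., any Databricks cluster).
spark = SparkSession.builder.getOrCreate()

# Write a tiny DataFrame as a Delta table at an assumed example path.
spark.range(5).write.format("delta").save("/tmp/delta/demo")

# The table directory now holds Parquet data files (part-*.snappy.parquet)
# plus a _delta_log/ directory of JSON commit files -- the transaction log
# that layers ACID transactions and schema enforcement on top of Parquet.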


Question No. 2

Which query is performing a streaming hop from raw data to a Bronze table?

A) [code snippet shown as an image]

B) [code snippet shown as an image]

C) [code snippet shown as an image]

D) [code snippet shown as an image]

Correct Answer: D

The query performing a streaming hop from raw data to a Bronze table is the one that reads with Spark's streaming API and writes the stream out to a Bronze table. Analyzing the options:

Option A: Uses .writeStream but performs a complete aggregation, which is characteristic of a roll-up into a summarized table rather than a hop into a Bronze table.

Option B: Also uses .writeStream but calculates an average; a raw-to-Bronze hop typically applies minimal transformation, not aggregation.

Option C: Uses a batch .write with .mode('append'), which is not a streaming operation and therefore cannot perform a real-time hop into a Bronze table.

Option D: Uses spark.readStream.load() to ingest raw data as a stream and then writes it out with .writeStream, the typical pattern for streaming into a Bronze table: raw data is captured continuously and stored with minimal transformation. This matches the role of a Bronze table in a modern data architecture, where raw data is ingested continuously into a more accessible format.

Reference: Databricks documentation on Structured Streaming: Structured Streaming in Databricks
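
A minimal PySpark sketch of the Option D pattern; since the original snippets were images, the input format, paths, and table name below are illustrative assumptions:

# `spark` is the active SparkSession (predefined on Databricks).
raw = (spark.readStream
       .format("json")                     # assumed raw input format
       .schema("id LONG, payload STRING")  # streaming reads require a schema
       .load("/mnt/raw/events"))           # assumed landing path

(raw.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/bronze_events")
    .outputMode("append")                  # append raw records as-is
    .table("bronze_events"))               # assumed Bronze table name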


Question No. 3

What is stored in a Databricks customer's cloud account?

Correct Answer: A

In a Databricks customer's cloud account, the primary elements stored include:

Data: This is the central content stored in the customer's cloud account. It includes the datasets, tables, and files that are used and managed through the Databricks platform.

Notebooks: These are also stored within a customer's cloud account. Notebooks include scripts, notes, and other information necessary for data analysis and processing tasks.

Cluster management metadata is indeed managed through the cloud, but it's primarily handled by Databricks rather than stored directly in the customer's account. The Databricks web application itself is not stored within the customer's cloud account; rather, it's a service provided by Databricks.

Reference: Databricks documentation: Data in Databricks


Question No. 4

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360
USING _____
OPTIONS (
  url "jdbc:sqlite:/customers.db",
  dbtable "customer360"
)

Which line of code fills in the above blank to successfully complete the task?

Correct Answer: B

To create a table in Databricks from an SQLite database, the command must specify the data source format. For JDBC (Java Database Connectivity) sources such as SQLite, that format is org.apache.spark.sql.jdbc, which allows Spark to interface with relational databases through JDBC. The completed command is:

CREATE TABLE jdbc_customer360
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:sqlite:/customers.db',
  dbtable 'customer360'
)

The USING org.apache.spark.sql.jdbc line specifies that the JDBC data source is being used, enabling Spark to interact with the SQLite database via JDBC.

Reference: Databricks documentation on JDBC: Connecting to SQL Databases using JDBC
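
For comparison, the same source can also be read through the DataFrame API. This is a minimal sketch assuming the SQLite JDBC driver is available on the cluster classpath; the URL and table names follow the question's example:

# `spark` is the active SparkSession (predefined on Databricks).
df = (spark.read
      .format("jdbc")                          # short alias for the JDBC source
      .option("url", "jdbc:sqlite:/customers.db")
      .option("dbtable", "customer360")
      .load())

df.write.saveAsTable("jdbc_customer360")       # materialize as a managed table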


Question No. 5

A data engineer wants to create a new table containing the names of customers who live in France.

They have written the following command:

CREATE TABLE customersInFrance
_____ AS
SELECT id,
  firstName,
  lastName
FROM customerLocations
WHERE country = 'FRANCE';

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).

Which line of code fills in the above blank to successfully complete the task?

Correct Answer: D

To include a property indicating that a table contains personally identifiable information (PII), the TBLPROPERTIES keyword is used in SQL to add metadata to a table. The correct syntax to define a table property for PII is as follows:

CREATE TABLE customersInFrance
USING DELTA
TBLPROPERTIES ('PII' = 'true')
AS
SELECT id,
  firstName,
  lastName
FROM customerLocations
WHERE country = 'FRANCE';

The TBLPROPERTIES ('PII' = 'true') line correctly sets a table property that tags the table as containing personally identifiable information. This is in accordance with organizational policies for handling sensitive information.

Reference: Databricks documentation on Delta Lake: Delta Lake on Databricks
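
One quick way to confirm the property took effect is SHOW TBLPROPERTIES, sketched here in PySpark (the table name comes from the question):

# `spark` is the active SparkSession (predefined on Databricks).
spark.sql("SHOW TBLPROPERTIES customersInFrance ('PII')").show()
# Expected: a single row with key 'PII' and value 'true'.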

