
Most Recent Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Exam Questions & Answers


Prepare for the Databricks Certified Associate Developer for Apache Spark 3.0 exam with our extensive collection of questions and answers. These practice Q&A are updated according to the latest syllabus, providing you with the tools needed to review and test your knowledge.

QA4Exam focuses on the latest syllabus and exam objectives; our practice Q&A are designed to help you identify key topics and solidify your understanding. By focusing on the core curriculum, these Questions & Answers help you cover all the essential topics, ensuring you're well-prepared for every section of the exam. Each question comes with a detailed explanation, offering valuable insights and helping you learn from your mistakes. Whether you're looking to assess your progress or dive deeper into complex topics, our updated Q&A will provide the support you need to confidently approach the Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam and achieve success.

The questions for Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 were last updated on Dec 21, 2024.
  • Viewing page 1 out of 36 pages.
  • Viewing questions 1-5 out of 180 questions
Question No. 1

Which of the following code blocks reads in the parquet file stored at location filePath, given that all columns in the parquet file contain only whole numbers and are stored in the most appropriate format for this kind of data?

Correct Answer: D

The schema passed into the schema() method should be of type StructType or a string, so all options in which a list is passed are incorrect.

In addition, since all numbers are whole numbers, the IntegerType() data type is the correct option here. NumberType() is not a valid data type and StringType() would fail, since the parquet file is stored in the 'most appropriate format for this kind of data', meaning that it is most likely an IntegerType, and Spark does not convert data types if a schema is provided.

Also note that StructType accepts only a single argument (a list of StructFields). So, passing multiple arguments is invalid.

Finally, Spark needs to know which format the file is in. However, all of the options listed are valid here, since Spark assumes parquet as a default when no file format is specifically passed.

More info: pyspark.sql.DataFrameReader.schema --- PySpark 3.1.2 documentation and StructType --- PySpark 3.1.2 documentation
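For reference, a minimal sketch of reading a parquet file with an explicit whole-number schema. The column names here are hypothetical, and filePath and the SparkSession spark are assumed to be defined:

from pyspark.sql.types import StructType, StructField, IntegerType

# Explicit schema: whole numbers map to IntegerType()
schema = StructType([
    StructField("itemId", IntegerType(), True),
    StructField("storeId", IntegerType(), True)
])

# parquet() names the file format explicitly; a plain load() would also default to parquet
df = spark.read.schema(schema).parquet(filePath)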


Question No. 2

Which of the following describes characteristics of the Spark UI?

Correct Answer: D

There is a place in the Spark UI that shows the property spark.executor.memory.

Correct, you can see Spark properties such as spark.executor.memory in the Environment tab.

Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Wrong -- Jobs, Stages, Storage, Executors, and SQL are all tabs in the Spark UI. DAGs can be inspected in the 'Jobs' tab in the job details or in the Stages or SQL tab, but are not a separate tab.

Via the Spark UI, workloads can be manually distributed across distributors.

No, the Spark UI is meant for inspecting the inner workings of Spark, which ultimately helps you understand, debug, and optimize Spark workloads.

Via the Spark UI, stage execution speed can be modified.

No, see above.

The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.

No, there is no Scheduler tab.
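As a small illustration (assuming an active SparkSession named spark), the values shown in the Environment tab of the Spark UI can also be read programmatically, and the UI itself is reachable at the application's web URL:

# Read a Spark property that also appears in the Environment tab of the Spark UI
print(spark.conf.get("spark.executor.memory", "not set"))

# URL of this application's Spark UI (Jobs, Stages, Storage, Environment, Executors, SQL tabs)
print(spark.sparkContext.uiWebUrl)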


Question No. 3

The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before 2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.

Schema:

root
 |-- itemId: integer (nullable = true)
 |-- attributes: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- supplier: string (nullable = true)

Code block:

schema = StructType([
    StructType("itemId", IntegerType(), True),
    StructType("attributes", ArrayType(StringType(), True), True),
    StructType("supplier", StringType(), True)
])

spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)

Correct Answer: D

Correct code block:

schema = StructType([
    StructField('itemId', IntegerType(), True),
    StructField('attributes', ArrayType(StringType(), True), True),
    StructField('supplier', StringType(), True)
])

spark.read.options(modifiedBefore='2029-03-20T05:44:46').schema(schema).parquet(filePath)

This question is more difficult than what you would encounter in the exam. In the exam, for this question type, only one error needs to be identified and not 'one or multiple' as in the question.

Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.

Correct! Columns in the schema definition should use the StructField type. Building a schema from pyspark.sql.types, as here using classes like StructType and StructField, is one of multiple ways of expressing a schema in Spark. A StructType always contains a list of StructFields (see documentation linked below). So, nesting StructType inside StructType as shown in the question is wrong.

The modification date threshold should be specified by a keyword argument like options(modifiedBefore='2029-03-20T05:44:46') and not by two consecutive non-keyword arguments as in the original code block (see documentation linked below).

Spark cannot identify the file format correctly, because it has to be specified either by using DataFrameReader.format(), as an argument to DataFrameReader.load(), or directly by calling, for example, DataFrameReader.parquet().

Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.

No. If StructField were used for the columns instead of StructType (see above), its third argument would specify whether the column is nullable. The original schema shows that columns should be nullable, and this is specified correctly by the third argument being True in the schema in the code block.

It is correct, however, that the modification date threshold is specified incorrectly (see above).

The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.

Wrong. The attributes array is specified correctly, following the syntax for ArrayType (see linked documentation below). That Spark cannot identify the file format is correct, see the correct answer above. In addition, the DataFrameReader is called correctly through the SparkSession spark.

Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.

Incorrect. The columns in the schema definition do use the wrong object type, but the syntax of the call to Spark's DataFrameReader is correct, so this option only gets one of its two claims right.

The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.

False. The data type of the schema is StructType and an accepted data type for the DataFrameReader.schema() method. It is correct however that the modification date threshold is specified incorrectly (see correct answer above).
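Putting it together, a self-contained sketch of the corrected read (assuming filePath is defined and spark is the active SparkSession; the result name itemsDf is hypothetical):

from pyspark.sql.types import (StructType, StructField, IntegerType,
                               ArrayType, StringType)

# StructType takes a single list of StructFields; the third argument marks nullability
schema = StructType([
    StructField("itemId", IntegerType(), True),
    StructField("attributes", ArrayType(StringType(), True), True),
    StructField("supplier", StringType(), True)
])

# modifiedBefore is passed as a keyword option; parquet() fixes the file format
itemsDf = (spark.read
           .options(modifiedBefore="2029-03-20T05:44:46")
           .schema(schema)
           .parquet(filePath))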


Question No. 4

The code block displayed below contains an error. The code block should return all rows of DataFrame transactionsDf, but including only columns storeId and predError. Find the error.

Code block:

spark.collect(transactionsDf.select("storeId", "predError"))

Correct Answer: E

Correct code block:

transactionsDf.select('storeId', 'predError').collect()

collect() is a method of the DataFrame object.

More info: pyspark.sql.DataFrame.collect --- PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 2, Question 24 (Databricks import instructions)
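A quick sketch with made-up data standing in for transactionsDf shows what the corrected call returns: collect() brings the selected rows back to the driver as a list of Row objects.

# Hypothetical data standing in for transactionsDf
transactionsDf = spark.createDataFrame(
    [(1, 25, 3.5), (2, 7, 1.2)],
    ["transactionId", "storeId", "predError"]
)

rows = transactionsDf.select("storeId", "predError").collect()
for row in rows:
    print(row["storeId"], row["predError"])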


Question No. 5

Which of the following describes the difference between client and cluster execution modes?

Correct Answer: A

In cluster mode, the driver runs on the master node, while in client mode, the driver runs on a virtual machine in the cloud.

This is wrong, since execution modes do not specify whether workloads are run in the cloud or on-premise.

In cluster mode, each node will launch its own executor, while in client mode, executors will exclusively run on the client machine.

Wrong, since in both cases executors run on worker nodes.

In cluster mode, the driver runs on the edge node, while the client mode runs the driver in a worker node.

Wrong -- in cluster mode, the driver runs on a worker node. In client mode, the driver runs on the client machine.

In client mode, the cluster manager runs on the same host as the driver, while in cluster mode, the cluster manager runs on a separate node.

No. In both modes, the cluster manager is typically on a separate node -- not on the same host as the driver. It only runs on the same host as the driver in local execution mode.

More info: Learning Spark, 2nd Edition, Chapter 1, and Spark: The Definitive Guide, Chapter 15.
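As a small illustration (assuming an active SparkSession named spark), the deploy mode chosen at submit time, for example spark-submit --deploy-mode cluster app.py, can be inspected from within the application:

# "client" when the driver runs on the submitting machine,
# "cluster" when the driver runs on a node inside the cluster
print(spark.sparkContext.getConf().get("spark.submit.deployMode", "client"))

# The cluster manager URL the application is connected to, e.g. "yarn" or "local[*]"
print(spark.sparkContext.master)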


Unlock All Questions for Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Exam

Full Exam Access, Actual Exam Questions, Validated Answers, Anytime Anywhere, No Download Limits, No Practice Limits

Get All 180 Questions & Answers