
Databricks Databricks-Machine-Learning-Associate Exam Actual Questions

The questions for Databricks-Machine-Learning-Associate were last updated on Oct 4, 2024.
Question No. 1

Which statement describes a Spark ML transformer?

Correct Answer: A

In Spark ML, a transformer is an algorithm that can transform one DataFrame into another DataFrame. It takes a DataFrame as input and produces a new DataFrame as output. This transformation can involve adding new columns, modifying existing ones, or applying feature transformations. Examples of transformers in Spark MLlib include feature transformers like StringIndexer, VectorAssembler, and StandardScaler.


Databricks documentation on transformers: Transformers in Spark ML

Question No. 2

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

Correct Answer: C

A pandas API on Spark DataFrame is a Spark DataFrame with additional metadata. The pandas API on Spark aims to provide a pandas-like experience with the scalability and distributed nature of Spark, allowing users to apply familiar pandas operations to large datasets by leveraging Spark's underlying engine.


Databricks documentation on pandas API on Spark: pandas API on Spark

Question No. 3

A data scientist is using the following code block to tune hyperparameters for a machine learning model:

Which change can they make to the above code block to improve the likelihood of a more accurate model?

Correct Answer: A

To improve the likelihood of a more accurate model, the data scientist can increase num_evals to 100. Increasing the number of evaluations allows the hyperparameter tuning process to explore a larger search space and evaluate more combinations of hyperparameters, which increases the chance of finding a more optimal set of hyperparameters for the model.


Databricks documentation on hyperparameter tuning: Hyperparameter Tuning

Question No. 4

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.

Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

Correct Answer: A

To use the pandas API on Spark, the data scientist can run the following code block:

import pyspark.pandas as ps
df = ps.DataFrame(spark_df)

This code imports the pandas API on Spark and converts the Spark DataFrame spark_df into a pandas-on-Spark DataFrame, allowing the data scientist to use familiar pandas functions for further feature engineering.


Databricks documentation on pandas API on Spark: pandas API on Spark

Question No. 5

A machine learning engineer wants to parallelize the inference of group-specific models using the Pandas Function API. They have developed the apply_model function that will look up and load the correct model for each group, and they want to apply it to each group of DataFrame df.

They have written the following incomplete code block:

Which piece of code can be used to fill in the above blank to complete the task?

Correct Answer: A

To parallelize the inference of group-specific models using the Pandas Function API in PySpark, you can use the applyInPandas function. This function allows you to apply a Python function on each group of a DataFrame and return a DataFrame, leveraging the power of pandas UDFs (user-defined functions) for better performance.

prediction_df = (
    df.groupby('device_id')
      .applyInPandas(apply_model, schema=apply_return_schema)
)

In this code:

groupby('device_id'): Groups the DataFrame by the 'device_id' column.

applyInPandas(apply_model, schema=apply_return_schema): Applies the apply_model function to each group and specifies the schema of the return DataFrame.


PySpark Pandas UDFs Documentation
