Prepare for the Google Cloud Certified Professional Data Engineer exam with our extensive collection of questions and answers. These practice Q&A are updated according to the latest syllabus, providing you with the tools needed to review and test your knowledge.
QA4Exam focus on the latest syllabus and exam objectives, our practice Q&A are designed to help you identify key topics and solidify your understanding. By focusing on the core curriculum, These Questions & Answers helps you cover all the essential topics, ensuring you're well-prepared for every section of the exam. Each question comes with a detailed explanation, offering valuable insights and helping you to learn from your mistakes. Whether you're looking to assess your progress or dive deeper into complex topics, our updated Q&A will provide the support you need to confidently approach the Google Professional-Data-Engineer exam and achieve success.
You have one BigQuery dataset which includes customers' street addresses. You want to retrieve all occurrences of street addresses from the dataset. What should you do?
To retrieve all occurrences of street addresses from a BigQuery dataset, the most effective and comprehensive method is to use Cloud Data Loss Prevention (DLP). Here's why option A is the best choice:
Cloud Data Loss Prevention (DLP):
Cloud DLP is designed to discover, classify, and protect sensitive information. It includes pre-defined infoTypes for various kinds of sensitive data, including street addresses.
Using Cloud DLP ensures thorough and accurate detection of street addresses based on advanced pattern recognition and contextual analysis.
Deep Inspection Job:
A deep inspection job allows you to scan entire tables for sensitive information.
By creating an inspection template that includes the STREET_ADDRESS infoType, you can ensure that all instances of street addresses are detected across your dataset.
Scalability and Accuracy:
Cloud DLP is scalable and can handle large datasets efficiently.
It provides a high level of accuracy in identifying sensitive data, reducing the risk of missing any occurrences.
Steps to Implement:
Set Up Cloud DLP:
Enable the Cloud DLP API in your Google Cloud project.
Create an Inspection Template:
Create an inspection template in Cloud DLP that includes the STREET_ADDRESS infoType.
Run Deep Inspection Jobs:
Create and run a deep inspection job for each table in your dataset using the inspection template.
Review the inspection job results to retrieve all occurrences of street addresses.
Cloud DLP Documentation
Creating Inspection Jobs
You are deploying a new storage system for your mobile application, which is a media streaming service. You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity 'Movie' the property 'actors' and the property 'tags' have multiple values but the property 'date released' does not. A typical query would ask for all movies with actor=
Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?
One of your encryption keys stored in Cloud Key Management Service (Cloud KMS) was exposed. You need to re-encrypt all of your CMEK-protected Cloud Storage data that used that key. and then delete the compromised key. You also want to reduce the risk of objects getting written without customer-managed encryption key (CMEK protection in the future. What should you do?
To re-encrypt all of your CMEK-protected Cloud Storage data after a key has been exposed, and to ensure future writes are protected with a new key, creating a new Cloud KMS key and a new Cloud Storage bucket is the best approach. Here's why option C is the best choice:
Re-encryption of Data:
By creating a new Cloud Storage bucket and copying all objects from the old bucket to the new bucket while specifying the new Cloud KMS key, you ensure that all data is re-encrypted with the new key.
This process effectively re-encrypts the data, removing any dependency on the compromised key.
Ensuring CMEK Protection:
Creating a new bucket and setting the new CMEK as the default ensures that all future objects written to the bucket are automatically protected with the new key.
This reduces the risk of objects being written without CMEK protection.
Deletion of Compromised Key:
Once the data has been copied and re-encrypted, the old key can be safely deleted from Cloud KMS, eliminating the risk associated with the compromised key.
Steps to Implement:
Create a New Cloud KMS Key:
Create a new encryption key in Cloud KMS to replace the compromised key.
Create a New Cloud Storage Bucket:
Create a new Cloud Storage bucket and set the default CMEK to the new key.
Copy and Re-encrypt Data:
Use the gsutil tool to copy data from the old bucket to the new bucket while specifying the new CMEK key:
gsutil -o 'GSUtil:gs_json_api_version=2' cp -r gs://old-bucket/* gs://new-bucket/
Delete the Old Key:
After ensuring all data is copied and re-encrypted, delete the compromised key from Cloud KMS.
Cloud KMS Documentation
Cloud Storage Encryption
Re-encrypting Data in Cloud Storage
Your organization has been collecting and analyzing data in Google BigQuery for 6 months. The majority of the data analyzed is placed in a time-partitioned table named events_partitioned. To reduce the cost of queries, your organization created a view called events, which queries only the last 14 days of dat
a. The view is described in legacy SQL. Next month, existing applications will be connecting to BigQuery to read the events data via an ODBC connection. You need to ensure the applications can connect. Which two actions should you take? (Choose two.)
Full Exam Access, Actual Exam Questions, Validated Answers, Anytime Anywhere, No Download Limits, No Practice Limits
Get All 375 Questions & Answers