Question No. 1

Problem Scenario 91 : You have been given data in json format as below.

{"first_name":"Ankit", "last_name":"Jain"}

{"first_name":"Amir", "last_name":"Khan"}

{"first_name":"Rajesh", "last_name":"Khanna"}

{"first_name":"Priynka", "last_name":"Chopra"}

{"first_name":"Kareena", "last_name":"Kapoor"}

{"first_name":"Lokesh", "last_name":"Yadav"}

Do the following activities:

1. Create the employee.json file locally.

2. Load this file onto HDFS.

3. Register this data as a temp table in Spark using Python.

4. Write a select query and print this data.

5. Now save this selected data back in JSON format.

A. Solution :
Step 1 : Create the employee.json file locally.
vi employee.json (press Insert, then paste the content)
Step 2 : Upload this file to HDFS (default location).
hadoop fs -put employee.json
val employee = sqlContext.read.json("/user/cloudera/employee.json")
employee.write.parquet("employee.parquet")
val parq_data = sqlContext.read.parquet("employee.parquet")
import org.apache.spark.sql.SaveMode
prdDF.write.format("orc").saveAsTable("product_orc_table")
// Change the codec.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
employee.write.mode(SaveMode.Overwrite).parquet("employee.parquet")

B. Solution :
Step 1 : Create the employee.json file locally.
vi employee.json (press Insert, then paste the content)
Step 2 : Upload this file to HDFS (default location).
hadoop fs -put employee.json
val employee = sqlContext.read.json("/user/cloudera/employee.json")
employee.write.parquet("employee.parquet")
val parq_data = sqlContext.read.parquet("employee.parquet")
parq_data.registerTempTable("employee")
val all_employee = sqlContext.sql("SELECT * FROM employee")
all_employee.show()
import org.apache.spark.sql.SaveMode
prdDF.write.format("orc").saveAsTable("product_orc_table")
// Change the codec.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
employee.write.mode(SaveMode.Overwrite).parquet("employee.parquet")

Correct Answer: B
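
The scenario asks for Python, while the listed solutions use Scala shell syntax and write Parquet. A minimal PySpark sketch of the five steps, saving JSON as step 5 requests, might look like the following. It assumes Spark 2.x or later (SparkSession); the function names and the output path are illustrative and do not come from the original answer.

```python
import json

# Records copied from the problem statement.
EMPLOYEES = [
    {"first_name": "Ankit", "last_name": "Jain"},
    {"first_name": "Amir", "last_name": "Khan"},
    {"first_name": "Rajesh", "last_name": "Khanna"},
    {"first_name": "Priynka", "last_name": "Chopra"},
    {"first_name": "Kareena", "last_name": "Kapoor"},
    {"first_name": "Lokesh", "last_name": "Yadav"},
]


def write_json_lines(records, path):
    """Step 1: write one JSON object per line, the layout spark.read.json reads."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def run_query(input_path, output_path):
    """Steps 3-5: register a temp view, select, print, and save as JSON."""
    # Imported lazily so write_json_lines works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("employee-json").getOrCreate()
    employee = spark.read.json(input_path)
    # On Spark 1.x the equivalent call is employee.registerTempTable("employee").
    employee.createOrReplaceTempView("employee")
    all_employee = spark.sql("SELECT * FROM employee")
    all_employee.show()
    all_employee.write.mode("overwrite").json(output_path)
```

A typical run would call write_json_lines(EMPLOYEES, "employee.json"), upload the file with hadoop fs -put employee.json (step 2), and then call run_query("/user/cloudera/employee.json", "employee_out"), where "employee_out" is a hypothetical output directory.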

Question No. 2

Problem Scenario 93 : You have to run your Spark application locally with 8 threads (that is, locally on 8 cores). Replace XXX with the correct value.

spark-submit --class com.hadoopexam.MyTask XXX \
--deploy-mode cluster $SPARK_HOME/lib/hadoopexam.jar 10

A. Solution
XXX: --master local[8]
Notes : The master URL passed to Spark can be in one of the following formats:
Master URL          Meaning
local               Run Spark locally with one worker thread (i.e. no parallelism at all).
local[K]            Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
The port must be whichever one your master is configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... To submit with --deploy-mode cluster, the HOST:PORT should be configured to connect to the MesosClusterDispatcher.
yarn                Connect to a YARN cluster in client or cluster mode depending on the value of --deploy-mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.

B. Solution
XXX: --master local[8]
Notes : The master URL passed to Spark can be in one of the following formats:
Master URL          Meaning
local               Run Spark locally with one worker thread (i.e. no parallelism at all).
local[K]            Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
local[*]            Run Spark locally with as many worker threads as logical cores on your machine.
spark://HOST:PORT   Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
mesos://HOST:PORT   Connect to the given Mesos cluster. The port must be whichever one your master is configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... To submit with --deploy-mode cluster, the HOST:PORT should be configured to connect to the MesosClusterDispatcher.
yarn                Connect to a YARN cluster in client or cluster mode depending on the value of --deploy-mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.

Correct Answer: B
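
The three local forms of the master URL differ only in the bracketed thread count. As a small illustration, the helper below builds the string passed to --master; the function name is hypothetical and is not part of Spark.

```python
def local_master(threads=None):
    """Return a local master URL: 'local', 'local[K]', or 'local[*]'.

    threads=None  -> a single worker thread
    threads=K     -> K worker threads (K is a positive int)
    threads="*"   -> one worker thread per logical core
    """
    if threads is None:
        return "local"
    if threads == "*":
        return "local[*]"
    if not isinstance(threads, int) or threads < 1:
        raise ValueError("threads must be a positive integer or '*'")
    return f"local[{threads}]"


# The value asked for in this scenario.
print("--master", local_master(8))
```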

Question No. 3

Problem Scenario 94 : You have to run your Spark application on YARN with 20GB of memory for each executor and 50 executors. Please replace XXX, YYY, ZZZ.

export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
--class com.hadoopexam.MyTask \
XXX \
--deploy-mode cluster \ # can be client for client mode
YYY \
ZZZ \
/path/to/hadoopexam.jar \
1000

A. Solution
XXX: --master yarn
YYY: --executor-memory 20G
ZZZ: --num-executors 50

B. Solution
XXX: --master yarn
YYY: --executor-memory 40G
ZZZ: --num-executors 80

Correct Answer: A
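
With the placeholders filled in from option A, the complete command can be assembled as an argument list. This is only a sketch: the function name is illustrative, and HADOOP_CONF_DIR is left out because the question does not give its value.

```python
import shlex


def yarn_submit_args(executor_memory, num_executors, deploy_mode="cluster"):
    """Build the spark-submit argument list for the YARN scenario."""
    return [
        "./bin/spark-submit",
        "--class", "com.hadoopexam.MyTask",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,           # "client" for client mode
        "--executor-memory", executor_memory,    # memory per executor
        "--num-executors", str(num_executors),   # total number of executors
        "/path/to/hadoopexam.jar",
        "1000",                                   # application argument
    ]


args = yarn_submit_args("20G", 50)
# shlex.join (Python 3.8+) renders the list as a shell-safe command line.
print(shlex.join(args))
```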

Question No. 4

Problem Scenario 95 : You have to run your Spark application on YARN with a maximum heap size of 512MB for each executor and 1 processor core allocated to each executor. Your main application requires three values as input arguments: V1 V2 V3.

Please replace XXX, YYY, ZZZ.

./bin/spark-submit --class com.hadoopexam.MyTask --master yarn-cluster --num-executors 3 --driver-memory 512m XXX YYY lib/hadoopexam.jar ZZZ

A. Solution
XXX: --executor-memory 512m
YYY: --executor-cores 1
ZZZ: V1 V2 V3
Notes : spark-submit on YARN options
Option             Description
archives           Comma-separated list of archives to be extracted into the working directory of each executor. The path must be globally visible inside your cluster; see Advanced Dependency Management.
executor-cores     Number of processor cores to allocate on each executor. Alternatively, you can use the spark.executor.cores property.
executor-memory    Maximum heap size to allocate to each executor. Alternatively, you can use the spark.executor.memory property.
num-executors      Total number of YARN containers to allocate for this application. Alternatively, you can use the spark.executor.instances property.
queue              YARN queue to submit to. For more information, see Assigning Applications and Queries to Resource Pools. Default: default.

B. Solution
XXX: --executor-memory 510m
YYY: --executor-cores 1
ZZZ: V2 V6 V1
Notes : spark-submit on YARN options
Option             Description
archives           Comma-separated list of archives to be extracted into the working directory of each executor. The path must be globally visible inside your cluster; see Advanced Dependency Management.
executor-cores     Number of processor cores to allocate on each executor. Alternatively, you can use the spark.executor.cores property.
executor-memory    Maximum heap size to allocate to each executor. Alternatively, you can use the spark.executor.memory property.
num-executors      Total number of YARN containers to allocate for this application.

Correct Answer: A
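
The notes pair each command-line option with an equivalent configuration property. A short sketch of that correspondence, rewriting the flags as --conf arguments, is shown below. The mapping contains only the pairs named in the notes; the function name is illustrative.

```python
# Flag-to-property pairs taken from the notes for this question.
FLAG_TO_PROPERTY = {
    "--executor-cores": "spark.executor.cores",
    "--executor-memory": "spark.executor.memory",
    "--num-executors": "spark.executor.instances",
}


def as_conf_args(flags):
    """Rewrite {flag: value} pairs as equivalent --conf key=value arguments."""
    conf_args = []
    for flag, value in flags.items():
        conf_args += ["--conf", f"{FLAG_TO_PROPERTY[flag]}={value}"]
    return conf_args


# The XXX and YYY values from option A, expressed as properties.
print(as_conf_args({"--executor-memory": "512m", "--executor-cores": 1}))
```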

Question No. 5

Problem Scenario 96 : Your Spark application requires extra Java options as below: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

Please replace the XXX value correctly.

./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false --conf XXX hadoopexam.jar

A. Solution
XXX: "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
Notes: ./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
Here, --conf is used to pass Spark configuration properties that the application needs in order to run, such as a specific property (e.g. executor memory), or to override a default property set in spark-defaults.conf.

B. Solution
XXX: "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
Notes: ./bin/spark-submit \
<application-jar> \
[application-arguments]
Here, --conf is used to pass Spark configuration properties that the application needs in order to run, such as a specific property (e.g. executor memory), or to override a default property set in spark-defaults.conf.

Correct Answer: A
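
The extraJavaOptions value contains a space, so it must reach spark-submit as a single argument, which is why the answer wraps it in quotes. The sketch below builds the command as an argument list and uses Python's shlex module to show that quoting; the variable names are illustrative.

```python
import shlex

java_opts = "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

# Each list element is exactly one argument, regardless of embedded spaces.
args = [
    "./bin/spark-submit",
    "--name", "My app",
    "--master", "local[4]",
    "--conf", "spark.eventLog.enabled=false",
    "--conf", f"spark.executor.extraJavaOptions={java_opts}",
    "hadoopexam.jar",
]

# shlex.join (Python 3.8+) adds the shell quoting needed to keep
# "My app" and the extraJavaOptions value as single arguments.
command = shlex.join(args)
print(command)
```

Splitting the printed command with shlex.split returns the original list, which confirms that both Java options stay inside one --conf value.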

Cloudera CCA175 Exam Actual Questions

The questions for CCA175 were last updated on Oct 2, 2024.
