Hive Troubleshooting
Learn the causes of the following error messages and how to resolve them:
Disk Space Exceeded Errors
At times, Hive may run out of local disk space while processing a job. When this happens, the job fails with one of the errors described below.
Description
Hive processing at times moves data from S3 into local storage or HDFS storage on the Hadoop cluster nodes. Hive jobs fail when disk space is exhausted on those nodes: either one of the Hadoop nodes runs out of local space, or the job reaches its overall storage limit on HDFS.
In the past, under these conditions, a Treasure Data operator would be alerted to manually kill the job, and support would notify the customer and follow up. We are changing our handling of these situations to be more predictable and automated, and more in line with other job failures.
Under the new behavior, such jobs fail automatically, and the job Output Log contains a diagnostic message about the failure.
If the disk space is full on one Hadoop worker, the output log contains:
Diagnostic Messages for this Task:
Error: Task exceeded the limits:
org.apache.hadoop.mapred.Task$TaskReporter$TaskLimitException:
too much data in local scratch dir=/mnt4/hadoop/yarn/cache/yarn/nm-local-dir/usercache/1/appcache/application_1522879910596_701143.
current size is 322234851654 the limit is 322122547200
If the query exceeds the limit of HDFS storage, the output log contains:
Error: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException:
The DiskSpace quota of /mnt/hive/hive-1/0 is exceeded:
quota = 8246337208320 B = 7.50 TB but diskspace consumed = 8246728346553 B = 7.50 TB
Resolution
To reduce disk usage by Hadoop jobs, limit the amount of input data: apply TD_TIME_RANGE to restrict the time period of any subqueries that scan data, or apply more restrictive conditions to JOINs. See Performance Tuning for Hive for more suggestions.
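For example, here is a minimal sketch of the first approach, borrowing the pj_db.pj_table table and order_id column from the later example; the date range and timezone are illustrative placeholders:
SELECT
  t.order_id,
  COUNT(*)
FROM (
  -- TD_TIME_RANGE limits the scan to a single day, so far less data
  -- is staged in local scratch space or HDFS.
  SELECT order_id
  FROM pj_db.pj_table
  WHERE TD_TIME_RANGE(time, '2024-06-01', '2024-06-02', 'America/Los_Angeles')
) t
GROUP BY
  t.order_id
The same idea applies to JOINs: filter each input with TD_TIME_RANGE or another selective condition before joining, rather than joining the full tables.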
Invalid Table Alias or Column Reference
Sometimes a job fails with the following error message:
FAILED: SemanticException [Error 10004]: Line 3:29 Invalid table alias or column reference 'order_id': (possible column names are: _c0, _c1)
The cause may be a query similar to this:
SELECT
  TD_DATE_TRUNC(
    'day',
    CAST(CONV(SUBSTR(order_id, 0, 8), 16, 10) AS BIGINT),
    'America/Los_Angeles'
  ),
  COUNT(*)
FROM
  pj_db.pj_table
GROUP BY
  1
ORDER BY
  1
The issue is caused by specifying a column number in the ORDER BY clause. Column numbers or table aliases in an ORDER BY clause result in a semantic analysis failure. To resolve the problem, alias the expression in the SELECT list and use that column alias in ORDER BY instead of the column number:
SELECT
  TD_DATE_TRUNC(
    'day',
    CAST(CONV(SUBSTR(order_id, 0, 8), 16, 10) AS BIGINT),
    'America/Los_Angeles'
  ) AS order_id,  -- alias the expression so ORDER BY can reference it
  COUNT(*)
FROM
  pj_db.pj_table
GROUP BY
  1
ORDER BY
  order_id
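Here the TD_DATE_TRUNC expression is given the alias order_id, so the ORDER BY clause resolves against a named output column rather than the anonymous _c0.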