Hive Troubleshooting
Learn the causes of the following error messages and how to resolve them:
Disk Space Exceeded Errors
At times, Hive may run out of local disk space while processing a job. When this happens, the job fails with one of the errors described below.
Description
Hive processing at times moves data from S3 into local storage or HDFS storage on the Hadoop cluster nodes. Hive jobs fail when disk space is exhausted on those nodes: either one of the Hadoop nodes runs out of local space, or the job reaches its overall storage limit on HDFS.
In the past, under these conditions, a Treasure Data operator would be alerted to manually kill the job, and support would notify the customer and follow up. We are changing our handling of these situations to be more predictable and automated, and more in line with other job failures.
Under the new behavior, such jobs fail automatically, and the job Output Log contains a diagnostic message about the failure.
If the disk space is full on one Hadoop worker, the output log contains:
Diagnostic Messages for this Task:
Error: Task exceeded the limits:
org.apache.hadoop.mapred.Task$TaskReporter$TaskLimitException:
too much data in local scratch dir=/mnt4/hadoop/yarn/cache/yarn/nm-local-dir/usercache/1/appcache/application_1522879910596_701143.
current size is 322234851654 the limit is 322122547200
If the query exceeds the limit of HDFS storage, the output log contains:
Error: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException:
The DiskSpace quota of /mnt/hive/hive-1/0 is exceeded:
quota = 8246337208320 B = 7.50 TB but diskspace consumed = 8246728346553 B = 7.50 TB
Resolution
To reduce disk usage by Hadoop jobs, limit the amount of input data: apply TD_TIME_RANGE to restrict the time period of any subqueries that scan data, or apply more restrictive conditions to JOINs. See Performance Tuning for Hive for more suggestions.
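For example, here is a minimal sketch of the first approach, borrowing the pj_db.pj_table table and order_id column from the later example; the date range and timezone are illustrative placeholders:
SELECT
  t.order_id,
  COUNT(*)
FROM (
  -- TD_TIME_RANGE limits the scan to a single day, so far less data
  -- is staged in local scratch space or HDFS.
  SELECT order_id
  FROM pj_db.pj_table
  WHERE TD_TIME_RANGE(time, '2024-06-01', '2024-06-02', 'America/Los_Angeles')
) t
GROUP BY
  t.order_id
The same idea applies to JOINs: filter each input with TD_TIME_RANGE or another selective condition before joining, rather than joining the full tables.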
Invalid Table Alias or Column Reference
Sometimes a job fails with the following error message:
FAILED: SemanticException [Error 10004]: Line 3:29 Invalid table alias or column reference 'order_id': (possible column names are: _c0, _c1)
The cause may be a query similar to this:
SELECT
  TD_DATE_TRUNC(
    'day',
    CAST(CONV(SUBSTR(order_id, 0, 8), 16, 10) AS BIGINT),
    'America/Los_Angeles'
  ),
  COUNT(*)
FROM
  pj_db.pj_table
GROUP BY
  1
ORDER BY
  1
The issue is caused by specifying a column number in the ORDER BY clause. Column numbers or table aliases in an ORDER BY clause result in a semantic analysis failure. To resolve the problem, alias the expression in the SELECT list and use that column alias in ORDER BY instead of the column number:
SELECT
  TD_DATE_TRUNC(
    'day',
    CAST(CONV(SUBSTR(order_id, 0, 8), 16, 10) AS BIGINT),
    'America/Los_Angeles'
  ) AS order_id,  -- alias the expression so ORDER BY can reference it
  COUNT(*)
FROM
  pj_db.pj_table
GROUP BY
  1
ORDER BY
  order_id
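Here the TD_DATE_TRUNC expression is given the alias order_id, so the ORDER BY clause resolves against a named output column rather than the anonymous _c0.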