Automation
pandas-construction-analysis
Comprehensive Pandas
---
slug: "pandas-construction-analysis"
display_name: "Pandas Construction Analysis"
description: "Comprehensive Pandas toolkit for construction data analysis. Filter, group, aggregate BIM elements, calculate quantities, merge datasets, and generate reports from structured construction data."
---
# Pandas Construction Data Analysis
## Overview
Based on DDC methodology (Chapter 2.3), this skill provides comprehensive Pandas operations for construction data processing. Pandas is the Swiss Army knife for data analysts - handling everything from simple data filtering to complex aggregations across millions of rows.
**Book Reference:** "Pandas DataFrame и LLM ChatGPT" / "Pandas DataFrame and LLM ChatGPT"
> "Используя Pandas, вы можете управлять и анализировать наборы данных, намного превосходящие возможности Excel. В то время как Excel способен обрабатывать до 1 миллиона строк данных, Pandas может без труда работать с наборами данных, содержащими десятки миллионов строк."
> — DDC Book, Chapter 2.3
## Quick Start
```python
import pandas as pd
# Read construction data
df = pd.read_excel("bim_export.xlsx")
# Basic operations
print(df.head()) # First 5 rows
print(df.info()) # Column types and memory
print(df.describe()) # Statistics for numeric columns
# Filter structural elements
structural = df[df['Category'] == 'Structural']
# Calculate total volume
total_volume = df['Volume'].sum()
print(f"Total volume: {total_volume:.2f} m³")
```
## DataFrame Fundamentals
### Creating DataFrames
```python
import pandas as pd
# From dictionary (construction elements)
elements = pd.DataFrame({
'ElementId': ['E001', 'E002', 'E003', 'E004'],
'Category': ['Wall', 'Floor', 'Wall', 'Column'],
'Material': ['Concrete', 'Concrete', 'Brick', 'Steel'],
'Volume_m3': [45.5, 120.0, 32.0, 8.5],
'Level': ['Level 1', 'Level 1', 'Level 2', 'Level 1']
})
# From CSV
df_csv = pd.read_csv("construction_data.csv")
# From Excel
df_excel = pd.read_excel("project_data.xlsx", sheet_name="Elements")
# From multiple Excel sheets
all_sheets = pd.read_excel("project.xlsx", sheet_name=None) # Dict of DataFrames
```
### Data Types in Construction
```python
# Common data types for construction
df = pd.DataFrame({
'element_id': pd.Series(['W001', 'W002'], dtype='string'),
'quantity': pd.Series([10, 20], dtype='int64'),
'volume': pd.Series([45.5, 32.0], dtype='float64'),
'is_structural': pd.Series([True, False], dtype='bool'),
'created_date': pd.to_datetime(['2024-01-15', '2024-01-16']),
'category': pd.Categorical(['Wall', 'Slab'])
})
# Check data types
print(df.dtypes)
# Convert types
df['quantity'] = df['quantity'].astype('float64')
df['volume'] = pd.to_numeric(df['volume'], errors='coerce')
```
## Filtering and Selection
### Basic Filtering
```python
# Single condition
walls = df[df['Category'] == 'Wall']
# Multiple conditions (AND)
large_concrete = df[(df['Material'] == 'Concrete') & (df['Volume_m3'] > 50)]
# Multiple conditions (OR)
walls_or_floors = df[(df['Category'] == 'Wall') | (df['Category'] == 'Floor')]
# Using isin for multiple values
structural = df[df['Category'].isin(['Wall', 'Column', 'Beam', 'Foundation'])]
# String contains
insulated = df[df['Description'].str.contains('insulated', case=False, na=False)]
# Null value filtering
incomplete = df[df['Cost'].isna()]
complete = df[df['Cost'].notna()]
```
### Advanced Selection
```python
# Select columns
volumes = df[['ElementId', 'Category', 'Volume_m3']]
# Query syntax (SQL-like)
result = df.query("Category == 'Wall' and Volume_m3 > 30")
# Loc and iloc
specific_row = df.loc[0] # By label
range_rows = df.iloc[0:10] # By position
specific_cell = df.loc[0, 'Volume_m3'] # Row and column
subset = df.loc[0:5, ['Category', 'Volume_m3']] # Range with columns
```
## Grouping and Aggregation
### GroupBy Operations
```python
# Basic groupby
by_category = df.groupby('Category')['Volume_m3'].sum()
# Multiple aggregations
summary = df.groupby('Category').agg({
'Volume_m3': ['sum', 'mean', 'count'],
'Cost': ['sum', 'mean']
})
# Named aggregations (cleaner output)
summary = df.groupby('Category').agg(
total_volume=('Volume_m3', 'sum'),
avg_volume=('Volume_m3', 'mean'),
element_count=('ElementId', 'count'),
total_cost=('Cost', 'sum')
).reset_index()
# Multiple grouping columns
by_level_cat = df.groupby(['Level', 'Category']).agg({
'Volume_m3': 'sum',
'Cost': 'sum'
}).reset_index()
```
### Pivot Tables
```python
# Create pivot table
pivot = pd.pivot_table(
df,
values='Volume_m3',
index='Level',
columns='Category',
aggfunc='sum',
fill_value=0,
margins=True, # Add totals
margins_name='Total'
)
# Multiple values
pivot_detailed = pd.pivot_table(
df,
values=['Volume_m3', 'Cost'],
index='Level',
columns='Category',
aggfunc={'Volume_m3': 'sum', 'Cost': 'mean'}
)
```
## Data Transformation
### Adding Calculated Columns
```python
# Simple calculation
df['Cost_Total'] = df['Volume_m3'] * df['Unit_Price']
# Conditional column
df['Size_Category'] = df['Volume_m3'].apply(
lambda x: 'Large' if x > 50 else ('Medium' if x > 20 else 'Small')
)
# Using np.where for binary conditions
import numpy as np
df['Is_Large'] = np.where(df['Volume_m3'] > 50, True, False)
# Using cut for binning
df['Volume_Bin'] = pd.cut(
df['Volume_m3'],
bins=[0, 10, 50, 100, float('inf')],
labels=['XS', 'S', 'M', 'L']
)
```
### String Operations
```python
# Extract from strings
df['Level_Number'] = df['Level'].str.extract(r'(\d+)').astype(int)
# Split and expand
df[['Building', 'Floor']] = df['Location'].str.split('-', expand=True)
# Clean strings
df['Category'] = df['Category'].str.strip().str.lower().str.title()
# Replace values
df['Material'] = df['Material'].str.replace('Reinforced Concrete', 'RC')
```
### Date Operations
```python
# Parse dates
df['Start_Date'] = pd.to_datetime(df['Start_Date'])
# Extract components
df['Year'] = df['Start_Date'].dt.year
df['Month'] = df['Start_Date'].dt.month
df['Week'] = df['Start_Date'].dt.isocalendar().week
df['DayOfWeek'] = df['Start_Date'].dt.day_name()
# Calculate duration
df['Duration_Days'] = (df['End_Date'] - df['Start_Date']).dt.days
# Filter by date range
recent = df[df['Start_Date'] >= '2024-01-01']
```
## Merging and Joining
### Merge DataFrames
```python
# Elements data
elements = pd.DataFrame({
'ElementId': ['E001', 'E002', 'E003'],
'Category': ['Wall', 'Floor', 'Column'],
'Volume_m3': [45.5, 120.0, 8.5]
})
# Unit prices
prices = pd.DataFrame({
'Category': ['Wall', 'Floor', 'Column', 'Beam'],
'Unit_Price': [150, 80, 450, 200]
})
# Inner join (only matching)
merged = elements.merge(prices, on='Category', how='inner')
# Left join (keep all elements)
merged = elements.merge(prices, on='Category', how='left')
# Join on different column names
result = df1.merge(df2, left_on='elem_id', right_on='ElementId')
```
### Concatenating DataFrames
```python
# Vertical concatenation (stacking)
all_floors = pd.concat([floor1_df, floor2_df, floor3_df], ignore_index=True)
# Horizontal concatenation
combined = pd.concat([quantities, costs, schedule], axis=1)
# Append new rows
new_elements = pd.DataFrame({'ElementId': ['E004'], 'Category': ['Beam']})
df = pd.concat([df, new_elements], ignore_index=True)
```
## Construction-Specific Analyses
### Quantity Take-Off (QTO)
```python
def generate_qto_report(df):
"""Generate Quantity Take-Off summary by category"""
qto = df.groupby(['Category', 'Material']).agg(
count=('ElementId', 'count'),
total_volume=('Volume_m3', 'sum'),
total_area=('Area_m2', 'sum'),
avg_volume=('Volume_m3', 'mean')
).round(2)
# Add percentage column
qto['volume_pct'] = (qto['total_volume'] /
qto['total_volume'].sum() * 100).round(1)
return qto.sort_values('total_volume', ascending=False)
# Usage
qto_report = generate_qto_report(df)
qto_report.to_excel("qto_report.xlsx")
```
### Cost Estimation
```python
def calculate_project_cost(elements_df, prices_df, markup=0.15):
"""Calculate total project cost with markup"""
# Merge with prices
df = elements_df.merge(prices_df, on='Category', how='left')
# Calculate base cost
df['Base_Cost'] = df['Volume_m3'] * df['Unit_Price']
# Apply markup
df['Total_Cost'] = df['Base_Cost'] * (1 + markup)
# Summary by category
summary = df.groupby('Category').agg(
volume=('Volume_m3', 'sum'),
base_cost=('Base_Cost', 'sum'),
total_cost=('Total_Cost', 'sum')
).round(2)
return df, summary, summary['total_cost'].sum()
# Usage
detailed, summary, total = calculate_project_cost(elements, prices)
print(f"Project Total: ${total:,.2f}")
```
### Material Summary
```python
def material_summary(df):
"""Summarize materials across project"""
summary = df.groupby('Material').agg({
'Volume_m3': 'sum',
'Weight_kg': 'sum',
'ElementId': 'nunique'
}).rename(columns={'ElementId': 'Element_Count'})
summary['Volume_Pct'] = (summary['Volume_m3'] /
summary['Volume_m3'].sum() * 100).round(1)
return summary.sort_values('Volume_m3', ascending=False)
```
### Level-by-Level Analysis
```python
def analyze_by_level(df):
"""Analyze construction quantities by building level"""
level_summary = df.pivot_table(
values=['Volume_m3', 'Cost'],
index='Level',
columns='Category',
aggfunc='su
... (truncated)
automation
By
Comments
Sign in to leave a comment