
How to Extract Data from a 1040 Tax Return Automatically

January 15, 2026

Why 1040 Data Extraction Is Hard

Form 1040 is deceptively complex. The base form is two pages, but a complete return often includes multiple schedules—Schedule C for business income, Schedule D for capital gains, Schedule E for rental properties. For mortgage underwriters, lenders, and financial institutions, extracting the right income lines from a 1040 is critical—and doing it manually is both slow and error-prone.

The numbers that matter for income verification (AGI on Line 11, wages on Line 1a, business income from Schedule C) are spread across pages, and their line numbers and layout can shift from year to year as the IRS revises the form. The returns themselves often arrive as scanned PDFs or phone photos.

What You Need to Extract from a 1040

The data required depends on your use case:

  • Mortgage underwriting: AGI (Line 11), wages (Line 1a), business income (Schedule C), rental income (Schedule E), and 2-year income trends
  • Income verification for lending: Total income (Line 9), AGI, and specific income sources
  • Tax planning: All income lines, deductions, tax liability, and refund/owed amount
  • Bookkeeping reconciliation: Matching reported income to client records
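
One way to encode these per-use-case requirements is a small lookup table of field paths. The sketch below is illustrative only: the use-case names and the helper `pick_fields` are hypothetical, and the dotted paths assume the key names shown in the sample JSON output later in this article. Schedule C and Schedule E keys are left out because the sample does not show them.

```python
# Hypothetical mapping from use case to the fields it needs.
# Paths assume the key names in this article's sample JSON output.
FIELDS_BY_USE_CASE = {
    "mortgage_underwriting": ["income.line11_agi", "income.line1a_wages"],
    "income_verification": ["income.line9_total_income", "income.line11_agi"],
}


def pick_fields(data, use_case):
    """Return only the fields a given use case needs, keyed by dotted path."""
    selected = {}
    for path in FIELDS_BY_USE_CASE[use_case]:
        value = data
        # Walk the nested dict one key at a time, e.g. "income" then "line11_agi".
        for key in path.split("."):
            value = value[key]
        selected[path] = value
    return selected


# Trimmed copy of the income block from the sample output.
sample = {
    "income": {
        "line1a_wages": 145000.00,
        "line9_total_income": 161353.00,
        "line11_agi": 158853.00,
    }
}

print(pick_fields(sample, "income_verification"))
```

Keeping the list of paths in data rather than in code means a new use case is a one-line change, and a missing key fails with a `KeyError` instead of silently returning nothing.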

How Automated 1040 Data Extraction Works

1040 Parser uses AI trained specifically on IRS Form 1040 to extract every relevant field automatically:

  1. Upload the tax return PDF (including attached schedules)
  2. System identifies the tax year and form variant (1040, 1040-SR, 1040-NR)
  3. AI extracts all income lines, deductions, payments, and refund information
  4. Attached schedules are parsed separately and included in the output
  5. You receive structured JSON within seconds

Sample JSON Output: Complete 1040 Extraction

{
  "tax_year": 2025,
  "form_variant": "1040",
  "filing_status": "Married Filing Jointly",
  "taxpayer_name": "Michael Chen",
  "taxpayer_ssn_last4": "4421",
  "spouse_name": "Jennifer Chen",
  "spouse_ssn_last4": "8832",
  "income": {
    "line1a_wages": 145000.00,
    "line2b_taxable_interest": 2341.00,
    "line3b_ordinary_dividends": 5812.00,
    "line7_capital_gain_loss": 8200.00,
    "line9_total_income": 161353.00,
    "line11_agi": 158853.00
  },
  "deductions": {
    "line12_standard_or_itemized": 29200.00,
    "line15_taxable_income": 129653.00
  },
  "tax_and_payments": {
    "line24_total_tax": 21847.00,
    "line25a_w2_federal_withholding": 24000.00,
    "line34_overpaid": 2153.00,
    "line35a_refund": 2153.00
  },
  "schedules": {
    "schedule_d_summary": {
      "short_term_gain_loss": -800.00,
      "long_term_gain_loss": 9000.00,
      "net_capital_gain_loss": 8200.00
    }
  }
}
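
Because the output is structured, it can be sanity-checked before it reaches an underwriter. The following is a minimal sketch, not part of the 1040 Parser API: `find_inconsistencies` is a hypothetical helper that assumes the key names in the sample above and applies three simple checks (AGI should not exceed total income, the refund should not exceed the overpayment, and the Schedule D short- and long-term amounts should sum to the net). A real review process would likely need more rules than these.

```python
from math import isclose


def find_inconsistencies(data, tol=0.01):
    """Return a list of problems found in an extracted return (empty if none)."""
    problems = []
    income = data["income"]
    payments = data["tax_and_payments"]

    # AGI is total income minus adjustments, so it should not be larger.
    if income["line11_agi"] > income["line9_total_income"] + tol:
        problems.append("AGI exceeds total income")

    # The refund is the part of the overpayment paid back to the taxpayer.
    if payments["line35a_refund"] > payments["line34_overpaid"] + tol:
        problems.append("refund exceeds overpayment")

    # Schedule D is optional; only check it when it was extracted.
    sched_d = data.get("schedules", {}).get("schedule_d_summary")
    if sched_d is not None:
        expected = sched_d["short_term_gain_loss"] + sched_d["long_term_gain_loss"]
        if not isclose(expected, sched_d["net_capital_gain_loss"], abs_tol=tol):
            problems.append("Schedule D amounts do not sum to the net")

    return problems


# Trimmed copy of the sample output above.
sample = {
    "income": {"line9_total_income": 161353.00, "line11_agi": 158853.00},
    "tax_and_payments": {"line34_overpaid": 2153.00, "line35a_refund": 2153.00},
    "schedules": {
        "schedule_d_summary": {
            "short_term_gain_loss": -800.00,
            "long_term_gain_loss": 9000.00,
            "net_capital_gain_loss": 8200.00,
        }
    },
}

print(find_inconsistencies(sample))  # []
```

A non-empty list does not prove the return is wrong. It only marks the document as worth a second look, for example because a digit was misread on a low-quality scan.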

API Integration

Basic Request

curl -X POST https://1040parser.com/api/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@2025-tax-return.pdf"

Python Integration

import requests

def extract_1040_data(pdf_path, api_key):
    """Upload a 1040 PDF and return the key income fields from the response."""
    with open(pdf_path, 'rb') as f:
        response = requests.post(
            'https://1040parser.com/api/extract',
            headers={'Authorization': f'Bearer {api_key}'},
            files={'file': f},
            timeout=60,  # don't hang indefinitely on a stalled upload
        )

    # Raise on 4xx/5xx responses instead of trying to parse an error body
    response.raise_for_status()

    data = response.json()
    return {
        'tax_year': data['tax_year'],
        'agi': data['income']['line11_agi'],
        'wages': data['income']['line1a_wages'],
        'total_income': data['income']['line9_total_income'],
        'filing_status': data['filing_status']
    }

result = extract_1040_data('client-return-2025.pdf', 'your-api-key')
print("AGI:", result['agi'])

Two-Year Processing for Mortgage Underwriting

Most mortgage guidelines require two years of tax returns. Extract both at once:

curl -X POST https://1040parser.com/api/extract/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "files=@2024-return.pdf" \
  -F "files=@2025-return.pdf"

The response includes both years with consistent JSON structure, making it easy to calculate average income and verify trends.
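
As a rough illustration of the averaging and trend check, the sketch below takes two already-extracted returns and compares one income line. It makes several assumptions: the shape of the batch response is not shown in this article, so the function accepts two single-return dictionaries in the layout of the sample output; `two_year_income_summary` is a hypothetical helper; and the 2024 figure is a made-up placeholder. Actual underwriting guidelines define their own rules for qualifying income.

```python
def two_year_income_summary(prior, current, field="line11_agi"):
    """Average one income line across two consecutive returns and report the trend."""
    if current["tax_year"] != prior["tax_year"] + 1:
        raise ValueError("expected two consecutive tax years")

    prior_amt = prior["income"][field]
    current_amt = current["income"][field]
    change = current_amt - prior_amt

    return {
        "average": (prior_amt + current_amt) / 2,
        "change": change,
        # Percent change is undefined when the prior-year amount is zero.
        "pct_change": None if prior_amt == 0 else round(100 * change / prior_amt, 2),
        "declining": change < 0,
    }


# The 2024 amount is an illustrative placeholder, not real data.
y2024 = {"tax_year": 2024, "income": {"line11_agi": 150000.00}}
y2025 = {"tax_year": 2025, "income": {"line11_agi": 158853.00}}

summary = two_year_income_summary(y2024, y2025)
print(summary["average"])     # 154426.5
print(summary["pct_change"])  # 5.9
print(summary["declining"])   # False
```

Passing `field="line1a_wages"` runs the same comparison on wages, which only works because the key names are the same for both years.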

What Makes a Good 1040 Extraction Tool

  • Multi-year support: Returns from 2020 through the current year should all work
  • Schedule extraction: Base form data alone isn't enough for many use cases
  • Scanned document support: Most returns arrive as photos or scans, not clean PDFs
  • SSN masking: Sensitive identifiers should be masked in output (last 4 only)
  • Consistent field naming: JSON keys should be stable across years
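
The last criterion, stable key names, is easy to test when evaluating a tool: extract returns from two different years and compare the key paths. The sketch below is one way to do that. The helpers `key_paths` and `schema_drift` are hypothetical, and the two input dictionaries use made-up key names purely to show what drift looks like.

```python
def key_paths(obj, prefix=""):
    """Collect every nested dict key as a dotted path, e.g. 'income.line11_agi'."""
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    return paths


def schema_drift(first, second):
    """Report key paths that appear in only one of the two extractions."""
    a, b = key_paths(first), key_paths(second)
    return {"only_in_first": sorted(a - b), "only_in_second": sorted(b - a)}


# Hypothetical outputs where the wages key was renamed between years.
older = {"tax_year": 2021, "income": {"line1_wages": 90000.00}}
newer = {"tax_year": 2025, "income": {"line1a_wages": 145000.00}}

print(schema_drift(older, newer))
```

Differences under a `schedules` key are not necessarily a problem, since a taxpayer may file a schedule one year and not the next. Drift in the base-form keys is the signal to watch for.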

Try It Free

1040 Parser offers 3 free extractions with no credit card required. Upload a real return and see the JSON output before you commit. Paid plans start at $15 for 10 forms, with no subscriptions and credits that never expire.
