How to Upload, Parse, and Extract Emails from PDFs in Next.js v14
Introduction
I recently faced a challenge: I needed a way to extract email addresses from PDFs in my Next.js v14 app while ensuring only authenticated users with an active subscription could access the functionality. After trying several approaches, I settled on a solution that combines Supabase for auth, a secure API route for PDF processing, and pdf2json for parsing the PDF content.
In this post, I'll walk you through how to build a secure PDF email extractor—from setting up the authentication to processing PDFs on the server side.
📚 Prerequisites & Dependencies
Before diving in, make sure you have:
- A Next.js v14 project with App Router enabled
- Supabase configured for authentication with environment variables set up:
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
- The following dependencies installed:
npm install pdf2json uuid @supabase/supabase-js
Setting Up Authentication & Subscription Checks
The first step in our API route is to verify that the user is authenticated and has an active subscription. Here's how we implement these checks:
import { NextResponse } from "next/server";
// Adjust this import to wherever your Supabase server client helper lives.
import { createClient } from "@/utils/supabase/server";

export async function POST(request: Request) {
  // Initialize the Supabase client (server-side)
  const supabase = createClient();
  const {
    data: { session },
  } = await supabase.auth.getSession();

  // Check for a valid session
  if (!session) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  // Verify subscription status
  const subscription = await getUserSubscription();
  if (!subscription?.isActive) {
    return NextResponse.json({ error: "Subscription required" }, { status: 403 });
  }
This code ensures that only authenticated users with active subscriptions can access our PDF processing functionality. If either check fails, we return an appropriate error response.
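The route relies on a `getUserSubscription` helper that isn't shown in the excerpt. Here's a minimal sketch of what it might look like, assuming a hypothetical `subscriptions` table with `status` and `current_period_end` columns (adapt the names to your actual schema):

```typescript
// Hypothetical shape of a subscription row — adjust to your schema.
interface SubscriptionRow {
  status: string;             // e.g. "active", "canceled"
  current_period_end: string; // ISO timestamp for the end of the billing period
}

// Pure check: a subscription is active if its status says so
// and the current billing period hasn't ended yet.
export function isSubscriptionActive(row: SubscriptionRow | null): boolean {
  if (!row) return false;
  return (
    row.status === "active" &&
    new Date(row.current_period_end).getTime() > Date.now()
  );
}

// The route-side helper could then be as simple as (sketch, not from
// the original post):
//
// const { data } = await supabase
//   .from("subscriptions")
//   .select("status, current_period_end")
//   .eq("user_id", session.user.id)
//   .single();
// return { isActive: isSubscriptionActive(data) };
```

Keeping the date/status logic in a pure function makes it trivial to unit-test without touching the database.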
Handling File Upload & Validation
Once we've verified the user's access, we need to handle and validate the uploaded PDF file. We'll check both the file type and size:
const formData = await request.formData();
const file = formData.get("pdf");

if (!file || typeof file === "string") {
  return NextResponse.json({ error: "No file provided" }, { status: 400 });
}

if (file.type !== "application/pdf") {
  return NextResponse.json({ error: "Only PDF files are allowed" }, { status: 400 });
}

// MAX_FILE_SIZE is a constant defined elsewhere in the route, e.g. 5 MB.
if (file.size > MAX_FILE_SIZE) {
  return NextResponse.json({ error: "File size exceeds limit" }, { status: 400 });
}
This validation ensures we're only processing appropriate PDF files and helps prevent potential security issues or resource exhaustion.
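The `MAX_FILE_SIZE` constant isn't defined in the route excerpt. One way to make these checks testable is to factor them into a small pure helper — the 5 MB limit below is an example value, not from the original:

```typescript
// Example limit — tune to your needs.
export const MAX_FILE_SIZE = 5 * 1024 * 1024; // 5 MB

// Returns a human-readable error message, or null if the file passes.
export function validatePdfUpload(
  file: { type: string; size: number } | null
): string | null {
  if (!file) return "No file provided";
  if (file.type !== "application/pdf") return "Only PDF files are allowed";
  if (file.size > MAX_FILE_SIZE) return "File size exceeds limit";
  return null;
}
```

The route can then map a non-null result straight to a 400 response, keeping the handler short.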
Processing the PDF
After validation, we need to temporarily save the file and process it. We use uuid to generate unique filenames and pdf2json to extract the text content:
// Assumes these imports at the top of the route file:
//   import { v4 as uuidv4 } from "uuid";
//   import { promises as fs } from "fs";
//   import PDFParser from "pdf2json";
const fileName = uuidv4();
const tempFilePath = `/tmp/${fileName}.pdf`;

const fileBuffer = Buffer.from(await file.arrayBuffer());
await fs.writeFile(tempFilePath, fileBuffer);

// The second constructor argument (1) tells pdf2json to keep the
// raw text content that getRawTextContent() returns.
const pdfParser = new (PDFParser as any)(null, 1);

const pdfData = await new Promise((resolve, reject) => {
  pdfParser.on("pdfParser_dataError", reject);
  pdfParser.on("pdfParser_dataReady", () => {
    resolve(pdfParser.getRawTextContent());
  });
  pdfParser.loadPDF(tempFilePath);
});
Notice how we use event listeners to handle both successful parsing and potential errors. This ensures we can properly respond to any issues that might arise during PDF processing.
Extracting Email Addresses
Once we have the raw text content, we can extract email addresses using a regular expression. We also make sure to remove any duplicates:
const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const matches = (pdfData as string).match(emailRegex) || [];
const uniqueEmails = Array.from(new Set(matches));

await fs.unlink(tempFilePath); // Clean up the temp file on success
return NextResponse.json({ emails: uniqueEmails });
The regex pattern matches standard email formats, and using a Set ensures we don't return duplicate addresses.
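The extraction step is easy to pull into a pure function and unit-test in isolation. This sketch uses a case-insensitive character class for the top-level domain so that addresses like `Info@Example.ORG` are caught too:

```typescript
// Matches common email address shapes, including uppercase TLDs.
const EMAIL_REGEX = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

// Returns the unique email addresses found in a block of text,
// preserving first-seen order.
export function extractEmails(text: string): string[] {
  const matches = text.match(EMAIL_REGEX) || [];
  return Array.from(new Set(matches));
}
```

Because the function takes a plain string, you can test it against sample text without involving PDF parsing at all.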
Error Handling & Cleanup
It's crucial to clean up temporary files, even if an error occurs during processing. Here's how we handle errors:
try {
  // PDF processing code here
} catch (error) {
  // Clean up even on failure; ignore errors in case the file
  // was never written.
  await fs.unlink(tempFilePath).catch(() => {});
  return NextResponse.json({ error: "Error parsing PDF" }, { status: 500 });
}
Together with the unlink on the success path, this catch block ensures we don't leave any temporary files on the server, whether the processing succeeds or fails.
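If you'd rather not repeat the unlink call on both paths, the cleanup can live in a small helper that never throws. This is a sketch using the synchronous `fs` API for brevity (the route itself uses `fs/promises`):

```typescript
import * as fs from "node:fs";

// Removes a temp file if it exists; never throws, so it is safe to
// call from a catch (or finally) block without masking the original
// error — e.g. when the error occurred before the file was written.
export function cleanupTempFile(path: string): void {
  try {
    fs.unlinkSync(path);
  } catch {
    // File may never have been created — nothing to clean up.
  }
}
```

Calling it from a `finally` block covers both the success and failure paths in one place.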
Implementing the Frontend
While the backend handles the heavy lifting, we need a user-friendly way to upload PDFs. Here's a simple upload component using shadcn/ui:
"use client"

import { useRef } from "react"
import { Upload } from "lucide-react"
import { toast } from "sonner" // or whichever toast library you use
import { Button } from "@/components/ui/button"

export function UploadButton() {
  const inputRef = useRef<HTMLInputElement>(null);

  const handleUpload = async (event: React.ChangeEvent<HTMLInputElement>) => {
    const file = event.target.files?.[0];
    if (!file) return;

    const formData = new FormData();
    formData.append("pdf", file);

    try {
      const response = await fetch("/api/upload-pdf", {
        method: "POST",
        body: formData,
      });
      const data = await response.json();
      if (response.ok && data.emails) {
        toast.success(`Found ${data.emails.length} email addresses!`);
      } else {
        toast.error(data.error ?? "Error processing PDF");
      }
    } catch (error) {
      toast.error("Error processing PDF");
    }
  };

  return (
    <>
      {/* Clicking the button forwards the click to the hidden file input */}
      <Button
        variant="outline"
        size="sm"
        onClick={() => inputRef.current?.click()}
      >
        <Upload className="mr-2 h-4 w-4" />
        Upload PDF
      </Button>
      <input
        ref={inputRef}
        type="file"
        accept=".pdf"
        className="hidden"
        onChange={handleUpload}
      />
    </>
  );
}
Wrapping Up
This solution provides a secure and efficient way to extract emails from PDFs in a Next.js application. By combining Supabase authentication, server-side PDF processing, and proper error handling, we've created a robust system that:
- Only allows authenticated users with active subscriptions to access the functionality
- Safely handles file uploads and processing
- Properly cleans up temporary files
- Provides a smooth user experience
This solution is a solid starting point for production use and can be extended to handle additional use cases, such as processing multiple PDFs in parallel or extracting other kinds of data.
I hope you found this guide helpful! If you have any questions or suggestions, feel free to reach out. Happy coding! 🚀