Term project materials from CEE 5-7303 Applied AI for Engineering and Science
The Space team has made the following datasets and collections publicly available. You must be a logged-in member of the Space to access all the datasets and collections.
This dataset was derived from the U.S. Census Bureau’s American Community Survey (ACS) 1-Year Estimates, Table B25134 (2024). It contains detailed information on household water and sewer costs for Texas cities and counties, including the number of occupied housing units reporting charges within specific monthly cost ranges (<$125, $125–$249, $250–$499, $500–$749, $750–$999, and $1,000+). Each estimate is paired with a corresponding margin of error to indicate statistical reliability.
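Every bracket count has its own margin of error, so any combined figure also needs a combined MOE. One example is the share of units paying $500 or more per month. The sketch below assumes the Census Bureau's published approximations for ACS derived estimates. The MOE of a sum is the square root of the sum of squared MOEs. The MOE of a proportion uses the derived-proportion formula, and falls back to the ratio formula when the term under the square root is negative. The bracket labels, estimates, and MOEs are hypothetical placeholders, not values from Table B25134. The helper names are illustrative only.

```python
import math


def moe_of_sum(moes):
    """Approximate MOE of a sum of ACS estimates: square root of the sum of squared MOEs."""
    return math.sqrt(sum(m * m for m in moes))


def proportion_with_moe(num, num_moe, den, den_moe):
    """Return (p, MOE of p) for p = num / den, where num is a subset of den.

    Uses the ACS derived-proportion approximation; if the quantity under the
    square root is negative, falls back to the ratio formula (plus sign).
    """
    p = num / den
    under_root = num_moe ** 2 - p ** 2 * den_moe ** 2
    if under_root < 0:
        under_root = num_moe ** 2 + p ** 2 * den_moe ** 2
    return p, math.sqrt(under_root) / den


# Hypothetical bracket estimates for one place: label -> (estimate, MOE).
brackets = {
    "<$125": (4200, 310),
    "$125-$249": (2600, 250),
    "$250-$499": (900, 140),
    "$500-$749": (150, 60),
    "$750-$999": (40, 30),
    "$1,000+": (30, 25),
}

high_cost = ["$500-$749", "$750-$999", "$1,000+"]
high_est = sum(brackets[k][0] for k in high_cost)
high_moe = moe_of_sum(brackets[k][1] for k in high_cost)

# Assumption: the total is the sum of the listed brackets. If the table
# publishes its own total (with its own MOE), prefer that instead.
total_est = sum(est for est, _ in brackets.values())
total_moe = moe_of_sum(moe for _, moe in brackets.values())

share, share_moe = proportion_with_moe(high_est, high_moe, total_est, total_moe)
print(f"Share paying $500+/month: {share:.1%} +/- {share_moe:.1%}")
```

Summing the brackets to get the total is a simplification. If the table reports its own total and MOE, those are the better inputs for the denominator.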
I am using this dataset to quantify the environmental and operational cost dimension of my Datacenter Location Planning (Texas) project. Water availability and cost are critical factors influencing datacenter cooling efficiency, sustainability, and long-term feasibility. By integrating this dataset with energy, broadband, and population data, my machine learning model can assess regional water-cost exposure and identify areas where datacenter operations would be both cost-effective and environmentally sustainable.
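Integrating several sources usually comes down to a join on a shared geographic key. This is a minimal sketch. It assumes each source has already been reduced to one row per place, keyed by a 7-character place GEOID (state FIPS "48" plus a 5-digit place code). The GEOID is stored as a string so that leading zeros in the place code survive. The GEOIDs, feature names, and values are hypothetical, and `join_on_geoid` is an illustrative helper, not part of any library.

```python
def join_on_geoid(*tables):
    """Inner-join tables shaped as {geoid: {feature_name: value}}.

    Returns (merged, unmatched). merged maps each GEOID present in every
    table to the union of its features. unmatched lists, per input table,
    the GEOIDs that were dropped because another table lacked them.
    """
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)

    merged = {}
    for geoid in sorted(common):
        row = {}
        for table in tables:
            row.update(table[geoid])
        merged[geoid] = row

    unmatched = [sorted(set(table) - common) for table in tables]
    return merged, unmatched


# Hypothetical 7-character place GEOIDs (state FIPS "48" + 5-digit place code)
# and made-up feature values, for illustration only.
water = {"4800100": {"water_high_cost_share": 0.03},
         "4800200": {"water_high_cost_share": 0.08}}
broadband = {"4800100": {"fiber_share": 0.62},
             "4800300": {"fiber_share": 0.41}}

merged, unmatched = join_on_geoid(water, broadband)
print(merged)     # only the place present in both tables survives
print(unmatched)  # GEOIDs that would silently disappear without this report
```

The function reports the unmatched GEOIDs instead of dropping them silently. This shows when one source covers places that the others do not.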
This data was obtained from the Texas Comptroller of Public Accounts and lists businesses participating in the State Sales and Use Tax Exemption Program for Qualifying Datacenters. It includes each facility’s name, effective date of exemption, owner, operator, and occupant registration details, as well as the exemption end date.
My original intention was to use this dataset to represent the economic and policy incentive dimension of my Datacenter Location Planning (Texas) project. The exemption program provides valuable context for understanding where fiscal incentives have encouraged datacenter development across Texas, and it can reveal how state-level tax policy aligns with broader infrastructure and environmental factors.
However, I ultimately did not incorporate this dataset into my modeling workflow. The exemption records are facility-specific rather than city-level, and they do not contain standardized geographic identifiers that would allow reliable merging with the rest of my integrated dataset. Additionally, the program applies only to a small subset of large, established datacenters, which introduces a strong selection bias and does not reflect the broader siting conditions faced by most Texas municipalities. For these reasons, the dataset serves better as a contextual reference rather than as a quantitative input to the model.
This dataset was obtained from the Federal Communications Commission (FCC) Broadband Data Collection (BDC) program, specifically the Fixed Broadband Summary by Geography (Place) release for October 2025. It summarizes broadband service availability across Texas cities and towns (state FIPS code 48) and includes metrics such as total broadband-serviceable locations, technology types (fiber, cable, DSL, fixed wireless), and the number of connections meeting speed tiers of 100/20 Mbps, 250/25 Mbps, and 1000/100 Mbps.
I am using this dataset to measure connectivity strength and fiber infrastructure readiness across different regions of Texas. These attributes serve as key input features in my Datacenter Location Planning (Texas) machine learning model, which integrates geospatial, environmental, and economic data to identify optimal datacenter development sites. By analyzing regional broadband capacity and technology distribution, this dataset helps evaluate network reliability, redundancy, and digital infrastructure maturity.
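Raw location counts scale with a place's size, so shares are easier to compare across cities. This sketch assumes the summary has already been reduced to per-place counts. The field names in `PlaceBroadband` are hypothetical and do not match the actual BDC column headers. The technology count is only a crude stand-in for redundancy. A single location can presumably be served by more than one technology, so the per-technology counts are not expected to sum to the total.

```python
from dataclasses import dataclass


@dataclass
class PlaceBroadband:
    """Location counts for one place; field names are hypothetical, not BDC column names."""
    total_locations: int
    fiber: int
    cable: int
    dsl: int
    fixed_wireless: int
    served_100_20: int
    served_1000_100: int


def broadband_features(p: PlaceBroadband) -> dict:
    """Turn raw location counts into per-place shares in [0, 1]."""
    if p.total_locations <= 0:
        raise ValueError("total_locations must be positive")
    n = p.total_locations
    tech_counts = [p.fiber, p.cable, p.dsl, p.fixed_wireless]
    return {
        "fiber_share": p.fiber / n,
        "share_100_20": p.served_100_20 / n,
        "share_1000_100": p.served_1000_100 / n,
        # Crude redundancy proxy: how many technology types reach any location.
        "n_technologies": sum(1 for c in tech_counts if c > 0),
    }


# Hypothetical counts for one place.
example = PlaceBroadband(total_locations=1000, fiber=600, cable=700, dsl=200,
                         fixed_wireless=0, served_100_20=900, served_1000_100=550)
print(broadband_features(example))
```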
Source: https://broadbandmap.fcc.gov/data-download/nationwide-data
This dataset contains information from the NCTCOG Transit Accessibility Tool and the 2024 Bond Infrastructure Priority Grid.
All points in this dataset are located within the City of Dallas.
The attributes in this dataset are:
% Population 65 and over
% Population 14 and under
% Population below poverty
% Population with disability
% Veteran population
% No car population
% Total minority population
% Hispanic population
% Black population
% American Indian population
% Asian population
% Hawaiian Pacific Islander population
% Other population
% 2 or more races population
% Total limited English proficiency population
% Spanish-speaking limited English proficiency population
% Indo-European (IE) language limited English proficiency population
% Asian-language limited English proficiency population
% Other-language limited English proficiency population
Total Overlay Score (bond allocation priority score, maximum of 16)
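The Total Overlay Score is capped at 16, so dividing by that cap puts every point on a 0 to 1 scale. This makes it straightforward to rank points or combine the score with the percentage attributes. This is a minimal sketch. The point IDs and scores are hypothetical, and it assumes the score is never negative.

```python
MAX_OVERLAY_SCORE = 16  # upper bound stated in the dataset description


def rank_by_overlay(points, top_k=3):
    """Return the top_k points as (point_id, score, score / 16), highest score first.

    Assumes scores are non-negative; raises ValueError if any score falls
    outside [0, MAX_OVERLAY_SCORE]. Ties are broken by point_id for stability.
    """
    for pid, score in points.items():
        if not 0 <= score <= MAX_OVERLAY_SCORE:
            raise ValueError(f"{pid}: overlay score {score} outside [0, {MAX_OVERLAY_SCORE}]")
    ranked = sorted(points.items(), key=lambda item: (-item[1], item[0]))
    return [(pid, score, score / MAX_OVERLAY_SCORE) for pid, score in ranked[:top_k]]


# Hypothetical point IDs and scores.
scores = {"pt-01": 12, "pt-02": 16, "pt-03": 7, "pt-04": 12}
for row in rank_by_overlay(scores):
    print(row)
```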
This is the PDF of the final project memo for the in-person grad team's project on healthcare infrastructure in Dallas, TX.
This dataset includes all the CSV spreadsheet files that the in-person grad team used to investigate existing healthcare infrastructure in Dallas, TX.
Attributes: Clinics (name and location), Hospitals (name and location), Urgent Cares (name and location), census block data (population, size, and location)
Credit: the Clinics, Hospitals, and Urgent Cares data come from John's database; the census data comes from the U.S. Census Bureau.
Note: the flat files uploaded to Clowder are named "Copy of Spatialjoin"; "Spatialjoin" is an earlier version.
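One common way to summarize healthcare access from files like these is the distance from each census block to its nearest facility, weighted by block population. The sketch below uses the haversine great-circle distance and a brute-force nearest search, which should be adequate for city-scale inputs. It illustrates the idea under those assumptions and is not the team's Spatialjoin workflow. The coordinates, populations, and helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical-Earth approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def population_weighted_access(blocks, facilities):
    """Population-weighted mean distance (km) from census blocks to their nearest facility.

    blocks: iterable of (population, lat, lon); facilities: non-empty list of (lat, lon).
    """
    total_pop = 0
    weighted_sum = 0.0
    for pop, lat, lon in blocks:
        # Brute-force nearest neighbour over all facilities.
        nearest_km = min(haversine_km(lat, lon, f_lat, f_lon) for f_lat, f_lon in facilities)
        weighted_sum += pop * nearest_km
        total_pop += pop
    if total_pop == 0:
        raise ValueError("total block population must be positive")
    return weighted_sum / total_pop


# Hypothetical block centroids (population, lat, lon) and clinic locations.
blocks = [(1200, 32.78, -96.80), (800, 32.75, -96.83), (450, 32.81, -96.77)]
clinics = [(32.79, -96.79), (32.74, -96.82)]
print(f"{population_weighted_access(blocks, clinics):.2f} km")
```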
This file is our group's final presentation.
School enrollment within three boundaries (Victory Meadows, Bishop Arts, and Deep Ellum) from 2010 to 2020.
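One way to compare enrollment trends across the three boundaries is the compound annual growth rate, CAGR = (E_2020 / E_2010)^(1/10) - 1. It reduces each 2010 to 2020 series to a single yearly rate. The counts below are made up for illustration. Because only the endpoints are used, year-to-year swings inside the period are ignored.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two positive values `years` apart."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical enrollment counts per boundary: (2010, 2020).
enrollment = {
    "Victory Meadows": (1500, 2100),
    "Bishop Arts": (2400, 2300),
    "Deep Ellum": (900, 1350),
}
for boundary, (e2010, e2020) in enrollment.items():
    print(f"{boundary}: {cagr(e2010, e2020, 2020 - 2010):+.2%} per year")
```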
There are no public collections associated with this Space.
| Statistic | Value |
|---|---|
| Collections | 11 |
| Datasets | 30 |
| Files | 60 |
| Total size | 2.6 GB |
| Users | 27 |