My Q-learning optimization is a little bit weird

2 views (last 30 days)
찬목 on 25 Aug 2025
Edited: Cris LaPierre on 25 Aug 2025
Hi, this is my optimization solution using Q-learning to optimize ESS charging and discharging.
The objective is to reduce the total cost of electricity use; to do that, I try to charge the ESS during the cheapest hours and discharge it during the most expensive ones.
The action is the charging/discharging rate, which is applied to the SOC (state of charge), and there are constraints so that the ESS SOC cannot drop below certain thresholds.
Everything else works, but I think the update process is wrong, because after charging or discharging the ESS, the SOC is not updated properly.
Can you help me with this problem?
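The update I intend is roughly the following (a minimal sketch reusing the variable names from my code, not the code itself): clamp the SOC to its feasible band first, then recompute the applied power from the SOC change that actually happened, so the stored SOC and the grid power can never disagree.

% Sketch: clamp SOC first, then back out the power that matches the clamp
SOC_lo = max(SOC_min, p_crt(t));                % effective lower bound at time t
if a_kW >= 0
    SOC_next = SOC + (a_kW / ESS_cap) * eff_cha;
else
    SOC_next = SOC + (a_kW / ESS_cap) / eff_dch;
end
SOC_next = min(max(SOC_next, SOC_lo), SOC_max); % keep SOC inside the feasible band
dSOC = SOC_next - SOC;                          % SOC change that actually happened
if dSOC >= 0
    a_kW = dSOC * ESS_cap / eff_cha;            % power consistent with the clamped SOC
else
    a_kW = dSOC * ESS_cap * eff_dch;
end

In my code below, the clamp instead sets a_kW = 0 while still moving the SOC to the boundary, so the two can get out of sync.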
clc; clear;
%% ===== Environment parameters =====
T = 48;                          % number of time steps (hours)
eff_cha = 0.95;                  % charging efficiency
eff_dch = 0.95;                  % discharging efficiency
SOC_min = 0.1;
SOC_max = 1;
SOC0 = 0.5;                      % initial SOC
P_ess_max = 1500;                % ESS max charge/discharge power (kW)
ESS_cap = 3000;                  % ESS capacity (kWh)
actions = linspace(-0.5,0.5,41); % rate as a fraction of P_ess_max, in [-0.5, 0.5]
numActions = length(actions);
%% ===== Learning parameters =====
alpha = 0.1;
gamma = 0.99;          % not used directly here (Monte Carlo-style update)
epsilon = 0.5;
epsilon_min = 0.05;
epsilon_decay = 0.995;
numEpisodes = 60000;
%% ===== State space (discretized) =====
numSOCs = 101;
numPrices = 3;
Q = zeros(numSOCs, numPrices, T, numActions);
%% ===== Price / load data =====
price_real = 140.5*ones(1,24);   % peak price by default
price_real(1:7) = 87.3;          % off-peak hours
price_real(22:24) = 87.3;
price_real(8:10) = 109.8;        % mid-peak hours
price_real(12) = 109.8;
price_real(18:21) = 109.8;
price_real = [price_real, price_real]; % repeat for 48 hours
load_real = table2array(readtable('48_consumption_6.1.xlsx'));
pv_real = table2array(readtable("PV_gen.xlsx"));
load_real = load_real - pv_real; % net load (kW)
%% ===== SOC threshold (critical load) =====
p_crt_val = 0.03;
for p = 1:24
    p_crt(p) = p_crt_val*(24-p); % linearly decreasing threshold
end
p_crt = [p_crt, p_crt] + 0.03*randn(1,48); % repeat for 48 h, add noise
%% ===== Normalization for state discretization =====
price_norm = price_real / max(price_real);
discretizeState = @(x) min(max(floor(x * numPrices) + 1, 1), numPrices);
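% Example: with numPrices = 3, x in [0, 1/3) -> bin 1, [1/3, 2/3) -> bin 2,
% and [2/3, 1] -> bin 3 (the min/max clamp keeps x = 1 inside the last bin).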
%% ===== Monte Carlo learning loop =====
saving_history = NaN(1,numEpisodes);    % savings of completed episodes only
completion_rate = zeros(1,numEpisodes); % completion rate per episode
for ep = 1:numEpisodes
    SOC = SOC0;          % initial SOC fraction
    episode_memory = []; % rows: [SOC_idx, price_idx, time, action_idx]
    grid_before_ep = zeros(1,T);
    grid_after_ep = zeros(1,T);
    done_flag = true;    % completion flag (never set to false below)
    for t = 1:T
        % current state: [SOC bin, price bin, time index]
        s_idx = [discretizeState(SOC), discretizeState(price_norm(t)), t];
        % ε-greedy action selection
        if rand < epsilon
            a_idx = randi(numActions);
        else
            [~, a_idx] = max(Q(s_idx(1), s_idx(2), s_idx(3), :));
        end
        a_kW = actions(a_idx) * P_ess_max;
        % SOC update
        if a_kW >= 0
            SOC_next = SOC + (a_kW / ESS_cap) * eff_cha;
        else
            SOC_next = SOC + (a_kW / ESS_cap) / eff_dch;
        end
        % clamp when a hard constraint is violated
        if SOC_next > SOC_max
            SOC_next = SOC_max; a_kW = 0;
        elseif SOC_next < SOC_min
            SOC_next = SOC_min; a_kW = 0;
        elseif SOC_next < p_crt(t)
            SOC_next = p_crt(t); a_kW = 0;
        end
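        % (Suspect area: a_kW is zeroed but SOC_next is still moved to the
        % boundary, so the stored SOC and the grid power no longer agree.)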
        % record grid power
        grid_before_ep(t) = load_real(t);
        grid_after_ep(t) = load_real(t) + a_kW;
        % record state/action
        episode_memory(end+1,:) = [s_idx, a_idx];
        SOC = SOC_next;
    end
    % Q update & logging only for completed episodes
    if done_flag && length(episode_memory) == T
        cost_before_ep = sum(grid_before_ep .* price_real);
        cost_after_ep = sum(grid_after_ep .* price_real);
        saving_ep = cost_before_ep - cost_after_ep;
        saving_history(ep) = saving_ep;
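        % Monte Carlo-style update: every (state, action) pair visited in
        % this episode is pulled toward the same episode-level saving; there
        % is no per-step reward or discounting (hence gamma is unused).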
        for step = 1:size(episode_memory,1)
            s_idx = episode_memory(step,1:3);
            a_idx = episode_memory(step,4);
            Q(s_idx(1), s_idx(2), s_idx(3), a_idx) = ...
                Q(s_idx(1), s_idx(2), s_idx(3), a_idx) + ...
                alpha * (saving_ep - Q(s_idx(1), s_idx(2), s_idx(3), a_idx));
        end
    end
    % ε decay
    if epsilon > epsilon_min
        epsilon = epsilon * epsilon_decay;
    end
    % completion-rate bookkeeping
    completion_rate(ep) = sum(~isnan(saving_history)) / ep;
    if mod(ep,10000) == 0
        fprintf("Episode %d: completed=%d, saving=%.2f KRW, ε=%.3f\n", ...
            ep, done_flag, saving_history(ep), epsilon);
    end
end
Episode 10000: completed=1, saving=425253.75 KRW, ε=0.050
Episode 20000: completed=1, saving=582866.25 KRW, ε=0.050
Episode 30000: completed=1, saving=670481.25 KRW, ε=0.050
Episode 40000: completed=1, saving=595275.00 KRW, ε=0.050
Episode 50000: completed=1, saving=678060.00 KRW, ε=0.050
Episode 60000: completed=1, saving=503115.00 KRW, ε=0.050
%% ===== Training performance visualization =====
%% ===== Learned policy (greedy) simulation =====
SOC = SOC0;
SOC_traj = zeros(1,T);
act_traj = zeros(1,T);
grid_power_before = zeros(1,T);
grid_power_after = zeros(1,T);
for t = 1:T
    grid_power_before(t) = load_real(t);
    s_idx = [discretizeState(SOC), discretizeState(price_norm(t)), t];
    [~, a_idx] = max(Q(s_idx(1), s_idx(2), s_idx(3), :)); % greedy action
    a_kW = actions(a_idx) * P_ess_max;
    if a_kW >= 0
        SOC_next = SOC + (a_kW / ESS_cap) * eff_cha;
    else
        SOC_next = SOC + (a_kW / ESS_cap) / eff_dch;
    end
    if SOC_next > SOC_max
        SOC_next = SOC_max; a_kW = 0;
    elseif SOC_next < SOC_min
        SOC_next = SOC_min; a_kW = 0;
    elseif SOC_next < p_crt(t)
        SOC_next = p_crt(t); a_kW = 0;
    end
    grid_power_after(t) = load_real(t) + a_kW;
    SOC_traj(t) = SOC_next;
    act_traj(t) = a_kW;
    SOC = SOC_next;
end
%% ===== Final cost calculation =====
cost_before = sum(grid_power_before .* price_real);
cost_after = sum(grid_power_after .* price_real);
saving = cost_before - cost_after;
fprintf('Electricity cost without the ESS: %.3f KRW\n', cost_before);
Electricity cost without the ESS: 6278802.431 KRW
fprintf('Electricity cost with the ESS: %.3f KRW\n', cost_after);
Electricity cost with the ESS: 5651716.181 KRW
fprintf('Total saving: %.3f KRW (%.2f%% saved)\n', saving, saving/cost_before*100);
Total saving: 627086.250 KRW (9.99% saved)
%% ===== Simulation result visualization =====
figure;
plot(saving_history); title('Learning Curve'); xlabel('Episode'); ylabel('Total Reward'); yticks(-4e5:1e5:9e5); grid on;
figure;
plot(100*SOC_traj,'LineWidth',1); hold on; plot(100*p_crt, 'r','LineWidth',1); title('SOC Trajectory'); ylabel('SOC(%)');ylim([-5 105]);legend('SOC','Critical Load'); grid on;
figure;
stairs(act_traj, '-x'); title('Action Trajectory (kW)'); grid on;
figure;
stairs(price_real); title('Price'); xlabel('Time'); ylabel('Price');

Answers (0)
