My Q-learning optimization behaves a little strangely
Hi, this is my optimization solution using Q-learning to optimize ESS charging and discharging.
The objective is to reduce the total cost of electricity use; to do that, I try to charge the ESS during the cheapest hours and discharge it during the most expensive ones.
The action is the charging/discharging rate, which is applied to the SOC (state of charge); there are constraints such that the ESS SOC cannot drop below certain thresholds.
Everything else looks fine, but I think the update process is wrong, because after charging or discharging the ESS the SOC is not updated properly.
Can you help me solve this problem?
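Concretely, assuming a 1-hour timestep (implicit in the 48 hourly prices, so power in kW maps directly to energy in kWh), the intended update is SOC_next = SOC + eff_cha*a_kW/ESS_cap when charging (a_kW >= 0) and SOC_next = SOC + a_kW/(eff_dch*ESS_cap) when discharging, after which the result is clamped to [SOC_min, SOC_max] and to a time-varying critical threshold p_crt(t). For example, charging at a_kW = 750 kW (half the 1500 kW rating) for one hour with eff_cha = 0.95 and ESS_cap = 3000 kWh should raise the SOC by 0.95*750/3000 = 0.2375, e.g. from 0.5 to 0.7375.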
clc; clear;
%% ===== Environment parameters =====
T = 48;
eff_cha = 0.95;
eff_dch = 0.95;
SOC_min = 0.1;
SOC_max = 1;
SOC0 = 0.5;
P_ess_max = 1500; % ESS maximum charge/discharge power (kW)
ESS_cap = 3000; % ESS capacity (kWh)
actions = linspace(-0.5,0.5,41); % charge/discharge fraction of P_ess_max, in [-0.5, 0.5]
numActions = length(actions);
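% e.g. actions(1) = -0.5 -> discharge at 0.5*1500 = 750 kW,
% actions(21) = 0 -> idle, actions(41) = +0.5 -> charge at 750 kW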
%% ===== Learning parameters =====
alpha = 0.1;
gamma = 0.99; % not used directly here (Monte Carlo-style update)
epsilon = 0.5;
epsilon_min = 0.05;
epsilon_decay = 0.995;
numEpisodes = 60000;
%% ===== State space (discretized) =====
numSOCs = 101;
numPrices = 3;
Q = zeros(numSOCs, numPrices, T, numActions);
%% ===== Price/load data =====
price_real = 140.5*ones(1,24);
price_real(1:7) = 87.3;
price_real(22:24) = 87.3;
price_real(8:10) = 109.8;
price_real(12) = 109.8;
price_real(18:21) = 109.8;
price_real = [price_real, price_real]; % 48 hours
load_real = table2array(readtable('48_consumption_6.1.xlsx'));
pv_real = table2array(readtable("PV_gen.xlsx"));
load_real = load_real - pv_real; % net load (kW)
%% ===== SOC critical-load thresholds =====
p_crt_val = 0.03;
p_crt = p_crt_val * (24 - (1:24)); % linear decrease from 0.69 to 0 over 24 h
p_crt = [p_crt, p_crt] + 0.03*randn(1,48); % repeat for 48 h, plus noise
%% ===== Normalization for state discretization =====
price_norm = price_real / max(price_real);
discretizeSOC = @(x) min(max(floor(x * numSOCs) + 1, 1), numSOCs);
discretizePrice = @(x) min(max(floor(x * numPrices) + 1, 1), numPrices);
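% e.g. discretizeSOC(0.5) -> floor(0.5*101)+1 = 51 (of 101 bins); for prices,
% 87.3/140.5 = 0.62 -> bin 2, while 109.8/140.5 = 0.78 and 1.0 both land in bin 3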
%% ===== Monte Carlo training loop =====
saving_history = NaN(1,numEpisodes); % record savings only for completed episodes
completion_rate = zeros(1,numEpisodes); % completion rate per episode
for ep = 1:numEpisodes
SOC = SOC0; % initial SOC fraction
episode_memory = []; % [SOC_idx, price_idx, time, action_idx]
grid_before_ep = zeros(1,T);
grid_after_ep = zeros(1,T);
done_flag = true; % completed-episode flag
for t = 1:T
% state indices
s_idx = [discretizeSOC(SOC), discretizePrice(price_norm(t)), t];
% ε-greedy action selection
if rand < epsilon
a_idx = randi(numActions);
else
[~, a_idx] = max(Q(s_idx(1), s_idx(2), s_idx(3), :));
end
a_kW = actions(a_idx) * P_ess_max;
% SOC update (1-hour timestep)
if a_kW >= 0
SOC_next = SOC + (a_kW / ESS_cap) * eff_cha; % charging: efficiency shrinks stored energy
else
SOC_next = SOC + (a_kW / ESS_cap) / eff_dch; % discharging: efficiency grows energy drawn
end
% clamp SOC and zero the action if a hard constraint is violated
if SOC_next > SOC_max
SOC_next = SOC_max; a_kW = 0;
elseif SOC_next < SOC_min
SOC_next = SOC_min; a_kW = 0;
elseif SOC_next < p_crt(t)
SOC_next = p_crt(t); a_kW = 0;
end
% record grid power
grid_before_ep(t) = load_real(t);
grid_after_ep(t) = load_real(t) + a_kW;
% record state/action
episode_memory(end+1,:) = [s_idx, a_idx];
SOC = SOC_next;
end
% update Q and record only for completed episodes
if done_flag && length(episode_memory) == T
cost_before_ep = sum(grid_before_ep .* price_real);
cost_after_ep = sum(grid_after_ep .* price_real);
saving_ep = cost_before_ep - cost_after_ep;
saving_history(ep) = saving_ep;
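% Every-visit Monte Carlo update: move each visited (SOC, price, time, action)
% entry toward the episode's total saving, which serves as the return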
for step = 1:size(episode_memory,1)
s_idx = episode_memory(step,1:3);
a_idx = episode_memory(step,4);
Q(s_idx(1), s_idx(2), s_idx(3), a_idx) = ...
Q(s_idx(1), s_idx(2), s_idx(3), a_idx) + ...
alpha * (saving_ep - Q(s_idx(1), s_idx(2), s_idx(3), a_idx));
end
end
% ε decay
if epsilon > epsilon_min
epsilon = epsilon * epsilon_decay;
end
% record completion rate
completion_rate(ep) = sum(~isnan(saving_history)) / ep;
if mod(ep,10000) == 0
fprintf("Episode %d: 완주=%d, 절감액=%.2f원, ε=%.3f\n", ...
ep, done_flag, saving_history(ep), epsilon);
end
end
%% ===== Simulate the learned policy =====
SOC = SOC0;
SOC_traj = zeros(1,T);
act_traj = zeros(1,T);
grid_power_before = zeros(1,T);
grid_power_after = zeros(1,T);
for t = 1:T
grid_power_before(t) = load_real(t);
s_idx = [discretizeSOC(SOC), discretizePrice(price_norm(t)), t];
[~, a_idx] = max(Q(s_idx(1), s_idx(2), s_idx(3), :));
a_kW = actions(a_idx) * P_ess_max;
if a_kW >= 0
SOC_next = SOC + (a_kW / ESS_cap) * eff_cha;
else
SOC_next = SOC + (a_kW / ESS_cap) / eff_dch;
end
if SOC_next > SOC_max
SOC_next = SOC_max; a_kW = 0;
elseif SOC_next < SOC_min
SOC_next = SOC_min; a_kW = 0;
elseif SOC_next < p_crt(t)
SOC_next = p_crt(t); a_kW = 0;
end
grid_power_after(t) = load_real(t) + a_kW;
SOC_traj(t) = SOC_next;
act_traj(t) = a_kW;
SOC = SOC_next;
end
%% ===== Final cost calculation =====
cost_before = sum(grid_power_before .* price_real);
cost_after = sum(grid_power_after .* price_real);
saving = cost_before - cost_after;
fprintf('Electricity cost without ESS: %.3f KRW\n', cost_before);
fprintf('Electricity cost with ESS: %.3f KRW\n', cost_after);
fprintf('Total saving: %.3f KRW (saving rate %.2f%%)\n', saving, saving/cost_before*100);
%% ===== Plot simulation results =====
figure;
plot(saving_history); title('Learning Curve'); xlabel('Episode'); ylabel('Total Reward'); yticks(-4e5:1e5:9e5); grid on;
figure;
plot(100*SOC_traj,'LineWidth',1); hold on; plot(100*p_crt, 'r','LineWidth',1); title('SOC Trajectory'); ylabel('SOC(%)');ylim([-5 105]);legend('SOC','Critical Load'); grid on;
figure;
stairs(act_traj, '-x'); title('Action Trajectory (kW)'); grid on;
figure;
stairs(price_real); title('Price'); xlabel('Time'); ylabel('Price');